# AutoML Modes

## Built-in modes
There are 3 built-in modes available in `AutoML`:

- `Explain` - to be used when the user wants to explain and understand the data.
- `Perform` - to be used when the user wants to train a model that will be used in real-life use cases.
- `Compete` - to be used for machine learning competitions (maximum performance!).
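A mode is selected with the `mode` argument of the `AutoML` constructor. A minimal sketch, assuming `mljar-supervised` is installed; the dataset and train/test split are illustrative:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from supervised.automl import AutoML

# Illustrative binary classification data; any tabular X, y works.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Pick one of the built-in modes: "Explain", "Perform", or "Compete".
automl = AutoML(mode="Explain")
automl.fit(X_train, y_train)
predictions = automl.predict(X_test)
```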
| AutoML Modes | Explain | Perform | Compete |
|---|---|---|---|
| **Algorithms** |  |  |  |
| Baseline | ✓ |  |  |
| Linear | ✓ | ✓ | ✓ |
| Decision Tree | ✓ |  | ✓ |
| Random Forest | ✓ | ✓ | ✓ |
| Extra Trees |  |  | ✓ |
| XGBoost | ✓ | ✓ | ✓ |
| LightGBM |  | ✓ | ✓ |
| CatBoost |  | ✓ | ✓ |
| Neural Network | ✓ | ✓ | ✓ |
| Nearest Neighbors |  |  | ✓ |
| Ensemble | ✓ | ✓ | ✓ |
| Stacking |  |  | ✓ |
| **Steps** |  |  |  |
| `simple_algorithms` | ✓ | ✓ | ✓ |
| `default_algorithms` | ✓ | ✓ | ✓ |
| `not_so_random` |  | ✓ | ✓ |
| `golden_features` |  |  | ✓ |
| `insert_random_feature` |  |  | ✓ |
| `features_selection` |  |  | ✓ |
| `hill_climbing_1` |  | ✓ | ✓ |
| `hill_climbing_2` |  | ✓ | ✓ |
| `ensemble` | ✓ | ✓ | ✓ |
| `stack` |  |  | ✓ |
| `ensemble_stacked` |  |  | ✓ |
| **Validation** | 75%/25% train/test split | 5-fold CV, Shuffle, Stratify | 10-fold CV, Shuffle, Stratify |
| **Explanations** | `explain_level=2` | `explain_level=1` | `explain_level=0` |
| Learning curves | ✓ | ✓ | ✓ |
| Importance plots | ✓ | ✓ |  |
| SHAP plots | ✓ |  |  |
| **Tuning** | `start_random_models=1`, `hill_climbing_steps=0`, `top_models_to_improve=0` | `start_random_models=5`, `hill_climbing_steps=2`, `top_models_to_improve=2` | `start_random_models=10`, `hill_climbing_steps=2`, `top_models_to_improve=3` |
| Models with default hyperparameters | 1 | 1 | 1 |
| Models with `not_so_random` hyperparameters | 0 | 4 | 9 |
| `hill_climbing` steps | 0 | 2 | 2 |
| Top models improved in each `hill_climbing` step | 0 | 2 | 3 |
| Total models tuned for each algorithm[^1] | 1 | about 13[^2] | about 22[^2] |
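The validation schemes in the table can also be set explicitly through the `validation_strategy` parameter. Below is a sketch of dictionaries matching the table's validation rows; the `shuffle`/`stratify` values for the split strategy are assumptions, not verified defaults:

```python
from supervised.automl import AutoML

# Explain: 75%/25% train/test split.
explain_validation = {
    "validation_type": "split",
    "train_ratio": 0.75,
    "shuffle": True,   # assumption
    "stratify": True,  # assumption
}

# Perform: 5-fold CV with shuffle and stratification.
perform_validation = {
    "validation_type": "kfold",
    "k_folds": 5,
    "shuffle": True,
    "stratify": True,
}

# Compete: 10-fold CV with shuffle and stratification.
compete_validation = {
    "validation_type": "kfold",
    "k_folds": 10,
    "shuffle": True,
    "stratify": True,
}

automl = AutoML(mode="Perform", validation_strategy=perform_validation)
```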
## Custom modes

Users can define their own modes by setting the parameters in the `AutoML` constructor (see the `AutoML` API). An example configuration:
```python
from supervised.automl import AutoML

automl = AutoML(
    algorithms=["CatBoost", "Xgboost", "LightGBM"],
    model_time_limit=30 * 60,
    start_random_models=10,
    hill_climbing_steps=3,
    top_models_to_improve=3,
    golden_features=True,
    features_selection=False,
    stack_models=True,
    train_ensemble=True,
    explain_level=0,
    validation_strategy={
        "validation_type": "kfold",
        "k_folds": 4,
        "shuffle": False,
        "stratify": True,
    },
)
```
- It will train models with the `CatBoost`, `Xgboost`, and `LightGBM` algorithms.
- Each model will be trained for up to 30 minutes (`30*60` seconds); `total_time_limit` is not set.
- About `10+3*3*2=28` unstacked models will be trained for each algorithm: 10 initial models, plus 3 hill-climbing steps that each improve the top 3 models with 2 new candidates. Up to `10` stacked models will be trained for each algorithm.
- An `Ensemble` will be trained from the unstacked models, and an `Ensemble_Stacked` from the unstacked and stacked models together.
- In total, about `3*28+2=86` models will be trained.
- `explain_level=0` means that only learning curves will be saved; no other explanations will be computed.
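A custom-mode `AutoML` object is then used like any other: call `fit` and then `predict`. A minimal usage sketch continuing the example above; the dataset is illustrative:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Illustrative data; in practice use your own tabular dataset.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

automl.fit(X_train, y_train)                   # runs all configured steps
predictions = automl.predict(X_test)           # predictions from the best model
probabilities = automl.predict_proba(X_test)   # class probabilities
```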
[^1]: Not every algorithm is tuned. Models which are not tuned: `Baseline`, `Decision Tree`, `Linear`, `Nearest Neighbors`.

[^2]: The exact number cannot be given, because newly generated hyperparameters are sometimes rejected during the `not_so_random` or `hill_climbing` steps as duplicates or invalid hyperparameter sets.