AutoML Modes¶
Built-in modes¶
There are 3 built-in modes available in AutoML:
- Explain - to be used when the user wants to explain and understand the data.
- Perform - to be used when the user wants to train a model that will be used in real-life use cases.
- Compete - to be used for machine learning competitions (maximum performance!).
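A mode is selected with the `mode` argument of the `AutoML` constructor; a minimal sketch:

```python
from supervised import AutoML

# Pick one of the built-in modes: "Explain" (the default), "Perform", or "Compete".
automl = AutoML(mode="Compete")
```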
| AutoML Modes | Explain | Perform | Compete |
|---|:---:|:---:|:---:|
| **Algorithms** | | | |
| Baseline | ✓ | | |
| Linear | ✓ | ✓ | ✓ |
| Decision Tree | ✓ | | ✓ |
| Random Forest | ✓ | ✓ | ✓ |
| Extra Trees | | | ✓ |
| XGBoost | ✓ | ✓ | ✓ |
| LightGBM | | ✓ | ✓ |
| CatBoost | | ✓ | ✓ |
| Neural Network | ✓ | ✓ | ✓ |
| Nearest Neighbors | | | ✓ |
| Ensemble | ✓ | ✓ | ✓ |
| Stacking | | | ✓ |
| **Steps** | | | |
| `simple_algorithms` | ✓ | ✓ | ✓ |
| `default_algorithms` | ✓ | ✓ | ✓ |
| `not_so_random` | | ✓ | ✓ |
| `golden_features` | | ✓ | ✓ |
| `insert_random_feature` | | ✓ | ✓ |
| `feature_selection` | | ✓ | ✓ |
| `hill_climbing_1` | | ✓ | ✓ |
| `hill_climbing_2` | | ✓ | ✓ |
| `ensemble` | ✓ | ✓ | ✓ |
| `stack` | | | ✓ |
| `ensemble_stacked` | | | ✓ |
| **Validation** | 75%/25% train/test split | 5-fold CV, Shuffle, Stratify | 10-fold CV, Shuffle, Stratify |
| **Explanations** | `explain_level=2` | `explain_level=1` | `explain_level=0` |
| Learning curves | ✓ | ✓ | ✓ |
| Importance plots | ✓ | ✓ | |
| SHAP plots | ✓ | | |
| **Tuning** | | | |
| Parameters | `start_random_models=1`, `hill_climbing_steps=0`, `top_models_to_improve=0` | `start_random_models=5`, `hill_climbing_steps=2`, `top_models_to_improve=2` | `start_random_models=10`, `hill_climbing_steps=2`, `top_models_to_improve=3` |
| Models with default hyperparameters | 1 | 1 | 1 |
| Models with `not_so_random` hyperparameters | 0 | 4 | 9 |
| `hill_climbing` steps | 0 | 2 | 2 |
| Top models improved in each `hill_climbing` step | 0 | 2 | 3 |
| Total models tuned for each algorithm[^1] | 1 | about 13[^2] | about 22[^2] |
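The per-algorithm totals in the last row follow from the tuning parameters: one default model, `start_random_models - 1` not_so_random models, and two new candidates per improved model in each hill_climbing step. A small sketch of that arithmetic (the helper function is illustrative, not part of the AutoML API):

```python
def models_tuned_per_algorithm(start_random_models, hill_climbing_steps, top_models_to_improve):
    """Upper bound on the number of models tuned for a single algorithm."""
    default_models = 1
    not_so_random_models = start_random_models - 1
    # Each hill-climbing step proposes 2 new candidates per improved model.
    hill_climbing_models = 2 * hill_climbing_steps * top_models_to_improve
    return default_models + not_so_random_models + hill_climbing_models

print(models_tuned_per_algorithm(1, 0, 0))   # Explain: 1
print(models_tuned_per_algorithm(5, 2, 2))   # Perform: 13
print(models_tuned_per_algorithm(10, 2, 3))  # Compete: 22
```

The actual counts can be lower, because duplicate or invalid hyperparameter sets are rejected (see the second footnote).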
Custom modes¶
Users can define their own modes by setting the parameters in the AutoML constructor (AutoML API).
Example settings:
```python
from supervised import AutoML

automl = AutoML(
    algorithms=["CatBoost", "Xgboost", "LightGBM"],
    model_time_limit=30*60,
    start_random_models=10,
    hill_climbing_steps=3,
    top_models_to_improve=3,
    golden_features=True,
    features_selection=False,
    stack_models=True,
    train_ensemble=True,
    explain_level=0,
    validation_strategy={
        "validation_type": "kfold",
        "k_folds": 4,
        "shuffle": False,
        "stratify": True,
    }
)
```
- It will train models with the `CatBoost`, `Xgboost`, and `LightGBM` algorithms.
- Each model will be trained for 30 minutes (`30*60` seconds); `total_time_limit` is not set.
- About `10+3*3*2=28` unstacked models and `10` stacked models will be trained for each algorithm (up to `10` models are stacked per algorithm).
- An `Ensemble` will be trained based on the unstacked models, and an `Ensemble_Stacked` based on the unstacked and stacked models.
- In total there will be about `3*28+2=86` models trained.
- `explain_level=0` means that only learning curves will be saved; no other explanations will be computed.
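Once constructed, the custom-mode object is used like any other AutoML instance; a short usage sketch with made-up data:

```python
import numpy as np

# Made-up tabular data: 300 rows, 5 features, binary target.
X = np.random.uniform(size=(300, 5))
y = np.random.randint(0, 2, size=300)

automl.fit(X, y)
predictions = automl.predict(X)
```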
[^1]: Not every algorithm is tuned. Models which are not tuned: `Baseline`, `Decision Tree`, `Linear`, `Nearest Neighbors`.

[^2]: The exact number cannot be given, because sometimes newly generated hyperparameters are rejected during the `not_so_random` or `hill_climbing` steps due to model duplicates or an invalid hyperparameter set.