AutoML Modes¶
Built-in modes¶
There are 3 built-in modes available in AutoML:
- Explain - to be used when the user wants to explain and understand the data.
- Perform - to be used when the user wants to train a model that will be used in real-life use cases.
- Compete - to be used for machine learning competitions (maximum performance!).
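A mode is selected with the `mode` argument of the `AutoML` constructor; a minimal sketch:

```python
from supervised import AutoML

# Pick one of the built-in modes: "Explain" (the default), "Perform", or "Compete".
automl = AutoML(mode="Compete")
```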
| AutoML Modes | Explain | Perform | Compete |
|---|:---:|:---:|:---:|
| **Algorithms** | | | |
| Baseline | ✓ | | |
| Linear | ✓ | ✓ | ✓ |
| Decision Tree | ✓ | | ✓ |
| Random Forest | ✓ | ✓ | ✓ |
| Extra Trees | | | ✓ |
| XGBoost | ✓ | ✓ | ✓ |
| LightGBM | | ✓ | ✓ |
| CatBoost | | ✓ | ✓ |
| Neural Network | ✓ | ✓ | ✓ |
| Nearest Neighbors | | | ✓ |
| Ensemble | ✓ | ✓ | ✓ |
| Stacking | | | ✓ |
| **Steps** | | | |
| `simple_algorithms` | ✓ | ✓ | ✓ |
| `default_algorithms` | ✓ | ✓ | ✓ |
| `not_so_random` | | ✓ | ✓ |
| `golden_features` | | ✓ | ✓ |
| `insert_random_feature` | | ✓ | ✓ |
| `feature_selection` | | ✓ | ✓ |
| `hill_climbing_1` | | ✓ | ✓ |
| `hill_climbing_2` | | ✓ | ✓ |
| `ensemble` | ✓ | ✓ | ✓ |
| `stack` | | | ✓ |
| `ensemble_stacked` | | | ✓ |
| **Validation** | 75%/25% train/test split | 5-fold CV, Shuffle, Stratify | 10-fold CV, Shuffle, Stratify |
| **Explanations** | `explain_level=2` | `explain_level=1` | `explain_level=0` |
| Learning curves | ✓ | ✓ | ✓ |
| Importance plots | ✓ | ✓ | |
| SHAP plots | ✓ | | |
| **Tuning** | | | |
| Parameters | `start_random_models=1`, `hill_climbing_steps=0`, `top_models_to_improve=0` | `start_random_models=5`, `hill_climbing_steps=2`, `top_models_to_improve=2` | `start_random_models=10`, `hill_climbing_steps=2`, `top_models_to_improve=3` |
| Models with default hyperparameters | 1 | 1 | 1 |
| Models with `not_so_random` hyperparameters | 0 | 4 | 9 |
| `hill_climbing` steps | 0 | 2 | 2 |
| Top models improved in each `hill_climbing` step | 0 | 2 | 3 |
| Total models tuned for each algorithm[^1] | 1 | about 13[^2] | about 22[^2] |
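The per-algorithm totals in the last row follow from the tuning parameters: one default model, `start_random_models - 1` not_so_random models, and two new candidates per improved model in each hill_climbing step. A small sketch of that arithmetic (the helper function is illustrative, not part of the AutoML API):

```python
def models_tuned_per_algorithm(start_random_models, hill_climbing_steps, top_models_to_improve):
    """Upper bound on the number of models tuned for a single algorithm."""
    default_models = 1
    not_so_random_models = start_random_models - 1
    # Each hill-climbing step proposes 2 new candidates per improved model.
    hill_climbing_models = 2 * hill_climbing_steps * top_models_to_improve
    return default_models + not_so_random_models + hill_climbing_models

print(models_tuned_per_algorithm(1, 0, 0))   # Explain: 1
print(models_tuned_per_algorithm(5, 2, 2))   # Perform: 13
print(models_tuned_per_algorithm(10, 2, 3))  # Compete: 22
```

The actual counts can be lower, because duplicate or invalid hyperparameter sets are rejected (see the second footnote).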
Custom modes¶
Users can define their own modes by setting the parameters in the AutoML constructor (AutoML API).
Example settings:
```python
from supervised import AutoML

automl = AutoML(
    algorithms=["CatBoost", "Xgboost", "LightGBM"],
    model_time_limit=30*60,
    start_random_models=10,
    hill_climbing_steps=3,
    top_models_to_improve=3,
    golden_features=True,
    features_selection=False,
    stack_models=True,
    train_ensemble=True,
    explain_level=0,
    validation_strategy={
        "validation_type": "kfold",
        "k_folds": 4,
        "shuffle": False,
        "stratify": True,
    }
)
```
- It will train models with the `CatBoost`, `Xgboost`, and `LightGBM` algorithms.
- Each model will be trained for 30 minutes (`30*60` seconds); `total_time_limit` is not set.
- About `10+3*3*2=28` unstacked models and `10` stacked models will be trained for each algorithm (up to `10` models are stacked per algorithm).
- An `Ensemble` will be trained based on the unstacked models, and an `Ensemble_Stacked` based on the unstacked and stacked models.
- In total there will be about `3*28+2=86` models trained.
- `explain_level=0` means that only learning curves will be saved; no other explanations will be computed.
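Once constructed, the custom-mode object is used like any other AutoML instance; a short usage sketch with made-up data:

```python
import numpy as np

# Made-up tabular data: 300 rows, 5 features, binary target.
X = np.random.uniform(size=(300, 5))
y = np.random.randint(0, 2, size=300)

automl.fit(X, y)
predictions = automl.predict(X)
```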
[^1]: Not every algorithm is tuned. Models which are not tuned: `Baseline`, `Decision Tree`, `Linear`, `Nearest Neighbors`.

[^2]: The exact number cannot be given, because sometimes newly generated hyperparameters are rejected during the `not_so_random` or `hill_climbing` steps due to model duplicates or an invalid hyperparameter set.