# Universal Features Selection Procedure
Feature selection in AutoML is performed in two steps during fitting:

- `random_feature` - in this step AutoML decides which features are important and which should be dropped.
- `features_selection` - in this step new models are trained on the selected features.
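For context, a hedged usage sketch is shown below. It assumes the mljar-supervised package, where the whole procedure runs automatically during `fit()`; the `features_selection` constructor flag is an assumption based on that package's API, not something stated in this section.

```python
# A hedged usage sketch, assuming the mljar-supervised package; the
# `features_selection` flag is an assumption based on that package's API.
import numpy as np
from supervised.automl import AutoML

X = np.random.uniform(size=(200, 5))
y = (X[:, 0] > 0.5).astype(int)

automl = AutoML(mode="Compete", features_selection=True)
automl.fit(X, y)
```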
## `random_feature` step
In this step the following actions are performed:

- Select the best model so far and save its hyperparameters.
- Insert a `random_feature` into the dataset. The feature has a uniform distribution in the range from `0` to `1`.
- Train a model with the best hyperparameters on the extended dataset.
- Compute the permutation-based feature importance for the new model. Because permutation-based importance is model-agnostic, this procedure can be applied to any Machine Learning algorithm.
- For each feature, count how many times its importance is smaller than the importance of the `random_feature`.
- If a feature was less important than the `random_feature` for half of the learners or more, drop it (see the sketch below).
**Note:** There can be several learners in the model. For example, in 10-fold cross-validation there will be 10 learners. For a feature to be dropped, it must be less important than the `random_feature` for half or more of the learners (at least 5 out of 10 in this case).
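The sketch below illustrates the whole step with scikit-learn. It is a minimal reconstruction of the described procedure, not the AutoML internals: the function name, the choice of `RandomForestClassifier` as the "best model", and the 10-fold split are all illustrative assumptions.

```python
# A minimal sketch of the random_feature step, assuming scikit-learn models.
# Names such as `best_params` and `features_to_drop` are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import KFold

def features_to_drop(X, y, best_params, n_splits=10, seed=0):
    rng = np.random.RandomState(seed)
    # Insert a random_feature drawn from a uniform distribution on [0, 1].
    X_ext = np.column_stack([X, rng.uniform(0.0, 1.0, size=X.shape[0])])
    random_idx = X_ext.shape[1] - 1

    # One "drop" vote per learner for each original feature.
    votes = np.zeros(X.shape[1], dtype=int)
    for train_idx, valid_idx in KFold(n_splits=n_splits).split(X_ext):
        # Train a learner with the best hyperparameters on the extended data.
        model = RandomForestClassifier(**best_params)
        model.fit(X_ext[train_idx], y[train_idx])
        # Permutation importance is model-agnostic, so any algorithm fits here.
        result = permutation_importance(
            model, X_ext[valid_idx], y[valid_idx], n_repeats=5, random_state=seed
        )
        imp = result.importances_mean
        # Count features scoring below the random_feature for this learner.
        votes += imp[:random_idx] < imp[random_idx]

    # Drop features that lost to random_feature in half or more of the learners.
    return np.where(votes >= n_splits / 2)[0]
```

A feature is dropped only when it loses the comparison with the `random_feature` in at least half of the cross-validation learners, which matches the note above.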
## `features_selection` step
- If there are no features to be dropped, this step is skipped.
- The best model is selected for each algorithm type: [`Xgboost`, `LightGBM`, `CatBoost`, `Neural Network`, `Random Forest`, `Extra Trees`]. The hyperparameters of these best models are copied and reused to train new models on the selected features (sketched below).
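The sketch below shows one way to express this step with scikit-learn-style estimators. The function name and the `best_models` mapping are illustrative assumptions; `clone()` copies an estimator's hyperparameters without its fitted state, which mirrors "copy and reuse the hyperparameters" from the description.

```python
# A minimal sketch of the features_selection step, with illustrative names;
# `best_models` maps an algorithm type to its best fitted estimator.
from sklearn.base import clone

def retrain_on_selected_features(best_models, X, y, drop_idx):
    keep_idx = [i for i in range(X.shape[1]) if i not in set(drop_idx)]
    retrained = {}
    for algo_name, model in best_models.items():
        # clone() copies the hyperparameters without the fitted state,
        # so each best model is retrained from scratch on fewer features.
        retrained[algo_name] = clone(model).fit(X[:, keep_idx], y)
    return retrained
```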