Universal Features Selection Procedure
Feature selection in AutoML is performed in two steps during fitting:
- random_feature - AutoML decides which features are important and which should be dropped.
- features_selection - new models are trained on the selected features.
random_feature step
In this step the following actions are performed:
- Select the best model so far and save its hyperparameters.
- Insert a random_feature into the dataset. Its values are drawn from a uniform distribution on the range 0 to 1.
- Train the model with the best hyperparameters on the extended dataset.
- Compute permutation-based feature importance for the new model. Because permutation-based importance only needs the model's predictions, this procedure can be applied to any Machine Learning algorithm.
- For each feature, count how many times it has smaller importance than random_feature.
- If a feature was less important than random_feature for half of the learners or more, drop it.
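The steps above can be sketched for a single learner as follows. This is a minimal illustration, not the AutoML implementation: the toy data, the least-squares model standing in for "the best model so far", and the helper names `fit`, `predict`, `mse`, and `permutation_importance` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (made up for illustration): x0 and x1 drive the target, x2 is noise.
n = 400
X = rng.normal(size=(n, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=n)
names = ["x0", "x1", "x2"]

# Insert random_feature, drawn uniformly from the range 0 to 1.
X_ext = np.column_stack([X, rng.uniform(0.0, 1.0, size=n)])
names_ext = names + ["random_feature"]

# Hold out a validation part on which importance is measured.
X_train, X_val = X_ext[:300], X_ext[300:]
y_train, y_val = y[:300], y[300:]


def fit(X, y):
    """Least-squares linear model; a stand-in for any algorithm."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef


def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))


def permutation_importance(coef, X, y, n_repeats=5):
    """Mean increase in MSE after shuffling each column in turn."""
    baseline = mse(y, predict(coef, X))
    importances = []
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(mse(y, predict(coef, X_perm)) - baseline)
        importances.append(float(np.mean(increases)))
    return importances


coef = fit(X_train, y_train)
importance = dict(zip(names_ext, permutation_importance(coef, X_val, y_val)))

# Features that this one learner ranks below random_feature.
less_important = [f for f in names if importance[f] < importance["random_feature"]]
```

Only `fit` and `predict` depend on the algorithm, so swapping in a different model leaves the importance computation unchanged.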
Note
There can be several learners in the model. For example, with 10-fold cross-validation there will be 10 learners. A feature is dropped only if it is less important than random_feature for half or more of the learners (at least 5 of the 10).
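The voting rule across learners can be sketched as below. The function name `features_to_drop` and the importance values are hypothetical, chosen only to show the "half or more" threshold with 10 learners.

```python
def features_to_drop(importances_per_learner, random_name="random_feature"):
    """Return features ranked below random_feature by half or more of the learners."""
    n_learners = len(importances_per_learner)
    counts = {}
    for importance in importances_per_learner:
        for name, value in importance.items():
            if name == random_name:
                continue
            counts.setdefault(name, 0)
            if value < importance[random_name]:
                counts[name] += 1
    # "Half or more" means 2 * count >= number of learners.
    return sorted(name for name, count in counts.items() if 2 * count >= n_learners)


# Made-up importances for 10 learners (one per fold).
learners = []
for i in range(10):
    learners.append({
        "a": 0.01 if i < 5 else 0.20,   # below random_feature in 5 learners
        "b": 0.01 if i < 4 else 0.20,   # below random_feature in 4 learners
        "c": 0.50,                      # never below random_feature
        "random_feature": 0.05,
    })

dropped = features_to_drop(learners)
```

Here `a` reaches the threshold of 5 learners and is dropped, while `b` falls one short and is kept.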
features_selection step
- If there are no features to be dropped then this step is skipped.
- The best model is selected for each algorithm type: [Xgboost, LightGBM, CatBoost, Neural Network, Random Forest, Extra Trees]. The hyperparameters of the best models are copied and reused to train new models on the selected features.
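A minimal sketch of this step is shown below. The model records, the `loss` field (lower is assumed to be better), the `_SelectedFeatures` name suffix, and the function names are hypothetical and serve only to illustrate the selection and copying of hyperparameters.

```python
import copy


def best_per_type(models):
    """Pick the lowest-loss model for each algorithm type."""
    best = {}
    for model in models:
        current = best.get(model["type"])
        if current is None or model["loss"] < current["loss"]:
            best[model["type"]] = model
    return best


def features_selection_step(models, all_features, dropped):
    """Return configurations for new models trained on the selected features."""
    if not dropped:
        return []  # nothing to drop, so the step is skipped
    selected = [f for f in all_features if f not in dropped]
    new_models = []
    for model in best_per_type(models).values():
        new_models.append({
            "name": model["name"] + "_SelectedFeatures",
            "type": model["type"],
            "params": copy.deepcopy(model["params"]),  # reuse hyperparameters
            "features": selected,
        })
    return new_models


# Made-up models trained earlier in the run.
models = [
    {"name": "1_Xgboost", "type": "Xgboost", "loss": 0.31, "params": {"max_depth": 6}},
    {"name": "2_Xgboost", "type": "Xgboost", "loss": 0.28, "params": {"max_depth": 4}},
    {"name": "3_LightGBM", "type": "LightGBM", "loss": 0.30, "params": {"num_leaves": 31}},
]

new_models = features_selection_step(models, ["f1", "f2", "f3"], dropped=["f2"])
```

Each returned configuration would then be trained like any other model, only on the reduced feature set.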