Universal Features Selection Procedure
Feature selection in AutoML is performed in two steps during fitting:
- random_feature: AutoML decides which features are important and which should be dropped.
- features_selection: new models are trained on the selected features.
In the random_feature step, the following actions are performed:
- Select the best model so far and save its hyperparameters.
- Insert a random_feature into the dataset. The feature has a uniform distribution from
- Train the model with the best hyperparameters on the extended dataset.
- Compute permutation-based feature importance for the new model. Because permutation-based importance only needs the model's predictions, this procedure can be applied to any machine learning algorithm.
- For each feature, count how many times it has a smaller importance than
- If a feature was less important for half or more of the learners, drop it.
There can be several learners in the model. For example, 10-fold cross-validation produces 10 learners, so a feature must be less important in at least 5 of them to be dropped.
- If there are no features to drop, this step is skipped.
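The random_feature step can be sketched as follows. This is a minimal NumPy-only illustration, not the AutoML implementation: the helper names (permutation_importance, features_to_drop, fit_linear, neg_mse), the ordinary-least-squares model standing in for "the best model so far", the negative-MSE score, and the [0, 1) range used for the random column are all assumptions made for the example. It trains one learner per fold of a 10-fold split, computes permutation importance on each validation fold, and drops features that score below the random column for half or more of the learners.

```python
import numpy as np


def permutation_importance(predict, X, y, score, n_repeats=5, seed=0):
    """Mean drop in score when one column of X is shuffled (larger = more important)."""
    rng = np.random.default_rng(seed)
    baseline = score(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its relationship with y.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - score(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances


def features_to_drop(importances, names, random_name="random_feature"):
    """Features less important than the random column for at least half of the learners.

    importances: shape (n_learners, n_features), one row per learner (e.g. per fold).
    """
    imp = np.asarray(importances, dtype=float)
    r = names.index(random_name)
    n_learners = imp.shape[0]
    # Per learner, flag features whose importance is below the random feature's.
    worse_counts = (imp < imp[:, [r]]).sum(axis=0)
    return [name for j, name in enumerate(names)
            if j != r and worse_counts[j] >= n_learners / 2]


def neg_mse(y_true, y_pred):
    # Hypothetical score: higher is better.
    return -np.mean((y_true - y_pred) ** 2)


def fit_linear(X, y):
    # Hypothetical stand-in model: ordinary least squares with an intercept.
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([Z, np.ones(len(Z))]) @ coef


rng = np.random.default_rng(42)
n = 500
X = rng.normal(size=(n, 2))                       # "signal" and "noise" columns
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=n)
X = np.column_stack([X, rng.uniform(size=n)])     # random column, range [0, 1) assumed
names = ["signal", "noise", "random_feature"]

folds = np.array_split(rng.permutation(n), 10)    # 10-fold CV -> 10 learners
per_learner = []
for k, valid_idx in enumerate(folds):
    train_idx = np.concatenate([f for i, f in enumerate(folds) if i != k])
    model = fit_linear(X[train_idx], y[train_idx])
    per_learner.append(
        permutation_importance(model, X[valid_idx], y[valid_idx], neg_mse, seed=k))

dropped = features_to_drop(per_learner, names)
print(dropped)
```

Because importance is measured only through predict, any fitted model could be wrapped in its place. Whether "noise" ends up dropped depends on the random draws; "signal" is far more important than the random column on every fold and is kept.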
In the features_selection step, the best model is selected for each algorithm type: [Extra Trees]. The hyperparameters of these best models are copied and reused to train new models on the selected features.
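The features_selection step is mostly bookkeeping over the models trained so far. The sketch below assumes a hypothetical leaderboard of dicts with algorithm, score (higher is better) and params keys; the function name plan_features_selection, the "Algorithm B" entry and the max_depth hyperparameter are illustrative only, not the AutoML API.

```python
import copy


def plan_features_selection(models, selected_features):
    """Pick the best model per algorithm type and copy its hyperparameters for retraining."""
    best = {}
    for m in models:
        algo = m["algorithm"]
        # Assumption: a higher score means a better model.
        if algo not in best or m["score"] > best[algo]["score"]:
            best[algo] = m
    # Deep-copy the hyperparameters so retraining cannot mutate the original models.
    return [
        {"algorithm": algo, "params": copy.deepcopy(m["params"]),
         "features": list(selected_features)}
        for algo, m in best.items()
    ]


# Hypothetical leaderboard entries.
models = [
    {"algorithm": "Extra Trees", "score": 0.81, "params": {"max_depth": 6}},
    {"algorithm": "Extra Trees", "score": 0.84, "params": {"max_depth": 8}},
    {"algorithm": "Algorithm B", "score": 0.75, "params": {}},
]
plans = plan_features_selection(models, ["signal"])
for plan in plans:
    print(plan)
```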