Golden Features
Golden Features are features with great predictive power. They can be constructed from the original features, most commonly by trying differences or ratios of feature pairs. The mljar-supervised package has a built-in step that performs the Golden Features search.
The procedure to find Golden Features:
- Generate all possible unique pairs of original features.
- If there are more than `250,000` pairs, subsample them randomly to `250,000`.
- For each pair of features, construct a new feature with the subtraction or division operator.
- Train a `Decision Tree` with `max_depth = 3` on each new feature (using only that one feature).
- Training uses up to `2,500` samples randomly selected from the dataset. Testing likewise uses up to `2,500` randomly selected samples.
- A score is computed on the test samples for each new feature. The score is the `logloss` metric for classification tasks and `mean squared error` for regression tasks.
- The newly generated features are sorted by score (the lower the score, the better).
- The new features with the smallest scores are selected as Golden Features and inserted into the training data.
- The number of Golden Features selected depends on the number of original features. It is `5%` of the number of original features, but not fewer than `5` and not more than `50` features.
- The results of the Golden Features search are saved to the `golden_features.json` file.
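The steps above can be sketched in a few lines of Python. This is a minimal illustration built on NumPy and scikit-learn, not the mljar-supervised implementation. The names `search_golden_features` and `n_golden_features` are hypothetical. The sketch covers regression only, and it assumes that the train and test rows are disjoint, that the ratio is computed in one direction per pair, and that the 5% rule rounds down.

```python
import itertools
import random

import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

MAX_PAIRS = 250_000   # cap on the number of feature pairs
MAX_SAMPLES = 2_500   # cap on train rows and on test rows


def n_golden_features(n_original):
    """5% of the original feature count (rounded down), clamped to [5, 50]."""
    return max(5, min(50, n_original * 5 // 100))


def search_golden_features(X, y, seed=0):
    """Return the best (score, i, j, op) candidates for a regression task."""
    rng = random.Random(seed)
    n_rows, n_cols = X.shape

    # All unique pairs of original features, subsampled if there are too many.
    pairs = list(itertools.combinations(range(n_cols), 2))
    if len(pairs) > MAX_PAIRS:
        pairs = rng.sample(pairs, MAX_PAIRS)

    # Assumption: disjoint random train/test rows, up to MAX_SAMPLES each.
    rows = list(range(n_rows))
    rng.shuffle(rows)
    size = min(MAX_SAMPLES, n_rows // 2)
    train, test = rows[:size], rows[size:2 * size]

    scored = []
    for i, j in pairs:
        for op in ("diff", "ratio"):
            with np.errstate(divide="ignore", invalid="ignore"):
                new = X[:, i] - X[:, j] if op == "diff" else X[:, i] / X[:, j]
            # Replace inf/nan from division by zero so the tree can be fitted.
            new = np.nan_to_num(new, nan=0.0, posinf=0.0, neginf=0.0)
            new = new.reshape(-1, 1)

            # One shallow tree per candidate, trained on that feature alone.
            tree = DecisionTreeRegressor(max_depth=3, random_state=seed)
            tree.fit(new[train], y[train])
            score = mean_squared_error(y[test], tree.predict(new[test]))
            scored.append((score, i, j, op))

    scored.sort()  # lower score is better
    return scored[:n_golden_features(n_cols)]
```

Calling `search_golden_features(X, y)` on a NumPy feature matrix `X` and target vector `y` returns the top candidates as `(score, i, j, op)` tuples, where `i` and `j` are column indices. For a classification task, the same loop would use a `DecisionTreeClassifier` and score the predicted probabilities with `sklearn.metrics.log_loss`.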