Feature Selection


IngeoML.feature_selection API

class SelectFromModelCV
>>> from IngeoML import SelectFromModelCV
>>> from sklearn.svm import LinearSVC
>>> from sklearn.datasets import load_wine
>>> from sklearn.metrics import f1_score
>>> import pandas as pd
>>> import seaborn as sns
>>> X, y = load_wine(return_X_y=True)
>>> scoring = lambda y, hy: f1_score(y, hy, average='macro')
>>> select = SelectFromModelCV(estimator=LinearSVC(dual='auto'),
...                            scoring=scoring,
...                            prefit=False).fit(X, y)

The performance of the selection mechanism can be seen in the following figure:

>>> perf = select.cv_results_
>>> rows = [{'d': k, 'macro-f1': v} for k, v in perf.items()]
>>> df = pd.DataFrame(rows)
>>> sns.set_style('whitegrid')
>>> _ = sns.lineplot(data=df, x='d', y='macro-f1')
_images/SelectFromModelCV.png
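Since the example above iterates over cv_results_ as a mapping from d to a macro-F1 score, the best-scoring value of d can be read off with plain Python. The snippet below is a minimal sketch of that idea; the dictionary holds made-up placeholder values, not results from the wine dataset, and the tie-breaking rule (prefer the smallest d) is an assumption rather than documented library behavior.

```python
# Placeholder mapping d -> macro-F1 (hypothetical values, NOT real results).
# In practice this would be `select.cv_results_` from the example above.
perf = {2: 0.61, 4: 0.83, 7: 0.90, 10: 0.90, 13: 0.88}

# Highest score observed across all candidate values of d.
best_score = max(perf.values())

# Among the candidates that reach that score, keep the smallest d
# (an assumed tie-breaking rule that favors fewer features).
best_d = min(d for d, score in perf.items() if score == best_score)

print(best_d, best_score)
```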
__init__(estimator: Any, *, prefit: bool = False, norm_order: float | int = 1, max_features: Callable[[...], Any] | int | None = None, importance_getter: str | Callable[[...], Any] = 'auto', min_features_to_select: int = 2, cv=None, scoring=None, max_iter: int = 10) -> None
property max_iter

Number of feature counts to evaluate, sampled between 2 and max_features.
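To make the role of max_iter concrete, the following sketch builds a grid of candidate feature counts between a lower and an upper bound. It assumes the points are evenly spaced integers; the spacing actually used by SelectFromModelCV is not stated here, and the helper name candidate_feature_counts is hypothetical.

```python
def candidate_feature_counts(min_features: int, max_features: int, max_iter: int) -> list[int]:
    """Return at most max_iter distinct, evenly spaced integers in [min_features, max_features].

    Hypothetical helper for illustration; not part of IngeoML.
    """
    if max_iter < 2 or max_features <= min_features:
        return [min_features]
    span = max_features - min_features
    # Integer arithmetic keeps the grid deterministic; duplicates collapse in the set.
    points = {min_features + (i * span) // (max_iter - 1) for i in range(max_iter)}
    return sorted(points)


# The wine dataset has 13 features, so with the defaults from the signature
# (min_features_to_select=2, max_iter=10) the grid spans 2 to 13.
grid = candidate_feature_counts(2, 13, 10)
print(grid)
```

When the range holds fewer integers than max_iter, the grid simply contains every integer in the range, so fewer than max_iter points are evaluated.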

__new__(**kwargs)
fit(X, y, groups=None)

Choose the number of features
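The idea of choosing the number of features by cross-validation can be sketched with scikit-learn alone. The code below is an illustration under stated assumptions, not the implementation of SelectFromModelCV: the candidate counts, the 5-fold stratified split, and the use of LinearSVC(dual=False) both for ranking features and for scoring are all choices made for this example.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import LinearSVC

X, y = load_wine(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # assumed CV scheme

scores = {}
for k in (2, 5, 9, 13):  # assumed candidate feature counts
    fold_scores = []
    for train, test in cv.split(X, y):
        # threshold=-np.inf makes SelectFromModel keep exactly the k
        # features with the largest importance (coefficient norm).
        selector = SelectFromModel(LinearSVC(dual=False),
                                   threshold=-np.inf,
                                   max_features=k).fit(X[train], y[train])
        # Retrain on the reduced training fold and score on the held-out fold.
        model = LinearSVC(dual=False).fit(selector.transform(X[train]), y[train])
        hy = model.predict(selector.transform(X[test]))
        fold_scores.append(f1_score(y[test], hy, average='macro'))
    scores[k] = float(np.mean(fold_scores))

# Keep the feature count with the best mean macro-F1.
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Fitting the selector inside each fold keeps the held-out data out of the feature ranking, which avoids an optimistic bias in the reported score.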

property cv

Cross-validation parameters

property scoring

Score function

property min_features_to_select

Minimum number of features to select