Feature Engineering
Categories: IBM Machine Learning
Variable Transformation
We often assume that data are normally distributed, but in practice they are often skewed. A data transformation can correct for this.
Log transformation
A log transformation takes the natural log of the data. It is useful for linear regression and helps correct right-skewed data. The values must be strictly positive.
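As a rough illustration of the effect on right-skewed data, here is a minimal sketch using NumPy. The sample values and the `skewness` helper are made up for this example and are not part of the course material.

```python
import numpy as np

# Hypothetical right-skewed sample, made up for illustration
values = np.array([1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 200.0])

def skewness(x):
    """Sample skewness: the third standardized moment."""
    centered = x - x.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5

log_values = np.log(values)      # natural log; needs strictly positive values
log1p_values = np.log1p(values)  # log(1 + x); also defined when x is zero

print(skewness(values))      # strongly positive: long right tail
print(skewness(log_values))  # smaller after the transform
```

`np.log1p` is a common alternative when the data contain zeros.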
Polynomial Features
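Polynomial features capture higher-order relationships, such as squared terms and interaction terms, while the model stays linear in its parameters.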
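from sklearn.preprocessing import PolynomialFeatures
# data is assumed to be a 2-D array-like of shape (n_samples, n_features)
poly = PolynomialFeatures(degree=2)
# Fit on the data and return the polynomial and interaction terms
data_poly = poly.fit_transform(data)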
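To make the output concrete, here is a small sketch on a hypothetical two-feature array (the numbers are made up). With `degree=2` and the default settings, each row gains a bias column, the squared terms, and the interaction term.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data: two samples with two features (x1, x2)
data = np.array([[2.0, 3.0],
                 [1.0, 4.0]])

poly = PolynomialFeatures(degree=2)
data_poly = poly.fit_transform(data)

# Columns are ordered: 1, x1, x2, x1^2, x1*x2, x2^2
print(data_poly)
# [[ 1.  2.  3.  4.  6.  9.]
#  [ 1.  1.  4.  1.  4. 16.]]
```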
Feature Encoding
Variable selection involves choosing the set of features to include in the model. Features often need to be transformed first:
- Encoding: converting non-numeric features to numeric features. Applied to categorical features.
  - Nominal: unordered categories (e.g. red, blue)
  - Ordinal: ordered categories (e.g. high, medium, low)
  - Binary encoding: maps a two-category variable to 0 and 1
  - One-hot encoding: creates one binary column per category
  - Ordinal encoding: converts ordered categories to numerical values (e.g. 0, 1, 2, 3, …)
- Scaling: converting the scale of numeric data so that features are comparable
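The two main encodings listed above can be sketched with scikit-learn as follows. The `colors` and `levels` columns are hypothetical, and the category order passed to `OrdinalEncoder` is an assumption about what "low < medium < high" should mean.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical columns, made up for illustration
colors = np.array([["red"], ["blue"], ["red"]])     # nominal
levels = np.array([["low"], ["high"], ["medium"]])  # ordinal

# One-hot: one binary column per category (columns follow sorted
# category order, here: blue, red)
onehot = OneHotEncoder()
colors_encoded = onehot.fit_transform(colors).toarray()

# Ordinal: pass the order explicitly; otherwise categories are sorted
# alphabetically, which would give high < low < medium
ordinal = OrdinalEncoder(categories=[["low", "medium", "high"]])
levels_encoded = ordinal.fit_transform(levels)

print(colors_encoded)          # rows: red, blue, red
print(levels_encoded.ravel())  # low -> 0, high -> 2, medium -> 1
```

Passing the order explicitly matters because an ordinal encoding is only meaningful if the integers follow the real ranking of the categories.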
Feature Scaling
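Adjusting a variable's scale so that variables with different scales can be compared.
Why is it problematic?
If one variable's scale is much smaller than another's, the two are hard to compare: the variable with the larger scale dominates distance calculations.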
Approach
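- Standard scaling: subtracts the mean and divides by the standard deviation, so each feature has mean 0 and variance 1. It does not change the shape of the distribution.
- Min-max scaling: maps the minimum to 0 and the maximum to 1. It is sensitive to outliers.
- Robust scaling: similar to min-max scaling, but based on the interquartile range: the first quartile (Q1) maps to 0 and the third quartile (Q3) maps to 1, and values outside that range fall outside the [0, 1] interval. It is less sensitive to outliers. Note that scikit-learn's RobustScaler instead subtracts the median and divides by the interquartile range.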
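Here is a minimal sketch comparing the three approaches with scikit-learn on a hypothetical feature that contains one outlier. Note that `RobustScaler`, with its default settings, centers on the median and divides by the interquartile range, so the median (not the first quartile) maps to 0.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

# Hypothetical single feature with one outlier (100.0)
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

standard = StandardScaler().fit_transform(x)  # (x - mean) / std
minmax = MinMaxScaler().fit_transform(x)      # (x - min) / (max - min)
robust = RobustScaler().fit_transform(x)      # (x - median) / IQR

# Min-max squeezes the non-outlier values close to 0
print(minmax.ravel())
# Robust scaling keeps them spread out: [-1.  -0.5  0.   0.5 48.5]
print(robust.ravel())
```

In this example the outlier compresses the min-max output for the other four values into the range 0 to about 0.03, while the robust output leaves them evenly spaced.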
Common Variable Transformations
Feature type
- Continuous (numerical values): standard, min-max, robust scaling
  - from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler
- Nominal (categorical, unordered features): binary, one-hot encoding
  - from sklearn.preprocessing import LabelEncoder, LabelBinarizer, OneHotEncoder
  - from pandas import get_dummies
- Ordinal (categorical, ordered features): ordinal encoding
  - from sklearn.feature_extraction import DictVectorizer
  - from sklearn.preprocessing import OrdinalEncoder
- Etc.
  - Box-Cox transformation
    - A Box-Cox transformation is a power transformation that reshapes a non-normal variable so that it more closely follows a normal distribution. It requires strictly positive values.
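A Box-Cox transformation can be sketched with SciPy's `scipy.stats.boxcox`. SciPy is not mentioned in these notes, so treat it as one possible tool, and the sample values are made up.

```python
import numpy as np
from scipy import stats

# Hypothetical strictly positive, right-skewed sample
values = np.array([1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 200.0])

# With lmbda left unspecified, boxcox returns the transformed data
# and the lambda that maximizes the log-likelihood
transformed, fitted_lambda = stats.boxcox(values)

# The transform is (x**lambda - 1) / lambda, or log(x) when lambda == 0
print(fitted_lambda)
print(transformed)
```

When the fitted lambda is close to 0, the result is close to a plain log transformation.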