What Is Feature Engineering?
Feature engineering transforms raw data into input variables that improve machine learning model performance. Learn why it is often more impactful than model selection.
Key Takeaways
- Feature engineering creates new input variables from raw data that help machine learning models make better predictions.
- Good features capture domain knowledge in a format that algorithms can learn from.
- Feature engineering often has a bigger impact on model accuracy than choosing a more sophisticated algorithm.
What features are
In machine learning, a feature is an individual measurable property of the data that serves as input to a model. Raw data rarely comes in a form that models can use effectively. Feature engineering is the process of transforming raw data into features that better represent the underlying patterns. For example, a raw timestamp can be engineered into features like day-of-week, hour-of-day, and is-weekend, and, when combined with a customer's purchase history, days-since-last-purchase. Each captures a different aspect of time that might influence predictions.
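As a minimal sketch of the timestamp example, the function below derives those four features using only the Python standard library. The function name `time_features`, the argument names, and the sample dates are assumptions made for illustration.

```python
from datetime import datetime


def time_features(ts: datetime, last_purchase: datetime) -> dict:
    """Derive calendar and recency features from one raw timestamp."""
    return {
        "day_of_week": ts.weekday(),      # Monday is 0, Sunday is 6
        "hour_of_day": ts.hour,
        "is_weekend": ts.weekday() >= 5,  # Saturday or Sunday
        # Whole days elapsed since the previous purchase.
        "days_since_last_purchase": (ts - last_purchase).days,
    }


# Illustrative values: a Saturday afternoon order, one week after the last one.
features = time_features(
    ts=datetime(2024, 3, 16, 14, 30),
    last_purchase=datetime(2024, 3, 9, 9, 0),
)
print(features)
```

Keeping the logic in one function means the same code can be called when building training data and when scoring new events.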
Common techniques
- Aggregation creates summary statistics, such as average order value over the past 90 days.
- Binning groups continuous values into categories, such as income ranges instead of exact figures.
- Encoding converts categorical data into numerical format, for example turning product categories into binary columns.
- Interaction features combine two variables, for example multiplying price by quantity to get revenue.
- Time-based features extract patterns from dates, such as recency, frequency, and seasonality.
Domain expertise guides which transformations will be most informative.
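The sketch below illustrates four of these techniques (interaction, binning, encoding, and aggregation) on a tiny, invented order table in plain Python. The records, field names, and income cut points are hypothetical, and the aggregation omits the 90-day window for brevity.

```python
from statistics import mean

# Hypothetical order records, invented for illustration.
orders = [
    {"customer": "a", "category": "books", "price": 12.0, "quantity": 2, "income": 28_000},
    {"customer": "a", "category": "toys", "price": 30.0, "quantity": 1, "income": 28_000},
    {"customer": "b", "category": "books", "price": 8.0, "quantity": 5, "income": 95_000},
]


def income_band(income: float) -> str:
    """Binning: map an exact income to a coarse range (cut points are arbitrary)."""
    if income < 30_000:
        return "low"
    if income < 80_000:
        return "middle"
    return "high"


categories = sorted({o["category"] for o in orders})

rows = []
for o in orders:
    row = {
        # Interaction: price multiplied by quantity gives revenue.
        "revenue": o["price"] * o["quantity"],
        "income_band": income_band(o["income"]),
    }
    # Encoding: one binary column per product category.
    for c in categories:
        row[f"category_{c}"] = int(o["category"] == c)
    rows.append(row)

# Aggregation: average order value per customer.
revenue_by_customer = {}
for o, row in zip(orders, rows):
    revenue_by_customer.setdefault(o["customer"], []).append(row["revenue"])
avg_order_value = {c: mean(v) for c, v in revenue_by_customer.items()}

print(rows)
print(avg_order_value)
```

In practice a dataframe library would usually handle these steps, but the plain-Python version keeps each transformation visible.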
Why it matters more than model choice
A simple logistic regression model with well-engineered features can match or outperform a complex deep learning model trained on raw, unprocessed data, particularly on structured, tabular datasets. The reason is that features encode human knowledge about the problem domain into the data, so the model does not have to rediscover those relationships from limited examples. An African fintech building a credit scoring model, for example, might engineer features from mobile money transaction patterns (transaction frequency, average amount, merchant diversity) that directly capture creditworthiness signals specific to the market.
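To make the credit scoring example concrete, here is a rough sketch of how the three named signals could be computed for one customer. The transaction records, the 30-day window, and the function name `credit_features` are assumptions for illustration, not a description of any real scoring system.

```python
from statistics import mean

# Hypothetical mobile money transactions for one customer over a 30-day window.
transactions = [
    {"amount": 500, "merchant": "grocer"},
    {"amount": 1500, "merchant": "airtime"},
    {"amount": 1000, "merchant": "grocer"},
    {"amount": 3000, "merchant": "school_fees"},
]


def credit_features(txns: list, window_days: int) -> dict:
    """Summarise transaction behaviour into three candidate features."""
    if not txns:
        # A customer with no activity gets neutral zero values.
        return {"txn_per_day": 0.0, "avg_amount": 0.0, "merchant_diversity": 0}
    return {
        # Transaction frequency: how often the customer transacts.
        "txn_per_day": len(txns) / window_days,
        # Average amount: typical transaction size.
        "avg_amount": mean(t["amount"] for t in txns),
        # Merchant diversity: number of distinct merchants paid.
        "merchant_diversity": len({t["merchant"] for t in txns}),
    }


feats = credit_features(transactions, window_days=30)
print(feats)
```

A simple model such as logistic regression could then take these three numbers as inputs instead of the raw transaction log.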
Feature engineering workflow
Start by understanding the business problem and the data available. Generate feature candidates based on domain knowledge and exploratory analysis. Evaluate each feature's predictive power using statistical tests or feature importance scores from preliminary models. Remove features that add noise without predictive value. Iterate, because the best feature sets emerge through experimentation. Automate feature computation in your data pipeline so that the same logic generates features for both training and production prediction.
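The evaluate-and-remove step can be sketched with a simple statistical screen. The example below scores each candidate by its Pearson correlation with the target and keeps those above a cutoff. The feature names, the data, and the 0.3 threshold are all hypothetical, and correlation is only one possible test: it captures linear relationships and can miss features that matter in combination.

```python
from math import sqrt


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)


# Hypothetical candidate features and a binary target (1 = loan repaid).
candidates = {
    "txn_per_day": [0.1, 0.4, 0.5, 0.9, 1.2, 1.5],
    "account_id_last_digit": [3, 5, 1, 7, 2, 6],
}
repaid = [0, 0, 1, 1, 1, 1]

THRESHOLD = 0.3  # arbitrary cutoff chosen for this sketch

scores = {name: pearson(values, repaid) for name, values in candidates.items()}
kept = [name for name, r in scores.items() if abs(r) >= THRESHOLD]

print(scores)
print(kept)
```

Here the transaction-rate feature survives the screen while the account-digit feature, which is unrelated to repayment in this toy data, is dropped.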