Can feature engineering be automated?

Partially. Automated feature engineering tools like Featuretools generate candidate features from relational data. However, the most valuable features typically come from domain expertise that automated tools cannot replicate. A hybrid approach — using automated tools for exploration and human judgement for selection and refinement — usually produces the best results.
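To make the idea concrete, here is a toy sketch of what such tools do conceptually: for a one-to-many relationship (customers to orders), apply every aggregation primitive to every numeric column of the child table and emit one candidate feature per combination. This is a hand-rolled illustration in plain Python, not the Featuretools API. The table layout, column names, and the `generate_candidates` helper are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical child table: one row per order, linked to a customer by "customer_id".
orders = [
    {"customer_id": "c1", "amount": 120.0, "items": 3},
    {"customer_id": "c1", "amount": 80.0, "items": 1},
    {"customer_id": "c2", "amount": 200.0, "items": 5},
]

# Aggregation "primitives" applied blindly to every numeric column.
PRIMITIVES = {
    "count": len,
    "sum": sum,
    "mean": mean,
    "max": max,
    "min": min,
}


def generate_candidates(rows, key, numeric_cols):
    """Return {parent_id: {feature_name: value}} for every primitive x column pair."""
    # Group each numeric column's values by parent key.
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for col in numeric_cols:
            grouped[row[key]][col].append(row[col])

    # Apply every primitive to every grouped column.
    features = {}
    for parent_id, cols in grouped.items():
        feats = {}
        for col, values in cols.items():
            for name, fn in PRIMITIVES.items():
                feats[f"{name}({col})"] = fn(values)
        features[parent_id] = feats
    return features


candidates = generate_candidates(orders, "customer_id", ["amount", "items"])
print(candidates["c1"]["mean(amount)"])  # 100.0
```

Even this toy produces ten candidates per customer (five primitives times two columns), including redundant ones such as `count(amount)` and `count(items)`. Deciding which of them are worth keeping is where human judgement comes in.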

How many features should a model have?

There is no universal answer. Too few features limit the model's ability to learn patterns. Too many features increase computational cost and the risk of overfitting, where the model memorises training data rather than learning generalisable patterns. Many practical models use somewhere between 20 and 200 features, although the right number depends on the problem and the amount of training data available. Feature selection techniques help identify a compact, informative subset from a larger candidate set.
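One simple way to trim an oversized candidate set is a redundancy filter: drop any feature that is almost perfectly correlated with a feature already kept, since it adds cost and overfitting risk without contributing much new information. The sketch below is one illustrative approach among many; the 0.95 threshold, the helper names, and the feature names are arbitrary assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)


def drop_redundant(columns, threshold=0.95):
    """Keep features in order, skipping any whose |r| with a kept feature exceeds threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical candidate features for five customers.
columns = {
    "order_count": [1, 4, 2, 8, 5],
    "total_items": [2, 8, 4, 16, 10],  # exactly 2x order_count, so redundant
    "avg_order_value": [50.0, 20.0, 75.0, 30.0, 60.0],
}
print(drop_redundant(columns))  # ['order_count', 'avg_order_value']
```

A filter like this ignores the prediction target entirely and only removes duplication; judging which remaining features actually help the model is a separate step.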

What is the difference between feature engineering and feature selection?

Feature engineering creates new features from raw data. Feature selection chooses the most useful subset from all available features (both original and engineered). They are complementary steps — first you engineer a broad set of candidate features, then you select the ones that contribute most to model performance while removing those that add noise.


What Is Feature Engineering?

Feature engineering transforms raw data into input variables that improve machine learning model performance. Learn why it is often more impactful than model selection.

Key Takeaways

Feature engineering creates new input variables from raw data that help machine learning models make better predictions.
Good features capture domain knowledge in a format that algorithms can learn from.
Feature engineering often has a bigger impact on model accuracy than choosing a more sophisticated algorithm.

What features are

In machine learning, a feature is an individual measurable property of the data that serves as input to a model. Raw data rarely comes in a form that models can use effectively. Feature engineering is the process of transforming raw data into features that better represent the underlying patterns. For example, a raw timestamp can be engineered into features like day-of-week, hour-of-day, is-weekend, and days-since-last-purchase — each capturing a different aspect of time that might influence predictions.
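The timestamp example above can be sketched with Python's standard `datetime` module. The function name and the extra `last_purchase` argument are illustrative assumptions (days-since-last-purchase needs a second timestamp to compare against), and day-of-week follows Python's convention where Monday is 0.

```python
from datetime import datetime


def time_features(ts, last_purchase):
    """Derive several time-based features from one raw timestamp.

    `last_purchase` is the customer's previous purchase time (an assumed extra input).
    """
    return {
        "day_of_week": ts.weekday(),      # Monday = 0 ... Sunday = 6
        "hour_of_day": ts.hour,
        "is_weekend": ts.weekday() >= 5,  # Saturday or Sunday
        "days_since_last_purchase": (ts - last_purchase).days,
    }


event = datetime(2024, 3, 16, 14, 30)  # a Saturday afternoon
previous = datetime(2024, 3, 1, 9, 0)
print(time_features(event, previous))
# {'day_of_week': 5, 'hour_of_day': 14, 'is_weekend': True, 'days_since_last_purchase': 15}
```

Each output captures a different aspect of the same moment, so a model can pick up weekly cycles, time-of-day effects, and recency independently.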

Common techniques

Aggregation creates summary statistics — average order value over the past 90 days. Binning groups continuous values into categories — income ranges instead of exact figures. Encoding converts categorical data into numerical format — turning product categories into binary columns. Interaction features combine two variables — price multiplied by quantity equals revenue. Time-based features extract patterns from dates — recency, frequency, and seasonality. Domain expertise guides which transformations will be most informative.
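A compact sketch of aggregation, binning, encoding, and interaction features in plain Python follows. The order records, the fixed "as of" date, the 90-day window, the income bin edges, and the category list are all illustrative assumptions; in practice, libraries such as pandas and scikit-learn provide equivalents (for example `pandas.cut` for binning and `sklearn.preprocessing.OneHotEncoder` for encoding).

```python
from datetime import date, timedelta

TODAY = date(2024, 6, 30)  # fixed "as of" date so the example is reproducible

# Hypothetical order history for one customer.
orders = [
    {"day": date(2024, 6, 20), "price": 25.0, "quantity": 2, "category": "food"},
    {"day": date(2024, 5, 2), "price": 120.0, "quantity": 1, "category": "electronics"},
    {"day": date(2023, 12, 1), "price": 60.0, "quantity": 3, "category": "clothing"},
]


# Interaction: price x quantity gives revenue per order.
def revenue(order):
    return order["price"] * order["quantity"]


# Aggregation: average order value over a trailing window (default 90 days).
def avg_order_value(orders, as_of, window_days=90):
    cutoff = as_of - timedelta(days=window_days)
    recent = [revenue(o) for o in orders if cutoff <= o["day"] <= as_of]
    return sum(recent) / len(recent) if recent else 0.0


# Binning: map an exact income to a labelled range (bin edges are arbitrary).
def income_band(income):
    edges = [(25_000, "<25k"), (50_000, "25k-50k"), (100_000, "50k-100k")]
    for upper, label in edges:
        if income < upper:
            return label
    return "100k+"


# One-hot encoding: one binary column per known category.
CATEGORIES = ["clothing", "electronics", "food"]


def one_hot(category):
    return {f"category_{c}": int(category == c) for c in CATEGORIES}


print(avg_order_value(orders, TODAY))  # 85.0, i.e. (50.0 + 120.0) / 2
print(income_band(42_000))             # 25k-50k
print(one_hot("food"))
```

The December order falls outside the 90-day window, so only two orders contribute to the average.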

Why it matters more than model choice

A simple logistic regression model with expertly engineered features frequently outperforms a complex deep learning model trained on raw, unprocessed data. Features encode human knowledge about the problem domain into the data. An African fintech building a credit scoring model, for example, might engineer features from mobile money transaction patterns (transaction frequency, average amount, merchant diversity) that directly capture creditworthiness signals specific to the market.
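The mobile money features mentioned above might be computed roughly as follows. The transaction schema, the 30-day observation window, and the definition of merchant diversity (distinct merchants divided by transaction count) are illustrative assumptions, not a reference scoring method. The resulting dictionary is the kind of input a simple logistic regression could then be trained on.

```python
from statistics import mean

# Hypothetical mobile money transactions for one customer over a 30-day period.
transactions = [
    {"amount": 1500, "merchant": "grocer_a"},
    {"amount": 300, "merchant": "airtime_vendor"},
    {"amount": 2200, "merchant": "grocer_a"},
    {"amount": 800, "merchant": "pharmacy_b"},
]
PERIOD_DAYS = 30


def credit_features(txns, period_days):
    """Summarise transaction behaviour into candidate credit-scoring features."""
    if not txns:
        return {"txn_per_week": 0.0, "avg_amount": 0.0, "merchant_diversity": 0.0}
    merchants = {t["merchant"] for t in txns}
    return {
        # Frequency: transactions per week over the observation period.
        "txn_per_week": len(txns) / (period_days / 7),
        # Average transaction amount, in whatever currency unit the data uses.
        "avg_amount": mean(t["amount"] for t in txns),
        # Diversity: distinct merchants as a share of all transactions.
        "merchant_diversity": len(merchants) / len(txns),
    }


print(credit_features(transactions, PERIOD_DAYS))
```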

Feature engineering workflow

Start by understanding the business problem and the data available. Generate feature candidates based on domain knowledge and exploratory analysis. Evaluate each feature's predictive power using statistical tests or feature importance scores from preliminary models. Remove features that add noise without predictive value. Iterate — the best feature sets emerge through experimentation. Automate feature computation in your data pipeline so features are consistently generated for both training and production prediction.
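The final step, computing features identically for training and production, is often handled by putting all feature logic in one function that both the batch training job and the live prediction service call, which avoids training-serving skew. The sketch below assumes a hypothetical record schema (`order_totals`, `signup_date`, `label`) and uses plain functions; real pipelines might use a feature store or a shared library instead, but the principle of a single definition used in both paths is the same.

```python
from datetime import date


def compute_features(record, as_of):
    """Single source of truth: turn one raw customer record into model inputs."""
    orders = record["order_totals"]
    return {
        "order_count": len(orders),
        "avg_order_value": sum(orders) / len(orders) if orders else 0.0,
        "days_since_signup": (as_of - record["signup_date"]).days,
    }


def build_training_rows(history, as_of):
    """Batch path: features for every historical record, paired with its label."""
    return [(compute_features(r, as_of), r["label"]) for r in history]


def features_for_request(record, as_of):
    """Online path: features for a single live prediction request."""
    return compute_features(record, as_of)


history = [
    {"order_totals": [40.0, 60.0], "signup_date": date(2023, 1, 10), "label": 0},
    {"order_totals": [], "signup_date": date(2024, 2, 1), "label": 1},
]
train = build_training_rows(history, as_of=date(2024, 3, 1))
live = features_for_request(history[0], as_of=date(2024, 3, 1))
assert train[0][0] == live  # same logic, same feature values
```

Because both paths call `compute_features`, a change to a feature definition automatically applies to training and serving together.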
