Should small retailers prefer global or local explanations for churn models?

Both serve complementary purposes. Global explanations inform strategic decisions such as designing loyalty programs or adjusting store operations. Local explanations guide tactical decisions about individual customer outreach. A practical approach is to use global explanations for program design and local explanations for prioritizing and customizing individual retention actions.

How reliable are SHAP values for non-linear churn models?

TreeSHAP computes exact Shapley values for tree-based models, so the attributions are exact with respect to the fitted model. For neural networks, KernelSHAP provides sampling-based approximations whose variance shrinks as more feature coalitions are sampled. Reliability also depends on model stability: if small changes in training data substantially alter the model, SHAP values will also vary. Refitting the model on bootstrap resamples or cross-validation folds and comparing the resulting attributions helps assess explanation stability.

Can interpretability methods detect when a churn model is using spurious correlations?

Yes, partially. SHAP dependence plots can reveal unexpected feature relationships that suggest spurious correlations — for instance, if a product code predicts churn due to a data collection artifact rather than a genuine behavioral signal. However, interpretability methods cannot definitively distinguish correlation from causation; domain expertise and causal analysis are needed to validate the model's learned relationships.

Point of Sale & Retail · Intermediate · 9 min read

Interpretable ML for Retail Churn: Global vs. Local Explanations

Compare global and local interpretability methods for machine learning churn models built on PoS data, focusing on actionability for small retail operators.

Key Takeaways

Global explanations reveal which features drive churn predictions across the entire customer base, guiding strategic retention program design.
Local explanations identify why a specific customer is predicted to churn, enabling personalized retention interventions.
SHAP values unify global and local interpretability within a consistent game-theoretic framework applicable to any PoS-derived churn model.

The Interpretability Imperative in Retail Churn Prediction

Customer churn — the cessation of purchasing activity — is a critical concern for small retailers, where each customer represents a larger share of revenue than in mass-market settings. Machine learning models trained on PoS transaction features such as recency, frequency, monetary value, basket composition, and visit regularity can predict churn with high accuracy, but their utility depends on whether the retailer can understand and act on the predictions. A black-box model that flags customers as high-churn-risk without explaining why provides limited guidance for intervention design. Should the retailer offer a discount, improve product assortment, adjust store hours, or enhance service quality? The answer depends on the drivers of churn for the specific customer, which requires model interpretability. Interpretability is not a monolithic concept. Global interpretability describes the model's overall behavior — which features are most important across all customers, how features interact, and what the model has learned about the general churn process. Local interpretability explains individual predictions — why this specific customer received this specific churn score. Both levels serve distinct purposes in retail decision-making, and a comprehensive interpretability strategy addresses both. The stakes are particularly high for small retailers, where misallocating limited retention budgets based on misunderstood model outputs can be more costly than not using a model at all.

Global Interpretability Methods

Global methods summarize the model's learned relationships across the entire dataset. Permutation feature importance measures how much model performance degrades when a feature is randomly shuffled, quantifying each feature's contribution to predictive accuracy. For PoS churn models, this typically reveals that recency of last visit, trend in visit frequency, and changes in basket size are among the most important predictors. Partial dependence plots (PDPs) show the marginal effect of a single feature on the predicted churn probability, averaging over the values of all other features. A PDP for recency might show that churn probability remains flat for recency values up to two weeks, then rises steeply, suggesting that customers who have not visited in two weeks should be targeted for re-engagement. Accumulated local effects (ALE) plots improve on PDPs by accounting for feature correlations, providing less biased effect estimates when features such as visit frequency and monetary value are correlated. Global surrogate models fit an interpretable model, typically a shallow decision tree or rule list, to the predictions of the complex model, distilling its behavior into a human-readable format. The surrogate's fidelity to the original model's predictions, measured by R-squared, quantifies how well the simplification captures the complex model's logic. For PoS churn models, surrogate trees with five to ten leaves often achieve fidelity above ninety percent, suggesting that the complex model's behavior is approximately piecewise constant in the most important features.
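The shuffling procedure behind permutation importance can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: `predict_churn` is a hypothetical hand-written rule standing in for a trained model, the feature names and synthetic rows are invented, and the labels are generated from the same rule so the baseline accuracy is 1.0 by construction.

```python
import random

def predict_churn(row):
    """Hypothetical stand-in for a trained churn model (not a fitted model)."""
    recency_days, visits_per_month, _card_digit = row
    return 1 if recency_days > 14 and visits_per_month < 4 else 0

def accuracy(rows, labels):
    hits = sum(predict_churn(r) == y for r, y in zip(rows, labels))
    return hits / len(rows)

def permutation_importance(rows, labels, feature_idx, n_repeats=10, seed=0):
    """Mean drop in accuracy when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(rows, labels)
    drops = []
    for _ in range(n_repeats):
        column = [r[feature_idx] for r in rows]
        rng.shuffle(column)  # break the link between this feature and the label
        shuffled = [
            r[:feature_idx] + (v,) + r[feature_idx + 1:]
            for r, v in zip(rows, column)
        ]
        drops.append(baseline - accuracy(shuffled, labels))
    return sum(drops) / n_repeats

# Synthetic PoS-style rows (assumed): recency, visits per month, an irrelevant digit.
data_rng = random.Random(42)
rows = [
    (data_rng.randint(0, 60), data_rng.randint(0, 12), data_rng.randint(0, 9))
    for _ in range(300)
]
labels = [predict_churn(r) for r in rows]

feature_names = ["recency_days", "visits_per_month", "card_digit"]
importance = {
    name: permutation_importance(rows, labels, i)
    for i, name in enumerate(feature_names)
}
for name, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

The irrelevant `card_digit` column scores exactly zero because the stand-in rule never reads it, while shuffling either of the two features the rule depends on lowers accuracy.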

Local Interpretability: LIME and Individual Conditional Expectations

Local Interpretable Model-agnostic Explanations (LIME) explains individual predictions by fitting a simple linear model to the complex model's behavior in the neighborhood of the instance being explained. LIME perturbs the input features, obtains predictions from the complex model for each perturbation, and fits a weighted linear regression where perturbations closer to the original instance receive higher weight. The resulting coefficients indicate which features pushed the prediction toward or away from churn for that specific customer. For a PoS churn prediction, LIME might reveal that a particular customer's high churn score is primarily driven by a decline in visit frequency over the past month and a narrowing of product categories purchased, while their still-high monetary value partially offsets the churn risk. This granular explanation enables targeted intervention — perhaps a promotional offer on the categories the customer has stopped buying. Individual conditional expectation (ICE) plots extend partial dependence to the individual level, showing how the prediction for a single customer changes as one feature varies while others remain fixed. Comparing ICE curves across customers reveals heterogeneity in feature effects: visit frequency might be a strong churn predictor for daily shoppers but irrelevant for monthly bulk buyers. This heterogeneity is invisible in global PDP plots, which average across all customers. LIME and ICE are model-agnostic, applicable to any PoS churn model including gradient-boosted trees, neural networks, and random forests, making them versatile tools for retail analytics platforms.
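The relationship between ICE curves and the PDP can be made concrete with a small sketch. Everything here is assumed for illustration: `churn_score` is a hypothetical scoring rule in which risk grows with recency relative to a customer's usual visit gap, and the two customers are invented. The example varies recency rather than visit frequency, but the same heterogeneity argument applies.

```python
def churn_score(customer):
    """Hypothetical churn model: risk rises with recency relative to the usual visit gap."""
    return min(1.0, customer["recency_days"] / (4 * customer["usual_gap_days"]))

def ice_curve(customer, feature, grid):
    """Predictions for one customer as `feature` sweeps the grid, other features fixed."""
    return [churn_score({**customer, feature: value}) for value in grid]

def partial_dependence(customers, feature, grid):
    """The PDP is the pointwise average of the individual ICE curves."""
    curves = [ice_curve(c, feature, grid) for c in customers]
    return [sum(column) / len(curves) for column in zip(*curves)]

customers = [
    {"name": "daily shopper", "recency_days": 2, "usual_gap_days": 1},
    {"name": "monthly bulk buyer", "recency_days": 20, "usual_gap_days": 30},
]
grid = [0, 7, 14, 21, 28]  # recency values (days) to sweep

ice = {c["name"]: ice_curve(c, "recency_days", grid) for c in customers}
pdp = partial_dependence(customers, "recency_days", grid)

for name, curve in ice.items():
    print(name, [round(v, 3) for v in curve])
print("PDP", [round(v, 3) for v in pdp])
```

In this toy setup the daily shopper's curve saturates after one week of absence, while the monthly buyer's curve stays low across the whole grid. The averaged PDP sits between the two and describes neither customer well, which is the heterogeneity the ICE view is meant to expose.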

SHAP: Unifying Global and Local Explanations

SHapley Additive exPlanations (SHAP) provide a theoretically grounded framework that unifies global and local interpretability. Rooted in cooperative game theory, SHAP values assign each feature a contribution to the prediction for a given instance, with the property that contributions sum to the difference between the prediction and the average prediction across all customers. This additive decomposition is unique under the axioms of efficiency, symmetry, linearity, and dummy, which are desirable properties that LIME and permutation importance do not guarantee. For tree-based models commonly used in PoS churn prediction, the TreeSHAP algorithm computes exact SHAP values in polynomial time, making it computationally feasible for real-time explanation of individual predictions. At the local level, a SHAP force plot for a specific customer shows which features increase and which decrease their churn probability relative to the baseline rate, providing an intuitive visualization for the retailer. At the global level, the mean absolute SHAP value for each feature across all customers yields a feature importance ranking that is consistent with the local explanations, a property that permutation importance does not guarantee. SHAP dependence plots combine the virtues of PDPs and interaction detection, revealing both the marginal effect of a feature and its interactions with other features. For analytics platforms serving small retailers, SHAP values can be precomputed for each customer and surfaced through an intuitive dashboard that explains churn predictions in business language rather than statistical jargon.
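The additive property can be checked directly on a toy example. The sketch below computes Shapley values by brute-force enumeration of feature coalitions, which is exponential in the number of features and is not the TreeSHAP algorithm. The scoring function `churn_score`, the background rows, and the explained customer are all hypothetical, and "missing" features are filled in from background rows, which is one common convention rather than the only one.

```python
from itertools import combinations
from math import factorial

def churn_score(x):
    """Hypothetical churn risk score with an interaction term (not a fitted model)."""
    recency_days, freq_drop, monthly_spend = x
    return (0.02 * recency_days + 0.3 * freq_drop
            + 0.01 * recency_days * freq_drop - 0.001 * monthly_spend)

def coalition_value(model, x, background, subset):
    """Mean prediction when features in `subset` take x's values and the rest come from background rows."""
    total = 0.0
    for b in background:
        mixed = tuple(x[i] if i in subset else b[i] for i in range(len(x)))
        total += model(mixed)
    return total / len(background)

def shapley_values(model, x, background):
    """Exact Shapley values by enumerating every coalition (feasible only for a few features)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = (coalition_value(model, x, background, s | {i})
                        - coalition_value(model, x, background, s))
                phi[i] += weight * gain
    return phi

# Assumed background customers and the customer being explained.
background = [(5, 0, 120), (10, 1, 80), (30, 0, 40), (3, 0, 200)]
customer = (28, 1, 60)

phi = shapley_values(churn_score, customer, background)
base_value = sum(churn_score(b) for b in background) / len(background)

print("contributions:", [round(p, 4) for p in phi])
print("base + contributions:", round(base_value + sum(phi), 4))
print("model prediction:   ", round(churn_score(customer), 4))
```

The last two printed numbers agree, which is the efficiency axiom in action: the baseline plus the per-feature contributions reconstructs the customer's score. Averaging the absolute contributions over many customers would give the global ranking described above.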

Actionability: From Explanation to Intervention

Interpretability is a means to an end: the ultimate goal is actionable insights that improve customer retention. The bridge from explanation to intervention requires several additional considerations. First, feature contributions must be mapped to controllable actions. A SHAP analysis showing that declining visit frequency drives churn is only useful if the retailer can take actions that influence visit frequency — such as loyalty programs, targeted communications, or assortment adjustments. Features that are predictive but non-actionable (such as customer age or residential distance) should be flagged as informational rather than intervention targets. Second, the causal validity of feature contributions must be assessed. SHAP and LIME measure predictive importance, not causal effects. A feature may predict churn because it is correlated with the true cause rather than being the cause itself. Interventions based on correlational explanations may be ineffective or even counterproductive. Combining interpretability with causal inference techniques — such as instrumental variables or difference-in-differences applied to natural experiments in PoS data — strengthens the causal basis for interventions. Third, the cost-effectiveness of interventions must be evaluated. Not every at-risk customer warrants the same retention investment. A decision framework that combines the predicted churn probability, the customer's lifetime value estimate, and the expected response to intervention determines optimal retention budget allocation. Platforms like askbiz.co can integrate these components into a unified retention workflow that surfaces interpretable churn predictions alongside recommended actions and expected returns.
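The third consideration, budget allocation, can be illustrated with a simple heuristic. The sketch assumes that the expected net value of an offer is churn probability times lifetime value times response rate, minus the offer cost, and it ranks customers greedily by net value per unit of cost. The customer records, the response rates, and the greedy rule are illustrative assumptions, not a recommended or optimal policy.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    name: str
    churn_prob: float      # model output
    lifetime_value: float  # estimated remaining customer lifetime value
    response_rate: float   # assumed chance the offer prevents the churn
    offer_cost: float

def expected_net_value(c):
    """Expected value saved by the offer minus its cost."""
    return c.churn_prob * c.lifetime_value * c.response_rate - c.offer_cost

def allocate(customers, budget):
    """Greedy heuristic: fund positive-value offers in order of net value per unit cost."""
    ranked = sorted(
        customers,
        key=lambda c: expected_net_value(c) / c.offer_cost,
        reverse=True,
    )
    chosen, spent = [], 0.0
    for c in ranked:
        if expected_net_value(c) > 0 and spent + c.offer_cost <= budget:
            chosen.append(c.name)
            spent += c.offer_cost
    return chosen, spent

# Hypothetical customers for illustration only.
customers = [
    Customer("A", churn_prob=0.8, lifetime_value=500, response_rate=0.25, offer_cost=20),
    Customer("B", churn_prob=0.9, lifetime_value=60, response_rate=0.25, offer_cost=20),
    Customer("C", churn_prob=0.4, lifetime_value=1200, response_rate=0.25, offer_cost=30),
    Customer("D", churn_prob=0.3, lifetime_value=400, response_rate=0.5, offer_cost=25),
]

chosen, spent = allocate(customers, budget=50)
print(chosen, spent)
```

Customer B has the highest churn probability but is not funded, because the expected value saved is smaller than the offer cost. Ranking by churn score alone would have put B first.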
