Can causal discovery definitively prove cause-and-effect from observational data?

Causal discovery from observational data provides evidence for causal relationships but cannot prove them with the same certainty as randomized controlled experiments. The results are contingent on assumptions (faithfulness, causal sufficiency for PC, or weaker assumptions for FCI) that may not hold perfectly in practice. Discovered causal relationships should be treated as strong hypotheses warranting further investigation or targeted experimentation.

How much data is needed for reliable causal discovery?

The data requirements depend on the number of variables, the strength of causal effects, and the noise level. As a rough guideline, at least 200-500 daily observations (roughly seven months to a year and a half of daily records) are needed for reliable discovery among 10-20 variables. Larger variable sets or weaker causal effects require correspondingly more data. Insufficient data leads to unreliable edge detection and orientation.

What should retailers do when the causal graph contradicts business intuition?

Contradictions between discovered causal structures and business intuition can arise from latent confounders, insufficient data, violated assumptions, or genuine surprises. Retailers should examine the statistical evidence supporting counterintuitive edges, consider whether unmeasured variables might explain the finding, and where feasible, design targeted experiments to test the discovered relationship before acting on it.

Point of Sale & Retail · Advanced · 10 min read

Causal Discovery Among Operational Variables in Small Retail: Learning Directed Graphs From PoS and Environmental Data

Apply PC, FCI, and score-based causal discovery algorithms to multi-stream data from PoS, weather, events, and staffing to learn causal business relationships.

Key Takeaways

Causal discovery algorithms learn directed acyclic graphs (DAGs) from observational PoS data, identifying cause-effect relationships among operational variables without requiring controlled experiments.
Integrating PoS transaction data with environmental streams (weather, events, staffing schedules) enables discovery of causal pathways that explain revenue variation and inform targeted interventions.
The causal sufficiency assumption underlying the PC algorithm is often violated in retail settings with latent confounders, requiring algorithms such as FCI that accommodate unmeasured common causes.

From Correlation to Causation in Retail Analytics

Standard retail analytics identifies correlations (sales increase on rainy days, revenue drops when a particular employee is scheduled, margin improves after a pricing change), but correlations alone cannot distinguish causal relationships from confounded associations. A correlation between staffing levels and daily revenue might reflect a causal effect (more staff improves customer service and drives sales) or a confounded association (both staffing and revenue are driven by day-of-week effects). Causal discovery algorithms aim to disentangle these relationships by learning the directed acyclic graph (DAG) that encodes the causal structure among observed variables. In this graph, a directed edge from variable A to variable B indicates that A directly influences B, and the absence of an edge indicates that neither variable directly affects the other, so the two are conditionally independent given some subset of the remaining variables. The causal graph enables interventional reasoning: if the retailer changes staffing levels, what is the expected effect on revenue when every other variable is left to respond as it naturally would? This interventional question cannot be answered from correlational analysis alone, but it can be derived from the causal graph through the do-calculus framework of Pearl (2000) whenever the effect is identifiable. askbiz.co applies causal discovery to the multi-stream operational data generated by each retailer, learning causal relationships that enable evidence-based intervention recommendations rather than mere correlational observations.
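The staffing example can be made concrete with a small simulation. The sketch below is illustrative only: the data are synthetic, the numbers are made up, and the function names (`simulate_days`, `pearson`) are hypothetical helpers rather than part of any product or library. Weekend status drives both staffing and revenue, and staffing has no effect of its own, yet the two are strongly correlated until the data are split by day type.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate_days(n_days=2000, seed=0):
    """Synthetic data: weekend -> staffing and weekend -> revenue, nothing else."""
    rng = random.Random(seed)
    rows = []
    for day in range(n_days):
        weekend = 1 if day % 7 in (5, 6) else 0
        staff = 3 + 2 * weekend + rng.gauss(0, 0.5)         # no effect on revenue
        revenue = 1000 + 600 * weekend + rng.gauss(0, 100)  # depends on weekend only
        rows.append((weekend, staff, revenue))
    return rows


rows = simulate_days()

# Correlation over all days mixes the weekend effect into the association.
overall = pearson([s for _, s, _ in rows], [r for _, _, r in rows])

# Correlation within each day type holds the confounder fixed.
within = {
    flag: pearson(
        [s for w, s, _ in rows if w == flag],
        [r for w, _, r in rows if w == flag],
    )
    for flag in (0, 1)
}

print(f"overall correlation: {overall:.2f}")
print(f"weekdays only:       {within[0]:.2f}")
print(f"weekends only:       {within[1]:.2f}")
```

Under these assumptions the overall correlation is large while the within-group correlations sit near zero, which is the pattern a causal graph with edges weekend → staffing and weekend → revenue would predict.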

Constraint-Based Causal Discovery: PC and FCI Algorithms

The PC algorithm, named after its developers Peter Spirtes and Clark Glymour, is the foundational constraint-based approach to causal discovery. It begins with a fully connected undirected graph over all observed variables and iteratively removes edges by testing for conditional independence. If two variables are conditionally independent given some subset of other variables, their edge is removed. The algorithm then orients edges by identifying v-structures (colliders) and applying orientation propagation rules derived from the assumption that the true causal graph is a DAG. The PC algorithm assumes causal sufficiency: all common causes of observed variables are themselves observed. In retail settings, this assumption is frequently violated — unmeasured variables such as competitor promotions, customer sentiment, or supply disruptions may confound observed relationships. The FCI (Fast Causal Inference) algorithm relaxes this assumption, accommodating latent confounders by introducing bidirected edges that indicate the presence of unmeasured common causes. FCI produces a partial ancestral graph (PAG) that represents an equivalence class of causal structures consistent with the observed data, acknowledging that some causal directions may be underdetermined from observational data alone. Both algorithms require reliable conditional independence testing, which is complicated by the mixed data types (continuous, discrete, count) typical of retail operational data. askbiz.co employs FCI as its primary causal discovery algorithm to accommodate the latent confounders ubiquitous in retail environments, using kernel-based conditional independence tests that handle mixed data types.
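The edge-removal phase of PC can be sketched in a few dozen lines. This is a simplified illustration, not the implementation used by askbiz.co or by any particular library: it assumes continuous, roughly Gaussian variables, uses a Fisher z test on partial correlations in place of the kernel-based tests mentioned above, and stops after the skeleton without orienting edges. All function and variable names are hypothetical.

```python
import random
from itertools import combinations
from math import erfc, log, sqrt


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def partial_corr(corr, i, j, cond):
    """Partial correlation of i and j given cond, via the standard recursion."""
    if not cond:
        return corr[i][j]
    k, rest = cond[0], cond[1:]
    r_ij = partial_corr(corr, i, j, rest)
    r_ik = partial_corr(corr, i, k, rest)
    r_jk = partial_corr(corr, j, k, rest)
    return (r_ij - r_ik * r_jk) / sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def is_independent(corr, n, i, j, cond, alpha):
    """Fisher z test: True if independence is NOT rejected at level alpha."""
    r = max(min(partial_corr(corr, i, j, cond), 0.999999), -0.999999)
    z = 0.5 * log((1 + r) / (1 - r)) * sqrt(n - len(cond) - 3)
    p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value under N(0, 1)
    return p_value > alpha


def pc_skeleton(columns, alpha=0.01):
    """Start fully connected; drop an edge once any conditioning set separates it."""
    names = list(columns)
    n = len(columns[names[0]])
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    adj = {i: set(range(len(names))) - {i} for i in range(len(names))}
    size = 0
    while any(len(adj[i]) - 1 >= size for i in adj):
        for i in adj:
            for j in sorted(adj[i]):
                # Conditioning sets are drawn from i's current neighbours.
                for cond in combinations(sorted(adj[i] - {j}), size):
                    if is_independent(corr, n, i, j, cond, alpha):
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
        size += 1
    return {frozenset((names[i], names[j])) for i in adj for j in adj[i]}


# Synthetic chain: rain -> footfall -> revenue (no direct rain -> revenue edge).
rng = random.Random(42)
rain = [rng.gauss(0, 1) for _ in range(2000)]
footfall = [-0.8 * x + rng.gauss(0, 1) for x in rain]
revenue = [0.9 * f + rng.gauss(0, 1) for f in footfall]
data = {"rain": rain, "footfall": footfall, "revenue": revenue}

skeleton = pc_skeleton(data)
print(sorted(tuple(sorted(edge)) for edge in skeleton))
```

In this synthetic chain, rain and revenue are correlated marginally but close to independent once footfall is conditioned on, so the rain-revenue edge is normally removed while the two true edges remain. The test can still make errors at the chosen significance level, which is one reason discovered graphs are treated as hypotheses.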

Score-Based and Hybrid Approaches

Score-based causal discovery searches over the space of possible DAGs to find the structure that best fits the observed data according to a scoring criterion. The Bayesian Information Criterion (BIC) and the Bayesian Dirichlet equivalent (BDe) score balance model fit against complexity, penalizing graphs with too many edges to prevent overfitting. Exact search over all possible DAGs is computationally intractable for more than approximately 20 variables (the number of possible DAGs grows super-exponentially), so practical algorithms use greedy search strategies. Greedy Equivalence Search (GES) starts from the empty graph and iteratively adds edges that improve the score, then prunes edges whose removal improves the score, converging on a local optimum within the space of Markov equivalence classes. NOTEARS (Non-combinatorial Optimization via Trace Exponential and Augmented lagRangian for Structure learning) reformulates the combinatorial DAG-search problem as a continuous optimization problem with an algebraic acyclicity constraint, enabling gradient-based optimization that scales to hundreds of variables. Hybrid approaches combine constraint-based and score-based methods: the MMHC (Max-Min Hill Climbing) algorithm uses constraint-based tests to identify a skeleton (set of possible edges) and then applies score-based search within the restricted graph space, combining the statistical efficiency of constraint-based methods with the robustness of score-based optimization. askbiz.co employs a hybrid approach that uses constraint-based skeleton discovery followed by score-based orientation, balancing computational efficiency with discovery accuracy across the heterogeneous variable types present in retail operational data.
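The algebraic acyclicity constraint in NOTEARS is the function h(W) = tr(exp(W ∘ W)) − d, where W is the d × d weighted adjacency matrix and ∘ is the elementwise product; h(W) equals zero exactly when W describes a DAG. The sketch below evaluates h with a truncated power series in plain Python purely for illustration. The function names are hypothetical, and a practical implementation would typically rely on a library matrix exponential and pair h with a data-fit loss inside an augmented Lagrangian solver, which is not shown here.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def notears_h(weights, n_terms=30):
    """h(W) = tr(exp(W * W)) - d, with exp approximated by n_terms series terms."""
    d = len(weights)
    squared = [[w * w for w in row] for row in weights]  # elementwise square
    term = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    trace = float(d)  # trace of the identity (the k = 0 term)
    for k in range(1, n_terms + 1):
        # term becomes (W * W)^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, squared)]
        trace += sum(term[i][i] for i in range(d))
    return trace - d


# Acyclic: weather -> footfall -> revenue (rows are sources, columns are targets).
dag = [
    [0.0, 0.7, 0.0],
    [0.0, 0.0, 1.2],
    [0.0, 0.0, 0.0],
]

# Cyclic: two variables that each point at the other.
cycle = [
    [0.0, 1.0],
    [1.0, 0.0],
]

h_dag = notears_h(dag)
h_cycle = notears_h(cycle)
print(h_dag, h_cycle)
```

Because h is smooth in W, its gradient can steer an optimizer away from cyclic structures, which is what allows the search to avoid enumerating graphs. For the two-node cycle with unit weights the exact value is 2·cosh(1) − 2, which the truncated series reproduces closely.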

Variable Selection and Data Integration

Effective causal discovery requires careful selection and integration of variables from multiple data streams. PoS-derived variables include daily revenue, transaction count, average basket size, product category sales shares, discount frequency, void rate, and refund rate. Environmental variables such as temperature, precipitation, humidity, and daylight hours capture weather effects on shopping behavior. Event calendars encode holidays, local events, school schedules, and competitor promotional periods. Staffing data provides headcount per shift, employee experience levels, and schedule adherence metrics. Each variable must be temporally aligned to a common granularity — typically daily for small-retail applications — and transformed to approximate the distributional assumptions of the discovery algorithm. Stationarity is important: trend and seasonal components should be removed through differencing or decomposition to avoid spurious causal edges driven by shared trends. Lag structures must be considered: the causal effect of weather on revenue may operate with a same-day or one-day lag, and including lagged variables in the discovery allows identification of time-delayed causal pathways. The number of variables must be balanced against the available data: causal discovery algorithms require sample sizes that grow with the number of variables, and including too many variables with insufficient data produces unreliable graphs. askbiz.co automatically integrates PoS data with weather APIs and retailer-provided event calendars, performing temporal alignment, stationarity transformation, and lag-structure specification before applying causal discovery algorithms.
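Two of the preprocessing steps described above, removing a weekly seasonal pattern and adding lagged copies of a variable, are sketched below in plain Python. The helper names (`seasonal_difference`, `add_lags`) and the column names are hypothetical, the data are made up, and a real pipeline would more likely use a dataframe library. The sketch assumes the streams are already aligned to one row per day.

```python
def seasonal_difference(series, period=7):
    """Return x[t] - x[t - period]; removes a linear trend and a weekly pattern."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


def add_lags(columns, lags):
    """Add lagged copies of selected columns, trimming rows so all stay aligned.

    columns: dict of name -> list of daily values (equal lengths)
    lags:    dict of name -> list of positive lags in days
    """
    max_lag = max((lag for ls in lags.values() for lag in ls), default=0)
    n = len(next(iter(columns.values())))
    # Drop the first max_lag rows, for which some lagged value is unavailable.
    out = {name: values[max_lag:] for name, values in columns.items()}
    for name, lag_list in lags.items():
        for lag in lag_list:
            out[f"{name}_lag{lag}"] = columns[name][max_lag - lag : n - lag]
    return out


# Three weeks of synthetic daily data: a weekend bump plus a steady upward trend.
revenue = [100 + 50 * (day % 7 in (5, 6)) + 2 * day for day in range(21)]
rain_mm = [float(day % 5) for day in range(21)]

revenue_diff = seasonal_difference(revenue)  # 14 values after weekly differencing
aligned = add_lags(
    {"revenue_diff": revenue_diff, "rain_mm": rain_mm[7:]},
    {"rain_mm": [1]},
)
print(aligned["revenue_diff"][:3], aligned["rain_mm"][:3], aligned["rain_mm_lag1"][:3])
```

In this toy series the weekly difference is constant, since both the weekend bump and the trend repeat every seven days. Each row of `aligned` then pairs a day's differenced revenue with that day's rainfall and the previous day's rainfall, so a discovery algorithm can consider same-day and one-day-delayed pathways.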

Interventional Reasoning and Decision Support

The ultimate value of causal discovery lies in enabling interventional reasoning: predicting the effect of actions the retailer might take, such as changing staffing levels, adjusting pricing, or altering store hours. The do-calculus framework translates the learned causal graph into formulas for computing interventional distributions from observational data, provided certain graphical criteria (backdoor criterion, front-door criterion) are satisfied. If the causal graph indicates that staffing directly causes revenue (controlling for day-of-week and weather), the retailer can estimate the revenue impact of adding a staff member by computing the causal effect using the appropriate adjustment formula. If the graph reveals that the staffing-revenue association is entirely confounded by day-of-week (both staffing and revenue are higher on weekends), no staffing intervention will affect revenue, and the retailer should avoid incurring additional labor costs based on a spurious correlation. Sensitivity analysis for unmeasured confounding assesses how robust causal conclusions are to the presence of latent variables not included in the analysis. The E-value, proposed by VanderWeele and Ding (2017), quantifies the minimum strength of unmeasured confounding that would be needed to explain away an observed causal effect, providing a measure of confidence in causal claims. askbiz.co presents discovered causal relationships with associated confidence metrics and effect-size estimates, enabling retailers to make informed intervention decisions based on causal evidence rather than correlational heuristics.
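When day-of-week satisfies the backdoor criterion for the staffing-revenue relationship, the adjustment formula reduces to averaging within-stratum differences, weighted by how common each stratum is. The sketch below illustrates this on synthetic data with a binary confounder (weekend) and a binary action (one extra staff member). The true effect of 80 per day, the other constants, and all function names are assumptions chosen for the example.

```python
import random


def simulate_shifts(n_days=4000, seed=1):
    """Synthetic data: weekends raise both revenue and the chance of extra staff."""
    rng = random.Random(seed)
    rows = []
    for day in range(n_days):
        weekend = 1 if day % 7 in (5, 6) else 0
        extra_staff = 1 if rng.random() < (0.8 if weekend else 0.2) else 0
        revenue = 1000 + 600 * weekend + 80 * extra_staff + rng.gauss(0, 100)
        rows.append((weekend, extra_staff, revenue))
    return rows


def mean(values):
    return sum(values) / len(values)


def naive_difference(rows):
    """Mean revenue with extra staff minus mean revenue without (confounded)."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def backdoor_effect(rows):
    """Backdoor adjustment: sum over w of P(w) * (E[Y | T=1, w] - E[Y | T=0, w])."""
    effect = 0.0
    for w in (0, 1):
        stratum = [(t, y) for wk, t, y in rows if wk == w]
        weight = len(stratum) / len(rows)  # P(weekend = w)
        treated = [y for t, y in stratum if t == 1]
        control = [y for t, y in stratum if t == 0]
        effect += weight * (mean(treated) - mean(control))
    return effect


rows = simulate_shifts()
naive = naive_difference(rows)
adjusted = backdoor_effect(rows)
print(f"naive difference:  {naive:.0f}")
print(f"backdoor estimate: {adjusted:.0f}")
```

The naive comparison absorbs the weekend uplift and overstates the effect several times over, while the adjusted estimate lands near the simulated value. The adjustment is only as good as the graph: if an unmeasured variable also drives both staffing and revenue, the stratified estimate remains biased.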
