Mathematical Formalism of Synthetic Controls in GeoSC

This document provides a rigorous econometric and statistical foundation for the methods implemented in GeoSC, intended for Data Scientists and Statisticians.

1. The Potential Outcomes Framework

Let $Y_{it}$ denote the outcome of interest for region $i \in \{1, \dots, N\}$ at time period $t \in \{1, \dots, T\}$. We observe a pre-treatment period $t \in \{1, \dots, T_0\}$ and a post-treatment period $t \in \{T_0+1, \dots, T\}$.

Without loss of generality, let unit $i=1$ be the treated unit, and units $i \in \{2, \dots, N\}$ be the donor pool (control units).

Following the Rubin Causal Model, we define potential outcomes:

  • $Y_{it}^N$: The outcome that would be observed for unit $i$ at time $t$ absent the intervention.
  • $Y_{it}^I$: The outcome that would be observed for unit $i$ at time $t$ exposed to the intervention.

The observed outcome is:

$$ Y_{it} = Y_{it}^N + \alpha_{it} D_{it} $$

Where $D_{it}$ is an indicator variable equal to 1 if unit $i$ receives treatment at time $t$, and 0 otherwise. $\alpha_{it} = Y_{it}^I - Y_{it}^N$ is the treatment effect for unit $i$ at time $t$.

Our goal is to estimate the Average Treatment Effect on the Treated (ATT) during the post-treatment period:

$$ \tau = \frac{1}{T - T_0} \sum_{t=T_0+1}^{T} \alpha_{1t} = \frac{1}{T - T_0} \sum_{t=T_0+1}^{T} (Y_{1t}^I - Y_{1t}^N) $$

Since $Y_{1t}^I = Y_{1t}$ is observed for $t > T_0$, estimating $\tau$ reduces to estimating the unobserved counterfactual $Y_{1t}^N$ in the post-treatment periods. This is the fundamental problem of causal inference in this setting.
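The ATT formula above can be sketched in a few lines once some estimate of the counterfactual path is available. The function name `att` and the numbers are purely illustrative assumptions; nothing here reflects GeoSC's actual interface.

```python
def att(observed, counterfactual, t0):
    """Average post-treatment gap between observed and counterfactual outcomes.

    observed, counterfactual: sequences of length T for the treated unit.
    t0: number of pre-treatment periods (T_0); post periods are indices t0..T-1.
    """
    if len(observed) != len(counterfactual):
        raise ValueError("series must have equal length")
    if not 0 < t0 < len(observed):
        raise ValueError("t0 must leave at least one pre and one post period")
    # alpha_1t = Y_1t - Y_1t^N for each post-treatment period
    gaps = [y - y_n for y, y_n in zip(observed[t0:], counterfactual[t0:])]
    return sum(gaps) / len(gaps)


# Illustrative numbers only: three pre-treatment and two post-treatment periods.
y_obs = [10.0, 11.0, 12.0, 15.0, 16.0]
y_cf = [10.0, 11.0, 12.0, 13.0, 14.0]
tau_hat = att(y_obs, y_cf, t0=3)
print(tau_hat)  # 2.0
```

The whole estimation burden sits in `counterfactual`; the averaging step itself is trivial.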

2. The Standard Synthetic Control Estimator

The Synthetic Control Method (Abadie, Diamond, and Hainmueller, 2010) estimates the counterfactual $Y_{1t}^N$ as a weighted combination of the donor pool:

$$ \hat{Y}_{1t}^N = \sum_{j=2}^{N} w_j Y_{jt} $$

Where $\mathbf{W} = (w_2, \dots, w_N)'$ is a vector of weights satisfying:

  1. $w_j \geq 0 \quad \forall j$ (Non-negativity)
  2. $\sum_{j=2}^{N} w_j = 1$ (Adding-up; together with non-negativity, this confines $\mathbf{W}$ to the unit simplex)

The weights are chosen to minimise the pre-treatment discrepancy between the treated unit and its synthetic control, typically measured by:

$$ || \mathbf{X}_1 - \mathbf{X}_0 \mathbf{W} ||_V = \sqrt{(\mathbf{X}_1 - \mathbf{X}_0 \mathbf{W})' V (\mathbf{X}_1 - \mathbf{X}_0 \mathbf{W})} $$

Where $\mathbf{X}_1$ is a $(K \times 1)$ vector of pre-intervention characteristics for the treated unit, $\mathbf{X}_0$ is a $(K \times J)$ matrix of the same variables for the $J = N - 1$ donors, and $V$ is a $(K \times K)$ symmetric positive semi-definite matrix, typically diagonal, that weights the predictors by their importance for the outcome.
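As a rough illustration of this constrained minimisation, the sketch below solves for $\mathbf{W}$ by projected gradient descent onto the simplex. The solver choice, the function names (`project_to_simplex`, `sc_weights`), the default $V = I$, and the toy data are all assumptions made for the example; they are not taken from GeoSC or from the original SCM software.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index that stays positive
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def sc_weights(X1, X0, V=None, n_iter=5000):
    """Minimise (X1 - X0 w)' V (X1 - X0 w) over the simplex.

    X1: (K,) treated-unit predictors; X0: (K, J) donor predictors.
    V:  (K, K) symmetric PSD predictor weights (identity if omitted).
    """
    K, J = X0.shape
    V = np.eye(K) if V is None else V
    H = X0.T @ V @ X0                         # quadratic term
    g0 = X0.T @ V @ X1                        # linear term
    step = 1.0 / (2.0 * np.linalg.eigvalsh(H)[-1])  # 1 / Lipschitz constant
    w = np.full(J, 1.0 / J)                   # start from uniform weights
    for _ in range(n_iter):                   # fixed budget, no convergence check
        grad = 2.0 * (H @ w - g0)
        w = project_to_simplex(w - step * grad)
    return w


# Toy data: the treated unit is an exact 25/75 mix of donors 0 and 1.
X0 = np.array([[2.0, 0.0, 1.0],
               [0.0, 2.0, 1.0],
               [1.0, 1.0, 3.0]])
X1 = 0.25 * X0[:, 0] + 0.75 * X0[:, 1]
w_hat = sc_weights(X1, X0)
print(np.round(w_hat, 4))
```

In this toy case the donor matrix is invertible, so the weights that reproduce $\mathbf{X}_1$ exactly are unique and the third donor receives zero weight. With $J > K$ the minimiser is generally not unique, which is one motivation for regularisation.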

3. Why SparseSC over Synthetic Difference-in-Differences

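Synthetic Difference-in-Differences (SDiD) (Arkhangelsky et al., 2021) combines unit weights $\hat{w}_j$ with time weights $\hat{\lambda}_t$ (non-negative and summing to one over the pre-treatment periods; distinct from the regularisation parameter $\lambda$ in Section 4) and estimates the effect from a weighted two-way fixed effects regression that includes unit and time fixed effects. With a single treated unit, the estimator reduces to a weighted double difference:

$$ \hat{\tau}^{SDiD} = \frac{1}{T - T_0} \sum_{t=T_0+1}^{T} \left( Y_{1t} - \sum_{j=2}^{N} \hat{w}_j Y_{jt} \right) - \sum_{t=1}^{T_0} \hat{\lambda}_t \left( Y_{1t} - \sum_{j=2}^{N} \hat{w}_j Y_{jt} \right) $$

The second term subtracts the weighted pre-treatment gap, so a constant level difference between the treated unit and its synthetic control is absorbed by the unit fixed effect (an intercept) rather than by the weights.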

While theoretically appealing, SDiD still relies on a parallel trends assumption between the treated unit and the reweighted donor pool. In media-mix modelling and retail geo-experiments, geographic markets often exhibit strongly nonlinear local trends (e.g., localised COVID recovery, regional promotional overlap, weather shocks) that can violate parallel trends even after reweighting.

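GeoSC employs SparseSC (Sparse Synthetic Controls), which regularises the fit with L1/L2 (lasso/ridge) penalties rather than relying on intercepts to correct for poor fit. An L1 penalty promotes sparsity by driving coefficients to exactly zero, whereas an L2 penalty shrinks them without zeroing them out.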

By encouraging a small, interpretable subset of donors and penalising complex weights, SparseSC can reduce interpolation bias when comparable donors exist. It does not prove that the selected donor set shares the treated unit’s data-generating process (DGP), and parallel-trends or spillover diagnostics remain design checks rather than guarantees.

4. SparseSC Regularisation Mechanics

Let $\mathbf{Y}_0^{pre}$ be the $(T_0 \times J)$ matrix of pre-treatment outcomes for donors, and $\mathbf{Y}_1^{pre}$ be the $(T_0 \times 1)$ vector for the treated unit.

SparseSC modifies the standard SC optimisation by introducing a regularisation parameter $\lambda$:

$$ \hat{\mathbf{W}} = \arg \min_{\mathbf{W}} || \mathbf{Y}_1^{pre} - \mathbf{Y}_0^{pre} \mathbf{W} ||_2^2 + \lambda ||\mathbf{W}||_1 $$

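Subject to the simplex constraints $\mathbf{1}'\mathbf{W} = 1, \mathbf{W} \geq 0$. Note that $||\mathbf{W}||_1 = 1$ for every $\mathbf{W}$ satisfying these constraints, so the L1 term is constant on the simplex. The penalty therefore changes the solution only if the sum-to-one constraint is relaxed or the penalty is placed on another quantity (for example a ridge term $||\mathbf{W}||_2^2$).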

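$\lambda$ is tuned by rolling-origin cross-validation over the pre-treatment period: the weights are fitted on an initial window, evaluated on the periods that immediately follow, and the window is then extended and the procedure repeated. The $\lambda$ with the lowest out-of-sample mean squared prediction error (MSPE) is selected. This guards against overfitting to pre-treatment noise, a common failure mode of standard SCM when $T_0$ is small relative to $J$.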
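The rolling-origin procedure can be sketched as follows. Because $||\mathbf{W}||_1$ equals 1 everywhere on the simplex, this sketch swaps in a ridge penalty $\lambda ||\mathbf{W}||_2^2$ so that $\lambda$ actually affects the fit; that substitution, the projected-gradient solver, the expanding-window scheme, and every function name below are assumptions of the example, not a description of SparseSC's or GeoSC's internals.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)


def ridge_sc_weights(y1, Y0, lam, n_iter=500):
    """Minimise ||y1 - Y0 w||^2 + lam * ||w||^2 over the simplex."""
    J = Y0.shape[1]
    H = Y0.T @ Y0 + lam * np.eye(J)
    g0 = Y0.T @ y1
    step = 1.0 / (2.0 * np.linalg.eigvalsh(H)[-1])
    w = np.full(J, 1.0 / J)
    for _ in range(n_iter):  # fixed budget; a real solver would test convergence
        w = project_to_simplex(w - step * 2.0 * (H @ w - g0))
    return w


def rolling_origin_mspe(y1, Y0, lam, min_train, horizon=1):
    """Mean squared error on the `horizon` periods after each expanding window."""
    errors = []
    for end in range(min_train, len(y1) - horizon + 1):
        w = ridge_sc_weights(y1[:end], Y0[:end], lam)   # fit on periods [0, end)
        pred = Y0[end:end + horizon] @ w                # predict the next periods
        errors.append(np.mean((y1[end:end + horizon] - pred) ** 2))
    return float(np.mean(errors))


def select_lambda(y1, Y0, grid, min_train):
    """Return the grid value with the lowest rolling-origin MSPE, plus all scores."""
    scores = {lam: rolling_origin_mspe(y1, Y0, lam, min_train) for lam in grid}
    return min(scores, key=scores.get), scores


# Simulated pre-treatment data (illustrative only): T0 = 20 periods, J = 4 donors.
rng = np.random.default_rng(0)
Y0_pre = rng.normal(size=(20, 4))
y1_pre = Y0_pre @ np.array([0.6, 0.4, 0.0, 0.0]) + 0.1 * rng.normal(size=20)
grid = [0.0, 0.1, 1.0]
best_lam, scores = select_lambda(y1_pre, Y0_pre, grid, min_train=12)
w_final = ridge_sc_weights(y1_pre, Y0_pre, best_lam)
```

Each fold trains only on periods that precede its evaluation window, which preserves the time ordering that ordinary k-fold cross-validation would break.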

5. Placebo Inference and Empirical P-Values

Because geo-experiments often have a single treated unit ($N_1 = 1$, where $N_1$ denotes the number of treated units) or only a handful, asymptotic inference is usually not appropriate. GeoSC calculates donor-pool empirical p-values using in-space placebo permutations.

We iteratively reassign the treatment status to every donor unit $j \in \{2, \dots, N\}$, calculate a placebo synthetic control $\hat{Y}_{jt}^N$, and derive a placebo effect $\hat{\alpha}_{jt}$.

The ratio of post-treatment MSPE to pre-treatment MSPE is calculated for the actual treated unit and all placebos:

$$ r_i = \frac{\frac{1}{T-T_0} \sum_{t=T_0+1}^T (Y_{it} - \hat{Y}_{it}^N)^2}{\frac{1}{T_0} \sum_{t=1}^{T_0} (Y_{it} - \hat{Y}_{it}^N)^2} $$

The p-value is the share of all $N$ units, the treated unit included, whose ratio is at least as large as the treated unit’s ratio:

$$ p = \frac{1}{N} \sum_{j=1}^N \mathbf{I}(r_j \geq r_1) $$
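The ratio and the empirical p-value can be computed as in the following minimal sketch. It assumes the actual and synthetic series for each unit have already been produced by some fitting routine; the function names and the numbers are hypothetical.

```python
def mspe_ratio(actual, synthetic, t0):
    """Post-treatment MSPE divided by pre-treatment MSPE for one unit.

    t0 is the number of pre-treatment periods. A perfect pre-period fit
    (zero pre-treatment MSPE) raises ZeroDivisionError and needs separate handling.
    """
    pre = [(a - s) ** 2 for a, s in zip(actual[:t0], synthetic[:t0])]
    post = [(a - s) ** 2 for a, s in zip(actual[t0:], synthetic[t0:])]
    return (sum(post) / len(post)) / (sum(pre) / len(pre))


def placebo_p_value(ratios, treated_index=0):
    """Share of units whose ratio is at least the treated unit's ratio.

    `ratios` holds r_i for every unit, treated unit included, so the
    smallest attainable value is 1 / len(ratios).
    """
    r_treated = ratios[treated_index]
    return sum(r >= r_treated for r in ratios) / len(ratios)


# Illustrative series for one unit: three pre-treatment, two post-treatment periods.
r_example = mspe_ratio([1.0, 2.0, 3.0, 6.0, 7.0],
                       [1.5, 1.5, 3.5, 3.0, 4.0], t0=3)
print(r_example)  # 36.0

# Illustrative ratios: treated unit first, then four placebo units.
p = placebo_p_value([9.0, 1.2, 0.8, 2.5, 1.0])
print(p)  # 0.2
```

Here the treated unit has the largest ratio among five units, so $p = 1/5$, the smallest value this reference set can produce.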

This quantity is only as informative as the placebo reference set. With few eligible donors, the attainable p-values are coarse; with non-comparable donors, spillovers, or poor pre-period fit, the empirical comparison can be misleading.
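
As a worked illustration of that coarseness (the unit counts are illustrative): because the treated unit always counts itself in the sum, the attainable values are

$$ p \in \left\{ \frac{1}{N}, \frac{2}{N}, \dots, 1 \right\}, \qquad p_{\min} = \frac{1}{N} $$

With $N = 10$ (one treated unit and nine donors), $p_{\min} = 0.10$, so the test cannot reject at the 5% level however large $r_1$ is. At least $N = 20$ units are needed for $p_{\min} \leq 0.05$.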