Configuration Reference
GeoSC uses flat YAML keys for its configuration.
The example below reflects the supported YAML shape. The data-config/
directory is a source-tree example asset, not a packaged runtime asset, so
users of built wheel or sdist installs should replace those demo paths with
their own config and data paths.
Inference Config Parameters
Below are the standard configuration keys used in
geolift_analysis_config.yaml:
Data Contract Policies
Inference defaults are fail-closed:
- duplicate_policy: "error" rejects duplicate location-date rows. Use "mean" or "sum" only when that aggregation is statistically intended.
- missing_outcome_policy: "error" rejects incomplete outcome panels. Explicit alternatives are "drop_unit", "drop_period", and "impute_with_report".
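To make the duplicate policies concrete, here is a minimal sketch of what "error", "mean", and "sum" imply for rows that share a location and date. The helper name resolve_duplicates and the tuple layout are hypothetical; this illustrates the policy semantics and is not GeoSC's implementation.

```python
from collections import defaultdict
from statistics import mean


def resolve_duplicates(rows, policy="error"):
    """Collapse (location, date, outcome) rows under a duplicate policy.

    Hypothetical helper for illustration only; not part of GeoSC.
    """
    grouped = defaultdict(list)
    for location, date, outcome in rows:
        grouped[(location, date)].append(outcome)

    duplicates = [key for key, values in grouped.items() if len(values) > 1]

    if policy == "error":
        # Fail closed: any repeated location-date pair is rejected.
        if duplicates:
            raise ValueError(f"duplicate location-date rows: {duplicates}")
        return {key: values[0] for key, values in grouped.items()}
    if policy == "mean":
        return {key: mean(values) for key, values in grouped.items()}
    if policy == "sum":
        return {key: sum(values) for key, values in grouped.items()}
    raise ValueError(f"unsupported duplicate_policy: {policy!r}")
```

The same duplicated pair yields different outcome values under "mean" and "sum", which is why either choice should reflect how the data were actually recorded.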
Every inference run writes data_validation.json next to the result artifacts.
Review it before interpreting lift. The artifact includes raw outcome value
checks plus separate input-panel and model-panel quality summaries, so missing
values hidden by explicit duplicate aggregation remain auditable.
Inference also writes geolift_results.json with schema_version and
status. Treat ok as “the configured contract completed”, not as a proof of
causal validity. Treat partial as “estimate available but interpretation
metadata incomplete”, and treat failed as unusable for decision-making.
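A downstream script might encode the three status values as follows. This is a sketch under assumptions: describe_result is a hypothetical helper, and only the schema_version and status fields named above are read; no other fields of geolift_results.json are assumed.

```python
import json

# Meanings paraphrased from the reference text above.
STATUS_MEANING = {
    "ok": "configured contract completed; not proof of causal validity",
    "partial": "estimate available but interpretation metadata incomplete",
    "failed": "unusable for decision-making",
}


def describe_result(result):
    """Return (status, usable_for_decisions, meaning) for a parsed result dict.

    Hypothetical helper; raises on any status not documented above.
    """
    status = result.get("status")
    if status not in STATUS_MEANING:
        raise ValueError(f"unexpected status: {status!r}")
    return status, status != "failed", STATUS_MEANING[status]


def load_result(path):
    """Parse a geolift_results.json file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

A caller could run describe_result(load_result("geolift_results.json")) and stop early when the second element is False. A True value only means the run did not fail; it does not replace reviewing data_validation.json.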
Assumption Gates
- run_assumption_checks: false skips parallel-trends and spillover checks but still writes assumption_validation.json with overall_status: "skipped".
- require_assumption_checks: true makes failed checks a hard result gate: geolift_results.json is marked status: "failed".
- fail_on_assumption_error: true treats validator exceptions as failed gates when checks are run.
- parallel_trends_method supports "placebo", "regression", and "visual".
- spillover_method supports "correlation", "change", and "visual".
Assumption checks are limited diagnostics. run_assumption_checks: false
produces a skipped artifact, advisory checks can mark results partial, and
required checks can fail the run. Passing checks cannot prove no spillover or
perfect parallel structure.
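The interaction between these flags can be sketched as a small decision function. This is a simplified reading of the text above, not GeoSC's code: the function name and arguments are hypothetical, a failed advisory check is assumed to always yield "partial" (the text only says it can), and a validator exception under fail_on_assumption_error is assumed to be handled like any other failed check.

```python
def result_status_after_checks(
    run_assumption_checks,
    require_assumption_checks,
    check_failed,
    validator_raised=False,
    fail_on_assumption_error=False,
):
    """Return the result status implied by the assumption gates.

    Returns None when checks are skipped, because the text above only says
    that assumption_validation.json records overall_status "skipped".
    Hypothetical helper for illustration.
    """
    if not run_assumption_checks:
        return None

    # A validator exception counts as a failed gate only when opted in.
    gate_failed = check_failed or (validator_raised and fail_on_assumption_error)
    if not gate_failed:
        return "ok"
    # Required checks fail the run; advisory checks downgrade it.
    return "failed" if require_assumption_checks else "partial"
```

Even an "ok" outcome here is a limited diagnostic, since passing checks cannot prove the absence of spillover.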
Power Config Parameters
Power analysis accepts the demo keys in
data-config/power_analysis_config.yaml, including:
random_seed makes repeated runs with the same config reproducible. Each
power_analysis_results.csv row records n_failed, failure_rate, valid,
random_seed, simulation_seed, dgp_rank, dgp_explained_var,
requested_max_n_pl, possible_placebos, effective_max_n_pl, dgp_backend,
generation_backend, seeded_reproducibility_mode, and effect_pattern. Rows
with valid: false exceeded the configured failure tolerance or had no
successful simulations. power_failure_rate_threshold must be a finite number
in [0, 1]; non-finite values are rejected at construction. When random_seed
is set, GPU DGP estimation and GPU post-period generation are disabled so
seeded CPU simulation draws are reproducible. The shipped demo config sets
use_gpu: false to match this seeded reproducibility default.
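The two rules in this paragraph, the threshold range check and the seed-disables-GPU behavior, can be sketched as below. The function names are hypothetical and the code only mirrors the rules as described; GeoSC's own validation may differ in error messages and accepted types.

```python
import math


def validate_failure_rate_threshold(value):
    """Return the threshold as a float if it is finite and within [0, 1]."""
    number = float(value)
    if not math.isfinite(number) or not 0.0 <= number <= 1.0:
        raise ValueError(
            f"power_failure_rate_threshold must be a finite number in [0, 1], got {value!r}"
        )
    return number


def effective_use_gpu(use_gpu, random_seed):
    """GPU paths are only eligible when no random_seed is configured."""
    return bool(use_gpu) and random_seed is None
```

For example, effective_use_gpu(True, 42) evaluates to False, reflecting that a seeded run falls back to CPU simulation draws.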
Power values are Monte Carlo estimates, not guarantees; compare them with
power_ci_lower, power_ci_upper, n_simulations, and failure_rate.
If a power-analysis design has fewer control units than treatment units, the
calculator fails fast at construction with a ValueError rather than
forwarding an infeasible placebo count to SparseSC. plot_power_curves filters
rows with valid: false out of the decision plots (logging the count of
excluded rows) and skips plot creation entirely when no valid rows remain.
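A reader post-processing power_analysis_results.csv by hand may want the same filtering. The sketch below assumes the valid column is serialized as the text True or False; the helper names, the argument names n_control and n_treatment, and the sample values are made up for illustration and are not GeoSC output.

```python
import csv
import io

# Made-up sample rows using only column names listed in this reference.
SAMPLE_CSV = "n_failed,failure_rate,valid\n0,0.00,True\n40,0.40,False\n2,0.02,True\n"


def split_valid_rows(csv_text):
    """Return (valid_rows, excluded_count) from power-analysis CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    valid_rows = [row for row in rows if row["valid"].strip().lower() == "true"]
    return valid_rows, len(rows) - len(valid_rows)


def check_design(n_control, n_treatment):
    """Reject designs with fewer control units than treatment units."""
    if n_control < n_treatment:
        raise ValueError(
            f"need at least as many control units ({n_control}) "
            f"as treatment units ({n_treatment})"
        )


valid_rows, excluded = split_valid_rows(SAMPLE_CSV)
if not valid_rows:
    print("no valid rows; skipping plots")
else:
    print(f"plotting {len(valid_rows)} rows, excluded {excluded}")
```

Returning the excluded count alongside the rows keeps the exclusion visible, in the same spirit as the logged count described above.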
Donor Config Parameters
Donor evaluation accepts donor_dominance_threshold in addition to the existing
metric thresholds. donor_eval_results.csv includes rank, quality_band,
warning_flags, and selected_weight. selected_weight is a normalized
recommendation weight within the ranked donor set; it is not a fitted SparseSC
weight. The stage also writes versioned donor_pool_quality.json with threshold
settings, warning-flag counts, selected-weight concentration, effective donor
count, and by-treatment quality summaries.
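To illustrate what a normalized recommendation weight and an effective donor count can look like, here is a sketch under stated assumptions. This reference does not say how GeoSC derives selected_weight or the effective donor count; the code normalizes hypothetical ranking scores to sum to one and uses the inverse Herfindahl index, one common definition of an effective count.

```python
def normalize_weights(scores):
    """Scale non-negative donor scores so that the weights sum to 1."""
    if any(value < 0 for value in scores.values()):
        raise ValueError("scores must be non-negative")
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("at least one score must be positive")
    return {donor: value / total for donor, value in scores.items()}


def effective_donor_count(weights):
    """Inverse Herfindahl index: 1 / sum of squared weights (an assumption)."""
    return 1.0 / sum(weight * weight for weight in weights.values())


# Hypothetical scores for three donors; one donor dominates.
weights = normalize_weights({"donor_a": 2.0, "donor_b": 1.0, "donor_c": 1.0})
print(weights["donor_a"], effective_donor_count(weights))
```

Under this definition, equal weights across n donors give an effective count of n, and concentration in a few donors pulls it toward 1, which is the kind of pattern a dominance threshold is meant to surface.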
The canonical artifact field inventory is maintained in
.planning/artifact-schemas.md; release preflight validates the critical
inference, power, and donor fields with bounded fixture outputs.