Five classes of climate data go in. The framework applies the same operators to all of them.
1. CMIP6 ensemble output (cloud-streaming)
The framework streams CMIP6 model output directly from the Pangeo zarr archive on Google Cloud Storage (gs://cmip6/CMIP6/...) using the gcsfs + xarray + zarr stack. This avoids local downloads and gives access to the full multi-model ensemble.
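As a minimal sketch of this access pattern, the snippet below picks one zarr store per model from a list of catalog rows and opens a store lazily. It assumes rows shaped like the public Pangeo CMIP6 catalog (columns such as source_id, experiment_id, table_id, variable_id, zstore). The helper names select_stores and open_store are hypothetical, and the framework's own loader may work differently.

```python
def select_stores(catalog_rows, variable_id, experiment_id, table_id=None):
    """Return {source_id: zstore} for catalog rows matching the request.

    catalog_rows is any iterable of dicts, e.g. from csv.DictReader over a
    catalog file. The column names are an assumption about the catalog layout.
    """
    stores = {}
    for row in catalog_rows:
        if row["variable_id"] != variable_id:
            continue
        if row["experiment_id"] != experiment_id:
            continue
        if table_id is not None and row["table_id"] != table_id:
            continue
        # Keep the first matching row per model (one ensemble member each).
        stores.setdefault(row["source_id"], row["zstore"])
    return stores


def open_store(zstore):
    """Lazily open one zarr store on Google Cloud Storage as an xarray Dataset."""
    # Imports are deferred so that select_stores works without the cloud stack.
    import gcsfs
    import xarray as xr

    fs = gcsfs.GCSFileSystem(token="anon")  # anonymous, read-only access
    mapper = fs.get_mapper(zstore)
    return xr.open_zarr(mapper, consolidated=True)
```

With the catalog parsed into rows, select_stores(rows, "siconc", "ssp585") would give one store per model for the SSP5-8.5 sea-ice request. open_store returns a lazily loaded Dataset, so data are only transferred when a computation asks for them.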
Per observable, the framework requests:
- siconc (sea-ice concentration) — 21 models in SSP5-8.5; up to 12 in scenario fan
- msftmz (AMOC streamfunction) — 13 models in SSP1-2.6, 14 in SSP2-4.5, 19 in SSP5-8.5
- npp (Net Primary Productivity, Amazon basin) — 25–26 models per SSP
- cSoil (high-latitude soil carbon, ≥55°N) — 6 models per SSP
- tas, hfds, rlut, rsdt (temperature, ocean heat flux, TOA radiation) — varies by experiment
Cadence: monthly output aggregated to annual values. Spatial: area mean weighted by the cosine of latitude over the relevant region (Arctic Ocean for sea ice, Atlantic basin for AMOC, Amazon polygon for NPP, ≥55°N land for permafrost).
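The two reductions above can be sketched in plain Python. This is an illustration under stated assumptions, not the framework's code: the annual aggregate is taken to be an unweighted mean of the 12 monthly values, incomplete years are dropped, and the function names regional_mean and annual_means are hypothetical. On the xarray stack the spatial step would more likely use DataArray.weighted(...).mean(...).

```python
import math


def regional_mean(field, lats_deg, mask):
    """Cosine-of-latitude weighted mean of a 2-D field over a masked region.

    field[i][j] is the value at latitude lats_deg[i]; mask[i][j] is True for
    cells inside the region. NaN cells are skipped.
    """
    num = 0.0
    den = 0.0
    for row, mask_row, lat in zip(field, mask, lats_deg):
        weight = math.cos(math.radians(lat))  # cell area shrinks toward the poles
        for value, inside in zip(row, mask_row):
            if inside and not math.isnan(value):
                num += weight * value
                den += weight
    if den == 0.0:
        raise ValueError("no valid cells inside the region")
    return num / den


def annual_means(monthly):
    """Reduce (year, month, value) records to {year: mean}, complete years only.

    Assumption: a plain mean of 12 monthly values, not weighted by month length.
    """
    by_year = {}
    for year, _month, value in monthly:
        by_year.setdefault(year, []).append(value)
    return {
        year: sum(values) / len(values)
        for year, values in sorted(by_year.items())
        if len(values) == 12
    }
```

For a region such as ≥55°N land, the mask would be True only for land cells at or above 55°N, so the same function covers all four regions listed above.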
2. Observational satellite + reanalysis records
- NSIDC Sea Ice Index daily — 1979–2025, 47 years (instance #17). Three derivative shadows: extent, area, mid-September snapshot.
- RAPID-MOCHA-WBTS array — AMOC overturning at 26.5°N, 2004–2024 monthly (n = 240, instance #29). DOI 10.5285/48d0bf43-0598-ceb2-e063-7086abc062f1.
- CERES EBAF Edition 4.2 TOA flux — 2000–2023 monthly (instance #18, energy-budget closure).
- NOAA NCEI 0–2000m Ocean Heat Content — 2005–2025, 5-yr running mean.
- IMBIE Greenland (2020) + Antarctica (2018) — cumulative ice-mass × \(L_{\text{fusion}}\), 1992–2017.
- HadCRUT5 + GISTEMP + NOAA GMST × atmospheric heat capacity — three independent atmosphere-temperature shadows for instance #18.
- NASA Ozone Watch TOMS+OMI+OMPS hole-minimum + zonal-mean (1979–2025), NASA SBUV V8.7 polar bands (1970–2023) — three ozone shadows for instance #25.
- HURDAT2 + IBTrACS — ~3000 tropical cyclones across 6 ocean basins (instance #30).
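The ice-mass and temperature entries above are products that express each record as an energy, which makes them comparable in joules. A rough sketch of the arithmetic follows. The constants are standard textbook values and are assumptions here, since the source does not state which values the framework uses; the function names are hypothetical.

```python
# Assumed textbook constants (not taken from the framework's source).
L_FUSION_J_PER_KG = 3.34e5      # latent heat of fusion of ice
KG_PER_GT = 1.0e12              # one gigatonne in kilograms
ATMOSPHERE_MASS_KG = 5.15e18    # approximate total mass of the atmosphere
CP_AIR_J_PER_KG_K = 1004.0      # specific heat of dry air at constant pressure


def ice_melt_energy_j(mass_loss_gt):
    """Energy needed to melt a given cumulative ice-mass loss (Gt), in joules."""
    return mass_loss_gt * KG_PER_GT * L_FUSION_J_PER_KG


def atmosphere_heat_j(delta_t_k):
    """Heat stored in the atmosphere for a global-mean warming of delta_t_k kelvin."""
    return delta_t_k * ATMOSPHERE_MASS_KG * CP_AIR_J_PER_KG_K
```

Under these assumptions, one gigatonne of melted ice corresponds to about 3.3 × 10¹⁷ J, and one kelvin of atmospheric warming to about 5.2 × 10²¹ J.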
3. Anthropogenic CO₂ records
- Mauna Loa (Keeling, 1958–2024, 818 monthly rows) — modern record.
- South Pole flask (1975–2025) — independent shadow.
4. Paleoclimate ice cores and proxy stacks
- Vostok ice core (Petit et al. 1999, 312 yr BP – 414 ka) — paleoclimate CO₂.
- EPICA Dome C (Lüthi et al. 2008, 107 yr BP – 798 ka) — paleoclimate CO₂.
- NGRIP δ¹⁸O 50-yr resolution (Andersen et al. 2004, 4918 pts) — Greenland temperature proxy.
- GISP2 reconstructed temperature (Alley 2000, 1632 pts).
- GRIP δ¹⁸O (Johnsen et al., 5425 pts) — independent third Greenland shadow.
- NGRIP Ca²⁺ (Rasmussen et al. 2008, H1–H3 only) — high-latitude Heinrich proxy.
- DSDP609 IRD lithics (Bond et al. 1992) — North Atlantic Heinrich proxy.
- Hulu Cave δ¹⁸O (Wang et al. 2001) + Botuverá Cave δ¹⁸O (Wang et al. 2007) — tropical Heinrich proxies.
- LR04 benthic δ¹⁸O stack (Lisiecki–Raymo 2005) + ODP 1123 + ODP 982 — Mid-Pleistocene Transition.
5. AR6 SSP central trajectories (for forward projection)
The IPCC AR6 (Fox-Kemper et al. 2021) provides central sea-level-rise trajectories per SSP from 2020 to 2150. They serve as the "what if mitigation succeeds at this level" branch in the GMSL scenario fan (scenarios/sea-level/).
What is excluded
The framework does not ingest:
- Climate-domain threshold constants (°C tipping windows from the Lenton et al. literature, ppm targets from IPCC) — would violate Law II (Domain Interiority).
- Sub-cascade-time data (weather forecasts, hourly station records) — too short for the framework's brake-extraction.
- Single-shadow data (one detector, one model run) in cases where the framework's call is a cross-shadow consensus — a single shadow would be uninformative.
All raw inputs and cached intermediate files are stored under domains/climate/data/.