
Data Reference

Canonical reference for the data collection pipeline that gathers and validates input parameters for pharmaceutical forecasting models.

This document defines every input variable in the patient flow model and provides two fully worked disease examples — Oropharyngeal Cancer (solid tumor archetype) and Multiple Myeloma (hematology archetype). Use these as templates when onboarding new diseases into the forecasting system.

Purpose: This is the specification for an automated data pipeline that will collect, cross-reference, and validate disease parameters from external sources (WHO, SEER, clinical literature) before they enter the forecasting model. Each parameter table below defines what the pipeline must populate, its expected range, and its source.


Two disease archetypes determine how patients flow through therapy lines:

| Aspect | Solid Tumor | Hematology |
| --- | --- | --- |
| type value | "Solid Tumor" (case-sensitive) | "Hematology" (case-sensitive) |
| Stage split | Early vs. Metastatic (dual funnel) | None (single funnel) |
| Line categories | "early" + "metastatic" | "therapy" |
| Relapse rates | Feed patients between funnels | Not applicable |
| Stage mix UI | Editable | Hidden |

type is a hard requirement. A disease without a type row returns null from deriveDiseaseConfig and will not load in the forecasting app. Values must be exactly "Solid Tumor" or "Hematology" (case-sensitive, no alternatives).

Solid Tumor splits incidence into early and metastatic pools, each running an independent therapy cascade. Relapse rates allow early-stage patients to re-enter either pool.

Hematology routes all incidence into a single linear cascade (1L → 2L → 3L → …) with no stage segmentation. Internally, HEMATOLOGY_STAGE_MIX sets earlyStagePercent: 100 and all relapse rates to 0.
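The type check and the hematology bypass can be sketched as follows. This is a minimal illustration, not the actual deriveDiseaseConfig code: `parseDiseaseType` and `stageMixFor` are hypothetical helper names, and only the two literal type values and the HEMATOLOGY_STAGE_MIX numbers come from this document.

```typescript
type DiseaseType = "Solid Tumor" | "Hematology";

interface StageMix {
  earlyStagePercent: number;
  metStagePercent: number;
  earlyToEarlyRelapse: number;
  earlyToMetRelapse: number;
}

// Values stated in this document: hematology uses a fixed stage mix.
const HEMATOLOGY_STAGE_MIX: StageMix = {
  earlyStagePercent: 100,
  metStagePercent: 0,
  earlyToEarlyRelapse: 0,
  earlyToMetRelapse: 0,
};

// Hypothetical helper: exact, case-sensitive match; anything else yields null.
function parseDiseaseType(raw: string | undefined): DiseaseType | null {
  return raw === "Solid Tumor" || raw === "Hematology" ? raw : null;
}

// Hypothetical helper: pick the stage mix the forecast should use.
function stageMixFor(type: DiseaseType, reference: StageMix): StageMix {
  return type === "Hematology" ? HEMATOLOGY_STAGE_MIX : reference;
}
```

A value such as "solid tumor" or a missing type row maps to null here, mirroring the "will not load" behavior described above.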


Scoped to geography + indication (line = null).

The forecasting app detects available geographies by scanning for reference rows whose name matches /^pop[A-Z]/. At minimum, each geography needs one popBase row:

| Variable | Format | Description |
| --- | --- | --- |
| popBase | Millions (e.g., 344.1) | Population in millions. Active consumer: geo enumerator (via /^pop[A-Z]/ regex) and per-capita incidence fallback. When the cascade sources incidenceBase from another geo, the result is scaled by targetPopBase / sourcePopBase so patient counts reflect the selected geo’s population. See FORECASTING_MODEL.md “Per-Capita Incidence Fallback”. |
| popGrowth | Percentage (e.g., 0.51 = 0.51%) | Annual population growth. Present in API, not consumed by the frontend; reserved for future use. |

Deprecated: popAcceleration is no longer needed. Growth rates are self-contained — all acceleration is baked into the growth rate values themselves.
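As a rough sketch of the two popBase consumers described above, geography detection and per-capita scaling might look like this. The `ReferenceRow` shape and both function names are assumptions made for illustration; only the regex and the targetPopBase / sourcePopBase ratio are taken from this document.

```typescript
// Assumed minimal row shape; the real reference rows carry more fields.
interface ReferenceRow {
  geo: string;
  name: string;
  value: number;
}

const POP_ROW_PATTERN = /^pop[A-Z]/;

// Hypothetical helper: list geographies that have at least one pop* row.
function detectGeographies(rows: ReferenceRow[]): string[] {
  const geos: string[] = [];
  for (let i = 0; i < rows.length; i++) {
    const row = rows[i];
    if (POP_ROW_PATTERN.test(row.name) && geos.indexOf(row.geo) === -1) {
      geos.push(row.geo);
    }
  }
  return geos;
}

// Per-capita fallback: scale incidence borrowed from another geography.
function scaleIncidence(
  sourceIncidence: number,
  sourcePopBase: number,
  targetPopBase: number,
): number {
  return sourceIncidence * (targetPopBase / sourcePopBase);
}
```

A geography that only has an incidenceBase row is not detected, which is why each geography needs at least one popBase row.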

| Variable | Default | Description |
| --- | --- | --- |
| incidenceBase | From reference data | New disease cases diagnosed per year in this geography. Entry point of the entire funnel. |
| incidenceGrowth | 0.5 (%) | Annual growth rate to project incidence over the forecast horizon (PROJECTION_DATA_LENGTH = 21 entries: 1 pre-launch year + 20 post-launch years). Value is a percentage (e.g., 0.68 = 0.68%/year). |

Deprecated: incidenceAcceleration is not used by the forecasting app (zero references in the codebase). Do not include it in data collection.

Projection formula (array of length PROJECTION_DATA_LENGTH = 21 — 1 pre-launch row + 20 post-launch years):

incidenceEvolution[n] = incidenceBase × (1 + growthRate / 100)^n

| Variable | Default | Description |
| --- | --- | --- |
| healthcareAccess | 95 (%) | % of patients who can access healthcare (insurance, proximity, awareness). Scenario-level: applied once at the top of the patient flow (multiplied into stage addressable before line-level math). Edited in the SummaryPanel. |
| treatmentRate | 100 (%) | % of accessible patients who receive drug treatment. |

treatmentRate is written to each therapy line during initialization as a per-line default; the line-level value is what drives computation (see Section 5). healthcareAccess is scenario-level only — there is no per-line HA storage.


3. Stage Mix Parameters (Solid Tumor Only)


Hematology bypasses these entirely via hardcoded HEMATOLOGY_STAGE_MIX.

| Variable | Default (Solid Tumor) | Hematology | Description |
| --- | --- | --- | --- |
| earlyStagePercent | From reference data | 100 (hardcoded) | % of newly diagnosed patients at early/localized stage. |
| metStagePercent | From reference data | 0 (hardcoded) | % at metastatic/advanced stage. |
| earlyToEarlyRelapse | From reference data | 0 | % of early-stage patients who relapse and remain localized, re-entering the early pool. |
| earlyToMetRelapse | From reference data | 0 | % of early-stage patients who relapse and progress to metastatic, adding to the met pool. |
| unknownStagePercent | From reference data | 0 | Unknown staging. Child of Early Stage in the configuration UI, constraint solver, Tornado, and Monte Carlo: changing Unknown subtracts from Early (not Met). At forecasting time, the stage-mix normalizer (Section 8, Step 2) redistributes the remaining mass proportionally by dividing by earlyPct + metPct. |
| earlyStageLocalizedPercent | From reference data (opt) | N/A | Sub-breakdown of early stage. When both sub-breakdowns exist, earlyStagePercent = localized + locallyAdvanced. |
| earlyStageLocallyAdvancedPercent | From reference data (opt) | N/A | Sub-breakdown: locally advanced (stage III–IVB by default; III–IIIB for lung and breast). |

Normalization: The model divides each percentage by earlyPct + metPct, redistributing unknown-stage patients proportionally without changing the early:met ratio.

Stage Hierarchy, Unknown Handling & Audit Patterns


Canonical reference for encoding stage-mix data during an audit — especially when the source reports Unknown / Unstaged patients.

Unknown is always a child of Early Stage — never Metastatic. earlyStagePercent can optionally split into earlyStageLocalizedPercent + earlyStageLocallyAdvancedPercent.

Two equivalent ways to encode Unknown:

  1. Audit-time fold — absorb Unknown into a sub-variable during extraction; unknownStagePercent stays empty.
  2. Runtime absorption — populate unknownStagePercent explicitly; the solver handles it.

Both are mathematically equivalent. Choice is about dataset consistency.

  • UI (SummaryPanel.tsx:237-377, SummaryCard.tsx:82-99) — Unknown nested under Early; edits subtract from earlyStagePercent.
  • Sub-variable scaling — Localized + LocallyAdvanced rescale proportionally when Unknown changes (mass conservation).
  • Constraint solver (src/features/forecasting/math/tornado.ts:92-96, src/features/forecasting/math/monte-carlo.ts:237-244) — earlyStagePercent = max(0, 100 − metStagePercent − unknownStagePercent).
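The solver rule and the proportional rescale can be sketched as below. This is an illustration only: the function names are hypothetical, and the rescale assumes "mass conservation" means the two sub-variables keep their ratio and sum to the new earlyStagePercent. The actual code in tornado.ts and monte-carlo.ts may differ.

```typescript
// Solver rule quoted above: Early absorbs whatever Met and Unknown leave over.
function solveEarlyStagePercent(
  metStagePercent: number,
  unknownStagePercent: number,
): number {
  return Math.max(0, 100 - metStagePercent - unknownStagePercent);
}

interface SubStages {
  localized: number;
  locallyAdvanced: number;
}

// Assumed interpretation: keep the localized:locallyAdvanced ratio and
// rescale both so they sum to the new early-stage percentage.
function rescaleSubStages(current: SubStages, newEarlyPercent: number): SubStages {
  const total = current.localized + current.locallyAdvanced;
  if (total === 0) {
    return { localized: 0, locallyAdvanced: 0 };
  }
  const scale = newEarlyPercent / total;
  return {
    localized: current.localized * scale,
    locallyAdvanced: current.locallyAdvanced * scale,
  };
}
```

For instance, with Met at 11 and Unknown raised from 0 to 4, Early moves from 89 to 85 and both sub-variables shrink by the same factor of 85 / 89.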

Introduced in commit 40c0877.

When Localized + Locally Advanced sub-breakdowns exist, fold Unknown into earlyStageLocallyAdvancedPercent at audit time.

Example — Endometrial (SEER: Localized 67, Regional 18, Met 11, Unknown 4):

earlyStageLocalizedPercent = 67
earlyStageLocallyAdvancedPercent = 22 # 18 + 4 (folded)
earlyStagePercent = 89 # 67 + 22
metStagePercent = 11
unknownStagePercent = — # not populated
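The same fold can be written as a small function. The input shape (SEER-style Localized / Regional / Metastatic / Unknown shares) and the name `foldUnknownIntoLocallyAdvanced` are assumptions for illustration; the arithmetic is the one shown in the Endometrial example.

```typescript
// Assumed input: stage shares as reported by the source, in face-value %.
interface SourceStageShares {
  localized: number;
  regional: number;
  metastatic: number;
  unknown: number;
}

// Output uses the variable names from the stage-mix table.
// unknownStagePercent is deliberately absent: the audit-time fold leaves it empty.
interface EncodedStageMix {
  earlyStageLocalizedPercent: number;
  earlyStageLocallyAdvancedPercent: number;
  earlyStagePercent: number;
  metStagePercent: number;
}

function foldUnknownIntoLocallyAdvanced(s: SourceStageShares): EncodedStageMix {
  const locallyAdvanced = s.regional + s.unknown; // Unknown folded here
  return {
    earlyStageLocalizedPercent: s.localized,
    earlyStageLocallyAdvancedPercent: locallyAdvanced,
    earlyStagePercent: s.localized + locallyAdvanced,
    metStagePercent: s.metastatic,
  };
}
```

Calling it with the Endometrial shares (67, 18, 11, 4) reproduces the encoded values listed above.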

Why Locally Advanced? Unstaged patients behaviorally correlate with locally-advanced disease (harder to stage cleanly), not localized. Applied uniformly across OPC, HNSCC, SQNSCLC, CCA, ESCC audits.

Note on displayed range: The default range label is III–IVB. For lung and breast cancer indications, the UI displays III–IIIB instead, because IVB is treated as metastatic (not locally advanced) in those staging conventions. The override list lives in src/features/reports/forecasting/utils/locallyAdvancedRange.ts.

  • No sub-breakdown in source → fold Unknown into earlyStagePercent directly; sub-variables and unknownStagePercent stay empty.
  • Unknown >10–15% of incidence → use unknownStagePercent explicit; folding distorts sub-ratios.
  • Pre-convention audit being re-audited without fresh source data → keep unknownStagePercent explicit; flag for follow-up.
  • Incompatible staging (prostate Biochemical Recurrent, sarcoma grade, neuroendocrine grade, etc.) → remap to existing slots with clinical rationale, OR use unknownStagePercent as escape hatch, OR flag for data-model extension. Don’t force-fit.

  1. Clean Early vs Met split? No → consider type: Hematology or flag for design review.
  2. Source reports Unknown? No → encode Early + Met, done.
  3. Have Localized + LocallyAdvanced breakdown? Yes → OPC fold.
  4. No sub-breakdown, Unknown small? → fold into earlyStagePercent.
  5. Unknown substantial OR staging incompatible? → use unknownStagePercent explicit; document in the audit markdown.

Grade-based diseases, diseases with 3+ Early sub-stages, and molecular-subtype-driven models may need data-model extensions. Extend this section with a concrete worked example when such a disease is actually audited — don’t pre-encode hypotheticals.


Optional. Applied to first line only.

| Variable | Default | Description |
| --- | --- | --- |
| activeBiomarker | null | Which biomarker definition is active. |
| biomarkerPercentage | From reference data | % of patients with this biomarker. |
| biomarkerTestingRate | 100 (%) | % of patients tested for this biomarker. |

biomarkerFactor = (percentage / 100) × (testingRate / 100)

When no biomarker is active, biomarkerFactor = 1.0.
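A minimal sketch of this factor, assuming a hypothetical `BiomarkerInput` shape with face-value percentages:

```typescript
// Assumed shape: both fields are face-value percentages (0–100).
interface BiomarkerInput {
  percentage: number;
  testingRate: number;
}

// Returns 1.0 when no biomarker is active, as stated above.
function biomarkerFactor(active: BiomarkerInput | null): number {
  if (active === null) {
    return 1.0;
  }
  return (active.percentage / 100) * (active.testingRate / 100);
}
```

With percentage 68 and testingRate 93 this gives 0.68 × 0.93 = 0.6324.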


Each therapy line (reference or custom) carries these variables.

| Variable | Default | Description |
| --- | --- | --- |
| transitionRate | 100 (%) | % of patients progressing from the previous line. Line 1 does not use this. |
| treatmentRate | 100 (%) | Line-level treatment rate. Applied at the funnel entry of each stage group: the early line and first metastatic line for solid tumors, 1L for hematology. Stored on every line but skipped in compute (factor = 1) for non-applicable lines. Custom lines participate in their effective stage group (stageCategory ?? "therapy"). Gated by isDTRApplicable() in src/features/forecasting/math/forecasting.ts. |

Healthcare Access is not a per-line variable. It is a scenario-level value applied once at the top of the patient flow (see Section 2). To model line-specific access penalties, add a Custom Variable on the affected line.

Neoadjuvant/Adjuvant (Early-Stage Solid Tumor Only)

| Variable | Default | Description |
| --- | --- | --- |
| neoAdjuvantEnabled | false | Gate for the neo/adj multiplier. When disabled, factor = 1. |
| neoAdjSetting | "neo" | Display label ("neo" / "adj" / "other"). Does not affect computation. |
| neoAdjPercent | 100 (%) | When enabled: factor = neoAdjPercent / 100. |

Up to 5 per line. Each acts as a multiplicative percentage filter:

customMultiplier = variables.reduce((acc, v) => acc × (v.value / 100), 1)

Any variable with changeable: true and startYear set evolves over time:

if year < startYear: baseValue
else: min(baseValue × (1 + min/100)^(year - startYear), max)

Note: min is the annual growth rate (%); max is the ceiling value.
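The custom-variable multiplier and the time evolution rule can be sketched together. The `ChangeableVariable` shape and function names are assumptions; the formulas follow the two snippets above, including the reading of min as an annual growth rate and max as a ceiling.

```typescript
// Each custom variable is a face-value percentage acting as a filter.
function customMultiplier(values: number[]): number {
  return values.reduce((acc, value) => acc * (value / 100), 1);
}

// Assumed shape for a variable with changeable: true and startYear set.
interface ChangeableVariable {
  baseValue: number; // value before startYear
  min: number; // annual growth rate in %
  max: number; // ceiling value
  startYear: number;
}

function valueInYear(v: ChangeableVariable, year: number): number {
  if (year < v.startYear) {
    return v.baseValue;
  }
  const grown = v.baseValue * Math.pow(1 + v.min / 100, year - v.startYear);
  return Math.min(grown, v.max);
}
```

For example, a variable with baseValue 50, min 10, max 60, and startYear 2026 stays at 50 through 2026, reaches about 55 in 2027, and is capped at 60 from 2028 onward.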

| Category | Used By | Pool Source |
| --- | --- | --- |
| "early" | Solid tumor early-stage lines | earlyAddressable |
| "metastatic" | Solid tumor met-stage lines | metAddressable |
| "therapy" | All hematology lines | totalAddressable |
| "custom" | User-created lines | Based on stageCategory, or totalAddressable if unset |

| Variable | Default | Description |
| --- | --- | --- |
| retreatmentOption | 1 | 1 = subtract captured patients from pool before cascading. 0 = do not subtract. UI mapping: stored 1 renders as Switch OFF (“not eligible at subsequent lines”); stored 0 renders as Switch ON (“eligible at subsequent lines”). |
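Because the stored value and the Switch state are inverted, a small sketch may help avoid off-by-one mistakes when populating this field. The helper names are hypothetical; the mapping and the subtraction rule are the ones described for retreatmentOption.

```typescript
type RetreatmentOption = 0 | 1;

// Stored 1 means "subtract captured patients", which the UI shows as Switch OFF.
function switchIsOn(stored: RetreatmentOption): boolean {
  return stored === 0;
}

function storedFromSwitch(isOn: boolean): RetreatmentOption {
  return isOn ? 0 : 1;
}

// Patients carried toward the next line, before transitionRate is applied.
function poolAfterCapture(
  pool: number,
  prevNewPatients: number,
  stored: RetreatmentOption,
): number {
  return pool - prevNewPatients * stored;
}
```

With the default stored value of 1, a pool of 1,000 with 200 captured patients passes 800 forward; with 0 it passes the full 1,000.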

Per-line parameters driving market share and revenue.

| Variable | Default | Description |
| --- | --- | --- |
| launch | "{selectedYear}-08-01" | Drug launch date, read as the first day of the stated month (e.g., 2025-08-01 means Aug 1, 2025; that calendar year contributes 5 months of revenue, Aug–Dec). Can be past (already-launched products) or future. Year offset indexes into uptake curve; month drives the partial-year WeightedShare formula. |
| launchOrder | 1 | Position among competitors (1–10). Indexes into LAUNCH_ORDERS lookup. |
| bestInClass | false | Adds BEST_IN_CLASS bonus to peak share. |
| delayVsCompetition | 0 | Quarters behind competition. Penalty if > 3: delay × 0.5. |
| classShare | 100 (%) | Share of patients suitable for this therapy class. |
| peakShare | Computed | baseShare + bicBonus - penalty, clamped 0–100. |
| customEffectivePeakShare | null | User override bypassing peakShare × classShare / 100. |
| speedToPeak | "3 Year Slow" | Uptake curve (34 options: 1–12 years × Slow/Medium/Fast). |
| monthsOfTherapy | 12 | Total time a typical patient stays on the drug (not protocol length, not treatment-cycle length). Affects cohort-based revenue spread; see Sales Calculations in FORECASTING_MODEL.md. |
| compliance | 85 (%) | Patient compliance. Affects sales only, not patient flow. |
| events | 1 empty event | Market events with impactPercent and startDate. |

Product economics, scoped to geography + indication.

| Variable | Default | Description |
| --- | --- | --- |
| yearOfFirstLaunch | Current year | Base year for price calculations. |
| launchPricePerMonth | 6,000 | Monthly price at launch. |
| annualNetPriceChange | 2 (%) | Annual price escalation. netPrice = launchPrice × (1 + change/100)^years. |
| marketExclusivityYears | 12 | Years of market exclusivity from first launch. |
| loeDate | firstLaunch + exclusivity | Loss of exclusivity date. Erosion curves apply after this. |
| moleculeType | "Biologic" | "Biologic" or "Small Molecule". Selects erosion curve post-LOE. |
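The price escalation and the loss-of-exclusivity default can be sketched as follows. Function names are hypothetical, and the LOE helper works at year granularity only, which is a simplification: the table describes loeDate as a date.

```typescript
// netPrice = launchPrice × (1 + change/100)^years, with change as a face-value %.
function netPricePerMonth(
  launchPricePerMonth: number,
  annualNetPriceChange: number,
  yearsSinceFirstLaunch: number,
): number {
  return launchPricePerMonth * Math.pow(1 + annualNetPriceChange / 100, yearsSinceFirstLaunch);
}

// Simplified to calendar years; the real loeDate is a full date.
function defaultLoeYear(yearOfFirstLaunch: number, marketExclusivityYears: number): number {
  return yearOfFirstLaunch + marketExclusivityYears;
}
```

With the defaults (6,000 per month, 2% annual change), the net monthly price one year after launch is about 6,120, and a 2025 launch with 12 years of exclusivity reaches LOE in 2037.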

incidenceEvolution[year] = incidenceBase × (1 + growthRate / 100)^year
knownTotal = earlyPct + metPct
earlyIncidence = incidence × (earlyPct / knownTotal)
metIncidence = incidence × (metPct / knownTotal)

Unknown stage is redistributed proportionally via normalization (dividing by knownTotal instead of 100). Runtime normalization preserves the Unknown-subtracts-from-Early invariant enforced in the configuration UI and constraint solver (Section 3).
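The projection and the stage split above can be sketched in a few lines. Function names are hypothetical; the formulas and PROJECTION_DATA_LENGTH = 21 are taken from this document. The split assumes earlyPct + metPct is greater than zero.

```typescript
const PROJECTION_DATA_LENGTH = 21;

// incidenceEvolution[n] = incidenceBase × (1 + growthRate / 100)^n
function projectIncidence(incidenceBase: number, growthRatePct: number): number[] {
  const evolution: number[] = [];
  for (let n = 0; n < PROJECTION_DATA_LENGTH; n++) {
    evolution.push(incidenceBase * Math.pow(1 + growthRatePct / 100, n));
  }
  return evolution;
}

interface StageSplit {
  earlyIncidence: number;
  metIncidence: number;
}

// Dividing by knownTotal (not 100) spreads unknown-stage patients
// across early and met without changing the early:met ratio.
function splitByStage(incidence: number, earlyPct: number, metPct: number): StageSplit {
  const knownTotal = earlyPct + metPct;
  return {
    earlyIncidence: incidence * (earlyPct / knownTotal),
    metIncidence: incidence * (metPct / knownTotal),
  };
}
```

Because both shares are divided by the same knownTotal, earlyIncidence and metIncidence always sum back to the full incidence.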

Step 3 — Relapse Augmentation (Solid Tumor only)

earlyAddressable = earlyIncidence + earlyIncidence × (earlyToEarlyRelapse / 100)
metAddressable = metIncidence + earlyIncidence × (earlyToMetRelapse / 100)
totalAddressable = earlyAddressable + metAddressable

Each line draws from the addressable pool that matches its category:

| Line Category | Addressable Pool |
| --- | --- |
| "early" | earlyAddressable |
| "metastatic" | metAddressable |
| "therapy" | totalAddressable |

First line (1L):

eligible = addressable × biomarkerFactor × (healthcareAccess / 100)
           × neoAdjFactor × (treatmentRate / 100) × customMultiplier
newPatients = eligible × (marketShare / 100)

Second line (2L):

eligible = ((pool - prevNewPatients × retreatment) × transitionRate / 100)
           × (treatmentRate / 100) × customMultiplier

treatmentRate is applied at 2L as a special case.

Third line and beyond (3L+):

eligible = ((pool - prevNewPatients × retreatment) × transitionRate / 100)
           × customMultiplier

treatmentRate is not applied for lines 3L and beyond.
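A compact sketch of the relapse augmentation and the per-line eligibility formulas follows. All names are hypothetical. In particular, `applyTreatmentRate` is a caller-supplied flag standing in for the isDTRApplicable() gate, whose actual logic is not reproduced in this document.

```typescript
interface Pools {
  earlyAddressable: number;
  metAddressable: number;
  totalAddressable: number;
}

// Relapse rates are face-value percentages of early-stage incidence.
function addressablePools(
  earlyIncidence: number,
  metIncidence: number,
  earlyToEarlyRelapse: number,
  earlyToMetRelapse: number,
): Pools {
  const earlyAddressable = earlyIncidence + earlyIncidence * (earlyToEarlyRelapse / 100);
  const metAddressable = metIncidence + earlyIncidence * (earlyToMetRelapse / 100);
  return { earlyAddressable, metAddressable, totalAddressable: earlyAddressable + metAddressable };
}

interface FirstLineInputs {
  addressable: number;
  biomarkerFactor: number; // 1 when no biomarker is active
  healthcareAccess: number; // face-value %
  neoAdjFactor: number; // 1 when disabled
  treatmentRate: number; // face-value %
  customMultiplier: number;
}

function firstLineEligible(i: FirstLineInputs): number {
  return i.addressable * i.biomarkerFactor * (i.healthcareAccess / 100)
    * i.neoAdjFactor * (i.treatmentRate / 100) * i.customMultiplier;
}

interface LaterLineInputs {
  pool: number;
  prevNewPatients: number;
  retreatment: 0 | 1;
  transitionRate: number; // face-value %
  treatmentRate: number; // face-value %
  customMultiplier: number;
  applyTreatmentRate: boolean; // assumption: stands in for isDTRApplicable()
}

function laterLineEligible(i: LaterLineInputs): number {
  const carried = (i.pool - i.prevNewPatients * i.retreatment) * (i.transitionRate / 100);
  const treatmentFactor = i.applyTreatmentRate ? i.treatmentRate / 100 : 1;
  return carried * treatmentFactor * i.customMultiplier;
}

function newPatients(eligible: number, marketSharePct: number): number {
  return eligible * (marketSharePct / 100);
}
```

Keeping the treatment-rate gate as an explicit input makes the difference between the 2L and 3L+ formulas visible at the call site.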

effectivePeakShare = customEffectivePeakShare ?? (peakShare × classShare / 100)
uptake = weightedUptake(speedToPeak, yearOffset, launchMonth)
erosion = erosionFactor(moleculeType, yearsSinceLOE)
marketShare = clamp(0, 100, uptake × effectivePeakShare / 100 × erosion)
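The share formulas can be sketched as below. This is a hedged illustration: baseShare (from the LAUNCH_ORDERS lookup), bicBonus (from BEST_IN_CLASS), uptake, and erosion are passed in as plain numbers because their lookup tables and curves are not reproduced here. It also assumes uptake is a 0–100 percentage and erosion a 0–1 factor.

```typescript
function clamp(lo: number, hi: number, x: number): number {
  return Math.min(hi, Math.max(lo, x));
}

// Penalty applies only when the delay exceeds 3 quarters: delay × 0.5.
function delayPenalty(delayVsCompetition: number): number {
  return delayVsCompetition > 3 ? delayVsCompetition * 0.5 : 0;
}

// peakShare = baseShare + bicBonus - penalty, clamped to 0–100.
function computePeakShare(
  baseShare: number,
  bicBonus: number,
  delayVsCompetition: number,
): number {
  return clamp(0, 100, baseShare + bicBonus - delayPenalty(delayVsCompetition));
}

// A non-null override bypasses peakShare × classShare / 100.
function effectivePeakShare(
  peakShare: number,
  classShare: number,
  customEffectivePeakShare: number | null,
): number {
  return customEffectivePeakShare ?? (peakShare * classShare) / 100;
}

function marketShare(uptake: number, effPeakShare: number, erosion: number): number {
  return clamp(0, 100, ((uptake * effPeakShare) / 100) * erosion);
}
```

For example, a base share of 30 with a bonus of 5 and a 4-quarter delay gives a peak share of 33; at a class share of 50 and 50% uptake before LOE, market share is 8.25.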

9. Worked Example: Oropharyngeal Cancer (Solid Tumor)


Disease ID: 5040 | Type: Solid Tumor | Geography: USA

| Parameter | Value | Source |
| --- | --- | --- |
| incidenceBase | 15,004 | WHO (gco.iarc.fr), 2025 |
| incidenceGrowth | 0.68% | WHO (gco.iarc.fr), 2022 |

EU5: incidenceBase = 30,200, growth = 0.28%.

| Parameter | Value | Source |
| --- | --- | --- |
| earlyStagePercent | 88.7% | SEER (n=7,248), 2021 |
| earlyStageLocalizedPercent | 15.8% | SEER |
| earlyStageLocallyAdvancedPercent | 72.9% | SEER |
| metStagePercent | 11.2% | SEER (n=7,248), 2021 |

| Parameter | USA | EU5 | Source |
| --- | --- | --- | --- |
| earlyToEarlyRelapse | 24.6% | 13.11% | Frontiers in Oncology / PMC |
| earlyToMetRelapse | 26.37% | 14.48% | Frontiers in Oncology / PMC |

| Parameter | Value | Source |
| --- | --- | --- |
| biomarkerPercentage | 68% | Carlander AF et al. (n=42,024) |
| biomarkerTestingRate | 93% | Carlson KM et al. (n=64,845) |
| Computed biomarkerFactor | 0.6324 | 0.68 × 0.93 |

Early Stage:

| Line | transitionRate | treatmentRate | Category |
| --- | --- | --- | --- |
| Early Stage | 100% | 47.92% | early |

Metastatic (USA):

| Line | transitionRate | treatmentRate | Source |
| --- | --- | --- | --- |
| Met 1L | 100% | 78.4% | Cao C et al. (n=813) |
| Met 2L | 43.8% | 78.4% | Lee DY et al. (n=2,577) |
| Met 3L | 43.8% | 78.4% | Lee DY et al. |
| Met 4L | 43.8% | 78.4% | Lee DY et al. |

9.6 Patient Flow Walkthrough (Year 0, USA, no biomarker)

Incidence: 15,004
knownTotal: 88.7 + 11.2 = 99.9
── EARLY FUNNEL ──
earlyIncidence: 15,004 × (88.7 / 99.9) = 13,322
earlyRelapsed: 13,322 × 24.6% = 3,277
earlyAddressable: 13,322 + 3,277 = 16,599
Early 1L eligible: 16,599 × 95% × 47.92% = 7,557
(with biomarker): 16,599 × 0.6324 × 95% × 47.92% = 4,779
── METASTATIC FUNNEL ──
metIncidence: 15,004 × (11.2 / 99.9) = 1,682
metRelapsed: 13,322 × 26.37% = 3,513
metAddressable: 1,682 + 3,513 = 5,195
Met 1L eligible: 5,195 × 95% × 78.4% = 3,869
Met 2L eligible: (met1L_elig − met1L_new × retreatment) × 43.8% × 78.4%
Met 3L eligible: (met2L_elig − met2L_new × retreatment) × 43.8%
Met 4L eligible: (met3L_elig − met3L_new × retreatment) × 43.8%

Note: Met 2L applies treatmentRate (78.4%) as the special 2L case. Met 3L+ do not.

| Position | Name | Mapping |
| --- | --- | --- |
| 0 | Definitive | |
| 1 | Neoadjuvant | Neoadjuvant |
| 2 | Adjuvant | Adjuvant |
| 3 | First Line (1L) | 1L |
| 4 | Second Line and Beyond (2L+) | 2L+ |

10. Worked Example: Multiple Myeloma (Hematology)


Disease ID: 4841 | Type: Hematology | Geography: USA

| Parameter | Value | Source |
| --- | --- | --- |
| incidenceBase | 36,110 | SEER, 2025 |
| incidenceGrowth | 1.3023% | WHO GCO Cancer Tomorrow, 2022 |
| healthcareAccess | 95% | |

Other geographies: EU5 = 29,618 (growth 0.9881%), Japan = 6,988 (growth 0.2744%).

Not applicable — HEMATOLOGY_STAGE_MIX applied automatically:

earlyStagePercent: 100 → all incidence enters therapy pool
metStagePercent: 0 → no metastatic split
earlyToEarlyRelapse: 0 → no relapse augmentation
earlyToMetRelapse: 0

Per-line eligibility split for hematology indications (AML / Multiple Myeloma / DLBCL / Follicular Lymphoma / Hodgkin Lymphoma — case-insensitive allowlist match). Mirrors the Neoadjuvant/Adjuvant pattern: one CSV reference row plus per-line instance fields.

Reference data (CSV)

| name | Scope | Unit | Notes |
| --- | --- | --- | --- |
| transplantEligible | Indication-level (line empty) | face-value % of incidence | Disease-specific. MM guidance: 45 (CIBMTR Utilization 2023 / NCCN Insights 2025). Other diseases require their own audit-sourced value. |

Per-line instance fields (on UnifiedLineData, persisted as JSONB via PUT /api/forecasting/:id — schema-less):

| Field | Type | Default | Meaning |
| --- | --- | --- | --- |
| transplantSplitEnabled | bool | false | Per-line toggle. When false, factor = 1.0 (no narrowing). |
| transplantSplitSetting | "eligible", "non-eligible", or "other" | "eligible" | Which cohort this line forecasts. |
| transplantSplitPercent | number (0–100) | setting-aware (see below) | The % applied. |

Setting-aware defaults. When the toggle flips ON, the percent auto-fills based on the setting and the disease-level transplantEligible reference value E:

  • "eligible" → defaults to E (or 50 if no reference)
  • "non-eligible" → defaults to 100 − E (or 50 if no reference)
  • "other" → defaults to 100 (no smart default)

The user may override the auto-filled percent at any time. The factor applied to addressable is transplantSplitPercent / 100 when enabled, else 1.
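The setting-aware defaults and the resulting factor can be sketched as follows. Function names are hypothetical; `referenceE` stands for the disease-level transplantEligible value E, with null meaning no reference row exists.

```typescript
type TransplantSetting = "eligible" | "non-eligible" | "other";

// Percent auto-filled when the toggle flips ON.
function defaultTransplantPercent(
  setting: TransplantSetting,
  referenceE: number | null,
): number {
  if (setting === "eligible") {
    return referenceE === null ? 50 : referenceE;
  }
  if (setting === "non-eligible") {
    return referenceE === null ? 50 : 100 - referenceE;
  }
  return 100; // "other": no smart default
}

// Factor applied to addressable; 1 when the per-line toggle is off.
function transplantFactor(enabled: boolean, transplantSplitPercent: number): number {
  return enabled ? transplantSplitPercent / 100 : 1;
}
```

With the MM guidance value E = 45, an "eligible" line defaults to 45 and a "non-eligible" line to 55, so the two cohorts together cover the full addressable pool.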

UI. Inline block in LinePanel, gated on isTransplantAllowlisted(indication) && variable.id === "treatmentRate". Mirrors neoAdj layout: Switch + segmented control (Eligible | Non-eligible | Other) + percent row. The percent row hides the “Add to Monte Carlo” toggle (transplant variables are not exposed to MC/Tornado in this design).

No transplantReceiving, no transplantEnabled reference row, no factor-map atoms. The previous design’s per-line-receiving rows and 14-atom orchestration have been replaced by the per-line fields above. Disease audits should NOT include transplantReceiving rows.

| Line | transitionRate | treatmentRate | Source |
| --- | --- | --- | --- |
| 1st Line | 100% (implicit entry) | 90% (DTR / 1L uptake) | NCCN 2025; US RWE 2025 (per Supportive Data/Multiple Myeloma.docx) |
| 2nd Line | 75% | 100% | US RWE 2025 (PMC); ISPOR/Nexus 2023 |
| 3rd Line | 55% | 100% | US RWE 2025 (PMC); ISPOR/Nexus 2023 |
| 4th Line | 42.5% | 100% | US RWE 2025 (PMC); ISPOR/Nexus 2023 |

All lines have category "therapy". Xlsx ASCO 2016 alternatives (54% / 47% / 42% for 2L/3L/4L transitions, is_guidance=FALSE) are retained in new-statistics.csv for source-trail transparency.

10.5 Patient Flow Walkthrough (Year 0, USA)

Incidence: 36,110
── SINGLE THERAPY FUNNEL ──
totalAddressable: 36,110 (no stage split, no relapse)
1L eligible: 36,110 × 95% × 90% = 30,874
2L eligible: (30,874 − 1L_new × retreatment) × 75%
3L eligible: (2L_elig − 2L_new × retreatment) × 55%
4L eligible: (3L_elig − 3L_new × retreatment) × 42.5%

Note: 1L applies treatmentRate 90% as the MM DTR (1L uptake). 2L+ do not apply DTR — MM is hematology, so DTR is gated to the first therapy line only (isDTRApplicable in src/features/forecasting/math/forecasting.ts). The stored treatmentRate (100%) on 2L/3L/4L is unused by the computation.
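To carry the walkthrough past 1L, market shares are needed, and those are scenario inputs rather than reference data. The sketch below therefore takes them as hypothetical parameters. The function name and input shape are assumptions; the arithmetic follows the walkthrough, with DTR applied at 1L only.

```typescript
interface HematologyCascadeInputs {
  incidence: number;
  healthcareAccess: number; // face-value %
  firstLineTreatmentRate: number; // face-value %, the 1L DTR
  transitionRates: number[]; // face-value %, one entry per line after 1L
  marketShares: number[]; // face-value %, per line starting at 1L; hypothetical
  retreatment: 0 | 1;
}

// Returns the eligible patient count for each line, 1L first.
function hematologyEligible(input: HematologyCascadeInputs): number[] {
  let prevEligible =
    input.incidence * (input.healthcareAccess / 100) * (input.firstLineTreatmentRate / 100);
  const eligible: number[] = [prevEligible];
  input.transitionRates.forEach((rate, i) => {
    // Patients captured at the previous line; a missing share counts as 0.
    const prevNew = prevEligible * ((input.marketShares[i] ?? 0) / 100);
    prevEligible = (prevEligible - prevNew * input.retreatment) * (rate / 100);
    eligible.push(prevEligible);
  });
  return eligible;
}
```

With every market share at zero, the MM inputs above give roughly 30,874 → 23,156 → 12,736 → 5,413 eligible patients across 1L to 4L. Any non-zero share at an earlier line reduces the later pools when retreatment is 1.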

| Position | Name | Mapping |
| --- | --- | --- |
| 0 | Newly Diagnosed / First Line (1L) | 1L |
| 1 | Second Line (2L) | 2L |
| 2 | Third Line (3L) | 3L |
| 3 | Fourth Line (4L) | 4L |
| 4 | Fifth Line and Beyond (5L+) | 5L+ |

11. Multi-Source Reference Data & Guidance Cascade


Reference rows are no longer one-per-variable. Multiple rows may exist for the same {geo, indication, line, name} — one canonical guidance row plus zero or more alternatives. YEAR is metadata on each row, not part of the uniqueness key.

Every reference row carries an isGuidance boolean (is_guidance in the API / CSV).

| Field | Type | Description |
| --- | --- | --- |
| isGuidance | boolean | true marks the canonical row used by computations. Uniqueness invariant (“exactly one is_guidance = TRUE row per variable”) is enforced at the DB layer via a partial unique index on (geo, indication, line, name) WHERE is_guidance = TRUE AND deleted_at IS NULL in bioloupe-data-gov. |
| year | integer | Publication year metadata. Required on measurement rows (Thor auto-fills blank YEAR on those with Date.current.year); optional on label/enum rows where year has no meaning (type, displayName, category, biomarkerName; see ForecastingStatistic::YEAR_OPTIONAL_NAMES in bioloupe-data-gov/app/models/forecasting_statistic.rb). |
| sources | SourceEntry[] | Citation metadata attached to the row. Surfaces in InfoButton tooltips and DOCX report. |

Backend contract: schema + selection cascade are enforced in the bioloupe-data-gov Rails backend (lib/tasks/forecasting_statistics.thor for imports; DB schema for invariants).

Zod validation: src/lib/schemas/statistics.ts (field is_guidance).

The frontend picks the row used for computation via selectGuidance() (at src/features/reports/forecasting/utils/selectGuidance.ts):

  1. Filter rows matching the requested {geo, indication, line, name}.
  2. Pick the single isGuidance = true row as guidance. If none exist, fall back to the newest-year row.
  3. Return the selected row plus non-selected rows as alternatives, sorted newest-year-first.
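The three steps above can be sketched as follows. This is not the actual selectGuidance() source: the row and key shapes are assumptions, the function is named `selectGuidanceSketch` to make that clear, and rows without a year are treated as oldest.

```typescript
// Assumed minimal row shape for illustration.
interface RefRow {
  geo: string;
  indication: string;
  line: string | null;
  name: string;
  value: string;
  year?: number;
  isGuidance: boolean;
}

interface RefKey {
  geo: string;
  indication: string;
  line: string | null;
  name: string;
}

interface GuidanceSelection {
  guidance: RefRow | null;
  alternatives: RefRow[];
}

function selectGuidanceSketch(rows: RefRow[], key: RefKey): GuidanceSelection {
  // Step 1: keep rows matching {geo, indication, line, name}.
  const matches = rows.filter(
    (r) =>
      r.geo === key.geo &&
      r.indication === key.indication &&
      r.line === key.line &&
      r.name === key.name,
  );
  // Newest year first; a missing year sorts last (assumption).
  const newestFirst = matches.slice().sort((a, b) => (b.year ?? 0) - (a.year ?? 0));
  // Step 2: prefer the isGuidance row, else fall back to the newest-year row.
  const flagged = newestFirst.filter((r) => r.isGuidance);
  const guidance: RefRow | null = flagged[0] ?? newestFirst[0] ?? null;
  // Step 3: everything not selected becomes an alternative.
  const alternatives = newestFirst.filter((r) => r !== guidance);
  return { guidance, alternatives };
}
```

Note that an older row flagged isGuidance wins over a newer unflagged row; recency only matters for the fallback and for ordering alternatives.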

useReferenceSource(name, lineId?) wraps this cascade and returns { value, sources, year, alternatives }. Most patient-flow InfoButtons read from this hook. Biomarker InfoButtons are the exception — extractBiomarkers (src/features/reports/forecasting/atoms.ts) applies the cascade internally on each BiomarkerDef, so the geo-fallback chain (geo → USA → EU5 → Japan) can resolve a sourceGeo before selectGuidance runs. Biomarker sources/alternatives reach InfoButton via BiomarkerDef.{percentageSource, percentageYear, percentageAlternatives, testingRateSource, testingRateYear, testingRateAlternatives}.

Non-guidance rows appear in two places:

  • InfoButton tooltips (src/components/ui/info-button.tsx): an “Alternative values” section lists each alternative row with its value, year, and source citations.
  • DOCX report (src/features/reports/forecasting/reporting/collectReportData.ts): the report data collector routes each reference lookup through selectGuidance/useReferenceSource so the exported document always shows the same value the UI computes from.

Adding a newer source is a decision, not a passive update:

  • Promote the new source: set is_guidance=TRUE on the new row; the existing canonical must be demoted to is_guidance=FALSE in the same commit (Thor + DB enforce “exactly one TRUE per variable”).
  • Keep the current canonical: add the new row with is_guidance=FALSE. YEAR is the new source’s real publication year — no tuple alignment required.
  • The admin console’s conflict modal enforces this choice automatically when you try to save a second TRUE row for the same variable.

Mechanism: selectGuidance() at src/features/reports/forecasting/utils/selectGuidance.ts returns the single isGuidance=true row as guidance and sorts the rest newest-year-first for display.

See docs/DESIGN_SYSTEM.md:53,86-88 for UI-layer guidance on rendering alternatives in the InfoButton.


At runtime, the forecasting app fetches statistics from the bioloupe-data-gov Rails backend. The API response is validated by Zod (src/lib/schemas/statistics.ts) and consumed via useStatistics() (src/hooks/useStatistics.ts). Division by 100 happens at computation time; parseValue() auto-detects types from string values where applicable.

CSVs (new-statistics.csv and per-disease imports) are the seed / import format used to populate the backend. The canonical column contract is defined by the Thor importer at bioloupe-data-gov/lib/tasks/forecasting_statistics.thor and the statistics table schema. The rules below apply to the CSV → backend import path.

Year default on import. If a row’s year column is blank, the Thor importer auto-fills it with the current calendar year (Date.current.year). Explicit years are preferred — blank is only acceptable for active-year data.

All rates and percentages are stored as face-value numbers (0–100 scale), not decimals (0–1).

VariableFormatExampleWrong
incidenceBaseAbsolute count154270
incidenceGrowthPercentage0.68 = 0.68%/yr0.0068 (100× too small)
transitionRatePercentage43.8 = 43.8%0.438
treatmentRatePercentage78.4 = 78.4%0.784
healthcareAccessPercentage95 = 95%0.95
earlyStagePercentPercentage77 = 77%0.77
biomarkerPercentagePercentage68 = 68%0.68
transplantEligiblePercentage50 = 50% of cohort0.5
popBaseMillions344.1 = 344.1M
displayNameString"Metastatic Line 1"
typeEnum (case-sensitive)"Solid Tumor" or "Hematology""solid tumor"
is_guidanceBooleantrue (canonical) / false (alternative). Defaults to true.
yearInteger2025 (publication year metadata; blank allowed only on convention rows — Thor auto-fills Date.current.year on measurement rows).

Historical note: A previous statistics_corrected.csv divided growth rates by 100 (storing 0.0051 instead of 0.51), which caused near-zero growth in the forecasting model. That file has been deleted. The current new-statistics.csv — used for backend seeding only — uses face-value percentages.


ParameterPop-LevelLine-LevelSolid TumorHematology
incidenceBaseyesyesyes
incidenceGrowthyesyesyes
earlyStagePercentyesyeshardcoded 100
metStagePercentyesyeshardcoded 0
unknownStagePercentyesyes (child of Early)hardcoded 0
earlyToEarlyRelapseyesyeshardcoded 0
earlyToMetRelapseyesyeshardcoded 0
healthcareAccessyesyesyes
treatmentRatedefaultoverride (1L, 2L only)yesyes
transitionRateyesyesyes
biomarkerPercentageyesyesyes
transplantEligibleyesyes (AML/MM/DLBCL/FL/HL)
retreatmentOptionyesyesyes
neoAdjuvantEnabledyesearly only
customVariablesyesyesyes
Market paramsyesyesyes
Assumption paramsyesyesyes