Disease Data Reference
Canonical reference for the data collection pipeline that gathers and validates input parameters for pharmaceutical forecasting models.
This document defines every input variable in the patient flow model and provides two fully worked disease examples — Oropharyngeal Cancer (solid tumor archetype) and Multiple Myeloma (hematology archetype). Use these as templates when onboarding new diseases into the forecasting system.
Purpose: This is the specification for an automated data pipeline that will collect, cross-reference, and validate disease parameters from external sources (WHO, SEER, clinical literature) before they enter the forecasting model. Each parameter table below defines what the pipeline must populate, its expected range, and its source.
1. Architecture Overview
Two disease archetypes determine how patients flow through therapy lines:
| Aspect | Solid Tumor | Hematology |
|---|---|---|
| `type` value | `"Solid Tumor"` (case-sensitive) | `"Hematology"` (case-sensitive) |
| Stage split | Early vs. Metastatic (dual funnel) | None (single funnel) |
| Line categories | "early" + "metastatic" | "therapy" |
| Relapse rates | Feed patients between funnels | Not applicable |
| Stage mix UI | Editable | Hidden |
`type` is a hard requirement. A disease without a `type` row returns `null` from `deriveDiseaseConfig` and will not load in the forecasting app. Values must be exactly `"Solid Tumor"` or `"Hematology"` (case-sensitive, no alternatives).
Solid Tumor splits incidence into early and metastatic pools, each running an independent therapy cascade. Relapse rates allow early-stage patients to re-enter either pool.
Hematology routes all incidence into a single linear cascade (1L → 2L → 3L → …) with no stage segmentation. Internally, HEMATOLOGY_STAGE_MIX sets earlyStagePercent: 100 and all relapse rates to 0.
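As a minimal sketch of the archetype gate (not the app's `deriveDiseaseConfig`), the exact-match `type` check and the hematology stage-mix override can be written as shown below. `parseDiseaseType`, `stageMixFor`, and the `StageMix` shape are hypothetical names. Only the `HEMATOLOGY_STAGE_MIX` values come from this document.

```typescript
// Hypothetical sketch: exact-match disease type gate plus hematology stage-mix override.
type DiseaseType = "Solid Tumor" | "Hematology";

interface StageMix {
  earlyStagePercent: number;
  metStagePercent: number;
  earlyToEarlyRelapse: number;
  earlyToMetRelapse: number;
}

// Documented HEMATOLOGY_STAGE_MIX values: everything flows into one funnel.
const HEMATOLOGY_STAGE_MIX: StageMix = {
  earlyStagePercent: 100,
  metStagePercent: 0,
  earlyToEarlyRelapse: 0,
  earlyToMetRelapse: 0,
};

// Case-sensitive, no aliases: any other value (or a missing row) yields null.
function parseDiseaseType(raw: string | undefined): DiseaseType | null {
  return raw === "Solid Tumor" || raw === "Hematology" ? raw : null;
}

// Solid tumors keep the reference-data mix; hematology ignores it.
function stageMixFor(type: DiseaseType, referenceMix: StageMix): StageMix {
  return type === "Hematology" ? { ...HEMATOLOGY_STAGE_MIX } : referenceMix;
}
```

The comparison is a strict `===`, so values such as `"solid tumor"` fall through to `null`. This matches the hard requirement above.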
2. Population-Level Parameters
Scoped to geography + indication (line = null).
Geography Detection (Population Rows)
The forecasting app detects available geographies by scanning for reference rows whose `name` matches `/^pop[A-Z]/`. At minimum, each geography needs one `popBase` row:
| Variable | Format | Description |
|---|---|---|
| `popBase` | Millions (e.g., 344.1) | Population in millions. Active consumers: the geography enumerator (via the `/^pop[A-Z]/` regex) and the per-capita incidence fallback. When the cascade sources `incidenceBase` from another geo, the result is scaled by `targetPopBase / sourcePopBase` so patient counts reflect the selected geo's population. See FORECASTING_MODEL.md “Per-Capita Incidence Fallback”. |
| `popGrowth` | Percentage (e.g., 0.51 = 0.51%) | Annual population growth. Present in the API but not consumed by the frontend; reserved for future use. |
Deprecated:
`popAcceleration` is no longer needed. Growth rates are self-contained: all acceleration is baked into the growth-rate values themselves.
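The geography scan and the per-capita fallback described in the `popBase` row can be sketched as follows. The `ReferenceRow` shape and the function names are assumptions for illustration. The real enumerator and fallback live in the forecasting app and may differ.

```typescript
// Hypothetical row shape; real reference rows carry more fields.
interface ReferenceRow {
  geo: string;
  name: string;
  value: number;
}

const POP_ROW = /^pop[A-Z]/;

// A geography is "available" if it has at least one pop* row (e.g., popBase).
function detectGeographies(rows: ReferenceRow[]): string[] {
  const geos = new Set<string>();
  for (const row of rows) {
    if (POP_ROW.test(row.name)) geos.add(row.geo);
  }
  return Array.from(geos).sort();
}

// Per-capita fallback: borrow another geo's incidence and rescale by population.
function scaleIncidence(sourceIncidence: number, sourcePopBase: number, targetPopBase: number): number {
  if (sourcePopBase <= 0) throw new Error("sourcePopBase must be positive");
  return sourceIncidence * (targetPopBase / sourcePopBase);
}
```

A name such as `population` is not matched, because the regex requires an uppercase letter right after `pop`.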
Incidence
| Variable | Default | Description |
|---|---|---|
| `incidenceBase` | From reference data | New disease cases diagnosed per year in this geography. Entry point of the entire funnel. |
| `incidenceGrowth` | 0.5 (%) | Annual growth rate used to project incidence over the forecast horizon (`PROJECTION_DATA_LENGTH` = 21 entries: 1 pre-launch year + 20 post-launch years). Value is a percentage (e.g., 0.68 = 0.68%/year). |
Deprecated:
`incidenceAcceleration` is not used by the forecasting app (zero references in the codebase). Do not include it in data collection.
Projection formula (array of length PROJECTION_DATA_LENGTH = 21 — 1 pre-launch row + 20 post-launch years):
`incidenceEvolution[n] = incidenceBase × (1 + growthRate / 100)^n`, where `growthRate` is `incidenceGrowth` and n runs from 0 to 20.
Healthcare Access & Treatment Rate
| Variable | Default | Description |
|---|---|---|
| `healthcareAccess` | 95 (%) | % of patients who can access healthcare (insurance, proximity, awareness). Scenario-level: applied once at the top of the patient flow (multiplied into the stage addressable pools before line-level math). Edited in the SummaryPanel. |
| `treatmentRate` | 100 (%) | % of accessible patients who receive drug treatment. |
treatmentRate is written to each therapy line during initialization as a per-line default; the line-level value is what drives computation (see Section 5). healthcareAccess is scenario-level only — there is no per-line HA storage.
3. Stage Mix Parameters (Solid Tumor Only)
Hematology bypasses these entirely via the hardcoded `HEMATOLOGY_STAGE_MIX`.
| Variable | Default (Solid Tumor) | Hematology | Description |
|---|---|---|---|
| `earlyStagePercent` | From reference data | 100 (hardcoded) | % of newly diagnosed patients at early/localized stage. |
| `metStagePercent` | From reference data | 0 (hardcoded) | % at metastatic/advanced stage. |
| `earlyToEarlyRelapse` | From reference data | 0 | % of early-stage patients who relapse and remain localized, re-entering the early pool. |
| `earlyToMetRelapse` | From reference data | 0 | % of early-stage patients who relapse and progress to metastatic, adding to the met pool. |
| `unknownStagePercent` | From reference data | 0 | Unknown staging. Treated as a child of Early Stage in the configuration UI, the constraint solver, Tornado, and Monte Carlo: changing Unknown subtracts from Early (not Met). At forecasting time, the stage-mix normalizer (Section 8, Step 2) divides by `earlyPct + metPct`, which redistributes the Unknown mass proportionally. |
| `earlyStageLocalizedPercent` | From reference data (optional) | N/A | Sub-breakdown of early stage: localized. When both sub-breakdowns exist, `earlyStagePercent = localized + locallyAdvanced`. |
| `earlyStageLocallyAdvancedPercent` | From reference data (optional) | N/A | Sub-breakdown of early stage: locally advanced (stage III–IVB by default; III–IIIB for lung and breast). |
Normalization: The model divides each percentage by earlyPct + metPct, redistributing unknown-stage patients proportionally without changing the early:met ratio.
Stage Hierarchy, Unknown Handling & Audit Patterns
Canonical reference for encoding stage-mix data during an audit — especially when the source reports Unknown / Unstaged patients.
Conceptual Model
Unknown is always a child of Early Stage, never Metastatic. `earlyStagePercent` can optionally split into `earlyStageLocalizedPercent` + `earlyStageLocallyAdvancedPercent`.
Two equivalent ways to encode Unknown:
- Audit-time fold: absorb Unknown into a sub-variable during extraction; `unknownStagePercent` stays empty.
- Runtime absorption: populate `unknownStagePercent` explicitly; the solver handles it.
Both are mathematically equivalent; the choice comes down to dataset consistency.
Runtime Behavior
- UI (`SummaryPanel.tsx:237-377`, `SummaryCard.tsx:82-99`): Unknown is nested under Early; edits subtract from `earlyStagePercent`.
- Sub-variable scaling: Localized + LocallyAdvanced rescale proportionally when Unknown changes (mass conservation).
- Constraint solver (`src/features/forecasting/math/tornado.ts:92-96`, `src/features/forecasting/math/monte-carlo.ts:237-244`): `earlyStagePercent = max(0, 100 − metStagePercent − unknownStagePercent)`.
Introduced in commit 40c0877.
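Here is a sketch of the two runtime rules above: the constraint-solver formula and the proportional sub-variable rescaling. It assumes a plain `EarlyBreakdown` record. The function names are hypothetical, and this is not the code in `tornado.ts` or `monte-carlo.ts`. Keeping zeros when there is no prior sub-split is also an assumption.

```typescript
interface EarlyBreakdown {
  earlyStagePercent: number;
  earlyStageLocalizedPercent: number;
  earlyStageLocallyAdvancedPercent: number;
}

// Constraint-solver rule: Early absorbs whatever Met and Unknown leave over.
function solveEarly(metStagePercent: number, unknownStagePercent: number): number {
  return Math.max(0, 100 - metStagePercent - unknownStagePercent);
}

// When Unknown changes, rescale both Early sub-variables so they still sum to Early.
function applyUnknownChange(prev: EarlyBreakdown, metStagePercent: number, unknownStagePercent: number): EarlyBreakdown {
  const early = solveEarly(metStagePercent, unknownStagePercent);
  const subTotal = prev.earlyStageLocalizedPercent + prev.earlyStageLocallyAdvancedPercent;
  // Assumption: with no prior sub-split there is nothing to rescale, so the sub-variables stay 0.
  const ratio = subTotal > 0 ? early / subTotal : 0;
  return {
    earlyStagePercent: early,
    earlyStageLocalizedPercent: prev.earlyStageLocalizedPercent * ratio,
    earlyStageLocallyAdvancedPercent: prev.earlyStageLocallyAdvancedPercent * ratio,
  };
}
```

Because both sub-variables are multiplied by the same ratio, their Localized:LocallyAdvanced proportion is preserved. Only their total tracks the new Early value.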
Preferred Pattern — OPC-Style Fold
When Localized + Locally Advanced sub-breakdowns exist, fold Unknown into `earlyStageLocallyAdvancedPercent` at audit time.
Example — Endometrial (SEER: Localized 67, Regional 18, Met 11, Unknown 4):
- `earlyStageLocalizedPercent = 67`
- `earlyStageLocallyAdvancedPercent = 22` (18 + 4 folded)
- `earlyStagePercent = 89` (67 + 22)
- `metStagePercent = 11`
- `unknownStagePercent`: not populated
Why Locally Advanced? Unstaged patients behave more like locally advanced disease (which is harder to stage cleanly) than like localized disease. The fold is applied uniformly across the OPC, HNSCC, SQNSCLC, CCA, and ESCC audits.
Note on displayed range: The default range label is III–IVB. For lung and breast cancer indications, the UI displays III–IIIB instead, because IVB is treated as metastatic (not locally advanced) in those staging conventions. The override list lives in src/features/reports/forecasting/utils/locallyAdvancedRange.ts.
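The OPC-style fold can be sketched as a pure mapping from source stage percentages to the encoded variables. The `SourceStageMix` field names and `opcFold` are hypothetical. The arithmetic follows the Endometrial example above, where Regional + Unknown become Locally Advanced.

```typescript
// Source percentages as reported (e.g., SEER Localized / Regional / Met / Unknown).
interface SourceStageMix {
  localized: number;
  regional: number;
  met: number;
  unknown: number;
}

interface EncodedStageMix {
  earlyStageLocalizedPercent: number;
  earlyStageLocallyAdvancedPercent: number;
  earlyStagePercent: number;
  metStagePercent: number;
  unknownStagePercent?: number; // left unset: the fold absorbs Unknown
}

function opcFold(src: SourceStageMix): EncodedStageMix {
  // Unknown joins Regional in Locally Advanced, never Met.
  const locallyAdvanced = src.regional + src.unknown;
  return {
    earlyStageLocalizedPercent: src.localized,
    earlyStageLocallyAdvancedPercent: locallyAdvanced,
    earlyStagePercent: src.localized + locallyAdvanced,
    metStagePercent: src.met,
  };
}
```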
Fallback Patterns
- No sub-breakdown in source → fold Unknown into `earlyStagePercent` directly; the sub-variables and `unknownStagePercent` stay empty.
- Unknown >10–15% of incidence → populate `unknownStagePercent` explicitly; folding would distort the sub-ratios.
- Pre-convention audit being re-audited without fresh source data → keep `unknownStagePercent` explicit; flag for follow-up.
- Incompatible staging (prostate Biochemical Recurrent, sarcoma grade, neuroendocrine grade, etc.) → remap to existing slots with a clinical rationale, OR use `unknownStagePercent` as an escape hatch, OR flag for data-model extension. Don't force-fit.
Decision Rule for New Audits
- Clean Early vs Met split? No → consider `type: Hematology` or flag for design review.
- Source reports Unknown? No → encode Early + Met; done.
- Localized + LocallyAdvanced breakdown available? Yes → OPC fold.
- No sub-breakdown and Unknown small? → fold into `earlyStagePercent`.
- Unknown substantial OR staging incompatible? → populate `unknownStagePercent` explicitly and document it in the audit markdown.
Future Extensions
Grade-based diseases, diseases with 3+ Early sub-stages, and molecular-subtype-driven models may need data-model extensions. Extend this section with a concrete worked example when such a disease is actually audited — don’t pre-encode hypotheticals.
4. Biomarker Parameters
Optional. Applied to the first line only.
| Variable | Default | Description |
|---|---|---|
| `activeBiomarker` | `null` | Which biomarker definition is active. |
| `biomarkerPercentage` | From reference data | % of patients with this biomarker. |
| `biomarkerTestingRate` | 100 (%) | % of patients tested for this biomarker. |
`biomarkerFactor = (percentage / 100) × (testingRate / 100)`
When no biomarker is active, `biomarkerFactor = 1.0`.
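Here is a direct transcription of the biomarker formula. It assumes face-value (0–100) inputs, as stored in reference data. The `Biomarker` interface is a hypothetical shape.

```typescript
interface Biomarker {
  percentage: number;  // biomarkerPercentage, face-value %
  testingRate: number; // biomarkerTestingRate, face-value %
}

// biomarkerFactor = (percentage / 100) × (testingRate / 100); 1.0 when no biomarker is active.
function biomarkerFactor(active: Biomarker | null): number {
  if (active === null) return 1.0;
  return (active.percentage / 100) * (active.testingRate / 100);
}
```

With the HPV inputs from the Oropharyngeal example (68% prevalence, 93% testing), this gives about 0.6324.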
5. Line-Level Parameters
Each therapy line (reference or custom) carries these variables.
Built-In Variables (per line)
| Variable | Default | Description |
|---|---|---|
| `transitionRate` | 100 (%) | % of patients progressing from the previous line. Line 1 does not use this. |
| `treatmentRate` | 100 (%) | Line-level treatment rate. Applied at the funnel entry of each stage group: the early line and the first metastatic line for solid tumors, and 1L for hematology. Stored on every line but skipped in compute (factor = 1) for non-applicable lines. Custom lines participate in their effective stage group (`stageCategory ?? "therapy"`). Gated by `isDTRApplicable()` in `src/features/forecasting/math/forecasting.ts`. |
Healthcare Access is not a per-line variable. It is a scenario-level value applied once at the top of the patient flow (see Section 2). To model line-specific access penalties, add a Custom Variable on the affected line.
Neoadjuvant/Adjuvant (Early-Stage Solid Tumor Only)
| Variable | Default | Description |
|---|---|---|
| `neoAdjuvantEnabled` | `false` | Gate for the neo/adj multiplier. When disabled, factor = 1. |
| `neoAdjSetting` | `"neo"` | Display label (`"neo"` / `"adj"` / `"other"`). Does not affect computation. |
| `neoAdjPercent` | 100 (%) | When enabled, factor = `neoAdjPercent / 100`. |
Custom Variables (User-Created)
Up to 5 per line. Each acts as a multiplicative percentage filter:
`customMultiplier = variables.reduce((acc, v) => acc * (v.value / 100), 1)`
Time-Varying (Changeable) Variables
Any variable with `changeable: true` and `startYear` set evolves over time:
- if `year < startYear`: `baseValue`
- else: `min(baseValue × (1 + min / 100)^(year - startYear), max)`
Note: `min` is the annual growth rate (%); `max` is the ceiling value.
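Below is a minimal sketch of the two per-line mechanics above: the custom-variable multiplier and the changeable-variable schedule. It uses this section's field names (`baseValue`, `startYear`, `min`, `max`). The `ChangeableVariable` interface and the function names are assumptions.

```typescript
interface ChangeableVariable {
  baseValue: number;
  startYear: number;
  min: number; // annual growth rate in %, per the note above
  max: number; // ceiling value
}

// Before startYear the base value holds; afterwards it compounds at `min`% per year, capped at `max`.
function valueInYear(v: ChangeableVariable, year: number): number {
  if (year < v.startYear) return v.baseValue;
  return Math.min(v.baseValue * Math.pow(1 + v.min / 100, year - v.startYear), v.max);
}

// Custom variables multiply together as percentage filters.
// The UI allows up to 5 per line; this sketch does not enforce that limit.
function customMultiplier(values: number[]): number {
  return values.reduce((acc, value) => acc * (value / 100), 1);
}
```

For example, two custom filters of 80% and 50% give a combined multiplier of 0.4. An empty list leaves the line unchanged (multiplier 1).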
Line Categorization
| Category | Used By | Pool Source |
|---|---|---|
| `"early"` | Solid tumor early-stage lines | `earlyAddressable` |
| `"metastatic"` | Solid tumor met-stage lines | `metAddressable` |
| `"therapy"` | All hematology lines | `totalAddressable` |
| `"custom"` | User-created lines | Based on `stageCategory`, or `totalAddressable` if unset |
Retreatment Option
| Variable | Default | Description |
|---|---|---|
| `retreatmentOption` | 1 | 1 = subtract captured patients from the pool before cascading; 0 = do not subtract. UI mapping: a stored 1 renders as Switch OFF (“not eligible at subsequent lines”); a stored 0 renders as Switch ON (“eligible at subsequent lines”). |
6. Market & Commercial Parameters
Per-line parameters driving market share and revenue.
| Variable | Default | Description |
|---|---|---|
| `launch` | `"{selectedYear}-08-01"` | Drug launch date, given as the first day of the stated month (e.g., 2025-08-01 means Aug 1, 2025; that calendar year contributes 5 months of revenue, Aug–Dec). Can be in the past (already-launched products) or the future. The year offset indexes into the uptake curve; the month drives the partial-year WeightedShare formula. |
| `launchOrder` | 1 | Position among competitors (1–10). Indexes into the `LAUNCH_ORDERS` lookup. |
| `bestInClass` | `false` | Adds the `BEST_IN_CLASS` bonus to peak share. |
| `delayVsCompetition` | 0 | Quarters behind the competition. If > 3, a penalty of `delay × 0.5` applies. |
| `classShare` | 100 (%) | Share of patients suitable for this therapy class. |
| `peakShare` | Computed | `baseShare + bicBonus - penalty`, clamped to 0–100. |
| `customEffectivePeakShare` | `null` | User override that bypasses `peakShare × classShare / 100`. |
| `speedToPeak` | `"3 Year Slow"` | Uptake curve (34 options: 1–12 years × Slow/Medium/Fast). |
| `monthsOfTherapy` | 12 | Total time a typical patient stays on the drug (not protocol length, not treatment-cycle length). Affects the cohort-based revenue spread; see Sales Calculations in FORECASTING_MODEL.md. |
| `compliance` | 85 (%) | Patient compliance. Affects sales only, not patient flow. |
| `events` | 1 empty event | Market events, each with `impactPercent` and `startDate`. |
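The share formulas above (and Step 8 of Section 8) can be sketched as shown below. `LAUNCH_ORDERS`, `BEST_IN_CLASS`, the uptake curves, and the erosion curves are not reproduced in this document, so their values enter as plain inputs. Treating `uptake` as a 0–100 percentage of peak and `erosion` as a 0–1 factor is an assumption. All function names are hypothetical.

```typescript
const clamp = (lo: number, hi: number, x: number): number => Math.min(hi, Math.max(lo, x));

interface ShareInputs {
  baseShare: number;          // from LAUNCH_ORDERS by launchOrder (table not reproduced here)
  bestInClass: boolean;
  bestInClassBonus: number;   // BEST_IN_CLASS value, supplied by the caller
  delayVsCompetition: number; // quarters behind the competition
  classShare: number;         // face-value %
  customEffectivePeakShare: number | null;
}

function peakShare(s: ShareInputs): number {
  const bonus = s.bestInClass ? s.bestInClassBonus : 0;
  const penalty = s.delayVsCompetition > 3 ? s.delayVsCompetition * 0.5 : 0;
  return clamp(0, 100, s.baseShare + bonus - penalty);
}

// A user override bypasses peakShare × classShare / 100 entirely.
function effectivePeakShare(s: ShareInputs): number {
  return s.customEffectivePeakShare ?? (peakShare(s) * s.classShare) / 100;
}

// uptake: % of peak reached this year (0–100); erosion: post-LOE factor (0–1).
function marketShare(s: ShareInputs, uptake: number, erosion: number): number {
  return clamp(0, 100, ((uptake * effectivePeakShare(s)) / 100) * erosion);
}

// Months of revenue in the launch year: a 2025-08-01 launch contributes Aug–Dec = 5.
function monthsInLaunchYear(launch: string): number {
  const month = Number(launch.slice(5, 7)); // "YYYY-MM-DD" → MM
  return 12 - month + 1;
}
```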
7. Assumption Parameters
Product economics, scoped to geography + indication.
| Variable | Default | Description |
|---|---|---|
| `yearOfFirstLaunch` | Current year | Base year for price calculations. |
| `launchPricePerMonth` | 6,000 | Monthly price at launch. |
| `annualNetPriceChange` | 2 (%) | Annual price escalation: `netPrice = launchPrice × (1 + change/100)^years`. |
| `marketExclusivityYears` | 12 | Years of market exclusivity from first launch. |
| `loeDate` | `firstLaunch + exclusivity` | Loss-of-exclusivity date. Erosion curves apply after this date. |
| `moleculeType` | `"Biologic"` | `"Biologic"` or `"Small Molecule"`. Selects the post-LOE erosion curve. |
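Here is a sketch of the price escalation and default LOE arithmetic. It assumes whole calendar years. Holding the price flat before `yearOfFirstLaunch` is an assumption, and the function names are hypothetical.

```typescript
// netPrice = launchPrice × (1 + change/100)^years, with years counted from yearOfFirstLaunch.
function netPricePerMonth(launchPricePerMonth: number, annualNetPriceChange: number,
                          yearOfFirstLaunch: number, year: number): number {
  const years = Math.max(0, year - yearOfFirstLaunch); // assumption: no escalation before launch
  return launchPricePerMonth * Math.pow(1 + annualNetPriceChange / 100, years);
}

// Default LOE = first launch + exclusivity (whole-year arithmetic assumed).
function loeYear(yearOfFirstLaunch: number, marketExclusivityYears: number): number {
  return yearOfFirstLaunch + marketExclusivityYears;
}
```

With the defaults (6,000 per month, 2% per year), two years after launch the monthly price is about 6,242.40. With 12 years of exclusivity from a 2025 launch, erosion starts after 2037.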
8. Computation Pipeline
Step 1 — Incidence Projection
`incidenceEvolution[year] = incidenceBase × (1 + growthRate / 100)^year`
Step 2 — Stage Split (Solid Tumor only)
- `knownTotal = earlyPct + metPct`
- `earlyIncidence = incidence × (earlyPct / knownTotal)`
- `metIncidence = incidence × (metPct / knownTotal)`
Unknown-stage patients are redistributed proportionally by this normalization (dividing by `knownTotal` instead of 100). Runtime normalization preserves the Unknown-subtracts-from-Early invariant enforced in the configuration UI and constraint solver (Section 3).
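Steps 1 and 2 can be sketched as two small functions, assuming face-value percentage inputs. The function names are hypothetical. `PROJECTION_DATA_LENGTH = 21` is taken from Section 2.

```typescript
const PROJECTION_DATA_LENGTH = 21; // 1 pre-launch year + 20 post-launch years

// Step 1: compound incidence at incidenceGrowth (face-value %) per year.
function projectIncidence(incidenceBase: number, growthRatePercent: number): number[] {
  return Array.from({ length: PROJECTION_DATA_LENGTH }, (_, n) =>
    incidenceBase * Math.pow(1 + growthRatePercent / 100, n));
}

// Step 2: divide by the known stages only, so Unknown is redistributed proportionally.
function splitStages(incidence: number, earlyPct: number, metPct: number): { early: number; met: number } {
  const knownTotal = earlyPct + metPct;
  if (knownTotal <= 0) throw new Error("earlyPct + metPct must be positive");
  return { early: incidence * (earlyPct / knownTotal), met: incidence * (metPct / knownTotal) };
}
```

With the Oropharyngeal inputs from Section 9, `splitStages(15004, 88.7, 11.2)` gives roughly 13,322 early and 1,682 metastatic patients. These match the walkthrough in 9.6.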
Step 3 — Relapse Augmentation (Solid Tumor only)
- `earlyAddressable = earlyIncidence + earlyIncidence × (earlyToEarlyRelapse / 100)`
- `metAddressable = metIncidence + earlyIncidence × (earlyToMetRelapse / 100)`
- `totalAddressable = earlyAddressable + metAddressable`
Step 4 — Pool Assignment
| Line Category | Addressable Pool |
|---|---|
| `"early"` | `earlyAddressable` |
| `"metastatic"` | `metAddressable` |
| `"therapy"` | `totalAddressable` |
Step 5 — First Line (1L) Eligible
- `eligible = addressable × biomarkerFactor × (healthcareAccess / 100) × neoAdjFactor × (treatmentRate / 100) × customMultiplier`
- `newPatients = eligible × (marketShare / 100)`
Step 6 — Second Line (2L)
`eligible = ((pool - prevNewPatients × retreatment) × transitionRate / 100) × (treatmentRate / 100) × customMultiplier`
`treatmentRate` is applied here as a special case.
Step 7 — Lines 3L+ (Cascade)
`eligible = ((pool - prevNewPatients × retreatment) × transitionRate / 100) × customMultiplier`
`treatmentRate` is not applied for lines 3L and beyond.
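Steps 3–7 can be chained into one sketch: relapse augmentation, pool selection, and the line cascade. It rests on three assumptions:

- `pool` for line n is line n−1's eligible count, as in the walkthroughs of Sections 9.6 and 10.5.
- Whether a line applies `treatmentRate` is passed in as a flag, because the app derives it via `isDTRApplicable()`, which is not reproduced here.
- The retreatment subtraction uses the option stored on the line where the patients were captured. This is an interpretation of the UI wording in Section 5.

All type and function names are hypothetical.

```typescript
type LineCategory = "early" | "metastatic" | "therapy";

interface Pools {
  earlyAddressable: number;
  metAddressable: number;
  totalAddressable: number;
}

// Step 3: both relapse flows are fractions of earlyIncidence (face-value % rates).
function augmentWithRelapse(earlyIncidence: number, metIncidence: number,
                            earlyToEarlyRelapse: number, earlyToMetRelapse: number): Pools {
  const earlyAddressable = earlyIncidence + earlyIncidence * (earlyToEarlyRelapse / 100);
  const metAddressable = metIncidence + earlyIncidence * (earlyToMetRelapse / 100);
  return { earlyAddressable, metAddressable, totalAddressable: earlyAddressable + metAddressable };
}

// Step 4: each category reads exactly one pool.
const POOL_BY_CATEGORY: Record<LineCategory, keyof Pools> = {
  early: "earlyAddressable",
  metastatic: "metAddressable",
  therapy: "totalAddressable",
};

// Custom lines without a stageCategory fall back to "therapy" (totalAddressable).
function poolFor(pools: Pools, category?: LineCategory): number {
  return pools[POOL_BY_CATEGORY[category ?? "therapy"]];
}

interface LineInput {
  transitionRate: number;      // % moving on from the previous line (ignored on the first line)
  treatmentRate: number;       // face-value %
  applyTreatmentRate: boolean; // assumption: caller supplies the isDTRApplicable() decision
  customMultiplier: number;    // product of custom-variable filters (1 when none)
  marketShare: number;         // % for the year being computed
  retreatmentOption: 0 | 1;    // 1 = captured patients are removed before the next line
}

interface FunnelEntryFactors {
  biomarkerFactor: number;     // 1.0 when no biomarker is active
  healthcareAccess: number;    // face-value %, scenario-level
  neoAdjFactor: number;        // 1 unless neo/adj is enabled
}

interface LineResult {
  eligible: number;
  newPatients: number;
}

// Steps 5-7: run the lines of one stage group in order.
function cascade(addressable: number, entry: FunnelEntryFactors, lines: LineInput[]): LineResult[] {
  const results: LineResult[] = [];
  lines.forEach((line, i) => {
    const treatment = line.applyTreatmentRate ? line.treatmentRate / 100 : 1;
    let eligible: number;
    if (i === 0) {
      // Step 5: funnel-entry factors are applied once, on the first line.
      eligible = addressable * entry.biomarkerFactor * (entry.healthcareAccess / 100) *
        entry.neoAdjFactor * treatment * line.customMultiplier;
    } else {
      // Steps 6-7: previous line's eligible, minus its captured patients when retreatment is excluded.
      const prev = results[i - 1];
      const remaining = prev.eligible - prev.newPatients * lines[i - 1].retreatmentOption;
      eligible = remaining * (line.transitionRate / 100) * treatment * line.customMultiplier;
    }
    results.push({ eligible, newPatients: eligible * (line.marketShare / 100) });
  });
  return results;
}
```

Follow Steps 5–7 when setting `applyTreatmentRate`: on for the first two metastatic lines, off afterwards. The first result then reproduces the Met 1L figure from Section 9.6 (about 3,869).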
Step 8 — Market Share
- `effectivePeakShare = customEffectivePeakShare ?? (peakShare × classShare / 100)`
- `uptake = weightedUptake(speedToPeak, yearOffset, launchMonth)`
- `erosion = erosionFactor(moleculeType, yearsSinceLOE)`
- `marketShare = clamp(0, 100, uptake × effectivePeakShare / 100 × erosion)`
9. Worked Example: Oropharyngeal Cancer (Solid Tumor)
Disease ID: 5040 | Type: Solid Tumor | Geography: USA
9.1 Incidence
| Parameter | Value | Source |
|---|---|---|
| `incidenceBase` | 15,004 | WHO (gco.iarc.fr), 2025 |
| `incidenceGrowth` | 0.68% | WHO (gco.iarc.fr), 2022 |
EU5: incidenceBase = 30,200, growth = 0.28%.
9.2 Stage Mix
| Parameter | Value | Source |
|---|---|---|
| `earlyStagePercent` | 88.7% | SEER (n=7,248), 2021 |
| ↳ `earlyStageLocalizedPercent` | 15.8% | SEER |
| ↳ `earlyStageLocallyAdvancedPercent` | 72.9% | SEER |
| `metStagePercent` | 11.2% | SEER (n=7,248), 2021 |
9.3 Relapse Rates
| Parameter | USA | EU5 | Source |
|---|---|---|---|
| `earlyToEarlyRelapse` | 24.6% | 13.11% | Frontiers in Oncology / PMC |
| `earlyToMetRelapse` | 26.37% | 14.48% | Frontiers in Oncology / PMC |
9.4 Biomarker: HPV Status
| Parameter | Value | Source |
|---|---|---|
| `biomarkerPercentage` | 68% | Carlander AF et al. (n=42,024) |
| `biomarkerTestingRate` | 93% | Carlson KM et al. (n=64,845) |
| Computed `biomarkerFactor` | 0.6324 | 0.68 × 0.93 |
9.5 Therapy Lines
Early Stage:
| Line | transitionRate | treatmentRate | Category |
|---|---|---|---|
| Early Stage | 100% | 47.92% | early |
Metastatic (USA):
| Line | transitionRate | treatmentRate | Source |
|---|---|---|---|
| Met 1L | 100% | 78.4% | Cao C et al. (n=813) |
| Met 2L | 43.8% | 78.4% | Lee DY et al. (n=2,577) |
| Met 3L | 43.8% | 78.4% | Lee DY et al. |
| Met 4L | 43.8% | 78.4% | Lee DY et al. |
9.6 Patient Flow Walkthrough (Year 0, USA, no biomarker)
- Incidence: 15,004
- `knownTotal = 88.7 + 11.2 = 99.9`
Early funnel:
- `earlyIncidence = 15,004 × (88.7 / 99.9) = 13,322`
- `earlyRelapsed = 13,322 × 24.6% = 3,277`
- `earlyAddressable = 13,322 + 3,277 = 16,599`
- Early 1L eligible: `16,599 × 95% × 47.92% = 7,557`
- Early 1L eligible with the HPV biomarker: `16,599 × 0.6324 × 95% × 47.92% = 4,778`
Metastatic funnel:
- `metIncidence = 15,004 × (11.2 / 99.9) = 1,682`
- `metRelapsed = 13,322 × 26.37% = 3,513`
- `metAddressable = 1,682 + 3,513 = 5,195`
- Met 1L eligible: `5,195 × 95% × 78.4% = 3,869`
- Met 2L eligible: `(met1L_elig − met1L_new × retreatment) × 43.8% × 78.4%`
- Met 3L eligible: `(met2L_elig − met2L_new × retreatment) × 43.8%`
- Met 4L eligible: `(met3L_elig − met3L_new × retreatment) × 43.8%`
Note: Met 2L applies `treatmentRate` (78.4%) as the special 2L case; Met 3L+ do not.
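To check the arithmetic above, the walkthrough can be replayed in a few lines. The sketch rounds each intermediate value to a whole patient, which is an assumption about how the figures were produced. It covers only the no-biomarker values.

```typescript
// Inputs from Sections 9.1–9.5 (USA, Year 0, no biomarker).
const incidence = 15004;
const earlyPct = 88.7;
const metPct = 11.2;
const earlyToEarlyRelapse = 24.6;
const earlyToMetRelapse = 26.37;
const healthcareAccess = 95;
const earlyTreatmentRate = 47.92;
const met1LTreatmentRate = 78.4;

// Assumption: each displayed step is rounded to a whole patient before the next step.
const r = (x: number): number => Math.round(x);

const knownTotal = earlyPct + metPct;
const earlyIncidence = r(incidence * (earlyPct / knownTotal));
const earlyRelapsed = r(earlyIncidence * (earlyToEarlyRelapse / 100));
const earlyAddressable = earlyIncidence + earlyRelapsed;
const early1LEligible = r(earlyAddressable * (healthcareAccess / 100) * (earlyTreatmentRate / 100));

const metIncidence = r(incidence * (metPct / knownTotal));
const metRelapsed = r(earlyIncidence * (earlyToMetRelapse / 100)); // relapse flows come from the early pool
const metAddressable = metIncidence + metRelapsed;
const met1LEligible = r(metAddressable * (healthcareAccess / 100) * (met1LTreatmentRate / 100));

console.log({ earlyIncidence, earlyRelapsed, earlyAddressable, early1LEligible,
              metIncidence, metRelapsed, metAddressable, met1LEligible });
```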
9.7 Treatment Line Definitions
| Position | Name | Mapping |
|---|---|---|
| 0 | Definitive | — |
| 1 | Neoadjuvant | Neoadjuvant |
| 2 | Adjuvant | Adjuvant |
| 3 | First Line (1L) | 1L |
| 4 | Second Line and Beyond (2L+) | 2L+ |
10. Worked Example: Multiple Myeloma (Hematology)
Disease ID: 4841 | Type: Hematology | Geography: USA
10.1 Incidence
| Parameter | Value | Source |
|---|---|---|
| `incidenceBase` | 36,110 | SEER, 2025 |
| `incidenceGrowth` | 1.3023% | WHO GCO Cancer Tomorrow, 2022 |
| `healthcareAccess` | 95% | — |
Other geographies: EU5 = 29,618 (growth 0.9881%), Japan = 6,988 (growth 0.2744%).
10.2 Stage Mix
Not applicable; `HEMATOLOGY_STAGE_MIX` is applied automatically:
- `earlyStagePercent: 100` → all incidence enters the therapy pool
- `metStagePercent: 0` → no metastatic split
- `earlyToEarlyRelapse: 0` → no relapse augmentation
- `earlyToMetRelapse: 0`
10.3 Transplant Split
Per-line eligibility split for hematology indications (AML / Multiple Myeloma / DLBCL / Follicular Lymphoma / Hodgkin Lymphoma; case-insensitive allowlist match). Mirrors the Neoadjuvant/Adjuvant pattern: one CSV reference row plus per-line instance fields.
Reference data (CSV)
| `name` | Scope | Unit | Notes |
|---|---|---|---|
| `transplantEligible` | Indication-level (line empty) | face-value % of incidence | Disease-specific. MM guidance: 45 (CIBMTR Utilization 2023 / NCCN Insights 2025). Other diseases require their own audit-sourced value. |
Per-line instance fields (on UnifiedLineData, persisted as JSONB via PUT /api/forecasting/:id — schema-less):
| Field | Type | Default | Meaning |
|---|---|---|---|
| `transplantSplitEnabled` | bool | `false` | Per-line toggle. When `false`, factor = 1.0 (no narrowing). |
| `transplantSplitSetting` | `"eligible"` \| `"non-eligible"` \| `"other"` | `"eligible"` | Which cohort this line forecasts. |
| `transplantSplitPercent` | number (0–100) | setting-aware (see below) | The % applied. |
Setting-aware defaults. When the toggle flips ON, the percent auto-fills based on the setting and the disease-level `transplantEligible` reference value `E`:
- `"eligible"` → defaults to `E` (or 50 if no reference)
- `"non-eligible"` → defaults to `100 − E` (or 50 if no reference)
- `"other"` → defaults to 100 (no smart default)
The user may override the auto-filled percent at any time. The factor applied to addressable is transplantSplitPercent / 100 when enabled, else 1.
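The setting-aware default and the resulting factor can be sketched as follows. `E` is the disease-level `transplantEligible` reference value, passed as `null` when absent. The function names are hypothetical.

```typescript
type TransplantSetting = "eligible" | "non-eligible" | "other";

// Auto-fill used when the toggle flips ON; the user may override it afterwards.
function defaultTransplantPercent(setting: TransplantSetting, transplantEligible: number | null): number {
  if (setting === "eligible") return transplantEligible ?? 50;
  if (setting === "non-eligible") return transplantEligible === null ? 50 : 100 - transplantEligible;
  return 100; // "other": no smart default
}

interface TransplantFields {
  transplantSplitEnabled: boolean;
  transplantSplitPercent: number;
}

// Factor applied to addressable: percent / 100 when enabled, else 1.
function transplantFactor(line: TransplantFields): number {
  return line.transplantSplitEnabled ? line.transplantSplitPercent / 100 : 1;
}
```

For Multiple Myeloma (E = 45), a line set to `"non-eligible"` auto-fills 55. Until the user overrides the percent, that narrows the line's addressable pool by a factor of 0.55.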
UI. Inline block in LinePanel, gated on isTransplantAllowlisted(indication) && variable.id === "treatmentRate". Mirrors neoAdj layout: Switch + segmented control (Eligible | Non-eligible | Other) + percent row. The percent row hides the “Add to Monte Carlo” toggle (transplant variables are not exposed to MC/Tornado in this design).
No transplantReceiving, no transplantEnabled reference row, no factor-map atoms. The previous design’s per-line-receiving rows and 14-atom orchestration have been replaced by the per-line fields above. Disease audits should NOT include transplantReceiving rows.
10.4 Therapy Lines (USA)
| Line | transitionRate | treatmentRate | Source |
|---|---|---|---|
| 1st Line | 100% (implicit entry) | 90% (DTR / 1L uptake) | NCCN 2025; US RWE 2025 (per Supportive Data/Multiple Myeloma.docx) |
| 2nd Line | 75% | 100% | US RWE 2025 (PMC); ISPOR/Nexus 2023 |
| 3rd Line | 55% | 100% | US RWE 2025 (PMC); ISPOR/Nexus 2023 |
| 4th Line | 42.5% | 100% | US RWE 2025 (PMC); ISPOR/Nexus 2023 |
All lines have category `"therapy"`. The xlsx ASCO 2016 alternatives (54% / 47% / 42% for the 2L/3L/4L transitions, `is_guidance=FALSE`) are retained in `new-statistics.csv` for source-trail transparency.
10.5 Patient Flow Walkthrough (Year 0, USA)
- Incidence: 36,110
Single therapy funnel:
- `totalAddressable = 36,110` (no stage split, no relapse)
- 1L eligible: `36,110 × 95% × 90% = 30,874`
- 2L eligible: `(30,874 − 1L_new × retreatment) × 75%`
- 3L eligible: `(2L_elig − 2L_new × retreatment) × 55%`
- 4L eligible: `(3L_elig − 3L_new × retreatment) × 42.5%`
Note: 1L applies `treatmentRate` 90% as the MM DTR (1L uptake). 2L+ do not apply DTR: MM is hematology, so DTR is gated to the first therapy line only (`isDTRApplicable` in `src/features/forecasting/math/forecasting.ts`). The stored `treatmentRate` (100%) on 2L/3L/4L is unused by the computation.
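Here is a sketch of the single hematology funnel above. The 1L figure uses only audited inputs. The later lines need 1L–3L market shares, which this walkthrough leaves symbolic, so the 20% shares and `retreatment = 1` below are hypothetical placeholders.

```typescript
const incidence = 36110;              // totalAddressable: no stage split, no relapse
const healthcareAccess = 95;          // %
const dtr1L = 90;                     // 1L treatmentRate (DTR)
const transitions = [75, 55, 42.5];   // 2L, 3L, 4L transitionRate (%)

// Hypothetical placeholders, not audit values: 1L–3L market shares and retreatment = 1.
const shares = [20, 20, 20];
const retreatment = 1;

const eligible: number[] = [incidence * (healthcareAccess / 100) * (dtr1L / 100)];
for (let i = 0; i < transitions.length; i++) {
  const captured = eligible[i] * (shares[i] / 100);
  // 2L+ skip DTR: the stored 100% treatmentRate is not applied.
  eligible.push((eligible[i] - captured * retreatment) * (transitions[i] / 100));
}
console.log(eligible.map(Math.round));
```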
10.6 Treatment Line Definitions
| Position | Name | Mapping |
|---|---|---|
| 0 | Newly Diagnosed / First Line (1L) | 1L |
| 1 | Second Line (2L) | 2L |
| 2 | Third Line (3L) | 3L |
| 3 | Fourth Line (4L) | 4L |
| 4 | Fifth Line and Beyond (5L+) | 5L+ |
11. Multi-Source Reference Data & Guidance Cascade
Reference rows are no longer one-per-variable. Multiple rows may exist for the same {geo, indication, line, name} — one canonical guidance row plus zero or more alternatives. YEAR is metadata on each row, not part of the uniqueness key.
Data Model
Every reference row carries an isGuidance boolean (is_guidance in the API / CSV).
| Field | Type | Description |
|---|---|---|
| `isGuidance` | boolean | `true` marks the canonical row used by computations. The uniqueness invariant (“exactly one `is_guidance = TRUE` row per variable”) is enforced at the DB layer via a partial unique index on `(geo, indication, line, name) WHERE is_guidance = TRUE AND deleted_at IS NULL` in bioloupe-data-gov. |
| `year` | integer | Publication year metadata. Required on measurement rows (Thor auto-fills a blank YEAR on those with `Date.current.year`); optional on label/enum rows where a year has no meaning (`type`, `displayName`, `category`, `biomarkerName`; see `ForecastingStatistic::YEAR_OPTIONAL_NAMES` in `bioloupe-data-gov/app/models/forecasting_statistic.rb`). |
| `sources` | `SourceEntry[]` | Citation metadata attached to the row. Surfaces in InfoButton tooltips and the DOCX report. |
Backend contract: schema + selection cascade are enforced in the bioloupe-data-gov Rails backend (lib/tasks/forecasting_statistics.thor for imports; DB schema for invariants).
Zod validation: src/lib/schemas/statistics.ts (field is_guidance).
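A client-side check of the same uniqueness rule can catch conflicts before an import. This is a sketch with a hypothetical `GuidanceRow` shape, not the bioloupe-data-gov implementation. On its own, a partial unique index guarantees at most one live TRUE row per key; this check covers that half of the rule.

```typescript
interface GuidanceRow {
  geo: string;
  indication: string;
  line: string | null;
  name: string;
  isGuidance: boolean;
  deletedAt: string | null;
}

// Returns the keys that have more than one live is_guidance = TRUE row.
function findGuidanceConflicts(rows: GuidanceRow[]): string[] {
  const counts = new Map<string, number>();
  for (const r of rows) {
    if (!r.isGuidance || r.deletedAt !== null) continue; // only live canonical rows count
    const key = [r.geo, r.indication, r.line ?? "", r.name].join("|");
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Array.from(counts.entries()).filter(([, n]) => n > 1).map(([key]) => key);
}
```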
Selection Cascade
The frontend picks the row used for computation via `selectGuidance()` (at `src/features/reports/forecasting/utils/selectGuidance.ts`):
- Filter rows matching the requested `{geo, indication, line, name}`.
- Pick the single `isGuidance = true` row as guidance. If none exists, fall back to the newest-year row.
- Return the selected row plus the non-selected rows as `alternatives`, sorted newest-year-first.
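The three-step cascade can be sketched as a pure function. This is not the code in `selectGuidance.ts`. The `StatRow` shape and the name `selectGuidanceSketch` are assumptions, and sorting a missing `year` as oldest is an assumption too.

```typescript
interface StatRow {
  geo: string;
  indication: string;
  line: string | null;
  name: string;
  value: number;
  year: number | null;
  isGuidance: boolean;
}

type VariableKey = Pick<StatRow, "geo" | "indication" | "line" | "name">;

interface Selection {
  guidance: StatRow | null;
  alternatives: StatRow[];
}

// Newest year first; a missing year sorts last (assumption).
const newestFirst = (a: StatRow, b: StatRow): number => (b.year ?? 0) - (a.year ?? 0);

function selectGuidanceSketch(rows: StatRow[], key: VariableKey): Selection {
  // 1. Keep only rows for the requested variable.
  const matches = rows.filter(
    (r) => r.geo === key.geo && r.indication === key.indication && r.line === key.line && r.name === key.name,
  );
  const sorted = matches.slice().sort(newestFirst);
  // 2. Prefer the isGuidance row; otherwise fall back to the newest-year row.
  const guidance = sorted.find((r) => r.isGuidance) ?? sorted[0] ?? null;
  // 3. Everything else becomes an alternative, still newest-first.
  return { guidance, alternatives: sorted.filter((r) => r !== guidance) };
}
```

A newer non-guidance row never displaces the canonical one. It only appears among the alternatives, which matches the "Newer Source Arrives" rules below.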
useReferenceSource(name, lineId?) wraps this cascade and returns { value, sources, year, alternatives }. Most patient-flow InfoButtons read from this hook. Biomarker InfoButtons are the exception — extractBiomarkers (src/features/reports/forecasting/atoms.ts) applies the cascade internally on each BiomarkerDef, so the geo-fallback chain (geo → USA → EU5 → Japan) can resolve a sourceGeo before selectGuidance runs. Biomarker sources/alternatives reach InfoButton via BiomarkerDef.{percentageSource, percentageYear, percentageAlternatives, testingRateSource, testingRateYear, testingRateAlternatives}.
Alternatives Surfacing
Non-guidance rows appear in two places:
- `InfoButton` tooltips (`src/components/ui/info-button.tsx`): an “Alternative values” section lists each alternative row with its value, year, and source citations.
- DOCX report (`src/features/reports/forecasting/reporting/collectReportData.ts`): the report data collector routes each reference lookup through `selectGuidance` / `useReferenceSource`, so the exported document always shows the same value the UI computes from.
Newer Source Arrives
Adding a newer source is a decision, not a passive update:
- Promote the new source: set `is_guidance=TRUE` on the new row and demote the existing canonical to `is_guidance=FALSE` in the same commit (Thor and the DB enforce “exactly one TRUE per variable”).
- Keep the current canonical: add the new row with `is_guidance=FALSE`. YEAR is the new source's real publication year; no tuple alignment is required.
- The admin console's conflict modal enforces this choice automatically when you try to save a second TRUE row for the same variable.
Mechanism: selectGuidance() at src/features/reports/forecasting/utils/selectGuidance.ts returns the single isGuidance=true row as guidance and sorts the rest newest-year-first for display.
Design System Reference
See docs/DESIGN_SYSTEM.md:53,86-88 for UI-layer guidance on rendering alternatives in the InfoButton.
12. Data Pipeline & Import Format
At runtime, the forecasting app fetches statistics from the bioloupe-data-gov Rails backend. The API response is validated by Zod (src/lib/schemas/statistics.ts) and consumed via useStatistics() (src/hooks/useStatistics.ts). Division by 100 happens at computation time; parseValue() auto-detects types from string values where applicable.
CSVs (new-statistics.csv and per-disease imports) are the seed / import format used to populate the backend. The canonical column contract is defined by the Thor importer at bioloupe-data-gov/lib/tasks/forecasting_statistics.thor and the statistics table schema. The rules below apply to the CSV → backend import path.
Year default on import. If a row’s year column is blank, the Thor importer auto-fills it with the current calendar year (Date.current.year). Explicit years are preferred — blank is only acceptable for active-year data.
All rates and percentages are stored as face-value numbers (0–100 scale), not decimals (0–1).
| Variable | Format | Example | Wrong |
|---|---|---|---|
| `incidenceBase` | Absolute count | 154270 | — |
| `incidenceGrowth` | Percentage | 0.68 = 0.68%/yr | 0.0068 (100× too small) |
| `transitionRate` | Percentage | 43.8 = 43.8% | 0.438 |
| `treatmentRate` | Percentage | 78.4 = 78.4% | 0.784 |
| `healthcareAccess` | Percentage | 95 = 95% | 0.95 |
| `earlyStagePercent` | Percentage | 77 = 77% | 0.77 |
| `biomarkerPercentage` | Percentage | 68 = 68% | 0.68 |
| `transplantEligible` | Percentage | 50 = 50% of cohort | 0.5 |
| `popBase` | Millions | 344.1 = 344.1M | — |
| `displayName` | String | `"Metastatic Line 1"` | — |
| `type` | Enum (case-sensitive) | `"Solid Tumor"` or `"Hematology"` | `"solid tumor"` |
| `is_guidance` | Boolean | `true` (canonical) / `false` (alternative); defaults to `true` | — |
| `year` | Integer | 2025 (publication year metadata; blank is allowed only on convention rows; on measurement rows, Thor auto-fills `Date.current.year`) | — |
Historical note: A previous `statistics_corrected.csv` divided growth rates by 100 (storing `0.0051` instead of `0.51`), which caused near-zero growth in the forecasting model. That file has been deleted. The current `new-statistics.csv`, used for backend seeding only, uses face-value percentages.
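A hypothetical pre-import lint for the face-value convention is sketched below. The field lists mirror the table above. The thresholds are heuristics chosen for illustration: a rate at or below 1, or a growth rate below 0.05 in magnitude, is flagged for review rather than rejected, since genuinely small values do occur.

```typescript
const PERCENT_FIELDS = new Set<string>([
  "transitionRate", "treatmentRate", "healthcareAccess",
  "earlyStagePercent", "biomarkerPercentage", "transplantEligible",
]);
const GROWTH_FIELDS = new Set<string>(["incidenceGrowth", "popGrowth"]);

// Returns a warning for a suspicious value, or null when the row looks consistent.
function lintFaceValue(name: string, value: number): string | null {
  if (PERCENT_FIELDS.has(name)) {
    if (value < 0 || value > 100) return `${name}=${value} is outside the 0-100 range`;
    // Heuristic: 0.438 is far more likely a decimal than a real 0.438% rate.
    if (value > 0 && value <= 1) return `${name}=${value} looks like a decimal; expected a face-value %`;
  }
  if (GROWTH_FIELDS.has(name) && value !== 0 && Math.abs(value) < 0.05) {
    // Heuristic: mirrors the historical 0.0051-instead-of-0.51 error.
    return `${name}=${value} may be 100x too small`;
  }
  return null;
}
```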
Quick Reference: Parameter Scope Matrix
| Parameter | Pop-Level | Line-Level | Solid Tumor | Hematology |
|---|---|---|---|---|
| `incidenceBase` | yes | — | yes | yes |
| `incidenceGrowth` | yes | — | yes | yes |
| `earlyStagePercent` | yes | — | yes | hardcoded 100 |
| `metStagePercent` | yes | — | yes | hardcoded 0 |
| `unknownStagePercent` | yes | — | yes (child of Early) | hardcoded 0 |
| `earlyToEarlyRelapse` | yes | — | yes | hardcoded 0 |
| `earlyToMetRelapse` | yes | — | yes | hardcoded 0 |
| `healthcareAccess` | yes | — | yes | yes |
| `treatmentRate` | default | override (1L, 2L only) | yes | yes |
| `transitionRate` | — | yes | yes | yes |
| `biomarkerPercentage` | yes | — | yes | yes |
| `transplantEligible` | yes | — | — | yes (AML/MM/DLBCL/FL/HL) |
| `retreatmentOption` | yes | — | yes | yes |
| `neoAdjuvantEnabled` | — | yes | early only | — |
| `customVariables` | — | yes | yes | yes |
| Market params | — | yes | yes | yes |
| Assumption params | yes | — | yes | yes |
| Market params | — | yes | yes | yes |
| Assumption params | yes | — | yes | yes |