Publication Issues Tracker 2
Publication Issues Tracker (continued)
Continuation of docs/publication_issues_tracker.md. New issues are logged here to keep file sizes manageable.
Last updated: 2026-04-06
Issue index
| # | Title | Short description | Status |
|---|---|---|---|
| 52 | “All Arms” view join fans out across all trial_arm_interventions, duplicating dose | View v23 lines 474-475 match every drug_intervention to “All Arms” outcomes regardless of which arm the intervention belongs to. This creates duplicate rows with conflicting dose profiles (RP2D vs escalation range). ~2,776 pubs, ~15k row groups. | Investigation complete |
Each issue entry should keep analysis and remediation separate.
Recommended issue structure:
- Short summary
- Where this sits in the current pipeline
- Exact restriction causing the drop
- Concrete examples
- Downstream impact
- What the issue is not
- Scale
- Spot checks
- Open characterization questions
- Explored solution direction
- Solution applied
Solution applied should remain empty until an actual fix is agreed and implemented.
Backfill pattern: When an issue requires backfilling historical data, see the “One-Off Backfill Tasks” section in .claude/skills/backend-expert/SKILL.md.
52. “All Arms” view join fans out across all trial_arm_interventions, duplicating dose
Short summary
View v23’s drug join for “All Arms” outcomes uses a broad match (aoe.arm_name = 'All Arms') that matches every trial_arm_intervention for the publication, not just the relevant one. When a publication has multiple dose-level arms (e.g., dose escalation with range 4.8-16.0 mg/kg AND dose expansion with single_dose 12.0 mg/kg), each “All Arms” outcome row gets duplicated with conflicting dose profiles. The query layer then takes first_row, making the dose shown effectively non-deterministic.
Where this sits in the current pipeline
View layer: db/views/vw_publication_efficacy_data_v23.sql, lines 470-475 (drug join for Source 0 “All Arms”):

    -- Source 0 (trial_arm_interventions): direct trial_arm_id match
    (di.trial_arm_id IS NOT NULL AND aoe.trial_arm_id IS NOT NULL AND di.trial_arm_id = aoe.trial_arm_id)
    -- Source 0 for "All Arms" outcomes (trial_arm_id set but no interventions on that arm)
    OR (di.trial_arm_id IS NOT NULL AND aoe.trial_arm_id IS NOT NULL AND aoe.arm_name = 'All Arms')

Query layer: app/queries/tpp/clinical_evidence_query.rb, line 386, where build_single_row takes dose from first_row:

    single_dose: first_row['single_dose'],
    dose_min: first_row['dose_min'],
    dose_max: first_row['dose_max'],
    rp2d: first_row['rp2d'],

Exact restriction causing the drop
The aoe.arm_name = 'All Arms' clause is a catch-all: it matches any drug_intervention row that has a trial_arm_id set, regardless of which arm the intervention belongs to. When multiple arms have interventions with different dose profiles, the fan-out creates N duplicate rows per outcome (one per intervention).
The arm_dose_lookup CTE then joins on trial_arm_intervention_id, pulling each intervention’s distinct dose. The deduped_rows CTE (SELECT DISTINCT *) doesn’t collapse these because the dose columns differ.
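The fan-out can be reproduced outside the database. The following Ruby sketch is illustrative only: it models the join condition quoted above on in-memory hashes, folds the arm_dose_lookup step into the join, and uses Array#uniq as a stand-in for SELECT DISTINCT *. The TAI ids, arm 25400 and dose values come from the Pub 74158 example in this entry; the constant and method names, the outcome label, and arm ids 1 and 2 are assumptions made for the sketch.

```ruby
# Illustrative stand-ins for the view's inputs. TAI ids, arm 25400 and dose
# values follow the Pub 74158 example; arm ids 1 and 2 and the outcome label
# are made up for this sketch.
FANOUT_INTERVENTIONS = [
  { tai_id: 33097, trial_arm_id: 1, dose_min: 4.8, dose_max: 16.0, single_dose: nil },
  { tai_id: 33098, trial_arm_id: 2, dose_min: nil, dose_max: nil, single_dose: 12.0 }
].freeze

FANOUT_OUTCOMES = [
  { outcome: 'sqNSCLC outcome', trial_arm_id: 25400, arm_name: 'All Arms' }
].freeze

# Mirrors the two OR branches of the drug join: a direct arm match, or any
# intervention with an arm id when the outcome sits on "All Arms".
def drug_join_match?(di, aoe)
  return false if di[:trial_arm_id].nil? || aoe[:trial_arm_id].nil?

  di[:trial_arm_id] == aoe[:trial_arm_id] || aoe[:arm_name] == 'All Arms'
end

# Joins every outcome to every matching intervention and copies its dose columns.
def fan_out_rows(outcomes, interventions)
  outcomes.flat_map do |aoe|
    interventions
      .select { |di| drug_join_match?(di, aoe) }
      .map { |di| aoe.merge(di.slice(:dose_min, :dose_max, :single_dose)) }
  end
end

# Array#uniq plays the role of SELECT DISTINCT *: rows that differ only in
# their dose columns are still distinct, so both survive.
FANOUT_ROWS = fan_out_rows(FANOUT_OUTCOMES, FANOUT_INTERVENTIONS).uniq
puts FANOUT_ROWS.length # prints 2
```

One outcome row becomes two, one per intervention, which is the N-rows-per-outcome pattern described above.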
Concrete examples
Pub 74158 (DS-7300 / Ifinatamab deruxtecan, B7-H3 ADC phase I/II extended follow-up):
- Two trial_arm_interventions:
  - TAI 33097 on arm “Dose escalation (4.8-16.0 mg/kg)”: dose_min=4.8, dose_max=16.0, dose_context_type=escalation
  - TAI 33098 on arm “Dose expansion (12.0 mg/kg)”: single_dose=12.0, rp2d=12.0, dose_context_type=rp2d
- sqNSCLC outcomes are on arm “All Arms” (arm 25400), which has NO interventions
- View produces duplicate rows: some with single_dose=12.0 mg/kg, others with dose_min=4.8 / dose_max=16.0
- Abstract explicitly states efficacy is pooled across 4.8-16.0 mg/kg cohorts, so the range is correct, but the query may show 12.0 mg/kg (RP2D) depending on sort order
- Audit issue 8512 flagged single_dose=12.0 mg/kg as incorrect for sqNSCLC
Downstream impact
- Clinical Evidence report shows a non-deterministic dose for “All Arms” rows: whichever duplicate sorts first wins
- The RP2D (single_dose) may be shown when the efficacy population spans the full dose range, misrepresenting the study population
- Row duplication in the view inflates materialized view size and may cause subtle metric selection issues in extract_efficacy_metrics
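The order dependence is easy to see in isolation. The sketch below is a hypothetical reduction of build_single_row to the four dose fields it reads from first_row; dose_from_first_row and the two sample rows are assumptions for illustration, with dose values taken from Pub 74158.

```ruby
# Hypothetical reduction of build_single_row: only the dose fields, all read
# from whichever row happens to be first.
def dose_from_first_row(rows)
  first_row = rows.first
  {
    single_dose: first_row['single_dose'],
    dose_min: first_row['dose_min'],
    dose_max: first_row['dose_max'],
    rp2d: first_row['rp2d']
  }
end

# The two duplicates the view emits for one "All Arms" outcome.
RANGE_ROW = { 'single_dose' => nil, 'dose_min' => 4.8, 'dose_max' => 16.0, 'rp2d' => nil }.freeze
RP2D_ROW  = { 'single_dose' => 12.0, 'dose_min' => nil, 'dose_max' => nil, 'rp2d' => 12.0 }.freeze

# Same rows, different order, different dose reported.
puts dose_from_first_row([RANGE_ROW, RP2D_ROW]).inspect
puts dose_from_first_row([RP2D_ROW, RANGE_ROW]).inspect
```

Unless the upstream query imposes a stable ordering on the duplicates, either result can reach the report.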
What the issue is not
- Not a data extraction issue: the LLM correctly identified both dose arms
- Not an arm-linking issue (Issue 49): the “All Arms” arm intentionally has no interventions
- Not the same as Issue 31 (dose bleed onto control arms via COALESCE): this is a JOIN fan-out, not a fallback chain
- Not the same as Issue 51 (per-arm dose not populated): here both arms have a correct dose, but both are matched to outcomes they don’t belong to
Scale

| Metric | Count |
|---|---|
| Pubs with “All Arms” outcomes + conflicting dose profiles across different arms (single/rp2d vs range) | 715 |
| Pubs where the view produces duplicate rows with different dose variants for “All Arms” outcomes | 2,776 |
| Total duplicate row groups in the view | 15,025 |
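For readers who want an operational definition of a “duplicate row group”, the sketch below counts groups of rows that share an outcome key but disagree on at least one dose column. It is a sketch under assumptions, not the query used to produce the counts above: the key columns publication_id and outcome_id, the method names, and the sample rows (including publication id 1) are hypothetical.

```ruby
# Assumed grouping key; the view's real key columns are not shown in this tracker.
SCALE_GROUP_KEYS = %w[publication_id outcome_id].freeze
SCALE_DOSE_COLUMNS = %w[single_dose dose_min dose_max rp2d].freeze

# Returns the groups whose rows disagree on at least one dose column.
def conflicting_dose_groups(rows)
  rows
    .group_by { |row| row.values_at(*SCALE_GROUP_KEYS) }
    .select { |_key, group| group.map { |row| row.values_at(*SCALE_DOSE_COLUMNS) }.uniq.size > 1 }
end

# Summarises how many publications and row groups are affected.
def duplicate_scale(rows)
  groups = conflicting_dose_groups(rows)
  { pubs: groups.keys.map(&:first).uniq.size, row_groups: groups.size }
end

rp2d  = { 'single_dose' => 12.0, 'dose_min' => nil, 'dose_max' => nil, 'rp2d' => 12.0 }
range = { 'single_dose' => nil, 'dose_min' => 4.8, 'dose_max' => 16.0, 'rp2d' => nil }

# Toy rows: two conflicting outcomes on one publication, one clean outcome on another.
SCALE_SAMPLE_ROWS = [
  rp2d.merge('publication_id' => 74158, 'outcome_id' => 1),
  range.merge('publication_id' => 74158, 'outcome_id' => 1),
  rp2d.merge('publication_id' => 74158, 'outcome_id' => 2),
  range.merge('publication_id' => 74158, 'outcome_id' => 2),
  rp2d.merge('publication_id' => 1, 'outcome_id' => 3)
].freeze

puts duplicate_scale(SCALE_SAMPLE_ROWS).inspect
```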
Spot checks
- Pub 74158: Confirmed. sqNSCLC rows are duplicated with RP2D (12.0) and range (4.8-16.0) dose profiles. The abstract pools all cohorts.
Open characterization questions
- How many of the 2,776 pubs have the “wrong” dose surfaced by the query (i.e., first_row picks RP2D when range would be more accurate)?
- Should “All Arms” outcomes inherit dose from the broadest-range intervention, or should dose be null/aggregated?
- Does dose_context_type (escalation vs rp2d) provide enough signal to pick the right intervention?
Explored solution direction
Option A (view-layer fix): When aoe.arm_name = 'All Arms', prefer the intervention with dose_context_type = 'escalation' (or the one with a dose_min/dose_max range) over single_dose/rp2d. This could use a ROW_NUMBER() window with a priority ordering, keeping only the top-ranked row per outcome.
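To make the priority ordering in Option A concrete, here is a minimal Ruby analogue of keeping ROW_NUMBER() = 1 per outcome. It is a sketch, not the proposed view SQL: the priority values, the outcome_id partition key, the tie-break on trial_arm_intervention_id, and the method names are all assumptions.

```ruby
# Lower value wins. The ordering is an assumption, not an agreed rule:
# escalation context first, then any row with a full range, then the rest.
def dose_priority(row)
  return 0 if row['dose_context_type'] == 'escalation'
  return 1 if row['dose_min'] && row['dose_max']

  2
end

# Ruby analogue of keeping ROW_NUMBER() = 1 per partition for "All Arms"
# rows; rows on other arms pass through untouched.
def pick_all_arms_rows(rows, partition_key: 'outcome_id')
  all_arms, others = rows.partition { |row| row['arm_name'] == 'All Arms' }
  picked = all_arms
           .group_by { |row| row[partition_key] }
           .map do |_key, group|
             # The intervention id is a tie-breaker so the choice is deterministic.
             group.min_by { |row| [dose_priority(row), row['trial_arm_intervention_id'].to_i] }
           end
  others + picked
end

OPTION_A_ROWS = [
  { 'outcome_id' => 1, 'arm_name' => 'All Arms', 'trial_arm_intervention_id' => 33098,
    'dose_context_type' => 'rp2d', 'single_dose' => 12.0, 'dose_min' => nil, 'dose_max' => nil },
  { 'outcome_id' => 1, 'arm_name' => 'All Arms', 'trial_arm_intervention_id' => 33097,
    'dose_context_type' => 'escalation', 'single_dose' => nil, 'dose_min' => 4.8, 'dose_max' => 16.0 }
].freeze

puts pick_all_arms_rows(OPTION_A_ROWS).map { |row| row['trial_arm_intervention_id'] }.inspect # prints [33097]
```

The explicit tie-breaker matters as much as the priority itself: without it, two rows with equal priority would bring back the order dependence.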
Option B (query-layer fix): In build_single_row, when multiple view rows exist for the same outcome with different dose profiles, prefer the range (dose_min/dose_max) over single_dose for “All Arms” rows, since pooled data is better described by the range.
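A possible shape for Option B is sketched below. It assumes the rows passed to build_single_row carry arm_name alongside the dose columns; dose_source_row, range_dose? and dose_fields are hypothetical helpers, not existing methods in clinical_evidence_query.rb.

```ruby
# True when the row describes a dose range rather than a single dose.
def range_dose?(row)
  !row['dose_min'].nil? && !row['dose_max'].nil?
end

# Hypothetical helper: picks the row that supplies the dose fields. For
# "All Arms" outcomes it prefers a range row; otherwise it keeps first_row.
def dose_source_row(rows)
  first_row = rows.first
  return first_row unless first_row['arm_name'] == 'All Arms'

  rows.find { |row| range_dose?(row) } || first_row
end

# Builds the same four dose fields that build_single_row reads today.
def dose_fields(rows)
  row = dose_source_row(rows)
  {
    single_dose: row['single_dose'],
    dose_min: row['dose_min'],
    dose_max: row['dose_max'],
    rp2d: row['rp2d']
  }
end

OPTION_B_ROWS = [
  { 'arm_name' => 'All Arms', 'single_dose' => 12.0, 'dose_min' => nil, 'dose_max' => nil, 'rp2d' => 12.0 },
  { 'arm_name' => 'All Arms', 'single_dose' => nil, 'dose_min' => 4.8, 'dose_max' => 16.0, 'rp2d' => nil }
].freeze

puts dose_fields(OPTION_B_ROWS).inspect
```

With the sample rows the range is returned even though the RP2D row sorts first. The duplicates still exist in the view, which is the limitation of Option B noted in this section.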
Option C (data-model fix): Link “All Arms” outcomes to all relevant arms explicitly (via trial_arm_id or a new junction), then aggregate dose at the query level.
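One way the query-level aggregation in Option C could look is sketched below. The aggregation rule (lowest and highest dose across the linked arms), the junction shape, and all names are assumptions for illustration; the tracker does not specify how dose would be aggregated.

```ruby
# Combines the doses of all linked arms into one range. A single_dose arm
# contributes its value to both ends of the range.
def aggregate_dose(arm_doses)
  lows  = arm_doses.map { |dose| dose[:dose_min] || dose[:single_dose] }.compact
  highs = arm_doses.map { |dose| dose[:dose_max] || dose[:single_dose] }.compact
  return { dose_min: nil, dose_max: nil } if lows.empty? || highs.empty?

  { dose_min: lows.min, dose_max: highs.max }
end

# junction: hypothetical mapping of outcome id => ids of the arms pooled into it.
def pooled_dose_for(outcome_id, junction, doses_by_arm)
  arm_ids = junction.fetch(outcome_id, [])
  aggregate_dose(arm_ids.map { |arm_id| doses_by_arm.fetch(arm_id) })
end

# Made-up ids; dose values follow the Pub 74158 example.
OPTION_C_JUNCTION = { 'sqnsclc_outcome' => %w[escalation_arm expansion_arm] }.freeze
OPTION_C_DOSES_BY_ARM = {
  'escalation_arm' => { dose_min: 4.8, dose_max: 16.0, single_dose: nil },
  'expansion_arm' => { dose_min: nil, dose_max: nil, single_dose: 12.0 }
}.freeze

puts pooled_dose_for('sqnsclc_outcome', OPTION_C_JUNCTION, OPTION_C_DOSES_BY_ARM).inspect
```

For the Pub 74158 values this yields the pooled 4.8-16.0 range, and an outcome with no linked arms yields a null dose instead of borrowing one from an unrelated arm.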
Option A is the most targeted fix with lowest risk. Option B is simpler but doesn’t address the view duplication. Option C is the cleanest long-term but highest effort.