Publications

Scientific publications report the results of clinical trials. Conference abstracts presented at major oncology meetings (ASCO, ESMO, AACR, ASH, EHA) often precede full journal articles by months or years, which makes them the earliest available source of efficacy data. The publications domain captures these records and the results extracted from them across four entities.

Entities

erDiagram
    publications ||--o{ publication_trials : "reports on"
    publications ||--o{ publication_outcomes : "extracts"
    publications ||--o{ publication_interventions : "describes"
    publication_trials }o--|| clinical_trials : "references"
    publication_interventions }o--o| drugs : "maps to"
    publications }o--o| clinical_trials : "primary trial"

publications

The central table. One row per publication, identified by DOI and/or PubMed ID.

Column	Purpose
`title`	Publication title
`doi`	Digital Object Identifier. Unique. Example: `10.1056/NEJMoa2206923`
`pmid`	PubMed ID. Unique. Example: `36652991`
`publication_type`	`journal_article`, `conference_abstract`, `poster`, `oral_presentation`
`journal`	Journal name (e.g., “New England Journal of Medicine”)
`conference`	Conference if presented at a meeting: `ASCO`, `AACR`, `ASH`, `EHA`, `ESMO`, `WCLC`, `SNO`
`published_at`	Publication date
`authors`	JSONB array of author names
`source`	Where originally indexed: `pubmed`, `asco`, `aacr`, `ash`, `eha`, `esmo`
`extracted`	Whether structured data has been extracted from this publication
`has_results`	Whether the publication reports clinical trial results
`is_partial`	Whether the reported results are interim (not final)
`total_participants`	Total number of study participants
`trial_outcome`	Overall assessment: `positive`, `negative`, `neutral`, `unclear`
`result_type`	`final`, `interim`, `subgroup`, `post_hoc`
`clinical_trial_id`	FK to `clinical_trials` — direct link to the primary associated trial
`embedding`	Vector embedding for semantic search

Indexed on doi, pmid, published_at, and clinical_trial_id.
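
To illustrate how the status flags might be combined, the query below lists publications that report results but have not yet been through extraction. It is a sketch only: it assumes `extracted` and `has_results` are boolean columns and that `publications` has an `id` primary key, neither of which is spelled out in the column list above.

```sql
-- Extraction backlog: publications that report results but still lack
-- structured extraction. Assumes extracted and has_results are booleans.
SELECT p.id, p.title, p.doi, p.pmid, p.publication_type, p.published_at
FROM publications p
WHERE p.has_results
  AND p.extracted IS NOT TRUE      -- also catches NULL, not just FALSE
ORDER BY p.published_at DESC NULLS LAST;
```

Using `IS NOT TRUE` instead of `NOT p.extracted` keeps rows where the flag was never set, which is usually what a backlog report wants.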

publication_trials

Many-to-many bridge between publications and clinical trials. One row per publication-trial pair.

Column	Purpose
`publication_id`	FK to `publications` (cascade delete)
`clinical_trial_id`	FK to `clinical_trials` (cascade delete)

A unique constraint on (publication_id, clinical_trial_id) prevents duplicate links. Use this table rather than the direct clinical_trial_id on publications when a publication reports on multiple trials.
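
The unique constraint makes link creation safe to repeat. A minimal sketch, assuming PostgreSQL and that `publication_trials` has no other required columns; the DOI and NCT ID are the sample values used elsewhere on this page and are paired here purely for illustration:

```sql
-- Link a publication to a trial. Re-running the statement is a no-op
-- because of the unique constraint on (publication_id, clinical_trial_id).
INSERT INTO publication_trials (publication_id, clinical_trial_id)
SELECT p.id, ct.id
FROM publications p
CROSS JOIN clinical_trials ct
WHERE p.doi = '10.1056/NEJMoa2206923'   -- illustrative DOI
  AND ct.nct_id = 'NCT04380636'         -- illustrative NCT ID
ON CONFLICT (publication_id, clinical_trial_id) DO NOTHING;
```

If either lookup finds no row, the `SELECT` returns nothing and no link is inserted, so the statement fails quietly instead of raising a foreign-key error.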

publication_outcomes

Extracted efficacy results from a publication. One row per endpoint per arm.

Column	Purpose
`publication_id`	FK to `publications` (cascade delete)
`endpoint_name`	Endpoint abbreviation: `OS`, `PFS`, `ORR`, `DOR`, `CR`, `DCR`
`endpoint_type`	`primary`, `secondary`, `exploratory`
`arm_description`	Which treatment arm this result describes
`value`	Measured value as text (e.g., “12.3 months”, “45.2%”)
`unit`	Unit of measurement (`months`, `%`, `events`)
`confidence_interval`	e.g., “95% CI 0.43-0.78”
`p_value`	Statistical p-value as text
`hazard_ratio`	HR as text. HR < 1.0 favors the experimental arm
`median_follow_up`	Median follow-up duration (e.g., “24.5 months”)
`subgroup_description`	Subgroup definition if this is a subgroup analysis

Indexed on publication_id and endpoint_name.
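
Because `hazard_ratio`, `value`, and `p_value` are stored as text, numeric comparisons need a guarded cast. The following is one possible approach, not part of the schema: it converts only strings that look like a plain decimal number and leaves everything else as NULL. The `hr_numeric` alias is introduced here for illustration.

```sql
-- Overall-survival rows with the hazard ratio parsed to numeric where possible.
-- Strings such as 'NR' or '0.62 (stratified)' do not match the pattern and
-- come back as NULL instead of raising a cast error.
SELECT po.publication_id,
       po.endpoint_name,
       po.arm_description,
       po.hazard_ratio,
       CASE
         WHEN btrim(po.hazard_ratio) ~ '^[0-9]+(\.[0-9]+)?$'
         THEN btrim(po.hazard_ratio)::numeric
       END AS hr_numeric
FROM publication_outcomes po
WHERE po.endpoint_name = 'OS';
```

Putting the cast inside `CASE` rather than in the `WHERE` clause matters: PostgreSQL does not guarantee the evaluation order of `WHERE` conditions, so an unguarded cast there could still fail on non-numeric text.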

publication_interventions

Drugs and regimens discussed in a publication. One row per intervention.

Column	Purpose
`publication_id`	FK to `publications` (cascade delete)
`name`	Intervention name as mentioned in the publication
`intervention_type`	`drug`, `biological`, `procedure`, `radiation`
`intervention_role`	`investigational`, `supportive`, `combination`, `comparator`
`drug_id`	FK to `drugs` (nullable — set when the intervention maps to a known drug entity)
`dose`	Dosing information as text
`route_of_administration`	e.g., `intravenous`, `oral`, `subcutaneous`
`schedule`	Dosing schedule (e.g., “every 3 weeks”, “daily”)
`treatment_duration`	How long treatment was administered (e.g., “up to 24 months”)
`number_of_cycles`	Number of treatment cycles (check constraint: >= 1)

Indexed on publication_id and drug_id.
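
Since `drug_id` is nullable, rows without a mapping are a natural worklist for drug-entity curation. A sketch of such a report, using only the columns listed above (the `mentions` alias is hypothetical):

```sql
-- Drug-like interventions that are not yet mapped to a drugs row,
-- ranked by how often each name appears.
SELECT iv.name,
       iv.intervention_type,
       count(*) AS mentions
FROM publication_interventions iv
WHERE iv.drug_id IS NULL
  AND iv.intervention_type IN ('drug', 'biological')
GROUP BY iv.name, iv.intervention_type
ORDER BY mentions DESC, iv.name;
```

Names are grouped exactly as written in the publication, so spelling variants of the same drug appear as separate rows.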

Design decisions

Publication outcomes are separate from trial arm results. Both capture efficacy data, but they come from different extraction pipelines. Trial arm results flow from AACT (ClinicalTrials.gov structured data), while publication outcomes are extracted from abstracts and full-text articles via an LLM pipeline. Keeping them separate preserves provenance and avoids conflating machine-readable registry data with narrative extractions.

Both feed mv_efficacy_evidence. Downstream consumers that need all available efficacy evidence query the materialized view rather than joining both source tables manually.

Direct trial FK plus bridge table. Publications carry a clinical_trial_id for the primary associated trial (fast single-trial lookups), while publication_trials captures the full many-to-many relationship for multi-trial publications.
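
This page does not say whether the primary trial is always mirrored in the bridge table, so a lookup that must not miss anything can check both paths. A minimal sketch, where `:trial_id` is a bind parameter and `is_primary` is an alias added for illustration:

```sql
-- All publications linked to one trial, through either the direct FK
-- or the publication_trials bridge.
SELECT p.id,
       p.title,
       p.published_at,
       (p.clinical_trial_id = :trial_id) IS TRUE AS is_primary
FROM publications p
WHERE p.clinical_trial_id = :trial_id
   OR EXISTS (
        SELECT 1
        FROM publication_trials pt
        WHERE pt.publication_id = p.id
          AND pt.clinical_trial_id = :trial_id
      )
ORDER BY p.published_at;
```

The `EXISTS` form returns each publication once even when both paths match, which a plain join plus `OR` would not guarantee.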

Materialized views

mv_efficacy_evidence

A UNION ALL of trial_arm_results (tagged source_type = 'trial') and publication_outcomes (tagged source_type = 'publication'). It exposes all endpoint-level efficacy data through a single relation, regardless of origin.

Key columns: source_type, source_id, endpoint_name, endpoint_type, measure_value, unit, p_value, hazard_ratio, confidence_interval, median_follow_up, clinical_trial_id, nct_id, publication_id.

Indexed on endpoint_name, clinical_trial_id, and publication_id.
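
In PostgreSQL a materialized view is a stored snapshot and does not update when its source tables change. How and when this view is refreshed is not described here; the sketch below assumes a manual refresh followed by a quick sanity check (the `n_rows` alias is hypothetical).

```sql
-- Rebuild the snapshot after trial_arm_results or publication_outcomes change.
REFRESH MATERIALIZED VIEW mv_efficacy_evidence;

-- Sanity check: both origins should be represented.
SELECT source_type, count(*) AS n_rows
FROM mv_efficacy_evidence
GROUP BY source_type
ORDER BY source_type;
```

A plain refresh blocks reads on the view while it runs. `REFRESH MATERIALIZED VIEW CONCURRENTLY` avoids that but requires a unique index on the view, which the index list above does not mention.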

Example queries

Find all publications for a trial by NCT ID:

SELECT p.title, p.doi, p.journal, p.published_at, p.result_type
FROM publications p
JOIN publication_trials pt ON pt.publication_id = p.id
JOIN clinical_trials ct ON ct.id = pt.clinical_trial_id
WHERE ct.nct_id = 'NCT04380636'
ORDER BY p.published_at;

Get OS and PFS results from ASCO 2025 abstracts:

SELECT p.title, po.endpoint_name, po.arm_description,
       po.value, po.hazard_ratio, po.p_value
FROM publication_outcomes po
JOIN publications p ON p.id = po.publication_id
WHERE p.conference = 'ASCO'
  AND p.published_at >= '2025-01-01'
  AND p.published_at <  '2026-01-01'
  AND po.endpoint_name IN ('OS', 'PFS')
ORDER BY p.title, po.endpoint_name;

Query unified primary-endpoint efficacy evidence for a single trial across both trial results and publications (`:trial_id` is a bind parameter):

SELECT source_type, endpoint_name, measure_value, unit,
       hazard_ratio, p_value, nct_id, publication_id
FROM mv_efficacy_evidence
WHERE clinical_trial_id = :trial_id
  AND endpoint_type = 'primary'
ORDER BY source_type, endpoint_name;