Publications
Scientific publications report the results of clinical trials. Conference abstracts presented at major oncology meetings (ASCO, ESMO, AACR, ASH, EHA) often precede full journal articles by months or years, making them the earliest source of structured efficacy data. The publications domain captures these records and the results extracted from them across four entities.
Entities
    erDiagram
        publications ||--o{ publication_trials : "reports on"
        publications ||--o{ publication_outcomes : "extracts"
        publications ||--o{ publication_interventions : "describes"
        publication_trials }o--|| clinical_trials : "references"
        publication_interventions }o--o| drugs : "maps to"
        publications }o--o| clinical_trials : "primary trial"
publications
The central table. One row per publication, identified by DOI and/or PubMed ID.
| Column | Purpose |
|---|---|
| title | Publication title |
| doi | Digital Object Identifier. Unique. Example: 10.1056/NEJMoa2206923 |
| pmid | PubMed ID. Unique. Example: 36652991 |
| publication_type | journal_article, conference_abstract, poster, oral_presentation |
| journal | Journal name (e.g., “New England Journal of Medicine”) |
| conference | Conference if presented at a meeting: ASCO, AACR, ASH, EHA, ESMO, WCLC, SNO |
| published_at | Publication date |
| authors | JSONB array of author names |
| source | Where originally indexed: pubmed, asco, aacr, ash, eha, esmo |
| extracted | Whether structured data has been extracted from this publication |
| has_results | Whether the publication reports clinical trial results |
| is_partial | Whether the reported results are interim (not final) |
| total_participants | Total number of study participants |
| trial_outcome | Overall assessment: positive, negative, neutral, unclear |
| result_type | final, interim, subgroup, post_hoc |
| clinical_trial_id | FK to clinical_trials; direct link to the primary associated trial |
| embedding | Vector embedding for semantic search |
Indexed on doi, pmid, published_at, and clinical_trial_id.
publication_trials
Many-to-many bridge between publications and clinical trials. One row per publication-trial pair.
| Column | Purpose |
|---|---|
| publication_id | FK to publications (cascade delete) |
| clinical_trial_id | FK to clinical_trials (cascade delete) |
A unique constraint on (publication_id, clinical_trial_id) prevents duplicate links. Use this table rather than the direct clinical_trial_id on publications when a publication reports on multiple trials.
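The behavior of the bridge table can be sketched with an in-memory SQLite database standing in for the production database. The table shapes are trimmed to the columns needed here, and the titles and NCT IDs are placeholders, not real records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE publications (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE clinical_trials (id INTEGER PRIMARY KEY, nct_id TEXT UNIQUE);
CREATE TABLE publication_trials (
    publication_id    INTEGER NOT NULL REFERENCES publications(id) ON DELETE CASCADE,
    clinical_trial_id INTEGER NOT NULL REFERENCES clinical_trials(id) ON DELETE CASCADE,
    UNIQUE (publication_id, clinical_trial_id)
);
""")

# Placeholder data: one pooled-analysis publication reporting on two trials.
conn.execute("INSERT INTO publications VALUES (1, 'Pooled analysis (placeholder)')")
conn.executemany(
    "INSERT INTO clinical_trials VALUES (?, ?)",
    [(10, "NCT00000001"), (11, "NCT00000002")],
)
conn.executemany("INSERT INTO publication_trials VALUES (?, ?)", [(1, 10), (1, 11)])

# Inserting the same pair again violates the unique constraint.
try:
    conn.execute("INSERT INTO publication_trials VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

link_count = conn.execute(
    "SELECT COUNT(*) FROM publication_trials WHERE publication_id = 1"
).fetchone()[0]

# Deleting the publication cascades to its bridge rows.
conn.execute("DELETE FROM publications WHERE id = 1")
links_after_delete = conn.execute(
    "SELECT COUNT(*) FROM publication_trials"
).fetchone()[0]
```

The duplicate insert fails without disturbing the two existing links, and removing the publication leaves no orphaned bridge rows.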
publication_outcomes
Extracted efficacy results from a publication. One row per endpoint per arm.
| Column | Purpose |
|---|---|
| publication_id | FK to publications (cascade delete) |
| endpoint_name | Endpoint abbreviation: OS, PFS, ORR, DOR, CR, DCR |
| endpoint_type | primary, secondary, exploratory |
| arm_description | Which treatment arm this result describes |
| value | Measured value as text (e.g., “12.3 months”, “45.2%”) |
| unit | Unit of measurement (months, %, events) |
| confidence_interval | e.g., “95% CI 0.43-0.78” |
| p_value | Statistical p-value as text |
| hazard_ratio | HR as text. HR < 1.0 favors the experimental arm |
| median_follow_up | Median follow-up duration (e.g., “24.5 months”) |
| subgroup_description | Subgroup definition if this is a subgroup analysis |
Indexed on publication_id and endpoint_name.
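Because value, confidence_interval, and hazard_ratio are stored as text, consumers that need numbers have to parse them. The helpers below are a minimal sketch of that step; the function names are hypothetical, and the patterns cover only the formats shown in the table examples, so unusual strings simply return None.

```python
import re
from typing import Optional, Tuple

# First decimal number in a string such as "12.3 months" or "45.2%".
_NUMBER = re.compile(r"\d+(?:\.\d+)?")

# Two bounds at the end of a string, separated by a hyphen, an en dash, or "to".
_CI_BOUNDS = re.compile(r"(\d+(?:\.\d+)?)\s*(?:[-\u2013]|to)\s*(\d+(?:\.\d+)?)\s*$")


def parse_value(text: str) -> Optional[float]:
    """Return the first number found in a text value, or None if there is none."""
    match = _NUMBER.search(text)
    return float(match.group()) if match else None


def parse_confidence_interval(text: str) -> Optional[Tuple[float, float]]:
    """Return (lower, upper) from a string like '95% CI 0.43-0.78', or None."""
    match = _CI_BOUNDS.search(text)
    if not match:
        return None
    return float(match.group(1)), float(match.group(2))


def favors_experimental(hazard_ratio_text: str) -> Optional[bool]:
    """Apply the HR < 1.0 convention; None when the text holds no number."""
    hr = parse_value(hazard_ratio_text)
    return None if hr is None else hr < 1.0
```

Returning None rather than raising keeps rows such as "not reached" usable: the caller decides whether to skip them or surface the original text.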
publication_interventions
Drugs and regimens discussed in a publication. One row per intervention.
| Column | Purpose |
|---|---|
| publication_id | FK to publications (cascade delete) |
| name | Intervention name as mentioned in the publication |
| intervention_type | drug, biological, procedure, radiation |
| intervention_role | investigational, supportive, combination, comparator |
| drug_id | FK to drugs (nullable; set when the intervention maps to a known drug entity) |
| dose | Dosing information as text |
| route_of_administration | e.g., intravenous, oral, subcutaneous |
| schedule | Dosing schedule (e.g., “every 3 weeks”, “daily”) |
| treatment_duration | How long treatment was administered (e.g., “up to 24 months”) |
| number_of_cycles | Number of treatment cycles (check constraint: >= 1) |
Indexed on publication_id and drug_id.
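The nullable drug_id and the number_of_cycles check can be illustrated with a trimmed-down SQLite version of the table. This is a sketch under assumptions: only a few columns are included, and the rows are illustrative rather than extracted data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE drugs (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE publication_interventions (
    id               INTEGER PRIMARY KEY,
    publication_id   INTEGER NOT NULL,
    name             TEXT NOT NULL,
    drug_id          INTEGER REFERENCES drugs(id),  -- NULL until mapped to a drug
    number_of_cycles INTEGER CHECK (number_of_cycles >= 1)
);
""")

insert_sql = (
    "INSERT INTO publication_interventions "
    "(publication_id, name, drug_id, number_of_cycles) VALUES (?, ?, ?, ?)"
)

# Illustrative rows: one mapped drug, one unmapped non-drug intervention.
conn.execute("INSERT INTO drugs VALUES (1, 'pembrolizumab')")
conn.executemany(
    insert_sql,
    [
        (1, "pembrolizumab 200 mg", 1, 6),
        (1, "stereotactic radiotherapy", None, None),  # NULL cycles pass the check
    ],
)

# A cycle count of zero violates the check constraint.
try:
    conn.execute(insert_sql, (1, "carboplatin", None, 0))
    zero_cycles_rejected = False
except sqlite3.IntegrityError:
    zero_cycles_rejected = True

# Interventions that have not been mapped to a drug entity.
unmapped = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM publication_interventions "
        "WHERE drug_id IS NULL ORDER BY name"
    )
]
```

A missing cycle count is stored as NULL, which satisfies the check; only an explicit value below 1 is rejected.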
Design decisions
Publication outcomes are separate from trial arm results. Both capture efficacy data, but they come from different extraction pipelines. Trial arm results flow from AACT (ClinicalTrials.gov structured data), while publication outcomes are extracted from abstracts and full-text articles via an LLM pipeline. Keeping them separate preserves provenance and avoids conflating machine-readable registry data with narrative extractions.
Both feed mv_efficacy_evidence. Downstream consumers that need all available efficacy evidence query the materialized view rather than joining both source tables manually.
Direct trial FK plus bridge table. Publications carry a clinical_trial_id for the primary associated trial (fast single-trial lookups), while publication_trials captures the full many-to-many relationship for multi-trial publications.
Materialized views
mv_efficacy_evidence
A UNION ALL of trial_arm_results (with source_type = 'trial') and publication_outcomes (with source_type = 'publication'). Provides a single table for all endpoint-level efficacy data regardless of origin.
Key columns: source_type, source_id, endpoint_name, endpoint_type, measure_value, unit, p_value, hazard_ratio, confidence_interval, median_follow_up, clinical_trial_id, nct_id, publication_id.
Indexed on endpoint_name, clinical_trial_id, and publication_id.
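The shape of the view can be sketched as follows. This is not the actual view definition. SQLite has no materialized views, so a plain view stands in for what PostgreSQL would create with CREATE MATERIALIZED VIEW. The trial_arm_results columns, the joins used to resolve nct_id, and all rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clinical_trials (id INTEGER PRIMARY KEY, nct_id TEXT);
CREATE TABLE publications (id INTEGER PRIMARY KEY, title TEXT, clinical_trial_id INTEGER);
-- Assumed shape: the real trial_arm_results columns are not documented here.
CREATE TABLE trial_arm_results (
    id INTEGER PRIMARY KEY, clinical_trial_id INTEGER,
    endpoint_name TEXT, endpoint_type TEXT, measure_value TEXT, unit TEXT
);
CREATE TABLE publication_outcomes (
    id INTEGER PRIMARY KEY, publication_id INTEGER,
    endpoint_name TEXT, endpoint_type TEXT, value TEXT, unit TEXT
);

CREATE VIEW efficacy_evidence_sketch AS
SELECT 'trial' AS source_type, tar.id AS source_id,
       tar.endpoint_name, tar.endpoint_type, tar.measure_value, tar.unit,
       tar.clinical_trial_id, ct.nct_id, NULL AS publication_id
FROM trial_arm_results tar
JOIN clinical_trials ct ON ct.id = tar.clinical_trial_id
UNION ALL
SELECT 'publication', po.id,
       po.endpoint_name, po.endpoint_type, po.value, po.unit,
       p.clinical_trial_id, ct.nct_id, po.publication_id
FROM publication_outcomes po
JOIN publications p ON p.id = po.publication_id
LEFT JOIN clinical_trials ct ON ct.id = p.clinical_trial_id;
""")

# Made-up rows: one registry result and one publication result for the same trial.
conn.execute("INSERT INTO clinical_trials VALUES (1, 'NCT00000001')")
conn.execute("INSERT INTO publications VALUES (5, 'Placeholder abstract', 1)")
conn.execute("INSERT INTO trial_arm_results VALUES (100, 1, 'PFS', 'primary', '11.2', 'months')")
conn.execute("INSERT INTO publication_outcomes VALUES (200, 5, 'PFS', 'primary', '11.5', 'months')")

rows = conn.execute(
    "SELECT source_type, endpoint_name, measure_value, publication_id "
    "FROM efficacy_evidence_sketch "
    "WHERE clinical_trial_id = ? AND endpoint_type = 'primary' "
    "ORDER BY source_type",
    (1,),
).fetchall()
```

Both sources come back through one query, with source_type and publication_id preserving where each row originated. In PostgreSQL the materialized form would also need a REFRESH MATERIALIZED VIEW after either source table changes.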
Example queries
Find all publications for a trial by NCT ID:

    SELECT p.title, p.doi, p.journal, p.published_at, p.result_type
    FROM publications p
    JOIN publication_trials pt ON pt.publication_id = p.id
    JOIN clinical_trials ct ON ct.id = pt.clinical_trial_id
    WHERE ct.nct_id = 'NCT04380636'
    ORDER BY p.published_at;

Get OS and PFS results from ASCO 2025 abstracts:

    SELECT p.title, po.endpoint_name, po.arm_description,
           po.value, po.hazard_ratio, po.p_value
    FROM publication_outcomes po
    JOIN publications p ON p.id = po.publication_id
    WHERE p.conference = 'ASCO'
      AND p.published_at >= '2025-01-01'
      AND p.published_at < '2026-01-01'  -- upper bound keeps later years out
      AND po.endpoint_name IN ('OS', 'PFS')
    ORDER BY p.title, po.endpoint_name;

Query unified efficacy evidence for a trial across both trial results and publications:

    SELECT source_type, endpoint_name, measure_value, unit,
           hazard_ratio, p_value, nct_id, publication_id
    FROM mv_efficacy_evidence
    WHERE clinical_trial_id = :trial_id
      AND endpoint_type = 'primary'
    ORDER BY source_type, endpoint_name;