
Publications

Scientific publications report the results of clinical trials. Conference abstracts presented at major oncology meetings (ASCO, ESMO, AACR, ASH, EHA) often precede full journal articles by months or years, making them the earliest source of structured efficacy data. The publications domain captures these records and the results extracted from them across four entities.

erDiagram
    publications ||--o{ publication_trials : "reports on"
    publications ||--o{ publication_outcomes : "extracts"
    publications ||--o{ publication_interventions : "describes"
    publication_trials }o--|| clinical_trials : "references"
    publication_interventions }o--o| drugs : "maps to"
    publications }o--o| clinical_trials : "primary trial"

publications is the central table. It holds one row per publication, identified by DOI and/or PubMed ID.

| Column | Purpose |
| --- | --- |
| title | Publication title |
| doi | Digital Object Identifier. Unique. Example: 10.1056/NEJMoa2206923 |
| pmid | PubMed ID. Unique. Example: 36652991 |
| publication_type | journal_article, conference_abstract, poster, oral_presentation |
| journal | Journal name (e.g., “New England Journal of Medicine”) |
| conference | Conference if presented at a meeting: ASCO, AACR, ASH, EHA, ESMO, WCLC, SNO |
| published_at | Publication date |
| authors | JSONB array of author names |
| source | Where originally indexed: pubmed, asco, aacr, ash, eha, esmo |
| extracted | Whether structured data has been extracted from this publication |
| has_results | Whether the publication reports clinical trial results |
| is_partial | Whether the reported results are interim (not final) |
| total_participants | Total number of study participants |
| trial_outcome | Overall assessment: positive, negative, neutral, unclear |
| result_type | final, interim, subgroup, post_hoc |
| clinical_trial_id | FK to clinical_trials; direct link to the primary associated trial |
| embedding | Vector embedding for semantic search |

Indexed on doi, pmid, published_at, and clinical_trial_id.

publication_trials is the many-to-many bridge between publications and clinical trials. It holds one row per publication-trial pair.

| Column | Purpose |
| --- | --- |
| publication_id | FK to publications (cascade delete) |
| clinical_trial_id | FK to clinical_trials (cascade delete) |

A unique constraint on (publication_id, clinical_trial_id) prevents duplicate links. Use this table rather than the direct clinical_trial_id on publications when a publication reports on multiple trials.
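The constraint and cascade behavior can be illustrated with a minimal sketch. It uses an in-memory SQLite database as a stand-in for the production database; the column types, the `link_trial` helper, and the NCT IDs are assumptions made up for the example, not part of the documented schema.

```python
import sqlite3

# In-memory SQLite stand-in; the production column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE publications (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE clinical_trials (id INTEGER PRIMARY KEY, nct_id TEXT UNIQUE);
CREATE TABLE publication_trials (
    publication_id    INTEGER NOT NULL REFERENCES publications(id) ON DELETE CASCADE,
    clinical_trial_id INTEGER NOT NULL REFERENCES clinical_trials(id) ON DELETE CASCADE,
    UNIQUE (publication_id, clinical_trial_id)
);
-- Made-up rows for illustration only.
INSERT INTO publications VALUES (1, 'Hypothetical pooled analysis');
INSERT INTO clinical_trials VALUES (1, 'NCT00000001'), (2, 'NCT00000002');
""")


def link_trial(publication_id: int, clinical_trial_id: int) -> bool:
    """Insert a publication-trial link; return False if the pair already exists."""
    try:
        conn.execute(
            "INSERT INTO publication_trials VALUES (?, ?)",
            (publication_id, clinical_trial_id),
        )
        return True
    except sqlite3.IntegrityError:
        # Raised by the UNIQUE (publication_id, clinical_trial_id) constraint.
        return False


first = link_trial(1, 1)
second = link_trial(1, 2)
duplicate = link_trial(1, 1)  # same pair again, rejected
link_count = conn.execute("SELECT COUNT(*) FROM publication_trials").fetchone()[0]

# Deleting the publication removes its links through ON DELETE CASCADE.
conn.execute("DELETE FROM publications WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM publication_trials").fetchone()[0]
```

In this sketch the duplicate insert is rejected while both distinct links succeed, and deleting the publication leaves no orphaned bridge rows.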

publication_outcomes holds the efficacy results extracted from a publication. It holds one row per endpoint per arm.

| Column | Purpose |
| --- | --- |
| publication_id | FK to publications (cascade delete) |
| endpoint_name | Endpoint abbreviation: OS, PFS, ORR, DOR, CR, DCR |
| endpoint_type | primary, secondary, exploratory |
| arm_description | Which treatment arm this result describes |
| value | Measured value as text (e.g., “12.3 months”, “45.2%”) |
| unit | Unit of measurement (months, %, events) |
| confidence_interval | e.g., “95% CI 0.43-0.78” |
| p_value | Statistical p-value as text |
| hazard_ratio | HR as text. HR < 1.0 favors the experimental arm |
| median_follow_up | Median follow-up duration (e.g., “24.5 months”) |
| subgroup_description | Subgroup definition if this is a subgroup analysis |

Indexed on publication_id and endpoint_name.
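Because value, confidence_interval, p_value, and hazard_ratio are stored as text, consumers that need numbers have to parse them. The following is a minimal sketch of one way to do that, assuming the text follows the example formats shown in the table. The helper names (`parse_number`, `parse_ci`, `favors_experimental`) are hypothetical and not part of the documented pipeline; real extracted text may need more cases.

```python
import re
from typing import Optional, Tuple

_NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def parse_number(text: Optional[str]) -> Optional[float]:
    """Return the first number in a text field such as '12.3 months', else None."""
    if not text:
        return None
    match = _NUMBER.search(text)
    return float(match.group()) if match else None


def parse_ci(text: Optional[str]) -> Optional[Tuple[float, float]]:
    """Parse strings like '95% CI 0.43-0.78' into (lower, upper), else None."""
    if not text:
        return None
    # Drop the confidence level so '95' is not mistaken for a bound.
    body = re.sub(r"\d+(?:\.\d+)?\s*%", "", text)
    bounds = re.findall(r"\d+(?:\.\d+)?", body)
    if len(bounds) != 2:
        return None
    return float(bounds[0]), float(bounds[1])


def favors_experimental(hazard_ratio_text: Optional[str]) -> Optional[bool]:
    """Apply the HR < 1.0 rule to the point estimate; None if unparseable."""
    hr = parse_number(hazard_ratio_text)
    return None if hr is None else hr < 1.0


median_pfs = parse_number("12.3 months")
ci = parse_ci("95% CI 0.43-0.78")
not_reached = parse_number("NR")
direction = favors_experimental("0.58")
```

Note that `favors_experimental` looks only at the point estimate. It says nothing about statistical significance, which depends on the confidence interval and p-value.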

publication_interventions lists the drugs and regimens discussed in a publication. It holds one row per intervention.

| Column | Purpose |
| --- | --- |
| publication_id | FK to publications (cascade delete) |
| name | Intervention name as mentioned in the publication |
| intervention_type | drug, biological, procedure, radiation |
| intervention_role | investigational, supportive, combination, comparator |
| drug_id | FK to drugs (nullable; set when the intervention maps to a known drug entity) |
| dose | Dosing information as text |
| route_of_administration | e.g., intravenous, oral, subcutaneous |
| schedule | Dosing schedule (e.g., “every 3 weeks”, “daily”) |
| treatment_duration | How long treatment was administered (e.g., “up to 24 months”) |
| number_of_cycles | Number of treatment cycles (check constraint: >= 1) |

Indexed on publication_id and drug_id.

Publication outcomes are separate from trial arm results. Both capture efficacy data, but they come from different extraction pipelines. Trial arm results flow from AACT (ClinicalTrials.gov structured data), while publication outcomes are extracted from abstracts and full-text articles via an LLM pipeline. Keeping them separate preserves provenance and avoids conflating machine-readable registry data with narrative extractions.

Both feed mv_efficacy_evidence. Downstream consumers that need all available efficacy evidence query the materialized view rather than joining both source tables manually.

Direct trial FK plus bridge table. Publications carry a clinical_trial_id for the primary associated trial (fast single-trial lookups), while publication_trials captures the full many-to-many relationship for multi-trial publications.
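To make the two access paths concrete, here is a minimal sketch against an in-memory SQLite stand-in. The rows, IDs, and the `trial_ids` helper are made up for illustration. Whether the primary trial is also repeated in publication_trials is not stated here, so the sketch unions both sources rather than assuming either way.

```python
import sqlite3

# In-memory SQLite stand-in with made-up rows; column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clinical_trials (id INTEGER PRIMARY KEY, nct_id TEXT);
CREATE TABLE publications (
    id INTEGER PRIMARY KEY,
    title TEXT,
    clinical_trial_id INTEGER REFERENCES clinical_trials(id)  -- primary trial
);
CREATE TABLE publication_trials (
    publication_id INTEGER,
    clinical_trial_id INTEGER,
    UNIQUE (publication_id, clinical_trial_id)
);
INSERT INTO clinical_trials VALUES (1, 'NCT00000001'), (2, 'NCT00000002');
INSERT INTO publications VALUES (10, 'Hypothetical pooled analysis', 1);
INSERT INTO publication_trials VALUES (10, 1), (10, 2);
""")

# Fast path: the primary trial comes straight from the publications row.
primary_trial_id = conn.execute(
    "SELECT clinical_trial_id FROM publications WHERE id = ?", (10,)
).fetchone()[0]

# Full path: UNION (not UNION ALL) removes a trial listed in both places.
ALL_TRIALS_SQL = """
SELECT clinical_trial_id FROM publications
WHERE id = :pub AND clinical_trial_id IS NOT NULL
UNION
SELECT clinical_trial_id FROM publication_trials
WHERE publication_id = :pub
ORDER BY clinical_trial_id
"""


def trial_ids(publication_id: int) -> list:
    """Return every trial linked to a publication, via the FK or the bridge."""
    rows = conn.execute(ALL_TRIALS_SQL, {"pub": publication_id}).fetchall()
    return [row[0] for row in rows]


all_trial_ids = trial_ids(10)
```

The single-row lookup avoids a join when only the primary trial matters; the union covers multi-trial publications without double counting.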

mv_efficacy_evidence is a materialized view defined as a UNION ALL of trial_arm_results (with source_type = 'trial') and publication_outcomes (with source_type = 'publication'). It provides a single relation for all endpoint-level efficacy data regardless of origin.

Key columns: source_type, source_id, endpoint_name, endpoint_type, measure_value, unit, p_value, hazard_ratio, confidence_interval, median_follow_up, clinical_trial_id, nct_id, publication_id.

Indexed on endpoint_name, clinical_trial_id, and publication_id.
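The shape of the view can be sketched as follows. This is an illustration under stated assumptions, not the actual view definition: SQLite has no materialized views, so a plain view stands in; the columns of trial_arm_results are guessed from the key-column list; only a subset of the key columns is included; and all rows are made up.

```python
import sqlite3

# In-memory SQLite stand-in; a plain VIEW replaces the materialized view.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed shape of trial_arm_results; the real table is not documented here.
CREATE TABLE trial_arm_results (
    id INTEGER PRIMARY KEY, clinical_trial_id INTEGER,
    endpoint_name TEXT, endpoint_type TEXT, measure_value TEXT, unit TEXT
);
CREATE TABLE publications (id INTEGER PRIMARY KEY, clinical_trial_id INTEGER);
CREATE TABLE publication_outcomes (
    id INTEGER PRIMARY KEY, publication_id INTEGER,
    endpoint_name TEXT, endpoint_type TEXT, value TEXT, unit TEXT
);

CREATE VIEW mv_efficacy_evidence AS
SELECT 'trial' AS source_type, r.id AS source_id,
       r.endpoint_name, r.endpoint_type, r.measure_value, r.unit,
       r.clinical_trial_id, NULL AS publication_id
FROM trial_arm_results r
UNION ALL
SELECT 'publication', po.id,
       po.endpoint_name, po.endpoint_type, po.value, po.unit,
       p.clinical_trial_id, po.publication_id
FROM publication_outcomes po
JOIN publications p ON p.id = po.publication_id;

-- Made-up rows for illustration only.
INSERT INTO trial_arm_results VALUES (1, 1, 'PFS', 'primary', '11.2', 'months');
INSERT INTO publications VALUES (10, 1);
INSERT INTO publication_outcomes VALUES
    (1, 10, 'PFS', 'primary', '11.2 months', 'months'),
    (2, 10, 'OS', 'secondary', '24.0 months', 'months');
""")

primary_rows = conn.execute(
    """
    SELECT source_type, endpoint_name, measure_value
    FROM mv_efficacy_evidence
    WHERE clinical_trial_id = ? AND endpoint_type = 'primary'
    ORDER BY source_type, endpoint_name
    """,
    (1,),
).fetchall()
```

UNION ALL keeps both rows when the registry and a publication report the same endpoint, so source_type is what lets a consumer tell the two apart.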

Find all publications for a trial by NCT ID:

SELECT p.title, p.doi, p.journal, p.published_at, p.result_type
FROM publications p
JOIN publication_trials pt ON pt.publication_id = p.id
JOIN clinical_trials ct ON ct.id = pt.clinical_trial_id
WHERE ct.nct_id = 'NCT04380636'
ORDER BY p.published_at;

Get OS and PFS results from ASCO 2025 abstracts:

SELECT p.title, po.endpoint_name, po.arm_description,
       po.value, po.hazard_ratio, po.p_value
FROM publication_outcomes po
JOIN publications p ON p.id = po.publication_id
WHERE p.conference = 'ASCO'
  AND p.published_at >= '2025-01-01'
  AND p.published_at < '2026-01-01'  -- upper bound keeps later meetings out
  AND po.endpoint_name IN ('OS', 'PFS')
ORDER BY p.title, po.endpoint_name;

Query unified primary-endpoint efficacy evidence for a trial across both trial results and publications:

SELECT source_type, endpoint_name, measure_value, unit,
       hazard_ratio, p_value, nct_id, publication_id
FROM mv_efficacy_evidence
WHERE clinical_trial_id = :trial_id
  AND endpoint_type = 'primary'
ORDER BY source_type, endpoint_name;