Skip to content

Clinical Trials

The clinical trials domain is the largest in bioloupe-data, spanning 14 entities across five functional areas. It models the full lifecycle of a clinical trial — from study design and eligibility criteria through enrollment, endpoint definitions, and structured results. The root entity is clinical_trials, and every child table cascades from it.

erDiagram
    clinical_trials ||--o{ trial_arms : "has"
    clinical_trials ||--o{ trial_diseases : "studies"
    clinical_trials ||--o{ trial_sponsors : "funded by"
    clinical_trials ||--o{ trial_locations : "conducted at"
    clinical_trials ||--o{ trial_investigators : "led by"
    clinical_trials ||--o{ trial_outcomes : "measures"
    clinical_trials ||--o{ trial_arm_results : "reports"
    clinical_trials ||--o{ trial_adverse_events : "reports"
    clinical_trials ||--o{ trial_subgroups : "analyzes"
    clinical_trials ||--o{ trial_eligibilities : "requires"

    trial_arms ||--o{ trial_arm_interventions : "administers"
    trial_arms ||--o{ trial_arm_results : "produces"

    trial_arm_interventions }o--o| interventions : "links to"

    trial_diseases }o--|| diseases : "references"
    trial_diseases ||--o{ trial_disease_biomarkers : "requires"
    trial_disease_biomarkers }o--o| biomarkers : "references"

    trial_outcomes ||--o{ trial_arm_results : "measured by"

    trial_eligibilities }o--|| diseases : "scoped to"
    trial_eligibilities ||--o{ trial_eligibility_biomarkers : "requires"
    trial_eligibility_biomarkers }o--o| biomarkers : "references"

    trial_sponsors }o--o| organisations : "is"
    trial_subgroups }o--o| diseases : "scoped to"

Three tables form the core design hierarchy.

clinical_trials — One row per registered trial. The natural key is nct_id. Key columns include phase (early_phase1 through phase4), overall_status (recruiting, completed, terminated, etc.), study_type (interventional, observational, expanded_access), and study design fields: intervention_model, allocation, masking, and primary_purpose. Enrollment is stored as a count with an enrollment_type flag (actual vs. anticipated). Free-text inclusion/exclusion criteria live here alongside age, sex, and healthy-volunteer flags. The table also carries a vector embedding column for semantic search and full provenance tracking.

trial_arms — Treatment arms within a trial. Each arm has a title (e.g., “Pembrolizumab 200 mg Q3W”), an arm_type (experimental, active_comparator, placebo_comparator, sham_comparator, no_intervention), and a sequence_order for display.

trial_arm_interventions — Links interventions to arms. Each row represents one drug, procedure, or device administered in an arm. Stores component_type (drug, biological, procedure, radiation, device, behavioral, diagnostic_test), structured dosing as JSONB (dose, route, duration, schedule, frequency), and an is_investigational flag that distinguishes the experimental agent from standard-of-care components. The FK to interventions is nullable — a fallback name column covers unmatched interventions. The pair (trial_arm, intervention) is unique.

trial_diseases — Clean bridge table linking trials to the diseases they study. Derived from the ClinicalTrials.gov conditions field. This table carries no JSONB or faceted attributes — disease facets belong in trial_eligibilities.

trial_disease_biomarkers — Biomarker requirements per trial-disease pair. For example, a trial may require PD-L1 >= 50% for NSCLC. Each row stores a name (always populated as a fallback), a value threshold (e.g., “positive”, ”>= 50%”, “mutant”), and an optional FK to biomarkers.

trial_sponsors — Lead sponsors and collaborators. Each row has a name (denormalized, always populated), sponsor_type (lead or collaborator), agency_class (industry, nih, fed, other), and an optional FK to organisations.

trial_locations — Geographic sites where a trial is conducted. Stores facility_name, city, state, country, and country_code (ISO 3166-1 alpha-2). A composite index on (country_code, clinical_trial) supports geographic filtering.

trial_investigators — Principal investigators and study directors. Stores name, role (principal_investigator, sub_investigator, study_director, study_chair), and affiliation.

Four tables capture structured trial results data.

trial_outcomes — Endpoint definitions. Each outcome has an outcome_type (primary, secondary, other), a title (e.g., “Overall Survival”, “Progression-Free Survival”), a description, and a time_frame (e.g., “Up to 24 months”).

trial_arm_results — Structured, queryable efficacy data per arm per endpoint. This is the most analytically valuable table in the domain. Key columns:

ColumnPurpose
endpoint_nameAbbreviation: OS, PFS, ORR, DOR, CR, DCR
endpoint_typeprimary, secondary, exploratory
measure_valueResult as text (e.g., “12.3 months”, “45.2%“)
p_valueStatistical significance (e.g., “<0.001”)
hazard_ratioHR < 1.0 favors experimental arm
confidence_intervale.g., “95% CI 0.43-0.78”
median_follow_upDuration of follow-up
subgroup_descriptionPopulated for subgroup analyses

The trial_arm_id FK is nullable because some results are cross-arm comparisons. The optional trial_outcome_id FK links back to the endpoint definition. Carries full provenance.

trial_adverse_events — Safety data categorized by MedDRA System Organ Class. Each row records a term (e.g., neutropenia, fatigue), event_type (serious, other, deaths), arm_description, and counts: subjects_affected, subjects_at_risk, and event_count. A check constraint enforces subjects_affected <= subjects_at_risk.

trial_subgroups — Subgroup analyses defined in trial results or publications. Each row captures a subgroup_type dimension (biomarker, age, sex, prior_therapy, disease_stage, race, geography, performance_status), a subgroup_value (e.g., “PD-L1 >= 50%”, “age >= 65”), and an optional data_cutoff_date. An optional FK to diseases scopes disease-specific subgroups.

Two tables model structured, queryable eligibility criteria — the normalized counterpart to the free-text criteria on clinical_trials.

trial_eligibilities — One row per (trial, disease) pair. Unique constraint enforced. Captures faceted criteria as JSONB arrays and scalar fields:

ColumnExample values
extentsmetastatic, locally_advanced, unresectable, brain_metastasis
statusesnewly_diagnosed, relapsed, refractory, treatment_naive
stagesIII, IIIB, IV, T2N1M0 (from AJCC/TNM/FIGO/Ann Arbor)
staging_systemAJCC, TNM, FIGO, Ann_Arbor, Rai, Binet, INSS
subtypestriple-negative, HER2-positive, Ph-positive
riskshigh, intermediate, low, favourable, unfavourable
treatment_settingfirst_line, second_line, adjuvant, neoadjuvant, maintenance, salvage
min_prior_lines / max_prior_linesInteger range of allowed prior therapy lines

A check constraint enforces max_prior_lines >= min_prior_lines. Carries full provenance.

trial_eligibility_biomarkers — Biomarker-based inclusion and exclusion criteria per eligibility record. Each row stores biomarker_name, required_value (e.g., “positive”, ”>= 50%”), a numeric_threshold (decimal), a comparison operator (gte, lte, gt, lt, eq, range), and an is_inclusion flag (TRUE = patient must have this biomarker status; FALSE = patient must NOT). Optional FK to biomarkers.

trial_diseases is a clean bridge table. It carries no JSONB columns or disease facets. Structured disease-level criteria (stage, subtype, extent, treatment setting, biomarker requirements) live in trial_eligibilities. This separation keeps the many-to-many link simple and pushes domain complexity into a purpose-built table with proper constraints.

trial_eligibilities models structured inclusion/exclusion criteria. Free-text criteria on clinical_trials are useful for display but cannot be queried. The eligibility tables normalize these into typed, indexed fields so you can answer questions like “find all phase III trials accepting treatment-naive metastatic NSCLC patients with PD-L1 >= 50%.”

Drugs connect through interventions, not directly. There is no drug_id on any clinical trials table. The path from a drug to its trials runs through the regimens domain: drugs -> interventions -> trial_arm_interventions -> trial_arms -> clinical_trials. This indirection supports non-drug interventions (procedures, devices, radiation) and avoids a direct coupling that would not generalize.

Two materialized views sit on top of the clinical trials domain.

mv_drug_trials — Pre-computed drug-to-trial bridge with a derived role. One row per (drug, trial, role) triple. The role is computed from arm type and the is_investigational flag: investigational, combination_partner, active_comparator, placebo_comparator, sham_comparator, or other. This view eliminates the four-table join needed to answer “which trials is drug X in?”

mv_efficacy_evidence — Unified cross-source endpoint data. UNIONs trial_arm_results (source_type = trial) with publication_outcomes (source_type = publication) into a single queryable surface. Denormalizes nct_id for zero-join filtering. Indexed on endpoint_name, clinical_trial_id, and publication_id.

Find all phase III trials for a specific disease with their primary endpoints:

SELECT
ct.nct_id,
ct.brief_title,
ct.overall_status,
to.title AS endpoint,
to.time_frame
FROM clinical_trials ct
JOIN trial_diseases td ON td.clinical_trial_id = ct.id
JOIN diseases d ON d.id = td.disease_id
JOIN trial_outcomes to ON to.clinical_trial_id = ct.id
WHERE d.name = 'Non-Small Cell Lung Cancer'
AND ct.phase = 'phase3'
AND to.outcome_type = 'primary'
ORDER BY ct.start_date DESC;

Find trials accepting treatment-naive metastatic patients with a specific biomarker:

SELECT
ct.nct_id,
ct.brief_title,
te.treatment_setting,
teb.biomarker_name,
teb.required_value
FROM clinical_trials ct
JOIN trial_eligibilities te ON te.clinical_trial_id = ct.id
JOIN trial_eligibility_biomarkers teb ON teb.trial_eligibility_id = te.id
WHERE te.extents @> '["metastatic"]'
AND te.statuses @> '["treatment_naive"]'
AND teb.biomarker_name = 'PD-L1'
AND teb.is_inclusion = true
AND ct.overall_status = 'recruiting';

Compare efficacy data across trials and publications for a given endpoint:

SELECT
ee.source_type,
ee.nct_id,
ee.endpoint_name,
ee.measure_value,
ee.hazard_ratio,
ee.confidence_interval,
ee.p_value,
ee.median_follow_up
FROM mv_efficacy_evidence ee
WHERE ee.endpoint_name = 'PFS'
AND ee.endpoint_type = 'primary'
AND ee.clinical_trial_id IN (
SELECT td.clinical_trial_id
FROM trial_diseases td
JOIN diseases d ON d.id = td.disease_id
WHERE d.name = 'Non-Small Cell Lung Cancer'
)
ORDER BY ee.source_type, ee.hazard_ratio;