Clinical Trials
The clinical trials domain is the largest in bioloupe-data, spanning 14 entities across five functional areas. It models the full lifecycle of a clinical trial — from study design and eligibility criteria through enrollment, endpoint definitions, and structured results. The root entity is clinical_trials, and every child table cascades from it.
Entities
Section titled “Entities”erDiagram
clinical_trials ||--o{ trial_arms : "has"
clinical_trials ||--o{ trial_diseases : "studies"
clinical_trials ||--o{ trial_sponsors : "funded by"
clinical_trials ||--o{ trial_locations : "conducted at"
clinical_trials ||--o{ trial_investigators : "led by"
clinical_trials ||--o{ trial_outcomes : "measures"
clinical_trials ||--o{ trial_arm_results : "reports"
clinical_trials ||--o{ trial_adverse_events : "reports"
clinical_trials ||--o{ trial_subgroups : "analyzes"
clinical_trials ||--o{ trial_eligibilities : "requires"
trial_arms ||--o{ trial_arm_interventions : "administers"
trial_arms ||--o{ trial_arm_results : "produces"
trial_arm_interventions }o--o| interventions : "links to"
trial_diseases }o--|| diseases : "references"
trial_diseases ||--o{ trial_disease_biomarkers : "requires"
trial_disease_biomarkers }o--o| biomarkers : "references"
trial_outcomes ||--o{ trial_arm_results : "measured by"
trial_eligibilities }o--|| diseases : "scoped to"
trial_eligibilities ||--o{ trial_eligibility_biomarkers : "requires"
trial_eligibility_biomarkers }o--o| biomarkers : "references"
trial_sponsors }o--o| organisations : "is"
trial_subgroups }o--o| diseases : "scoped to"
Trial structure
Section titled “Trial structure”Three tables form the core design hierarchy.
clinical_trials — One row per registered trial. The natural key is nct_id. Key columns include phase (early_phase1 through phase4), overall_status (recruiting, completed, terminated, etc.), study_type (interventional, observational, expanded_access), and study design fields: intervention_model, allocation, masking, and primary_purpose. Enrollment is stored as a count with an enrollment_type flag (actual vs. anticipated). Free-text inclusion/exclusion criteria live here alongside age, sex, and healthy-volunteer flags. The table also carries a vector embedding column for semantic search and full provenance tracking.
trial_arms — Treatment arms within a trial. Each arm has a title (e.g., “Pembrolizumab 200 mg Q3W”), an arm_type (experimental, active_comparator, placebo_comparator, sham_comparator, no_intervention), and a sequence_order for display.
trial_arm_interventions — Links interventions to arms. Each row represents one drug, procedure, or device administered in an arm. Stores component_type (drug, biological, procedure, radiation, device, behavioral, diagnostic_test), structured dosing as JSONB (dose, route, duration, schedule, frequency), and an is_investigational flag that distinguishes the experimental agent from standard-of-care components. The FK to interventions is nullable — a fallback name column covers unmatched interventions. The pair (trial_arm, intervention) is unique.
Trial-disease link
Section titled “Trial-disease link”trial_diseases — Clean bridge table linking trials to the diseases they study. Derived from the ClinicalTrials.gov conditions field. This table carries no JSONB or faceted attributes — disease facets belong in trial_eligibilities.
trial_disease_biomarkers — Biomarker requirements per trial-disease pair. For example, a trial may require PD-L1 >= 50% for NSCLC. Each row stores a name (always populated as a fallback), a value threshold (e.g., “positive”, ”>= 50%”, “mutant”), and an optional FK to biomarkers.
Sponsors, locations, investigators
Section titled “Sponsors, locations, investigators”trial_sponsors — Lead sponsors and collaborators. Each row has a name (denormalized, always populated), sponsor_type (lead or collaborator), agency_class (industry, nih, fed, other), and an optional FK to organisations.
trial_locations — Geographic sites where a trial is conducted. Stores facility_name, city, state, country, and country_code (ISO 3166-1 alpha-2). A composite index on (country_code, clinical_trial) supports geographic filtering.
trial_investigators — Principal investigators and study directors. Stores name, role (principal_investigator, sub_investigator, study_director, study_chair), and affiliation.
Results
Section titled “Results”Four tables capture structured trial results data.
trial_outcomes — Endpoint definitions. Each outcome has an outcome_type (primary, secondary, other), a title (e.g., “Overall Survival”, “Progression-Free Survival”), a description, and a time_frame (e.g., “Up to 24 months”).
trial_arm_results — Structured, queryable efficacy data per arm per endpoint. This is the most analytically valuable table in the domain. Key columns:
| Column | Purpose |
|---|---|
endpoint_name | Abbreviation: OS, PFS, ORR, DOR, CR, DCR |
endpoint_type | primary, secondary, exploratory |
measure_value | Result as text (e.g., “12.3 months”, “45.2%“) |
p_value | Statistical significance (e.g., “<0.001”) |
hazard_ratio | HR < 1.0 favors experimental arm |
confidence_interval | e.g., “95% CI 0.43-0.78” |
median_follow_up | Duration of follow-up |
subgroup_description | Populated for subgroup analyses |
The trial_arm_id FK is nullable because some results are cross-arm comparisons. The optional trial_outcome_id FK links back to the endpoint definition. Carries full provenance.
trial_adverse_events — Safety data categorized by MedDRA System Organ Class. Each row records a term (e.g., neutropenia, fatigue), event_type (serious, other, deaths), arm_description, and counts: subjects_affected, subjects_at_risk, and event_count. A check constraint enforces subjects_affected <= subjects_at_risk.
trial_subgroups — Subgroup analyses defined in trial results or publications. Each row captures a subgroup_type dimension (biomarker, age, sex, prior_therapy, disease_stage, race, geography, performance_status), a subgroup_value (e.g., “PD-L1 >= 50%”, “age >= 65”), and an optional data_cutoff_date. An optional FK to diseases scopes disease-specific subgroups.
Eligibility
Section titled “Eligibility”Two tables model structured, queryable eligibility criteria — the normalized counterpart to the free-text criteria on clinical_trials.
trial_eligibilities — One row per (trial, disease) pair. Unique constraint enforced. Captures faceted criteria as JSONB arrays and scalar fields:
| Column | Example values |
|---|---|
extents | metastatic, locally_advanced, unresectable, brain_metastasis |
statuses | newly_diagnosed, relapsed, refractory, treatment_naive |
stages | III, IIIB, IV, T2N1M0 (from AJCC/TNM/FIGO/Ann Arbor) |
staging_system | AJCC, TNM, FIGO, Ann_Arbor, Rai, Binet, INSS |
subtypes | triple-negative, HER2-positive, Ph-positive |
risks | high, intermediate, low, favourable, unfavourable |
treatment_setting | first_line, second_line, adjuvant, neoadjuvant, maintenance, salvage |
min_prior_lines / max_prior_lines | Integer range of allowed prior therapy lines |
A check constraint enforces max_prior_lines >= min_prior_lines. Carries full provenance.
trial_eligibility_biomarkers — Biomarker-based inclusion and exclusion criteria per eligibility record. Each row stores biomarker_name, required_value (e.g., “positive”, ”>= 50%”), a numeric_threshold (decimal), a comparison operator (gte, lte, gt, lt, eq, range), and an is_inclusion flag (TRUE = patient must have this biomarker status; FALSE = patient must NOT). Optional FK to biomarkers.
Design decisions
Section titled “Design decisions”trial_diseases is a clean bridge table. It carries no JSONB columns or disease facets. Structured disease-level criteria (stage, subtype, extent, treatment setting, biomarker requirements) live in trial_eligibilities. This separation keeps the many-to-many link simple and pushes domain complexity into a purpose-built table with proper constraints.
trial_eligibilities models structured inclusion/exclusion criteria. Free-text criteria on clinical_trials are useful for display but cannot be queried. The eligibility tables normalize these into typed, indexed fields so you can answer questions like “find all phase III trials accepting treatment-naive metastatic NSCLC patients with PD-L1 >= 50%.”
Drugs connect through interventions, not directly. There is no drug_id on any clinical trials table. The path from a drug to its trials runs through the regimens domain: drugs -> interventions -> trial_arm_interventions -> trial_arms -> clinical_trials. This indirection supports non-drug interventions (procedures, devices, radiation) and avoids a direct coupling that would not generalize.
Materialized views
Section titled “Materialized views”Two materialized views sit on top of the clinical trials domain.
mv_drug_trials — Pre-computed drug-to-trial bridge with a derived role. One row per (drug, trial, role) triple. The role is computed from arm type and the is_investigational flag: investigational, combination_partner, active_comparator, placebo_comparator, sham_comparator, or other. This view eliminates the four-table join needed to answer “which trials is drug X in?”
mv_efficacy_evidence — Unified cross-source endpoint data. UNIONs trial_arm_results (source_type = trial) with publication_outcomes (source_type = publication) into a single queryable surface. Denormalizes nct_id for zero-join filtering. Indexed on endpoint_name, clinical_trial_id, and publication_id.
Example queries
Section titled “Example queries”Find all phase III trials for a specific disease with their primary endpoints:
SELECT ct.nct_id, ct.brief_title, ct.overall_status, to.title AS endpoint, to.time_frameFROM clinical_trials ctJOIN trial_diseases td ON td.clinical_trial_id = ct.idJOIN diseases d ON d.id = td.disease_idJOIN trial_outcomes to ON to.clinical_trial_id = ct.idWHERE d.name = 'Non-Small Cell Lung Cancer' AND ct.phase = 'phase3' AND to.outcome_type = 'primary'ORDER BY ct.start_date DESC;Find trials accepting treatment-naive metastatic patients with a specific biomarker:
SELECT ct.nct_id, ct.brief_title, te.treatment_setting, teb.biomarker_name, teb.required_valueFROM clinical_trials ctJOIN trial_eligibilities te ON te.clinical_trial_id = ct.idJOIN trial_eligibility_biomarkers teb ON teb.trial_eligibility_id = te.idWHERE te.extents @> '["metastatic"]' AND te.statuses @> '["treatment_naive"]' AND teb.biomarker_name = 'PD-L1' AND teb.is_inclusion = true AND ct.overall_status = 'recruiting';Compare efficacy data across trials and publications for a given endpoint:
SELECT ee.source_type, ee.nct_id, ee.endpoint_name, ee.measure_value, ee.hazard_ratio, ee.confidence_interval, ee.p_value, ee.median_follow_upFROM mv_efficacy_evidence eeWHERE ee.endpoint_name = 'PFS' AND ee.endpoint_type = 'primary' AND ee.clinical_trial_id IN ( SELECT td.clinical_trial_id FROM trial_diseases td JOIN diseases d ON d.id = td.disease_id WHERE d.name = 'Non-Small Cell Lung Cancer' )ORDER BY ee.source_type, ee.hazard_ratio;