Skip to content

Schema Overview

The bioloupe-data schema contains 59 tables across 15 domains, 5 materialized views, 1 abstract base entity, and 1 embeddable. Every table carries COMMENT ON metadata so that AI agents can discover purpose and constraints without documentation. The materialized view layer provides zero-JOIN surfaces for text-to-SQL workloads — an agent can answer most pipeline questions without writing a single JOIN.

The following diagram shows the top-level relationships between major entities. Not every table is shown — only the 18 that define the “shape” of the graph.

erDiagram
    drugs ||--o{ drug_ownerships : "owned by"
    drugs ||--o{ drug_target_actions : "acts on"
    drugs ||--o{ drug_approvals : "approved as"
    drugs ||--o{ brand_drugs : "marketed as"
    drugs ||--o{ drug_milestones : "milestones"
    drugs }o--o| technologies : "modality"
    drugs }o--o| drugs : "parent / biosimilar"

    drug_target_actions }o--|| targets : "targets"
    drug_ownerships }o--|| organisations : "company"
    organisations }o--o| organisations : "subsidiary of"
    organisations ||--o{ deal_participants : "participates in"
    deal_participants }o--|| deals : "deal"
    deals ||--o{ deal_drugs : "involves"
    deal_drugs }o--|| drugs : "drug"

    drug_approvals ||--o{ indications : "indicated for"
    indications }o--|| diseases : "treats"
    diseases }o--o| therapeutic_areas : "area"
    diseases ||--o{ disease_closures : "ontology"
    diseases ||--o{ disease_biomarkers : "characterized by"
    disease_biomarkers }o--|| biomarkers : "biomarker"

    clinical_trials ||--o{ trial_arms : "arms"
    trial_arms ||--o{ trial_arm_interventions : "interventions"
    trial_arm_interventions }o--|| interventions : "intervention"
    clinical_trials ||--o{ trial_diseases : "studies"
    trial_diseases }o--|| diseases : "disease"
    clinical_trials ||--o{ trial_sponsors : "sponsored by"
    trial_sponsors }o--|| organisations : "sponsor"

    publications }o--o| clinical_trials : "linked trial"
    publications ||--o{ publication_interventions : "interventions"
    publication_interventions }o--|| drugs : "drug"

    news ||--o{ news_drug_mentions : "mentions drug"
    news ||--o{ news_disease_mentions : "mentions disease"
    news ||--o{ news_org_mentions : "mentions org"
    news ||--o{ news_trial_mentions : "mentions trial"
DomainTablesPurposeKey Entity
Drugs8Drug lifecycle, ownership, approvals, targetsdrugs
Clinical Trials14Trial structure, arms, results, eligibilityclinical_trials
Diseases4Disease ontology, epidemiology, biomarkersdiseases
Targets1Molecular drug targetstargets
Technologies1Drug modalities and platformstechnologies
Biomarkers1Molecular biomarkersbiomarkers
Organisations4Companies, deals, ownershiporganisations
Regimens3Treatment protocols, interventionsregimens
Indications4Approved uses, therapeutic approachesindications
Guidelines1Standard-of-care treatment guidelinesguidelines
Publications4Scientific literature, outcomespublications
News7Press releases, entity mentionsnews
Lookups6Reference data (countries, endpoints, MOAs)countries
Audit1Schema change trackingaudit_logs
Views5 matviewsPre-computed intelligencemv_drug_pipeline

Every table extends a single abstract base entity that provides three columns and one constraint:

ColumnTypeDefaultPurpose
iduuidgen_random_uuid()Primary key
created_attimestamptznow()Row creation timestamp
updated_attimestamptznow()Row last-update timestamp

A chk_dates CHECK constraint enforces updated_at >= created_at on every table.

Most entities embed a provenance block directly into the table (no prefix). These four columns track where a record came from and how much to trust it:

ColumnTypePurpose
data_sourcetextSystem of record: bioloupe, clinicaltrials_gov, fda, ema, pubmed, nci, manual, llm
source_referencetextExternal ID or URL in the source system
confidencenumeric(3,2)Extraction confidence score 0.00—1.00. Only set for LLM-extracted data. CHECK: 0 <= confidence <= 1
last_verified_attimestamptzWhen the record was last validated against its source
  • Tables: plural snake_case (e.g., drug_target_actions, trial_arm_interventions)
  • Primary keys: id uuid on every table
  • Foreign keys: {referenced_table_singular}_id uuid (e.g., drug_id, disease_id, clinical_trial_id)
  • Timestamps: timestamptz for all date/time columns — no timestamp without time zone anywhere in the schema
  • JSON columns: jsonb for structured data; arrays stored as JSONB with explicit defaults ([])
  • Join tables: {table_a}_{table_b} naming pattern

Lookup tables are seeded with curated reference data:

TableRowsExamples
countries197ISO 3166-1 country records
endpoints39Overall Survival, PFS, ORR, DFS, CR
modes_of_action35Inhibitor, Agonist, Antagonist, Modulator
therapeutic_areas18Oncology, Hematology, Immunology, Neurology

Five materialized views provide pre-joined, denormalized surfaces for AI agent queries. All views refresh via a single SELECT refresh_all_matviews() call that resolves dependency order automatically using CONCURRENTLY for zero-downtime refreshes.

ViewPurpose
mv_drug_pipelineFull drug profile with targets, owners, and highest phase. Zero JOINs for pipeline queries.
mv_development_programsHighest development phase per drug-disease pair. Core intelligence for pipeline analysis.
mv_competitive_landscapeDrug-disease-phase matrix with technology and owner metadata. Built on mv_development_programs.
mv_drug_trialsDrug-to-trial bridge with derived role (investigational, comparator, combination partner).
mv_efficacy_evidenceUnified endpoint data from trials and publications in a single queryable surface.

For full view definitions and refresh strategies, see Materialized Views.

The schema enforces data integrity through three categories of constraints:

CategoryCountExamples
CHECK constraints66chk_dates (updated_at >= created_at) on every table, enrollment >= 0, confidence BETWEEN 0 AND 1, completion_date >= start_date
Unique constraints37drugs.bloupe_id, clinical_trials.nct_id, diseases.ncit_code, publications.doi, publications.pmid, news.url
Foreign key constraints2316 use ON DELETE CASCADE (child rows deleted with parent), 7 use ON DELETE SET NULL (reference cleared but row preserved)

CASCADE deletes flow through bridge and child tables (e.g., deleting a drug_approval cascades to its indications). SET NULL is used for optional references where the child row remains meaningful without the parent (e.g., drugs.technology_id, drugs.parent_id, organisations.subsidiary_of_id).