Schema Overview
The bioloupe-data schema contains 59 tables across 15 domains, 5 materialized views, 1 abstract base entity, and 1 embeddable. Every table carries COMMENT ON metadata so that AI agents can discover purpose and constraints without documentation. The materialized view layer provides zero-JOIN surfaces for text-to-SQL workloads — an agent can answer most pipeline questions without writing a single JOIN.
Schema ERD
Section titled “Schema ERD”The following diagram shows the top-level relationships between major entities. Not every table is shown — only the 18 that define the “shape” of the graph.
erDiagram
drugs ||--o{ drug_ownerships : "owned by"
drugs ||--o{ drug_target_actions : "acts on"
drugs ||--o{ drug_approvals : "approved as"
drugs ||--o{ brand_drugs : "marketed as"
drugs ||--o{ drug_milestones : "milestones"
drugs }o--o| technologies : "modality"
drugs }o--o| drugs : "parent / biosimilar"
drug_target_actions }o--|| targets : "targets"
drug_ownerships }o--|| organisations : "company"
organisations }o--o| organisations : "subsidiary of"
organisations ||--o{ deal_participants : "participates in"
deal_participants }o--|| deals : "deal"
deals ||--o{ deal_drugs : "involves"
deal_drugs }o--|| drugs : "drug"
drug_approvals ||--o{ indications : "indicated for"
indications }o--|| diseases : "treats"
diseases }o--o| therapeutic_areas : "area"
diseases ||--o{ disease_closures : "ontology"
diseases ||--o{ disease_biomarkers : "characterized by"
disease_biomarkers }o--|| biomarkers : "biomarker"
clinical_trials ||--o{ trial_arms : "arms"
trial_arms ||--o{ trial_arm_interventions : "interventions"
trial_arm_interventions }o--|| interventions : "intervention"
clinical_trials ||--o{ trial_diseases : "studies"
trial_diseases }o--|| diseases : "disease"
clinical_trials ||--o{ trial_sponsors : "sponsored by"
trial_sponsors }o--|| organisations : "sponsor"
publications }o--o| clinical_trials : "linked trial"
publications ||--o{ publication_interventions : "interventions"
publication_interventions }o--|| drugs : "drug"
news ||--o{ news_drug_mentions : "mentions drug"
news ||--o{ news_disease_mentions : "mentions disease"
news ||--o{ news_org_mentions : "mentions org"
news ||--o{ news_trial_mentions : "mentions trial"
Domain inventory
Section titled “Domain inventory”| Domain | Tables | Purpose | Key Entity |
|---|---|---|---|
| Drugs | 8 | Drug lifecycle, ownership, approvals, targets | drugs |
| Clinical Trials | 14 | Trial structure, arms, results, eligibility | clinical_trials |
| Diseases | 4 | Disease ontology, epidemiology, biomarkers | diseases |
| Targets | 1 | Molecular drug targets | targets |
| Technologies | 1 | Drug modalities and platforms | technologies |
| Biomarkers | 1 | Molecular biomarkers | biomarkers |
| Organisations | 4 | Companies, deals, ownership | organisations |
| Regimens | 3 | Treatment protocols, interventions | regimens |
| Indications | 4 | Approved uses, therapeutic approaches | indications |
| Guidelines | 1 | Standard-of-care treatment guidelines | guidelines |
| Publications | 4 | Scientific literature, outcomes | publications |
| News | 7 | Press releases, entity mentions | news |
| Lookups | 6 | Reference data (countries, endpoints, MOAs) | countries |
| Audit | 1 | Schema change tracking | audit_logs |
| Views | 5 matviews | Pre-computed intelligence | mv_drug_pipeline |
Cross-cutting patterns
Section titled “Cross-cutting patterns”BaseEntity
Section titled “BaseEntity”Every table extends a single abstract base entity that provides three columns and one constraint:
| Column | Type | Default | Purpose |
|---|---|---|---|
id | uuid | gen_random_uuid() | Primary key |
created_at | timestamptz | now() | Row creation timestamp |
updated_at | timestamptz | now() | Row last-update timestamp |
A chk_dates CHECK constraint enforces updated_at >= created_at on every table.
Provenance embeddable
Section titled “Provenance embeddable”Most entities embed a provenance block directly into the table (no prefix). These four columns track where a record came from and how much to trust it:
| Column | Type | Purpose |
|---|---|---|
data_source | text | System of record: bioloupe, clinicaltrials_gov, fda, ema, pubmed, nci, manual, llm |
source_reference | text | External ID or URL in the source system |
confidence | numeric(3,2) | Extraction confidence score 0.00—1.00. Only set for LLM-extracted data. CHECK: 0 <= confidence <= 1 |
last_verified_at | timestamptz | When the record was last validated against its source |
Naming conventions
Section titled “Naming conventions”- Tables: plural
snake_case(e.g.,drug_target_actions,trial_arm_interventions) - Primary keys:
id uuidon every table - Foreign keys:
{referenced_table_singular}_id uuid(e.g.,drug_id,disease_id,clinical_trial_id) - Timestamps:
timestamptzfor all date/time columns — notimestamp without time zoneanywhere in the schema - JSON columns:
jsonbfor structured data; arrays stored as JSONB with explicit defaults ([]) - Join tables:
{table_a}_{table_b}naming pattern
Reference data
Section titled “Reference data”Lookup tables are seeded with curated reference data:
| Table | Rows | Examples |
|---|---|---|
countries | 197 | ISO 3166-1 country records |
endpoints | 39 | Overall Survival, PFS, ORR, DFS, CR |
modes_of_action | 35 | Inhibitor, Agonist, Antagonist, Modulator |
therapeutic_areas | 18 | Oncology, Hematology, Immunology, Neurology |
Materialized view layer
Section titled “Materialized view layer”Five materialized views provide pre-joined, denormalized surfaces for AI agent queries. All views refresh via a single SELECT refresh_all_matviews() call that resolves dependency order automatically using CONCURRENTLY for zero-downtime refreshes.
| View | Purpose |
|---|---|
mv_drug_pipeline | Full drug profile with targets, owners, and highest phase. Zero JOINs for pipeline queries. |
mv_development_programs | Highest development phase per drug-disease pair. Core intelligence for pipeline analysis. |
mv_competitive_landscape | Drug-disease-phase matrix with technology and owner metadata. Built on mv_development_programs. |
mv_drug_trials | Drug-to-trial bridge with derived role (investigational, comparator, combination partner). |
mv_efficacy_evidence | Unified endpoint data from trials and publications in a single queryable surface. |
For full view definitions and refresh strategies, see Materialized Views.
Constraint inventory
Section titled “Constraint inventory”The schema enforces data integrity through three categories of constraints:
| Category | Count | Examples |
|---|---|---|
| CHECK constraints | 66 | chk_dates (updated_at >= created_at) on every table, enrollment >= 0, confidence BETWEEN 0 AND 1, completion_date >= start_date |
| Unique constraints | 37 | drugs.bloupe_id, clinical_trials.nct_id, diseases.ncit_code, publications.doi, publications.pmid, news.url |
| Foreign key constraints | 23 | 16 use ON DELETE CASCADE (child rows deleted with parent), 7 use ON DELETE SET NULL (reference cleared but row preserved) |
CASCADE deletes flow through bridge and child tables (e.g., deleting a drug_approval cascades to its indications). SET NULL is used for optional references where the child row remains meaningful without the parent (e.g., drugs.technology_id, drugs.parent_id, organisations.subsidiary_of_id).