Lookups

Lookup tables hold reference data that is seeded once and rarely changes. They provide controlled vocabularies, geographic codes, and ontology mappings that other domains reference through foreign keys. The lookups domain contains six reference entities plus one audit entity that tracks changes across the entire database.

Reference tables

countries

198 countries with ISO 3166-1 codes, organized into five geographic regions.

Column	Type	Purpose
`name`	text (unique)	Full country name (e.g., “United States”, “Germany”)
`alpha2`	text (unique)	ISO 3166-1 alpha-2 code (e.g., US, DE, JP)
`alpha3`	text (unique)	ISO 3166-1 alpha-3 code (e.g., USA, DEU, JPN)
`region`	text	Geographic region: `africa`, `americas`, `asia_pacific`, `europe`, `middle_east`

All three name/code columns carry unique constraints. Region is nullable but populated for every seeded row.

endpoints

39 clinical trial endpoints grouped into four categories.

Column	Type	Purpose
`name`	text (unique)	Full endpoint name (e.g., “Overall Survival”, “Progression-Free Survival”)
`abbreviation`	text	Common abbreviation (e.g., OS, PFS, ORR, DOR, CR, DCR)
`description`	text	Definition of what this endpoint measures
`category`	text	`efficacy`, `safety`, `pharmacokinetic`, `quality_of_life`

modes_of_action

35 drug mechanisms describing how a compound interacts with its molecular target.

Column	Type	Purpose
`name`	text (unique)	Mechanism name (e.g., “Inhibitor”, “Antagonist”, “Agonist”, “Degrader”)
`description`	text	Mechanistic description of the mode of action

Entries range from classical pharmacology (Inhibitor, Agonist, Antagonist) through targeted modalities (Antibody-Drug Conjugate, CAR-T Cell Therapy, Bispecific Engagement) to epigenetic and gene-level interventions (Histone Deacetylase Inhibition, RNA Interference, Gene Therapy). Referenced by drug_target_actions to link drugs to targets with a specific mechanism.

therapeutic_areas

18 broad disease categories used to classify drugs and development programs.

Column	Type	Purpose
`name`	text (unique)	Therapeutic area name (e.g., `oncology`, `malignant_hematology`, `neurology`)
`description`	text	Description of the therapeutic area

Full list: oncology, malignant_hematology, non_malignant_hematology, cardiology, neurology, immunology, respiratory, infectious_disease, metabolic, dermatology, ophthalmology, rare_disease, gastroenterology, rheumatology, endocrinology, nephrology, pulmonology, psychiatry.

ncit_concepts

NCI Thesaurus concepts for standardized biomedical coding. Unlike the fixed-size lookup tables above, this table grows as new concepts are imported.

Column	Type	Purpose
`ncit_code`	text (unique)	NCI Thesaurus code (e.g., C1234). Natural key.
`name`	text	Concept preferred name
`semantic_types`	jsonb	Array of NCI Thesaurus semantic type labels
`description`	text	Concept definition text

concept_mappings

Maps internal Bioloupe entities to external ontology codes for interoperability. Supports cross-referencing drugs, diseases, targets, biomarkers, endpoints, and modes of action against ten external vocabularies.

Column	Type	Purpose
`entity_type`	text	Type of entity being mapped: `drug`, `disease`, `target`, `biomarker`, `endpoint`, `mode_of_action`
`entity_id`	uuid	UUID of the entity in this database
`vocabulary`	text	External vocabulary: `omop`, `icd10`, `mesh`, `snomed`, `rxnorm`, `meddra`, `atc`, `unii`, `chembl`, `drugbank`
`code`	text	Code/identifier in the external vocabulary
`preferred_term`	text	Preferred term as it appears in the external vocabulary

A unique constraint on (entity_type, entity_id, vocabulary, code) prevents duplicate mappings. Composite indexes on (entity_type, entity_id) and (vocabulary, code) support both directions of lookup: “what external codes does this entity have?” and “which entity does this external code refer to?”

Seeding

Lookup tables are populated by idempotent seeders that use upsert logic. Re-running a seeder never creates duplicates — it matches on the natural key (typically name) and updates any changed fields.

Table	Seeded rows
`countries`	198
`endpoints`	39
`modes_of_action`	35
`therapeutic_areas`	18

ncit_concepts and concept_mappings are populated by ETL pipelines rather than seeders, and grow over time.

Design decisions

Decision	Why
`therapeutic_areas` as a lookup table with FK constraints	Replaced a free-text JSONB array on `drugs`. A dedicated table enforces a controlled vocabulary, prevents typos, and makes the set of valid values discoverable without scanning data.
Idempotent upsert seeding	Seeders can run repeatedly during development, CI, and production bootstrapping without risk of duplicates or constraint violations. The natural key (usually `name`) is the match target.
`concept_mappings` as a polymorphic bridge table	A single table handles cross-references for six entity types and ten vocabularies. Adding a new entity type or vocabulary requires no schema change.

Audit

audit_log

Tracks every insert, update, and delete across the database. Populated by PostgreSQL triggers, not application code, so changes from any source (API, ETL pipeline, manual SQL) are captured.

Column	Type	Purpose
`id`	bigint (identity)	Auto-incrementing primary key
`entity_type`	text	Table name that was modified (e.g., `drugs`, `organisations`)
`entity_id`	text	Primary key of the modified row (text to support both UUID and bigint)
`action`	text	`insert`, `update`, `delete`
`changed_fields`	jsonb	Array of column names that changed (update only)
`old_values`	jsonb	Previous column values (update and delete)
`new_values`	jsonb	New column values (insert and update)
`changed_by`	text	User or system identifier
`changed_at`	timestamptz	Event timestamp (uses `clock_timestamp()` for actual wall-clock time)
`application_name`	text	Source application (e.g., `bioloupe-api`, `etl-pipeline`)
`client_addr`	text	Client IP address

Indexes on (entity_type, entity_id), changed_at, and action support common audit queries: “show me the history of this row,” “what changed in the last hour,” and “how many deletes happened today.”

Example queries

Which therapeutic areas are available?

SELECT name, description
FROM therapeutic_areas
ORDER BY name;

Audit trail for a specific drug

SELECT action, changed_fields, old_values, new_values,
       changed_by, changed_at, application_name
FROM audit_log
WHERE entity_type = 'drugs'
  AND entity_id = '550e8400-e29b-41d4-a716-446655440000'
ORDER BY changed_at DESC;