Skip to content

Lookups

Lookup tables hold reference data that is seeded once and rarely changes. They provide controlled vocabularies, geographic codes, and ontology mappings that other domains reference through foreign keys. The lookups domain contains six reference entities plus one audit entity that tracks changes across the entire database.

198 countries with ISO 3166-1 codes, organized into five geographic regions.

ColumnTypePurpose
nametext (unique)Full country name (e.g., “United States”, “Germany”)
alpha2text (unique)ISO 3166-1 alpha-2 code (e.g., US, DE, JP)
alpha3text (unique)ISO 3166-1 alpha-3 code (e.g., USA, DEU, JPN)
regiontextGeographic region: africa, americas, asia_pacific, europe, middle_east

All three name/code columns carry unique constraints. Region is nullable but populated for every seeded row.

39 clinical trial endpoints grouped into four categories.

ColumnTypePurpose
nametext (unique)Full endpoint name (e.g., “Overall Survival”, “Progression-Free Survival”)
abbreviationtextCommon abbreviation (e.g., OS, PFS, ORR, DOR, CR, DCR)
descriptiontextDefinition of what this endpoint measures
categorytextefficacy, safety, pharmacokinetic, quality_of_life

35 drug mechanisms describing how a compound interacts with its molecular target.

ColumnTypePurpose
nametext (unique)Mechanism name (e.g., “Inhibitor”, “Antagonist”, “Agonist”, “Degrader”)
descriptiontextMechanistic description of the mode of action

Entries range from classical pharmacology (Inhibitor, Agonist, Antagonist) through targeted modalities (Antibody-Drug Conjugate, CAR-T Cell Therapy, Bispecific Engagement) to epigenetic and gene-level interventions (Histone Deacetylase Inhibition, RNA Interference, Gene Therapy). Referenced by drug_target_actions to link drugs to targets with a specific mechanism.

18 broad disease categories used to classify drugs and development programs.

ColumnTypePurpose
nametext (unique)Therapeutic area name (e.g., oncology, malignant_hematology, neurology)
descriptiontextDescription of the therapeutic area

Full list: oncology, malignant_hematology, non_malignant_hematology, cardiology, neurology, immunology, respiratory, infectious_disease, metabolic, dermatology, ophthalmology, rare_disease, gastroenterology, rheumatology, endocrinology, nephrology, pulmonology, psychiatry.

NCI Thesaurus concepts for standardized biomedical coding. Unlike the fixed-size lookup tables above, this table grows as new concepts are imported.

ColumnTypePurpose
ncit_codetext (unique)NCI Thesaurus code (e.g., C1234). Natural key.
nametextConcept preferred name
semantic_typesjsonbArray of NCI Thesaurus semantic type labels
descriptiontextConcept definition text

Maps internal Bioloupe entities to external ontology codes for interoperability. Supports cross-referencing drugs, diseases, targets, biomarkers, endpoints, and modes of action against ten external vocabularies.

ColumnTypePurpose
entity_typetextType of entity being mapped: drug, disease, target, biomarker, endpoint, mode_of_action
entity_iduuidUUID of the entity in this database
vocabularytextExternal vocabulary: omop, icd10, mesh, snomed, rxnorm, meddra, atc, unii, chembl, drugbank
codetextCode/identifier in the external vocabulary
preferred_termtextPreferred term as it appears in the external vocabulary

A unique constraint on (entity_type, entity_id, vocabulary, code) prevents duplicate mappings. Composite indexes on (entity_type, entity_id) and (vocabulary, code) support both directions of lookup: “what external codes does this entity have?” and “which entity does this external code refer to?”

Lookup tables are populated by idempotent seeders that use upsert logic. Re-running a seeder never creates duplicates — it matches on the natural key (typically name) and updates any changed fields.

TableSeeded rows
countries198
endpoints39
modes_of_action35
therapeutic_areas18

ncit_concepts and concept_mappings are populated by ETL pipelines rather than seeders, and grow over time.

DecisionWhy
therapeutic_areas as a lookup table with FK constraintsReplaced a free-text JSONB array on drugs. A dedicated table enforces a controlled vocabulary, prevents typos, and makes the set of valid values discoverable without scanning data.
Idempotent upsert seedingSeeders can run repeatedly during development, CI, and production bootstrapping without risk of duplicates or constraint violations. The natural key (usually name) is the match target.
concept_mappings as a polymorphic bridge tableA single table handles cross-references for six entity types and ten vocabularies. Adding a new entity type or vocabulary requires no schema change.

Tracks every insert, update, and delete across the database. Populated by PostgreSQL triggers, not application code, so changes from any source (API, ETL pipeline, manual SQL) are captured.

ColumnTypePurpose
idbigint (identity)Auto-incrementing primary key
entity_typetextTable name that was modified (e.g., drugs, organisations)
entity_idtextPrimary key of the modified row (text to support both UUID and bigint)
actiontextinsert, update, delete
changed_fieldsjsonbArray of column names that changed (update only)
old_valuesjsonbPrevious column values (update and delete)
new_valuesjsonbNew column values (insert and update)
changed_bytextUser or system identifier
changed_attimestamptzEvent timestamp (uses clock_timestamp() for actual wall-clock time)
application_nametextSource application (e.g., bioloupe-api, etl-pipeline)
client_addrtextClient IP address

Indexes on (entity_type, entity_id), changed_at, and action support common audit queries: “show me the history of this row,” “what changed in the last hour,” and “how many deletes happened today.”

Which therapeutic areas are available?

SELECT name, description
FROM therapeutic_areas
ORDER BY name;

Audit trail for a specific drug

SELECT action, changed_fields, old_values, new_values,
changed_by, changed_at, application_name
FROM audit_log
WHERE entity_type = 'drugs'
AND entity_id = '550e8400-e29b-41d4-a716-446655440000'
ORDER BY changed_at DESC;