AI Agent Querying
This guide is for the AI agent (and humans) writing text-to-SQL queries against bioloupe-data. It covers how to discover the schema, which materialized views to prefer, common query patterns, and pitfalls to avoid.
How the agent discovers the schema
Section titled “How the agent discovers the schema”Every table and column in the database carries a COMMENT ON annotation. These comments describe:
- What the column means — e.g., “INN or common name of the drug (e.g., pembrolizumab, trastuzumab).”
- What values it takes — e.g., “Values: ‘marketed’, ‘in_development’, ‘discontinued’, ‘withdrawn’, ‘suspended’.”
- Foreign key relationships — e.g., “FK to drugs.id.”
- Denormalization notes — e.g., “Denormalized drug INN name for zero-join queries.”
The agent reads these comments to understand the schema without external documentation. To inspect them:
-- Table commentSELECT obj_description('drugs'::regclass);
-- All column comments for a tableSELECT column_name, col_description('drugs'::regclass, ordinal_position)FROM information_schema.columnsWHERE table_name = 'drugs'ORDER BY ordinal_position;Query strategy: matviews first
Section titled “Query strategy: matviews first”Five materialized views pre-compute the most common joins. Always check whether a matview answers the question before going to base tables.
flowchart TD
Q["Incoming question"] --> A{"Can a single matview answer it?"}
A -- Yes --> MV["Query that matview (zero JOINs)"]
A -- No --> B{"Does it need 2+ matviews?"}
B -- Yes --> MV2["JOIN matviews together"]
B -- No --> BT["Go to base tables"]
Matview cheat sheet
Section titled “Matview cheat sheet”| Question pattern | Use this | Key columns |
|---|---|---|
| Drug profile / overview | mv_drug_pipeline | drug_name, drug_status, drug_type, primary_target_name, target_names, current_owner_names, highest_phase_overall |
| Drug x disease phase | mv_development_programs | drug_name, disease_name, highest_phase, highest_phase_rank, phase_source, nct_ids, approval_date |
| Competitive landscape | mv_competitive_landscape | drug_name, disease_name, highest_phase, drug_type, technology_name, drug_class, current_owner_names |
| Trial involvement | mv_drug_trials | drug_id, clinical_trial_id, role |
| Efficacy data | mv_efficacy_evidence | endpoint_name, endpoint_type, measure_value, p_value, hazard_ratio, nct_id, source_type |
Phase values and sort order
Section titled “Phase values and sort order”All phase columns use the same enumeration and ranking:
| Phase value | Rank |
|---|---|
approved | 100 |
phase4 | 4 |
phase3 | 3 |
phase2_3 | 3 |
phase2 | 2 |
phase1_2 | 2 |
phase1 | 1 |
early_phase1 | 1 |
Always sort by highest_phase_rank (integer), not by highest_phase (text).
Drug trial roles
Section titled “Drug trial roles”mv_drug_trials.role classifies how the drug participates in a trial:
| Role | Meaning |
|---|---|
investigational | The drug being studied |
combination_partner | Combined with the investigational drug |
active_comparator | Control arm, active drug |
placebo_comparator | Control arm, placebo |
sham_comparator | Control arm, sham procedure |
other | None of the above |
Common query patterns
Section titled “Common query patterns”1. List all Phase 3 drugs for breast cancer with their owners
Section titled “1. List all Phase 3 drugs for breast cancer with their owners”SELECT drug_name, current_owner_names, highest_phaseFROM mv_competitive_landscapeWHERE disease_name ILIKE '%breast cancer%' AND highest_phase = 'phase3'ORDER BY drug_name;Uses mv_competitive_landscape because it pre-joins drug, disease, phase, and owner data in a single row.
2. What targets does pembrolizumab act on?
Section titled “2. What targets does pembrolizumab act on?”SELECT drug_name, target_names, primary_target_nameFROM mv_drug_pipelineWHERE drug_name ILIKE '%pembrolizumab%';mv_drug_pipeline aggregates all targets into an array, so no joins to drug_target_actions or targets are needed.
3. What clinical trials are recruiting for NSCLC?
Section titled “3. What clinical trials are recruiting for NSCLC?”SELECT ct.nct_id, ct.brief_title, ct.phase, ct.enrollmentFROM mv_development_programs dpJOIN clinical_trials ct ON ct.id = ANY( SELECT unnest(dp2.nct_ids::text[])::uuid FROM mv_development_programs dp2 WHERE dp2.disease_name ILIKE '%non-small cell lung%')WHERE ct.overall_status = 'recruiting';Alternative approach using mv_drug_trials when you need trial details with drug context:
SELECT DISTINCT ct.nct_id, ct.brief_title, ct.phase, ct.overall_statusFROM mv_drug_trials dtJOIN clinical_trials ct ON ct.id = dt.clinical_trial_idJOIN trial_diseases td ON td.clinical_trial_id = ct.idJOIN diseases dis ON dis.id = td.disease_idWHERE dis.name ILIKE '%non-small cell lung%' AND ct.overall_status = 'recruiting'ORDER BY ct.phase DESC;4. Show me recent deals involving Pfizer
Section titled “4. Show me recent deals involving Pfizer”Deals live in base tables since no matview covers them:
SELECT oh.deal_type, oh.headline, oh.total_value, oh.announcement_dateFROM organisation_histories ohJOIN organisation_history_participants ohp ON ohp.organisation_history_id = oh.idJOIN organisations o ON o.id = ohp.organisation_idWHERE o.name ILIKE '%Pfizer%'ORDER BY oh.announcement_date DESC NULLS LASTLIMIT 20;5. Compare efficacy of two drugs in melanoma
Section titled “5. Compare efficacy of two drugs in melanoma”SELECT d.name AS drug_name, ee.endpoint_name, ee.measure_value, ee.unit, ee.p_value, ee.hazard_ratio, ee.source_typeFROM mv_efficacy_evidence eeJOIN mv_drug_trials dt ON dt.clinical_trial_id = ee.clinical_trial_idJOIN drugs d ON d.id = dt.drug_idJOIN trial_diseases td ON td.clinical_trial_id = ee.clinical_trial_idJOIN diseases dis ON dis.id = td.disease_idWHERE d.name IN ('pembrolizumab', 'nivolumab') AND dis.name ILIKE '%melanoma%' AND ee.endpoint_type = 'primary'ORDER BY d.name, ee.endpoint_name;6. Which drugs were approved in the last 6 months?
Section titled “6. Which drugs were approved in the last 6 months?”SELECT drug_name, disease_name, approval_dateFROM mv_development_programsWHERE phase_source = 'approval' AND approval_date >= NOW() - INTERVAL '6 months'ORDER BY approval_date DESC;7. What biomarkers are relevant for lung cancer?
Section titled “7. What biomarkers are relevant for lung cancer?”Biomarkers live in base tables:
SELECT b.name AS biomarker, db.prevalence, db.potential_valuesFROM disease_biomarkers dbJOIN biomarkers b ON b.id = db.biomarker_idJOIN diseases dis ON dis.id = db.disease_idWHERE dis.name ILIKE '%lung cancer%'ORDER BY b.name;8. Full drug profile with pipeline status across all diseases
Section titled “8. Full drug profile with pipeline status across all diseases”SELECT p.drug_name, p.drug_status, p.drug_type, p.primary_target_name, p.current_owner_names, dp.disease_name, dp.highest_phase, dp.phase_sourceFROM mv_drug_pipeline pLEFT JOIN mv_development_programs dp ON dp.drug_id = p.drug_idWHERE p.drug_name ILIKE '%trastuzumab%'ORDER BY dp.highest_phase_rank DESC;This joins two matviews: mv_drug_pipeline for the drug profile and mv_development_programs for per-disease phase data.
Anti-patterns
Section titled “Anti-patterns”Do not query base tables when a matview exists. If you need a drug’s targets, owners, and phase, use mv_drug_pipeline instead of joining drugs, drug_target_actions, targets, drug_ownerships, organisations, and mv_development_programs yourself.
Do not JOIN 4+ base tables when a matview pre-computes the answer. The matviews exist specifically to flatten complex joins into single-table queries.
Do not use SELECT *. Always select specific columns. Matviews contain array columns (target_names, current_owner_names, nct_ids) that are expensive to serialize unnecessarily.
Do not assume drug_status is computed. It is a curated value on the drugs table, not derived from approvals or trials. Values: marketed, in_development, discontinued, withdrawn, suspended.
Do not query drugs for development phase. The drugs table has no phase column. Development phase per disease lives in mv_development_programs. The highest phase across all diseases lives in mv_drug_pipeline.highest_phase_overall.
Do not sort phases alphabetically. Use highest_phase_rank (integer) for ordering. Alphabetical sorting puts approved before phase1, but early_phase1 before phase2.
Do not forget ILIKE for name searches. Drug and disease names vary in casing across sources. Use ILIKE with % wildcards for resilient matching.
Schema metadata
Section titled “Schema metadata”COMMENT ON TABLE and COMMENT ON COLUMN are the agent’s primary schema discovery mechanism. Every table and column in bioloupe-data carries a comment that follows consistent patterns:
- Column purpose: what the column stores (“Drug INN name.”)
- Value enumeration: valid values for constrained columns (“Values: ‘marketed’, ‘in_development’, …”)
- FK references: which table and column a foreign key points to (“FK to drugs.id.”)
- Denormalization flags: when a column duplicates data from another table for query convenience (“Denormalized drug INN name for zero-join queries.”)
- Examples: sample values where helpful (“Example: NCT04380636”)
When encountering an unfamiliar table, the agent should read the table comment first (it summarizes the table’s role and row granularity), then read column comments to understand available filters and joins.