Skip to content

AI Agent Querying

This guide is for the AI agent (and humans) writing text-to-SQL queries against bioloupe-data. It covers how to discover the schema, which materialized views to prefer, common query patterns, and pitfalls to avoid.

Every table and column in the database carries a COMMENT ON annotation. These comments describe:

  • What the column means — e.g., “INN or common name of the drug (e.g., pembrolizumab, trastuzumab).”
  • What values it takes — e.g., “Values: ‘marketed’, ‘in_development’, ‘discontinued’, ‘withdrawn’, ‘suspended’.”
  • Foreign key relationships — e.g., “FK to drugs.id.”
  • Denormalization notes — e.g., “Denormalized drug INN name for zero-join queries.”

The agent reads these comments to understand the schema without external documentation. To inspect them:

-- Table comment
SELECT obj_description('drugs'::regclass);
-- All column comments for a table
SELECT column_name, col_description('drugs'::regclass, ordinal_position)
FROM information_schema.columns
WHERE table_name = 'drugs'
ORDER BY ordinal_position;

Five materialized views pre-compute the most common joins. Always check whether a matview answers the question before going to base tables.

flowchart TD
    Q["Incoming question"] --> A{"Can a single matview answer it?"}
    A -- Yes --> MV["Query that matview (zero JOINs)"]
    A -- No --> B{"Does it need 2+ matviews?"}
    B -- Yes --> MV2["JOIN matviews together"]
    B -- No --> BT["Go to base tables"]
Question patternUse thisKey columns
Drug profile / overviewmv_drug_pipelinedrug_name, drug_status, drug_type, primary_target_name, target_names, current_owner_names, highest_phase_overall
Drug x disease phasemv_development_programsdrug_name, disease_name, highest_phase, highest_phase_rank, phase_source, nct_ids, approval_date
Competitive landscapemv_competitive_landscapedrug_name, disease_name, highest_phase, drug_type, technology_name, drug_class, current_owner_names
Trial involvementmv_drug_trialsdrug_id, clinical_trial_id, role
Efficacy datamv_efficacy_evidenceendpoint_name, endpoint_type, measure_value, p_value, hazard_ratio, nct_id, source_type

All phase columns use the same enumeration and ranking:

Phase valueRank
approved100
phase44
phase33
phase2_33
phase22
phase1_22
phase11
early_phase11

Always sort by highest_phase_rank (integer), not by highest_phase (text).

mv_drug_trials.role classifies how the drug participates in a trial:

RoleMeaning
investigationalThe drug being studied
combination_partnerCombined with the investigational drug
active_comparatorControl arm, active drug
placebo_comparatorControl arm, placebo
sham_comparatorControl arm, sham procedure
otherNone of the above

1. List all Phase 3 drugs for breast cancer with their owners

Section titled “1. List all Phase 3 drugs for breast cancer with their owners”
SELECT drug_name, current_owner_names, highest_phase
FROM mv_competitive_landscape
WHERE disease_name ILIKE '%breast cancer%'
AND highest_phase = 'phase3'
ORDER BY drug_name;

Uses mv_competitive_landscape because it pre-joins drug, disease, phase, and owner data in a single row.

2. What targets does pembrolizumab act on?

Section titled “2. What targets does pembrolizumab act on?”
SELECT drug_name, target_names, primary_target_name
FROM mv_drug_pipeline
WHERE drug_name ILIKE '%pembrolizumab%';

mv_drug_pipeline aggregates all targets into an array, so no joins to drug_target_actions or targets are needed.

3. What clinical trials are recruiting for NSCLC?

Section titled “3. What clinical trials are recruiting for NSCLC?”
SELECT ct.nct_id, ct.brief_title, ct.phase, ct.enrollment
FROM mv_development_programs dp
JOIN clinical_trials ct ON ct.id = ANY(
SELECT unnest(dp2.nct_ids::text[])::uuid
FROM mv_development_programs dp2
WHERE dp2.disease_name ILIKE '%non-small cell lung%'
)
WHERE ct.overall_status = 'recruiting';

Alternative approach using mv_drug_trials when you need trial details with drug context:

SELECT DISTINCT ct.nct_id, ct.brief_title, ct.phase, ct.overall_status
FROM mv_drug_trials dt
JOIN clinical_trials ct ON ct.id = dt.clinical_trial_id
JOIN trial_diseases td ON td.clinical_trial_id = ct.id
JOIN diseases dis ON dis.id = td.disease_id
WHERE dis.name ILIKE '%non-small cell lung%'
AND ct.overall_status = 'recruiting'
ORDER BY ct.phase DESC;

Deals live in base tables since no matview covers them:

SELECT oh.deal_type, oh.headline, oh.total_value, oh.announcement_date
FROM organisation_histories oh
JOIN organisation_history_participants ohp ON ohp.organisation_history_id = oh.id
JOIN organisations o ON o.id = ohp.organisation_id
WHERE o.name ILIKE '%Pfizer%'
ORDER BY oh.announcement_date DESC NULLS LAST
LIMIT 20;

5. Compare efficacy of two drugs in melanoma

Section titled “5. Compare efficacy of two drugs in melanoma”
SELECT
d.name AS drug_name,
ee.endpoint_name,
ee.measure_value,
ee.unit,
ee.p_value,
ee.hazard_ratio,
ee.source_type
FROM mv_efficacy_evidence ee
JOIN mv_drug_trials dt ON dt.clinical_trial_id = ee.clinical_trial_id
JOIN drugs d ON d.id = dt.drug_id
JOIN trial_diseases td ON td.clinical_trial_id = ee.clinical_trial_id
JOIN diseases dis ON dis.id = td.disease_id
WHERE d.name IN ('pembrolizumab', 'nivolumab')
AND dis.name ILIKE '%melanoma%'
AND ee.endpoint_type = 'primary'
ORDER BY d.name, ee.endpoint_name;

6. Which drugs were approved in the last 6 months?

Section titled “6. Which drugs were approved in the last 6 months?”
SELECT drug_name, disease_name, approval_date
FROM mv_development_programs
WHERE phase_source = 'approval'
AND approval_date >= NOW() - INTERVAL '6 months'
ORDER BY approval_date DESC;

7. What biomarkers are relevant for lung cancer?

Section titled “7. What biomarkers are relevant for lung cancer?”

Biomarkers live in base tables:

SELECT b.name AS biomarker, db.prevalence, db.potential_values
FROM disease_biomarkers db
JOIN biomarkers b ON b.id = db.biomarker_id
JOIN diseases dis ON dis.id = db.disease_id
WHERE dis.name ILIKE '%lung cancer%'
ORDER BY b.name;

8. Full drug profile with pipeline status across all diseases

Section titled “8. Full drug profile with pipeline status across all diseases”
SELECT
p.drug_name,
p.drug_status,
p.drug_type,
p.primary_target_name,
p.current_owner_names,
dp.disease_name,
dp.highest_phase,
dp.phase_source
FROM mv_drug_pipeline p
LEFT JOIN mv_development_programs dp ON dp.drug_id = p.drug_id
WHERE p.drug_name ILIKE '%trastuzumab%'
ORDER BY dp.highest_phase_rank DESC;

This joins two matviews: mv_drug_pipeline for the drug profile and mv_development_programs for per-disease phase data.

Do not query base tables when a matview exists. If you need a drug’s targets, owners, and phase, use mv_drug_pipeline instead of joining drugs, drug_target_actions, targets, drug_ownerships, organisations, and mv_development_programs yourself.

Do not JOIN 4+ base tables when a matview pre-computes the answer. The matviews exist specifically to flatten complex joins into single-table queries.

Do not use SELECT *. Always select specific columns. Matviews contain array columns (target_names, current_owner_names, nct_ids) that are expensive to serialize unnecessarily.

Do not assume drug_status is computed. It is a curated value on the drugs table, not derived from approvals or trials. Values: marketed, in_development, discontinued, withdrawn, suspended.

Do not query drugs for development phase. The drugs table has no phase column. Development phase per disease lives in mv_development_programs. The highest phase across all diseases lives in mv_drug_pipeline.highest_phase_overall.

Do not sort phases alphabetically. Use highest_phase_rank (integer) for ordering. Alphabetical sorting puts approved before phase1, but early_phase1 before phase2.

Do not forget ILIKE for name searches. Drug and disease names vary in casing across sources. Use ILIKE with % wildcards for resilient matching.

COMMENT ON TABLE and COMMENT ON COLUMN are the agent’s primary schema discovery mechanism. Every table and column in bioloupe-data carries a comment that follows consistent patterns:

  • Column purpose: what the column stores (“Drug INN name.”)
  • Value enumeration: valid values for constrained columns (“Values: ‘marketed’, ‘in_development’, …”)
  • FK references: which table and column a foreign key points to (“FK to drugs.id.”)
  • Denormalization flags: when a column duplicates data from another table for query convenience (“Denormalized drug INN name for zero-join queries.”)
  • Examples: sample values where helpful (“Example: NCT04380636”)

When encountering an unfamiliar table, the agent should read the table comment first (it summarizes the table’s role and row granularity), then read column comments to understand available filters and joins.