Skip to content

News

Pharma news articles and press releases from wire services, ingested and processed by AI extraction pipelines that identify structured entity mentions. The domain comprises 7 entities: one core news table storing raw articles and six mention tables that link articles to drugs, diseases, organisations, clinical trials, targets, and technologies.

erDiagram
    news ||--o{ news_drug_mentions : "mentions"
    news ||--o{ news_disease_mentions : "mentions"
    news ||--o{ news_org_mentions : "mentions"
    news ||--o{ news_trial_mentions : "mentions"
    news ||--o{ news_target_mentions : "mentions"
    news ||--o{ news_technology_mentions : "mentions"
    news_drug_mentions }o--|| drugs : "drug"
    news_disease_mentions }o--|| diseases : "disease"
    news_org_mentions }o--|| organisations : "organisation"
    news_trial_mentions }o--o| clinical_trials : "trial"
    news_target_mentions }o--|| targets : "target"
    news_technology_mentions }o--|| technologies : "technology"

Raw articles and press releases. One row per article, identified by a unique url or wire service release_id.

ColumnTypePurpose
titletextArticle or press release headline
bodytext (nullable)Full article content
descriptiontext (nullable)Summary or excerpt, separate from the full body
urltext (unique, nullable)Original URL of the news article
release_idtext (unique, nullable)Wire service release identifier (natural key)
wire_sourcetext (nullable)Wire service origin: business_wire, globe_newswire, cision, accesswire, pr_newswire
categoryjsonbArray of news categories: clinical_trial_results, regulatory, partnership, financial, pipeline_update, safety, commercial, corporate
published_attimestamptzPublication date/time
tagsjsonbArray of free-form tags
therapeutic_areasjsonbArray of therapeutic areas mentioned
is_pharmaboolean (nullable)Whether the article is pharma/biotech relevant
embeddingtext (nullable)Vector embedding for semantic search

Indexes: published_at, wire_source, release_id.

Links a news article to a drug mentioned in it. One row per unique (news_id, drug_id) pair. Cascade-deletes when the news article or drug is removed.

ColumnTypePurpose
news_idFKThe news article
drug_idFKThe mentioned drug

Links a news article to a disease mentioned in it. One row per unique (news_id, disease_id) pair.

ColumnTypePurpose
news_idFKThe news article
disease_idFKThe mentioned disease

Links a news article to an organisation mentioned in it. One row per unique (news_id, organisation_id) pair.

ColumnTypePurpose
news_idFKThe news article
organisation_idFKThe mentioned organisation

Links a news article to a clinical trial mentioned in it. Richer than the other mention tables — carries extracted trial context from the article text.

ColumnTypePurpose
news_idFKThe news article
clinical_trial_idFK (nullable)The matched clinical trial. NULL when the trial has not been resolved to a clinical_trials row.
nct_idtext (nullable)ClinicalTrials.gov NCT ID, stored for unmatched trials when clinical_trial_id is NULL
trial_outcometext (nullable)Outcome reported in the news: positive, negative, neutral, unclear
summarytext (nullable)Brief summary of trial context extracted from the article
total_participantsinteger (nullable)Number of participants mentioned
patient_populationtext (nullable)Patient population description from the article

Links a news article to a molecular target mentioned in it. One row per unique (news_id, target_id) pair.

ColumnTypePurpose
news_idFKThe news article
target_idFKThe mentioned target

Links a news article to a drug technology/modality mentioned in it. One row per unique (news_id, technology_id) pair.

ColumnTypePurpose
news_idFKThe news article
technology_idFKThe mentioned technology
DecisionWhyAlternative considered
Separate mention table per entity typeEach mention table carries proper FK constraints to its target entity. Queries are type-safe and indexable without discriminator columns.Polymorphic news_mentions table with mentionable_type/mentionable_id — rejected because it cannot enforce FK constraints and requires casting in every query.
Mention tables are extraction output, news is raw sourceThe news row is the immutable source record. Mention rows are derived data produced by AI extraction pipelines. Re-running extraction replaces mentions without touching the source article.Inline JSONB arrays of entity IDs on the news row — rejected because it prevents JOINs and indexing on the referenced entity.
news_trial_mentions carries extra columnsTrial mentions contain structured intelligence (outcome, participant count, population) that does not apply to other entity types. A shared mention schema would leave these columns NULL for five of six types.Separate news_trial_details table joined to a generic mention — rejected as unnecessary indirection.
clinical_trial_id is nullable on trial mentionsNot every trial referenced in a press release has been registered or ingested into clinical_trials. The nct_id column preserves the identifier for future matching.Require a clinical_trials row before creating the mention — rejected because it would discard intelligence from articles referencing unregistered trials.

Which drugs were mentioned in regulatory news this month?

SELECT DISTINCT d.name, n.title, n.published_at
FROM news n
JOIN news_drug_mentions ndm ON ndm.news_id = n.id
JOIN drugs d ON d.id = ndm.drug_id
WHERE n.category @> '["regulatory"]'
AND n.published_at >= date_trunc('month', CURRENT_DATE)
ORDER BY n.published_at DESC;

Positive trial readouts reported in news, with company and drug

SELECT n.title,
d.name AS drug,
o.name AS company,
ntm.trial_outcome,
ntm.total_participants,
ntm.patient_population,
n.published_at
FROM news_trial_mentions ntm
JOIN news n ON n.id = ntm.news_id
LEFT JOIN news_drug_mentions ndm ON ndm.news_id = n.id
LEFT JOIN drugs d ON d.id = ndm.drug_id
LEFT JOIN news_org_mentions nom ON nom.news_id = n.id
LEFT JOIN organisations o ON o.id = nom.organisation_id
WHERE ntm.trial_outcome = 'positive'
ORDER BY n.published_at DESC
LIMIT 20;

News volume by wire source and category

SELECT wire_source,
jsonb_array_elements_text(category) AS cat,
count(*) AS article_count
FROM news
WHERE published_at >= CURRENT_DATE - INTERVAL '90 days'
AND is_pharma = true
GROUP BY wire_source, cat
ORDER BY article_count DESC;