News

Pharma news articles and press releases from wire services, ingested and processed by AI extraction pipelines that identify structured entity mentions. The domain comprises 7 entities: one core news table storing raw articles and six mention tables that link articles to drugs, diseases, organisations, clinical trials, targets, and technologies.

Entities

erDiagram
    news ||--o{ news_drug_mentions : "mentions"
    news ||--o{ news_disease_mentions : "mentions"
    news ||--o{ news_org_mentions : "mentions"
    news ||--o{ news_trial_mentions : "mentions"
    news ||--o{ news_target_mentions : "mentions"
    news ||--o{ news_technology_mentions : "mentions"
    news_drug_mentions }o--|| drugs : "drug"
    news_disease_mentions }o--|| diseases : "disease"
    news_org_mentions }o--|| organisations : "organisation"
    news_trial_mentions }o--o| clinical_trials : "trial"
    news_target_mentions }o--|| targets : "target"
    news_technology_mentions }o--|| technologies : "technology"

news

Raw articles and press releases. One row per article, identified by a unique url or wire service release_id.

Column	Type	Purpose
`title`	text	Article or press release headline
`body`	text (nullable)	Full article content
`description`	text (nullable)	Summary or excerpt, separate from the full body
`url`	text (unique, nullable)	Original URL of the news article
`release_id`	text (unique, nullable)	Wire service release identifier (natural key)
`wire_source`	text (nullable)	Wire service origin: `business_wire`, `globe_newswire`, `cision`, `accesswire`, `pr_newswire`
`category`	jsonb	Array of news categories: `clinical_trial_results`, `regulatory`, `partnership`, `financial`, `pipeline_update`, `safety`, `commercial`, `corporate`
`published_at`	timestamptz	Publication date/time
`tags`	jsonb	Array of free-form tags
`therapeutic_areas`	jsonb	Array of therapeutic areas mentioned
`is_pharma`	boolean (nullable)	Whether the article is pharma/biotech relevant
`embedding`	text (nullable)	Vector embedding for semantic search

Indexes: published_at, wire_source, release_id.

news_drug_mentions

Links a news article to a drug mentioned in it. One row per unique (news_id, drug_id) pair. Cascade-deletes when the news article or drug is removed.

Column	Type	Purpose
`news_id`	FK	The news article
`drug_id`	FK	The mentioned drug

news_disease_mentions

Links a news article to a disease mentioned in it. One row per unique (news_id, disease_id) pair.

Column	Type	Purpose
`news_id`	FK	The news article
`disease_id`	FK	The mentioned disease

news_org_mentions

Links a news article to an organisation mentioned in it. One row per unique (news_id, organisation_id) pair.

Column	Type	Purpose
`news_id`	FK	The news article
`organisation_id`	FK	The mentioned organisation

news_trial_mentions

Links a news article to a clinical trial mentioned in it. Richer than the other mention tables — carries extracted trial context from the article text.

Column	Type	Purpose
`news_id`	FK	The news article
`clinical_trial_id`	FK (nullable)	The matched clinical trial. NULL when the trial has not been resolved to a `clinical_trials` row.
`nct_id`	text (nullable)	ClinicalTrials.gov NCT ID, stored for unmatched trials when `clinical_trial_id` is NULL
`trial_outcome`	text (nullable)	Outcome reported in the news: `positive`, `negative`, `neutral`, `unclear`
`summary`	text (nullable)	Brief summary of trial context extracted from the article
`total_participants`	integer (nullable)	Number of participants mentioned
`patient_population`	text (nullable)	Patient population description from the article

news_target_mentions

Links a news article to a molecular target mentioned in it. One row per unique (news_id, target_id) pair.

Column	Type	Purpose
`news_id`	FK	The news article
`target_id`	FK	The mentioned target

news_technology_mentions

Links a news article to a drug technology/modality mentioned in it. One row per unique (news_id, technology_id) pair.

Column	Type	Purpose
`news_id`	FK	The news article
`technology_id`	FK	The mentioned technology

Design decisions

Decision	Why	Alternative considered
Separate mention table per entity type	Each mention table carries proper FK constraints to its target entity. Queries are type-safe and indexable without discriminator columns.	Polymorphic `news_mentions` table with `mentionable_type`/`mentionable_id` — rejected because it cannot enforce FK constraints and requires casting in every query.
Mention tables are extraction output, `news` is raw source	The `news` row is the immutable source record. Mention rows are derived data produced by AI extraction pipelines. Re-running extraction replaces mentions without touching the source article.	Inline JSONB arrays of entity IDs on the `news` row — rejected because it prevents JOINs and indexing on the referenced entity.
`news_trial_mentions` carries extra columns	Trial mentions contain structured intelligence (outcome, participant count, population) that does not apply to other entity types. A shared mention schema would leave these columns NULL for five of six types.	Separate `news_trial_details` table joined to a generic mention — rejected as unnecessary indirection.
`clinical_trial_id` is nullable on trial mentions	Not every trial referenced in a press release has been registered or ingested into `clinical_trials`. The `nct_id` column preserves the identifier for future matching.	Require a `clinical_trials` row before creating the mention — rejected because it would discard intelligence from articles referencing unregistered trials.

Example queries

Which drugs were mentioned in regulatory news this month?

SELECT DISTINCT d.name, n.title, n.published_at
FROM news n
JOIN news_drug_mentions ndm ON ndm.news_id = n.id
JOIN drugs d ON d.id = ndm.drug_id
WHERE n.category @> '["regulatory"]'
  AND n.published_at >= date_trunc('month', CURRENT_DATE)
ORDER BY n.published_at DESC;

Positive trial readouts reported in news, with company and drug

SELECT n.title,
       d.name   AS drug,
       o.name   AS company,
       ntm.trial_outcome,
       ntm.total_participants,
       ntm.patient_population,
       n.published_at
FROM news_trial_mentions ntm
JOIN news n ON n.id = ntm.news_id
LEFT JOIN news_drug_mentions ndm ON ndm.news_id = n.id
LEFT JOIN drugs d ON d.id = ndm.drug_id
LEFT JOIN news_org_mentions nom ON nom.news_id = n.id
LEFT JOIN organisations o ON o.id = nom.organisation_id
WHERE ntm.trial_outcome = 'positive'
ORDER BY n.published_at DESC
LIMIT 20;

News volume by wire source and category

SELECT wire_source,
       jsonb_array_elements_text(category) AS cat,
       count(*) AS article_count
FROM news
WHERE published_at >= CURRENT_DATE - INTERVAL '90 days'
  AND is_pharma = true
GROUP BY wire_source, cat
ORDER BY article_count DESC;