News
Pharma news articles and press releases from wire services, ingested and processed by AI extraction pipelines that identify structured entity mentions. The domain comprises 7 entities: one core news table storing raw articles and six mention tables that link articles to drugs, diseases, organisations, clinical trials, targets, and technologies.
Entities
Section titled “Entities”erDiagram
news ||--o{ news_drug_mentions : "mentions"
news ||--o{ news_disease_mentions : "mentions"
news ||--o{ news_org_mentions : "mentions"
news ||--o{ news_trial_mentions : "mentions"
news ||--o{ news_target_mentions : "mentions"
news ||--o{ news_technology_mentions : "mentions"
news_drug_mentions }o--|| drugs : "drug"
news_disease_mentions }o--|| diseases : "disease"
news_org_mentions }o--|| organisations : "organisation"
news_trial_mentions }o--o| clinical_trials : "trial"
news_target_mentions }o--|| targets : "target"
news_technology_mentions }o--|| technologies : "technology"
Raw articles and press releases. One row per article, identified by a unique url or wire service release_id.
| Column | Type | Purpose |
|---|---|---|
title | text | Article or press release headline |
body | text (nullable) | Full article content |
description | text (nullable) | Summary or excerpt, separate from the full body |
url | text (unique, nullable) | Original URL of the news article |
release_id | text (unique, nullable) | Wire service release identifier (natural key) |
wire_source | text (nullable) | Wire service origin: business_wire, globe_newswire, cision, accesswire, pr_newswire |
category | jsonb | Array of news categories: clinical_trial_results, regulatory, partnership, financial, pipeline_update, safety, commercial, corporate |
published_at | timestamptz | Publication date/time |
tags | jsonb | Array of free-form tags |
therapeutic_areas | jsonb | Array of therapeutic areas mentioned |
is_pharma | boolean (nullable) | Whether the article is pharma/biotech relevant |
embedding | text (nullable) | Vector embedding for semantic search |
Indexes: published_at, wire_source, release_id.
news_drug_mentions
Section titled “news_drug_mentions”Links a news article to a drug mentioned in it. One row per unique (news_id, drug_id) pair. Cascade-deletes when the news article or drug is removed.
| Column | Type | Purpose |
|---|---|---|
news_id | FK | The news article |
drug_id | FK | The mentioned drug |
news_disease_mentions
Section titled “news_disease_mentions”Links a news article to a disease mentioned in it. One row per unique (news_id, disease_id) pair.
| Column | Type | Purpose |
|---|---|---|
news_id | FK | The news article |
disease_id | FK | The mentioned disease |
news_org_mentions
Section titled “news_org_mentions”Links a news article to an organisation mentioned in it. One row per unique (news_id, organisation_id) pair.
| Column | Type | Purpose |
|---|---|---|
news_id | FK | The news article |
organisation_id | FK | The mentioned organisation |
news_trial_mentions
Section titled “news_trial_mentions”Links a news article to a clinical trial mentioned in it. Richer than the other mention tables — carries extracted trial context from the article text.
| Column | Type | Purpose |
|---|---|---|
news_id | FK | The news article |
clinical_trial_id | FK (nullable) | The matched clinical trial. NULL when the trial has not been resolved to a clinical_trials row. |
nct_id | text (nullable) | ClinicalTrials.gov NCT ID, stored for unmatched trials when clinical_trial_id is NULL |
trial_outcome | text (nullable) | Outcome reported in the news: positive, negative, neutral, unclear |
summary | text (nullable) | Brief summary of trial context extracted from the article |
total_participants | integer (nullable) | Number of participants mentioned |
patient_population | text (nullable) | Patient population description from the article |
news_target_mentions
Section titled “news_target_mentions”Links a news article to a molecular target mentioned in it. One row per unique (news_id, target_id) pair.
| Column | Type | Purpose |
|---|---|---|
news_id | FK | The news article |
target_id | FK | The mentioned target |
news_technology_mentions
Section titled “news_technology_mentions”Links a news article to a drug technology/modality mentioned in it. One row per unique (news_id, technology_id) pair.
| Column | Type | Purpose |
|---|---|---|
news_id | FK | The news article |
technology_id | FK | The mentioned technology |
Design decisions
Section titled “Design decisions”| Decision | Why | Alternative considered |
|---|---|---|
| Separate mention table per entity type | Each mention table carries proper FK constraints to its target entity. Queries are type-safe and indexable without discriminator columns. | Polymorphic news_mentions table with mentionable_type/mentionable_id — rejected because it cannot enforce FK constraints and requires casting in every query. |
Mention tables are extraction output, news is raw source | The news row is the immutable source record. Mention rows are derived data produced by AI extraction pipelines. Re-running extraction replaces mentions without touching the source article. | Inline JSONB arrays of entity IDs on the news row — rejected because it prevents JOINs and indexing on the referenced entity. |
news_trial_mentions carries extra columns | Trial mentions contain structured intelligence (outcome, participant count, population) that does not apply to other entity types. A shared mention schema would leave these columns NULL for five of six types. | Separate news_trial_details table joined to a generic mention — rejected as unnecessary indirection. |
clinical_trial_id is nullable on trial mentions | Not every trial referenced in a press release has been registered or ingested into clinical_trials. The nct_id column preserves the identifier for future matching. | Require a clinical_trials row before creating the mention — rejected because it would discard intelligence from articles referencing unregistered trials. |
Example queries
Section titled “Example queries”Which drugs were mentioned in regulatory news this month?
SELECT DISTINCT d.name, n.title, n.published_atFROM news nJOIN news_drug_mentions ndm ON ndm.news_id = n.idJOIN drugs d ON d.id = ndm.drug_idWHERE n.category @> '["regulatory"]' AND n.published_at >= date_trunc('month', CURRENT_DATE)ORDER BY n.published_at DESC;Positive trial readouts reported in news, with company and drug
SELECT n.title, d.name AS drug, o.name AS company, ntm.trial_outcome, ntm.total_participants, ntm.patient_population, n.published_atFROM news_trial_mentions ntmJOIN news n ON n.id = ntm.news_idLEFT JOIN news_drug_mentions ndm ON ndm.news_id = n.idLEFT JOIN drugs d ON d.id = ndm.drug_idLEFT JOIN news_org_mentions nom ON nom.news_id = n.idLEFT JOIN organisations o ON o.id = nom.organisation_idWHERE ntm.trial_outcome = 'positive'ORDER BY n.published_at DESCLIMIT 20;News volume by wire source and category
SELECT wire_source, jsonb_array_elements_text(category) AS cat, count(*) AS article_countFROM newsWHERE published_at >= CURRENT_DATE - INTERVAL '90 days' AND is_pharma = trueGROUP BY wire_source, catORDER BY article_count DESC;