Standard of Care pipeline
The Standard of Care pipeline turns FDA oncology and hematology approvals into disease-specific treatment entries with linked label evidence. It starts from approved US DrugApproval records, structures their FDA indication language into Indication and IndicatedTherapeuticApproach records, extracts trial efficacy and safety data from the label, then matches the right label trials and measurements back to each treatment scenario.
This page covers only the automated, FDA-label-driven process. It intentionally does not document the semi-manual or manual guideline entry workflows.
The core idea
A single FDA label can contain many approved uses. Each use can contain multiple treatment approaches: monotherapy, combination therapy, different treatment lines, or explicitly distinct phases such as induction and consolidation.
The Standard of Care process normalizes that into this shape:
flowchart LR
  DA["DrugApproval\nFDA / Purple Book approval"] --> IND["Indication\nstructured approved use"]
  IND --> ITA["IndicatedTherapeuticApproach\none regimen in one context"]
  ITA --> GL["Guideline\nSOC display entry"]
  DA --> LT["LabelTrial\ntrial from FDA label"]
  ITA --> ALT["ApproachLabelTrial\napproach-to-trial match"]
  LT --> ALT
  LT --> TOM["TrialOutcomeMeasure\nendpoint result"]
  TOM --> TAO["TrialArmOutcome\narm-level value"]
  ALT --> KTAO["KeyTrialArmOutcome\nheadline evidence"]
  TAO --> KTAO
Guideline is the disease-facing Standard of Care entry. For automated records it is not a guideline-source import; it is a normalized label-derived SOC row linked to an FDA indication and one therapeutic approach. Guidelines marked with standard_of_care = true appear in the final SOC table and are the entries used in analytical steps.
End-to-end flow
The workflow is defined in StandardOfCareWorkflow.

1. standard_of_care:populate_guidelines --mode=sync
   Creates or syncs Guideline rows from reviewed, approved USA indications for diseases marked as standard-of-care.
2. regulatory:trials:extract_label_trials
   Reads FDA label Clinical Studies sections and writes structured trial overviews into drug_approvals.label['structured']['clinical_trials'].
3. regulatory:trials:extract_trials_endpoints, extract_trials_subgroups, extract_trials_results
   Extracts endpoint definitions, patient/subgroup structure, and result values from each label trial.
4. regulatory:trials:scout_trial_nctids
   Searches for missing registry IDs when the label only names a study.
5. regulatory:trials:match_trial_arms, classify_endpoint_domains, match_endpoints
   Matches extracted result arms to database study plan arms, classifies endpoint domains, and links extracted endpoints to canonical Endpoint records.
6. regulatory:trials:update_ae_sections, segment_ae_sections, extract_ae_reports, match_ae_arms, extract_ae_details
   Extracts adverse event data from the FDA label Adverse Reactions section and ties AE tables to the relevant label trials and arms.
7. regulatory:trials:match_approach_trials
   Deterministically matches structured therapeutic approaches to label trials using approval-study identifiers when available.
8. regulatory:trials:match_approach_trials_llm
   Uses an LLM fallback for approach-to-trial matching when deterministic identifiers are not enough.
9. regulatory:trials:post_process_label_trials
   Materializes label trial JSON into first-class LabelTrial, TrialEndpoint, TrialSubgroup, TrialOutcomeMeasure, TrialArmOutcome and AdverseEvent rows.
10. regulatory:trials:match_approach_trial_arms
    Selects the investigational study plan arm that best represents the exact approach/trial pair.
11. regulatory:trials:determine_key_label_trial_results
    Selects the key arm-level results for each approach/trial pair.
12. standard_of_care:match_segments_llm
    Matches indication language to disease base segments and biomarker segments.
13. standard_of_care:post_process_guidelines --entry-types=drug_approval
    Applies the matched segment data to Guideline entries and marks completed SOC rows.
FDA approvals and labels
The upstream regulatory pipeline creates DrugApproval records from FDA, Purple Book, EMA, KEGG, and CDE sources. Standard of Care only uses approved US records from FDA-like sources:
- source_type in FdaDatum or PurplebookRecord
- approval_status = drug_approved
- region = USA
- linked Drug
- reviewed structured indications
- disease is under the standard-of-care disease scope
DailyMed labels are fetched by Fda::LabelSync and stored in drug_approvals.label.
Important label paths:
| JSON path | Role |
|---|---|
| label['spl']['indications'] | Parsed FDA Indications and Usage section. |
| label['spl']['clinical_trials'] | Parsed FDA Clinical Studies section, usually structured by section title and content. |
| label['spl']['adverse_events'] | Raw Adverse Reactions section. Kept raw because AE tables vary heavily across labels. |
| label['structured']['clinical_trials'] | LLM-structured label trial overviews, endpoints, subgroups, results, arm matches, endpoint matches, and audits. |
| label['structured']['adverse_events'] | Segmented AE sections, matched AE reports, arm counts, and detailed AE rows before post-processing. |
| label['structured']['matched_approaches'] | Approach ID to matched label trial references. This is the bridge from indications to evidence before rows are materialized. |
| label['structured']['needs_trials_update'] | Set when a label or generated SOC entry changes and label trial extraction needs to rerun. |
| label['structured']['needs_ct_post_process_sync'] | Set when matched trial data changed and LabelTrial rows need reconciliation. |
| label['structured']['needs_ae_update'] / needs_ae_post_process_sync | Set when AE label content changes and downstream AE extraction/post-processing needs to rerun. |
Indications and therapeutic approaches
The indication pipeline structures FDA indication text before SOC begins.
regulatory:indications:parse runs ApprovalLlmClassification::TaskV2. The LLM reads label indications and extracts:
- disease name, subtypes, extents, stages, statuses, risks
- eligible and non-eligible populations
- biomarkers
- prior therapy gates and their AND/OR logic
- one or more therapeutic approaches
- treatment lines and treatment settings
- raw label text for traceability
regulatory:indications:post_process materializes that JSON into:
- indications
- indicated_biomarkers
- indicated_prior_therapy_groups
- indicated_prior_therapies
- indicated_therapeutic_approaches
- indicated_combination_partners
Disease names are resolved through TermMatch and linked through indication_diseases. Biomarker and therapy names are resolved to Biomarker, Drug, or NcitConcept where possible.
Treatment approach boundaries
An IndicatedTherapeuticApproach represents one coherent way the approved product is used in one clinical context.
The parser creates separate approaches for:
- different explicit treatment lines, such as 1L vs 2L
- clearly independent monotherapy vs combination indications
- different required partner sets
- explicitly distinct phases when partner composition differs
It does not split restatements, synonyms, or continuation monotherapy that belongs to the same course unless the label states an independent monotherapy indication.
Treatment lines are stored twice:
- raw label-derived values in indicated_therapeutic_approaches.treatment_lines
- parsed numeric range in min_line/max_line, plus phase-like settings in treatment_settings
Line parsing uses IndicatedTherapeuticApproach::LINE_MAPPING:
| Label value | Parsed range |
|---|---|
| 1L | min_line = 0, max_line = 0 |
| 2L | min_line = 1, max_line = 1 |
| 3L+ | min_line = 2, max_line = nil |
| 1L+ | min_line = 0, max_line = nil |
Settings such as Induction, Consolidation, Maintenance, Bridging, Neoadjuvant, and Adjuvant are stored separately in treatment_settings.
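The line parsing above can be sketched in plain Ruby. This is a minimal illustration, not the actual model code: the constant below only mirrors the four table rows, and `parse_treatment_lines`, the `SETTINGS` list, and the rule for combining several raw values into one range are assumptions made for the example.

```ruby
# Hypothetical stand-in for IndicatedTherapeuticApproach::LINE_MAPPING.
# Only the four rows shown in the table are included here.
LINE_MAPPING = {
  "1L"  => { min_line: 0, max_line: 0 },
  "2L"  => { min_line: 1, max_line: 1 },
  "3L+" => { min_line: 2, max_line: nil },
  "1L+" => { min_line: 0, max_line: nil }
}.freeze

# Phase-like values that are kept apart from the numeric range.
SETTINGS = %w[Induction Consolidation Maintenance Bridging Neoadjuvant Adjuvant].freeze

# Splits raw label values into a numeric range and treatment settings.
# Assumption: several numeric values are merged into one enclosing range.
def parse_treatment_lines(raw_values)
  ranges = raw_values.filter_map { |value| LINE_MAPPING[value] }
  mins = ranges.map { |range| range[:min_line] }
  maxes = ranges.map { |range| range[:max_line] }
  {
    min_line: mins.min,
    # nil means open-ended, so one open-ended value makes the range open-ended
    max_line: maxes.include?(nil) ? nil : maxes.max,
    treatment_settings: raw_values & SETTINGS
  }
end
```

For example, `parse_treatment_lines(["2L", "Maintenance"])` keeps the numeric range at 1..1 and records Maintenance as a setting rather than a line.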
What guideline entries are for
For automated SOC, Guideline is the normalized disease-facing treatment entry derived from an FDA indication and one therapeutic approach.
StandardOfCareStructuring::PopulateGuidelines creates one Guideline per:
- disease in the SOC scope
- reviewed Indication linked to that disease or one of its descendants
- IndicatedTherapeuticApproach under that indication
Each automated Guideline stores:
- indication_id
- indicated_therapeutic_approach_id
- disease_id
- treatment_string
- linked TreatmentLine, DiseaseBaseSegment and DiseaseBiomarkerSegment rows
- approval date and accelerated/full approval status
- related_diseases
- source = 'Label'
The entry exists so disease pages and TPP reports can ask: “For this disease, segment, biomarker segment, and line, which approved treatments are standard of care and what label evidence supports them?”
How treatment strings and therapies are built
For monotherapy, treatment_string is the approved drug name.
For combinations, it is:
Primary drug + partner 1 + partner 2
The primary therapy is also written to guideline_therapies. The drugs_guidelines HABTM association is kept for existing consumers.
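As a rough sketch of the treatment string rule, assuming the partner names are already resolved to display strings (the helper name `build_treatment_string` is hypothetical and not taken from the codebase):

```ruby
# Hypothetical helper: joins the primary drug and any partners with " + ".
# With no partners the result is just the primary drug name (monotherapy).
def build_treatment_string(primary_drug, partners = [])
  ([primary_drug] + partners).join(" + ")
end
```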
How treatment lines are linked
PopulateGuidelines#link_treatment_lines reads treatment-line names from the IndicatedTherapeuticApproach. It then maps those values to TreatmentLine records associated with the guideline disease.
Important behavior:
- It matches by treatment_lines.line_mapping, not only display name.
- Induction, Consolidation, and Maintenance can be standalone treatment lines for some diseases. If a disease has a standalone line with that mapping, it is linked directly.
- If those phase names are not standalone lines, they are appended to numeric lines as "{line}; {phase}".
- If a phase appears without a numeric line, it defaults to 1L; {phase}.
- If the indication type is prevention, Prevention is added as a treatment line.
- Plus and non-plus mappings fall back both ways: 2L+ can match 2L, and 2L can match 2L+ if the exact disease line is missing.
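The phase and plus/non-plus rules can be illustrated with a small standalone sketch. It works on plain strings instead of TreatmentLine records, leaves out the Prevention case, and assumes that the plus fallback also applies to phase-appended mappings; `resolve_line_mappings` and `match_with_plus_fallback` are hypothetical names.

```ruby
PHASES = %w[Induction Consolidation Maintenance].freeze

# Tries the exact mapping first, then the plus/non-plus variant of its line.
def match_with_plus_fallback(mapping, disease_mappings)
  return mapping if disease_mappings.include?(mapping)

  line, phase = mapping.split("; ", 2)
  toggled = line.end_with?("+") ? line.chomp("+") : "#{line}+"
  alternative = [toggled, phase].compact.join("; ")
  disease_mappings.include?(alternative) ? alternative : nil
end

# raw_lines: treatment-line values from the approach.
# disease_mappings: line_mapping values available for the guideline disease.
def resolve_line_mappings(raw_lines, disease_mappings)
  numeric = raw_lines - PHASES
  phases = raw_lines & PHASES
  # Phases that exist as their own line for this disease are linked directly.
  standalone, appended = phases.partition { |phase| disease_mappings.include?(phase) }
  # A phase without a numeric line defaults to first line.
  numeric = ["1L"] if numeric.empty? && appended.any?

  wanted = standalone.dup
  if appended.empty?
    wanted.concat(numeric)
  else
    numeric.each { |line| appended.each { |phase| wanted << "#{line}; #{phase}" } }
  end
  wanted.filter_map { |mapping| match_with_plus_fallback(mapping, disease_mappings) }
end
```

Working on mapping strings keeps the sketch independent of the database; the real method resolves the same keys to TreatmentLine rows.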
How entries become standard of care
PostProcessGuidelines does not mark every generated automated guideline as SOC just because it exists. After segment matching, it finalizes automated entries by disease:
- A guideline is eligible only if it is post-processed and has at least one linked TreatmentLine.
- Duplicate entries are suppressed when they have the same disease, base segments, biomarker segments, treatment lines, and regimen signature.
- The duplicate winner is the entry with the richer evidence set: more label trials, trial results, arm outcomes, adverse events, and key result selections.
- The winning entries get standard_of_care = true; suppressed or ineligible entries get standard_of_care = false.
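A minimal sketch of this finalization step, using a plain Struct in place of the Guideline model. The `Entry` struct, the single `evidence_count` score, and `finalize_standard_of_care` are simplifications assumed for illustration; the real comparison weighs several evidence counts.

```ruby
# Hypothetical in-memory stand-in for a Guideline row.
Entry = Struct.new(:id, :disease_id, :base_segments, :biomarker_segments,
                   :treatment_lines, :regimen, :post_processed, :evidence_count,
                   keyword_init: true)

# Returns { entry id => standard_of_care flag }.
def finalize_standard_of_care(entries)
  # Eligibility: post-processed and at least one linked treatment line.
  eligible = entries.select { |e| e.post_processed && e.treatment_lines.any? }

  # Entries sharing this signature are duplicates of each other.
  groups = eligible.group_by do |e|
    [e.disease_id, e.base_segments.sort, e.biomarker_segments.sort,
     e.treatment_lines.sort, e.regimen]
  end

  # The entry with the richest evidence wins within each duplicate group.
  winner_ids = groups.values.map { |group| group.max_by(&:evidence_count).id }
  entries.to_h { |e| [e.id, winner_ids.include?(e.id)] }
end
```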
Label trial extraction
ApprovalLlmClassification::TrialExtraction::LabelTrialsTask reads label['spl']['clinical_trials'] and identifies clinical trials that present detailed results.
For each trial it extracts:
- registry ID, such as NCT########; unknown if absent
- trial title, such as KEYNOTE-426
- patient population and study design
The result initially lives in drug_approvals.label['structured']['clinical_trials']. It is later materialized into LabelTrial.
Trials are not extracted from every approval. The default scope is approved FDA/Purple Book approvals with standard-of-care treatment-line diseases and a Clinical Studies section.
How indications are matched to label trials
The system matches IndicatedTherapeuticApproach records to structured label trials in two passes.
Deterministic pass
MatchApproachTrialsTask uses approval-study fields already attached to the indication when they are available:
- indications.full_approval_studies
- indications.accelerated_approval_studies
For each study, it compares:
- clinical_trial_number against structured label trial id
- study_name against structured label trial title
It also sets needs_ct_post_process_sync = true.
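The comparison in the deterministic pass might look roughly like the sketch below. It uses plain hashes for the approval studies and the structured label trials; treating either an identifier match or a title match as sufficient, comparing case-insensitively, and ignoring the `unknown` placeholder are all assumptions, as are the helper names.

```ruby
# True only when both values are present, not the "unknown" placeholder,
# and equal ignoring case and surrounding whitespace.
def usable_and_equal?(left, right)
  values = [left, right].map { |value| value.to_s.strip }
  return false if values.any? { |value| value.empty? || value.casecmp?("unknown") }

  values[0].casecmp?(values[1])
end

# studies: hashes with :clinical_trial_number and :study_name.
# label_trials: hashes with :id and :title from the structured label JSON.
def match_studies_to_trials(studies, label_trials)
  matches = studies.flat_map do |study|
    label_trials.select do |trial|
      usable_and_equal?(study[:clinical_trial_number], trial[:id]) ||
        usable_and_equal?(study[:study_name], trial[:title])
    end
  end
  matches.uniq
end
```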
LLM fallback
MatchApproachTrialsLlmTask handles approaches not matched deterministically or flagged for sync.
It can only return trial ID/title pairs from the provided enum. If no trial passes all filters, it writes an empty trials array with matched_by = 'llm'.
Materializing label trial evidence
PostProcessLabelTrialsTask converts label['structured']['matched_approaches'] and label['structured']['clinical_trials'] into relational rows.
For each matched approach/trial pair:
1. Resolve the trial JSON by identifier and title.
2. Find or create LabelTrial keyed by drug_approval_id, trial_identifier, and trial_title.
3. Link LabelTrial to ClinicalTrial when the registry ID matches clinical_trials.nct_id.
4. Find or create ApproachLabelTrial for the approach and label trial.
5. Create or sync endpoints, subgroups, outcome measures, adverse events, arm outcomes, and trial disease details under the LabelTrial.
The task is sync-oriented. It removes stale approach links and stale label trials when upstream matches change, while skipping LabelTrial rows that are automation-protected (manually_touched or auditor_touched); the LabelTrial.automation_owned and LabelTrial.automation_protected scopes encode this split.
Efficacy rows
Label trial efficacy data is stored as:
- TrialEndpoint: endpoint definitions for the label trial
- TrialSubgroup: patient population or result subgroup
- TrialOutcomeMeasure: one endpoint measurement for one subgroup
- TrialArmOutcome: arm-level result value for that measurement
TrialOutcomeMeasure and TrialEndpoint use polymorphic source_type/source_id. For the automated FDA-label path, source_type = 'LabelTrial'.
Adverse event rows
AE extraction follows a separate label-section path because FDA Adverse Reactions sections are inconsistent.
The AE process:
1. AeUpdateSectionsTask detects changed AE regions when label content changes.
2. AeSegmentSectionsTask splits the AE section into analyzable report segments.
3. AeExtractReportsTask identifies which trial each AE segment belongs to and extracts report-level arm counts.
4. AeMatchArmsTask matches AE arms to study plan arms.
5. AeExtractDetailsTask extracts detailed adverse events and arm-level measurements.
6. PostProcessLabelTrialsTask creates AdverseEvent and TrialArmOutcome rows under the matching LabelTrial.
Only valid trial-report AE segments are materialized. Pooled analyses, postmarketing sections, and reports without usable single-arm or matched-arm data are filtered out.
After rows exist, regulatory:trials:standardize_adverse_events deterministically matches AE names to safety endpoints where possible, and regulatory:trials:classify_adverse_events_llm uses an LLM fallback for unmatched safety endpoint classification.
How measurements are matched to endpoints
Measurements become useful only after their endpoint language is normalized.
The extraction and post-processing layers use several endpoint matching stages:
- The label trial endpoint task extracts endpoint names, abbreviations, and definitions from the FDA label trial text.
- Endpoint domain classification groups endpoints into clinical domains.
- Endpoint matching links extracted endpoints to canonical Endpoint rows.
- During post-processing, resolve_endpoint prefers, in order:
  - explicit endpoint_id from the extraction/matching output
  - exact abbreviation match
  - Endpoint.flexifind against endpoint synonyms
The result is stored on trial_endpoints.endpoint_id. Downstream queries use that canonical endpoint ID when available and fall back to endpoint name or abbreviation when necessary.
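The preference order used during post-processing can be sketched as a simple fallback chain. This is an illustration only: `CanonicalEndpoint` is a hypothetical struct, and the case-insensitive synonym comparison is a stand-in for Endpoint.flexifind, whose actual matching behavior is not described here.

```ruby
# Hypothetical in-memory stand-in for a canonical Endpoint row.
CanonicalEndpoint = Struct.new(:id, :abbreviation, :synonyms, keyword_init: true)

# extracted: hash with optional :endpoint_id, :abbreviation, and :name.
def resolve_endpoint(extracted, canonical_endpoints)
  # 1. An explicit endpoint_id from extraction/matching output wins.
  if extracted[:endpoint_id]
    by_id = canonical_endpoints.find { |e| e.id == extracted[:endpoint_id] }
    return by_id if by_id
  end

  # 2. Exact abbreviation match.
  abbreviation = extracted[:abbreviation].to_s.strip
  unless abbreviation.empty?
    by_abbreviation = canonical_endpoints.find { |e| e.abbreviation == abbreviation }
    return by_abbreviation if by_abbreviation
  end

  # 3. Synonym lookup (simplified stand-in for Endpoint.flexifind).
  name = extracted[:name].to_s.strip.downcase
  canonical_endpoints.find { |e| e.synonyms.any? { |synonym| synonym.downcase == name } }
end
```

Returning nil when nothing matches mirrors the documented fallback to endpoint name or abbreviation in downstream queries.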
How key measurements are chosen for a guideline entry
Key results are chosen per ApproachLabelTrial, not globally per trial.
That distinction matters because the same label trial can support several approaches or indications. The key result for a 1L monotherapy scenario may not be the same as the key result for a combination or subgroup scenario.
TrialsDetermineKeyResults works like this:
1. Get disease-specific key endpoints from diseases.key_endpoints_jsonb.
2. If the disease has no key endpoint config, walk up parent diseases until one is found.
3. Pick endpoint abbreviations based on treatment setting:
   - neoadjuvant endpoints when the approach has neoadjuvant treatment lines/settings
   - adjuvant endpoints when the approach has adjuvant treatment lines/settings
   - other endpoints for all other treatment settings
   - all as fallback
4. Filter the label trial’s outcome measures to those key endpoints.
5. Try deterministic selection when:
   - all key measures are in one subgroup
   - each key endpoint appears once
   - each selected outcome has a single confident investigational arm
6. Otherwise use an LLM prompt to select up to one result per key endpoint, choosing the subgroup and investigational arm closest to the indication, disease, treatment line, and exact therapeutic approach.
Selections are written to key_trial_arm_outcomes, which points to the exact trial_arm_outcomes rows that should be surfaced as headline evidence.
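The first steps of this selection (finding a key endpoint config and choosing the bucket for the treatment setting) can be sketched as follows. The `Disease` struct and the assumed JSON shape, a hash keyed by neoadjuvant, adjuvant, other, and all with lists of abbreviations, are illustrative guesses rather than the actual schema of diseases.key_endpoints_jsonb.

```ruby
# Hypothetical stand-in for a disease with an optional parent.
Disease = Struct.new(:name, :key_endpoints, :parent, keyword_init: true)

# Walks up the parent chain until a non-empty key endpoint config is found.
def key_endpoint_config(disease)
  node = disease
  node = node.parent while node && (node.key_endpoints.nil? || node.key_endpoints.empty?)
  node&.key_endpoints
end

# Picks endpoint abbreviations for the approach's treatment settings.
def key_endpoint_abbreviations(disease, settings)
  config = key_endpoint_config(disease)
  return [] unless config

  normalized = settings.map(&:downcase)
  bucket =
    if normalized.include?("neoadjuvant") then "neoadjuvant"
    elsif normalized.include?("adjuvant") then "adjuvant"
    else "other"
    end
  # "all" is the fallback when the chosen bucket is not configured.
  config[bucket] || config["all"] || []
end
```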
How approach arms are matched
MatchApproachTrialArmsTask chooses the investigational study plan arm that represents the approach/trial pair.
It considers only ApproachLabelTrial rows that:
- are linked to an automated IndicatedTherapeuticApproach
- have a LabelTrial linked to a ClinicalTrial
- have label trial outcome measures
The task builds candidates from investigational TrialArmOutcome rows that already have study_plan_arm_id.
Selection rules:
- If there is exactly one candidate investigational study plan arm, persist it deterministically.
- If multiple candidates exist, ask the LLM to select the one matching the approach.
- The LLM prioritizes exact therapeutic fit, single-agent vs combination status, partner composition, and structured study plan interventions.
- It must not choose control or comparator arms.
The selected arm is stored on:
- approach_label_trials.matched_study_plan_arm_id
- approach_label_trials.arm_matched_by
- approach_label_trials.arm_match_reasoning
KeyTrialArmOutcome selection uses this arm when selecting the right arm-level result.
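The arm selection rules reduce to a small decision function. In the sketch below the LLM call is replaced by an injected callable, and the `arm_matched_by` values "deterministic" and "llm" are assumed labels, not confirmed column values; `select_approach_arm` is a hypothetical name.

```ruby
# candidate_arm_ids: study plan arm IDs taken from investigational arm outcomes.
# llm_selector: callable that receives the candidate IDs and returns one of them.
def select_approach_arm(candidate_arm_ids, llm_selector: nil)
  unique = candidate_arm_ids.uniq
  return nil if unique.empty?

  # Exactly one candidate: persist it without asking the LLM.
  if unique.size == 1
    return { matched_study_plan_arm_id: unique.first, arm_matched_by: "deterministic" }
  end

  return nil unless llm_selector

  chosen = llm_selector.call(unique)
  # Reject anything outside the candidate list, e.g. a control or comparator arm.
  return nil unless unique.include?(chosen)

  { matched_study_plan_arm_id: chosen, arm_matched_by: "llm" }
end
```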
Disease segments
Disease segments are disease-specific filters used to organize SOC treatment entries on disease pages.
There are two segment families:
| Segment type | Table | Meaning |
|---|---|---|
| Base segment | disease_base_segments | Non-biomarker disease qualifiers, such as histology, transplant eligibility, site-specific metastasis, or child disease. |
| Biomarker segment | disease_biomarker_segments | Biomarker-defined cohorts, such as HER2-positive, EGFR-mutant, PD-L1-high. |
Both segment types can link to guidelines through HABTM join tables:
- disease_base_segments_guidelines
- disease_biomarker_segments_guidelines
Biomarker segments can also link to actual biomarkers through biomarker_segment_biomarkers.
StandardOfCareStructuring::MatchSegmentsLlm matches label indication text to these segment lists. It gives the LLM:
- the disease’s allowed base segments
- the disease’s allowed biomarker segments
- the indication raw text
- the FDA label indication section for context
The task is conservative:
- It prefers exact candidate strings.
- It allows deterministic abbreviation/symbol equivalences such as HER2+ to HER2-Positive.
- It does not extract treatment-line phrases as base segments.
- It avoids generic adult/pediatric and raw staging tokens unless the exact segment is in the candidate list.
- It stores insufficient-data reasoning when the label does not support confident segment assignment.
The matched result is stored in guidelines.llm_data['matched_segments'] and later applied to the guideline’s segment associations.
Treatment lines
TreatmentLine is the disease-specific treatment-line vocabulary used by SOC views. A treatment line can be a simple line (First line), a phase-specific line (First line Maintenance), or a disease-specific special line. The machine key used by automated matching is line_mapping.
DiseaseTreatmentLine connects lines to diseases and preserves display ordering through position.
Automated FDA indications provide treatment-line values as strings on IndicatedTherapeuticApproach. PopulateGuidelines maps those values to disease-specific TreatmentLine rows.
Treatment line handling has two layers:
- On the indication/approach: line semantics are extracted from FDA label text and parsed to min_line, max_line, and treatment_settings.
- On the guideline: those semantics are mapped into disease-specific TreatmentLine records for filtering and display.
This lets the system preserve label-derived line logic while still using disease-specific display lines in the SOC UI.
Table roles and associations
Regulatory and indication tables
| Table | Role | Key associations |
|---|---|---|
| drug_approvals | Unified regulatory approval. Stores FDA label JSON and structured LLM/intermediate workflow data. | has_many :indications; has_many :label_trials; polymorphic source via source_type/source_id; belongs_to :drug. |
| indications | One structured approved use extracted from FDA label indication text. | belongs_to :drug_approval; has_many :indicated_therapeutic_approaches; has_many :indication_diseases; has_many :diseases, through: :indication_diseases; has_many :guidelines. |
| indication_diseases | Join from structured indication to canonical diseases. | belongs_to :indication; belongs_to :disease. |
| indicated_therapeutic_approaches | One approved treatment scenario under an indication. Stores line and setting semantics. | belongs_to :indication; has_many :indicated_combination_partners; has_many :guidelines; has_many :approach_label_trials; has_many :label_trials, through: :approach_label_trials. |
| indicated_combination_partners | Current-regimen partner therapies for a non-single-agent approach. | belongs_to :indicated_therapeutic_approach; optional polymorphic-like partner_type/partner_id. |
| indicated_biomarkers | Biomarker qualifiers extracted from indication text. | belongs_to :indication; optional biomarker_id. |
| indicated_prior_therapy_groups | AND/OR grouping for prior therapy requirements. | belongs_to :indication; has_many :indicated_prior_therapies. |
| indicated_prior_therapies | Individual prior therapy gates such as progressed after platinum therapy. | belongs_to :indication; optional therapy polymorphic fields; optional group. |
SOC entry and segment tables
| Table | Role | Key associations |
|---|---|---|
| guidelines | Disease-facing SOC treatment entry. Automated rows link one disease, one indication, and one therapeutic approach. | belongs_to :disease; optional belongs_to :indication; optional belongs_to :indicated_therapeutic_approach; HABTM drugs, treatment_lines, disease_base_segments, disease_biomarker_segments; has_many :guideline_therapies. |
| guideline_therapies | Polymorphic therapy link for the primary therapy displayed by a guideline. | belongs_to :guideline; belongs_to :therapy, polymorphic: true. |
| treatment_lines | Canonical disease-specific line/setting display vocabulary. | has_many :disease_treatment_lines; has_many :diseases, through: :disease_treatment_lines; HABTM guidelines. |
| disease_treatment_lines | Disease-to-treatment-line join with display order. | belongs_to :disease; belongs_to :treatment_line. |
| disease_base_segments | Non-biomarker disease segment options. | belongs_to :disease; optional belongs_to :child_disease; HABTM guidelines. |
| disease_biomarker_segments | Biomarker-defined disease segment options. | belongs_to :disease; optional belongs_to :child_disease; HABTM guidelines; has_many :biomarker_segment_biomarkers; has_many :standard_of_care_prevalences. |
| biomarker_segment_biomarkers | Biomarker composition of a biomarker segment. | belongs_to :disease_biomarker_segment; belongs_to :biomarker. |
| standard_of_care_prevalences | Prevalence metadata for biomarker segments, optionally treatment-line-specific. | belongs_to :disease; belongs_to :disease_biomarker_segment; optional belongs_to :treatment_line; belongs_to :biomarker_prevalence. |
| disease_key_endpoints | Disease-to-endpoint join for key SOC efficacy endpoints. | belongs_to :disease; belongs_to :endpoint. |
Label trial evidence tables
| Table | Role | Key associations |
|---|---|---|
| label_trials | One clinical trial referenced in an FDA label. This is the shared evidence container. | belongs_to :drug_approval; optional belongs_to :clinical_trial; has_many :approach_label_trials; polymorphic source for trial result tables. |
| approach_label_trials | Join from an approach to a label trial. This is where “this trial supports this approach” is stored. | belongs_to :indicated_therapeutic_approach; belongs_to :label_trial; optional belongs_to :matched_study_plan_arm; has_many :key_trial_arm_outcomes. |
| trial_endpoints | Endpoint definitions extracted from label trial text. | Polymorphic source; optional belongs_to :endpoint; optional belongs_to :clinical_trial; has_many :trial_outcome_measures. |
| trial_subgroups | Result subgroup or analysis population. | Polymorphic source; optional belongs_to :clinical_trial; optional belongs_to :disease; has_many :trial_outcome_measures. |
| trial_outcome_measures | One measurement for one endpoint and subgroup. | Polymorphic source; belongs_to :trial_endpoint; belongs_to :trial_subgroup; optional belongs_to :clinical_trial; has_many :trial_arm_outcomes. |
| trial_arm_outcomes | Arm-level value for an efficacy measurement or adverse event. | Optional belongs_to :trial_outcome_measure; optional belongs_to :adverse_event; optional belongs_to :study_plan_arm. |
| adverse_events | Safety event measurement extracted from label AE report. | Polymorphic source; optional belongs_to :clinical_trial; optional belongs_to :endpoint; has_many :trial_arm_outcomes. |
| trial_disease_details | Disease/population details extracted from trial text. | Polymorphic source; optional belongs_to :disease; optional belongs_to :clinical_trial; has_many :trial_disease_biomarkers. |
| key_trial_arm_outcomes | Marks the exact arm outcomes that are headline/key results for an approach-trial pair. | belongs_to :approach_label_trial; belongs_to :trial_arm_outcome. |
Query consumption
Disease SOC pages use Diseases::StandardOfCareQuery.
The query starts from Guideline.standard_of_care, filters by disease, optional base segments, biomarker segments, and top endpoints, then joins through:
guidelines -> approach_label_trials -> label_trials -> trial_outcome_measures -> trial_endpoints -> key_trial_arm_outcomes -> trial_arm_outcomes
For label-trial-backed entries, endpoint values are only surfaced through KeyTrialArmOutcome when top endpoint filtering is requested. Legacy guideline-sourced rows are kept as a fallback for older data, but the automated FDA-label path should use LabelTrial as the evidence source.
TPP Standard of Care reports use Tpp::StandardOfCareQuery. It loads automated guideline entries, finds matching ApproachLabelTrial rows by indicated_therapeutic_approach_id, and reads measurements from the shared LabelTrial rows. It uses KeyTrialArmOutcome IDs to prefer headline arm outcomes when summarizing efficacy and safety.
Sync and invalidation rules
The pipeline is designed to rerun without blindly duplicating results.
Important invalidation behavior:
- If a new automated guideline is created, the approval label is marked needs_trials_update.
- If guideline sync changes indication text, treatment string, treatment lines, related diseases, or accelerated status, downstream trial/segment fields can be cleared and recomputed.
- If Clinical Studies label content changes, Fda::LabelSync marks needs_trials_update.
- If Adverse Reactions label content changes, Fda::LabelSync records changed AE regions and marks needs_ae_update.
- If approach-to-trial matches change, needs_ct_post_process_sync is set.
- If post-processed trial result structure changes, persisted approach arm matches and key result selections are invalidated.
- Stale ApproachLabelTrial, LabelTrial, subgroup, outcome, endpoint, and arm rows are reconciled during sync.
The main invariant is that label trial evidence should live once on LabelTrial, while treatment-specific relevance lives on ApproachLabelTrial and KeyTrialArmOutcome.
Parallel.ai audits and corrections
Once SOC entries are produced, a second pass uses Parallel.ai web search to audit structured fields against official sources (FDA labels, Drugs@FDA, ClinicalTrials.gov, sponsor releases, peer-reviewed pivotal trials).
The framework lives under app/tasks/standard_of_care_structuring/parallel_soc_audits/. BaseAuditor defines the shared JSON schema, scope filters (approval_ids, disease_ids, target-specific ids, limit), Parallel.ai task-group submission, and the AuditIssue write path. Each subclass picks a target table, builds one prompt per record, and lists the editable fields in a Field Contract that constrains what the LLM may correct.
| Auditor | Targets | Checks |
|---|---|---|
| IndicationAuditor | Indication, IndicatedTherapeuticApproach, IndicatedCombinationPartner | Treatment lines, single-agent vs combination flag, partner composition. |
| LabelTrialAuditor | LabelTrial, ApproachLabelTrial, TrialDiseaseDetail, TrialSubgroup, TrialOutcomeMeasure, TrialArmOutcome | Trial identifier, title, and structured result fields against the FDA label and registry. |
| ApproachSegmentAuditor | Guideline segment associations | Disease base segment and biomarker segment IDs linked to each guideline row, restricted to the disease’s candidate segment lists. |
| DrugLineAuditor | Guideline rows grouped by disease and treatment line | Whether the drug list per disease/line matches standard-of-care drugs; reports missing and should_not_be_there only. |
IndicationAuditor, LabelTrialAuditor, and ApproachSegmentAuditor write findings to audit_issues keyed by issue_type (soc_indication_audit, soc_label_trial_audit, soc_approach_segment_audit). DrugLineAuditor is read-only: it logs results but does not create issues, since drug coverage corrections require human review.
Applying corrections
CorrectionApplicator and its subclasses read open AuditIssue rows for a given issue_type and apply each correction as a direct field replacement on the target record. Corrections are filtered the same way audits are scoped (issue_ids, approval_ids, disease_ids, target ids, limit), and dry_run logs intended changes without writing.
| Applicator | Issue type | Special handling |
|---|---|---|
| IndicationCorrectionApplicator | soc_indication_audit | Replaces indicated_combination_partners rows wholesale when combination_partners is corrected, and resolves partner names through Therapy.find_therapy. |
| LabelTrialCorrectionApplicator | soc_label_trial_audit | Field replacement on the target model row. |
| ApproachSegmentCorrectionApplicator | soc_approach_segment_audit | Replaces guideline segment associations to match the corrected ID list. |
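The combination of direct field replacement, a field allow-list, and dry_run can be sketched on a plain hash. This is not the CorrectionApplicator code: `apply_corrections`, the `allowed_fields` argument (standing in for the auditor's Field Contract), and the log format are assumptions for illustration.

```ruby
# record: hash standing in for the target row.
# corrections: { field => corrected value } taken from an audit issue.
# Returns a log of the changes that were (or, in dry_run, would be) applied.
def apply_corrections(record, corrections, allowed_fields:, dry_run: false)
  corrections.each_with_object([]) do |(field, value), log|
    # Fields outside the allow-list are never touched.
    next unless allowed_fields.include?(field)

    log << "#{field}: #{record[field].inspect} -> #{value.inspect}"
    record[field] = value unless dry_run
  end
end
```

Running with `dry_run: true` first yields the same log as a real run, which makes it possible to review intended changes before writing.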
Trial-extraction audits
AuditApproachTrialMatches (app/tasks/approval_llm_classification/trial_extraction/audit_approach_trial_matches.rb) follows the same Parallel.ai task-group pattern but lives in the trial-extraction pipeline rather than the SOC module. The --batched, --parallelism, and --accept_findings Thor options on regulatory:trials:audit_approach_trial_matches are kept as no-ops for backwards compatibility; Parallel.ai task groups are always used.
Key commands
| Command | Purpose |
|---|---|
| bundle exec thor standard_of_care:populate_guidelines --mode=sync | Create or sync automated SOC guideline entries from reviewed FDA indications. |
| bundle exec thor regulatory:trials:extract_label_trials | Identify label trials from FDA Clinical Studies text. |
| bundle exec thor regulatory:trials:extract_trials_endpoints | Extract endpoints for label trials. |
| bundle exec thor regulatory:trials:extract_trials_subgroups | Extract trial subgroups and analysis populations. |
| bundle exec thor regulatory:trials:extract_trials_results | Extract efficacy results. |
| bundle exec thor regulatory:trials:scout_trial_nctids | Find missing trial registry IDs. |
| bundle exec thor regulatory:trials:match_trial_arms | Match extracted result arms to study plan arms. |
| bundle exec thor regulatory:trials:classify_endpoint_domains | Classify endpoint domains. |
| bundle exec thor regulatory:trials:match_endpoints | Link extracted endpoints to canonical endpoints. |
| bundle exec thor regulatory:trials:segment_ae_sections | Segment FDA AE label sections. |
| bundle exec thor regulatory:trials:extract_ae_reports | Match AE report overviews to label trials. |
| bundle exec thor regulatory:trials:match_ae_arms | Match AE arms to study plan arms. |
| bundle exec thor regulatory:trials:extract_ae_details | Extract detailed adverse event rows. |
| bundle exec thor regulatory:trials:match_approach_trials | Deterministically match approaches to label trials. |
| bundle exec thor regulatory:trials:match_approach_trials_llm | LLM fallback for approach-to-trial matching. |
| bundle exec thor regulatory:trials:post_process_label_trials | Materialize label trial evidence into relational rows. |
| bundle exec thor regulatory:trials:match_approach_trial_arms | Pick the approach’s investigational arm. |
| bundle exec thor regulatory:trials:determine_key_label_trial_results | Pick key arm outcomes for each approach-trial pair. |
| bundle exec thor standard_of_care:match_segments_llm | Match FDA indication text to disease base and biomarker segments. |
| bundle exec thor standard_of_care:post_process_guidelines --entry-types=drug_approval | Apply automated SOC post-processing. |
| bundle exec thor standard_of_care:audit_indications_parallel | Parallel.ai audit of structured indications and approach line/partner fields. |
| bundle exec thor standard_of_care:apply_indication_parallel_audit_corrections | Apply open soc_indication_audit corrections. |
| bundle exec thor standard_of_care:audit_label_trials_parallel | Parallel.ai audit of LabelTrial identifiers, titles, and result fields. |
| bundle exec thor standard_of_care:apply_label_trial_parallel_audit_corrections | Apply open soc_label_trial_audit corrections. |
| bundle exec thor standard_of_care:audit_approach_segments_parallel | Parallel.ai audit of guideline disease base/biomarker segment associations. |
| bundle exec thor standard_of_care:apply_approach_segment_parallel_audit_corrections | Apply open soc_approach_segment_audit corrections. |
| bundle exec thor standard_of_care:audit_drug_lines_parallel | Parallel.ai audit of SOC drug coverage by disease and treatment line (read-only report). |
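The audit commands at the end of the table pair naturally with their correction commands. The sketch below only builds and prints those command lines; the `AUDIT_STEPS` constant and `audit_commands` helper are hypothetical, no options are passed because their flag spellings are not listed here, and whether corrections should run immediately after an audit or only after review is not something this page states.

```ruby
require 'shellwords'

# Audit task and the correction task that consumes its issues, as listed in
# the table above. The drug-line audit is read-only, so it has no apply task.
AUDIT_STEPS = [
  ['standard_of_care:audit_indications_parallel',
   'standard_of_care:apply_indication_parallel_audit_corrections'],
  ['standard_of_care:audit_label_trials_parallel',
   'standard_of_care:apply_label_trial_parallel_audit_corrections'],
  ['standard_of_care:audit_approach_segments_parallel',
   'standard_of_care:apply_approach_segment_parallel_audit_corrections'],
  ['standard_of_care:audit_drug_lines_parallel', nil]
].freeze

# Hypothetical helper: returns one argv array per task, audit before apply.
def audit_commands(steps = AUDIT_STEPS)
  steps.flat_map(&:compact).map { |task| ['bundle', 'exec', 'thor', task] }
end

commands = audit_commands
commands.each { |argv| puts Shellwords.join(argv) }
```

Keeping each command as an argv array makes it possible to hand it to `system(*argv)` from a project checkout without shell quoting concerns.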
Mental model
Keep these ownership boundaries in mind:
- DrugApproval owns the FDA label and intermediate JSON.
- Indication owns the approved disease/population context.
- IndicatedTherapeuticApproach owns the exact approved regimen and treatment-line semantics.
- Guideline owns the disease-facing SOC row and display/filter associations.
- LabelTrial owns extracted label evidence once.
- ApproachLabelTrial explains why one label trial supports one approach.
- KeyTrialArmOutcome explains which specific measurement is closest to the guideline entry.
When debugging an SOC entry, start from guidelines.indicated_therapeutic_approach_id, then inspect the approach’s approach_label_trials, the linked label_trials, and finally the key_trial_arm_outcomes that select the evidence shown to users.
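That debugging walk can be sketched with in-memory rows. The structs and the `evidence_chain` helper are hypothetical and stand in for the real models; only `guidelines.indicated_therapeutic_approach_id` is named on this page, and the other foreign keys (`label_trial_id`, `approach_label_trial_id`) are assumed for illustration.

```ruby
# Hypothetical row shapes; the real tables have more columns.
GuidelineRow = Struct.new(:id, :indicated_therapeutic_approach_id, keyword_init: true)
ApproachLabelTrialRow = Struct.new(:id, :indicated_therapeutic_approach_id,
                                   :label_trial_id, keyword_init: true)
LabelTrialRow = Struct.new(:id, :name, keyword_init: true)
KeyTrialArmOutcomeRow = Struct.new(:id, :approach_label_trial_id, keyword_init: true)

# Follows guideline -> approach -> approach_label_trials -> label_trials ->
# key_trial_arm_outcomes and returns one summary hash per approach-trial match.
def evidence_chain(guideline, approach_label_trials:, label_trials:, key_trial_arm_outcomes:)
  trials_by_id = label_trials.to_h { |trial| [trial.id, trial] }
  approach_id = guideline.indicated_therapeutic_approach_id

  approach_label_trials
    .select { |alt| alt.indicated_therapeutic_approach_id == approach_id }
    .map do |alt|
      {
        approach_label_trial_id: alt.id,
        label_trial: trials_by_id[alt.label_trial_id],
        key_outcome_ids: key_trial_arm_outcomes
          .select { |outcome| outcome.approach_label_trial_id == alt.id }
          .map(&:id)
      }
    end
end

# Placeholder data for illustration only.
guideline = GuidelineRow.new(id: 1, indicated_therapeutic_approach_id: 7)
matches = [
  ApproachLabelTrialRow.new(id: 100, indicated_therapeutic_approach_id: 7, label_trial_id: 50),
  ApproachLabelTrialRow.new(id: 101, indicated_therapeutic_approach_id: 8, label_trial_id: 51)
]
trials = [LabelTrialRow.new(id: 50, name: 'Trial A'), LabelTrialRow.new(id: 51, name: 'Trial B')]
outcomes = [
  KeyTrialArmOutcomeRow.new(id: 900, approach_label_trial_id: 100),
  KeyTrialArmOutcomeRow.new(id: 901, approach_label_trial_id: 100),
  KeyTrialArmOutcomeRow.new(id: 902, approach_label_trial_id: 101)
]

chain = evidence_chain(guideline, approach_label_trials: matches,
                                  label_trials: trials, key_trial_arm_outcomes: outcomes)
```

An empty result at any level points to where the chain breaks: no approach-trial match, a match pointing at a missing label trial, or a match with no key outcomes selected.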