Standard of Care pipeline

The Standard of Care pipeline turns FDA oncology and hematology approvals into disease-specific treatment entries with linked label evidence. It starts from approved US DrugApproval records, structures their FDA indication language into Indication and IndicatedTherapeuticApproach records, extracts trial efficacy and safety data from the label, then matches the right label trials and measurements back to each treatment scenario.

This page covers only the automated, FDA-label-driven process. It intentionally does not document the semi-manual or manual guideline entry workflows.

A single FDA label can contain many approved uses. Each use can contain multiple treatment approaches: monotherapy, combination therapy, different treatment lines, or explicitly distinct phases such as induction and consolidation.

The Standard of Care process normalizes that into this shape:

flowchart LR
  DA["DrugApproval\nFDA / Purple Book approval"] --> IND["Indication\nstructured approved use"]
  IND --> ITA["IndicatedTherapeuticApproach\none regimen in one context"]
  ITA --> GL["Guideline\nSOC display entry"]
  DA --> LT["LabelTrial\ntrial from FDA label"]
  ITA --> ALT["ApproachLabelTrial\napproach-to-trial match"]
  LT --> ALT
  LT --> TOM["TrialOutcomeMeasure\nendpoint result"]
  TOM --> TAO["TrialArmOutcome\narm-level value"]
  ALT --> KTAO["KeyTrialArmOutcome\nheadline evidence"]
  TAO --> KTAO

Guideline is the disease-facing Standard of Care entry. For automated records it is not a guideline-source import; it is a normalized label-derived SOC row linked to an FDA indication and one therapeutic approach. Guidelines marked with standard_of_care = true appear in the final SOC table and are the entries that downstream analytical steps consider.

The workflow is defined in StandardOfCareWorkflow.

  1. standard_of_care:populate_guidelines --mode=sync
     Creates or syncs Guideline rows from reviewed, approved USA indications for diseases marked as standard-of-care.

  2. regulatory:trials:extract_label_trials
     Reads FDA label Clinical Studies sections and writes structured trial overviews into drug_approvals.label['structured']['clinical_trials'].

  3. regulatory:trials:extract_trials_endpoints, extract_trials_subgroups, extract_trials_results
     Extracts endpoint definitions, patient/subgroup structure, and result values from each label trial.

  4. regulatory:trials:scout_trial_nctids
     Searches for missing registry IDs when the label only names a study.

  5. regulatory:trials:match_trial_arms, classify_endpoint_domains, match_endpoints
     Matches extracted result arms to database study plan arms, classifies endpoint domains, and links extracted endpoints to canonical Endpoint records.

  6. regulatory:trials:update_ae_sections, segment_ae_sections, extract_ae_reports, match_ae_arms, extract_ae_details
     Extracts adverse event data from the FDA label Adverse Reactions section and ties AE tables to the relevant label trials and arms.

  7. regulatory:trials:match_approach_trials
     Deterministically matches structured therapeutic approaches to label trials using approval-study identifiers when available.

  8. regulatory:trials:match_approach_trials_llm
     Uses an LLM fallback for approach-to-trial matching when deterministic identifiers are not enough.

  9. regulatory:trials:post_process_label_trials
     Materializes label trial JSON into first-class LabelTrial, TrialEndpoint, TrialSubgroup, TrialOutcomeMeasure, TrialArmOutcome, and AdverseEvent rows.

  10. regulatory:trials:match_approach_trial_arms
      Selects the investigational study plan arm that best represents the exact approach/trial pair.

  11. regulatory:trials:determine_key_label_trial_results
      Selects the key arm-level results for each approach/trial pair.

  12. standard_of_care:match_segments_llm
      Matches indication language to disease base segments and biomarker segments.

  13. standard_of_care:post_process_guidelines --entry-types=drug_approval
      Applies the matched segment data to Guideline entries and marks completed SOC rows.

The upstream regulatory pipeline creates DrugApproval records from FDA, Purple Book, EMA, KEGG, and CDE sources. Standard of Care only uses approved US records from FDA-like sources:

  • source_type in FdaDatum or PurplebookRecord
  • approval_status = drug_approved
  • region = USA
  • linked Drug
  • reviewed structured indications
  • disease is under the standard-of-care disease scope
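
The scope above can be read as a single predicate over an approval. The following is a minimal plain-Ruby sketch rather than the application's actual scope: the Approval struct, its boolean helper fields, and the "EmaRecord" source name used in the counter-example are assumptions made for illustration.

```ruby
# Hypothetical stand-in for a DrugApproval row. The first three field names
# follow the list above; the struct and the boolean helpers are assumptions.
Approval = Struct.new(:source_type, :approval_status, :region, :drug_id,
                      :reviewed_indications, :in_soc_disease_scope,
                      keyword_init: true)

FDA_LIKE_SOURCES = %w[FdaDatum PurplebookRecord].freeze

# True only when every documented scope condition holds.
def soc_eligible?(approval)
  FDA_LIKE_SOURCES.include?(approval.source_type) &&
    approval.approval_status == "drug_approved" &&
    approval.region == "USA" &&
    !approval.drug_id.nil? &&
    approval.reviewed_indications == true &&
    approval.in_soc_disease_scope == true
end

fda = Approval.new(source_type: "FdaDatum", approval_status: "drug_approved",
                   region: "USA", drug_id: 1, reviewed_indications: true,
                   in_soc_disease_scope: true)

# "EmaRecord" is an invented source_type, used only to show a rejected record.
ema = Approval.new(**fda.to_h.merge(source_type: "EmaRecord"))

puts soc_eligible?(fda)
puts soc_eligible?(ema)
```

In the real application this would more likely be an ActiveRecord scope; the predicate form is used here only so the example runs on its own.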

DailyMed labels are fetched by Fda::LabelSync and stored in drug_approvals.label.

Important label paths:

| JSON path | Role |
| --- | --- |
| label['spl']['indications'] | Parsed FDA Indications and Usage section. |
| label['spl']['clinical_trials'] | Parsed FDA Clinical Studies section, usually structured by section title and content. |
| label['spl']['adverse_events'] | Raw Adverse Reactions section. Kept raw because AE tables vary heavily across labels. |
| label['structured']['clinical_trials'] | LLM-structured label trial overviews, endpoints, subgroups, results, arm matches, endpoint matches, and audits. |
| label['structured']['adverse_events'] | Segmented AE sections, matched AE reports, arm counts, and detailed AE rows before post-processing. |
| label['structured']['matched_approaches'] | Approach ID to matched label trial references. This is the bridge from indications to evidence before rows are materialized. |
| label['structured']['needs_trials_update'] | Set when a label or generated SOC entry changes and label trial extraction needs to rerun. |
| label['structured']['needs_ct_post_process_sync'] | Set when matched trial data changed and LabelTrial rows need reconciliation. |
| label['structured']['needs_ae_update'] / needs_ae_post_process_sync | Set when AE label content changes and downstream AE extraction/post-processing needs to rerun. |

The indication pipeline structures FDA indication text before SOC begins.

regulatory:indications:parse runs ApprovalLlmClassification::TaskV2. The LLM reads label indications and extracts:

  • disease name, subtypes, extents, stages, statuses, risks
  • eligible and non-eligible populations
  • biomarkers
  • prior therapy gates and their AND/OR logic
  • one or more therapeutic approaches
  • treatment lines and treatment settings
  • raw label text for traceability

regulatory:indications:post_process materializes that JSON into:

  • indications
  • indicated_biomarkers
  • indicated_prior_therapy_groups
  • indicated_prior_therapies
  • indicated_therapeutic_approaches
  • indicated_combination_partners

Disease names are resolved through TermMatch and linked through indication_diseases. Biomarker and therapy names are resolved to Biomarker, Drug, or NcitConcept where possible.

An IndicatedTherapeuticApproach represents one coherent way the approved product is used in one clinical context.

The parser creates separate approaches for:

  • different explicit treatment lines, such as 1L vs 2L
  • clearly independent monotherapy vs combination indications
  • different required partner sets
  • explicitly distinct phases when partner composition differs

It does not split restatements, synonyms, or continuation monotherapy that belongs to the same course unless the label states an independent monotherapy indication.

Treatment lines are stored twice:

  • raw label-derived values in indicated_therapeutic_approaches.treatment_lines
  • parsed numeric range in min_line / max_line, plus phase-like settings in treatment_settings

Line parsing uses IndicatedTherapeuticApproach::LINE_MAPPING:

| Label value | Parsed range |
| --- | --- |
| 1L | min_line = 0, max_line = 0 |
| 2L | min_line = 1, max_line = 1 |
| 3L+ | min_line = 2, max_line = nil |
| 1L+ | min_line = 0, max_line = nil |

Settings such as Induction, Consolidation, Maintenance, Bridging, Neoadjuvant, and Adjuvant are stored separately in treatment_settings.
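
As an illustration of the two-part storage, the sketch below parses raw label values into a numeric range plus settings. It is a sketch under stated assumptions: the hash mirrors only the four documented rows, and the rule for collapsing several raw values into the widest covering range is a guess, not confirmed behavior of IndicatedTherapeuticApproach.

```ruby
# Mirrors the four documented rows; the real LINE_MAPPING may hold more entries.
LINE_MAPPING = {
  "1L"  => { min_line: 0, max_line: 0 },
  "2L"  => { min_line: 1, max_line: 1 },
  "3L+" => { min_line: 2, max_line: nil }, # nil max_line means open-ended
  "1L+" => { min_line: 0, max_line: nil }
}.freeze

SETTINGS = %w[Induction Consolidation Maintenance Bridging Neoadjuvant Adjuvant].freeze

# Assumption: several raw values collapse to the widest range that covers them.
def parse_lines(raw_values)
  ranges   = raw_values.map { |value| LINE_MAPPING[value] }.compact
  settings = raw_values & SETTINGS
  return { min_line: nil, max_line: nil, treatment_settings: settings } if ranges.empty?

  open_ended = ranges.any? { |range| range[:max_line].nil? }
  {
    min_line: ranges.map { |range| range[:min_line] }.min,
    max_line: open_ended ? nil : ranges.map { |range| range[:max_line] }.max,
    treatment_settings: settings
  }
end

p parse_lines(["1L", "Maintenance"])
p parse_lines(["2L", "3L+"])
```

Note that the parsed range is zero-based: 1L maps to 0, so a max_line of nil on 3L+ reads as "third line or later".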

For automated SOC, Guideline is the normalized disease-facing treatment entry derived from an FDA indication and one therapeutic approach.

StandardOfCareStructuring::PopulateGuidelines creates one Guideline per:

  • disease in the SOC scope
  • reviewed Indication linked to that disease or one of its descendants
  • IndicatedTherapeuticApproach under that indication

Each automated Guideline stores:

  • indication_id
  • indicated_therapeutic_approach_id
  • disease_id
  • treatment_string
  • linked TreatmentLine, DiseaseBaseSegment and DiseaseBiomarkerSegment rows
  • approval date and accelerated/full approval status
  • related_diseases
  • source = 'Label'

The entry exists so disease pages and TPP reports can ask: “For this disease, segment, biomarker segment, and line, which approved treatments are standard of care and what label evidence supports them?”

How treatment strings and therapies are built


For monotherapy, treatment_string is the approved drug name.

For combinations, it is:

Primary drug + partner 1 + partner 2

The primary therapy is also written to guideline_therapies. The drugs_guidelines HABTM association is kept for existing consumers.
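
A minimal sketch of the string format described above. The method name and the placeholder drug names are illustrative and not taken from the codebase.

```ruby
# Joins the primary drug and any partners with " + ".
# With no partners (monotherapy) the result is just the drug name.
def treatment_string(primary_drug, partners = [])
  [primary_drug, *partners].join(" + ")
end

puts treatment_string("Drug A")
puts treatment_string("Drug A", ["Drug B", "Drug C"])
```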

PopulateGuidelines#link_treatment_lines reads treatment-line names from the IndicatedTherapeuticApproach. It then maps those values to TreatmentLine records associated with the guideline disease.

Important behavior:

  • It matches by treatment_lines.line_mapping, not only display name.
  • Induction, Consolidation, and Maintenance can be standalone treatment lines for some diseases. If a disease has a standalone line with that mapping, it is linked directly.
  • If those phase names are not standalone lines, they are appended to numeric lines as "{line}; {phase}".
  • If a phase appears without a numeric line, it defaults to 1L; {phase}.
  • If the indication type is prevention, Prevention is added as a treatment line.
  • Plus and non-plus mappings fall back both ways: 2L+ can match 2L, and 2L can match 2L+ if the exact disease line is missing.
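
The rules above can be sketched as a small resolver over line_mapping strings. This is an approximation under assumptions, not the body of link_treatment_lines: the method names are hypothetical, and it assumes that when a phase is appended to a numeric line, only the combined "{line}; {phase}" mapping is requested.

```ruby
PHASES = %w[Induction Consolidation Maintenance].freeze

# Returns the exact mapping if the disease has it, otherwise tries the
# plus/non-plus variant of the numeric part (2L <-> 2L+).
def match_with_plus_fallback(mapping, disease_mappings)
  return mapping if disease_mappings.include?(mapping)

  line, phase = mapping.split("; ", 2)
  alt_line = line.end_with?("+") ? line.chomp("+") : "#{line}+"
  alt = [alt_line, phase].compact.join("; ")
  alt if disease_mappings.include?(alt)
end

# approach_lines: raw values from the approach.
# disease_mappings: line_mapping values available for the guideline disease.
def resolve_line_mappings(approach_lines, disease_mappings, prevention: false)
  phases  = approach_lines & PHASES
  numeric = approach_lines - PHASES
  # Phases that exist as standalone disease lines are linked directly.
  standalone, appended = phases.partition { |phase| disease_mappings.include?(phase) }
  # A phase without a numeric line defaults to 1L.
  numeric = ["1L"] if numeric.empty? && appended.any?

  wanted = standalone.dup
  if appended.empty?
    wanted.concat(numeric)
  else
    numeric.each { |line| appended.each { |phase| wanted << "#{line}; #{phase}" } }
  end
  wanted << "Prevention" if prevention

  wanted.filter_map { |mapping| match_with_plus_fallback(mapping, disease_mappings) }.uniq
end

p resolve_line_mappings(["2L"], ["1L", "2L+"])
p resolve_line_mappings(["Maintenance"], ["1L", "1L; Maintenance"])
```

Matching on the mapping string rather than the display name is what lets a disease rename "First line Maintenance" without breaking the link.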

PostProcessGuidelines does not mark every generated automated guideline as SOC just because it exists. After segment matching, it finalizes automated entries by disease:

  • A guideline is eligible only if it is post-processed and has at least one linked TreatmentLine.
  • Duplicate entries are suppressed when they have the same disease, base segments, biomarker segments, treatment lines, and regimen signature.
  • The duplicate winner is the entry with the richer evidence set: more label trials, trial results, arm outcomes, adverse events, and key result selections.
  • The winning entries get standard_of_care = true; suppressed or ineligible entries get standard_of_care = false.
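
The finalization rules can be sketched as group-then-pick-winner. This is a simplified illustration: the Entry struct is hypothetical, and the single evidence_score field is an assumption standing in for however the real task weighs label trials, results, arm outcomes, adverse events, and key result selections.

```ruby
# Hypothetical flattened view of a Guideline for finalization purposes.
Entry = Struct.new(:id, :disease, :base_segments, :biomarker_segments,
                   :lines, :regimen, :post_processed, :evidence_score,
                   keyword_init: true)

# Entries with the same signature are duplicates of each other.
def signature(entry)
  [entry.disease, entry.base_segments.sort, entry.biomarker_segments.sort,
   entry.lines.sort, entry.regimen]
end

# Returns { guideline_id => standard_of_care flag }.
def finalize(entries)
  eligible = entries.select { |entry| entry.post_processed && entry.lines.any? }
  winner_ids = eligible.group_by { |entry| signature(entry) }
                       .values
                       .map { |dupes| dupes.max_by(&:evidence_score).id }
  entries.to_h { |entry| [entry.id, winner_ids.include?(entry.id)] }
end

base = { disease: "Disease X", base_segments: [], biomarker_segments: ["HER2-positive"],
         lines: ["1L"], regimen: "Drug A", post_processed: true }
entries = [
  Entry.new(**base, id: 1, evidence_score: 5),
  Entry.new(**base, id: 2, evidence_score: 9),               # richer duplicate of 1
  Entry.new(**base.merge(lines: []), id: 3, evidence_score: 20) # no treatment line
]
p finalize(entries)
```

Entry 3 has the highest score but is never considered, because eligibility is checked before duplicates are compared.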

ApprovalLlmClassification::TrialExtraction::LabelTrialsTask reads label['spl']['clinical_trials'] and identifies clinical trials that present detailed results.

For each trial it extracts:

  • registry ID, such as NCT########; unknown if absent
  • trial title, such as KEYNOTE-426
  • patient population and study design

The result initially lives in drug_approvals.label['structured']['clinical_trials']. It is later materialized into LabelTrial.

Trials are not extracted from every approval. The default scope is approved FDA/Purple Book approvals with standard-of-care treatment-line diseases and a Clinical Studies section.

How indications are matched to label trials


The system matches IndicatedTherapeuticApproach records to structured label trials in two passes.

MatchApproachTrialsTask uses approval-study fields already attached to the indication when they are available:

  • indications.full_approval_studies
  • indications.accelerated_approval_studies

For each study, it compares:

  • clinical_trial_number against structured label trial id
  • study_name against structured label trial title

The task also sets needs_ct_post_process_sync = true so that post-processing reconciles the LabelTrial rows.
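
A rough sketch of the deterministic comparison, using plain hashes in place of the approval-study and structured-trial JSON. Several details are assumptions: that either field matching is sufficient, that comparison is case-insensitive, and that the "unknown" placeholder registry ID never counts as a match. The NCT00000000 value is a dummy identifier.

```ruby
# Compares two identifiers after trimming and downcasing.
# Blank values and the "unknown" placeholder never match (assumption).
def same?(left, right)
  return false if left.nil? || right.nil?

  normalize = ->(value) { value.to_s.strip.downcase }
  a = normalize.(left)
  b = normalize.(right)
  !a.empty? && a != "unknown" && a == b
end

# Returns the structured label trials that any approval study points at.
def match_studies(studies, label_trials)
  label_trials.select do |trial|
    studies.any? do |study|
      same?(study[:clinical_trial_number], trial[:id]) ||
        same?(study[:study_name], trial[:title])
    end
  end
end

studies = [{ clinical_trial_number: "NCT00000000", study_name: nil }]
trials  = [{ id: "nct00000000", title: "Study A" },
           { id: "unknown",     title: "Study B" }]
p match_studies(studies, trials)
```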

MatchApproachTrialsLlmTask handles approaches not matched deterministically or flagged for sync.

It can only return trial ID/title pairs from the provided enum. If no trial passes all filters, it writes an empty trials array with matched_by = 'llm'.
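
The enum constraint can be pictured as a guard on the LLM output. In the real task the restriction is presumably enforced by the response schema; the post-hoc filter below is an assumption used to show the resulting stored shape, and the method name is hypothetical.

```ruby
# Keeps only ID/title pairs that were offered to the model, and always
# records matched_by = 'llm', even when nothing survives.
def accept_llm_matches(returned_pairs, allowed_pairs)
  trials = returned_pairs.select { |pair| allowed_pairs.include?(pair) }
  { "trials" => trials, "matched_by" => "llm" }
end

allowed = [["NCT00000000", "Study A"]] # dummy identifier and title
p accept_llm_matches([["NCT00000000", "Study A"], ["NCT99999999", "Not offered"]], allowed)
p accept_llm_matches([], allowed)
```

Writing the empty result explicitly distinguishes "the LLM looked and found nothing" from "this approach has not been processed yet".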

PostProcessLabelTrialsTask converts label['structured']['matched_approaches'] and label['structured']['clinical_trials'] into relational rows.

For each matched approach/trial pair:

  1. Resolve the trial JSON by identifier and title.
  2. Find or create LabelTrial keyed by drug_approval_id, trial_identifier, and trial_title.
  3. Link LabelTrial to ClinicalTrial when the registry ID matches clinical_trials.nct_id.
  4. Find or create ApproachLabelTrial for the approach and label trial.
  5. Create or sync endpoints, subgroups, outcome measures, adverse events, arm outcomes, and trial disease details under the LabelTrial.

The task is sync-oriented. It removes stale approach links and stale label trials when upstream matches change, while skipping LabelTrial rows that are automation-protected (manually_touched or auditor_touched); the LabelTrial.automation_owned and LabelTrial.automation_protected scopes encode this split.

Label trial efficacy data is stored as:

  • TrialEndpoint: endpoint definitions for the label trial
  • TrialSubgroup: patient population or result subgroup
  • TrialOutcomeMeasure: one endpoint measurement for one subgroup
  • TrialArmOutcome: arm-level result value for that measurement

TrialOutcomeMeasure and TrialEndpoint use polymorphic source_type/source_id. For the automated FDA-label path, source_type = 'LabelTrial'.

AE extraction follows a separate label-section path because FDA Adverse Reactions sections are inconsistent.

The AE process:

  1. AeUpdateSectionsTask detects changed AE regions when label content changes.
  2. AeSegmentSectionsTask splits the AE section into analyzable report segments.
  3. AeExtractReportsTask identifies which trial each AE segment belongs to and extracts report-level arm counts.
  4. AeMatchArmsTask matches AE arms to study plan arms.
  5. AeExtractDetailsTask extracts detailed adverse events and arm-level measurements.
  6. PostProcessLabelTrialsTask creates AdverseEvent and TrialArmOutcome rows under the matching LabelTrial.

Only valid trial-report AE segments are materialized. Pooled analyses, postmarketing sections, and reports without usable single-arm or matched-arm data are filtered out.

After rows exist, regulatory:trials:standardize_adverse_events deterministically matches AE names to safety endpoints where possible, and regulatory:trials:classify_adverse_events_llm uses an LLM fallback for unmatched safety endpoint classification.

Measurements become useful only after their endpoint language is normalized.

The extraction and post-processing layers use several endpoint matching stages:

  1. The label trial endpoint task extracts endpoint names, abbreviations, and definitions from the FDA label trial text.
  2. Endpoint domain classification groups endpoints into clinical domains.
  3. Endpoint matching links extracted endpoints to canonical Endpoint rows.
  4. During post-processing, resolve_endpoint prefers:
    • explicit endpoint_id from the extraction/matching output
    • exact abbreviation match
    • Endpoint.flexifind against endpoint synonyms

The result is stored on trial_endpoints.endpoint_id. Downstream queries use that canonical endpoint ID when available and fall back to endpoint name or abbreviation when necessary.
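
The preference order in resolve_endpoint can be sketched as three lookups tried in sequence. This is an illustration only: the Endpoint struct is a stand-in for the model, and the case-insensitive synonym comparison is a simplification of Endpoint.flexifind, whose actual matching logic is not described here.

```ruby
# Stand-in for the canonical Endpoint model.
Endpoint = Struct.new(:id, :abbreviation, :synonyms, keyword_init: true)

def resolve_endpoint(extracted, endpoints)
  # 1. Explicit endpoint_id from the extraction/matching output.
  if extracted[:endpoint_id]
    by_id = endpoints.find { |endpoint| endpoint.id == extracted[:endpoint_id] }
    return by_id if by_id
  end

  # 2. Exact abbreviation match.
  if extracted[:abbreviation]
    by_abbr = endpoints.find { |endpoint| endpoint.abbreviation == extracted[:abbreviation] }
    return by_abbr if by_abbr
  end

  # 3. Synonym lookup (simplified stand-in for Endpoint.flexifind).
  name = extracted[:name].to_s.downcase
  endpoints.find { |endpoint| endpoint.synonyms.any? { |synonym| synonym.downcase == name } }
end

endpoints = [
  Endpoint.new(id: 1, abbreviation: "OS",  synonyms: ["overall survival"]),
  Endpoint.new(id: 2, abbreviation: "PFS", synonyms: ["progression-free survival"])
]
p resolve_endpoint({ name: "Overall Survival" }, endpoints)&.id
```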

How key measurements are chosen for a guideline entry


Key results are chosen per ApproachLabelTrial, not globally per trial.

That distinction matters because the same label trial can support several approaches or indications. The key result for a 1L monotherapy scenario may not be the same as the key result for a combination or subgroup scenario.

TrialsDetermineKeyResults works like this:

  1. Get disease-specific key endpoints from diseases.key_endpoints_jsonb.
  2. If the disease has no key endpoint config, walk up parent diseases until one is found.
  3. Pick endpoint abbreviations based on treatment setting:
    • neoadjuvant endpoints when the approach has neoadjuvant treatment lines/settings
    • adjuvant endpoints when the approach has adjuvant treatment lines/settings
    • other endpoints for all other treatment settings
    • all as fallback
  4. Filter the label trial’s outcome measures to those key endpoints.
  5. Try deterministic selection when:
    • all key measures are in one subgroup
    • each key endpoint appears once
    • each selected outcome has a single confident investigational arm
  6. Otherwise use an LLM prompt to select up to one result per key endpoint, choosing the subgroup and investigational arm closest to the indication, disease, treatment line, and exact therapeutic approach.

Selections are written to key_trial_arm_outcomes, which points to the exact trial_arm_outcomes rows that should be surfaced as headline evidence.
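
Steps 1 to 3 and the step 5 check can be sketched as follows. The shape of key_endpoints_jsonb (a hash keyed by "neoadjuvant", "adjuvant", "other", and "all"), the precedence of neoadjuvant over adjuvant, and the measure hash keys are assumptions inferred from the list above rather than a confirmed schema.

```ruby
# Stand-in for a disease with an optional parent and key endpoint config.
Disease = Struct.new(:name, :parent, :key_endpoints, keyword_init: true)

# Walks up parent diseases until a non-empty config is found.
def key_endpoint_config(disease)
  disease = disease.parent while disease && (disease.key_endpoints.nil? || disease.key_endpoints.empty?)
  disease&.key_endpoints
end

# Picks the endpoint abbreviations for the approach's treatment settings.
def key_abbreviations(disease, settings)
  config = key_endpoint_config(disease)
  return [] if config.nil?

  bucket = if settings.include?("Neoadjuvant") then "neoadjuvant"
           elsif settings.include?("Adjuvant") then "adjuvant"
           else "other"
           end
  config[bucket] || config["all"] || []
end

# Deterministic selection is allowed only when the choice is unambiguous.
def deterministic?(measures)
  measures.map { |m| m[:subgroup] }.uniq.size == 1 &&
    measures.map { |m| m[:endpoint] }.uniq.size == measures.size &&
    measures.all? { |m| m[:investigational_arms].size == 1 }
end

parent = Disease.new(name: "Parent", parent: nil,
                     key_endpoints: { "other" => %w[OS PFS], "adjuvant" => %w[DFS] })
child  = Disease.new(name: "Child", parent: parent, key_endpoints: nil)
p key_abbreviations(child, ["Adjuvant"])
p key_abbreviations(child, [])
```

Anything that fails deterministic? falls through to the LLM prompt described in step 6.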

MatchApproachTrialArmsTask chooses the investigational study plan arm that represents the approach/trial pair.

It considers only ApproachLabelTrial rows that:

  • are linked to an automated IndicatedTherapeuticApproach
  • have a LabelTrial linked to a ClinicalTrial
  • have label trial outcome measures

The task builds candidates from investigational TrialArmOutcome rows that already have study_plan_arm_id.

Selection rules:

  • If there is exactly one candidate investigational study plan arm, persist it deterministically.
  • If multiple candidates exist, ask the LLM to select the one matching the approach.
  • The LLM prioritizes exact therapeutic fit, single-agent vs combination status, partner composition, and structured study plan interventions.
  • It must not choose control or comparator arms.

The selected arm is stored on:

  • approach_label_trials.matched_study_plan_arm_id
  • approach_label_trials.arm_matched_by
  • approach_label_trials.arm_match_reasoning

KeyTrialArmOutcome selection uses this arm when selecting the right arm-level result.
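
The selection rules reduce to a small branch on the number of candidate arms. In this sketch the block stands in for the LLM call, and the "deterministic" and "llm" values for arm_matched_by are assumed labels, not confirmed enum values.

```ruby
# candidate_arm_ids: study_plan_arm_id values taken from investigational
# TrialArmOutcome rows. The block is a stand-in for the LLM selection step.
def select_arm(candidate_arm_ids)
  ids = candidate_arm_ids.compact.uniq
  case ids.size
  when 0 then nil
  when 1 then { matched_study_plan_arm_id: ids.first, arm_matched_by: "deterministic" }
  else yield(ids)
  end
end

p select_arm([7, 7, nil])
p select_arm([1, 2]) { |ids| { matched_study_plan_arm_id: ids.last, arm_matched_by: "llm" } }
```

Because candidates are built only from investigational rows, control and comparator arms are excluded before either branch runs.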

Disease segments are disease-specific filters used to organize SOC treatment entries on disease pages.

There are two segment families:

| Segment type | Table | Meaning |
| --- | --- | --- |
| Base segment | disease_base_segments | Non-biomarker disease qualifiers, such as histology, transplant eligibility, site-specific metastasis, or child disease. |
| Biomarker segment | disease_biomarker_segments | Biomarker-defined cohorts, such as HER2-positive, EGFR-mutant, PD-L1-high. |

Both segment types can link to guidelines through HABTM join tables:

  • disease_base_segments_guidelines
  • disease_biomarker_segments_guidelines

Biomarker segments can also link to actual biomarkers through biomarker_segment_biomarkers.

StandardOfCareStructuring::MatchSegmentsLlm matches label indication text to these segment lists. It gives the LLM:

  • the disease’s allowed base segments
  • the disease’s allowed biomarker segments
  • the indication raw text
  • the FDA label indication section for context

The task is conservative:

  • It prefers exact candidate strings.
  • It allows deterministic abbreviation/symbol equivalences such as HER2+ to HER2-Positive.
  • It does not extract treatment-line phrases as base segments.
  • It avoids generic adult/pediatric and raw staging tokens unless the exact segment is in the candidate list.
  • It stores insufficient-data reasoning when the label does not support confident segment assignment.

The matched result is stored in guidelines.llm_data['matched_segments'] and later applied to the guideline’s segment associations.

TreatmentLine is the disease-specific treatment-line vocabulary used by SOC views. A treatment line can be a simple line (First line), a phase-specific line (First line Maintenance), or a disease-specific special line. The machine key used by automated matching is line_mapping.

DiseaseTreatmentLine connects lines to diseases and preserves display ordering through position.

Automated FDA indications provide treatment-line values as strings on IndicatedTherapeuticApproach. PopulateGuidelines maps those values to disease-specific TreatmentLine rows.

Treatment line handling has two layers:

  • On the indication/approach: line semantics are extracted from FDA label text and parsed to min_line, max_line, and treatment_settings.
  • On the guideline: those semantics are mapped into disease-specific TreatmentLine records for filtering and display.

This lets the system preserve label-derived line logic while still using disease-specific display lines in the SOC UI.

| Table | Role | Key associations |
| --- | --- | --- |
| drug_approvals | Unified regulatory approval. Stores FDA label JSON and structured LLM/intermediate workflow data. | has_many :indications; has_many :label_trials; polymorphic source via source_type/source_id; belongs_to :drug. |
| indications | One structured approved use extracted from FDA label indication text. | belongs_to :drug_approval; has_many :indicated_therapeutic_approaches; has_many :indication_diseases; has_many :diseases, through: :indication_diseases; has_many :guidelines. |
| indication_diseases | Join from structured indication to canonical diseases. | belongs_to :indication; belongs_to :disease. |
| indicated_therapeutic_approaches | One approved treatment scenario under an indication. Stores line and setting semantics. | belongs_to :indication; has_many :indicated_combination_partners; has_many :guidelines; has_many :approach_label_trials; has_many :label_trials, through: :approach_label_trials. |
| indicated_combination_partners | Current-regimen partner therapies for a non-single-agent approach. | belongs_to :indicated_therapeutic_approach; optional polymorphic-like partner_type/partner_id. |
| indicated_biomarkers | Biomarker qualifiers extracted from indication text. | belongs_to :indication; optional biomarker_id. |
| indicated_prior_therapy_groups | AND/OR grouping for prior therapy requirements. | belongs_to :indication; has_many :indicated_prior_therapies. |
| indicated_prior_therapies | Individual prior therapy gates such as progressed after platinum therapy. | belongs_to :indication; optional therapy polymorphic fields; optional group. |

| Table | Role | Key associations |
| --- | --- | --- |
| guidelines | Disease-facing SOC treatment entry. Automated rows link one disease, one indication, and one therapeutic approach. | belongs_to :disease; optional belongs_to :indication; optional belongs_to :indicated_therapeutic_approach; HABTM drugs, treatment_lines, disease_base_segments, disease_biomarker_segments; has_many :guideline_therapies. |
| guideline_therapies | Polymorphic therapy link for the primary therapy displayed by a guideline. | belongs_to :guideline; belongs_to :therapy, polymorphic: true. |
| treatment_lines | Canonical disease-specific line/setting display vocabulary. | has_many :disease_treatment_lines; has_many :diseases, through: :disease_treatment_lines; HABTM guidelines. |
| disease_treatment_lines | Disease-to-treatment-line join with display order. | belongs_to :disease; belongs_to :treatment_line. |
| disease_base_segments | Non-biomarker disease segment options. | belongs_to :disease; optional belongs_to :child_disease; HABTM guidelines. |
| disease_biomarker_segments | Biomarker-defined disease segment options. | belongs_to :disease; optional belongs_to :child_disease; HABTM guidelines; has_many :biomarker_segment_biomarkers; has_many :standard_of_care_prevalences. |
| biomarker_segment_biomarkers | Biomarker composition of a biomarker segment. | belongs_to :disease_biomarker_segment; belongs_to :biomarker. |
| standard_of_care_prevalences | Prevalence metadata for biomarker segments, optionally treatment-line-specific. | belongs_to :disease; belongs_to :disease_biomarker_segment; optional belongs_to :treatment_line; belongs_to :biomarker_prevalence. |
| disease_key_endpoints | Disease-to-endpoint join for key SOC efficacy endpoints. | belongs_to :disease; belongs_to :endpoint. |

| Table | Role | Key associations |
| --- | --- | --- |
| label_trials | One clinical trial referenced in an FDA label. This is the shared evidence container. | belongs_to :drug_approval; optional belongs_to :clinical_trial; has_many :approach_label_trials; polymorphic source for trial result tables. |
| approach_label_trials | Join from an approach to a label trial. This is where “this trial supports this approach” is stored. | belongs_to :indicated_therapeutic_approach; belongs_to :label_trial; optional belongs_to :matched_study_plan_arm; has_many :key_trial_arm_outcomes. |
| trial_endpoints | Endpoint definitions extracted from label trial text. | Polymorphic source; optional belongs_to :endpoint; optional belongs_to :clinical_trial; has_many :trial_outcome_measures. |
| trial_subgroups | Result subgroup or analysis population. | Polymorphic source; optional belongs_to :clinical_trial; optional belongs_to :disease; has_many :trial_outcome_measures. |
| trial_outcome_measures | One measurement for one endpoint and subgroup. | Polymorphic source; belongs_to :trial_endpoint; belongs_to :trial_subgroup; optional belongs_to :clinical_trial; has_many :trial_arm_outcomes. |
| trial_arm_outcomes | Arm-level value for an efficacy measurement or adverse event. | Optional belongs_to :trial_outcome_measure; optional belongs_to :adverse_event; optional belongs_to :study_plan_arm. |
| adverse_events | Safety event measurement extracted from label AE report. | Polymorphic source; optional belongs_to :clinical_trial; optional belongs_to :endpoint; has_many :trial_arm_outcomes. |
| trial_disease_details | Disease/population details extracted from trial text. | Polymorphic source; optional belongs_to :disease; optional belongs_to :clinical_trial; has_many :trial_disease_biomarkers. |
| key_trial_arm_outcomes | Marks the exact arm outcomes that are headline/key results for an approach-trial pair. | belongs_to :approach_label_trial; belongs_to :trial_arm_outcome. |

Disease SOC pages use Diseases::StandardOfCareQuery.

The query starts from Guideline.standard_of_care, filters by disease, optional base segments, biomarker segments, and top endpoints, then joins through:

guidelines
-> approach_label_trials
-> label_trials
-> trial_outcome_measures
-> trial_endpoints
-> key_trial_arm_outcomes
-> trial_arm_outcomes

For label-trial-backed entries, endpoint values are only surfaced through KeyTrialArmOutcome when top endpoint filtering is requested. Legacy guideline-sourced rows are kept as a fallback for older data, but the automated FDA-label path should use LabelTrial as the evidence source.

TPP Standard of Care reports use Tpp::StandardOfCareQuery. It loads automated guideline entries, finds matching ApproachLabelTrial rows by indicated_therapeutic_approach_id, and reads measurements from the shared LabelTrial rows. It uses KeyTrialArmOutcome IDs to prefer headline arm outcomes when summarizing efficacy and safety.

The pipeline is designed to rerun without blindly duplicating results.

Important invalidation behavior:

  • If a new automated guideline is created, the approval label is marked needs_trials_update.
  • If guideline sync changes indication text, treatment string, treatment lines, related diseases, or accelerated status, downstream trial/segment fields can be cleared and recomputed.
  • If Clinical Studies label content changes, Fda::LabelSync marks needs_trials_update.
  • If Adverse Reactions label content changes, Fda::LabelSync records changed AE regions and marks needs_ae_update.
  • If approach-to-trial matches change, needs_ct_post_process_sync is set.
  • If post-processed trial result structure changes, persisted approach arm matches and key result selections are invalidated.
  • Stale ApproachLabelTrial, LabelTrial, subgroup, outcome, endpoint, and arm rows are reconciled during sync.

The main invariant is that label trial evidence should live once on LabelTrial, while treatment-specific relevance lives on ApproachLabelTrial and KeyTrialArmOutcome.

Once SOC entries are produced, a second pass uses Parallel.ai web search to audit structured fields against official sources (FDA labels, Drugs@FDA, ClinicalTrials.gov, sponsor releases, peer-reviewed pivotal trials).

The framework lives under app/tasks/standard_of_care_structuring/parallel_soc_audits/. BaseAuditor defines the shared JSON schema, scope filters (approval_ids, disease_ids, target-specific ids, limit), Parallel.ai task-group submission, and the AuditIssue write path. Each subclass picks a target table, builds one prompt per record, and lists the editable fields in a Field Contract that constrains what the LLM may correct.

| Auditor | Targets | Checks |
| --- | --- | --- |
| IndicationAuditor | Indication, IndicatedTherapeuticApproach, IndicatedCombinationPartner | Treatment lines, single-agent vs combination flag, partner composition. |
| LabelTrialAuditor | LabelTrial, ApproachLabelTrial, TrialDiseaseDetail, TrialSubgroup, TrialOutcomeMeasure, TrialArmOutcome | Trial identifier, title, and structured result fields against the FDA label and registry. |
| ApproachSegmentAuditor | Guideline segment associations | Disease base segment and biomarker segment IDs linked to each guideline row, restricted to the disease’s candidate segment lists. |
| DrugLineAuditor | Guideline rows grouped by disease and treatment line | Whether the drug list per disease/line matches standard-of-care drugs; reports missing and should_not_be_there only. |

IndicationAuditor, LabelTrialAuditor, and ApproachSegmentAuditor write findings to audit_issues keyed by issue_type (soc_indication_audit, soc_label_trial_audit, soc_approach_segment_audit). DrugLineAuditor is read-only: it logs results but does not create issues, since drug coverage corrections require human review.

CorrectionApplicator and its subclasses read open AuditIssue rows for a given issue_type and apply each correction as a direct field replacement on the target record. Corrections are filtered the same way audits are scoped (issue_ids, approval_ids, disease_ids, target ids, limit), and dry_run logs intended changes without writing.
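
The dry-run behavior can be sketched with plain hashes. This is not the CorrectionApplicator API: the issue keys (:status, :target_id, :corrections), the editable_fields allow-list on the apply side, and the method name are all assumptions made for illustration.

```ruby
# issues: array of hashes standing in for AuditIssue rows.
# records: { target_id => mutable hash } standing in for target model rows.
# Returns the list of intended changes; writes only when dry_run is false.
def apply_corrections(issues, records, editable_fields:, dry_run: true)
  issues.select { |issue| issue[:status] == "open" }.flat_map do |issue|
    record = records.fetch(issue[:target_id])
    issue[:corrections].slice(*editable_fields).map do |field, value|
      change = { target_id: issue[:target_id], field: field, from: record[field], to: value }
      record[field] = value unless dry_run # direct field replacement
      change
    end
  end
end

records = { 1 => { treatment_lines: ["1L"], note: "keep" } }
issues  = [{ status: "open", target_id: 1,
             corrections: { treatment_lines: ["2L"], note: "not editable" } }]

p apply_corrections(issues, records, editable_fields: [:treatment_lines])
p records[1]
```

Returning the change list in both modes means a dry run and a real run can be compared line for line.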

| Applicator | Issue type | Special handling |
| --- | --- | --- |
| IndicationCorrectionApplicator | soc_indication_audit | Replaces indicated_combination_partners rows wholesale when combination_partners is corrected, and resolves partner names through Therapy.find_therapy. |
| LabelTrialCorrectionApplicator | soc_label_trial_audit | Field replacement on the target model row. |
| ApproachSegmentCorrectionApplicator | soc_approach_segment_audit | Replaces guideline segment associations to match the corrected ID list. |

AuditApproachTrialMatches (app/tasks/approval_llm_classification/trial_extraction/audit_approach_trial_matches.rb) follows the same Parallel.ai task-group pattern but lives in the trial-extraction pipeline rather than the SOC module. The --batched, --parallelism, and --accept_findings Thor options on regulatory:trials:audit_approach_trial_matches are kept as no-ops for backwards compatibility; Parallel.ai task groups are always used.

| Command | Purpose |
| --- | --- |
| bundle exec thor standard_of_care:populate_guidelines --mode=sync | Create or sync automated SOC guideline entries from reviewed FDA indications. |
| bundle exec thor regulatory:trials:extract_label_trials | Identify label trials from FDA Clinical Studies text. |
| bundle exec thor regulatory:trials:extract_trials_endpoints | Extract endpoints for label trials. |
| bundle exec thor regulatory:trials:extract_trials_subgroups | Extract trial subgroups and analysis populations. |
| bundle exec thor regulatory:trials:extract_trials_results | Extract efficacy results. |
| bundle exec thor regulatory:trials:scout_trial_nctids | Find missing trial registry IDs. |
| bundle exec thor regulatory:trials:match_trial_arms | Match extracted result arms to study plan arms. |
| bundle exec thor regulatory:trials:classify_endpoint_domains | Classify endpoint domains. |
| bundle exec thor regulatory:trials:match_endpoints | Link extracted endpoints to canonical endpoints. |
| bundle exec thor regulatory:trials:segment_ae_sections | Segment FDA AE label sections. |
| bundle exec thor regulatory:trials:extract_ae_reports | Match AE report overviews to label trials. |
| bundle exec thor regulatory:trials:match_ae_arms | Match AE arms to study plan arms. |
| bundle exec thor regulatory:trials:extract_ae_details | Extract detailed adverse event rows. |
| bundle exec thor regulatory:trials:match_approach_trials | Deterministically match approaches to label trials. |
| bundle exec thor regulatory:trials:match_approach_trials_llm | LLM fallback for approach-to-trial matching. |
| bundle exec thor regulatory:trials:post_process_label_trials | Materialize label trial evidence into relational rows. |
| bundle exec thor regulatory:trials:match_approach_trial_arms | Pick the approach’s investigational arm. |
| bundle exec thor regulatory:trials:determine_key_label_trial_results | Pick key arm outcomes for each approach-trial pair. |
| bundle exec thor standard_of_care:match_segments_llm | Match FDA indication text to disease base and biomarker segments. |
| bundle exec thor standard_of_care:post_process_guidelines --entry-types=drug_approval | Apply automated SOC post-processing. |
| bundle exec thor standard_of_care:audit_indications_parallel | Parallel.ai audit of structured indications and approach line/partner fields. |
| bundle exec thor standard_of_care:apply_indication_parallel_audit_corrections | Apply open soc_indication_audit corrections. |
| bundle exec thor standard_of_care:audit_label_trials_parallel | Parallel.ai audit of LabelTrial identifiers, titles, and result fields. |
| bundle exec thor standard_of_care:apply_label_trial_parallel_audit_corrections | Apply open soc_label_trial_audit corrections. |
| bundle exec thor standard_of_care:audit_approach_segments_parallel | Parallel.ai audit of guideline disease base/biomarker segment associations. |
| bundle exec thor standard_of_care:apply_approach_segment_parallel_audit_corrections | Apply open soc_approach_segment_audit corrections. |
| bundle exec thor standard_of_care:audit_drug_lines_parallel | Parallel.ai audit of SOC drug coverage by disease and treatment line (read-only report). |

Keep these ownership boundaries in mind:

  • DrugApproval owns the FDA label and intermediate JSON.
  • Indication owns the approved disease/population context.
  • IndicatedTherapeuticApproach owns the exact approved regimen and treatment-line semantics.
  • Guideline owns the disease-facing SOC row and display/filter associations.
  • LabelTrial owns extracted label evidence once.
  • ApproachLabelTrial explains why one label trial supports one approach.
  • KeyTrialArmOutcome explains which specific measurement is closest to the guideline entry.

When debugging an SOC entry, start from guidelines.indicated_therapeutic_approach_id, then inspect the approach’s approach_label_trials, the linked label_trials, and finally the key_trial_arm_outcomes that select the evidence shown to users.