subaward-extraction-udm¶
subaward-extraction-udm0.1.0noneTags: subaward post-award agreement pte subrecipient financial-policies contacts udm structured-extraction json
Audience: sponsored-programs-staff, post-award-teams, ingest-pipelines
Manifestations in repo: prompt.md
Extracts a fully executed subaward agreement (Pass-Through Entity → Subrecipient) into a single structured JSON object covering nine sections used for research-administration setup and ongoing monitoring: basic info, project periods, PTE contacts, subrecipient contacts, financial summary, financial policies, reporting requirements, prior-approval handling, and key compliance requirements.
Output contract: schema.json
Contract scope: repo-local, UDM-aligned
Inputs¶
Full text of an executed subaward agreement (PTE → Subrecipient), typically including attachments.
Outputs¶
A single JSON object covering six logical blocks:
- Core award info —
pte_name,subrecipient_name,federal_award_number,subaward_number,project_title,federal_awarding_agency - Contacts — six
{name, email, phone}objects (PTE PI / admin / financial; Subrecipient PI / admin / financial) - Dates & monetary values —
budget_period_start,budget_period_end,amount_funded,total_direct_costs,total_indirect_costs,cost_type(enum) - Financial policies —
invoicing_frequency(enum),final_invoice_due,fa_rate,fa_base,cost_sharing_required,carryforward_policy - Reporting requirements — typed arrays of
technical_reports,financial_reports,invention_reporting - Compliance requirements —
governing_regulations(with source attribution),prior_approval_handling,coi_policy,data_rights,audit_requirements,termination_clauses,record_retention
See schema.json for the authoritative definition and prompt.md for the encoding rules (character-perfect emails / phone numbers / dollar amounts; the strict-inclusion rule for technical_reports and financial_reports; the amount_funded == total_direct_costs + total_indirect_costs reconciliation).
Contract scope¶
Repo-local, UDM-aligned. Extensive UDM column bindings preserved (see prompt.md). The structured shape mirrors the deliverable produced by the subaward-extraction Vandalizer workflow in the ui-insight/ProcessMapping process-mapping corpus.
Triad integration¶
- Evaluation datasets: none yet — planned: PTE → academic-subrecipient subaward (with full Attachment A and Attachment 4); subaward with cost-share commitment; subaward with non-default invoicing cadence; subaward with explicit COI flow-down language.
- Harness notes: canonical manifestation is
prompt.md. The companion top-levelworkflows/subaward-extractionVandalizer workflow at v0.1.0 implements the contract as six parallel Extraction tasks plus a Consolidation Prompt.
Runtime topology — the Vandalizer workflow¶
The canonical runtime is the subaward-extraction workflow.
- Step 1 (parallel Extraction) — six Extraction tasks mirroring the source ProcessMapping workflow one-for-one (Core Award Information; Contact Information; Dates & Monetary Values; Financial Policies; Reporting Requirements; Compliance Requirements).
- Step 2 (Consolidation Prompt) — assembles the six fragments into a single schema-conformant object, normalizes the two enums (
cost_type,invoicing_frequency), and verifies theamount_funded == total_direct_costs + total_indirect_costsreconciliation.
Manifestations¶
prompt.md— canonical, LLM-agnostic prompt
Evals¶
See evals/ for reference inputs and known-good outputs.
Provenance¶
Authored 2026-04-30 against the subaward-extraction (Workflow_ID: WF-SUBAWARD-EXTRACTION) process-mapping workflow in ui-insight/ProcessMapping at commit b7176b0c913833a205efdb5e4ba00c17ff88af0f.
Contract scope¶
-
Output format:
json_object -
Contract scope:
shared_udm_semantics_repo_local_schema -
Validation surfaces:
json_schema -
Schema entrypoints:
# -
Notes: Repo-local subaward agreement contract. Six logical blocks (core award / contacts / dates & monetary / financial policies / reporting / compliance) covering nine deliverable sections used for research-administration setup and ongoing monitoring of subawards. Each contact is a {name, email, phone} object; reporting requirements are typed arrays (technical, financial, invention) with strict-inclusion criteria.
-
Machine-readable catalog entry:
component_catalog.json
Triad integration¶
-
UDM alignment:
shared_udm_semantics_repo_local_schema— Broad UDM alignment via leaf-field column bindings to Award, Subaward, Organization, Personnel, ContactDetails, IndirectRate, CostShare, AwardDeliverable, Modification, Terms, and ConflictOfInterest (see prompt.md). The six-block shape itself is repo-local. -
Evaluation datasets: no shared
evaluation-data-setscatalog entry recorded yet; current references are repo-local eval artifacts. -
Harness notes: Validate JSON outputs against schema.json. Canonical single-call invocation surface is prompt.md. The companion top-level workflows/subaward-extraction Vandalizer workflow at v0.1.0 implements the contract as six parallel Extraction tasks plus a Consolidation Prompt that composes 18 flat contact searchset items into six structured {name, email, phone} objects, normalizes the cost_type and invoicing_frequency enums, and enforces the CFR-01 reconciliation between amount_funded, total_direct_costs, and total_indirect_costs.
Prompt body¶
Source: prompt.md.
Show prompt
Subaward Agreement Extraction — UDM JSON¶
Purpose: Extract a fully executed subaward agreement into a structured JSON object covering nine sections used for research-administration setup and ongoing monitoring of subawards: basic info, project periods, PTE contacts, subrecipient contacts, financial summary, financial policies, reporting requirements, prior-approval handling, and key compliance requirements.
Expected input: Full text of an executed subaward agreement (PTE → Subrecipient), typically including attachments.
Expected output: A single JSON object that validates against
schema.json. No prose, no markdown outside the JSON.
When to use this contract¶
This is the subaward setup-and-monitoring cut of an executed subaward agreement. It produces six logical blocks — core award info, contacts, dates and monetary values, financial policies, reporting requirements, and compliance requirements — pre-shaped so the downstream consumer (Banner setup, Vandalizer monitoring, manual SP analyst review) does not have to re-adjudicate which section a fact belongs to.
UDM-aligned: pte_name and subrecipient_name → Organization.Organization_Name; federal_award_number → Award.Federal_Award_ID; subaward_number → Subaward.Subaward_Number; project_title → Award.Award_Title; subrecipient_pi.name → Subaward.PI_Name; PTE/subrecipient PI / admin / financial contacts → Personnel.First_Name/Last_Name and ContactDetails.ContactDetails_Value; budget_period_start/end → Subaward.Start_Date/End_Date; amount_funded → Subaward.Subaward_Amount; invoicing_frequency → Terms.Invoicing_Frequency; fa_rate → IndirectRate.Rate_Percentage; fa_base → IndirectRate.Base_Type; cost_sharing_required → CostShare.Is_Mandatory; technical/financial reports → AwardDeliverable; prior_approval_handling → Modification.Requires_Prior_Approval; coi_policy → ConflictOfInterest; record_retention → Terms.Record_Retention_Years.
This component does not cover the prime federal award's compliance framework — that lives in award-compliance-extraction-udm. It does not cover prime-award prior-approval procedures — that lives in prior-approval-extraction-udm.
Prompt¶
You are extracting a fully executed subaward agreement (PTE = Pass-Through Entity → Subrecipient) into a single structured JSON object covering core award information, all PTE and Subrecipient contact details, dates and monetary values, financial policies, reporting requirements, and compliance requirements.
Be 100% accurate. Subaward agreements have character-perfect requirements — but match the schema's type for each field exactly:
-
Number-typed fields (
amount_funded,total_direct_costs,total_indirect_costs) — emit as JSON numbers.$1,234,567.89in the document →1234567.89in JSON. No quotes, no$, no thousand-separators. Use the document's exact value; do not round. -
String-typed fields (every other extracted field, including all contact emails and phone numbers, all date strings,
fa_rate, etc.) — quote character-perfectly. Preserve currency symbols when they appear in narrative (carryforward_policy), preserve email format exactly (first.last@institution.edu), preserve phone format exactly ((208) 555-1234), preserve date format exactly. -
Boolean-typed fields (
cost_sharing_required) — emit astrue/false/null.
When a field is not specified, set it to null or — for arrays — return an empty array. Do not invent values.
Search the entire agreement, including attachments, for content in or near sections titled Award Information, Subaward Header, Parties, Attachment A, Attachment B, Contact Information, Award Amount, Budget, Period of Performance, Financial Information, Terms and Conditions, Special Conditions, Financial Terms, Attachment 4, Reporting and Prior Approval Terms, General Terms, Certifications, Flow-Down Requirements.
Return a single JSON object that validates against schema.json with these top-level keys grouped into six blocks:
Core award information:
-
pte_name— Pass-Through Entity name. Required. -
subrecipient_name— Subrecipient organization name. Required. -
federal_award_number— PTE's Federal Award Number (FAIN). Required. -
subaward_number— Subaward number. Required. -
project_title— full project title. Required. -
federal_awarding_agency— federal awarding agency name. Required.
Contacts:
pte_pi,pte_admin_contact,pte_financial_contact,subrecipient_pi,subrecipient_admin_contact,subrecipient_financial_contact— each{name, email, phone}ornull.pte_piandsubrecipient_piare required; the four "contact" fields are optional.
Dates & monetary values:
-
budget_period_start— date string. Required. -
budget_period_end— date string. Required. -
amount_funded— number (decimal). Required. -
total_direct_costs,total_indirect_costs— numbers ornull. -
cost_type— one of"Cost Reimbursement","Fixed Price","Time and Materials", ornull.
Financial policies:
-
invoicing_frequency— one of"Monthly","Quarterly","Semi-Annual","Annual". Required. -
final_invoice_due— string ornull. -
fa_rate— string with rate percentage ornull. -
fa_base— string (e.g.,"MTDC","TDC") ornull. -
cost_sharing_required— boolean ornull. -
carryforward_policy— string ornull.
Reporting requirements:
-
technical_reports— array of{report_name, frequency, due, recipient}objects. Empty when none. -
financial_reports— array of{report_name, frequency, due, recipient}objects. Empty when none. -
invention_reporting— array of{requirement, timing}objects. Empty when none.
Compliance requirements:
-
governing_regulations— array of{regulation, source}objects. Required. -
prior_approval_handling— string describing general prior-approval policy, ornull. -
coi_policy— string describing whose COI policy applies, ornull. -
data_rights— string ornull. -
audit_requirements— string ornull. -
termination_clauses— string ornull. -
record_retention— string ornull.
Encoding rules¶
-
Each contact is a
{name, email, phone}object ornull. Quote emails and phone numbers character-perfectly — these are downstream notification targets. -
amount_funded == total_direct_costs + total_indirect_costswhen all three are present (downstream cross-field checkCHK-01). All three are JSON numbers, not quoted strings. When the agreement does not reconcile, preserve the document's reported values. -
Reporting arrays are typed.
technical_reportsandfinancial_reportsuse{report_name, frequency, due, recipient};invention_reportinguses{requirement, timing}. Each report row is one required deliverable — do not collapse multiple deliverables into a single row. -
governing_regulationsis required and must list every governing instrument the agreement names with its source attribution (the document section / attachment that calls it out). -
Strict inclusion criteria for
technical_reportsandfinancial_reports. Only include reports the agreement explicitly requires. Do not synthesize standard reports the agreement does not mandate. -
cost_typeenum matches the source workflow:Cost Reimbursement,Fixed Price,Time and Materials. Most subawards areCost Reimbursement; quote whatever the document states. -
Do not output any text outside the single JSON object.
Output¶
A single JSON object. No surrounding markdown.
Output schema¶
Source: schema.json.
Show schema.json
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://github.com/AI4RA/prompt-library/components/subaward-extraction-udm/schema.json",
"title": "Subaward Agreement Extraction \u2014 UDM Output",
"description": "JSON contract for a fully executed subaward agreement (Pass-Through Entity \u2192 Subrecipient) distilled into nine sections used for research-administration setup and ongoing monitoring. UDM-aligned to Award, Subaward, Organization, Personnel, ContactDetails, IndirectRate, CostShare, AwardDeliverable, Modification, Terms, ConflictOfInterest.",
"version": "0.1.0",
"type": "object",
"additionalProperties": false,
"required": [
"pte_name",
"subrecipient_name",
"federal_award_number",
"subaward_number",
"project_title",
"federal_awarding_agency",
"pte_pi",
"subrecipient_pi",
"budget_period_start",
"budget_period_end",
"amount_funded",
"invoicing_frequency",
"technical_reports",
"financial_reports",
"invention_reporting",
"governing_regulations"
],
"$defs": {
"contact": {
"type": [
"object",
"null"
],
"additionalProperties": false,
"required": [
"name"
],
"properties": {
"name": {
"type": "string",
"minLength": 1
},
"email": {
"type": [
"string",
"null"
]
},
"phone": {
"type": [
"string",
"null"
]
}
}
},
"report": {
"type": "object",
"additionalProperties": false,
"required": [
"report_name"
],
"properties": {
"report_name": {
"type": "string",
"minLength": 1
},
"frequency": {
"type": [
"string",
"null"
]
},
"due": {
"type": [
"string",
"null"
]
},
"recipient": {
"type": [
"string",
"null"
]
}
}
}
},
"properties": {
"pte_name": {
"type": "string",
"minLength": 1,
"description": "Pass-Through Entity name. Resolves to UDM Organization.Organization_Name."
},
"subrecipient_name": {
"type": "string",
"minLength": 1,
"description": "Subrecipient organization name. Resolves to UDM Organization.Organization_Name."
},
"federal_award_number": {
"type": "string",
"minLength": 1,
"description": "PTE Federal Award Number (FAIN). Resolves to UDM Award.Federal_Award_ID. Required by source workflow."
},
"subaward_number": {
"type": "string",
"minLength": 1,
"description": "Subaward number. Resolves to UDM Subaward.Subaward_Number."
},
"project_title": {
"type": "string",
"minLength": 1,
"description": "Full project title. Resolves to UDM Award.Award_Title."
},
"federal_awarding_agency": {
"type": "string",
"minLength": 1,
"description": "Federal awarding agency name. Required by source workflow."
},
"pte_pi": {
"$ref": "#/$defs/contact",
"description": "PTE Principal Investigator. Resolves to UDM Personnel."
},
"pte_admin_contact": {
"$ref": "#/$defs/contact"
},
"pte_financial_contact": {
"$ref": "#/$defs/contact"
},
"subrecipient_pi": {
"$ref": "#/$defs/contact",
"description": "Subrecipient PI. Resolves to UDM Subaward.PI_Name + Personnel."
},
"subrecipient_admin_contact": {
"$ref": "#/$defs/contact"
},
"subrecipient_financial_contact": {
"$ref": "#/$defs/contact"
},
"budget_period_start": {
"type": "string",
"minLength": 1,
"description": "Subaward budget period start date. Resolves to UDM Subaward.Start_Date."
},
"budget_period_end": {
"type": "string",
"minLength": 1,
"description": "Subaward budget period end date. Resolves to UDM Subaward.End_Date."
},
"amount_funded": {
"type": "number",
"description": "Amount funded this action. Resolves to UDM Subaward.Subaward_Amount. Cross-field check CHK-01: amount_funded == total_direct_costs + total_indirect_costs when all three present."
},
"total_direct_costs": {
"type": [
"number",
"null"
]
},
"total_indirect_costs": {
"type": [
"number",
"null"
]
},
"cost_type": {
"type": [
"string",
"null"
],
"enum": [
"Cost Reimbursement",
"Fixed Price",
"Time and Materials",
null
]
},
"invoicing_frequency": {
"type": "string",
"enum": [
"Monthly",
"Quarterly",
"Semi-Annual",
"Annual"
],
"description": "Resolves to UDM Terms.Invoicing_Frequency. Required by source workflow."
},
"final_invoice_due": {
"type": [
"string",
"null"
]
},
"fa_rate": {
"type": [
"string",
"null"
],
"description": "Resolves to UDM IndirectRate.Rate_Percentage."
},
"fa_base": {
"type": [
"string",
"null"
],
"description": "Resolves to UDM IndirectRate.Base_Type."
},
"cost_sharing_required": {
"type": [
"boolean",
"null"
],
"description": "Resolves to UDM CostShare.Is_Mandatory."
},
"carryforward_policy": {
"type": [
"string",
"null"
]
},
"technical_reports": {
"type": "array",
"items": {
"$ref": "#/$defs/report"
},
"description": "Required technical reports. Resolves to UDM AwardDeliverable. Empty array when none required."
},
"financial_reports": {
"type": "array",
"items": {
"$ref": "#/$defs/report"
},
"description": "Required financial reports. Resolves to UDM AwardDeliverable. Empty array when none required."
},
"invention_reporting": {
"type": "array",
"items": {
"type": "object",
"additionalProperties": false,
"required": [
"requirement"
],
"properties": {
"requirement": {
"type": "string",
"minLength": 1
},
"timing": {
"type": [
"string",
"null"
]
}
}
},
"description": "Invention disclosure and reporting requirements."
},
"governing_regulations": {
"type": "array",
"minItems": 1,
"items": {
"type": "object",
"additionalProperties": false,
"required": [
"regulation"
],
"properties": {
"regulation": {
"type": "string",
"minLength": 1
},
"source": {
"type": [
"string",
"null"
],
"description": "Document section / attachment that calls it out."
}
}
}
},
"prior_approval_handling": {
"type": [
"string",
"null"
],
"description": "General prior-approval policy. Resolves to UDM Modification.Requires_Prior_Approval."
},
"coi_policy": {
"type": [
"string",
"null"
],
"description": "Conflict of interest policy application. Resolves to UDM ConflictOfInterest."
},
"data_rights": {
"type": [
"string",
"null"
]
},
"audit_requirements": {
"type": [
"string",
"null"
]
},
"termination_clauses": {
"type": [
"string",
"null"
]
},
"record_retention": {
"type": [
"string",
"null"
],
"description": "Resolves to UDM Terms.Record_Retention_Years."
}
}
}
Changelog¶
Source: CHANGELOG.md.
All notable changes to this component. Versions follow semver.
[0.1.0] — 2026-04-30¶
- Initial experimental release.
- Schema derived from the
subaward-extractionv2 Vandalizer workflow inui-insight/ProcessMapping(six parallel Extraction tasks + Formatting task; 34 source fields). - Six-block shape (core award / contacts / dates & monetary / financial policies / reporting / compliance) covering nine deliverable sections.
- All six contact fields realized as
{name, email, phone}objects (rather than the sourceStringfields concatenating "name, email, phone") so each component is structurally addressable for downstream notification routing. technical_reports,financial_reports,invention_reporting, andgoverning_regulationsrealized as arrays of typed objects (rather than the sourceListfields) so per-row attributes attach to the right entry.cost_typeenum (Cost Reimbursement,Fixed Price,Time and Materials);invoicing_frequencyenum (Monthly,Quarterly,Semi-Annual,Annual) — match the source workflow's enums.- Cross-field rule from the source workflow (CFR-01: amount_funded == total_direct_costs + total_indirect_costs) is encoded in the prompt's encoding rules and the workflow validation_plan. amount_funded, total_direct_costs, and total_indirect_costs are JSON numbers (not quoted strings), matching the schema's
"type": "number". - Source-workflow requiredness preserved:
federal_award_number,federal_awarding_agency, andinvoicing_frequencyare required (matching the source workflow'sIs_Required: true), in addition topte_name,subrecipient_name,subaward_number,project_title,pte_pi,subrecipient_pi,budget_period_start,budget_period_end,amount_funded, andgoverning_regulations. - UDM column bindings preserved:
pte_name/subrecipient_name→Organization.Organization_Name;federal_award_number→Award.Federal_Award_ID;subaward_number→Subaward.Subaward_Number;project_title→Award.Award_Title;subrecipient_pi.name→Subaward.PI_Name; contact fields →Personnel+ContactDetails;budget_period_start/end→Subaward.Start_Date/End_Date;amount_funded→Subaward.Subaward_Amount;invoicing_frequency→Terms.Invoicing_Frequency;fa_rate→IndirectRate.Rate_Percentage;fa_base→IndirectRate.Base_Type;cost_sharing_required→CostShare.Is_Mandatory; reports →AwardDeliverable;prior_approval_handling→Modification.Requires_Prior_Approval;coi_policy→ConflictOfInterest;record_retention→Terms.Record_Retention_Years. - No eval cases yet — status
experimentaluntil at least one golden extraction is added underevals/cases/.