subaward-extraction-udm

Slug: subaward-extraction-udm
Version: 0.1.0
Status: experimental
Last fully evaluated: none
Eval state: no validated eval cases
Category: extraction
Domain: research-administration
Manifestations: prompt
Created: 2026-04-30
Updated: 2026-04-30

Tags: subaward post-award agreement pte subrecipient financial-policies contacts udm structured-extraction json

Audience: sponsored-programs-staff, post-award-teams, ingest-pipelines

Manifestations in repo: prompt.md

Extracts a fully executed subaward agreement (Pass-Through Entity → Subrecipient) into a single structured JSON object covering nine sections used for research-administration setup and ongoing monitoring: basic info, project periods, PTE contacts, subrecipient contacts, financial summary, financial policies, reporting requirements, prior-approval handling, and key compliance requirements.

Output contract: schema.json

Contract scope: repo-local, UDM-aligned

Inputs

Full text of an executed subaward agreement (PTE → Subrecipient), typically including attachments.

Outputs

A single JSON object covering six logical blocks:

  • Core award info: pte_name, subrecipient_name, federal_award_number, subaward_number, project_title, federal_awarding_agency
  • Contacts: six {name, email, phone} objects (PTE PI / admin / financial; Subrecipient PI / admin / financial)
  • Dates & monetary values: budget_period_start, budget_period_end, amount_funded, total_direct_costs, total_indirect_costs, cost_type (enum)
  • Financial policies: invoicing_frequency (enum), final_invoice_due, fa_rate, fa_base, cost_sharing_required, carryforward_policy
  • Reporting requirements: typed arrays of technical_reports, financial_reports, invention_reporting
  • Compliance requirements: governing_regulations (with source attribution), prior_approval_handling, coi_policy, data_rights, audit_requirements, termination_clauses, record_retention

See schema.json for the authoritative definition and prompt.md for the encoding rules (character-perfect emails / phone numbers / dollar amounts; the strict-inclusion rule for technical_reports and financial_reports; the amount_funded == total_direct_costs + total_indirect_costs reconciliation).
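
The reconciliation rule can be checked mechanically once an extraction has been parsed. The following is a minimal sketch, not part of this repository: the function name `reconciles` is hypothetical, and comparing through `Decimal(str(...))` is one assumed way to avoid binary floating-point noise when adding dollar amounts.

```python
from decimal import Decimal
from typing import Any, Dict, Optional

MONEY_KEYS = ("amount_funded", "total_direct_costs", "total_indirect_costs")


def reconciles(record: Dict[str, Any]) -> Optional[bool]:
    """Check amount_funded == total_direct_costs + total_indirect_costs.

    Returns None when any of the three values is missing or null, because
    the rule only applies when all three are present.
    """
    values = [record.get(key) for key in MONEY_KEYS]
    if any(value is None for value in values):
        return None
    # Route each JSON number through its string form so that the sum is
    # computed in exact decimal arithmetic rather than binary floats.
    funded, direct, indirect = (Decimal(str(value)) for value in values)
    return funded == direct + indirect


example = {
    "amount_funded": 150000.30,
    "total_direct_costs": 100000.10,
    "total_indirect_costs": 50000.20,
}
print(reconciles(example))  # prints True
```

A False result is a flag for analyst review rather than a signal to rewrite values, since prompt.md asks for the document's reported figures to be preserved when the agreement itself does not reconcile.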

Contract scope

Repo-local, UDM-aligned. Extensive UDM column bindings preserved (see prompt.md). The structured shape mirrors the deliverable produced by the subaward-extraction Vandalizer workflow in the ui-insight/ProcessMapping process-mapping corpus.

Triad integration

  • Evaluation datasets: none yet — planned: PTE → academic-subrecipient subaward (with full Attachment A and Attachment 4); subaward with cost-share commitment; subaward with non-default invoicing cadence; subaward with explicit COI flow-down language.
  • Harness notes: canonical manifestation is prompt.md. The companion top-level workflows/subaward-extraction Vandalizer workflow at v0.1.0 implements the contract as six parallel Extraction tasks plus a Consolidation Prompt.

Runtime topology — the Vandalizer workflow

The canonical runtime is the subaward-extraction workflow.

  • Step 1 (parallel Extraction) — six Extraction tasks mirroring the source ProcessMapping workflow one-for-one (Core Award Information; Contact Information; Dates & Monetary Values; Financial Policies; Reporting Requirements; Compliance Requirements).
  • Step 2 (Consolidation Prompt) — assembles the six fragments into a single schema-conformant object, normalizes the two enums (cost_type, invoicing_frequency), and verifies the amount_funded == total_direct_costs + total_indirect_costs reconciliation.
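
As a rough illustration of Step 2, the sketch below merges per-task fragments and maps loosely formatted enum text onto the schema's canonical spellings. It is an assumption-laden sketch rather than the Vandalizer implementation: `consolidate`, `normalize_enum`, and the letters-only matching rule are all hypothetical, and the actual Consolidation Prompt is an LLM step whose behavior is not reproduced here.

```python
from typing import Any, Dict, List, Optional, Tuple

# Allowed enum values, copied from schema.json.
COST_TYPES = ("Cost Reimbursement", "Fixed Price", "Time and Materials")
INVOICING_FREQUENCIES = ("Monthly", "Quarterly", "Semi-Annual", "Annual")


def _key(text: str) -> str:
    """Lowercase and keep letters only, for loose comparison."""
    return "".join(ch for ch in text.lower() if ch.isalpha())


def normalize_enum(raw: Optional[str], allowed: Tuple[str, ...]) -> Optional[str]:
    """Map a loosely formatted value onto its canonical spelling, else None."""
    if raw is None:
        return None
    lookup = {_key(value): value for value in allowed}
    return lookup.get(_key(raw))


def consolidate(fragments: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge per-task fragments, refusing keys reported by more than one task."""
    merged: Dict[str, Any] = {}
    for fragment in fragments:
        overlap = merged.keys() & fragment.keys()
        if overlap:
            raise ValueError(f"duplicate keys across fragments: {sorted(overlap)}")
        merged.update(fragment)
    merged["cost_type"] = normalize_enum(merged.get("cost_type"), COST_TYPES)
    merged["invoicing_frequency"] = normalize_enum(
        merged.get("invoicing_frequency"), INVOICING_FREQUENCIES
    )
    return merged
```

In this sketch an unrecognized enum value becomes None. For invoicing_frequency, which the schema requires to be a non-null string, that None would then fail schema validation, which surfaces the record for review instead of silently guessing a cadence.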

Manifestations

  • prompt.md — canonical, LLM-agnostic prompt

Evals

See evals/ for reference inputs and known-good outputs.

Provenance

Authored 2026-04-30 against the subaward-extraction (Workflow_ID: WF-SUBAWARD-EXTRACTION) process-mapping workflow in ui-insight/ProcessMapping at commit b7176b0c913833a205efdb5e4ba00c17ff88af0f.

Contract scope

  • Output format: json_object

  • Contract scope: shared_udm_semantics_repo_local_schema

  • Validation surfaces: json_schema

  • Schema entrypoints: #

  • Notes: Repo-local subaward agreement contract. Six logical blocks (core award / contacts / dates & monetary / financial policies / reporting / compliance) covering nine deliverable sections used for research-administration setup and ongoing monitoring of subawards. Each contact is a {name, email, phone} object; reporting requirements are typed arrays (technical, financial, invention) with strict-inclusion criteria.

  • Machine-readable catalog entry: component_catalog.json

Triad integration

  • UDM alignment: shared_udm_semantics_repo_local_schema — Broad UDM alignment via leaf-field column bindings to Award, Subaward, Organization, Personnel, ContactDetails, IndirectRate, CostShare, AwardDeliverable, Modification, Terms, and ConflictOfInterest (see prompt.md). The six-block shape itself is repo-local.

  • Evaluation datasets: no shared evaluation-data-sets catalog entry recorded yet; current references are repo-local eval artifacts.

  • Harness notes: Validate JSON outputs against schema.json. Canonical single-call invocation surface is prompt.md. The companion top-level workflows/subaward-extraction Vandalizer workflow at v0.1.0 implements the contract as six parallel Extraction tasks plus a Consolidation Prompt that composes 18 flat contact searchset items into six structured {name, email, phone} objects, normalizes the cost_type and invoicing_frequency enums, and enforces the CFR-01 reconciliation between amount_funded, total_direct_costs, and total_indirect_costs.
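
The contact composition described in the harness notes might look like the following. The flat key naming (`<role>_name`, `<role>_email`, `<role>_phone`) is an assumption made for illustration; the workflow's real searchset item names are not given here. `compose_contacts` is likewise a hypothetical helper.

```python
from typing import Dict, Optional

# The six contact keys defined in schema.json.
ROLES = (
    "pte_pi",
    "pte_admin_contact",
    "pte_financial_contact",
    "subrecipient_pi",
    "subrecipient_admin_contact",
    "subrecipient_financial_contact",
)

Contact = Optional[Dict[str, Optional[str]]]


def compose_contacts(flat: Dict[str, Optional[str]]) -> Dict[str, Contact]:
    """Fold 6 roles x 3 parts (18 flat items) into six contact objects."""
    contacts: Dict[str, Contact] = {}
    for role in ROLES:
        name = flat.get(f"{role}_name")
        if not name:
            # The schema requires a non-empty name, so a contact with no
            # name is emitted as null instead of a partial object.
            contacts[role] = None
            continue
        contacts[role] = {
            "name": name,
            # Email and phone are passed through unchanged so the strings
            # stay character-for-character identical to the document.
            "email": flat.get(f"{role}_email"),
            "phone": flat.get(f"{role}_phone"),
        }
    return contacts
```

One consequence of this design is that an email or phone number found without a name is dropped along with the contact; a real pipeline may prefer to log that case for review.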

Prompt body

Source: prompt.md.

Subaward Agreement Extraction — UDM JSON

Purpose: Extract a fully executed subaward agreement into a structured JSON object covering nine sections used for research-administration setup and ongoing monitoring of subawards: basic info, project periods, PTE contacts, subrecipient contacts, financial summary, financial policies, reporting requirements, prior-approval handling, and key compliance requirements.

Expected input: Full text of an executed subaward agreement (PTE → Subrecipient), typically including attachments.

Expected output: A single JSON object that validates against schema.json. No prose, no markdown outside the JSON.

When to use this contract

This is the subaward setup-and-monitoring cut of an executed subaward agreement. It produces six logical blocks — core award info, contacts, dates and monetary values, financial policies, reporting requirements, and compliance requirements — pre-shaped so the downstream consumer (Banner setup, Vandalizer monitoring, manual SP analyst review) does not have to re-adjudicate which section a fact belongs to.

UDM-aligned: pte_name and subrecipient_name → Organization.Organization_Name; federal_award_number → Award.Federal_Award_ID; subaward_number → Subaward.Subaward_Number; project_title → Award.Award_Title; subrecipient_pi.name → Subaward.PI_Name; PTE/subrecipient PI / admin / financial contacts → Personnel.First_Name/Last_Name and ContactDetails.ContactDetails_Value; budget_period_start/end → Subaward.Start_Date/End_Date; amount_funded → Subaward.Subaward_Amount; invoicing_frequency → Terms.Invoicing_Frequency; fa_rate → IndirectRate.Rate_Percentage; fa_base → IndirectRate.Base_Type; cost_sharing_required → CostShare.Is_Mandatory; technical/financial reports → AwardDeliverable; prior_approval_handling → Modification.Requires_Prior_Approval; coi_policy → ConflictOfInterest; record_retention → Terms.Record_Retention_Years.

This component does not cover the prime federal award's compliance framework — that lives in award-compliance-extraction-udm. It does not cover prime-award prior-approval procedures — that lives in prior-approval-extraction-udm.


Prompt

You are extracting a fully executed subaward agreement (PTE = Pass-Through Entity → Subrecipient) into a single structured JSON object covering core award information, all PTE and Subrecipient contact details, dates and monetary values, financial policies, reporting requirements, and compliance requirements.

Be 100% accurate. Subaward agreements have character-perfect requirements — but match the schema's type for each field exactly:

  • Number-typed fields (amount_funded, total_direct_costs, total_indirect_costs) — emit as JSON numbers. $1,234,567.89 in the document → 1234567.89 in JSON. No quotes, no $, no thousand-separators. Use the document's exact value; do not round.

  • String-typed fields (every other extracted field, including all contact emails and phone numbers, all date strings, fa_rate, etc.) — quote character-perfectly. Preserve currency symbols when they appear in narrative (carryforward_policy), preserve email format exactly (first.last@institution.edu), preserve phone format exactly ((208) 555-1234), preserve date format exactly.

  • Boolean-typed fields (cost_sharing_required) — emit as true/false/null.

When a field is not specified, set it to null or — for arrays — return an empty array. Do not invent values.

Search the entire agreement, including attachments, for content in or near sections titled Award Information, Subaward Header, Parties, Attachment A, Attachment B, Contact Information, Award Amount, Budget, Period of Performance, Financial Information, Terms and Conditions, Special Conditions, Financial Terms, Attachment 4, Reporting and Prior Approval Terms, General Terms, Certifications, Flow-Down Requirements.

Return a single JSON object that validates against schema.json with these top-level keys grouped into six blocks:

Core award information:

  • pte_name — Pass-Through Entity name. Required.

  • subrecipient_name — Subrecipient organization name. Required.

  • federal_award_number — PTE's Federal Award Number (FAIN). Required.

  • subaward_number — Subaward number. Required.

  • project_title — full project title. Required.

  • federal_awarding_agency — federal awarding agency name. Required.

Contacts:

  • pte_pi, pte_admin_contact, pte_financial_contact, subrecipient_pi, subrecipient_admin_contact, subrecipient_financial_contact — each {name, email, phone} or null. pte_pi and subrecipient_pi are required; the four "contact" fields are optional.

Dates & monetary values:

  • budget_period_start — date string. Required.

  • budget_period_end — date string. Required.

  • amount_funded — number (decimal). Required.

  • total_direct_costs, total_indirect_costs — numbers or null.

  • cost_type — one of "Cost Reimbursement", "Fixed Price", "Time and Materials", or null.

Financial policies:

  • invoicing_frequency — one of "Monthly", "Quarterly", "Semi-Annual", "Annual". Required.

  • final_invoice_due — string or null.

  • fa_rate — string with rate percentage or null.

  • fa_base — string (e.g., "MTDC", "TDC") or null.

  • cost_sharing_required — boolean or null.

  • carryforward_policy — string or null.

Reporting requirements:

  • technical_reports — array of {report_name, frequency, due, recipient} objects. Empty when none.

  • financial_reports — array of {report_name, frequency, due, recipient} objects. Empty when none.

  • invention_reporting — array of {requirement, timing} objects. Empty when none.

Compliance requirements:

  • governing_regulations — array of {regulation, source} objects. Required.

  • prior_approval_handling — string describing general prior-approval policy, or null.

  • coi_policy — string describing whose COI policy applies, or null.

  • data_rights — string or null.

  • audit_requirements — string or null.

  • termination_clauses — string or null.

  • record_retention — string or null.

Encoding rules

  1. Each contact is a {name, email, phone} object or null. Quote emails and phone numbers character-perfectly — these are downstream notification targets.

  2. amount_funded == total_direct_costs + total_indirect_costs when all three are present (downstream cross-field check CHK-01). All three are JSON numbers, not quoted strings. When the agreement does not reconcile, preserve the document's reported values.

  3. Reporting arrays are typed. technical_reports and financial_reports use {report_name, frequency, due, recipient}; invention_reporting uses {requirement, timing}. Each report row is one required deliverable — do not collapse multiple deliverables into a single row.

  4. governing_regulations is required and must list every governing instrument the agreement names with its source attribution (the document section / attachment that calls it out).

  5. Strict inclusion criteria for technical_reports and financial_reports. Only include reports the agreement explicitly requires. Do not synthesize standard reports the agreement does not mandate.

  6. cost_type enum matches the source workflow: Cost Reimbursement, Fixed Price, Time and Materials. Most subawards are Cost Reimbursement; quote whatever the document states.

  7. Do not output any text outside the single JSON object.

Output

A single JSON object. No surrounding markdown.
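
A consumer can enforce the "single JSON object, no surrounding markdown" rule at parse time. This is a minimal sketch with a hypothetical helper name (`parse_model_output`); it relies only on Python's standard `json` module and deliberately rejects fenced or annotated output instead of trying to repair it.

```python
import json
from typing import Any, Dict


def parse_model_output(raw: str) -> Dict[str, Any]:
    """Parse raw model text, accepting only one bare JSON object.

    json.loads tolerates leading and trailing whitespace but fails on
    markdown fences, commentary before the object, or text after it.
    """
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not bare JSON: {exc.msg}") from exc
    if not isinstance(parsed, dict):
        raise ValueError(
            "output must be a JSON object, not " + type(parsed).__name__
        )
    return parsed
```

Rejecting instead of stripping fences keeps contract violations visible; a pipeline that prefers leniency could add a repair step in front of this function.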

Output schema

Source: schema.json.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://github.com/AI4RA/prompt-library/components/subaward-extraction-udm/schema.json",
  "title": "Subaward Agreement Extraction \u2014 UDM Output",
  "description": "JSON contract for a fully executed subaward agreement (Pass-Through Entity \u2192 Subrecipient) distilled into nine sections used for research-administration setup and ongoing monitoring. UDM-aligned to Award, Subaward, Organization, Personnel, ContactDetails, IndirectRate, CostShare, AwardDeliverable, Modification, Terms, ConflictOfInterest.",
  "version": "0.1.0",
  "type": "object",
  "additionalProperties": false,
  "required": [
    "pte_name",
    "subrecipient_name",
    "federal_award_number",
    "subaward_number",
    "project_title",
    "federal_awarding_agency",
    "pte_pi",
    "subrecipient_pi",
    "budget_period_start",
    "budget_period_end",
    "amount_funded",
    "invoicing_frequency",
    "technical_reports",
    "financial_reports",
    "invention_reporting",
    "governing_regulations"
  ],
  "$defs": {
    "contact": {
      "type": ["object", "null"],
      "additionalProperties": false,
      "required": ["name"],
      "properties": {
        "name": {
          "type": "string",
          "minLength": 1
        },
        "email": {
          "type": ["string", "null"]
        },
        "phone": {
          "type": ["string", "null"]
        }
      }
    },
    "report": {
      "type": "object",
      "additionalProperties": false,
      "required": ["report_name"],
      "properties": {
        "report_name": {
          "type": "string",
          "minLength": 1
        },
        "frequency": {
          "type": ["string", "null"]
        },
        "due": {
          "type": ["string", "null"]
        },
        "recipient": {
          "type": ["string", "null"]
        }
      }
    }
  },
  "properties": {
    "pte_name": {
      "type": "string",
      "minLength": 1,
      "description": "Pass-Through Entity name. Resolves to UDM Organization.Organization_Name."
    },
    "subrecipient_name": {
      "type": "string",
      "minLength": 1,
      "description": "Subrecipient organization name. Resolves to UDM Organization.Organization_Name."
    },
    "federal_award_number": {
      "type": "string",
      "minLength": 1,
      "description": "PTE Federal Award Number (FAIN). Resolves to UDM Award.Federal_Award_ID. Required by source workflow."
    },
    "subaward_number": {
      "type": "string",
      "minLength": 1,
      "description": "Subaward number. Resolves to UDM Subaward.Subaward_Number."
    },
    "project_title": {
      "type": "string",
      "minLength": 1,
      "description": "Full project title. Resolves to UDM Award.Award_Title."
    },
    "federal_awarding_agency": {
      "type": "string",
      "minLength": 1,
      "description": "Federal awarding agency name. Required by source workflow."
    },
    "pte_pi": {
      "$ref": "#/$defs/contact",
      "description": "PTE Principal Investigator. Resolves to UDM Personnel."
    },
    "pte_admin_contact": {
      "$ref": "#/$defs/contact"
    },
    "pte_financial_contact": {
      "$ref": "#/$defs/contact"
    },
    "subrecipient_pi": {
      "$ref": "#/$defs/contact",
      "description": "Subrecipient PI. Resolves to UDM Subaward.PI_Name + Personnel."
    },
    "subrecipient_admin_contact": {
      "$ref": "#/$defs/contact"
    },
    "subrecipient_financial_contact": {
      "$ref": "#/$defs/contact"
    },
    "budget_period_start": {
      "type": "string",
      "minLength": 1,
      "description": "Subaward budget period start date. Resolves to UDM Subaward.Start_Date."
    },
    "budget_period_end": {
      "type": "string",
      "minLength": 1,
      "description": "Subaward budget period end date. Resolves to UDM Subaward.End_Date."
    },
    "amount_funded": {
      "type": "number",
      "description": "Amount funded this action. Resolves to UDM Subaward.Subaward_Amount. Cross-field check CHK-01: amount_funded == total_direct_costs + total_indirect_costs when all three present."
    },
    "total_direct_costs": {
      "type": ["number", "null"]
    },
    "total_indirect_costs": {
      "type": ["number", "null"]
    },
    "cost_type": {
      "type": ["string", "null"],
      "enum": ["Cost Reimbursement", "Fixed Price", "Time and Materials", null]
    },
    "invoicing_frequency": {
      "type": "string",
      "enum": ["Monthly", "Quarterly", "Semi-Annual", "Annual"],
      "description": "Resolves to UDM Terms.Invoicing_Frequency. Required by source workflow."
    },
    "final_invoice_due": {
      "type": ["string", "null"]
    },
    "fa_rate": {
      "type": ["string", "null"],
      "description": "Resolves to UDM IndirectRate.Rate_Percentage."
    },
    "fa_base": {
      "type": ["string", "null"],
      "description": "Resolves to UDM IndirectRate.Base_Type."
    },
    "cost_sharing_required": {
      "type": ["boolean", "null"],
      "description": "Resolves to UDM CostShare.Is_Mandatory."
    },
    "carryforward_policy": {
      "type": ["string", "null"]
    },
    "technical_reports": {
      "type": "array",
      "items": {
        "$ref": "#/$defs/report"
      },
      "description": "Required technical reports. Resolves to UDM AwardDeliverable. Empty array when none required."
    },
    "financial_reports": {
      "type": "array",
      "items": {
        "$ref": "#/$defs/report"
      },
      "description": "Required financial reports. Resolves to UDM AwardDeliverable. Empty array when none required."
    },
    "invention_reporting": {
      "type": "array",
      "items": {
        "type": "object",
        "additionalProperties": false,
        "required": ["requirement"],
        "properties": {
          "requirement": {
            "type": "string",
            "minLength": 1
          },
          "timing": {
            "type": ["string", "null"]
          }
        }
      },
      "description": "Invention disclosure and reporting requirements."
    },
    "governing_regulations": {
      "type": "array",
      "minItems": 1,
      "items": {
        "type": "object",
        "additionalProperties": false,
        "required": ["regulation"],
        "properties": {
          "regulation": {
            "type": "string",
            "minLength": 1
          },
          "source": {
            "type": ["string", "null"],
            "description": "Document section / attachment that calls it out."
          }
        }
      }
    },
    "prior_approval_handling": {
      "type": ["string", "null"],
      "description": "General prior-approval policy. Resolves to UDM Modification.Requires_Prior_Approval."
    },
    "coi_policy": {
      "type": ["string", "null"],
      "description": "Conflict of interest policy application. Resolves to UDM ConflictOfInterest."
    },
    "data_rights": {
      "type": ["string", "null"]
    },
    "audit_requirements": {
      "type": ["string", "null"]
    },
    "termination_clauses": {
      "type": ["string", "null"]
    },
    "record_retention": {
      "type": ["string", "null"],
      "description": "Resolves to UDM Terms.Record_Retention_Years."
    }
  }
}
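
Before running full JSON Schema validation, a pipeline could apply a cheap pre-check for the most common failures. The sketch below is hypothetical (`precheck` is not a function in this repository) and covers only the required keys, the two enums, and the minItems constraint on governing_regulations, all copied from the schema above. It is not a substitute for validating against schema.json.

```python
from typing import Any, Dict, List

REQUIRED = (
    "pte_name", "subrecipient_name", "federal_award_number", "subaward_number",
    "project_title", "federal_awarding_agency", "pte_pi", "subrecipient_pi",
    "budget_period_start", "budget_period_end", "amount_funded",
    "invoicing_frequency", "technical_reports", "financial_reports",
    "invention_reporting", "governing_regulations",
)

ENUMS = {
    # cost_type is nullable, so None is an allowed value.
    "cost_type": ("Cost Reimbursement", "Fixed Price", "Time and Materials", None),
    "invoicing_frequency": ("Monthly", "Quarterly", "Semi-Annual", "Annual"),
}


def precheck(record: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the pre-check passed."""
    problems = [f"missing required key: {key}" for key in REQUIRED if key not in record]
    for key, allowed in ENUMS.items():
        if key in record and record[key] not in allowed:
            problems.append(f"{key} has non-enum value: {record[key]!r}")
    if "governing_regulations" in record and not record["governing_regulations"]:
        problems.append("governing_regulations must contain at least one entry")
    return problems
```

Returning a list of problems, instead of raising on the first one, lets a reviewer see every gap in a single pass.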

Changelog

Source: CHANGELOG.md.

All notable changes to this component. Versions follow semver.

[0.1.0] — 2026-04-30

  • Initial experimental release.
  • Schema derived from the subaward-extraction v2 Vandalizer workflow in ui-insight/ProcessMapping (six parallel Extraction tasks + Formatting task; 34 source fields).
  • Six-block shape (core award / contacts / dates & monetary / financial policies / reporting / compliance) covering nine deliverable sections.
  • All six contact fields realized as {name, email, phone} objects (rather than the source String fields concatenating "name, email, phone") so each component is structurally addressable for downstream notification routing.
  • technical_reports, financial_reports, invention_reporting, and governing_regulations realized as arrays of typed objects (rather than the source List fields) so per-row attributes attach to the right entry.
  • cost_type enum (Cost Reimbursement, Fixed Price, Time and Materials); invoicing_frequency enum (Monthly, Quarterly, Semi-Annual, Annual) — match the source workflow's enums.
  • Cross-field rule from the source workflow (CFR-01: amount_funded == total_direct_costs + total_indirect_costs) is encoded in the prompt's encoding rules and the workflow validation_plan. amount_funded, total_direct_costs, and total_indirect_costs are JSON numbers (not quoted strings), matching the schema's "type": "number".
  • Source-workflow requiredness preserved: federal_award_number, federal_awarding_agency, and invoicing_frequency are required (matching the source workflow's Is_Required: true), in addition to pte_name, subrecipient_name, subaward_number, project_title, pte_pi, subrecipient_pi, budget_period_start, budget_period_end, amount_funded, and governing_regulations. The schema also requires the three reporting arrays (technical_reports, financial_reports, invention_reporting), each of which may be empty.
  • UDM column bindings preserved: pte_name/subrecipient_name → Organization.Organization_Name; federal_award_number → Award.Federal_Award_ID; subaward_number → Subaward.Subaward_Number; project_title → Award.Award_Title; subrecipient_pi.name → Subaward.PI_Name; contact fields → Personnel + ContactDetails; budget_period_start/end → Subaward.Start_Date/End_Date; amount_funded → Subaward.Subaward_Amount; invoicing_frequency → Terms.Invoicing_Frequency; fa_rate → IndirectRate.Rate_Percentage; fa_base → IndirectRate.Base_Type; cost_sharing_required → CostShare.Is_Mandatory; reports → AwardDeliverable; prior_approval_handling → Modification.Requires_Prior_Approval; coi_policy → ConflictOfInterest; record_retention → Terms.Record_Retention_Years.
  • No eval cases yet — status experimental until at least one golden extraction is added under evals/cases/.
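
The 0.1.0 entry above notes that the three monetary fields are JSON numbers, not quoted strings. For pipeline code that receives dollar figures as document text, the conversion might be sketched as follows. `parse_amount` is a hypothetical helper, and its refusal to handle anything other than a plain, non-negative dollar figure is an assumption of this sketch.

```python
import json
import re
from typing import Optional

_PLAIN_NUMBER = re.compile(r"\d+(\.\d+)?")


def parse_amount(text: Optional[str]) -> Optional[float]:
    """Convert a document figure such as '$1,234,567.89' to a JSON-ready number."""
    if text is None:
        return None
    # Drop the currency symbol and thousand separators, nothing else.
    cleaned = text.strip().lstrip("$").strip().replace(",", "")
    if not _PLAIN_NUMBER.fullmatch(cleaned):
        # Negative, parenthesized, or narrative amounts are left for review.
        raise ValueError(f"not a plain dollar amount: {text!r}")
    return float(cleaned)


print(json.dumps({"amount_funded": parse_amount("$1,234,567.89")}))
# prints {"amount_funded": 1234567.89}
```

No rounding is applied. A Python float reproduces decimal figures of up to 15 significant digits exactly when serialized, which covers dollar-and-cent amounts at the magnitudes typical of subawards.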