proposal-budget-personnel-extraction-udm¶
proposal-budget-personnel-extraction-udm · 0.1.0 · none
Tags: pre-award proposal budget personnel mentoring-plan compliance-triggers nsf udm structured-extraction json
Audience: sponsored-programs-staff, pre-award-teams, ingest-pipelines
Manifestations in repo: prompt.md
Extracts personnel information and compliance-requirement triggers from a proposal budget document into a structured JSON object. Identifies senior key personnel, postdocs, graduate students (RA / TA), undergraduates, and other personnel; the budget category structure; subaward recipients, equipment over $5,000, travel summary, F&A rate and base, cost sharing, and total costs. The output drives the downstream proposal-document-completeness-udm gap analysis.
Output contract: schema.json
Contract scope: repo-local, UDM-aligned
Inputs¶
A proposal budget document (NSF-style budget tables / NIH PHS 398 / agency budget form), optionally with a budget justification.
Outputs¶
A single JSON object with personnel listings (senior key personnel, postdocs, graduate students, undergraduates, other personnel), derivable boolean triggers (has_postdocs_or_grad_students, mentoring_plan_required, has_subawards, has_equipment_over_5k), the budget structure (budget categories, subaward recipients, equipment items, travel summary, F&A rate and base, cost sharing, total costs), and per-period budgets.
See schema.json for the authoritative definition and prompt.md for the encoding rules (booleans must be derived from list lengths; equipment threshold is $5,000; F&A rate and base split into a {rate, base} object; total-costs reconciliation rule).
Contract scope¶
Repo-local, UDM-aligned. senior_key_personnel rows resolve to Personnel; fa_rate_and_base resolves to IndirectRate; cost_sharing resolves to CostShare. The structured shape mirrors the deliverable produced by the proposal-budget-personnel-extraction Vandalizer workflow in the ui-insight/ProcessMapping process-mapping corpus.
Relationship to sibling components¶
| Concern | This component | Related |
|---|---|---|
| Personnel + compliance-trigger booleans from proposal budget | proposal-budget-personnel-extraction-udm | - |
| Document-completeness gap analysis (consumes booleans here) | - | proposal-document-completeness-udm |
| Post-award budget structure | - | award-compliance-extraction-udm.financial_management |
| Solicitation requirements | - | rfa-checklist-extraction-udm / foa-checklist-extraction-udm |
Triad integration¶
- Evaluation datasets: none yet — planned: NSF proposal with postdocs and graduate students (mentoring plan trigger); NSF proposal with subaward and equipment > $5K; NIH proposal with cost-share commitment.
- Harness notes: canonical manifestation is prompt.md. Validation surface is schema.json. The companion top-level workflows/proposal-budget-personnel-extraction Vandalizer workflow at v0.1.0 implements the contract as two parallel Extraction tasks (personnel identification + budget structure) plus a Consolidation Prompt that derives the boolean triggers.
Runtime topology — the Vandalizer workflow¶
The canonical runtime is the proposal-budget-personnel-extraction workflow shipped at the top level of this repo.
- Step 1 (parallel Extraction): two Extraction tasks. extract-personnel-identification captures personnel; extract-budget-structure-and-compliance-triggers captures the budget structure.
- Step 2 (Consolidation Prompt): assembles the two fragments and derives the four boolean triggers from list lengths.
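The two steps above can be sketched in Python as follows. This is only an illustration of the topology, not the Vandalizer implementation: `run_workflow` and the two extractor callables are hypothetical names, and `mentoring_plan_required` is set using the NSF rule only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

Fragment = dict[str, Any]
Extractor = Callable[[str], Fragment]  # hypothetical stand-in for one Extraction task


def run_workflow(
    document_text: str,
    extract_personnel: Extractor,
    extract_budget: Extractor,
) -> Fragment:
    # Step 1: run the two Extraction tasks in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        personnel_future = pool.submit(extract_personnel, document_text)
        budget_future = pool.submit(extract_budget, document_text)
        personnel = personnel_future.result()
        budget = budget_future.result()

    # Step 2: assemble the fragments, then derive every trigger by counting.
    merged: Fragment = {**personnel, **budget}
    has_trainees = (
        merged["postdoc_count"] > 0 or merged["graduate_student_count"] > 0
    )
    merged["has_postdocs_or_grad_students"] = has_trainees
    merged["mentoring_plan_required"] = has_trainees  # NSF rule only (assumption)
    merged["has_subawards"] = len(merged["subaward_recipients"]) > 0
    merged["has_equipment_over_5k"] = len(merged["equipment_items"]) > 0
    return merged
```

Because the triggers are computed after the merge, neither Extraction task needs to emit a boolean, which keeps the fragments free of values that could contradict the lists.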
Manifestations¶
- prompt.md: canonical, LLM-agnostic prompt
Evals¶
See evals/ for reference inputs and known-good outputs.
Provenance¶
Authored 2026-04-30 against the proposal-budget-personnel-extraction (Workflow_ID: WF-PROPOSAL-BUDGET-PERSONNEL-EXTRACTION) process-mapping workflow in ui-insight/ProcessMapping at commit b7176b0c913833a205efdb5e4ba00c17ff88af0f.
Contract scope¶
- Output format: json_object
- Contract scope: shared_udm_semantics_repo_local_schema
- Validation surfaces: json_schema
- Schema entrypoints: #
- Notes: Repo-local pre-award budget-personnel and compliance-trigger contract. Personnel listings (senior key, postdocs, grad students, undergraduates, other) plus four derivable boolean triggers (has_postdocs_or_grad_students, mentoring_plan_required, has_subawards, has_equipment_over_5k) plus the budget structure (categories, subawards, equipment over $5K, F&A rate and base as a {rate, base} object, cost sharing, total costs, multi-year periods).
- Machine-readable catalog entry: component_catalog.json
Triad integration¶
- UDM alignment: shared_udm_semantics_repo_local_schema. senior_key_personnel rows resolve to Personnel; fa_rate_and_base resolves to IndirectRate; cost_sharing resolves to CostShare.
- Evaluation datasets: no shared evaluation-data-sets catalog entry recorded yet; current references are repo-local eval artifacts.
- Harness notes: Validate JSON outputs against schema.json. Canonical single-call invocation surface is prompt.md. The companion top-level workflows/proposal-budget-personnel-extraction Vandalizer workflow at v0.1.0 implements the contract as two parallel Extraction tasks (personnel identification + budget structure) plus a Consolidation Prompt that derives the four boolean compliance triggers from list lengths and combines fa_rate + fa_base into the nested fa_rate_and_base object.
- Related component: proposal-document-completeness-udm (producer_for). Produces the four boolean triggers and senior_key_personnel listing that proposal-document-completeness-udm consumes for gap-analysis conditional requirements.
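As an illustration of the "validate against schema.json" harness note, the sketch below runs a lightweight top-level pre-check using only the standard library. It is not a substitute for full JSON Schema validation, which a harness would normally delegate to a dedicated validator library. The function name `precheck_top_level` is hypothetical, and the check only covers the `required`, `properties`, and `additionalProperties` keywords at the top level of the schema.

```python
from typing import Any


def precheck_top_level(output: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the pre-check passed."""
    problems: list[str] = []

    # Every key listed under "required" must be present in the output.
    for key in schema.get("required", []):
        if key not in output:
            problems.append(f"missing required key: {key}")

    # With additionalProperties set to false, unknown keys are rejected.
    if schema.get("additionalProperties") is False:
        allowed = set(schema.get("properties", {}))
        for key in output:
            if key not in allowed:
                problems.append(f"unexpected key: {key}")

    return problems
```

In a harness, `schema` would be the parsed contents of schema.json (for example via `json.load`), and a non-empty result would be reported before any cross-field checks run.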
Prompt body¶
Source: prompt.md.
Proposal Budget Personnel & Compliance Triggers — UDM JSON¶
Purpose: Extract personnel information and compliance-requirement triggers from a proposal budget document into a structured JSON object. Identifies senior key personnel, postdocs, graduate students, undergraduates, and other personnel; the budget category structure; subaward recipients, equipment over $5,000, travel summary, F&A rate and base, cost sharing, and total costs. The output drives a downstream pre-award document-completeness review.
Expected input: A proposal budget document (NSF-style budget tables / NIH PHS 398 / agency budget form), optionally with a budget justification.
Expected output: A single JSON object that validates against schema.json. No prose, no markdown outside the JSON.
When to use this contract¶
This is the proposal-budget cut for compliance-trigger derivation. It pairs with proposal-document-completeness-udm — the gap-analysis component consumes senior_key_personnel, has_postdocs_or_grad_students, has_subawards, and has_equipment_over_5k from this output to compute its conditional requirements.
UDM-aligned: senior_key_personnel rows resolve to Personnel; fa_rate_and_base resolves to IndirectRate; cost_sharing resolves to CostShare.
This component does not cover the full document-completeness gap analysis — that lives in proposal-document-completeness-udm. It does not cover the post-award budget structure — that lives in award-compliance-extraction-udm.financial_management.
Prompt¶
You are extracting personnel information and compliance-requirement triggers from a proposal budget document. Your output drives a downstream document-completeness review, so the booleans (has_postdocs_or_grad_students, has_subawards, has_equipment_over_5k, mentoring_plan_required) must be derivable from the extracted lists — never set a boolean except by counting the corresponding list.
Be 100% accurate. Match the schema's type for each field exactly:
- Number-typed fields (total_personnel_cost, senior_key_personnel[].salary_requested, postdoc_details[].salary_or_stipend, graduate_student_details[].stipend / tuition_remission, other_personnel[].cost, subaward_recipients[].amount, equipment_items[].cost, budget_categories[].amount, budget_periods[].amount, all three keys of total_costs): emit as JSON numbers. No quotes, no $, no thousand-separators. $1,234,567.89 in the document → 1234567.89 in JSON. Use the document's exact value; do not round.
- Integer-typed fields (postdoc_count, graduate_student_count, undergraduate_count, budget_periods[].period_number): emit as JSON integers.
- String-typed fields (effort percentages, F&A rate string, cost-sharing description, etc.): quote verbatim, preserving the document's % and other formatting.
When a category is empty, return an empty array; when a scalar is not specified, return null. Do not invent personnel, subawardees, or equipment items.
Search the entire budget for content in or near sections titled Senior/Key Personnel, Other Personnel, Personnel, Section A — Senior/Key Person, Section B — Other Personnel, Salary and Wages, Personnel Justification, Budget, Budget Summary, Cumulative Budget, Subaward, Subcontract, Equipment, Travel, Other Direct Costs, Indirect Costs.
Return a single JSON object that validates against schema.json with these top-level keys:
- senior_key_personnel: array of {name, role, institution, effort, salary_requested} objects. One row per senior/key person.
- postdoc_count: integer (count of postdoctoral researchers in the budget).
- postdoc_details: array of {name, effort, salary_or_stipend} objects (use "TBN" for unnamed slots; the schema requires a non-empty name). Empty when postdoc_count: 0.
- graduate_student_count: integer.
- graduate_student_details: array of {name, type, stipend, tuition_remission} objects. type is "RA" or "TA". Empty when count is 0.
- undergraduate_count: integer.
- other_personnel: array of {role, cost} objects (technicians, programmers, administrative staff). Empty when none.
- total_personnel_cost: number (decimal). Total personnel costs including salary and fringe.
- has_postdocs_or_grad_students: boolean. Must equal (postdoc_count > 0) OR (graduate_student_count > 0).
- mentoring_plan_required: boolean. Equals has_postdocs_or_grad_students for NSF; for non-NSF sponsors set per agency policy.
- budget_categories: array of {category_name, amount, notes} objects.
- subaward_recipients: array of {institution_name, sub_pi_name, amount} objects.
- has_subawards: boolean. Must equal len(subaward_recipients) > 0.
- equipment_items: array of {description, cost} objects (items over $5,000 only).
- has_equipment_over_5k: boolean. Must equal len(equipment_items) > 0.
- travel_summary: string with domestic/foreign travel summary or null.
- fa_rate_and_base: object {rate, base} where base is one of "MTDC", "TDC", "Salary & Wages", or null.
- cost_sharing: string with cost-sharing amount and type or null.
- total_costs: object {total_direct_costs, total_indirect_costs, total_project_cost}. All numbers.
- budget_periods: array of {period_number, start_date, end_date, amount} objects (multi-year). Empty for single-period.
Encoding rules¶
- Booleans are derived, not transcribed. has_postdocs_or_grad_students, mentoring_plan_required, has_subawards, and has_equipment_over_5k are computed from the corresponding list lengths/counts. Setting a boolean inconsistent with its derivation is a downstream CHK-02 failure.
- Equipment threshold is $5,000. Items at or below the threshold do not appear in equipment_items (regardless of whether the budget calls them "equipment").
- Postdoc / graduate-student names may be unknown. If the budget lists a "Postdoc to be named" or "GRA TBN", use name: "TBN" so per-person document workflows can flag a missing-name dependency without dropping the row.
- fa_rate_and_base is two-part. "42.5% MTDC" → {rate: "42.5%", base: "MTDC"}.
- total_costs reconciliation. All three keys (total_direct_costs, total_indirect_costs, total_project_cost) are JSON numbers. total_project_cost == total_direct_costs + total_indirect_costs (downstream CHK-03).
- graduate_student_details.type enum. Use "RA" (Research Assistant) or "TA" (Teaching Assistant) per the budget's classification.
- Do not output any text outside the single JSON object.
Output¶
A single JSON object. No surrounding markdown.
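For readers implementing a post-processor, two of the prompt's encoding rules (number-typed amounts and the two-part fa_rate_and_base) can be sketched as below. This is a minimal sketch under stated assumptions: `parse_amount` and `split_fa` are hypothetical helper names, amounts are assumed to use `$` and comma separators only, and the rate is assumed to appear as digits followed directly by `%`.

```python
import re
from decimal import Decimal
from typing import Optional

# Allowed base values, taken from the fa_rate_and_base.base enum in schema.json.
FA_BASES = ("MTDC", "TDC", "Salary & Wages")


def parse_amount(text: str) -> Decimal:
    """Turn a document amount such as '$1,234,567.89' into an exact decimal."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    return Decimal(cleaned)  # Decimal keeps the document's digits; no rounding


def split_fa(text: str) -> dict[str, Optional[str]]:
    """Split a string such as '42.5% MTDC' into the {rate, base} shape."""
    rate_match = re.search(r"\d+(?:\.\d+)?%", text)
    rate = rate_match.group(0) if rate_match else None
    # Word boundaries keep "TDC" from matching inside "MTDC".
    base = next(
        (b for b in FA_BASES if re.search(rf"\b{re.escape(b)}\b", text)),
        None,
    )
    return {"rate": rate, "base": base}
```

The standard `json` module does not serialize `Decimal` directly, so a caller would typically convert with `float(...)` at serialization time; parsing through `Decimal` first simply avoids arithmetic on the raw string.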
Output schema¶
Source: schema.json.
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://github.com/AI4RA/prompt-library/components/proposal-budget-personnel-extraction-udm/schema.json",
"title": "Proposal Budget Personnel & Compliance Triggers \u2014 UDM Output",
"description": "JSON contract for personnel information and compliance-requirement triggers extracted from a proposal budget document. Derivable booleans (has_postdocs_or_grad_students, mentoring_plan_required, has_subawards, has_equipment_over_5k) drive downstream document-completeness reviews. UDM-aligned to Personnel, IndirectRate, and CostShare.",
"version": "0.1.0",
"type": "object",
"additionalProperties": false,
"required": [
"senior_key_personnel",
"postdoc_count",
"postdoc_details",
"graduate_student_count",
"graduate_student_details",
"other_personnel",
"has_postdocs_or_grad_students",
"mentoring_plan_required",
"budget_categories",
"subaward_recipients",
"has_subawards",
"equipment_items",
"has_equipment_over_5k",
"total_costs"
],
"properties": {
"senior_key_personnel": {
"type": "array",
"minItems": 1,
"items": {
"type": "object",
"additionalProperties": false,
"required": [
"name",
"role"
],
"properties": {
"name": {
"type": "string",
"minLength": 1
},
"role": {
"type": "string",
"minLength": 1
},
"institution": {
"type": [
"string",
"null"
]
},
"effort": {
"type": [
"string",
"null"
]
},
"salary_requested": {
"type": [
"number",
"null"
]
}
}
},
"description": "Senior key personnel with effort and salary. Resolves to UDM Personnel."
},
"postdoc_count": {
"type": "integer",
"minimum": 0
},
"postdoc_details": {
"type": "array",
"items": {
"type": "object",
"additionalProperties": false,
"required": [
"name"
],
"properties": {
"name": {
"type": "string",
"minLength": 1,
"description": "Use 'TBN' for to-be-named slots."
},
"effort": {
"type": [
"string",
"null"
]
},
"salary_or_stipend": {
"type": [
"number",
"null"
]
}
}
}
},
"graduate_student_count": {
"type": "integer",
"minimum": 0
},
"graduate_student_details": {
"type": "array",
"items": {
"type": "object",
"additionalProperties": false,
"required": [
"name",
"type"
],
"properties": {
"name": {
"type": "string",
"minLength": 1,
"description": "Use 'TBN' for to-be-named slots."
},
"type": {
"type": "string",
"enum": [
"RA",
"TA"
]
},
"stipend": {
"type": [
"number",
"null"
]
},
"tuition_remission": {
"type": [
"number",
"null"
]
}
}
}
},
"undergraduate_count": {
"type": [
"integer",
"null"
],
"minimum": 0
},
"other_personnel": {
"type": "array",
"items": {
"type": "object",
"additionalProperties": false,
"required": [
"role"
],
"properties": {
"role": {
"type": "string",
"minLength": 1
},
"cost": {
"type": [
"number",
"null"
]
}
}
}
},
"total_personnel_cost": {
"type": [
"number",
"null"
]
},
"has_postdocs_or_grad_students": {
"type": "boolean",
"description": "Derived: (postdoc_count > 0) OR (graduate_student_count > 0)."
},
"mentoring_plan_required": {
"type": "boolean",
"description": "Derived: equals has_postdocs_or_grad_students for NSF sponsors; per agency policy otherwise."
},
"budget_categories": {
"type": "array",
"minItems": 1,
"items": {
"type": "object",
"additionalProperties": false,
"required": [
"category_name",
"amount"
],
"properties": {
"category_name": {
"type": "string",
"minLength": 1
},
"amount": {
"type": "number"
},
"notes": {
"type": [
"string",
"null"
]
}
}
}
},
"subaward_recipients": {
"type": "array",
"items": {
"type": "object",
"additionalProperties": false,
"required": [
"institution_name",
"amount"
],
"properties": {
"institution_name": {
"type": "string",
"minLength": 1
},
"sub_pi_name": {
"type": [
"string",
"null"
]
},
"amount": {
"type": "number"
}
}
}
},
"has_subawards": {
"type": "boolean",
"description": "Derived: len(subaward_recipients) > 0."
},
"equipment_items": {
"type": "array",
"items": {
"type": "object",
"additionalProperties": false,
"required": [
"description",
"cost"
],
"properties": {
"description": {
"type": "string",
"minLength": 1
},
"cost": {
"type": "number",
"minimum": 5000.01,
"description": "Items over $5,000 only."
}
}
}
},
"has_equipment_over_5k": {
"type": "boolean",
"description": "Derived: len(equipment_items) > 0."
},
"travel_summary": {
"type": [
"string",
"null"
]
},
"fa_rate_and_base": {
"type": [
"object",
"null"
],
"additionalProperties": false,
"properties": {
"rate": {
"type": [
"string",
"null"
]
},
"base": {
"type": [
"string",
"null"
],
"enum": [
"MTDC",
"TDC",
"Salary & Wages",
null
]
}
},
"description": "Indirect cost rate and base. Resolves to UDM IndirectRate."
},
"cost_sharing": {
"type": [
"string",
"null"
],
"description": "Cost sharing amount and type if present. Resolves to UDM CostShare."
},
"total_costs": {
"type": "object",
"additionalProperties": false,
"required": [
"total_direct_costs",
"total_indirect_costs",
"total_project_cost"
],
"properties": {
"total_direct_costs": {
"type": "number"
},
"total_indirect_costs": {
"type": "number"
},
"total_project_cost": {
"type": "number"
}
},
"description": "Cross-field check CHK-03: total_project_cost == total_direct_costs + total_indirect_costs."
},
"budget_periods": {
"type": "array",
"items": {
"type": "object",
"additionalProperties": false,
"required": [
"period_number",
"amount"
],
"properties": {
"period_number": {
"type": "integer"
},
"start_date": {
"type": [
"string",
"null"
]
},
"end_date": {
"type": [
"string",
"null"
]
},
"amount": {
"type": "number"
}
}
},
"description": "Per-period budgets for multi-year proposals. Empty array for single-period."
}
}
}
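The schema above records the derivation and reconciliation rules only in `description` strings, so a consumer that wants to enforce them needs a separate cross-field pass. The sketch below shows one possible shape for such a pass. The function name `cross_field_failures` is hypothetical, the half-cent tolerance is an assumption added to absorb floating-point rounding (the contract itself states plain equality), and mentoring_plan_required is left unchecked because its value depends on the sponsor.

```python
import math
from typing import Any


def cross_field_failures(output: dict[str, Any]) -> list[str]:
    """Return CHK-02 / CHK-03 failures for an already schema-valid output."""
    failures: list[str] = []

    # CHK-02: each boolean must match the value derived from counts or lists.
    expected = {
        "has_postdocs_or_grad_students": (
            output["postdoc_count"] > 0 or output["graduate_student_count"] > 0
        ),
        "has_subawards": len(output["subaward_recipients"]) > 0,
        "has_equipment_over_5k": len(output["equipment_items"]) > 0,
    }
    for key, value in expected.items():
        if output[key] != value:
            failures.append(f"CHK-02: {key} should be {value}")

    # CHK-03: direct + indirect must reconcile with the project total.
    totals = output["total_costs"]
    computed = totals["total_direct_costs"] + totals["total_indirect_costs"]
    # abs_tol is an assumption: it tolerates float noise below half a cent.
    if not math.isclose(totals["total_project_cost"], computed, abs_tol=0.005):
        failures.append(
            "CHK-03: total_project_cost != total_direct_costs + total_indirect_costs"
        )

    return failures
```

An empty list means both checks passed; any entries can be surfaced alongside schema validation errors.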
Changelog¶
Source: CHANGELOG.md.
All notable changes to this component. Versions follow semver.
[0.1.0] — 2026-04-30¶
- Initial experimental release.
- Schema derived from the proposal-budget-personnel-extraction v2 Vandalizer workflow in ui-insight/ProcessMapping (two parallel Extraction tasks + Formatting task; 20 source fields).
- Four derivable boolean triggers (has_postdocs_or_grad_students, mentoring_plan_required, has_subawards, has_equipment_over_5k) computed from list lengths so downstream consumers (proposal-document-completeness-udm) can rely on internal consistency.
- senior_key_personnel, postdoc_details, graduate_student_details, other_personnel, subaward_recipients, equipment_items, budget_categories, budget_periods realized as arrays of typed objects (rather than the source Table fields) so per-row attributes attach to the right entry.
- graduate_student_details.type enum (RA, TA); fa_rate_and_base.base enum (MTDC, TDC, Salary & Wages). Both match the source workflow's FA_Rate_And_Base Enum_Values.
- Equipment threshold ($5,000) encoded structurally via equipment_items.cost.minimum: 5000.01.
- Cross-field rule from the source workflow (CFR-01: mentoring_plan_required derives from postdoc/graduate counts) is encoded by the schema's required-derivation rule for the boolean fields.
- UDM column bindings preserved: senior_key_personnel → Personnel; fa_rate_and_base → IndirectRate; cost_sharing → CostShare.
- No eval cases yet: status experimental until at least one golden extraction is added under evals/cases/.