proposal-budget-personnel-extraction-udm

Slug: proposal-budget-personnel-extraction-udm
Version: 0.1.0
Status: experimental
Last fully evaluated: none
Eval state: no validated eval cases
Category: extraction
Domain: research-administration
Manifestations: prompt
Created: 2026-04-30
Updated: 2026-04-30

Tags: pre-award proposal budget personnel mentoring-plan compliance-triggers nsf udm structured-extraction json

Audience: sponsored-programs-staff, pre-award-teams, ingest-pipelines

Manifestations in repo: prompt.md

Extracts personnel information and compliance-requirement triggers from a proposal budget document into a structured JSON object. It identifies senior key personnel, postdocs, graduate students (RA / TA), undergraduates, and other personnel, along with the budget category structure, subaward recipients, equipment over $5,000, the travel summary, the F&A rate and base, cost sharing, and total costs. The output drives the downstream proposal-document-completeness-udm gap analysis.

Output contract: schema.json

Contract scope: repo-local, UDM-aligned

Inputs

A proposal budget document (NSF-style budget tables / NIH PHS 398 / agency budget form), optionally with a budget justification.

Outputs

A single JSON object with personnel listings (senior key personnel, postdocs, graduate students, undergraduates, other personnel), derivable boolean triggers (has_postdocs_or_grad_students, mentoring_plan_required, has_subawards, has_equipment_over_5k), the budget structure (budget categories, subaward recipients, equipment items, travel summary, F&A rate and base, cost sharing, total costs), and per-period budgets.

See schema.json for the authoritative definition and prompt.md for the encoding rules (booleans must be derived from list lengths; equipment threshold is $5,000; F&A rate and base split into a {rate, base} object; total-costs reconciliation rule).
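
The encoding rules summarized above lend themselves to a mechanical spot-check. The following is a minimal sketch and not code from this repo: the function name `check_encoding_rules`, the `sponsor_is_nsf` flag, and the half-cent tolerance are assumptions made for illustration, and the function is not the downstream CHK-02 / CHK-03 implementation.

```python
import math


def check_encoding_rules(output: dict, sponsor_is_nsf: bool = True) -> list:
    """Return a list of rule violations; an empty list means consistent."""
    problems = []

    # Booleans must follow from counts and list lengths.
    has_trainees = output["postdoc_count"] > 0 or output["graduate_student_count"] > 0
    if output["has_postdocs_or_grad_students"] != has_trainees:
        problems.append("has_postdocs_or_grad_students disagrees with counts")
    if output["has_subawards"] != (len(output["subaward_recipients"]) > 0):
        problems.append("has_subawards disagrees with subaward_recipients")
    if output["has_equipment_over_5k"] != (len(output["equipment_items"]) > 0):
        problems.append("has_equipment_over_5k disagrees with equipment_items")

    # Assumption: only the NSF case is checked; other sponsors follow agency policy.
    if sponsor_is_nsf and output["mentoring_plan_required"] != has_trainees:
        problems.append("mentoring_plan_required disagrees with NSF rule")

    # Equipment threshold: only items over $5,000 belong in equipment_items.
    for item in output["equipment_items"]:
        if item["cost"] <= 5000:
            problems.append("equipment item at or below $5,000: " + item["description"])

    # fa_rate_and_base, when present, carries only the rate and base keys.
    fa = output.get("fa_rate_and_base")
    if fa is not None and set(fa) - {"rate", "base"}:
        problems.append("fa_rate_and_base has unexpected keys")

    # Totals reconcile (the half-cent tolerance for float sums is an assumption).
    totals = output["total_costs"]
    expected = totals["total_direct_costs"] + totals["total_indirect_costs"]
    if not math.isclose(totals["total_project_cost"], expected, rel_tol=0.0, abs_tol=0.005):
        problems.append("total_project_cost != direct + indirect")

    return problems
```

Returning a list of messages rather than raising on the first failure lets a reviewer see every inconsistency in one pass.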

Contract scope

Repo-local, UDM-aligned. senior_key_personnel rows resolve to Personnel; fa_rate_and_base resolves to IndirectRate; cost_sharing resolves to CostShare. The structured shape mirrors the deliverable produced by the proposal-budget-personnel-extraction Vandalizer workflow in the ui-insight/ProcessMapping process-mapping corpus.

Relationship to sibling components

| Concern | This component | Related |
| --- | --- | --- |
| Personnel + compliance-trigger booleans from proposal budget | proposal-budget-personnel-extraction-udm | |
| Document-completeness gap analysis (consumes booleans here) | | proposal-document-completeness-udm |
| Post-award budget structure | | award-compliance-extraction-udm.financial_management |
| Solicitation requirements | | rfa-checklist-extraction-udm / foa-checklist-extraction-udm |

Triad integration

  • Evaluation datasets: none yet — planned: NSF proposal with postdocs and graduate students (mentoring plan trigger); NSF proposal with subaward and equipment > $5K; NIH proposal with cost-share commitment.
  • Harness notes: canonical manifestation is prompt.md. Validation surface is schema.json. The companion top-level workflows/proposal-budget-personnel-extraction Vandalizer workflow at v0.1.0 implements the contract as two parallel Extraction tasks (personnel identification + budget structure) plus a Consolidation Prompt that derives the boolean triggers.

Runtime topology — the Vandalizer workflow

The canonical runtime is the proposal-budget-personnel-extraction workflow shipped at the top level of this repo.

  • Step 1 (parallel Extraction) — two Extraction tasks. extract-personnel-identification captures personnel; extract-budget-structure-and-compliance-triggers captures the budget structure.
  • Step 2 (Consolidation Prompt) — assembles the two fragments and derives the four boolean triggers from list lengths.
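
The consolidation step can be pictured as a small merge-and-derive function. This is a hedged sketch only: in the workflow the step is a Consolidation Prompt rather than Python, and the function name `consolidate`, the flat fragment shapes, and the `mentoring_override` parameter are hypothetical. The `fa_rate` and `fa_base` names follow the harness notes on this page.

```python
from typing import Optional


def consolidate(personnel: dict, budget: dict,
                mentoring_override: Optional[bool] = None) -> dict:
    """Merge the two extraction fragments and derive the four booleans."""
    merged = {**personnel, **budget}

    # Combine the flat fa_rate / fa_base values into the nested object.
    merged["fa_rate_and_base"] = {
        "rate": merged.pop("fa_rate", None),
        "base": merged.pop("fa_base", None),
    }

    # Derive booleans from counts and list lengths, never from document text.
    has_trainees = merged["postdoc_count"] > 0 or merged["graduate_student_count"] > 0
    merged["has_postdocs_or_grad_students"] = has_trainees
    merged["has_subawards"] = len(merged["subaward_recipients"]) > 0
    merged["has_equipment_over_5k"] = len(merged["equipment_items"]) > 0

    # NSF rule by default; a caller handling another sponsor passes the
    # agency-policy outcome explicitly (hypothetical parameter).
    merged["mentoring_plan_required"] = (
        has_trainees if mentoring_override is None else mentoring_override
    )
    return merged


result = consolidate(
    {"postdoc_count": 0, "graduate_student_count": 2},
    {"subaward_recipients": [], "equipment_items": [],
     "fa_rate": "42.5%", "fa_base": "MTDC"},
)
print(result["mentoring_plan_required"], result["fa_rate_and_base"])
# prints: True {'rate': '42.5%', 'base': 'MTDC'}
```

Because the booleans are assigned only after the merge, neither extraction fragment can carry a transcribed value into the final object.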

Manifestations

  • prompt.md — canonical, LLM-agnostic prompt

Evals

See evals/ for reference inputs and known-good outputs. No validated cases have been added yet.

Provenance

Authored 2026-04-30 against the proposal-budget-personnel-extraction (Workflow_ID: WF-PROPOSAL-BUDGET-PERSONNEL-EXTRACTION) process-mapping workflow in ui-insight/ProcessMapping at commit b7176b0c913833a205efdb5e4ba00c17ff88af0f.

Contract scope

  • Output format: json_object

  • Contract scope: shared_udm_semantics_repo_local_schema

  • Validation surfaces: json_schema

  • Schema entrypoints: #

  • Notes: Repo-local pre-award budget-personnel and compliance-trigger contract. Personnel listings (senior key, postdocs, grad students, undergraduates, other) plus four derivable boolean triggers (has_postdocs_or_grad_students, mentoring_plan_required, has_subawards, has_equipment_over_5k) plus the budget structure (categories, subawards, equipment over $5K, F&A rate and base as a {rate, base} object, cost sharing, total costs, multi-year periods).

  • Machine-readable catalog entry: component_catalog.json

Triad integration

  • UDM alignment: shared_udm_semantics_repo_local_schema — senior_key_personnel rows resolve to Personnel; fa_rate_and_base resolves to IndirectRate; cost_sharing resolves to CostShare.

  • Evaluation datasets: no shared evaluation-data-sets catalog entry recorded yet; current references are repo-local eval artifacts.

  • Harness notes: Validate JSON outputs against schema.json. Canonical single-call invocation surface is prompt.md. The companion top-level workflows/proposal-budget-personnel-extraction Vandalizer workflow at v0.1.0 implements the contract as two parallel Extraction tasks (personnel identification + budget structure) plus a Consolidation Prompt that DERIVES the four boolean compliance triggers from list lengths and combines fa_rate + fa_base into the nested fa_rate_and_base object.

  • Related component: proposal-document-completeness-udm (producer_for) — Produces the four boolean triggers and senior_key_personnel listing that proposal-document-completeness-udm consumes for gap-analysis conditional requirements.
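
Before running a full JSON Schema validator, a harness might do a cheap top-level precheck. The sketch below uses only the standard library and is an illustration, not the repo's harness: `load_schema`, `precheck_top_level`, and the default `schema.json` path are assumed names. It reads the schema's `required`, `properties`, and `additionalProperties` keywords and does not replace validation with a validator that supports draft 2020-12.

```python
import json


def load_schema(path: str = "schema.json") -> dict:
    """Read the schema file; the default path is an assumption."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def precheck_top_level(output: dict, schema: dict) -> dict:
    """Compare top-level keys against the schema's required/properties lists."""
    required = set(schema.get("required", []))
    allowed = set(schema.get("properties", {}))
    present = set(output)
    # Extra keys only count as errors when the schema forbids them.
    if schema.get("additionalProperties") is False:
        unexpected = present - allowed
    else:
        unexpected = set()
    return {
        "missing": sorted(required - present),
        "unexpected": sorted(unexpected),
    }
```

A call such as `precheck_top_level(model_output, load_schema())` returns two sorted lists, which makes a missing `total_costs` or a stray key easy to report before deeper type checks run.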

Prompt body

Source: prompt.md.

Proposal Budget Personnel & Compliance Triggers — UDM JSON

Purpose: Extract personnel information and compliance-requirement triggers from a proposal budget document into a structured JSON object. Identifies senior key personnel, postdocs, graduate students, undergraduates, and other personnel; the budget category structure; subaward recipients, equipment over $5,000, travel summary, F&A rate and base, cost sharing, and total costs. The output drives a downstream pre-award document-completeness review.

Expected input: A proposal budget document (NSF-style budget tables / NIH PHS 398 / agency budget form), optionally with a budget justification.

Expected output: A single JSON object that validates against schema.json. No prose, no markdown outside the JSON.

When to use this contract

This is the proposal-budget cut for compliance-trigger derivation. It pairs with proposal-document-completeness-udm — the gap-analysis component consumes senior_key_personnel, has_postdocs_or_grad_students, has_subawards, and has_equipment_over_5k from this output to compute its conditional requirements.

UDM-aligned: senior_key_personnel rows resolve to Personnel; fa_rate_and_base resolves to IndirectRate; cost_sharing resolves to CostShare.

This component does not cover the full document-completeness gap analysis — that lives in proposal-document-completeness-udm. It does not cover the post-award budget structure — that lives in award-compliance-extraction-udm.financial_management.


Prompt

You are extracting personnel information and compliance-requirement triggers from a proposal budget document. Your output drives a downstream document-completeness review, so the booleans (has_postdocs_or_grad_students, has_subawards, has_equipment_over_5k, mentoring_plan_required) must be derivable from the extracted lists — never set a boolean except by counting the corresponding list.

Be 100% accurate. Match the schema's type for each field exactly:

  • Number-typed fields (total_personnel_cost, senior_key_personnel[].salary_requested, postdoc_details[].salary_or_stipend, graduate_student_details[].stipend/tuition_remission, other_personnel[].cost, subaward_recipients[].amount, equipment_items[].cost, budget_categories[].amount, budget_periods[].amount, all three keys of total_costs) — emit as JSON numbers. No quotes, no $, no thousand-separators. $1,234,567.89 in the document → 1234567.89 in JSON. Use the document's exact value; do not round.

  • Integer-typed fields (postdoc_count, graduate_student_count, undergraduate_count, budget_periods[].period_number) — emit as JSON integers.

  • String-typed fields (effort percentages, F&A rate string, cost-sharing description, etc.) — quote verbatim, preserving the document's % and other formatting.

When a category is empty, return an empty array; when a scalar is not specified, return null. Do not invent personnel, subawardees, or equipment items.

Search the entire budget for content in or near sections titled Senior/Key Personnel, Other Personnel, Personnel, Section A — Senior/Key Person, Section B — Other Personnel, Salary and Wages, Personnel Justification, Budget, Budget Summary, Cumulative Budget, Subaward, Subcontract, Equipment, Travel, Other Direct Costs, Indirect Costs.

Return a single JSON object that validates against schema.json with these top-level keys:

  • senior_key_personnel — array of {name, role, institution, effort, salary_requested} objects. One row per senior/key person.

  • postdoc_count — integer (count of postdoctoral researchers in the budget).

  • postdoc_details — array of {name, effort, salary_or_stipend} objects (use "TBN" for unnamed slots; the schema requires a non-empty name). Empty when postdoc_count: 0.

  • graduate_student_count — integer.

  • graduate_student_details — array of {name, type, stipend, tuition_remission} objects. type is "RA" or "TA". Empty when count is 0.

  • undergraduate_count — integer.

  • other_personnel — array of {role, cost} objects (technicians, programmers, administrative staff). Empty when none.

  • total_personnel_cost — number (decimal). Total personnel costs including salary and fringe.

  • has_postdocs_or_grad_students — boolean. Must equal (postdoc_count > 0) OR (graduate_student_count > 0).

  • mentoring_plan_required — boolean. Equals has_postdocs_or_grad_students for NSF; for non-NSF sponsors set per agency policy.

  • budget_categories — array of {category_name, amount, notes} objects.

  • subaward_recipients — array of {institution_name, sub_pi_name, amount} objects.

  • has_subawards — boolean. Must equal len(subaward_recipients) > 0.

  • equipment_items — array of {description, cost} objects (items over $5,000 only).

  • has_equipment_over_5k — boolean. Must equal len(equipment_items) > 0.

  • travel_summary — string with domestic/foreign travel summary or null.

  • fa_rate_and_base — object {rate, base} where base is one of "MTDC", "TDC", "Salary & Wages", or null.

  • cost_sharing — string with cost-sharing amount and type or null.

  • total_costs — object {total_direct_costs, total_indirect_costs, total_project_cost}. All numbers.

  • budget_periods — array of {period_number, start_date, end_date, amount} objects (multi-year). Empty for single-period.

Encoding rules

  1. Booleans are derived, not transcribed. has_postdocs_or_grad_students, mentoring_plan_required, has_subawards, and has_equipment_over_5k are computed from the corresponding list lengths/counts. Setting a boolean inconsistent with its derivation is a downstream CHK-02 failure.

  2. Equipment threshold is $5,000. Items below the threshold do not appear in equipment_items (regardless of whether the budget calls them "equipment").

  3. Postdoc / graduate-student names may be unknown. If the budget lists a "Postdoc to be named" or "GRA TBN", use name: "TBN" so per-person document workflows can flag a missing-name dependency without dropping the row.

  4. fa_rate_and_base is two-part. "42.5% MTDC" → {rate: "42.5%", base: "MTDC"}.

  5. total_costs reconciliation. All three keys (total_direct_costs, total_indirect_costs, total_project_cost) are JSON numbers. total_project_cost == total_direct_costs + total_indirect_costs (downstream CHK-03).

  6. graduate_student_details.type enum. Use "RA" (Research Assistant) or "TA" (Teaching Assistant) per the budget's classification.

  7. Do not output any text outside the single JSON object.

Output

A single JSON object. No surrounding markdown.
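
Outside the prompt itself, ingest code that post-processes or spot-checks model output may want to mirror the number-typed rule ($1,234,567.89 in the document becomes 1234567.89 in JSON). The following is a minimal sketch under stated assumptions: `parse_money` is a hypothetical helper, it handles only a leading `$` and comma thousand-separators, and it converts to `float` only at serialization time.

```python
import json
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_money(text: Optional[str]) -> Optional[Decimal]:
    """Turn a string such as '$1,234,567.89' into an exact Decimal."""
    if text is None:
        return None
    cleaned = text.strip().replace("$", "").replace(",", "")
    if not cleaned:
        # Nothing specified: callers map this to JSON null.
        return None
    try:
        return Decimal(cleaned)
    except InvalidOperation as exc:
        raise ValueError("not a currency amount: " + repr(text)) from exc


amount = parse_money("$1,234,567.89")
print(json.dumps({"total_personnel_cost": float(amount)}))
# prints: {"total_personnel_cost": 1234567.89}
```

Parsing into `Decimal` first keeps the document's exact value with no rounding until the final conversion to a JSON number.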

Output schema

Source: schema.json.

{

  "$schema": "https://json-schema.org/draft/2020-12/schema",

  "$id": "https://github.com/AI4RA/prompt-library/components/proposal-budget-personnel-extraction-udm/schema.json",

  "title": "Proposal Budget Personnel & Compliance Triggers \u2014 UDM Output",

  "description": "JSON contract for personnel information and compliance-requirement triggers extracted from a proposal budget document. Derivable booleans (has_postdocs_or_grad_students, mentoring_plan_required, has_subawards, has_equipment_over_5k) drive downstream document-completeness reviews. UDM-aligned to Personnel, IndirectRate, and CostShare.",

  "version": "0.1.0",

  "type": "object",

  "additionalProperties": false,

  "required": [

    "senior_key_personnel",

    "postdoc_count",

    "postdoc_details",

    "graduate_student_count",

    "graduate_student_details",

    "other_personnel",

    "has_postdocs_or_grad_students",

    "mentoring_plan_required",

    "budget_categories",

    "subaward_recipients",

    "has_subawards",

    "equipment_items",

    "has_equipment_over_5k",

    "total_costs"

  ],

  "properties": {

    "senior_key_personnel": {

      "type": "array",

      "minItems": 1,

      "items": {

        "type": "object",

        "additionalProperties": false,

        "required": [

          "name",

          "role"

        ],

        "properties": {

          "name": {

            "type": "string",

            "minLength": 1

          },

          "role": {

            "type": "string",

            "minLength": 1

          },

          "institution": {

            "type": [

              "string",

              "null"

            ]

          },

          "effort": {

            "type": [

              "string",

              "null"

            ]

          },

          "salary_requested": {

            "type": [

              "number",

              "null"

            ]

          }

        }

      },

      "description": "Senior key personnel with effort and salary. Resolves to UDM Personnel."

    },

    "postdoc_count": {

      "type": "integer",

      "minimum": 0

    },

    "postdoc_details": {

      "type": "array",

      "items": {

        "type": "object",

        "additionalProperties": false,

        "required": [

          "name"

        ],

        "properties": {

          "name": {

            "type": "string",

            "minLength": 1,

            "description": "Use 'TBN' for to-be-named slots."

          },

          "effort": {

            "type": [

              "string",

              "null"

            ]

          },

          "salary_or_stipend": {

            "type": [

              "number",

              "null"

            ]

          }

        }

      }

    },

    "graduate_student_count": {

      "type": "integer",

      "minimum": 0

    },

    "graduate_student_details": {

      "type": "array",

      "items": {

        "type": "object",

        "additionalProperties": false,

        "required": [

          "name",

          "type"

        ],

        "properties": {

          "name": {

            "type": "string",

            "minLength": 1,

            "description": "Use 'TBN' for to-be-named slots."

          },

          "type": {

            "type": "string",

            "enum": [

              "RA",

              "TA"

            ]

          },

          "stipend": {

            "type": [

              "number",

              "null"

            ]

          },

          "tuition_remission": {

            "type": [

              "number",

              "null"

            ]

          }

        }

      }

    },

    "undergraduate_count": {

      "type": [

        "integer",

        "null"

      ],

      "minimum": 0

    },

    "other_personnel": {

      "type": "array",

      "items": {

        "type": "object",

        "additionalProperties": false,

        "required": [

          "role"

        ],

        "properties": {

          "role": {

            "type": "string",

            "minLength": 1

          },

          "cost": {

            "type": [

              "number",

              "null"

            ]

          }

        }

      }

    },

    "total_personnel_cost": {

      "type": [

        "number",

        "null"

      ]

    },

    "has_postdocs_or_grad_students": {

      "type": "boolean",

      "description": "Derived: (postdoc_count > 0) OR (graduate_student_count > 0)."

    },

    "mentoring_plan_required": {

      "type": "boolean",

      "description": "Derived: equals has_postdocs_or_grad_students for NSF sponsors; per agency policy otherwise."

    },

    "budget_categories": {

      "type": "array",

      "minItems": 1,

      "items": {

        "type": "object",

        "additionalProperties": false,

        "required": [

          "category_name",

          "amount"

        ],

        "properties": {

          "category_name": {

            "type": "string",

            "minLength": 1

          },

          "amount": {

            "type": "number"

          },

          "notes": {

            "type": [

              "string",

              "null"

            ]

          }

        }

      }

    },

    "subaward_recipients": {

      "type": "array",

      "items": {

        "type": "object",

        "additionalProperties": false,

        "required": [

          "institution_name",

          "amount"

        ],

        "properties": {

          "institution_name": {

            "type": "string",

            "minLength": 1

          },

          "sub_pi_name": {

            "type": [

              "string",

              "null"

            ]

          },

          "amount": {

            "type": "number"

          }

        }

      }

    },

    "has_subawards": {

      "type": "boolean",

      "description": "Derived: len(subaward_recipients) > 0."

    },

    "equipment_items": {

      "type": "array",

      "items": {

        "type": "object",

        "additionalProperties": false,

        "required": [

          "description",

          "cost"

        ],

        "properties": {

          "description": {

            "type": "string",

            "minLength": 1

          },

          "cost": {

            "type": "number",

            "minimum": 5000.01,

            "description": "Items over $5,000 only."

          }

        }

      }

    },

    "has_equipment_over_5k": {

      "type": "boolean",

      "description": "Derived: len(equipment_items) > 0."

    },

    "travel_summary": {

      "type": [

        "string",

        "null"

      ]

    },

    "fa_rate_and_base": {

      "type": [

        "object",

        "null"

      ],

      "additionalProperties": false,

      "properties": {

        "rate": {

          "type": [

            "string",

            "null"

          ]

        },

        "base": {

          "type": [

            "string",

            "null"

          ],

          "enum": [

            "MTDC",

            "TDC",

            "Salary & Wages",

            null

          ]

        }

      },

      "description": "Indirect cost rate and base. Resolves to UDM IndirectRate."

    },

    "cost_sharing": {

      "type": [

        "string",

        "null"

      ],

      "description": "Cost sharing amount and type if present. Resolves to UDM CostShare."

    },

    "total_costs": {

      "type": "object",

      "additionalProperties": false,

      "required": [

        "total_direct_costs",

        "total_indirect_costs",

        "total_project_cost"

      ],

      "properties": {

        "total_direct_costs": {

          "type": "number"

        },

        "total_indirect_costs": {

          "type": "number"

        },

        "total_project_cost": {

          "type": "number"

        }

      },

      "description": "Cross-field check CHK-03: total_project_cost == total_direct_costs + total_indirect_costs."

    },

    "budget_periods": {

      "type": "array",

      "items": {

        "type": "object",

        "additionalProperties": false,

        "required": [

          "period_number",

          "amount"

        ],

        "properties": {

          "period_number": {

            "type": "integer"

          },

          "start_date": {

            "type": [

              "string",

              "null"

            ]

          },

          "end_date": {

            "type": [

              "string",

              "null"

            ]

          },

          "amount": {

            "type": "number"

          }

        }

      },

      "description": "Per-period budgets for multi-year proposals. Empty array for single-period."

    }

  }

}

Changelog

Source: CHANGELOG.md.

All notable changes to this component. Versions follow semver.

[0.1.0] — 2026-04-30

  • Initial experimental release.
  • Schema derived from the proposal-budget-personnel-extraction v2 Vandalizer workflow in ui-insight/ProcessMapping (two parallel Extraction tasks + Formatting task; 20 source fields).
  • Four derivable boolean triggers (has_postdocs_or_grad_students, mentoring_plan_required, has_subawards, has_equipment_over_5k) computed from list lengths so downstream consumers (proposal-document-completeness-udm) can rely on internal consistency.
  • senior_key_personnel, postdoc_details, graduate_student_details, other_personnel, subaward_recipients, equipment_items, budget_categories, budget_periods realized as arrays of typed objects (rather than the source Table fields) so per-row attributes attach to the right entry.
  • graduate_student_details.type enum (RA, TA); fa_rate_and_base.base enum (MTDC, TDC, Salary & Wages) — match the source workflow's FA_Rate_And_Base Enum_Values.
  • Equipment threshold ($5,000) encoded structurally via equipment_items.cost.minimum: 5000.01.
  • Cross-field rule from the source workflow (CFR-01: mentoring_plan_required derives from postdoc/graduate counts) is encoded by the schema's required-derivation rule for the boolean fields.
  • UDM column bindings preserved: senior_key_personnel → Personnel; fa_rate_and_base → IndirectRate; cost_sharing → CostShare.
  • No eval cases yet — status experimental until at least one golden extraction is added under evals/cases/.