nsf-budget-spreadsheet-ingest-udm¶

Slug: nsf-budget-spreadsheet-ingest-udm

Version: 1.0.0

Status: experimental

Last fully evaluated: none

Eval state: no validated eval cases

Category: extraction

Domain: research-administration

Manifestations: prompt

Created: 2026-04-24

Updated: 2026-04-24

Tags: nsf budget spreadsheet-ingest extraction udm research-administration proposal-preparation

Audience: pre-award-staff, proposal-developers, ingest-pipelines

Manifestations in repo: prompt.md

Normalizes an NSF-style proposal budget workbook (or workbook-derived evidence) into the structured budget object consumed by nsf-budget-justification-udm. This component is the extraction half of a multi-step budget-justification pipeline: it isolates spreadsheet interpretation from narrative drafting so each responsibility can be prompted, versioned, and evaluated independently.

Output contract: delegates to ../nsf-budget-justification-udm/schema.json (#/$defs/input)

Contract scope: delegated wrapper over a repo-local NSF budget-justification input contract

Inputs¶

A workbook attachment or extracted spreadsheet evidence package. Typical evidence includes sheet names, used ranges, visible values, formulas when available, and cell-referenced tables from tabs such as:

Full Budget
Personnel
Travel
Other Direct Costs
Equipment
Subawards
Participant Support
Tuition, Fees, Insurance
Rates

Sheet names are hints. The prompt also supports equivalent workbooks where headings, category labels, or extracted CSV snippets carry the same information.
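
For ingest pipelines that pre-label sheets before invoking the prompt, the "sheet names are hints" idea could be approximated with loose name matching. The following is a minimal sketch, not part of this component: the `match_tab` helper, the normalization step, and the 0.6 cutoff are all assumptions chosen for illustration.

```python
import difflib
import re
from typing import Optional

# Canonical tab names from the list above; the matching strategy is an assumption.
CANONICAL_TABS = [
    "Full Budget", "Personnel", "Travel", "Other Direct Costs", "Equipment",
    "Subawards", "Participant Support", "Tuition, Fees, Insurance", "Rates",
]

def _norm(name: str) -> str:
    """Lowercase and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()

def match_tab(sheet_name: str, cutoff: float = 0.6) -> Optional[str]:
    """Return the closest canonical tab for a sheet name, or None if nothing is close."""
    by_norm = {_norm(tab): tab for tab in CANONICAL_TABS}
    hits = difflib.get_close_matches(_norm(sheet_name), list(by_norm), n=1, cutoff=cutoff)
    return by_norm[hits[0]] if hits else None

print(match_tab("Other_Direct-Costs"))  # exact match after normalization
print(match_tab("Personel"))            # tolerates a small typo
```

A name-based match is only a first pass. When it returns None, the headings and category labels inside the sheet remain the better evidence, as the paragraph above notes.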

Outputs¶

A single JSON object matching #/$defs/input of the existing nsf-budget-justification-udm schema. Core fields:

project_title, project_summary, project_years
personnel[] with senior flag separating Section A from Section B personnel
budget_summary.categories.A..G year-indexed totals
indirect_cost with rate_percent, base_description, and optional off-campus / agreement citations
Optional detail objects: equipment_items[], travel_detail, participant_support_detail, subawards[], other_direct_costs_detail

Contract Scope¶

Delegated wrapper. This component does not define its own JSON schema; it emits the input contract owned by nsf-budget-justification-udm. The spreadsheet interpretation guidance is sponsor-specific and repo-local, not a shared AI4RA-UDM schema.

Triad Integration¶

Evaluation datasets: none yet; initial coverage is repo-local only.
Harness notes: invoke with workbook-derived evidence, then validate the emitted JSON object against components/nsf-budget-justification-udm/schema.json at #/$defs/input.
Shared UDM relationship: can feed UDM-backed proposal workflows, but the spreadsheet interpretation itself is prompt-library repo-local.
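
The harness step above (invoke, then validate at #/$defs/input) could start with two small helpers like the ones sketched below. This is an illustrative sketch using only the standard library: `parse_model_output` and `resolve_pointer` are hypothetical names, and the actual schema validation is left to whichever JSON Schema validator the harness already uses.

```python
import json
import re

def parse_model_output(text: str) -> dict:
    """Parse the emitted object, tolerating one surrounding json fenced block."""
    fenced = re.fullmatch(r"\s*`{3}(?:json)?\s*(.*?)\s*`{3}\s*", text, flags=re.DOTALL)
    payload = fenced.group(1) if fenced else text
    obj = json.loads(payload)
    if not isinstance(obj, dict):
        raise ValueError("expected a single JSON object")
    return obj

def resolve_pointer(schema: dict, pointer: str):
    """Resolve a local JSON Pointer fragment such as '#/$defs/input'."""
    node = schema
    for token in pointer.lstrip("#").strip("/").split("/"):
        if token:
            # JSON Pointer escapes: ~1 stands for '/', ~0 stands for '~'.
            node = node[token.replace("~1", "/").replace("~0", "~")]
    return node

# Toy schema for illustration only; the real one lives in
# components/nsf-budget-justification-udm/schema.json.
toy_schema = {"$defs": {"input": {"type": "object"}}}
print(resolve_pointer(toy_schema, "#/$defs/input"))
```

If the input definition refers to other entries under $defs, a validator will generally need the whole schema document as its resolution scope rather than the extracted subschema alone.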

Manifestations¶

prompt.md — canonical prompt

Evals¶

See evals/. No golden cases are checked in yet; future cases should pair a workbook-derived evidence fixture with an expected structured-budget JSON object.
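
One way a future golden case might be scored is by listing the JSON paths where the emitted object differs from the expected one. The sketch below is an assumption about how such a comparison could look; `diff_paths` is a hypothetical helper and no eval format has been fixed for this component.

```python
def diff_paths(expected, actual, path="$"):
    """Yield JSON paths where the actual object differs from the expected one."""
    if isinstance(expected, dict) and isinstance(actual, dict):
        for key in sorted(set(expected) | set(actual)):
            if key not in expected or key not in actual:
                yield f"{path}.{key}"  # key missing on one side
            else:
                yield from diff_paths(expected[key], actual[key], f"{path}.{key}")
    elif isinstance(expected, list) and isinstance(actual, list):
        if len(expected) != len(actual):
            yield path  # report the array itself on a length mismatch
        else:
            for i, (e, a) in enumerate(zip(expected, actual)):
                yield from diff_paths(e, a, f"{path}[{i}]")
    elif expected != actual:
        yield path

# Illustrative values only, not taken from any real workbook.
expected = {"project_years": 3, "budget_summary": {"categories": {"A": [1, 2, 3]}}}
actual = {"project_years": 3, "budget_summary": {"categories": {"A": [1, 2, 4]}}}
print(list(diff_paths(expected, actual)))
```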

Provenance¶

Extracted 2026-04-24 from the nsf-budget-spreadsheet-justification-udm one-shot prompt so that spreadsheet ingestion and narrative drafting could be separately versioned, parameterized, and evaluated in a multi-step budget-justification workflow.

Contract scope¶

Output format: json_object
Contract scope: delegated_repo_local_schema
Validation surfaces: json_schema_entrypoints
Schema entrypoints: #/$defs/input
Notes: Spreadsheet-to-structured-budget extractor. Emits the input object consumed by nsf-budget-justification-udm. The component does not define a new schema; it delegates validation to nsf-budget-justification-udm at #/$defs/input.
Machine-readable catalog entry: component_catalog.json

Triad integration¶

UDM alignment: delegated_repo_local_schema — Spreadsheet interpretation is sponsor-specific and repo-local; the structured-budget output aligns to the existing nsf-budget-justification-udm input contract, which is itself UDM-aligned but maintained in prompt-library.
Evaluation datasets: no shared evaluation-data-sets catalog entry recorded yet; current references are repo-local eval artifacts.
Harness notes: Invoke with workbook-derived evidence, then validate the emitted JSON object against components/nsf-budget-justification-udm/schema.json at #/$defs/input. Current coverage is repo-local and does not yet include a shared evaluation-data-sets fixture.
Related component: nsf-budget-justification-udm (delegates_output_contract) — Emits the structured budget object consumed by the drafting component.
Related component: nsf-budget-spreadsheet-justification-udm (sibling_multistep_vs_one_shot) — Same spreadsheet interpretation task family; this component is the ingest step of a multi-step workflow, while nsf-budget-spreadsheet-justification-udm handles the whole pipeline in one shot.

Prompt body¶

Source: prompt.md.

NSF Budget Spreadsheet Ingest — UDM¶

Purpose: Normalize an NSF-style proposal budget workbook into the structured input object consumed by nsf-budget-justification-udm.

Expected input: A workbook attachment, workbook-derived tables, CSV extracts, or a cell-referenced evidence package from an NSF-style proposal budget spreadsheet.

Expected output: A JSON object matching #/$defs/input in ../nsf-budget-justification-udm/schema.json. No prose, no markdown outside the JSON.

Prompt¶

You are a research-administration extraction engine with spreadsheet interpretation expertise. Given an NSF-style proposal budget workbook or extracted workbook evidence, emit a structured budget object that validates against #/$defs/input in ../nsf-budget-justification-udm/schema.json.

Output only the final JSON object — no preamble, no commentary, no markdown outside the JSON. If the runtime requires a fenced block, wrap the object in a single ```json ... ``` block and emit nothing else.

Input¶

The caller may provide one or more of:

A workbook attachment.
Extracted sheet tables or CSV snippets.
A cell-referenced evidence package containing sheet names, used ranges, visible values, formulas when available, and any user-supplied project context.
Optional supplemental notes such as project title, project summary, proposal period, solicitation-specific instructions, or institution-specific rate-agreement language.

When the workbook resembles a university NSF budget template, expect tabs such as Full Budget, Personnel, Travel, Other Direct Costs, Equipment, Subawards, Participant Support, Tuition, Fees, Insurance, and Rates. Sheet names are hints, not requirements. Infer equivalent sheets by headings when names differ.

Spreadsheet interpretation¶

Use workbook evidence in this order:

User-supplied project context and explicit instructions.
Workbook summary totals by NSF category and project year.
Detail tabs that explain line-item composition.
Rate-reference tabs for fringe, travel, and indirect-cost assumptions.
Formulas and cell references, when provided, to distinguish computed totals from typed inputs.

If summary totals and detail tabs conflict, use the summary totals for the by-year amounts in budget_summary.categories. Preserve the detail tabs as equipment_items, travel_detail, participant_support_detail, subawards, and other_direct_costs_detail exactly as the workbook presents them; do not hide material conflicts by silently reconciling them.
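
For callers that pre-process workbook evidence, a harness-side check can surface such conflicts without resolving them. This is a minimal sketch under assumptions: `find_conflicts` is a hypothetical helper, and both inputs are assumed to be plain per-year lists of dollar amounts.

```python
def find_conflicts(summary_by_year, detail_by_year, tolerance=0.0):
    """Return (year_number, summary, detail) for each year where the two disagree."""
    conflicts = []
    for year, (s, d) in enumerate(zip(summary_by_year, detail_by_year), start=1):
        if abs(s - d) > tolerance:
            conflicts.append((year, s, d))
    return conflicts

# Illustrative figures only: the summary sheet says 8000 in year 3,
# while the equipment detail tab adds up to 9500.
print(find_conflicts([12000, 0, 8000], [12000, 0, 9500]))
```

The function only reports the mismatch. Under the rule above, the summary figure still populates the category amount and the detail items are carried through unchanged.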

Determine project_years from the year columns that carry budget activity, not from blank template columns. If a template contains five year columns but only three have non-zero totals and the user did not state a five-year project, set project_years to 3.
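
The year-count rule can be sketched as follows. This is one possible reading, with assumptions labeled: `infer_project_years` is a hypothetical helper, blank cells are represented as None, and a zero year that sits between two active years is still counted so that later years keep their position.

```python
def infer_project_years(year_totals, stated_years=None):
    """Infer project_years from per-year totals, trimming trailing blank or zero columns."""
    if stated_years is not None:
        return stated_years  # an explicit user statement takes precedence
    active = [i for i, total in enumerate(year_totals, start=1) if total]
    # Assumption: use the last active year, so an inactive middle year is not dropped.
    return max(active) if active else 0

# Five template columns, three with activity (illustrative figures).
print(infer_project_years([100000, 120000, 90000, 0, None]))
print(infer_project_years([100000, 120000, 90000, 0, None], stated_years=5))
```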

Mapping to the structured budget¶

Populate the output object with these fields from the schema:

| Schema field | Workbook evidence to use |
| --- | --- |
| project_title | User-supplied title; null when absent. |
| project_summary | User-supplied one-paragraph summary; null when absent. |
| project_years | Count of active budget years. |
| personnel[] | Each senior-personnel row (set senior: true) and each other-personnel row (set senior: false). Include name, role, effort, base_salary. Add escalation_percent and funding_source only when the workbook supports them. |
| budget_summary.categories.A..G | Year-indexed totals matching the NSF section. A = senior personnel; B = other personnel; C = fringe; D = equipment; E = travel; F = participant support; G = other direct costs (materials, publication, consultants, computer services, subawards, and any line items not in A–F). |
| indirect_cost | rate_percent, base_description (MTDC or TDC with exclusions), off_campus_rate_percent when distinct, rate_agreement_citation only when cited, notes for rate changes mid-project. |
| equipment_items[] | Per-item detail with name, year, amount (≥ $5,000), justification_hint. Null or empty array when Section D is zero. |
| travel_detail | domestic_purposes[] and international_purposes[] trip blurbs. Null when only summary totals are available. |
| participant_support_detail | program_purpose plus line_items[] by category (stipend, travel, subsistence, fee, other). Null when Section F is zero. |
| subawards[] | For each subrecipient: institution, pi_name when stated, scope from the workbook, and amount_by_year. |
| other_direct_costs_detail | Free-text descriptions for materials_and_supplies, publication_costs, consultant_services, computer_services, and other as the workbook supports. |
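
To make the nesting of these fields easier to picture, the sketch below builds an empty skeleton with the top-level keys named in the table. It is an illustration under stated assumptions, not the schema: `empty_budget` is a hypothetical helper, the category values are assumed to be plain per-year arrays, and the skeleton is not expected to validate until populated. The schema at #/$defs/input remains authoritative.

```python
import json

def empty_budget(project_years: int) -> dict:
    """Build an unpopulated skeleton; the exact shapes are assumptions, see schema.json."""
    return {
        "project_title": None,
        "project_summary": None,
        "project_years": project_years,
        "personnel": [],
        # Assumed shape: one zero-filled array per NSF category A through G.
        "budget_summary": {
            "categories": {key: [0] * project_years for key in "ABCDEFG"}
        },
        "indirect_cost": {"rate_percent": None, "base_description": None},
        "equipment_items": [],
        "travel_detail": None,
        "participant_support_detail": None,
        "subawards": [],
        "other_direct_costs_detail": None,
    }

print(json.dumps(empty_budget(3), indent=2))
```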

Normalization rules¶

Treat PI, co-PI, faculty, and named key personnel in senior-personnel blocks as senior: true unless the user or workbook clearly indicates otherwise.
Treat technicians, postdocs, graduate students, undergraduate students, hourly staff, and TBN staff as senior: false unless the user explicitly identifies them as senior personnel.
Use academic_months, summer_months, or calendar_months as the effort unit for senior personnel when the workbook reports person-months; use percent for appointment-based staff.
Use dollars exactly as supported by workbook evidence. Do not introduce cents unless the workbook uses cents.
If a formula-driven category total is blank but detail lines are non-zero, sum the detail lines and populate the category year amount from the sum.
Do not fabricate names, scope statements, destinations, or rate-agreement citations. When a detail field is unsupported by workbook evidence, set it to null (or omit for optional fields).
Every array of year-indexed amounts must have length equal to project_years. Pad with zeros for categories that are empty in some years.
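
Two of the rules above (zero-padding year arrays, and summing detail lines when a formula-driven total is blank) are mechanical enough to sketch in code. The helper names `pad_years` and `fill_blank_total` are hypothetical, blank cells are assumed to arrive as None, and raising on non-zero amounts past project_years is a design choice of this sketch rather than something the prompt specifies.

```python
def pad_years(amounts, project_years):
    """Pad a per-year list with zeros up to project_years, treating blanks as zero."""
    amounts = [a or 0 for a in amounts]
    if any(amounts[project_years:]):
        # Assumption: refuse to silently drop money beyond the declared project length.
        raise ValueError("non-zero amount beyond project_years")
    return (amounts + [0] * project_years)[:project_years]

def fill_blank_total(summary_total, detail_lines):
    """Keep a present summary total; fall back to the detail sum only when it is blank."""
    if summary_total is None:
        return sum(detail_lines)
    return summary_total

# Illustrative figures only.
print(pad_years([5000], 3))
print(fill_blank_total(None, [1200, 800]))
```

Note that `fill_blank_total` never overrides a summary total that is present, which keeps it consistent with the summary-first conflict rule.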

Quality standards¶

No fabricated numbers. Every dollar figure and percentage traces to workbook evidence.
Lengths match project_years. All yearAmounts and amount_by_year arrays have the declared length.
Schema conformance. Output validates against #/$defs/input in ../nsf-budget-justification-udm/schema.json.
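Preserve uncertainty. Missing descriptions become null optional fields, and summary/detail conflicts are preserved as the workbook presents them rather than silently reconciled. The one sanctioned fill-in is summing detail lines when a formula-driven category total is blank.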
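
The length standard lends itself to a quick post-check before full schema validation. The sketch below assumes that each category value and each amount_by_year is a plain array; `length_errors` is a hypothetical helper and the sample object is illustrative.

```python
def length_errors(budget: dict) -> list:
    """Return a message for every year-indexed array whose length differs from project_years."""
    n = budget["project_years"]
    errors = []
    for key, amounts in budget["budget_summary"]["categories"].items():
        if len(amounts) != n:
            errors.append(f"budget_summary.categories.{key}: length {len(amounts)}, expected {n}")
    for i, sub in enumerate(budget.get("subawards") or []):
        if len(sub["amount_by_year"]) != n:
            errors.append(f"subawards[{i}].amount_by_year: length {len(sub['amount_by_year'])}, expected {n}")
    return errors

sample = {
    "project_years": 3,
    "budget_summary": {"categories": {"A": [1, 2, 3], "E": [1, 2]}},
    "subawards": [{"institution": "Example University", "amount_by_year": [0, 0, 0, 0]}],
}
for message in length_errors(sample):
    print(message)
```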

Produce the JSON object now.

Changelog¶

Source: CHANGELOG.md.

All notable changes to this component. Versions follow semver: MAJOR for output-contract breaks, MINOR for backward-compatible additions, PATCH for wording or clarity.

[1.0.0] — 2026-04-24¶

Initial version. Extracted from the one-shot nsf-budget-spreadsheet-justification-udm prompt to expose the spreadsheet-interpretation half as an independently versioned component.