
nsf-award-notice-extraction-udm

Slug: nsf-award-notice-extraction-udm
Version: 1.1.0
Status: stable
Last fully evaluated: 1.0.0
Eval state: behind (1.0.0 validated)
Category: extraction
Domain: research-administration
Manifestations: prompt, skill
Created: 2026-04-18
Updated: 2026-04-20

Tags: nsf award notice noa amendment udm structured-extraction json

Audience: ingest-pipelines, sponsored-programs-staff

Manifestations in repo: prompt.md · skill/SKILL.md

Extracts an NSF Award Notice (initial obligation or amendment) into a single JSON object conforming to this repo's UDM-aligned extraction contract. Designed for the NSF notice format that arrives as a PDF printed from Outlook after OSP receipt — the email header, boxed notice body, and NSF-format 18-category budget table.

Output contract: see schema.json

Inputs

A PDF (or pasted text) of an NSF Award Notice. Typical structure:

  • Email header (Subject, Date, From, To, CC)
  • "NATIONAL SCIENCE FOUNDATION — Award Notice" boxed header with Award Number (FAIN) and Amendment Number
  • Recipient Information (legal name, address, email, UEI)
  • Amendment Information (type, date, number, proposal number, narrative description)
  • Award Information (instrument, dates, title, managing division, R&D flag, funding opportunity, assistance listing)
  • Funding Information (amount obligated, total intended, cost share, total to date, expenditure limitation)
  • Project Personnel (PI and Co-PIs, each with email and organization)
  • NSF Contact Information (managing grants official, awarding official, program officer)
  • General Terms and Conditions (citations to the NSF Act, RTCs, Agency Specific Requirements, PAPPG)
  • Budget table in NSF-format (categories A–M plus subcategories)
  • Indirect Cost Rate table (rate + base)

Outputs

A single JSON object with:

  • Identity scalars — award number, title, sponsor, managing division, instrument, amendment info, funding opportunity, CFDA
  • Date and funding scalars — period of performance, obligations, cost share, indirect cost rate and base, fees
  • Nested objects: recipient_organization, current_budget_period, optional source_provenance
  • Seven arrays (always present; [] when empty): project_personnel, sponsor_contacts, budget_categories, subawards, linked_awards, terms_and_conditions, special_conditions

See schema.json for the authoritative definition and prompt.md for the extraction rules the prompt follows. This schema is repo-local and UDM-aligned; it is not a copy of the canonical shared UDM repo schema.

Amendment 000 vs. subsequent amendments

The same schema covers both. amendment_number == "000" indicates an initial obligation (the ingest service creates an Award record); other values indicate modifications (the ingest service creates an AwardModification record linked to the Award named by award_number).
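On the ingest side, this rule reduces to a string comparison. The sketch below assumes the model reply has already been parsed into a Python dict; route_notice and its (record_type, parent_award_number) return value are hypothetical names, not part of this component's contract.

```python
from typing import Optional, Tuple


def route_notice(extraction: dict) -> Tuple[str, Optional[str]]:
    """Return (record_type, parent_award_number) for a parsed extraction.

    Hypothetical ingest helper. amendment_number is compared as a string,
    exactly as the schema stores it, so "000" is the only initial obligation.
    """
    if extraction["amendment_number"] == "000":
        # Initial obligation: a new Award record with no parent to link.
        return ("Award", None)
    # Any other amendment modifies the Award named by award_number.
    return ("AwardModification", extraction["award_number"])
```

For example, an extraction with amendment_number "001" and award_number "2427549" routes to ("AwardModification", "2427549"). Comparing strings rather than casting to int keeps "000" as the only initial-obligation value, matching the schema's string type.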

Subaward inference

NSF notices rarely itemize subrecipients. The extractor infers a subaward entry when a Co-PI's organization differs from the recipient AND the budget shows a non-zero Subawards line in category G; inferred entries carry inferred: true and leave amounts null. Explicit subrecipient enumerations (uncommon but possible in amendments) are emitted with inferred: false and all provided fields populated. See prompt.md for the precise rule.

Collaborative Research

Detected heuristically from the title prefix "Collaborative Research:" and/or explicit narrative. Sibling FAINs, when named in the notice, go in linked_awards with relationship: "collaborative_sibling".
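A minimal sketch of that heuristic, assuming the title has already been extracted. The narrative signal is passed in as a boolean because judging it is left to the extractor, and the award_number key in each linked_awards entry is an assumed field name (the linked_awards item schema is not shown on this page); only relationship: "collaborative_sibling" comes from this document.

```python
def is_collaborative(title: str, narrative_flags_collaboration: bool = False) -> bool:
    """True for the documented title prefix or an explicit narrative signal.

    narrative_flags_collaboration is a hypothetical input supplied by the caller.
    """
    return title.lstrip().startswith("Collaborative Research:") or narrative_flags_collaboration


def sibling_links(named_fains: list) -> list:
    """linked_awards entries for sibling FAINs that the notice explicitly names.

    The "award_number" key is an assumed field name; FAINs are never guessed,
    so an empty input yields an empty list.
    """
    return [{"award_number": fain, "relationship": "collaborative_sibling"} for fain in named_fains]
```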

Manifestations

Schema

schema.json is a JSON Schema (draft 2020-12) defining the full output contract. Any conforming draft 2020-12 validator can check output against it; downstream ingest pipelines should gate on this schema.

Relationship to other components

| Concern | rfp-extraction-udm | nsf-award-notice-extraction-udm |
|---|---|---|
| Input | Funding announcement (pre-award) | Award notice (post-award) |
| Ingest target | RFA record | Award or AwardModification |
| Requirements | Nine categorized requirement arrays | Budget categories, terms, special conditions, etc. |
| Multi-round | Handled in special_conditions | Not applicable (one notice per amendment) |

Evals

See evals/ for reference inputs and known-good outputs. The initial case is FAIN 2427549 — a Standard Grant new-project notice to a single prime institution with an implied subaward to a Co-PI at a different institution, exercising the subaward inference rule.

Provenance

Schema designed 2026-04-18 from the NSF Award Notice template as exemplified by FAIN 2427549 (Amendment 000, September 2024).

Contract scope

  • Output format: json_object

  • Contract scope: shared_udm_semantics_repo_local_schema

  • Validation surfaces: json_schema, golden_eval_cases, published_eval_reports

  • Schema entrypoints: #

  • Notes: Repo-local extraction schema aligned to shared AI4RA-UDM award and modification semantics. This schema is not itself the canonical shared UDM repository contract.

  • Machine-readable catalog entry: component_catalog.json

Triad integration

  • UDM alignment: shared_udm_semantics_repo_local_schema — Field meanings track shared UDM award concepts, but the JSON extraction surface and several convenience fields are defined locally in this repo.

  • Evaluation dataset: real.nsf_awards (aligned_ground_truth_dataset) — The evaluation-data-sets repo carries authorized, de-identified NSF award extractions in the same task family and is the main external dataset counterpart for this component.

  • Harness notes: Prefer prompt.md as the canonical prompt manifestation, validate output against schema.json, and pair evaluations with real.nsf_awards when the harness needs de-identified external ground truth.

  • Related component: vandalizer-to-udm-translation (owns_target_schema) — The translator delegates its output contract to this schema.

Prompt body

Source: prompt.md.


NSF Award Notice Structured Extraction — UDM JSON

Purpose: Extract an NSF Award Notice (or subsequent amendment notice) into a single JSON object conforming to this component's repo-local, UDM-aligned schema. The ingest service decides whether the output creates an Award record or an AwardModification record based on amendment_number.

Expected input: A PDF-printed NSF Award Notice email. These documents are typically printed from Outlook after receipt and contain email headers, a boxed narrative header, recipient information, amendment information, award information, funding information, project personnel, NSF contact information, general terms and conditions citations, and an NSF-format budget table (categories A–M).

Expected output: A single JSON object that validates against schema.json. No prose, no markdown outside the JSON.


Prompt

You are a structured-data extraction engine for research administration. Read the provided NSF Award Notice and produce a single JSON object capturing every structured field the notice contains. The output must be suitable for direct ingest into a UDM-conformant data store.

Output contract

Emit one JSON object — nothing else. No preamble, no closing commentary, no markdown outside the JSON. If the runtime requires a fenced block, wrap the object in a single ```json ... ``` block and emit nothing else.
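Ingest code consuming this contract has to tolerate both allowed reply shapes. A sketch, assuming Python on the ingest side; parse_extraction is a hypothetical helper that accepts a bare object or exactly one fenced json block and rejects anything else, including surrounding prose.

```python
import json
import re

TICKS = "`" * 3  # built this way so the literal cannot close a Markdown fence
# A reply that is exactly one fenced block, optionally tagged "json".
_FENCED = re.compile(r"^\s*" + TICKS + r"(?:json)?\s*\n(.*)\n\s*" + TICKS + r"\s*$", re.DOTALL)


def parse_extraction(reply: str) -> dict:
    """Parse a model reply under the output contract (hypothetical helper).

    Prose around the object makes json.loads fail; json.JSONDecodeError
    is a subclass of ValueError, so callers can catch ValueError for both.
    """
    match = _FENCED.match(reply)
    body = match.group(1) if match else reply
    obj = json.loads(body)
    if not isinstance(obj, dict):
        raise ValueError("expected a single JSON object")
    return obj
```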

The object has four layers:

  1. Identity and classification scalars (award_number, amendment_number, titles, sponsor, instrument, division, etc.)

  2. Dates and funding scalars (period of performance, obligations, cost share, expenditure limitation, indirect cost rate and base)

  3. Nested objects (recipient_organization, current_budget_period)

  4. Seven required arrays, each present even when empty: project_personnel, sponsor_contacts, budget_categories, subawards, linked_awards, terms_and_conditions, special_conditions. Emit [] rather than null when a category has no entries.

Amendment 000 vs. subsequent amendments

  • amendment_number == "000" means this notice establishes the award (initial obligation). The ingest service will create an Award record from this output.

  • amendment_number other than "000" means this notice amends an existing award. The ingest service will create an AwardModification record linked to the Award identified by award_number.

The same schema applies in both cases. Do not emit different shapes for initial vs. amendment notices.

Scalar field rules

  • award_id: "NSF-<award_number>" for NSF notices (e.g., "NSF-2427549"). Null when a canonical identifier cannot be formed.

  • award_number — the FAIN exactly as stated (e.g., "2427549"). Required.

  • award_title — full project title as stated. Required.

  • sponsor_name — always "National Science Foundation" for NSF notices. Do not abbreviate.

  • managing_division — verbatim division code from the notice (e.g., "OIA", "BIO", "ENG").

  • award_instrument — verbatim (e.g., "Standard Grant", "Continuing Grant", "Cooperative Agreement", "Fellowship", "IPA").

  • is_research_and_development — boolean from the "Research and Development Award" field. Null when the field is absent.

  • is_collaborative_research — true when the title begins "Collaborative Research:" or the notice explicitly indicates a collaborative configuration. Sibling FAINs (if named in the notice) go in linked_awards.

  • funding_opportunity_number / funding_opportunity_title — split the "Funding Opportunity" field (e.g., "PD 23-221Y Growing Research Access..." → number "PD 23-221Y", title "Growing Research Access...").

  • cfda_number / cfda_name — split the "Assistance Listing" field. Trailing parenthetical annotations like "(Predominant source of funding for SEFA reporting)" belong in cfda_name.

  • amendment_number — verbatim (e.g., "000", "001"). Required.

  • amendment_type — verbatim (e.g., "New Project", "Administrative", "No-Cost Extension", "Supplemental").

  • amendment_description — the full narrative text block that appears under "Amendment Description". Preserve verbatim. Also emit any obligations or restrictions embedded in this block as entries in special_conditions.

  • Dates — ISO YYYY-MM-DD. Convert US-format dates (09/10/2024) to ISO (2024-09-10).

  • Currency: plain numbers. Strip $, commas, and whitespace ($4,546,903 → 4546903).

  • cost_share_approved_amount — emit 0 (not null) when the notice explicitly states no cost share is approved.

  • indirect_cost_rate_percent: plain number without % (38.0000% → 38.0). When the notice lists multiple rates (tiered), emit the primary rate and document additional rates in special_conditions.

  • indirect_cost_base — map stated base to the schema enum: "Modified Total Direct Costs" → "MTDC"; "Total Direct Costs" → "TDC"; "Total Federal Funds Awarded" → "TFFA"; "Salaries and Wages" → "SWB". Otherwise "Other".

  • fees — USD fees line when the NSF budget table prints a distinct "Fees" row between category J and the Amount of this Request (category L). Plain number. Emit 0 when explicitly printed as zero; null when the row is absent from the notice. Do NOT also emit this value as a budget_categories entry (the UDM budget code enum is A–M only).

  • award_received_date — ISO date the recipient received the notice, taken from the email header (Date: line). Use the date in the email's destination timezone. Null when absent.
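Several of these rules are mechanical normalizations that an ingest pipeline can re-check after extraction. The helpers below are a hypothetical sketch, not part of the prompt: iso_date assumes US month/day/year input, base_code matches only the four stated labels (case- and whitespace-insensitively), and the funding-opportunity pattern is an assumed shape generalized from the two examples in this document ("PD 23-221Y", "NSF 26-508"), not an NSF specification. Timezone handling for award_received_date is not covered.

```python
import re
from datetime import datetime
from typing import Optional, Tuple, Union


def iso_date(us_date: str) -> str:
    """'09/10/2024' -> '2024-09-10'. Assumes US month/day/year input."""
    return datetime.strptime(us_date.strip(), "%m/%d/%Y").date().isoformat()


def plain_currency(text: str) -> Union[int, float]:
    """'$4,546,903' -> 4546903. Strips $, commas, and whitespace."""
    value = float(re.sub(r"[$,\s]", "", text))
    # Keep whole-dollar amounts as integers so they serialize without ".0".
    return int(value) if value.is_integer() else value


def rate_percent(text: str) -> float:
    """'38.0000%' -> 38.0."""
    return float(text.strip().rstrip("%"))


_BASES = {
    "modified total direct costs": "MTDC",
    "total direct costs": "TDC",
    "total federal funds awarded": "TFFA",
    "salaries and wages": "SWB",
}


def base_code(stated: str) -> str:
    """Map the stated base onto the schema enum; anything unlisted is "Other"."""
    return _BASES.get(" ".join(stated.lower().split()), "Other")


# Assumed identifier shape: letters, space, two digits, hyphen, three digits,
# optional suffix letter, then the program title.
_OPPORTUNITY = re.compile(r"^([A-Z]+ \d{2}-\d{3}[A-Z]?)\s+(.+)$")


def split_funding_opportunity(text: str) -> Tuple[Optional[str], str]:
    """Split "Funding Opportunity" into (number, title); number is None if unrecognized."""
    match = _OPPORTUNITY.match(text.strip())
    return (match.group(1), match.group(2)) if match else (None, text.strip())
```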

Recipient organization

Populate recipient_organization from the "RECIPIENT INFORMATION" section:

  • legal_name — from "Recipient (Legal Business Name)".

  • address — from "Recipient Address" (preserve the whole single-line address).

  • email — from "Official Recipient Email Address".

  • uei — from "Unique Entity Identifier (UEI)".

Do not attempt to resolve the recipient to an existing organization record. The ingest service handles that.

Current budget period

Populate current_budget_period with the period covered by this specific obligation:

  • For a Standard Grant the whole Award Period of Performance is one period: start_date, end_date, obligated_amount = amount_obligated_this_amendment, direct_cost = H, indirect_cost = I, period_number = 1, period_label = null.

  • For a Continuing Grant the period is typically one year of the multi-year project; use the dates the notice names for this obligation. If the notice does not name per-period dates distinct from the overall period of performance, reuse the overall dates.
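For the Standard Grant case the mapping is a direct copy of stated values. A sketch, assuming budget_categories has already been extracted in the shape described under Budget categories; standard_grant_period is a hypothetical helper, and it reads the printed H and I lines rather than recomputing them.

```python
def standard_grant_period(extraction: dict, budget_categories: list) -> dict:
    """current_budget_period for a Standard Grant (one period = whole award).

    direct_cost and indirect_cost come from the stated H and I lines; either
    is None when that line is not printed in the notice.
    """
    stated = {
        entry["code"]: entry.get("amount")
        for entry in budget_categories
        if entry.get("code") in ("H", "I") and entry.get("subcode") is None
    }
    return {
        "period_number": 1,
        "period_label": None,
        "start_date": extraction["start_date"],
        "end_date": extraction["end_date"],
        "obligated_amount": extraction["amount_obligated_this_amendment"],
        "direct_cost": stated.get("H"),
        "indirect_cost": stated.get("I"),
    }
```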

Project personnel

List every person under "PROJECT PERSONNEL" in the order they appear. For each:

  • role: the role label, renamed "Principal Investigator" → "PI" and "co-Principal Investigator" → "co-PI"; keep all other labels verbatim.

  • name, email, organization — verbatim.

  • is_at_recipient_institution — true when the person's organization matches recipient_organization.legal_name (case-insensitive, ignoring punctuation). This flag drives subaward inference; always populate it.
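The role renames and the organization match can be sketched as follows. Only the two renames and the "case-insensitive, ignoring punctuation" comparison come from the rules above; the helper names are hypothetical, and collapsing repeated whitespace is an added assumption.

```python
import string

_ROLE_LABELS = {
    "Principal Investigator": "PI",
    "co-Principal Investigator": "co-PI",
}
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def normalize_role(label: str) -> str:
    """Apply the two documented renames; keep every other label as stated."""
    return _ROLE_LABELS.get(label.strip(), label.strip())


def _org_key(name: str) -> str:
    # Case-insensitive and punctuation-free; whitespace collapsing is an assumption.
    return " ".join(name.translate(_STRIP_PUNCT).casefold().split())


def is_at_recipient(organization: str, recipient_legal_name: str) -> bool:
    """Value for is_at_recipient_institution."""
    return _org_key(organization) == _org_key(recipient_legal_name)
```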

Sponsor contacts

List everyone under "NSF CONTACT INFORMATION" in sponsor_contacts, with their stated role. For each:

  • role — verbatim (e.g., "Managing Grants Official", "Awarding Official", "Managing Program Officer").

  • name, email, phone — verbatim. Null when a field is not printed.

Budget categories

Emit one entry per line item actually printed in the NSF-format budget table. Include stated totals (H, J, L) as their own entries — do not recompute. Do not emit lines that are absent from the table.

For each entry:

  • code — the top-level letter A–M. Required.

  • subcode — short identifier for subcategories, using these conventions:

  • B sub-types: "PostDoctoral", "OtherProfessionals", "GraduateStudents", "UndergraduateStudents", "SecretarialClerical", "Other"

  • E sub-types: "Domestic", "International" (or "Foreign" if the notice uses that term)

  • F sub-types: "Stipends", "Travel", "Subsistence", "Other", "TotalParticipants" (for the participant count line), "Total" (for the Total Participant Costs line)

  • G sub-types: "MaterialsSupplies", "Publication", "ConsultantServices", "ComputerServices", "Subawards", "Other", "Total"

  • H / I / J / L / M: null subcode (these are single-line totals)

  • A / C / D / K: null subcode unless the notice itemizes further

  • label — the line's printed label, verbatim.

  • amount — plain number. Null for count-only lines (e.g., "Total Number of Participants").

  • count, calendar_months, academic_months, summer_months — fill only the fields the line prints.
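A lightweight checker for these conventions, written as a hypothetical post-extraction lint rather than part of the prompt. The allowed sets are transcribed from the list above; the check that the participant-count line carries a null amount follows the amount rule.

```python
SUBCODES = {
    "B": {"PostDoctoral", "OtherProfessionals", "GraduateStudents",
          "UndergraduateStudents", "SecretarialClerical", "Other"},
    "E": {"Domestic", "International", "Foreign"},
    "F": {"Stipends", "Travel", "Subsistence", "Other", "TotalParticipants", "Total"},
    "G": {"MaterialsSupplies", "Publication", "ConsultantServices",
          "ComputerServices", "Subawards", "Other", "Total"},
}
SINGLE_LINE_TOTALS = {"H", "I", "J", "L", "M"}
UNITEMIZED_BY_DEFAULT = {"A", "C", "D", "K"}  # subcode allowed only if the notice itemizes


def check_budget_entry(entry: dict) -> list:
    """Return convention violations for one budget_categories entry (empty if none)."""
    code, subcode = entry.get("code"), entry.get("subcode")
    if code not in SUBCODES and code not in SINGLE_LINE_TOTALS and code not in UNITEMIZED_BY_DEFAULT:
        return [f"code {code!r} is not a letter A-M"]
    if code in SINGLE_LINE_TOTALS and subcode is not None:
        return [f"{code} is a single-line total; subcode must be null"]
    if code in SUBCODES and subcode is not None and subcode not in SUBCODES[code]:
        return [f"subcode {subcode!r} is not a listed {code} sub-type"]
    if subcode == "TotalParticipants" and entry.get("amount") is not None:
        return ["participant count line should have a null amount"]
    return []
```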

Subawards

NSF notices rarely enumerate subrecipients with names and allocations. Produce subaward entries using two mechanisms:

  1. Explicit entries. When the notice names a subrecipient (some amendments do), emit an entry with inferred: false and fill every field the notice provides.

  2. Inferred entries. When BOTH of these conditions hold:

    • at least one Co-PI's is_at_recipient_institution is false, AND

    • the G.Subawards budget line has a non-zero amount

Emit one inferred entry per Co-PI at a non-recipient organization, with:

  • subawardee_name: the Co-PI's organization

  • pi_name, pi_email: the Co-PI's name and email

  • description: a note explaining the basis (e.g., "Implied subaward based on Co-PI <name> at <organization>. Aggregate Subawards line in Budget Category G totals $<amount>; individual subawardee allocation is not broken out in the notice.")

  • obligated_amount, anticipated_amount, uei: null

  • inferred: true

If a Co-PI is at the recipient institution, do not infer a subaward from their presence. If multiple Co-PIs share the same non-recipient organization, emit one inferred entry per Co-PI (downstream can dedupe).
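Applied to already-extracted personnel and budget entries, the inference rule can be sketched as below. Assumptions, all hypothetical: roles have been normalized to "PI"/"co-PI" as described under Project personnel, an unknown (null) is_at_recipient_institution does not trigger inference, and the dollar formatting in the description (whole dollars with thousands separators) is a choice made here, since the template only says $<amount>.

```python
def infer_subawards(personnel: list, budget_categories: list) -> list:
    """Inferred subaward entries per the Co-PI / G.Subawards rule."""
    g_subawards = next(
        (e.get("amount") for e in budget_categories
         if e.get("code") == "G" and e.get("subcode") == "Subawards"),
        None,
    )
    if not g_subawards:  # line absent, null, or zero: the rule does not fire
        return []
    inferred = []
    for person in personnel:
        is_co_pi = person.get("role", "").lower() == "co-pi"
        # Only an explicit False counts; None (unknown) does not trigger inference.
        if not is_co_pi or person.get("is_at_recipient_institution") is not False:
            continue
        org = person.get("organization")
        inferred.append({
            "subawardee_name": org,
            "pi_name": person["name"],
            "pi_email": person.get("email"),
            "description": (
                f"Implied subaward based on Co-PI {person['name']} at {org}. "
                f"Aggregate Subawards line in Budget Category G totals ${g_subawards:,.0f}; "
                "individual subawardee allocation is not broken out in the notice."
            ),
            "obligated_amount": None,
            "anticipated_amount": None,
            "uei": None,
            "inferred": True,
        })
    return inferred
```

One entry is emitted per qualifying Co-PI, so two Co-PIs at the same partner institution yield two entries, leaving deduplication downstream as the rule specifies.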

Linked awards

Populate only when the notice explicitly mentions a related award (parent, sibling, predecessor, companion, supplement). For Collaborative Research, sibling FAINs may be listed in the narrative or in the title. Do not guess.

Terms and conditions

One entry per distinct authority cited in the notice. Typical citations on NSF notices:

  • The National Science Foundation Act of 1950 (42 U.S.C. 1861-75)

  • Research Terms and Conditions (RTCs), with a dated version

  • NSF Agency Specific Requirements, with a dated version

  • NSF Proposal & Award Policies & Procedures Guide (PAPPG), usually cited by chapter for a specific purpose

For each entry:

  • citation — the name of the authority (do not include the date or URL in this field).

  • citation_date — ISO date when stated.

  • url — any URL printed alongside the citation.

  • applicability_notes — any scoping note (e.g., "Chapter X.A.3.a cited for rebudgeting authority" or "42 U.S.C. 1861-75. Primary statutory authority for this award.").

Special conditions

Walk amendment_description and any other narrative section and emit one entry per distinct condition. NSF boilerplate that commonly appears:

  • Participant support cost segregation — category participant_support, action_required true.

  • Subaward authorization — category subaward, action_required true.

  • Cost-sharing verification — category budget when the notice names a cost-share amount.

  • Data Management and Sharing plan compliance — category data_sharing when cited.

  • Program-specific reporting requirements — category reporting.

For each entry:

  • label — short human-readable name.

  • code — SCREAMING_SNAKE_CASE deterministic code (e.g., "PARTICIPANT_SUPPORT_SEGREGATION", "SUBAWARD_AUTHORIZED"). Omit when in doubt.

  • description — verbatim condition text from the notice. Preserve prescribed phrases.

  • category — from the schema enum.

  • action_required: true when the condition requires the recipient to take an action (written policies, separate ledgers, prior approval, submission, etc.).

  • source_section — the section of the notice from which the condition was extracted (e.g., "Amendment Description", "General Terms and Conditions").

Source provenance

source_provenance is optional. When the runtime supplies a source document identifier (filename, URI, or hash) or expects an audit trail, populate:

  • extractor: "nsf-award-notice-extraction-udm"

  • extractor_version: this prompt's version ("1.1.0")

  • source_document: the identifier the runtime provided (do not invent one)

  • extracted_at: the current timestamp, when the runtime provides a clock

  • review_annotations: [] unless the runtime hands in reviewer signals to carry through

Omit the object (emit null) when no provenance information is available. Do not fabricate extractor metadata.
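A sketch of the "do not fabricate" behavior, assuming the runtime passes in the document identifier and, optionally, a clock reading. build_provenance is hypothetical, it simplifies the trigger to "a source identifier was supplied", and ISO 8601 via datetime.isoformat() is an assumed serialization for extracted_at.

```python
from datetime import datetime
from typing import Optional


def build_provenance(source_document: Optional[str],
                     now: Optional[datetime] = None) -> Optional[dict]:
    """Return source_provenance, or None when the runtime supplied no identifier."""
    if source_document is None:
        return None
    provenance = {
        "extractor": "nsf-award-notice-extraction-udm",
        "extractor_version": "1.1.0",
        "source_document": source_document,  # as provided, never invented
        "review_annotations": [],
    }
    if now is not None:
        # Only stamp a time when the caller actually has a clock.
        provenance["extracted_at"] = now.isoformat()
    return provenance
```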

Extraction procedure

  1. Read the entire notice: the email header, header box, each labeled section, budget table, and all footer narrative. Printed-to-PDF email copies often have email header text above the notice body; that metadata is relevant only for award_received_date.

  2. Populate scalars — identity fields first, then dates, then funding, then indirect cost.

  3. Populate recipient_organization and current_budget_period.

  4. Walk the personnel and contact sections to fill project_personnel and sponsor_contacts.

  5. Walk the budget table top to bottom, emitting one entry per printed line.

  6. Apply subaward inference using the Co-PI / G.Subawards rule.

  7. Scan narrative sections for terms, citations, and special conditions.

  8. Check required array presence — every array must be present, even if empty.
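Step 8 and quality standards 3 and 4 can be pre-checked cheaply before full validation against schema.json. This hypothetical precheck covers only the top-level required fields and arrays listed in the schema, plus a heuristic for JSON literals emitted as strings ("true", "false", "null"); it is not a substitute for a draft 2020-12 validator.

```python
REQUIRED_STRINGS = ("award_number", "award_title", "sponsor_name", "amendment_number")
REQUIRED_ARRAYS = (
    "project_personnel", "sponsor_contacts", "budget_categories", "subawards",
    "linked_awards", "terms_and_conditions", "special_conditions",
)
STRING_STAND_INS = ("true", "false", "null")


def precheck(extraction: dict) -> list:
    """Cheap structural checks; returns human-readable problems (empty if none)."""
    problems = []
    for key in REQUIRED_STRINGS:
        value = extraction.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"{key}: required non-empty string")
    if not isinstance(extraction.get("recipient_organization"), dict):
        problems.append("recipient_organization: required object")
    for key in REQUIRED_ARRAYS:
        if not isinstance(extraction.get(key), list):
            problems.append(f"{key}: required array, emit [] when empty")
    # Heuristic for typed fidelity: JSON literals smuggled in as strings.
    for key, value in extraction.items():
        if isinstance(value, str) and value in STRING_STAND_INS:
            problems.append(f"{key}: string {value!r} where a JSON literal is expected")
    return problems
```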

Quality standards

  1. Completeness — every printed value appears in the output.

  2. Precision — use the notice's exact wording for prescribed text; numbers preserved verbatim after format normalization (ISO dates, plain currency).

  3. Typed fidelity — JSON types, not strings. 0, not "0". false, not "false". null, not "null".

  4. Schema conformance — the output validates against schema.json. All required arrays present.

  5. No fabrication — do not infer values the notice does not state, except for the specifically-defined subaward inference rule.

  6. Preserve anomalies — if a field is misspelled, misaligned, or internally inconsistent in the notice, preserve the value as stated and add a note to the appropriate entry's description (for list items) or leave the scalar as stated.

Produce the JSON now.

Output schema

Source: schema.json.

{

  "$schema": "https://json-schema.org/draft/2020-12/schema",

  "$id": "https://github.com/AI4RA/prompt-library/components/nsf-award-notice-extraction-udm/schema.json",

  "title": "NSF Award Notice Extraction \u2014 UDM Output",

  "description": "Flat JSON contract for an NSF Award Notice (or amendment notice) extracted to the Unified Data Model as extended for research administration. One JSON object per notice. Amendment 000 represents the initial award; subsequent amendment numbers represent modifications to an existing award. The ingest service decides whether to create an Award record or an AwardModification record based on amendment_number.",

  "version": "1.1.0",

  "type": "object",

  "additionalProperties": false,

  "required": [

    "award_number",

    "award_title",

    "sponsor_name",

    "amendment_number",

    "recipient_organization",

    "project_personnel",

    "sponsor_contacts",

    "budget_categories",

    "subawards",

    "linked_awards",

    "terms_and_conditions",

    "special_conditions"

  ],

  "properties": {

    "award_id": {

      "type": [

        "string",

        "null"

      ],

      "description": "Stable identifier for the award in the form '<SPONSOR_CODE>-<AWARD_NUMBER>' (e.g., 'NSF-2427549'). Null when no canonical form is appropriate; the ingest service may generate one."

    },

    "award_number": {

      "type": "string",

      "minLength": 1,

      "description": "Federal Award Identification Number (FAIN) as published by the sponsor. For NSF, the numeric award ID shown as 'Award Number (FAIN)'."

    },

    "sponsor_award_number": {

      "type": [

        "string",

        "null"

      ],

      "description": "Alternate sponsor-side award identifier when distinct from award_number."

    },

    "award_title": {

      "type": "string",

      "minLength": 1,

      "description": "Full project title as stated in the notice."

    },

    "sponsor_name": {

      "type": "string",

      "minLength": 1,

      "description": "Full name of the sponsoring agency (e.g., 'National Science Foundation'). Do not abbreviate."

    },

    "managing_division": {

      "type": [

        "string",

        "null"

      ],

      "description": "Sponsor sub-division or directorate that owns the award (e.g., 'OIA', 'BIO', 'ENG'). Verbatim from the notice."

    },

    "award_instrument": {

      "type": [

        "string",

        "null"

      ],

      "description": "Award instrument type (e.g., 'Standard Grant', 'Continuing Grant', 'Cooperative Agreement', 'Fellowship', 'IPA'). Verbatim from the notice."

    },

    "award_status": {

      "type": [

        "string",

        "null"

      ],

      "description": "Lifecycle state at the time of this notice. Default 'Active' for initial obligations unless the notice explicitly indicates otherwise."

    },

    "is_research_and_development": {

      "type": [

        "boolean",

        "null"

      ],

      "description": "True when the notice explicitly flags the award as an R&D Award. Null when the flag is not present."

    },

    "is_collaborative_research": {

      "type": "boolean",

      "description": "True when the award title begins with 'Collaborative Research:' or the notice explicitly indicates a Collaborative Research configuration. Sibling awards are captured in linked_awards."

    },

    "funding_opportunity_number": {

      "type": [

        "string",

        "null"

      ],

      "description": "Program announcement / solicitation identifier referenced by the notice (e.g., 'PD 23-221Y', 'NSF 26-508')."

    },

    "funding_opportunity_title": {

      "type": [

        "string",

        "null"

      ],

      "description": "Program name accompanying funding_opportunity_number."

    },

    "cfda_number": {

      "type": [

        "string",

        "null"

      ],

      "description": "CFDA / Assistance Listing number. Comma-separated when multiple are listed."

    },

    "cfda_name": {

      "type": [

        "string",

        "null"

      ],

      "description": "CFDA / Assistance Listing descriptive name."

    },

    "proposal_number": {

      "type": [

        "string",

        "null"

      ],

      "description": "Sponsor's proposal number tying this award to its originating proposal. Often identical to award_number for NSF."

    },

    "amendment_number": {

      "type": "string",

      "minLength": 1,

      "description": "Amendment sequence. '000' denotes the initial obligation / new project; subsequent values ('001', '002', ...) denote modifications."

    },

    "amendment_type": {

      "type": [

        "string",

        "null"

      ],

      "description": "Type of amendment as named in the notice (e.g., 'New Project', 'Administrative', 'No-Cost Extension', 'Supplemental', 'Budget Reallocation')."

    },

    "amendment_date": {

      "type": [

        "string",

        "null"

      ],

      "format": "date",

      "description": "ISO date of the amendment action."

    },

    "amendment_description": {

      "type": [

        "string",

        "null"

      ],

      "description": "Free-text narrative block describing the action and any special conditions embedded in it. Preserve verbatim where possible; structured items derived from this block also appear in special_conditions."

    },

    "award_date": {

      "type": [

        "string",

        "null"

      ],

      "format": "date",

      "description": "ISO date on which the sponsor executed the award (may equal amendment_date for initial obligations)."

    },

    "award_received_date": {

      "type": [

        "string",

        "null"

      ],

      "format": "date",

      "description": "ISO date on which the recipient received the notice, when that information is preserved in the document (e.g., email header date). Null when not present."

    },

    "start_date": {

      "type": [

        "string",

        "null"

      ],

      "format": "date",

      "description": "Start date of the period of performance (ISO)."

    },

    "end_date": {

      "type": [

        "string",

        "null"

      ],

      "format": "date",

      "description": "End date of the period of performance (ISO)."

    },

    "amount_obligated_this_amendment": {

      "type": [

        "number",

        "null"

      ],

      "description": "USD obligated by this specific notice / amendment. Plain number, no currency symbol."

    },

    "total_intended_amount": {

      "type": [

        "number",

        "null"

      ],

      "description": "Sponsor's total intended award amount across the full period of performance."

    },

    "total_obligated_to_date": {

      "type": [

        "number",

        "null"

      ],

      "description": "Cumulative obligated amount after this amendment (includes all prior amendments)."

    },

    "cost_share_approved_amount": {

      "type": [

        "number",

        "null"

      ],

      "description": "Approved cost share / matching amount in USD. Emit 0 (not null) when the notice states no cost share is approved."

    },

    "expenditure_limitation": {

      "type": [

        "string",

        "null"

      ],

      "description": "Expenditure limitation as stated in the notice (e.g., 'Not Applicable', or a specific rule). Verbatim."

    },

    "indirect_cost_rate_percent": {

      "type": [

        "number",

        "null"

      ],

      "description": "Indirect cost (F&A) rate as a percentage (e.g., 38.0 for 38%). Do not include the '%' symbol."

    },

    "indirect_cost_base": {

      "type": [

        "string",

        "null"

      ],

      "enum": [

        "MTDC",

        "TDC",

        "TFFA",

        "SWB",

        "Other",

        null

      ],

      "description": "Indirect cost base. 'MTDC' = Modified Total Direct Costs, 'TDC' = Total Direct Costs, 'TFFA' = Total Federal Funds Awarded, 'SWB' = Salaries and Wages Base. Use 'Other' when the base does not match these codes."

    },

    "fees": {

      "type": [

        "number",

        "null"

      ],

      "description": "USD fees line on the NSF-format budget, when printed as a distinct row outside categories A\u2013M (some NSF budget tables print a 'Fees' row between J and L). Plain number; 0 when explicitly stated as zero; null when absent from the notice."

    },

    "recipient_organization": {

      "type": "object",

      "description": "Institution receiving the award.",

      "additionalProperties": false,

      "required": [

        "legal_name"

      ],

      "properties": {

        "legal_name": {

          "type": "string",

          "minLength": 1,

          "description": "Legal business name as stated in the notice."

        },

        "address": {

          "type": [

            "string",

            "null"

          ],

          "description": "Street address including city, state, and ZIP. Verbatim."

        },

        "email": {

          "type": [

            "string",

            "null"

          ],

          "description": "Official recipient email address (often the institutional sponsored programs office inbox)."

        },

        "uei": {

          "type": [

            "string",

            "null"

          ],

          "description": "Unique Entity Identifier (12 characters)."

        }

      }

    },

    "current_budget_period": {

      "type": [

        "object",

        "null"

      ],

      "description": "The budget period this notice covers. For Standard Grants the entire period of performance is a single budget period. For Continuing Grants this is the specific period being obligated.",

      "additionalProperties": false,

      "required": [

        "start_date",

        "end_date",

        "obligated_amount"

      ],

      "properties": {

        "period_number": {

          "type": [

            "integer",

            "null"

          ],

          "minimum": 1,

          "description": "Ordinal of this budget period within the award (1 for the first period)."

        },

        "period_label": {

          "type": [

            "string",

            "null"

          ],

          "description": "Optional label (e.g., 'Year 1', 'FY2025')."

        },

        "start_date": {

          "type": "string",

          "format": "date"

        },

        "end_date": {

          "type": "string",

          "format": "date"

        },

        "direct_cost": {

          "type": [

            "number",

            "null"

          ],

          "description": "Total direct cost (category H in the NSF-format budget)."

        },

        "indirect_cost": {

          "type": [

            "number",

            "null"

          ],

          "description": "Total indirect / F&A cost (category I)."

        },

        "obligated_amount": {

          "type": "number",

          "description": "Obligated amount for this period (category J on a single-period award, or the period-specific obligation for multi-period continuations)."

        }

      }

    },

    "project_personnel": {

      "type": "array",

      "description": "Every named project participant listed in the notice. Captures role, name, email, organization, and whether they are at the recipient institution.",

      "items": {

        "type": "object",

        "additionalProperties": false,

        "required": [

          "role",

          "name"

        ],

        "properties": {

          "role": {

            "type": "string",

            "description": "Role label as stated in the notice (e.g., 'PI', 'co-PI', 'Senior Personnel', 'Key Person')."

          },

          "name": {

            "type": "string",

            "minLength": 1,

            "description": "Full name as stated."

          },

          "email": {

            "type": [

              "string",

              "null"

            ]

          },

          "organization": {

            "type": [

              "string",

              "null"

            ],

            "description": "The person's home organization as stated in the notice. Not resolved to an Organization_ID by the extractor."

          },

          "is_at_recipient_institution": {

            "type": [

              "boolean",

              "null"

            ],

            "description": "True when the person's organization matches recipient_organization.legal_name. This flag drives subaward inference."

          }

        }

      }

    },

    "sponsor_contacts": {

      "type": "array",

      "description": "Sponsor-side contacts listed in the notice (grants officers, program officers). Distinct from project_personnel; these are not on the award team.",

      "items": {

        "type": "object",

        "additionalProperties": false,

        "required": [

          "role",

          "name"

        ],

        "properties": {

          "role": {

            "type": "string",

            "description": "Role label as stated (e.g., 'Managing Grants Official', 'Awarding Official', 'Managing Program Officer')."

          },

          "name": {

            "type": "string",

            "minLength": 1

          },

          "email": {

            "type": [

              "string",

              "null"

            ]

          },

          "phone": {

            "type": [

              "string",

              "null"

            ]

          }

        }

      }

    },

    "budget_categories": {

      "type": "array",

      "description": "Every stated line item of the NSF-format budget (A\u2013M and their subcategories). Includes stated totals (H, J, L) as separate entries \u2014 do not re-compute. Omit entries that are not printed in the notice.",

      "items": {

        "type": "object",

        "additionalProperties": false,

        "required": [

          "code",

          "label"

        ],

        "properties": {

          "code": {

            "type": "string",

            "pattern": "^[A-M]$",

            "description": "Top-level category letter A\u2013M (NSF-format budget)."

          },

          "subcode": {

            "type": [

              "string",

              "null"

            ],

            "description": "Subcategory name when the line falls under a parent category (e.g., 'PostDoctoral', 'OtherProfessionals', 'ParticipantSupportStipends', 'Subawards'). Null for top-level lines."

          },

          "label": {

            "type": "string",

            "minLength": 1,

            "description": "Line label as printed (e.g., 'Senior Personnel', 'Post Doctoral Scholars', 'Participant Support Costs Travel')."

          },

          "amount": {

            "type": [

              "number",

              "null"

            ],

            "description": "Dollar amount as a plain number (no currency symbol or thousands separators). Null when the line reports a count only (e.g., 'Total Number of Participants')."

          },

          "count": {

            "type": [

              "number",

              "null"

            ],

            "description": "Person count or participant count associated with this line. Null when the line is amount-only."

          },

          "calendar_months": {

            "type": [

              "number",

              "null"

            ]

          },

          "academic_months": {

            "type": [

              "number",

              "null"

            ]

          },

          "summer_months": {

            "type": [

              "number",

              "null"

            ]

          }

        }

      }

    },

    "subawards": {

      "type": "array",

      "description": "Subrecipient entries. NSF notices rarely itemize subawardees with allocations \u2014 they show a single Subawards line in category G and list Co-PIs at external institutions. Emit explicit entries when the notice names them. Also emit an INFERRED entry when a Co-PI's organization differs from the recipient AND the budget shows a non-zero Subawards line; set inferred=true and leave amounts null.",

      "items": {

        "type": "object",

        "additionalProperties": false,

        "required": [

          "subawardee_name",

          "inferred"

        ],

        "properties": {

          "subawardee_name": {

            "type": "string",

            "minLength": 1

          },

          "pi_name": {

            "type": [

              "string",

              "null"

            ]

          },

          "pi_email": {

            "type": [

              "string",

              "null"

            ]

          },

          "description": {

            "type": [

              "string",

              "null"

            ],

            "description": "Scope of work reference or, for inferred entries, a note explaining the basis of inference and the aggregate Subawards line amount from the budget."

          },

          "obligated_amount": {

            "type": [

              "number",

              "null"

            ]

          },

          "anticipated_amount": {

            "type": [

              "number",

              "null"

            ]

          },

          "uei": {

            "type": [

              "string",

              "null"

            ]

          },

          "inferred": {

            "type": "boolean",

            "description": "True when the entry was derived from cross-referencing Co-PI organizations against the recipient and the Subawards budget line rather than an explicit enumeration in the notice."

          }

        }

      }

    },

    "linked_awards": {

      "type": "array",

      "description": "Related awards referenced by this notice. Populated when the notice mentions a parent, sibling (Collaborative Research), predecessor, or companion award.",

      "items": {

        "type": "object",

        "additionalProperties": false,

        "required": [

          "relationship",

          "award_number"

        ],

        "properties": {

          "relationship": {

            "type": "string",

            "enum": [

              "collaborative_sibling",

              "parent",

              "predecessor",

              "companion",

              "supplement_to",

              "other"

            ]

          },

          "award_number": {

            "type": "string",

            "minLength": 1

          },

          "institution": {

            "type": [

              "string",

              "null"

            ]

          },

          "notes": {

            "type": [

              "string",

              "null"

            ]

          }

        }

      }

    },

    "terms_and_conditions": {

      "type": "array",

      "description": "Cited authorities governing the award (statutes, agency-wide terms, agency-specific requirements, policy guide chapters). One entry per distinct citation.",

      "items": {

        "type": "object",

        "additionalProperties": false,

        "required": [

          "citation"

        ],

        "properties": {

          "citation": {

            "type": "string",

            "minLength": 1,

            "description": "Name of the authority (e.g., 'Research Terms and Conditions', 'NSF Agency Specific Requirements', 'NSF Proposal & Award Policies & Procedures Guide (PAPPG)')."

          },

          "citation_date": {

            "type": [

              "string",

              "null"

            ],

            "format": "date",

            "description": "Version date of the cited authority as an ISO 8601 date (YYYY-MM-DD), when stated."

          },

          "url": {

            "type": [

              "string",

              "null"

            ],

            "format": "uri"

          },

          "applicability_notes": {

            "type": [

              "string",

              "null"

            ],

            "description": "Any scoping notes from the notice (e.g., a specific PAPPG chapter cited for a specific purpose)."

          }

        }

      }

    },

    "special_conditions": {

      "type": "array",

      "description": "Award-specific conditions and obligations extracted from the amendment description and any other narrative in the notice. Categorized for downstream workflow routing.",

      "items": {

        "type": "object",

        "additionalProperties": false,

        "required": [

          "label",

          "category"

        ],

        "properties": {

          "label": {

            "type": "string",

            "minLength": 1

          },

          "code": {

            "type": [

              "string",

              "null"

            ],

            "pattern": "^[A-Z0-9_]{1,50}$"

          },

          "description": {

            "type": [

              "string",

              "null"

            ],

            "description": "Detailed condition language, preserving the notice's wording for prescribed terms."

          },

          "category": {

            "type": "string",

            "enum": [

              "reporting",

              "scope",

              "budget",

              "subaward",

              "participant_support",

              "personnel",

              "compliance",

              "publications",

              "data_sharing",

              "other"

            ]

          },

          "action_required": {

            "type": [

              "boolean",

              "null"

            ],

            "description": "True when the condition requires an action (written policies, separate ledgers, prior approval, etc.)."

          },

          "source_section": {

            "type": [

              "string",

              "null"

            ],

            "description": "Section of the notice from which the condition was extracted (e.g., 'Amendment Description', 'General Terms and Conditions')."

          }

        }

      }

    },

    "source_provenance": {

      "type": [

        "object",

        "null"

      ],

      "description": "Optional metadata describing how this UDM object was produced. Downstream ingest can use it for audit trails, cache keys, re-processing decisions, and surfacing reviewer annotations. Populated by direct extractors (prompt/skill) or by translators that convert a different extractor's output into this UDM shape. Absent / null when no provenance is recorded.",

      "additionalProperties": false,

      "required": [

        "extractor"

      ],

      "properties": {

        "extractor": {

          "type": "string",

          "minLength": 1,

          "description": "Name of the system that produced this UDM object (e.g., 'nsf-award-notice-extraction-udm', 'vandalizer-to-udm-translation', 'manual-entry')."

        },

        "extractor_version": {

          "type": [

            "string",

            "null"

          ],

          "description": "Version of the extractor (semver), when known."

        },

        "source_document": {

          "type": [

            "string",

            "null"

          ],

          "description": "Identifier (filename, URI, or hash) of the upstream source document the extractor processed. Do not include recipient PII beyond what downstream systems already hold."

        },

        "upstream_extractor": {

          "type": [

            "string",

            "null"

          ],

          "description": "For translators only: the name of the upstream extractor whose output was translated into this UDM object (e.g., 'Vandalizer')."

        },

        "upstream_extractor_version": {

          "type": [

            "string",

            "null"

          ],

          "description": "Version identifier of the upstream extractor, when known."

        },

        "extracted_at": {

          "type": [

            "string",

            "null"

          ],

          "format": "date-time",

          "description": "ISO-8601 timestamp when the extraction (or translation) was performed."

        },

        "review_annotations": {

          "type": "array",

          "description": "Free-form reviewer signals carried through from the upstream extractor or added during translation. Captures things like 'fields highlighted for review' or 'low-confidence regions' without constraining downstream workflows to a fixed taxonomy. Emit [] when none.",

          "items": {

            "type": "object",

            "additionalProperties": false,

            "required": [

              "label"

            ],

            "properties": {

              "label": {

                "type": "string",

                "minLength": 1,

                "description": "Short human-readable name for the annotation (e.g., 'highlighted-yellow', 'low-confidence', 'manual-override')."

              },

              "value": {

                "type": [

                  "string",

                  "null"

                ],

                "description": "Annotation payload, when the annotation carries data (e.g., the verbatim text of a highlighted region)."

              },

              "target_field": {

                "type": [

                  "string",

                  "null"

                ],

                "description": "JSON Pointer or dotted path indicating which UDM field the annotation applies to. Null when the annotation is document-level."

              },

              "description": {

                "type": [

                  "string",

                  "null"

                ],

                "description": "Optional longer-form notes."

              }

            }

          }

        },

        "notes": {

          "type": [

            "string",

            "null"

          ],

          "description": "Free-form note field for provenance details that do not fit the structured fields above."

        }

      }

    }

  }

}
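The subawards inference rule in the schema above, together with the is_at_recipient_institution flag, can be sketched in Python. This is a minimal illustration, not the prompt's actual logic: the helper names, the case- and whitespace-insensitive organization match, and the accepted Co-PI role spellings are all assumptions. It finds the Subawards line by code G and subcode 'Subawards' (the spelling used in the subcode examples) and emits nothing when that line is absent or zero.

```python
from typing import Any, Optional


def _norm(text: Optional[str]) -> str:
    # Hypothetical normalization: collapse whitespace and ignore case.
    return " ".join((text or "").split()).casefold()


def subawards_line_amount(budget_categories: list[dict[str, Any]]) -> Optional[float]:
    """Amount printed on the category G 'Subawards' line, or None if absent."""
    for line in budget_categories:
        if line.get("code") == "G" and line.get("subcode") == "Subawards":
            return line.get("amount")
    return None


def infer_subawards(udm: dict[str, Any]) -> list[dict[str, Any]]:
    """Emit inferred subaward entries for Co-PIs outside the recipient."""
    recipient = _norm((udm.get("recipient_organization") or {}).get("legal_name"))
    aggregate = subawards_line_amount(udm.get("budget_categories", []))
    if not aggregate:  # None or zero: the rule needs a non-zero Subawards line
        return []

    inferred: list[dict[str, Any]] = []
    seen: set[str] = set()
    for person in udm.get("project_personnel", []):
        role = _norm(person.get("role")).replace(" ", "")
        org = person.get("organization")
        if role not in {"co-pi", "copi"} or not org:
            continue  # only Co-PIs with a stated organization qualify
        key = _norm(org)
        if key == recipient or key in seen:
            continue  # same institution as the recipient, or already emitted
        seen.add(key)
        inferred.append({
            "subawardee_name": org,
            "pi_name": person["name"],
            "pi_email": person.get("email"),
            "description": (
                f"Inferred: Co-PI {person['name']} lists {org}, not the recipient; "
                f"the category G Subawards line totals ${aggregate:,.2f}."
            ),
            "obligated_amount": None,  # amounts stay null on inferred entries
            "anticipated_amount": None,
            "uei": None,
            "inferred": True,
        })
    return inferred
```

Exact matching on normalized names will miss abbreviations such as "Univ. of Idaho"; a production pipeline would probably need an alias table, which this sketch leaves out.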

Evals

Reference cases

Golden cases under evals/cases/.

  • 2427549 — FAIN 2427549 / Amendment 000 (artifacts: input, expected)
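One way to compare an extraction against a golden case's expected artifact is a structural diff that reports JSON Pointer (RFC 6901) paths, the same path style review_annotations.target_field accepts. This sketch is hypothetical and does not describe the repo's eval runner. It compares arrays position by position, so a correct but reordered project_personnel list would be flagged, and it keeps true and 1 apart even though Python treats them as equal.

```python
from typing import Any, Iterator


def _escape(token: str) -> str:
    # RFC 6901: escape '~' as '~0' first, then '/' as '~1'.
    return token.replace("~", "~0").replace("/", "~1")


def diff(expected: Any, actual: Any, path: str = "") -> Iterator[str]:
    """Yield JSON Pointer paths where `actual` departs from `expected`."""
    if isinstance(expected, dict) and isinstance(actual, dict):
        for key in sorted(expected.keys() | actual.keys()):
            child = f"{path}/{_escape(key)}"
            if key not in expected or key not in actual:
                yield child  # field present on only one side
            else:
                yield from diff(expected[key], actual[key], child)
    elif isinstance(expected, list) and isinstance(actual, list):
        for i in range(max(len(expected), len(actual))):
            child = f"{path}/{i}"
            if i >= len(expected) or i >= len(actual):
                yield child  # array item present on only one side
            else:
                yield from diff(expected[i], actual[i], child)
    elif expected != actual or isinstance(expected, bool) != isinstance(actual, bool):
        # The bool check keeps True and 1 apart, which Python treats as equal.
        yield path
```

An empty result means the two objects match; an empty-string path means the documents differ at the root.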

Evaluation reports

Full evaluation runs — tables, charts, and findings rendered inline. Source under evals/reports/.

Changelog

Source: CHANGELOG.md.

All notable changes to this component. Versions follow semver: MAJOR for output-contract breaks (schema changes that drop or rename fields), MINOR for backward-compatible additions (new optional fields, new enum values, new manifestations), PATCH for wording or clarity with no behavior change expected.

The schema.json version is kept in lockstep with the component version.
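The bump rules above can be approximated mechanically by comparing two versions of schema.json. This is a rough sketch that inspects only the top-level properties and required lists: a dropped name (including the old half of a rename) means MAJOR, a new optional name means MINOR, and anything else falls back to PATCH. Treating a newly required field as MAJOR is this sketch's own assumption, since the policy does not mention it; enum additions, nested changes, and new manifestations are not detected.

```python
def bump_kind(old: dict, new: dict) -> str:
    """Classify a schema change as MAJOR, MINOR, or PATCH (top level only)."""
    old_props = set(old.get("properties", {}))
    new_props = set(new.get("properties", {}))
    newly_required = set(new.get("required", [])) - set(old.get("required", []))
    if old_props - new_props:
        return "MAJOR"  # a field was dropped or renamed
    if newly_required:
        return "MAJOR"  # assumption: older output may lack the new required field
    if new_props - old_props:
        return "MINOR"  # only new optional fields were added
    return "PATCH"  # no top-level contract change detected


def next_version(version: str, kind: str) -> str:
    """Apply a semver bump to a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if kind == "MAJOR":
        return f"{major + 1}.0.0"
    if kind == "MINOR":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

Because the schema version moves in lockstep with the component, the same result would apply to both.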

[1.1.0] — 2026-04-20

  • Added top-level optional fees scalar (USD, nullable) to cover NSF budget tables that print a distinct Fees row between category J and the Amount of this Request (category L). Previously callers had to either widen the budget_categories.code regex (which would break schema consumers) or hide the value inside special_conditions. The budget_categories.code pattern remains ^[A-M]$; Fees is now a first-class sibling scalar instead.
  • Added optional source_provenance object carrying extractor, extractor_version, source_document, upstream_extractor, upstream_extractor_version, extracted_at, review_annotations[], and notes. Enables audit trails for both direct extractors and translators that convert upstream extractor output into this UDM shape. review_annotations items carry label, value, target_field, and description so reviewer signals (e.g., "highlighted yellow in source") survive translation without a fixed taxonomy.
  • Updated prompt.md and skill/SKILL.md with extraction rules for both new fields. Extractors populate source_provenance only when the runtime provides a source document id or timestamp — no fabrication.
  • Backward-compatible: no new required fields were added, so output produced under 1.0.0 still satisfies the 1.1.0 contract. Consumers targeting 1.1.0 should treat fees and source_provenance as optional.
  • Driven by the vandalizer-to-udm-translation translator (v0.1.0), which converts flat Vandalizer NSF extractions into this UDM shape. The Vandalizer output carries a Fees field and a reviewer-highlight annotation, neither of which fit cleanly under the 1.0.0 contract.
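The no-fabrication rule for source_provenance can be expressed as a small builder. This is a sketch for a direct extractor under assumed inputs: the function name and parameters are hypothetical, the upstream_* fields stay null because no translator is involved, and naive datetimes are rejected because JSON Schema's date-time format follows RFC 3339, which requires an explicit UTC offset.

```python
from datetime import datetime
from typing import Any, Optional


def build_provenance(
    extractor: str,
    extractor_version: Optional[str] = None,
    source_document: Optional[str] = None,
    extracted_at: Optional[datetime] = None,
    review_annotations: Optional[list[dict[str, Any]]] = None,
) -> Optional[dict[str, Any]]:
    """Return a source_provenance object, or None when the runtime gave nothing."""
    if source_document is None and extracted_at is None:
        return None  # no document id or timestamp: do not fabricate provenance
    if extracted_at is not None and extracted_at.tzinfo is None:
        raise ValueError("extracted_at must be timezone-aware (RFC 3339 needs an offset)")
    return {
        "extractor": extractor,
        "extractor_version": extractor_version,
        "source_document": source_document,
        "upstream_extractor": None,  # direct extractor, so no upstream system
        "upstream_extractor_version": None,
        "extracted_at": extracted_at.isoformat() if extracted_at else None,
        "review_annotations": review_annotations or [],  # [] when none
        "notes": None,
    }
```

A translator would instead fill upstream_extractor and carry the upstream reviewer signals into review_annotations.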

[1.0.0] — 2026-04-18

  • Initial version.
  • JSON Schema (schema.json) covering identity scalars, dates and funding, nested recipient_organization and current_budget_period, and eight categorized arrays (project personnel, sponsor contacts, NSF-format budget categories A–M, subawards, linked awards, terms and conditions, special conditions).
  • Canonical prompt (prompt.md) including the Amendment 000 vs. subsequent-amendment distinction, NSF-format budget line subcoding, and an explicit subaward inference rule for notices where Co-PIs at non-recipient institutions imply subrecipient arrangements not individually itemized in the notice.
  • Claude Skill manifestation (skill/SKILL.md) tuned for NSF NOA / award letter / award notice triggers.
  • First golden eval case: FAIN 2427549 — Standard Grant, Amendment 000, single prime (University of Idaho) with an inferred subaward to a Co-PI at Southern Utah University, exercising the subaward inference rule and the NSF-format budget table capture.
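Because budget_categories keeps stated totals as printed rather than recomputed, a downstream consumer may still want to confirm that the period scalars agree with those lines. This is a minimal sketch, assuming that the direct_cost and indirect_cost fields shown in the schema live under current_budget_period and that the H and I totals appear as top-level lines (subcode null). The function names and the half-cent tolerance are illustrative. It only reports disagreements and never rewrites either value.

```python
from typing import Any, Optional


def top_level_amount(lines: list[dict[str, Any]], code: str) -> Optional[float]:
    """Amount on the top-level (subcode null) budget line for `code`, if printed."""
    for line in lines:
        if line.get("code") == code and line.get("subcode") is None:
            return line.get("amount")
    return None


def budget_mismatches(udm: dict[str, Any], tolerance: float = 0.005) -> list[str]:
    """Report current_budget_period scalars that disagree with lines H and I."""
    period = udm.get("current_budget_period") or {}
    lines = udm.get("budget_categories", [])
    problems: list[str] = []
    for field, code in (("direct_cost", "H"), ("indirect_cost", "I")):
        scalar = period.get(field)
        printed = top_level_amount(lines, code)
        if scalar is None or printed is None:
            continue  # nothing to compare when either side is missing
        if abs(scalar - printed) > tolerance:
            problems.append(f"current_budget_period.{field}={scalar} vs category {code}={printed}")
    return problems
```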