rfp-extraction-udm¶
Version: 1.0.0
Tags: rfp rfa foa nofo grants udm structured-extraction json
Audience: ingest-pipelines, data-engineers, sponsored-programs-staff
Manifestations in repo: prompt.md · skill/SKILL.md
Extracts any federal funding announcement (RFP, RFA, FOA, NOFO, BAA, Dear Colleague Letter, or program solicitation) into a single JSON object conforming to this repo's UDM-aligned extraction contract. Companion to rfp-extraction, which produces the human-readable checklist form from the same source.
Output contract: see schema.json
Inputs¶
Full text of a funding announcement — pasted text, attached document, or URL.
Outputs¶
A single JSON object with two layers:
- Scalar opportunity metadata: rfa_id, rfa_number, rfa_title, sponsor_name, program_code, announcement_url, opportunity_number, cfda_number, announcement_date, submission_deadline, loi_deadline, preproposal_deadline, funding_floor, funding_ceiling, expected_awards, max_duration_months, submission_method, rfa_status
- Nine requirement arrays, each always present ([] when empty): required_documents, formatting, review_criteria, eligibility, budget_constraints, compliance, submission_requirements, special_conditions, pappg_deviations
Every requirement entry shares a single shape: label, code, description, page_limit, format_spec, is_required, source_section, structured_rule_type, structured_rule_value. See schema.json for the authoritative definition and prompt.md for the encoding rules the extractor follows (date formats, currency, multi-round handling, structured rule types, etc.). The schema is repo-local and UDM-aligned rather than a checked-in copy of the shared UDM repo.
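As a rough illustration of the two-layer shape described above, the sketch below builds an empty record and one requirement entry in Python. The field names are copied from this section; the helper names `empty_extraction` and `make_entry` and the sample values are hypothetical and are not part of the component. Note that schema.json requires rfa_title and sponsor_name to be non-empty strings, so the skeleton only becomes valid once those are filled in.

```python
import json

# Field names copied from the Outputs list; helper names are hypothetical.
SCALAR_FIELDS = [
    "rfa_id", "rfa_number", "rfa_title", "sponsor_name", "program_code",
    "announcement_url", "opportunity_number", "cfda_number",
    "announcement_date", "submission_deadline", "loi_deadline",
    "preproposal_deadline", "funding_floor", "funding_ceiling",
    "expected_awards", "max_duration_months", "submission_method",
    "rfa_status",
]
REQUIREMENT_ARRAYS = [
    "required_documents", "formatting", "review_criteria", "eligibility",
    "budget_constraints", "compliance", "submission_requirements",
    "special_conditions", "pappg_deviations",
]
ENTRY_FIELDS = [
    "label", "code", "description", "page_limit", "format_spec",
    "is_required", "source_section", "structured_rule_type",
    "structured_rule_value",
]


def empty_extraction() -> dict:
    """Return a record with every scalar set to None and every array empty."""
    record = {name: None for name in SCALAR_FIELDS}
    record.update({name: [] for name in REQUIREMENT_ARRAYS})
    return record


def make_entry(label: str, is_required: bool = True, **optional) -> dict:
    """Build one requirement entry; unspecified optional fields become None."""
    unknown = set(optional) - set(ENTRY_FIELDS)
    if unknown:
        raise ValueError(f"unknown entry fields: {sorted(unknown)}")
    entry = {name: None for name in ENTRY_FIELDS}
    entry.update(optional)
    entry["label"] = label
    entry["is_required"] = is_required
    return entry


record = empty_extraction()
record["rfa_title"] = "Example Program Title"  # illustrative value only
record["sponsor_name"] = "National Science Foundation"
record["required_documents"].append(
    make_entry("Project Summary", code="PROJECT_SUMMARY", page_limit=1)
)
print(json.dumps(record, indent=2))
```

Serializing with `json.dumps` turns Python `None` into JSON `null` and empty lists into `[]`, which matches the "always present, `[]` when empty" rule for the nine arrays.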
Relationship to rfp-extraction¶
| Concern | rfp-extraction | rfp-extraction-udm |
|---|---|---|
| Audience | Humans (PIs, OSP reviewers) | Ingest pipelines, downstream systems |
| Output format | Markdown with GFM checkboxes | JSON matching schema.json |
| Structure | Section-ordered narrative checklist | Flat scalars + nine categorized arrays |
| Eval artifact | expected.md | expected.json |
The two components are versioned independently and are not required to share a version number. A given source case (e.g., NSF 26-508) appears in both components' eval directories, once in each target format.
Manifestations¶
- prompt.md: canonical, LLM-agnostic prompt
- skill/SKILL.md: Claude Skill form
Schema¶
schema.json is a JSON Schema (draft 2020-12) that defines the full output contract. Downstream consumers can validate extractor outputs programmatically; the eval suite uses it to validate expected.json. The schema's version matches the component version.
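Full validation is best left to a JSON Schema (draft 2020-12) validator. As a minimal standard-library sketch of what "validate programmatically" means at the top level, the function below checks only two keywords from the schema: `required` and `additionalProperties: false`. The function name `check_top_level` and the trimmed `schema_stub` are hypothetical stand-ins so the example runs on its own; a real consumer would load the full schema.json instead.

```python
def check_top_level(output: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means both checks passed.

    Covers only the top-level `required` and `additionalProperties: false`
    keywords. It does not check types, formats, enums, or nested entries.
    """
    problems = []
    for name in schema.get("required", []):
        if name not in output:
            problems.append(f"missing required property: {name}")
    if schema.get("additionalProperties") is False:
        allowed = set(schema.get("properties", {}))
        for name in output:
            if name not in allowed:
                problems.append(f"unexpected property: {name}")
    return problems


# Trimmed stand-in for schema.json (assumption: the real file is loaded
# with json.load in practice and has many more properties).
schema_stub = {
    "additionalProperties": False,
    "required": ["rfa_title", "sponsor_name", "eligibility"],
    "properties": {"rfa_title": {}, "sponsor_name": {}, "eligibility": {}},
}

candidate = {"rfa_title": "Example", "eligibility": [], "notes": "extra"}
for problem in check_top_level(candidate, schema_stub):
    print(problem)
```

The sample candidate is missing sponsor_name and carries an undeclared notes property, so both checks report a problem.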
Evals¶
See evals/ for reference inputs and known-good outputs. The initial case is NSF 26-508 (TechAccess), a multi-round solicitation exercising LOI handling, per-institution proposal limits, cost-sharing prohibition, and several parent-guide deviations.
Provenance¶
Companion to rfp-extraction 1.0.0, built for workflows where a downstream system ingests the opportunity metadata and requirements into a UDM-backed data store. Schema designed 2026-04-18 alongside the NSF 26-508 golden case.
Contract scope¶
- Output format: json_object
- Contract scope: shared_udm_semantics_repo_local_schema
- Validation surfaces: json_schema, golden_eval_cases
- Schema entrypoints: #
- Notes: Repo-local structured extraction schema for funding announcements. It aligns to shared UDM opportunity semantics but remains a prompt-library contract rather than a direct copy of the shared UDM repo.
- Machine-readable catalog entry: component_catalog.json
Triad integration¶
- UDM alignment: shared_udm_semantics_repo_local_schema. Scalar and requirement-row meanings are UDM-oriented, but the flat JSON extraction surface and categorized requirement arrays are defined locally.
- Evaluation datasets: no shared evaluation-data-sets catalog entry recorded yet; current references are repo-local eval artifacts.
- Harness notes: Validate JSON outputs against schema.json and treat the paired rfp-extraction component as the human-readable sibling for the same task. External dataset-catalog registration for this task family is not yet in place.
- Related component: rfp-extraction (sibling_human_vs_structured_outputs). Same source task family, different output contracts.
- Related component: rfa-checklist-extraction-udm (sibling_same_source_different_shape). Same document family, different output shape: rfp-extraction-udm is a nine-array requirement-centric flat contract; rfa-checklist-extraction-udm is an eight-section pre-award checklist contract.
Prompt body¶
Source: prompt.md.
RFP Structured Extraction — UDM JSON¶
Purpose: Extract every structured requirement and scalar metadata field from any federal funding announcement (RFP, RFA, FOA, NOFO, BAA, Dear Colleague Letter, or program solicitation) into a single JSON document conforming to this component's repo-local, UDM-aligned schema.
Expected input: Full text of the funding announcement.
Expected output: A single JSON object that validates against schema.json. No prose, no markdown outside the JSON itself.
Prompt¶
You are a structured-data extraction engine for research administration. Your task is to read a federal funding announcement and produce a single JSON object that captures the opportunity's scalar metadata and all requirements in a form suitable for direct ingest into a UDM-conformant data store.
Output contract¶
Emit one JSON object — nothing else. No preamble, no closing commentary, no markdown fences. If your runtime requires fenced output, wrap in a single ```json ... ``` block and emit nothing outside it.
The object has two layers:
- Scalar fields describing the opportunity itself (title, sponsor, deadlines, funding amounts, identifiers).
- Nine list-valued fields, each holding an array of requirement entries for a specific category: required_documents, formatting, review_criteria, eligibility, budget_constraints, compliance, submission_requirements, special_conditions, pappg_deviations.
Every one of the nine list fields MUST be present, even when empty. Emit [] rather than null for a category with no entries.
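On the consuming side, the output contract above implies a small amount of parsing logic: accept either bare JSON or a single fenced json block, then confirm that all nine arrays are present and are arrays. The sketch below is one possible way to do that with the standard library; the function name `parse_model_output` is hypothetical, and the fence marker is built from single backticks only so the example does not interfere with its own code fence.

```python
import json

FENCE = "`" * 3  # three backticks, assembled here to keep this example fence-safe

NINE_ARRAYS = (
    "required_documents", "formatting", "review_criteria", "eligibility",
    "budget_constraints", "compliance", "submission_requirements",
    "special_conditions", "pappg_deviations",
)


def parse_model_output(raw: str) -> dict:
    """Parse extractor output that is bare JSON or one fenced json block."""
    text = raw.strip()
    if text.startswith(FENCE):
        first_newline = text.find("\n")
        if first_newline == -1 or not text.endswith(FENCE):
            raise ValueError("unterminated fenced block")
        # Drop the opening fence line and the closing fence.
        text = text[first_newline + 1 : -len(FENCE)]
    obj = json.loads(text)
    if not isinstance(obj, dict):
        raise ValueError("top-level value must be a JSON object")
    for name in NINE_ARRAYS:
        if not isinstance(obj.get(name), list):
            raise ValueError(f"{name} must be an array (use [] when empty)")
    return obj


bare = json.dumps({name: [] for name in NINE_ARRAYS})
fenced = FENCE + "json\n" + bare + "\n" + FENCE
print(parse_model_output(bare) == parse_model_output(fenced))
```

A missing array and a `null` array are rejected the same way, since `obj.get(name)` returns `None` in both cases.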
Scalar field rules¶
- rfa_id: "<SPONSOR_CODE>-<OPPORTUNITY_NUMBER>" when both are available (e.g., "NSF-26-508", "NIH-PA-24-246", "DOE-DE-FOA-0003117"). Null when no canonical identifier exists.
- rfa_number: the sponsor's announcement number without agency prefix (e.g., "26-508", "PA-24-246").
- rfa_title: full title including any track or component.
- sponsor_name: full name of the lead sponsoring agency (e.g., "National Science Foundation"). Do not emit an abbreviation or identifier; the ingest service resolves this to a sponsor organization record. When multiple agencies participate, name only the lead in this field and list the partners in special_conditions.
- program_code: sponsor's internal program/division code when stated; null otherwise.
- announcement_url, opportunity_number, cfda_number: emit as stated; multi-value CFDA lists go comma-separated.
- Dates: ISO format YYYY-MM-DD. Drop time components; document any timezone or local-time rule as a submission_requirements entry.
- funding_floor, funding_ceiling: plain numbers in USD (no $, commas, or strings). funding_ceiling is the per-award total; when the announcement states a per-year cap over N years, emit per_year_cap × max_years.
- expected_awards: integer; use the upper bound when given as a range.
- max_duration_months: integer MONTHS. Convert years by multiplying by 12. Use the standard maximum; document conditional extensions in special_conditions.
- submission_method: portal name as stated (e.g., "Research.gov").
- rfa_status: "Active" unless the announcement explicitly says otherwise.
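The scalar rules above reduce to a handful of small normalizations. The sketch below shows one way to express them in Python; every function name is hypothetical, the sample date and dollar amounts are made up for illustration, and the date parser assumes an English month-name format such as "March 5, 2026" (real announcements use many formats, and `%B` is locale-dependent).

```python
from datetime import datetime
from typing import Optional


def to_iso_date(text: str, fmt: str = "%B %d, %Y") -> str:
    """Parse a date such as 'March 5, 2026' and return 'YYYY-MM-DD'."""
    return datetime.strptime(text, fmt).date().isoformat()


def make_rfa_id(sponsor_code: Optional[str], number: Optional[str]) -> Optional[str]:
    """Join sponsor code and opportunity number; None when either is missing."""
    if not sponsor_code or not number:
        return None
    return f"{sponsor_code}-{number}"


def parse_usd(text: str) -> float:
    """Turn '$1,500,000' into a plain number with no symbol or commas."""
    return float(text.replace("$", "").replace(",", "").strip())


def funding_ceiling(per_year_cap: float, max_years: int) -> float:
    """Per-award total when the announcement states a per-year cap."""
    return per_year_cap * max_years


def expected_awards(low: int, high: int) -> int:
    """Use the upper bound when the announcement gives a range."""
    return max(low, high)


def years_to_months(years: int) -> int:
    """max_duration_months is always expressed in months."""
    return years * 12


print(make_rfa_id("NSF", "26-508"))          # identifier example from the rules
print(to_iso_date("March 5, 2026"))          # illustrative date
print(funding_ceiling(parse_usd("$500,000"), 3))
print(expected_awards(4, 6), years_to_months(3))
```

Time-of-day and timezone details are deliberately dropped by `to_iso_date`; per the rules above they belong in a submission_requirements entry rather than in the scalar.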
Multi-round announcements¶
When the announcement has more than one submission round, window, or cycle:
- Scalar deadline fields (submission_deadline, loi_deadline, preproposal_deadline) hold the LAST round's date, so the record remains open until the entire program closes.
- Emit one special_conditions entry per round with:
  - label: "Round N — Full Proposal Deadline" (or "LOI Deadline", "Preproposal Deadline" as appropriate)
  - code: "ROUND_N_FULL_PROPOSAL_DEADLINE" style
  - description: ISO date plus any local-time rule
  - source_section: citation to where the round dates appear
- If any round's dates appear inconsistent (e.g., a Round 2 LOI that falls after the Round 2 full proposal), include the anomalous date verbatim and add a note in description flagging the inconsistency. Do not silently correct the source.
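The multi-round rules above can be sketched as a small transformation from per-round dates to scalar deadlines plus special_conditions entries. This is an illustration only: the function name `round_entries`, the input shape (a list of dicts with optional "loi" and "full" dates), the wording of the inconsistency note, and the sample dates are all assumptions. Preproposal deadlines are omitted for brevity, and when the final round lacks a given deadline the scalar keeps the latest round that had one.

```python
from datetime import date

# kind -> (label suffix, code suffix, scalar field); suffixes follow the prompt's style.
KINDS = {
    "loi": ("LOI Deadline", "LOI_DEADLINE", "loi_deadline"),
    "full": ("Full Proposal Deadline", "FULL_PROPOSAL_DEADLINE", "submission_deadline"),
}


def round_entries(rounds: list[dict], source_section: str) -> tuple[dict, list[dict]]:
    """Return (scalar deadlines, special_conditions entries) for ordered rounds."""
    scalars = {"loi_deadline": None, "submission_deadline": None}
    entries = []
    for n, rnd in enumerate(rounds, start=1):
        loi, full = rnd.get("loi"), rnd.get("full")
        # Flag, but never correct, an LOI that falls after its own full proposal.
        inconsistent = loi is not None and full is not None and loi > full
        for kind, (name, code, scalar) in KINDS.items():
            when = rnd.get(kind)
            if when is None:
                continue
            description = when.isoformat()
            if inconsistent:
                description += (
                    " (NOTE: source lists the LOI after the full proposal "
                    "deadline for this round; dates reproduced verbatim)"
                )
            entries.append({
                "label": f"Round {n} \u2014 {name}",  # \u2014 is the dash used in the label pattern
                "code": f"ROUND_{n}_{code}",
                "description": description,
                "is_required": True,
                "source_section": source_section,
            })
            scalars[scalar] = when.isoformat()  # later rounds overwrite earlier ones
    return scalars, entries


rounds = [  # illustrative dates, not taken from any real announcement
    {"loi": date(2026, 1, 15), "full": date(2026, 3, 1)},
    {"loi": date(2026, 9, 20), "full": date(2026, 9, 1)},
]
scalars, entries = round_entries(rounds, "Section IV.C")
print(scalars)
for entry in entries:
    print(entry["code"], "->", entry["description"])
```

The second sample round is deliberately inconsistent, so both of its entries carry the note while the dates themselves are left untouched.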
Requirement entry rules¶
Every entry in the nine list fields has this shape (see schema.json for the authoritative definition):
- label (required): human-readable name; preserve the announcement's exact wording for prescribed section headers, document titles, and review criteria.
- code: SCREAMING_SNAKE_CASE, max 50 chars, alphanumeric + underscore only. Generate deterministically from the label when no natural code exists (e.g., "Project Summary" → "PROJECT_SUMMARY"). Emit null if in doubt.
- description: expanded context: conditions under which it applies, consequences of noncompliance, related requirements. Preserve specific numbers, limits, and prescribed phrases verbatim.
- page_limit: integer page or word limit for required_documents and formatting entries; null elsewhere.
- format_spec: format string for required_documents and formatting entries (e.g., "PDF, single-column", "11pt Times New Roman, 1-inch margins"); null elsewhere.
- is_required (required boolean): true for mandatory items and for "no restriction" eligibility confirmations (the check itself is required). false only when the announcement explicitly makes the item optional.
- source_section: citation to the announcement section or parent guide reference (e.g., "PAPPG II.D.2.b", "Section IV.C", "NOT-AG-24-012"). Populate whenever traceable.
- structured_rule_type / structured_rule_value: machine-readable encoding for rules a downstream system might enforce programmatically. Use stable snake_case identifiers for types. See the table below.
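The code rule above (SCREAMING_SNAKE_CASE, at most 50 characters, alphanumeric plus underscore) can be made deterministic with a short helper. The sketch below is one possible derivation, not the ingest mapper's actual algorithm; the name `code_from_label` is hypothetical. It checks the result against the same pattern schema.json uses for code and returns None when nothing usable remains, which corresponds to emitting null.

```python
import re
from typing import Optional

CODE_PATTERN = re.compile(r"^[A-Z0-9_]{1,50}$")  # same pattern as schema.json


def code_from_label(label: str) -> Optional[str]:
    """Derive a SCREAMING_SNAKE_CASE code from a label, or None if unusable."""
    # Collapse every run of characters outside A-Z, a-z, 0-9 into one underscore.
    code = re.sub(r"[^A-Za-z0-9]+", "_", label).strip("_").upper()
    # Enforce the 50-character limit without leaving a trailing underscore.
    code = code[:50].rstrip("_")
    return code if CODE_PATTERN.match(code) else None


print(code_from_label("Project Summary"))
print(code_from_label("Data Management & Sharing Plan (DMSP)"))
print(code_from_label("???"))
```

Non-ASCII letters are treated as separators in this sketch, so labels in other scripts may collapse to None; a production mapper might transliterate first.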
Structured rule encodings¶
Use these structured_rule_type values consistently so downstream ingest can index rules by type:
| structured_rule_type | Typical structured_rule_value | Use in category |
|---|---|---|
| cost_sharing | "prohibited", "required", "optional", "required_<N>_percent" | budget_constraints |
| indirect_cost_cap_percent | "<N>" (e.g., "10", "25") | budget_constraints |
| indirect_cost_policy | e.g., "de_minimis_10", "institutional_negotiated", "not_allowed" | budget_constraints |
| salary_cap_annual_usd | "<amount>" (e.g., "221900") | budget_constraints |
| proposal_limit_per_institution | "<N>" | eligibility |
| proposal_limit_per_pi | "<N>" | eligibility |
| institution_type | e.g., "us_ihe", "nonprofit", "small_business", "unrestricted" | eligibility |
| pi_degree_required | e.g., "phd", "md", "phd_or_equivalent", "none" | eligibility |
| pi_citizenship | e.g., "us_citizen", "us_citizen_or_pr", "unrestricted" | eligibility |
| foreign_eligibility | e.g., "allowed", "prohibited", "with_approval" | eligibility |
| page_limit | "<N>" | formatting (redundant with page_limit field; use when rule applies to whole proposal) |
| mentoring_plan_required_if | e.g., "postdoc_funded", "grad_student_funded" | required_documents |
| dmp_required | "yes", "no" | required_documents |
| loi_required | "required", "optional", "prohibited" | submission_requirements |
When the announcement's semantics don't fit an existing type, invent a new snake_case identifier consistent with the conventions above. Do not leave an otherwise enforceable rule in prose-only form.
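To show why consistent structured_rule_type values matter, here is a minimal sketch of how a downstream consumer might index rules by type across the nine arrays. The function name `index_rules` and the sample record are hypothetical; the rule types and values used are taken from the table above, and the proposal limit of "2" is an illustrative number only.

```python
from collections import defaultdict


def index_rules(record: dict) -> dict:
    """Map structured_rule_type -> list of (category, value, label)."""
    index = defaultdict(list)
    for category, entries in record.items():
        if not isinstance(entries, list):
            continue  # scalar field, not a requirement array
        for entry in entries:
            rule_type = entry.get("structured_rule_type")
            if rule_type:
                index[rule_type].append(
                    (category, entry.get("structured_rule_value"), entry["label"])
                )
    return dict(index)


sample = {
    "rfa_title": "Example Program Title",  # illustrative scalar
    "budget_constraints": [
        {"label": "Cost Sharing", "is_required": True,
         "structured_rule_type": "cost_sharing",
         "structured_rule_value": "prohibited"},
    ],
    "eligibility": [
        {"label": "Limit on Number of Proposals per Organization",
         "is_required": True,
         "structured_rule_type": "proposal_limit_per_institution",
         "structured_rule_value": "2"},
        {"label": "Who May Serve as PI", "is_required": True},  # prose-only entry
    ],
}

for rule_type, hits in index_rules(sample).items():
    print(rule_type, hits)
```

Entries without a structured_rule_type are skipped by the index, which is the practical cost of leaving an enforceable rule in prose-only form.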
Extraction procedure¶
- Read the entire announcement first. Critical constraints are often in unexpected sections: eligibility in award conditions, page limits in review criteria, prohibited materials in supplemental guidance.
- Populate scalars. Parse every date to ISO, every currency to a plain number, every duration to months.
- Walk each category. For each of the nine requirement categories, extract every entry the announcement specifies. Do not omit an entry because it seems routine. Do not invent entries that the announcement does not state.
- Handle parent-guide inheritance. When the announcement defers to a parent guide (PAPPG, SF424 Application Guide, Merit Review Guide, DoD BAA instructions), include that guide's standard document and formatting requirements in the appropriate categories, annotated with source_section pointing to the parent-guide section. When the announcement explicitly deviates from or overrides the parent guide, also emit an entry in pappg_deviations (or the equivalent parent-guide-deviation category) naming what changed.
- Handle "no restriction" confirmations. When the announcement affirmatively states there are no restrictions on a dimension (e.g., "no restrictions on who may serve as PI"), emit that as an eligibility entry with is_required: true and a description recording the non-restriction; the absence of a restriction is a compliance fact downstream checkers will look for.
- Preserve ambiguity. When the announcement is ambiguous or contains an apparent error, emit the item using the announcement's exact wording and add a description note flagging the issue. Do not silently pick an interpretation.
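As an illustration of the "no restriction" step above, the sketch below builds an eligibility entry that records an affirmative non-restriction. The helper name `no_restriction_entry`, the label wording, and the description template are hypothetical choices, not conventions defined by this component; the "unrestricted" rule value and the quoted phrase come from the examples in this prompt.

```python
from typing import Optional


def no_restriction_entry(
    dimension: str,
    quoted_text: str,
    source_section: Optional[str] = None,
    rule_type: Optional[str] = None,
) -> dict:
    """Build an eligibility entry recording that a dimension is unrestricted."""
    return {
        "label": f"No restriction: {dimension}",  # hypothetical label wording
        "code": None,
        # Keep the announcement's own words so reviewers can verify the claim.
        "description": f'Announcement states: "{quoted_text}"',
        "page_limit": None,
        "format_spec": None,
        "is_required": True,  # the check itself is required
        "source_section": source_section,
        "structured_rule_type": rule_type,
        "structured_rule_value": "unrestricted" if rule_type else None,
    }


entry = no_restriction_entry(
    dimension="who may serve as PI",
    quoted_text="no restrictions on who may serve as PI",
    source_section="Section IV.C",   # illustrative citation
    rule_type="pi_citizenship",      # illustrative choice of rule type
)
print(entry)
```

The structured rule value is only set when a rule type is supplied, so an entry never carries a value without a type.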
Quality standards¶
- Completeness: every requirement stated in the document appears as an entry.
- Precision: use the announcement's exact wording for prescribed section headers and titles; preserve numbers, limits, and constraints verbatim.
- Actionability: each entry describes something a person or system can verify.
- Traceability: source_section populated whenever the announcement's structure supports it.
- Typed fidelity: dates in ISO, currencies as numbers, durations in months, booleans as JSON booleans. Never emit "null" as a string.
- Schema conformance: the output object validates against schema.json. No extra properties; all nine requirement arrays present.
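The typed-fidelity standard above lends itself to a quick lint pass before full schema validation. The sketch below is a hypothetical standard-library check (the name `lint_typed_fidelity` and the message wording are assumptions) that flags non-ISO dates, numbers emitted as strings, non-boolean is_required values, and the string "null". The date check is a shape check only and does not verify that the date exists on the calendar.

```python
import re

ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
DATE_FIELDS = ("announcement_date", "submission_deadline",
               "loi_deadline", "preproposal_deadline")
NUMBER_FIELDS = ("funding_floor", "funding_ceiling",
                 "expected_awards", "max_duration_months")


def lint_typed_fidelity(record: dict) -> list[str]:
    """Return human-readable problems; an empty list means no issues found."""
    problems = []
    for name in DATE_FIELDS:
        value = record.get(name)
        if value is not None and not (isinstance(value, str) and ISO_DATE.match(value)):
            problems.append(f"{name}: not an ISO date: {value!r}")
    for name in NUMBER_FIELDS:
        value = record.get(name)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if value is not None and (isinstance(value, bool)
                                  or not isinstance(value, (int, float))):
            problems.append(f"{name}: not a plain number: {value!r}")
    for key, value in record.items():
        if isinstance(value, list):
            for i, entry in enumerate(value):
                if not isinstance(entry.get("is_required"), bool):
                    problems.append(f"{key}[{i}].is_required: not a boolean")
                for field, inner in entry.items():
                    if inner == "null":
                        problems.append(f'{key}[{i}].{field}: string "null"')
        elif value == "null":
            problems.append(f'{key}: string "null"')
    return problems


bad = {  # deliberately malformed, illustrative values
    "submission_deadline": "03/01/2026",
    "funding_ceiling": "$1,500,000",
    "rfa_id": "null",
    "eligibility": [{"label": "x", "is_required": "true", "code": "null"}],
}
for problem in lint_typed_fidelity(bad):
    print(problem)
```

A lint like this complements schema validation rather than replacing it: the schema would accept the string "null" for a nullable string field such as rfa_id, which is exactly the case the last check catches.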
Produce the JSON now.
Output schema¶
Source: schema.json.
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://github.com/AI4RA/prompt-library/components/rfp-extraction-udm/schema.json",
"title": "RFP Extraction \u2014 UDM Output",
"description": "Flat JSON contract for a funding announcement extracted to the Unified Data Model as extended for research administration. Scalar fields populate a single RFA record; the nine list-valued fields each create typed requirement rows categorized for downstream ingest.",
"version": "1.0.0",
"type": "object",
"additionalProperties": false,
"required": [
"rfa_title",
"sponsor_name",
"required_documents",
"formatting",
"review_criteria",
"eligibility",
"budget_constraints",
"compliance",
"submission_requirements",
"special_conditions",
"pappg_deviations"
],
"properties": {
"rfa_id": {
"type": [
"string",
"null"
],
"description": "Stable identifier for the announcement. When the sponsor publishes an opportunity number, use it prefixed with the sponsor code (e.g., 'NSF-26-508'). Null when no canonical identifier is available; the ingest service will assign one."
},
"rfa_number": {
"type": [
"string",
"null"
],
"description": "Official announcement number as published by the sponsor (e.g., '26-508', 'PA-24-246', 'DE-FOA-0003117'). No prefix."
},
"rfa_title": {
"type": "string",
"minLength": 1,
"description": "Full title of the funding opportunity, including any track or component designation."
},
"sponsor_name": {
"type": "string",
"minLength": 1,
"description": "Full name of the lead sponsoring agency or organization (e.g., 'National Science Foundation'). Do not emit an identifier or abbreviation; the ingest service resolves this to a Sponsor_Organization_ID."
},
"program_code": {
"type": [
"string",
"null"
],
"description": "The sponsor's internal program or division code (e.g., 'NSF/TIP/ITE', 'NIH/NCI/CCR'). Null when not specified."
},
"announcement_url": {
"type": [
"string",
"null"
],
"format": "uri",
"description": "Canonical URL of the announcement on the sponsor's site."
},
"opportunity_number": {
"type": [
"string",
"null"
],
"description": "Grants.gov or other portal opportunity ID, when distinct from rfa_number."
},
"cfda_number": {
"type": [
"string",
"null"
],
"description": "CFDA / Assistance Listing number(s). Comma-separated when the announcement lists multiple (e.g., '47.070, 47.076')."
},
"announcement_date": {
"type": [
"string",
"null"
],
"format": "date",
"description": "ISO date (YYYY-MM-DD) the announcement was posted."
},
"submission_deadline": {
"type": [
"string",
"null"
],
"format": "date",
"description": "ISO date (YYYY-MM-DD) of the final full proposal deadline. For multi-round announcements, use the LAST round's full proposal deadline; document every round in special_conditions. Null only when the announcement is open-ended (rolling submissions)."
},
"loi_deadline": {
"type": [
"string",
"null"
],
"format": "date",
"description": "ISO date of the Letter of Intent / Notice of Intent deadline. For multi-round announcements, use the LAST round's LOI deadline; document every round in special_conditions."
},
"preproposal_deadline": {
"type": [
"string",
"null"
],
"format": "date",
"description": "ISO date of the preliminary proposal / pre-application / white paper deadline, when distinct from LOI."
},
"funding_floor": {
"type": [
"number",
"null"
],
"minimum": 0,
"description": "Minimum award amount per award in USD (plain number, no currency symbol). Null when not stated."
},
"funding_ceiling": {
"type": [
"number",
"null"
],
"minimum": 0,
"description": "Maximum total award amount per award in USD (plain number). When the announcement states a per-year cap over a multi-year project, emit the per-award total (per-year cap \u00d7 maximum years). Null when not stated."
},
"expected_awards": {
"type": [
"integer",
"null"
],
"minimum": 0,
"description": "Estimated number of awards. When stated as a range ('4 to 6'), use the upper bound."
},
"max_duration_months": {
"type": [
"integer",
"null"
],
"minimum": 1,
"description": "Maximum project duration in MONTHS (not years). Convert years to months by multiplying by 12. Use the standard maximum; document any conditional extension (e.g., '4th year with phase-out plan') in special_conditions."
},
"submission_method": {
"type": [
"string",
"null"
],
"description": "Submission portal as named by the sponsor (e.g., 'Research.gov', 'Grants.gov', 'eRA Commons', 'PAMS')."
},
"rfa_status": {
"type": [
"string",
"null"
],
"enum": [
"Active",
"Closed",
"Archived",
"Draft",
null
],
"description": "Lifecycle state. Use 'Active' unless the announcement explicitly indicates closed, archived, or draft status."
},
"required_documents": {
"type": "array",
"description": "Documents the proposal must or may include (Project Summary, Project Description, Budget Justification, Biosketches, etc.). Both required and explicitly-optional documents go here; distinguish via is_required.",
"items": {
"$ref": "#/$defs/requirementEntry"
}
},
"formatting": {
"type": "array",
"description": "Formatting constraints: page limits, font, margins, line spacing, numbering.",
"items": {
"$ref": "#/$defs/requirementEntry"
}
},
"review_criteria": {
"type": "array",
"description": "Merit review criteria (standard agency criteria plus any solicitation-specific additions). Include weights in description when stated.",
"items": {
"$ref": "#/$defs/requirementEntry"
}
},
"eligibility": {
"type": "array",
"description": "Eligibility rules for organizations, PIs, Co-PIs, and personnel. Include 'no restriction' statements explicitly \u2014 the absence of a restriction is itself a compliance fact.",
"items": {
"$ref": "#/$defs/requirementEntry"
}
},
"budget_constraints": {
"type": "array",
"description": "Budget rules: caps, F&A limits, cost sharing, equipment thresholds, participant support, subaward rules, salary caps.",
"items": {
"$ref": "#/$defs/requirementEntry"
}
},
"compliance": {
"type": "array",
"description": "Compliance disclosure requirements (COI, SFI, RCR training, foreign interference, responsible conduct of research, etc.).",
"items": {
"$ref": "#/$defs/requirementEntry"
}
},
"submission_requirements": {
"type": "array",
"description": "Submission logistics beyond the portal name: file formats, collaborative proposal rules, AOR requirements, multi-institution coordination.",
"items": {
"$ref": "#/$defs/requirementEntry"
}
},
"special_conditions": {
"type": "array",
"description": "Special award conditions and post-award obligations. Also the correct home for per-round deadline details in multi-round announcements, conditional duration extensions, and coordination/collaboration mandates.",
"items": {
"$ref": "#/$defs/requirementEntry"
}
},
"pappg_deviations": {
"type": "array",
"description": "Places where the announcement explicitly deviates from the agency's parent proposal guide (PAPPG, SF424, Merit Review Guide, etc.). Highest-risk items for compliance review.",
"items": {
"$ref": "#/$defs/requirementEntry"
}
}
},
"$defs": {
"requirementEntry": {
"type": "object",
"additionalProperties": false,
"required": [
"label",
"is_required"
],
"properties": {
"label": {
"type": "string",
"minLength": 1,
"description": "Human-readable name of the requirement, preserving the announcement's exact wording for prescribed terms."
},
"code": {
"type": [
"string",
"null"
],
"pattern": "^[A-Z0-9_]{1,50}$",
"description": "Machine-readable code (SCREAMING_SNAKE_CASE, max 50 chars). Omit or set null when no natural code exists; the ingest mapper will auto-generate one from the label."
},
"description": {
"type": [
"string",
"null"
],
"description": "Expanded detail or context \u2014 conditions under which it applies, consequences of noncompliance, related requirements."
},
"page_limit": {
"type": [
"integer",
"null"
],
"minimum": 1,
"description": "Page or word limit for documents and formatting items. Null for non-paginated items."
},
"format_spec": {
"type": [
"string",
"null"
],
"description": "Format specification for documents and formatting items (e.g., 'PDF, single-column', '11pt Times New Roman')."
},
"is_required": {
"type": "boolean",
"description": "True for mandatory requirements, false for optional. For 'no restriction' eligibility entries, is_required is true (the check itself is required)."
},
"source_section": {
"type": [
"string",
"null"
],
"description": "Citation to the source location in the announcement or parent guide (e.g., 'PAPPG II.D.2.b', 'Section IV.C', 'NOT-AG-24-012'). Populate whenever traceable."
},
"structured_rule_type": {
"type": [
"string",
"null"
],
"description": "Machine-readable rule type for entries that can be enforced programmatically. Use a stable snake_case identifier."
},
"structured_rule_value": {
"type": [
"string",
"null"
],
"description": "Value associated with structured_rule_type (e.g., 'prohibited', 'required_10_percent', '200000', 'PhD')."
}
}
}
}
}
Evals¶
Reference cases¶
Golden cases under evals/cases/.
- NSF_26-508: Parity case with components/rfp-extraction/evals/cases/NSF_26-508. The two (artifacts: input, expected)
Changelog¶
Source: CHANGELOG.md.
All notable changes to this component. Versions follow semver: MAJOR for output-contract breaks (schema changes that drop or rename fields, or change array semantics), MINOR for backward-compatible additions (new optional fields, new structured rule types, new manifestations), PATCH for wording or clarity with no behavior change expected.
The schema.json version is kept in lockstep with the component version.
[1.0.0] — 2026-04-18¶
- Initial version.
- JSON Schema (schema.json) defining the flat output: 18 scalar opportunity fields plus nine categorized requirement arrays, each holding a shared requirement-entry shape.
- Canonical prompt (prompt.md) covering date/currency/duration normalization, multi-round deadline handling, parent-guide inheritance, "no restriction" confirmations, and structured rule encodings.
- Claude Skill manifestation (skill/SKILL.md) with a description tuned for skill-router matching on machine-readable extraction intents.
- First golden eval case: NSF 26-508 (TechAccess), with expected.json covering multi-round scheduling, LOI requirements, per-institution proposal limit, cost-sharing prohibition, five mandated project-description section headers, five solicitation-specific review criteria, and four parent-guide deviations.