
proposal-document-completeness-udm

Slug: proposal-document-completeness-udm
Version: 0.1.0
Status: experimental
Last fully evaluated: none
Eval state: no validated eval cases
Category: extraction
Domain: research-administration
Manifestations: prompt
Created: 2026-04-30
Updated: 2026-04-30

Tags: pre-award proposal completeness near-final-review nsf nih gap-analysis udm structured-extraction json

Audience: sponsored-programs-staff, pre-award-teams, ingest-pipelines

Manifestations in repo: prompt.md

Automates the near-final document-completeness review of a proposal package: identifies all senior key personnel, the four required documents per person (biosketch, current & pending, collaboration & affiliation, synergistic activities), subaward documents, and conditional-requirement triggers. Produces a single gap-analysis JSON for sponsored-programs analysts to use when sending a "missing documents" message back to the PI.

Output contract: schema.json
Contract scope: repo-local, UDM-aligned

Inputs

A proposal package — VERAS upload bundle, NSF/NIH proposal in PDF/DOCX form, plus the relevant solicitation (RFA / FOA / NOFO).

Outputs

A single JSON object with three logical layers:

  • As-found inventory: sponsor_name, rfa_foa_number, proposal_track_type, review_type, senior_key_personnel, budget_personnel, has_postdocs_or_grad_students, has_subawards, uploaded_documents
  • Sponsor requirements: required_documents_checklist, conditional_requirements, per_person_required_documents, subaward_required_documents
  • Gap analysis: per_person_document_matrix (one row per person × four required documents with a missing list), subaward_documents (one row per subawardee × six required documents), personnel_discrepancies, prioritized_missing

See schema.json for the authoritative definition and prompt.md for the encoding rules (booleans must be derived from observable signals; the matrix is keyed on senior_key_personnel; prioritized_missing ranks compliance-critical gaps first).
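To make the three layers concrete, here is a minimal sketch of one output object written as a Python dict and serialized to JSON. Only the key names come from the contract. Every value (the person, the document names, the wording of the prioritized_missing entries) is a hypothetical placeholder, and using the matrix key names inside missing is an assumption, since the contract does not fix the exact strings.

```python
import json

# Hypothetical values throughout; only the key names come from the contract.
example_output = {
    # Layer 1: as-found inventory
    "sponsor_name": "National Science Foundation",
    "rfa_foa_number": None,
    "proposal_track_type": None,
    "review_type": "Not specified",
    "senior_key_personnel": [
        {"name": "Jane Doe", "role": "PI", "institution": None},
    ],
    "budget_personnel": ["Jane Doe"],
    "has_postdocs_or_grad_students": False,
    "has_subawards": False,
    "uploaded_documents": ["Jane Doe biosketch"],
    # Layer 2: sponsor requirements
    "required_documents_checklist": [
        {"document_name": "Example required document",
         "required_for": "all proposals", "present": False, "notes": None},
    ],
    "conditional_requirements": [],
    "per_person_required_documents": [
        "biosketch", "current_and_pending",
        "collaboration_and_affiliation", "synergistic_activities",
    ],
    "subaward_required_documents": [],
    # Layer 3: gap analysis
    "per_person_document_matrix": [
        {"person_name": "Jane Doe", "biosketch": True,
         "current_and_pending": False,
         "collaboration_and_affiliation": False,
         "synergistic_activities": False,
         "missing": ["current_and_pending",
                     "collaboration_and_affiliation",
                     "synergistic_activities"]},
    ],
    "subaward_documents": [],
    "personnel_discrepancies": [],
    "prioritized_missing": [
        "Example required document is missing",
        "Jane Doe: current_and_pending is missing",
        "Jane Doe: collaboration_and_affiliation is missing",
        "Jane Doe: synergistic_activities is missing",
    ],
}

# None serializes to JSON null, matching the nullable string fields.
print(json.dumps(example_output, indent=2))
```
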

Contract scope

Repo-local, UDM-aligned. sponsor_name resolves to UDM Sponsor_Organization; senior_key_personnel rows resolve to Personnel; has_subawards: true triggers Subaward presence; the proposal record itself resolves to Proposal. The structured shape does not duplicate any shared UDM schema — it mirrors the deliverable produced by the proposal-document-completeness Vandalizer workflow in the ui-insight/ProcessMapping process-mapping corpus.

Triad integration

  • Evaluation datasets: none yet — planned: NSF proposal with TTP-P track triggering letter-of-collaboration; NIH proposal with postdocs triggering mentoring plan; proposal with subawards exercising the subaward matrix; proposal with name mismatches between budget and Section 2.
  • Harness notes: canonical manifestation is prompt.md. Validation surface is schema.json. The companion top-level workflows/proposal-document-completeness Vandalizer workflow at v0.1.0 implements the contract as two parallel Extraction tasks (proposal components + sponsor requirements) plus a Consolidation Prompt; record both single-call and post-consolidation signals when both are available.
  • Shared UDM relationship: aligned, not owning.

Runtime topology — the Vandalizer workflow

The canonical runtime is the proposal-document-completeness workflow shipped at the top level of this repo.

  • Step 1 (parallel Extraction) — two Extraction tasks. extract-proposal-components captures what's in the package (senior key personnel, budget personnel, subaward presence, the per-person matrix). extract-sponsor-requirements captures what should be there per the solicitation (required documents checklist, conditional requirements, per-person and per-subawardee required documents).
  • Step 2 (Consolidation Prompt) — joins the as-found inventory with the sponsor requirements, derives the missing lists per person and per subawardee, computes the conditional triggers, surfaces personnel discrepancies, and ranks prioritized_missing.
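The join performed in Step 2 can be sketched in plain Python. This is not the Vandalizer implementation, which is a prompt. It is an illustration under several assumptions: the two Step 1 fragments arrive as dicts using the contract's key names, present and triggered are already set, names and document titles are compared by exact string match, and every absent sponsor-required document is treated as compliance-critical. The function name consolidate and the message wording are hypothetical. The tier order follows encoding rule 5 of the prompt.

```python
from typing import Any

DOC_KEYS = ("biosketch", "current_and_pending",
            "collaboration_and_affiliation", "synergistic_activities")


def consolidate(components: dict[str, Any],
                requirements: dict[str, Any]) -> dict[str, Any]:
    """Join the two Step 1 fragments into gap-analysis fields (illustrative)."""
    # Per-person missing list: every document flag that is False for the row.
    matrix = [
        {**row, "missing": [key for key in DOC_KEYS if not row[key]]}
        for row in components["per_person_document_matrix"]
    ]

    # Naive discrepancy check: exact-string comparison of names only.
    budget_names = set(components["budget_personnel"])
    discrepancies = [
        f"{person['name']} is senior key personnel but is not named in the budget"
        for person in components["senior_key_personnel"]
        if person["name"] not in budget_names
    ]

    uploaded = set(components["uploaded_documents"])
    # Tier (a): sponsor-required documents that are absent
    # (assumed compliance-critical for this sketch).
    tier_a = [
        f"{item['document_name']} is missing"
        for item in requirements["required_documents_checklist"]
        if not item["present"]
    ]
    # Tier (b): per-person required documents.
    tier_b = [
        f"{row['person_name']}: {doc} is missing"
        for row in matrix
        for doc in row["missing"]
    ]
    # Tier (c): conditional documents whose condition is triggered.
    tier_c = [
        f"{req['document_name']} is missing (condition: {req['condition']})"
        for req in requirements["conditional_requirements"]
        if req["triggered"] and req["document_name"] not in uploaded
    ]

    return {
        "per_person_document_matrix": matrix,
        "personnel_discrepancies": discrepancies,
        "prioritized_missing": tier_a + tier_b + tier_c,
    }
```

Real packages will need fuzzier name and document matching than the exact-string comparison used here, which is part of why the workflow delegates this step to a prompt.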

Manifestations

  • prompt.md — canonical, LLM-agnostic prompt

Evals

See evals/ for reference inputs and known-good outputs. As of 0.1.0 no validated eval cases exist yet.

Provenance

Authored 2026-04-30 against the proposal-document-completeness (Workflow_ID: WF-PROPOSAL-DOC-COMPLETENESS) process-mapping workflow in ui-insight/ProcessMapping at commit b7176b0c913833a205efdb5e4ba00c17ff88af0f, which was built from walkthrough transcripts of the 13-step near-final-doc-review process. It was created to make that workflow a harness-evaluatable, versioned artifact.

Contract scope

  • Output format: json_object

  • Contract scope: shared_udm_semantics_repo_local_schema

  • Validation surfaces: json_schema

  • Schema entrypoints: #

  • Notes: Repo-local pre-award document-completeness gap-analysis contract. Three layers (as-found inventory, sponsor requirements, gap analysis) including a per-person document matrix and a prioritized missing list ranked by severity. Designed to drive sponsored-programs analyst "missing documents" messages back to the PI.

  • Machine-readable catalog entry: component_catalog.json

Triad integration

  • UDM alignment: shared_udm_semantics_repo_local_schema — sponsor_name resolves to UDM Sponsor_Organization; senior_key_personnel rows resolve to Personnel; has_subawards: true triggers Subaward presence; the proposal record itself resolves to Proposal.

  • Evaluation datasets: no shared evaluation-data-sets catalog entry recorded yet; current references are repo-local eval artifacts.

  • Harness notes: Validate JSON outputs against schema.json. Canonical single-call invocation surface is prompt.md. The companion top-level workflows/proposal-document-completeness Vandalizer workflow at v0.1.0 implements the contract as two parallel Extraction tasks (proposal components inventory + sponsor requirements) plus a Consolidation & Gap Analysis Prompt that joins the fragments, derives present/triggered booleans, computes per-person and per-subawardee missing lists, and ranks prioritized_missing.

  • Related component: proposal-budget-personnel-extraction-udm (consumer_of) — Consumes senior_key_personnel, has_postdocs_or_grad_students, and has_subawards from proposal-budget-personnel-extraction-udm to compute conditional requirements.
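The harness note about validating outputs against schema.json can be approximated with a quick structural pre-check that uses only the standard library. It is a sketch rather than a substitute for a full JSON Schema validator: it looks only at the top-level required list, the closed key set implied by additionalProperties: false, and the review_type enum, all copied from schema.json. The function name quick_contract_check is hypothetical.

```python
# Top-level "required" list, copied from schema.json.
REQUIRED_KEYS = [
    "sponsor_name", "senior_key_personnel", "budget_personnel",
    "has_postdocs_or_grad_students", "has_subawards", "uploaded_documents",
    "required_documents_checklist", "per_person_required_documents",
    "per_person_document_matrix", "subaward_documents",
    "personnel_discrepancies", "prioritized_missing",
]
# Properties the schema defines but does not require.
OPTIONAL_KEYS = [
    "rfa_foa_number", "proposal_track_type", "review_type",
    "conditional_requirements", "subaward_required_documents",
]
# A tuple (not a set) so that unhashable values are compared, not hashed.
REVIEW_TYPES = ("7-day basic review", "10-day full review", "Not specified", None)


def quick_contract_check(output: dict) -> list[str]:
    """Return a list of top-level contract problems; empty means none found."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in output:
            problems.append(f"missing required key: {key}")
    allowed = set(REQUIRED_KEYS) | set(OPTIONAL_KEYS)
    for key in output:
        if key not in allowed:
            problems.append(f"unexpected key: {key}")
    # review_type is optional; an absent key reads as None, which is allowed.
    if output.get("review_type") not in REVIEW_TYPES:
        problems.append(f"review_type not in enum: {output['review_type']!r}")
    return problems
```

Nested item shapes (for example, the required fields of each matrix row) are left to the real schema validation step.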

Prompt body

Source: prompt.md.


Proposal Document Completeness — UDM JSON

Purpose: Automate the near-final document-completeness review of a proposal package: identify all senior key personnel, the four required documents per person (biosketch, current & pending, collaboration & affiliation, synergistic activities), subaward documents, and conditional-requirement triggers, then produce a comprehensive gap-analysis JSON.

Expected input: A proposal package — VERAS upload bundle, NSF/NIH proposal in PDF/DOCX form, plus the relevant solicitation (RFA / FOA / NOFO).

Expected output: A single JSON object that validates against schema.json. No prose, no markdown outside the JSON.

When to use this contract

This is the gap-analysis cut of a proposal package, designed to be run when the package is close to submission and a sponsored-programs analyst is about to send it back to the PI for final fixes. It produces a single object with three logical layers: the as-found inventory (what is uploaded, who is named, what subawards exist), the sponsor requirements (what should be there per the solicitation and agency policy), and the gap analysis (what is missing, per requirement and per person).

UDM-aligned: senior key personnel resolve to UDM Personnel; subaward presence resolves to Subaward; sponsor identity resolves to Sponsor_Organization; the proposal record itself resolves to Proposal.

This component does not cover the budget-personnel extraction — that lives in proposal-budget-personnel-extraction-udm (which feeds the Senior_Key_Personnel and Has_Postdocs_Or_Grad_Students signals consumed here). It does not cover the solicitation extraction itself — that lives in rfa-checklist-extraction-udm or foa-checklist-extraction-udm.


Prompt

You are running a near-final document-completeness review on a proposal package. Your job is to identify everything that is uploaded, everything that should be uploaded per the sponsor's requirements, and everything missing — both in aggregate and per person.

Be 100% accurate. The downstream consumer of this output is a sponsored-programs analyst preparing a "missing documents" message to the PI; a fabricated entry costs the analyst a back-and-forth with the PI and erodes trust in the workflow. When a value is not present, set it to null or — for arrays/tables — return an empty array. Do not invent personnel, documents, or requirements.

Search the entire upload bundle and the solicitation for content in or near sections titled Section 2 (Personnel), Section 8 (Subawards), Section 9.2 (Budget), Proposal Documents, Senior Key Personnel, Solicitation Requirements, Proposal Preparation Instructions, Required Documents, or Submission Requirements.

Return a single JSON object that validates against schema.json with these top-level keys:

  • sponsor_name — sponsor agency name (e.g., "National Science Foundation"). Required.

  • rfa_foa_number — solicitation number (e.g., "NSF 26-508"). String or null.

  • proposal_track_type — proposal track type that may trigger additional requirements (e.g., "TTP-P", "standard", "track-2"). String or null.

  • review_type — one of "7-day basic review", "10-day full review", "Not specified", or null.

  • senior_key_personnel — array of {name, role, institution} objects. One row per named person.

  • budget_personnel — flat list of strings naming personnel as identified from the budget document.

  • has_postdocs_or_grad_students — boolean.

  • has_subawards — boolean.

  • uploaded_documents — flat list of strings naming each currently uploaded document.

  • required_documents_checklist — array of {document_name, required_for, present, notes} objects covering every document the sponsor requires.

  • conditional_requirements — array of {condition, document_name, triggered, notes} objects covering documents that are required only when a condition holds (e.g., mentoring plan when postdocs present).

  • per_person_required_documents — flat list of document names required per senior key person (typically the four NSF documents).

  • per_person_document_matrix — array of {person_name, biosketch, current_and_pending, collaboration_and_affiliation, synergistic_activities, missing} objects. One row per senior key person; missing is a list of the four document names that are missing for that person.

  • subaward_documents — array of {subawardee_name, commitment_form, budget, budget_justification, senior_key_docs, scope_of_work, facilities_equipment, missing} objects (when has_subawards: true). Empty array otherwise.

  • subaward_required_documents — flat list of strings naming the documents required per subawardee per sponsor policy.

  • personnel_discrepancies — flat list of strings describing discrepancies between budget personnel and senior-key-personnel listings (e.g., duplicates, missing entries, name mismatches). Empty array when none.

  • prioritized_missing — ordered array of strings describing the missing items the analyst should ask the PI about, ranked by severity (e.g., compliance-critical first).

Encoding rules

  1. Booleans must be derived from observable signals. has_subawards is true only when a subaward commitment form, subaward budget, or subaward narrative is present in the package. has_postdocs_or_grad_students is true only when the budget actually lists a postdoc or graduate-student line.

  2. per_person_document_matrix is keyed on senior_key_personnel. Every named person in senior_key_personnel must have exactly one row in the matrix. The four document booleans (biosketch, current_and_pending, collaboration_and_affiliation, synergistic_activities) are true iff that document is uploaded and matches that person's name; missing is the list of false-valued document names for that row.

  3. subaward_documents rows are keyed on each named subawardee. When has_subawards: false, this is an empty array.

  4. conditional_requirements.triggered records whether the condition holds for this proposal. For "mentoring plan if postdocs/grads", triggered must equal has_postdocs_or_grad_students, so the analyst can cross-check the two.

  5. prioritized_missing is opinionated. Rank: (a) compliance-critical items the sponsor will reject the proposal for missing, (b) per-person required documents, (c) conditional documents whose condition is triggered, (d) optional improvements. Do not include items that are present.

  6. Do not output any text outside the single JSON object.
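Encoding rules 2 to 4 are mechanical enough to spot-check after the fact. The sketch below is a hypothetical post-hoc checker, not part of the component. It compares the length of each missing list with the number of false flags, because the contract does not fix the exact strings used in missing, and it finds the mentoring-plan requirement with a substring match on document_name, which is a heuristic assumption.

```python
DOC_KEYS = ("biosketch", "current_and_pending",
            "collaboration_and_affiliation", "synergistic_activities")


def check_encoding_rules(output: dict) -> list[str]:
    """Return readable violations of encoding rules 2 to 4 (illustrative)."""
    problems = []

    # Rule 2: exactly one matrix row per named senior key person.
    people = sorted(p["name"] for p in output["senior_key_personnel"])
    rows = sorted(r["person_name"] for r in output["per_person_document_matrix"])
    if people != rows:
        problems.append(
            "rule 2: matrix rows do not match senior_key_personnel one-to-one")

    # Rule 2: missing carries one entry per false-valued document flag.
    for row in output["per_person_document_matrix"]:
        false_flags = [key for key in DOC_KEYS if not row[key]]
        if len(row["missing"]) != len(false_flags):
            problems.append(
                f"rule 2: missing list for {row['person_name']} disagrees with flags")

    # Rule 3: no subaward rows unless subawards were observed.
    if not output["has_subawards"] and output["subaward_documents"]:
        problems.append(
            "rule 3: subaward_documents must be empty when has_subawards is false")

    # Rule 4 (heuristic): the mentoring-plan trigger mirrors the postdoc/grad flag.
    # conditional_requirements is optional in the schema, hence .get().
    for req in output.get("conditional_requirements", []):
        if "mentoring" in req["document_name"].lower():
            if req["triggered"] != output["has_postdocs_or_grad_students"]:
                problems.append(
                    "rule 4: mentoring-plan trigger disagrees with "
                    "has_postdocs_or_grad_students")
    return problems
```

An empty list means no violation was detected by these checks, not that the extraction is correct; whether a document really belongs to a given person still needs human review.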

Output

A single JSON object. No surrounding markdown.
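A caller can enforce the "single JSON object, no surrounding markdown" requirement at parse time. The following is a minimal sketch using Python's standard json module; the function name parse_model_output is hypothetical, and rejecting wrapped output outright (rather than trying to strip fences) is a design assumption, not something the component prescribes.

```python
import json


def parse_model_output(raw: str) -> dict:
    """Accept only a bare JSON object; reject fenced or prose-wrapped output."""
    text = raw.strip()
    try:
        # json.loads fails on leading fences and on trailing extra text.
        parsed = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not bare JSON: {exc}") from exc
    if not isinstance(parsed, dict):
        raise ValueError("top-level JSON value must be an object")
    return parsed
```

Failing loudly here keeps contract violations visible in harness results instead of silently repairing them.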

Output schema

Source: schema.json.

{

  "$schema": "https://json-schema.org/draft/2020-12/schema",

  "$id": "https://github.com/AI4RA/prompt-library/components/proposal-document-completeness-udm/schema.json",

  "title": "Proposal Document Completeness \u2014 UDM Output",

  "description": "JSON contract for the near-final document-completeness review of a proposal package. Captures the as-found inventory, sponsor requirements, and the gap analysis (per requirement and per person), with explicit boolean flags that downstream UDM ingest can use to populate Proposal, Personnel, and Subaward records.",

  "version": "0.1.0",

  "type": "object",

  "additionalProperties": false,

  "required": [

    "sponsor_name",

    "senior_key_personnel",

    "budget_personnel",

    "has_postdocs_or_grad_students",

    "has_subawards",

    "uploaded_documents",

    "required_documents_checklist",

    "per_person_required_documents",

    "per_person_document_matrix",

    "subaward_documents",

    "personnel_discrepancies",

    "prioritized_missing"

  ],

  "properties": {

    "sponsor_name": {

      "type": "string",

      "minLength": 1,

      "description": "Sponsor agency name. Resolves to UDM Sponsor_Organization."

    },

    "rfa_foa_number": {

      "type": [

        "string",

        "null"

      ],

      "description": "Solicitation number (e.g., 'NSF 26-508')."

    },

    "proposal_track_type": {

      "type": [

        "string",

        "null"

      ],

      "description": "Proposal track type that may trigger additional requirements (e.g., 'TTP-P', 'standard')."

    },

    "review_type": {

      "type": [

        "string",

        "null"

      ],

      "enum": [

        "7-day basic review",

        "10-day full review",

        "Not specified",

        null

      ],

      "description": "Review pathway selected for this proposal."

    },

    "senior_key_personnel": {

      "type": "array",

      "items": {

        "type": "object",

        "additionalProperties": false,

        "required": [

          "name",

          "role"

        ],

        "properties": {

          "name": {

            "type": "string",

            "minLength": 1

          },

          "role": {

            "type": "string",

            "minLength": 1,

            "description": "PI / Co-PI / Senior Personnel."

          },

          "institution": {

            "type": [

              "string",

              "null"

            ]

          }

        }

      },

      "description": "Senior key personnel as identified in the proposal. Resolves to UDM Personnel."

    },

    "budget_personnel": {

      "type": "array",

      "items": {

        "type": "string",

        "minLength": 1

      },

      "description": "Personnel identified from the budget document. Used to cross-check against senior_key_personnel."

    },

    "has_postdocs_or_grad_students": {

      "type": "boolean",

      "description": "True iff the budget includes postdoctoral or graduate-student personnel lines."

    },

    "has_subawards": {

      "type": "boolean",

      "description": "True iff the proposal includes subaward(s). Resolves to UDM Subaward presence."

    },

    "uploaded_documents": {

      "type": "array",

      "items": {

        "type": "string",

        "minLength": 1

      },

      "description": "Flat list of documents currently uploaded in the proposal package."

    },

    "required_documents_checklist": {

      "type": "array",

      "items": {

        "type": "object",

        "additionalProperties": false,

        "required": [

          "document_name",

          "present"

        ],

        "properties": {

          "document_name": {

            "type": "string",

            "minLength": 1

          },

          "required_for": {

            "type": [

              "string",

              "null"

            ],

            "description": "Whom the document is required for (e.g., 'all proposals', 'each senior key person', 'each subawardee')."

          },

          "present": {

            "type": "boolean"

          },

          "notes": {

            "type": [

              "string",

              "null"

            ]

          }

        }

      },

      "description": "Sponsor-required documents with present/missing status."

    },

    "conditional_requirements": {

      "type": "array",

      "items": {

        "type": "object",

        "additionalProperties": false,

        "required": [

          "condition",

          "document_name",

          "triggered"

        ],

        "properties": {

          "condition": {

            "type": "string",

            "minLength": 1

          },

          "document_name": {

            "type": "string",

            "minLength": 1

          },

          "triggered": {

            "type": "boolean"

          },

          "notes": {

            "type": [

              "string",

              "null"

            ]

          }

        }

      },

      "description": "Documents required only when a condition is met (e.g., mentoring plan when has_postdocs_or_grad_students)."

    },

    "per_person_required_documents": {

      "type": "array",

      "items": {

        "type": "string",

        "minLength": 1

      },

      "description": "Document names required per senior key person (typically: biosketch, current & pending, collaboration & affiliation, synergistic activities)."

    },

    "per_person_document_matrix": {

      "type": "array",

      "items": {

        "type": "object",

        "additionalProperties": false,

        "required": [

          "person_name",

          "biosketch",

          "current_and_pending",

          "collaboration_and_affiliation",

          "synergistic_activities",

          "missing"

        ],

        "properties": {

          "person_name": {

            "type": "string",

            "minLength": 1

          },

          "biosketch": {

            "type": "boolean"

          },

          "current_and_pending": {

            "type": "boolean"

          },

          "collaboration_and_affiliation": {

            "type": "boolean"

          },

          "synergistic_activities": {

            "type": "boolean"

          },

          "missing": {

            "type": "array",

            "items": {

              "type": "string",

              "minLength": 1

            }

          }

        }

      },

      "description": "Per-person matrix of the four required documents. One row per senior key person."

    },

    "subaward_documents": {

      "type": "array",

      "items": {

        "type": "object",

        "additionalProperties": false,

        "required": [

          "subawardee_name",

          "commitment_form",

          "budget",

          "budget_justification",

          "senior_key_docs",

          "scope_of_work",

          "facilities_equipment",

          "missing"

        ],

        "properties": {

          "subawardee_name": {

            "type": "string",

            "minLength": 1

          },

          "commitment_form": {

            "type": "boolean"

          },

          "budget": {

            "type": "boolean"

          },

          "budget_justification": {

            "type": "boolean"

          },

          "senior_key_docs": {

            "type": "boolean"

          },

          "scope_of_work": {

            "type": "boolean"

          },

          "facilities_equipment": {

            "type": "boolean"

          },

          "missing": {

            "type": "array",

            "items": {

              "type": "string",

              "minLength": 1

            }

          }

        }

      },

      "description": "Per-subawardee matrix of required subaward documents. Empty array when has_subawards is false."

    },

    "subaward_required_documents": {

      "type": "array",

      "items": {

        "type": "string",

        "minLength": 1

      },

      "description": "Document names required per subawardee."

    },

    "personnel_discrepancies": {

      "type": "array",

      "items": {

        "type": "string",

        "minLength": 1

      },

      "description": "Discrepancies between budget personnel and senior-key-personnel listings (duplicates, missing entries, name mismatches)."

    },

    "prioritized_missing": {

      "type": "array",

      "items": {

        "type": "string",

        "minLength": 1

      },

      "description": "Ordered list of missing items to ask the PI about, ranked by severity (compliance-critical first)."

    }

  }

}

Changelog

Source: CHANGELOG.md.

All notable changes to this component. Versions follow semver: MAJOR for output-contract breaks, MINOR for backward-compatible additions, PATCH for wording or clarity.

[0.1.0] — 2026-04-30

  • Initial experimental release.
  • Schema derived from the proposal-document-completeness v2 Vandalizer workflow in ui-insight/ProcessMapping (two parallel Extraction tasks + Formatting task; 16 source fields).
  • Three-layer shape: as-found inventory (9 keys), sponsor requirements (4 keys), gap analysis (4 keys).
  • per_person_document_matrix realized as an array of {person_name, biosketch, current_and_pending, collaboration_and_affiliation, synergistic_activities, missing} objects (rather than the source Table field) so the per-person missing list is attached directly to the row.
  • subaward_documents realized as an array of {subawardee_name, commitment_form, budget, budget_justification, senior_key_docs, scope_of_work, facilities_equipment, missing} objects so the per-subawardee gap is computed inline.
  • review_type enum matches the source Review_Type Enum_Values (7-day basic review, 10-day full review, Not specified).
  • Cross-field rules from the source workflow (CFR-01 budget vs. senior-key match, CFR-02 mentoring plan trigger, CFR-03 subaward verification) are encoded by the schema shape itself and re-asserted in the prompt's encoding rules.
  • UDM column bindings preserved: sponsor_name → Sponsor_Organization; senior_key_personnel → Personnel; has_subawards triggers Subaward presence; the proposal record itself → Proposal.
  • No eval cases yet — status experimental until at least one golden extraction is added under evals/cases/.