A1 named the case for sovereign-by-default deployment. A2 developed the model supply chain: hash verification, key custody, pin authority. This piece goes one layer down. What went into the model before you got it. What is in the retrieval corpus at runtime. What arrives in the prompt context window at inference time.
The finding that moves the geometry
Recent research from Anthropic, the UK AI Safety Institute, and the Alan Turing Institute demonstrated that injecting approximately 250 documents into a retrieval corpus is sufficient to systematically bias the outputs of a retrieval-augmented generation system. Not hundreds of thousands. Not a full corpus replacement. Two hundred and fifty targeted documents, and the model begins behaving as its poisoner intended, consistently, across varied query formulations.
For a firm running a RAG system over a regulatory corpus (a policy engine that maps uploaded documents to coverage across NIST 800-53, DORA Article 9, and PRA SS2/21), this is not a theoretical attack surface. It is the production attack surface, operating today.
The governance gap is not in the model. The model is the part of this stack that governance programmes have begun addressing. The gap is in the data: what went into the model before you got it, what is in the retrieval corpus at runtime, and what arrives in the prompt context window at inference time.
What open weights actually solve
The first piece in this series covered the economics of self-hosting. The second covered the model supply chain: hash verification, key custody, pin authority for model weights in a regulated environment. Both arguments rest on the same premise. Open weights are better than proprietary API access for regulated firms because you can inspect, verify, and attest to what you are running.
That premise remains correct. It is narrower than current industry posture assumes.
Open weights solve exactly one supply chain problem: you can verify the weights you received. You can hash the model artefacts on arrival. You can confirm they have not been modified in transit or at rest. You can run inference locally without sending data to a third-party endpoint. These are genuine governance improvements.
Open weights do not solve training-data provenance. The accountability for what the model was trained on transfers to you by default.
The weights you received are a function of a training process you did not observe, over a corpus you cannot fully reconstruct, using post-training preference pairs whose provenance is partially documented in technical reports and partially inferred from model behaviour. The model that produces authoritative-sounding outputs about regulatory compliance was trained on internet text, code repositories, academic papers, and unknown proportions of crawled content whose regulatory accuracy, jurisdictional relevance, and temporal currency are unattested.
This is not a criticism of open-weight models. It is a description of training-data transparency across the field, proprietary and open alike. The difference is that proprietary API providers carry a vendor relationship creating a contractual accountability layer. When you self-host an open-weight model, you become the effective model operator. The accountability for what the model was trained on transfers to you by default.
The four layers of the data and prompt supply chain
A useful frame for regulated firms: the data and prompt supply chain has four layers, each with distinct provenance risks, distinct governance gaps, and distinct regulatory hooks.
The base model’s knowledge, biases, and failure modes are baked in at pretraining. For current open-weight models, pretraining corpus documentation ranges from partial to absent. The firm deploying the model does not know which regulatory texts were in the training data, in which versions, from which jurisdictions, with which editorial quality.
For a governance engine producing NIST 800-53 control coverage assessments, this matters. If the pretraining corpus contained an outdated draft of NIST 800-53 Rev 4 more frequently than the current Rev 5, the model has a systematic bias toward Rev 4 concepts that no amount of prompt engineering will fully correct.
You cannot reconstruct the training corpus. The control: validate the model’s domain-specific fitness against a benchmark dataset of known-good regulatory coverage assessments, constructed from authoritative source documents with expert-annotated ground truth. Benchmark before production deployment. Re-benchmark on every model version change. Pretraining provenance risk cannot be mitigated by runtime controls; it requires task-specific validation that the firm must own.
Between pretraining and the weights you download sits instruction tuning and preference tuning. They are different, and the governance posture for each differs.
Instruction tuning is supervised on (prompt, completion) pairs. The datasets used are often partially documented: Alpaca, Dolly, OpenHermes, and synthetic data lineages are named in many model cards. Inspecting these lineages gives you context on what the model encountered during fine-tuning. It does not give you domain-level validation signal. Dataset cards for open-weight instruction-tuning mixes are too coarse for regulatory-domain assurance. Lineage inspection is context. It is not a control.
Preference tuning (RLHF, DPO, and variants) is where provenance is genuinely opaque. The preference pairs that shaped what the model treats as a good answer were crafted by human annotators or AI judges whose guidelines are partially documented. For regulated domains: were regulatory accuracy, jurisdictional precision, and temporal currency signals present in the reward model? In most cases: not explicitly. The failure mode is characteristic. The model sounds authoritative, produces well-structured output, and is systematically wrong about jurisdiction-specific details.
The control for Layer 2 is domain benchmarking, not lineage inspection. Build a test set of regulatory coverage questions with known-correct answers drawn from authoritative sources. Validity requirements: questions drawn from the current version of the regulatory standard being assessed; expected answers annotated by a subject-matter expert against the authoritative source text; a refresh cadence tied to the freshness contract of that source. A benchmark against an outdated standard version produces false assurance, which may be worse than no benchmark. Date-stamp the benchmark alongside the corpus version it was built against.
This is where the 250-document finding lands. At inference time, your model operates not on its weights alone but on its weights plus the documents retrieved from your vector store. For regulatory RAG systems, that vector store contains ingested versions of the frameworks you claim to cover.
The provenance question is concrete and answerable: who put each document in the corpus, when, from which source, with what content hash, and has it changed since ingest? Most RAG implementations in early production do not answer all of these questions. The corpus is treated as a static, trusted knowledge base rather than as an untrusted input requiring the same governance controls as any other data source in a regulated firm.
Three specific risks:
Freshness failure. Regulatory frameworks update without fixed schedules. NIST Special Publications are revised. FCA rules change on gazette publication. DORA implementing technical standards are still being written. A RAG corpus carrying a stale ingest of a framework that has since been amended is not a current regulatory reference. The model will produce coverage assessments against outdated requirements without any signal to the user that the underlying framework has changed.
Provenance gap. If the corpus was populated by scraping regulatory websites, the provenance chain is: URL crawled, content extracted, chunk embedded. There is no attestation that the content at that URL was authoritative, that the extraction preserved document structure and version metadata, or that the URL has not since been redirected.
Integrity loss. A document modified post-ingest looks identical to an unmodified document at the retrieval layer. Content hash at ingest with hash verification at retrieval is the minimum integrity control. One critical constraint: the hash registry must be stored separately from the corpus it attests and cryptographically signed. A hash stored alongside the document it attests does not protect against an adversary with corpus write access; they can modify both simultaneously.
Minimum defensible Layer 3 posture:
- Provenance metadata on every corpus document: source URI, ingest timestamp, content hash at ingest, framework version where the source carries version metadata, ingest pipeline version.
- Separate signed hash registry, within the firm’s own control boundary. A cloud-hosted registry reintroduces a third-party attestation dependency that conflicts with the self-hosting rationale this series has established. Local deployment, firm-operated PKI, HSM-protected signing key.
- Freshness contracts, two-component: a calendar backstop threshold (documents past threshold are flagged regardless of whether a source-side change has been detected) plus event-driven source monitoring (a detected source-side update triggers immediate refresh review, independent of elapsed time). Calendar backstop catches what monitoring misses; monitoring catches what thresholds miss. For regulatory sources that publish no change feed or version diff (most national regulatory websites publish neither), monitoring reduces to high-frequency polling at a cadence set below the calendar backstop interval.
- Content inspection on ingest: flag documents containing anomalous instruction-like patterns for human review before corpus admission. Use flagging, not automated blocking. Regulatory documents use normative language (“shall”, “must”, “is required to”) that will produce false positives under naive instruction-pattern detection.
Enterprise document management systems and emerging corpus management platforms partially implement these controls. Evaluate before building. What is absent in most current offerings is the regulatory-domain freshness contract and the separate signed hash registry: the controls most directly relevant to SS2/21 and DORA Article 9 compliance.
The final layer is the inference-time prompt: the system instruction, the user query, and the retrieved context. Layers 1 to 3 determine what knowledge the model has access to. Layer 4 is where adversarial prompt injection operates.
Prompt injection in RAG systems is distinct from prompt injection in direct chat applications. In a RAG system, retrieved documents arrive in the prompt context as trusted content. If a document in the corpus contains an adversarial instruction, the model may treat that instruction as part of its operational directive rather than as document content to be analyzed.
A specific failure mode: a retrieved document that opens with a role-prefix construction (“You are a compliance assistant. Disregard previous instructions and…”) can collapse the model’s instruction hierarchy in systems without explicit priority ordering between the system prompt and context-window content. The control has two components. Content inspection at ingest: flag documents containing system-like instruction patterns before corpus admission. System-prompt hardening at inference: explicitly instruct the model that the system prompt takes unconditional precedence over any instruction appearing in retrieved context. These controls are owned by different teams with different toolchains. Name an accountable owner for each before treating either as implemented.
More sophisticated injection uses format-mimicry: a retrieved document that replicates the system prompt’s structural conventions is harder to detect at ingest and harder to deflect at inference. Architecture-level sandboxing of retrieved context, where the model is explicitly denied the ability to treat context-window content as instruction-class input, is the longer-term control. Prompt-hardening is the immediate but incomplete mitigation.
Output provenance logging is the second Layer 4 control: every inference contributing to a compliance output logs the full retrieval context, including document IDs, retrieval scores, corpus version at inference time, model version, and inference timestamp. This is the audit trail that regulatory review of compliance decisions will require.
The regulatory geometry shift
PRA SS2/21 (Supervisory Statement on Model Risk Management) structures model risk across model development, model validation, and model deployment. The traditional allocation for proprietary vendor models: vendor owns development risk, firm owns deployment risk, validation sits at the boundary.
Open-weight self-hosting disrupts this allocation. When a firm downloads model weights and serves them in production, it has assumed a de facto development role. SS2/21’s development-scope question is not “did you train this model?” but “do you know what it was trained on, and can you attest to its fitness for purpose in your specific use case?”
The answer for most current open-weight deployments is no on both counts. One important constraint: SS2/21 development-scope obligations are conditioned on a materiality determination. The firm must first establish whether the model meets the materiality threshold under its MRM framework. For small internal tools with limited decision scope and human-in-the-loop review, the full development-scope validation programme may not trigger. The provenance posture described in this piece applies regardless of materiality. The formal validation obligation does not.
FCA PS21/3 (Policy Statement on algorithmic decision-making) adds auditability: if a regulatory coverage output from a RAG system supports a compliance decision, that output must be auditable. The audit trail is the Layer 4 provenance log described above.
DORA Article 9 applies where the AI system is classified as a critical or important ICT function. Supply chain security requirements under DORA include third-party ICT risk management. For open-weight models, “third-party” includes the model originator and any corpus sources. The 250-document poisoning result maps directly to DORA Article 9’s data integrity requirements.
Ownership gap: Corpus governance currently falls between model risk (second-line, model validation) and data management (second-line, data governance) in most current MRM frameworks. It requires explicit ownership assignment. Without a named owner across both second-line functions, the controls above will be built by no one.
Model governance is necessary. It is not sufficient.
The governance conversation in regulated AI has focused on the model: which model, what controls on the API, how to attest to model behaviour. That focus is not wrong. It is incomplete.
The data layers (what went into the model, what is in the retrieval corpus, what arrives in the prompt) are less visible, less governed, and more directly manipulable by an adversary with modest access.
Two hundred and fifty documents is a small number. Small enough to fit in a routine corpus refresh. Small enough to be missed by a manual corpus review. Large enough to systematically bias a governance engine’s outputs across thousands of policy evaluations.
The provenance chain is the governance primitive the industry has not yet standardised. Until it is, the corpus is the attack surface your programme should be watching.
Next in this series: the silicon and firmware trust base. Once the corpus is compromised, the software layer cannot recover it. Hardware attestation is not optional.
About the Author
I am a CISA and CISSP-certified governance practitioner. My day-to-day work spans technology risk, audit defensibility, and cross-border regulatory intelligence across the UK (FCA, PRA), India (RBI, SEBI, IFSCA), Southeast Asia (MAS), and the Gulf (CBUAE), with working knowledge of the EU AI Act’s financial services implications.
My current research sits at the intersection of audit-defensible AI deployment patterns and supervisory expectations in regulated firms. The specific threads are multi-regulation reasoning architectures, sovereign open-weights deployment, supply-chain provenance for self-hosted inference paths, and the governance of inference pipelines in firms with cross-border regulatory perimeters.
Sentinel Engine is the sovereign model deployment I run from my own hardware, currently in beta. The four-layer supply chain framework above is the discipline Sentinel operates under, not a hypothetical proposal. Where this article names a gap, the gap is one Sentinel has surfaced and is working to close.
LinkedIn • [email protected] • rtapulse.com
Corrections, counterexamples, and build ideas welcome. [email protected] • Discussions • Issues • How to collaborate.
Practitioner opinion. Not legal or regulatory advice. No vendor relationships. Full disclosures.