This note sits beside the five-part AI Governance in Regulated Firms series, where an interactive walkthrough steps the same sovereign stack from the perimeter to the time axis.
An earlier piece asked three questions of a self-hosted model: whose hash backstops the weights, whose key signs the inference log, whose pin authority holds the runtime. This piece answers a question that one provoked. If those are the right questions for the model, what are they for everything around the model? The answer reframes the stack. Hash, key, and pin are not three layers. They are three properties you ask of every compartment. Once the stack is seen that way, the model becomes the cheap, swappable piece, and the durable value sits in the compartments a model swap never touches. That is a resilience property: a stack that survives the loss or replacement of its most visible component by how it is built, not by hoping the component never has to change.
The question, asked of the whole stack
The supply-chain framing in "Whose hash, whose key, whose pin" treated the model weights as the artefact and asked who stands behind each guarantee. The framing is correct but incomplete in one specific way: it reads as though hash, key, and pin describe three things (the weights, the log, and the runtime). They do not. They describe three properties, and every component in a self-hosted AI deployment has all three whether or not anyone has named them.
Integrity is the hash property: is this artefact the one I intended, unaltered. Attestation is the key property: can I prove what this component did, signed by something only I control. Version control is the pin property: is the running version the one I approved, unable to change without me. A model has all three. So does the inference runtime. So does the orchestration code, the guardrail layer, the embedding model, the vector store, and the signing infrastructure itself. The earlier piece asked the three questions of one compartment. The resilient posture asks them of every compartment, and writes down the answer for each.
This resolves a question that the supply-chain piece left ambiguous. If guardrails and the control plane need governing, do they fall under pin? Partly. The control plane has a version, so pin governs which version of the gateway, the agent loop, the prompt templates, and the guardrail configuration is running. But a guardrail also has behaviour, and behaviour is not a version question. Whether the guardrail that was meant to fire actually fired is an attestation question, answered by a signed decision log, which is the key property. Do not collapse the control plane into pin. Pin tells you which version of each compartment is live. Key tells you what each compartment did. They are different questions, and a regulated deployment needs both answered for the control plane, not only for the model.
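The difference between pin and key can be made concrete with a signed decision record. The following is a minimal sketch, not the author's implementation: it assumes HMAC-SHA256 from the Python standard library as a stand-in for whatever signing scheme a real deployment uses (an asymmetric signature is the more likely production choice), and the record fields (`guardrail`, `config_version`, `request_id`, `fired`) are hypothetical names chosen for illustration.

```python
import hashlib
import hmac
import json


def sign_decision(record: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_decision(record: dict, tag: str, key: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign_decision(record, key), tag)


# Hypothetical decision record; the field names are illustrative only.
decision = {
    "guardrail": "pii-filter",
    "config_version": "a1b2c3d",  # the pin answer: which version was live
    "request_id": "req-0001",
    "fired": True,                # the key answer: what the guardrail did
}

# Placeholder key for the sketch; a real key would never be hard-coded.
key = b"demo-key-held-by-the-firm"
tag = sign_decision(decision, key)
```

The `config_version` field carries the pin answer inside the record, while the tag over the whole record carries the key answer, so altering `fired` after the fact makes `verify_decision` return `False`.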
The compartments
A sovereign deployment is easier to govern, and far easier to recover, when it is drawn as a set of compartments rather than a single system. Each compartment is a separable piece with its own integrity, attestation, and version question, and crucially its own cost to replace. Drawing them this way exposes something the single-system view hides: the compartments are not equally swappable, and the cheap one to swap is the model itself.
The table below lists the compartments, with the questions to write down for each, the cost of swapping it, and whether it survives a model swap unchanged.
| Compartment | Questions to write down | Swap cost | Survives a model swap? |
|---|---|---|---|
| Model weights | Whose hash | Low | This is the piece being swapped |
| Inference runtime | Whose pin | Low to moderate | Yes, largely model-agnostic |
| Control plane and guardrails | Whose pin for the version, whose key for proof it fired | Moderate; prompts re-validate per model | Mostly, with prompt re-validation |
| Embedding model and vector store | Whose hash for corpus integrity, whose pin for the embedding-model version | High; a full re-index if the embedding model changes | Only if the embedding model is held fixed |
| Signing keys | Whose key | High; rotation is a governed event | Yes, model-independent |
| Hardware and firmware | Outside the three questions, a named gap | Very high | Not applicable |
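Writing down the answers for each compartment can be as simple as a small register that reports which questions are still open. This is a rough sketch under stated assumptions: the `Compartment` class, the `gaps` helper, and the owner names ("platform team", "change board", and so on) are all hypothetical and are not taken from any real deployment.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The three properties asked of every compartment.
PROPERTIES = ("hash", "key", "pin")


@dataclass
class Compartment:
    name: str
    # Maps a property to whoever stands behind it; a missing entry is an open question.
    answers: Dict[str, str] = field(default_factory=dict)


def gaps(compartments: List[Compartment]) -> Dict[str, List[str]]:
    """Map each compartment with open questions to its unanswered properties."""
    report = {}
    for c in compartments:
        missing = [p for p in PROPERTIES if not c.answers.get(p)]
        if missing:
            report[c.name] = missing
    return report


# Hypothetical register; the owner names are placeholders.
stack = [
    Compartment("Model weights",
                {"hash": "platform team", "key": "risk function", "pin": "change board"}),
    Compartment("Control plane and guardrails",
                {"pin": "change board"}),
    Compartment("Embedding model and vector store",
                {"hash": "data team", "key": "risk function"}),
]

open_questions = gaps(stack)
```

In this example the register flags the control plane as missing its hash and key answers and the embedding compartment as missing its pin, which is the compartment the article later identifies as the one teams most often forget to pin.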
Where the value actually sits
Read the swap-cost column again and the strategic point falls out. The model is cheap to replace. What is expensive to replace is everything the firm has accumulated around it: the document corpus and its embeddings, the prompt library tuned over months of production, the guardrail policies calibrated against real failure cases, the evaluation suite that defines what good output means for this firm, and the signed audit logs that prove the system behaved. None of that lives in the model. All of it lives in compartments a model swap does not touch.
This inverts the instinct that the model is the asset. The model is the commodity. A new open-weights model of comparable capability arrives every few months, and the firm can adopt it for the cost of a download and a verification run. The corpus, the prompts, the policies, and the evals are the durable capital, because they encode the firm's specific knowledge and the firm's specific definition of acceptable behaviour, and they took real time to build. A balance sheet that treats the model as the AI investment is looking at the cheap part.
Keep your learnings out of the model, so the model stays a thing you can replace. The architecture that makes the model disposable is the architecture that makes the firm sovereign.
ṛtaPulse research, June 2026
The sovereignty argument follows directly. Sovereignty is not which model you run. It is the ability to change which model you run, on your terms, without losing what you have built. A firm that can swap its generation model in a controlled week holds an option a firm with its value fused into one vendor's model does not. That option is what regulators are reaching for when they ask about concentration risk and exit strategy, even though a self-hosted open-weights model has no third-party provider to exit in the literal sense. The firm that compartmentalises has a practical exit from any single model, any single model-origin jurisdiction, and any single vendor's licensing terms, with its accumulated assets intact. One honest boundary: a decisive capability gap can still make a new model worth adopting on its merits, and the architecture exists precisely so the firm can take that gain without re-pouring its foundations.
Moving from Qwen to Llama, honestly
Take the concrete case. A firm runs a self-hosted assistant on one open-weights model and decides to move to another, say from a Qwen-family model to a Llama-family model, because of a capability gain, a licensing change, or a shift in the geopolitics of model origin. What moves with it, and what does not?
Three categories of accumulated value have three different portability properties, and conflating them is where migration plans go wrong.
The corpus and its embeddings port, with one condition. The vector store is built by an embedding model, not by the generation model. Swapping the generation model leaves the embeddings valid, so the corpus carries over without a re-index, but only if the embedding model is held fixed across the swap. Change the embedding model and every vector in the store is stale, forcing a full re-index of the entire corpus. The embedding model, not the generation model, is the real lock-in in most deployments, and it is the compartment most teams never think to pin.
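The condition above reduces to a single comparison on the pinned state. A minimal sketch, assuming the deployment records a name and a hash for both the generation model and the embedding model; the `ModelRef` and `StackState` types and every model name below are placeholders, not real artefacts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRef:
    name: str
    sha256: str


@dataclass(frozen=True)
class StackState:
    generation: ModelRef
    embedding: ModelRef


def needs_reindex(before: StackState, after: StackState) -> bool:
    """The vector store stays valid only while the embedding model is unchanged."""
    return before.embedding != after.embedding


# Placeholder references; the names and hashes are illustrative only.
embedder = ModelRef("example-embedder-v1", "e" * 64)
current = StackState(ModelRef("qwen-family-model", "a" * 64), embedder)

# Swap the generation model only: the embeddings carry over.
generation_swap = StackState(ModelRef("llama-family-model", "b" * 64), embedder)

# Swap the embedding model: every stored vector is stale.
embedding_swap = StackState(current.generation, ModelRef("example-embedder-v2", "f" * 64))
```

Here `needs_reindex(current, generation_swap)` is `False` and `needs_reindex(current, embedding_swap)` is `True`. Comparing the hash as well as the name means a silent re-release under the same name still counts as a change.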
The prompts and policies port, with re-validation. Prompt templates and guardrail configurations are text, so they copy across trivially. Whether they still work is another matter. A prompt tuned to one model's instruction-following habits can degrade on another, and a guardrail regex calibrated against one model's output distribution can miss on another. The artefacts move; their fitness does not move with them. Every swap needs the prompts and guardrails re-validated against the new model before the swap is trusted.
Fine-tuning does not port at all. Any model-specific fine-tune or adapter is tied to the base model's weights and architecture. Move from Qwen to Llama and the fine-tune is re-done from scratch against the new base, or it is abandoned. This is the sharpest reason to keep the firm's knowledge in the corpus and the prompts rather than baked into a fine-tune: value in a fine-tune is value welded to one model, and it does not survive the swap that compartmentalisation is supposed to make cheap.
The gate that turns all of this from hope into discipline is a parity evaluation. Before any model swap goes live, the firm runs its evaluation suite against the new model and the old model on the same inputs and compares. A swap is trustworthy when the new model matches or beats the old one on the firm's own measures, not when the new model benchmarks well in someone else's report. No eval suite, no defensible swap. The eval suite is therefore a compartment in its own right, and one of the durable assets the firm is protecting. The honest dependency: an eval suite only catches what it covers, so a thin suite passes a regression it never tested for, including the case where a new generation model reads the same retrieved context differently. Coverage of the eval suite is the real single point of failure of the whole portability claim, which is the strongest reason to treat it as a first-class asset and not an afterthought.
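A parity gate of this kind might be sketched as follows. The assumptions are loud ones: the suite is a list of case identifiers, each model is represented by a scoring function that returns a higher-is-better number for a case, and parity is judged case by case instead of on an aggregate. None of those choices comes from the source; `parity_gate`, `Scorer`, and the stub scores are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer runs one eval case against one model and returns a higher-is-better score.
Scorer = Callable[[str], float]


def parity_gate(cases: Sequence[str], score_old: Scorer, score_new: Scorer,
                tolerance: float = 0.0) -> Tuple[bool, List[str]]:
    """Return (passed, regressions) for a swap from the old model to the new one.

    A case regresses when the new score falls below the old score by more
    than the tolerance. An empty suite is rejected, not passed.
    """
    if not cases:
        raise ValueError("no eval suite, no defensible swap")
    regressions = [c for c in cases if score_new(c) < score_old(c) - tolerance]
    return (len(regressions) == 0, regressions)


# Stub scores standing in for real eval runs; the values are invented.
old_scores = {"case-a": 0.9, "case-b": 0.8}
new_scores = {"case-a": 0.95, "case-b": 0.7}

passed, regressions = parity_gate(list(old_scores),
                                  old_scores.__getitem__,
                                  new_scores.__getitem__)
```

With these stub values the gate fails and names `case-b` as the regression. The gate only sees the cases it is handed, so it inherits the coverage limit described above: a regression outside the suite passes unnoticed.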
Git as the spine, with its limit named
Compartments are only manageable if their state is written down somewhere a person can read, change, and undo. That place is version control. Every declarative piece of the stack belongs in one signed repository: the model hash manifest, the runtime pin, the orchestration code, the prompt templates, the guardrail configurations, and the embedding-model version. Binaries do not go in git: the weight files and the runtime images are too large and belong in an artefact registry. Git holds the references to them, the hashes and the pinned versions, not the blobs themselves. The repository becomes the single declared truth of what the deployment is.
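The split between references in git and blobs in a registry implies a check that the blob on disk matches the committed hash. A minimal sketch, assuming a hypothetical manifest shape (a dict with an `artefacts` mapping from relative path to SHA-256 hex digest); the source does not specify a manifest format, so treat the layout and the function names as illustrative.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest: dict, root: Path) -> List[str]:
    """Return the relative paths that are missing or whose hash differs from the manifest."""
    mismatches = []
    for rel_path, expected in manifest["artefacts"].items():
        candidate = root / rel_path
        if not candidate.is_file() or sha256_of(candidate) != expected:
            mismatches.append(rel_path)
    return mismatches
```

An empty list from `verify_manifest` means every artefact on disk is the one the repository declares. Anything else is a list of files to investigate before the deployment starts.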
Two of the three properties operationalise here at once. Pin is the version of every compartment, captured as the committed state. Key reappears as signed commits: when each change to the declared state is signed, the change history is itself attestable, and the firm's evidence chain for what the system was on any given date is the commit log. A supervisor asking what ran in production on a date in the retention window is answered by a signed commit, not by recollection. One limit to keep honest: a signed commit attests who changed the declared state and when, not that the change was correct or safe. A signed bad configuration is still a bad configuration, so signing sits alongside review and the parity eval; it does not replace them.
The seams are where sovereignty leaks
Compartmentalisation does not remove complexity. It moves complexity to the contracts between compartments, and those seams are where a stack that is sovereign on paper leaks in practice. Three seams deserve a name.
The model-to-control-plane seam is the prompt contract. The control plane assumes the model responds to a given instruction format, and a model swap can break that assumption silently while every component still reports healthy. The embedding-to-corpus seam is the index contract. The corpus is only valid for the embedding model that built it, and a quiet change to the embedding model invalidates the store without any single component failing. The hardware-to-everything seam is the firmware floor named in the supply-chain piece, where GPU firmware, the BMC, and microcode sit beneath every signature in the stack and outside any pin authority the firm can exercise. None of these seams shows up as a broken compartment. Each shows up as a system that passes every component check and produces wrong answers. Name the seams at design time, because they are cheaper to find on a diagram than under supervisory pressure.
For each material AI workload the firm runs self-hosted, ask one question: could the firm replace the underlying model inside a controlled week without losing its corpus, its prompts, its policies, its evaluation suite, or its audit history? If the answer is yes, the firm holds a model exit strategy and its value is compartmentalised. If the answer is no, the firm's accumulated value is fused into one model, and that is a concentration the risk register should carry by name.
The model is the part everyone watches and the part that matters least to whether the firm stays sovereign. Watch the compartments around it, write down the three questions for each, keep the value out of the model, and the day the model has to change becomes a controlled week rather than a crisis. That is resilience as an architectural property: the stack holds because of how the pieces are separated, not because the most visible piece never has to move.
Sources and further reading
Credit where it is due. This piece builds on the ones below and on the frameworks named. Live links go to source-of-record where available; regulatory texts update mid-cycle, so read at the source, not from this table.
| Category | Source | Link |
|---|---|---|
| Builds on | ṛtaPulse: Whose hash, whose key, whose pin: supply chain is the sovereignty question (May 2026) | rtapulse.com/ai-augmented-governance/field-notes/whose-hash-whose-key-whose-pin |
| Related, data layer | ṛtaPulse: Two hundred and fifty documents (corpus integrity, RAG provenance) | rtapulse.com/ai-augmented-governance/field-notes/two-hundred-and-fifty-documents |
| Related, agent layer | ṛtaPulse: Every server is code in your context (host as integrity boundary) | rtapulse.com/resilience-engineering/field-notes/every-server-is-code-in-your-context |
| Related, time axis | ṛtaPulse: Proof with an expiry date (crypto-agility and migration) | rtapulse.com/ai-augmented-governance/field-notes/proof-with-an-expiry-date |
| Regulatory framework | EU DORA (Regulation 2022/2554), Article 9 ICT integrity, Article 28 third-party ICT and exit strategy | eur-lex.europa.eu |
| Regulatory framework | UK PRA SS1/23 (Model Risk Management Principles for Banks) | bankofengland.co.uk |
| Regulatory framework | MAS Technology Risk Management Guidelines; Notice 655 | mas.gov.sg |
| Regulatory framework | RBI Master Direction on IT Governance, Risk, Controls and Assurance Practices (2023) | rbi.org.in |
| Regulatory framework | HKMA TM-G-1 (General Principles for Technology Risk Management) | hkma.gov.hk |
| Supply-chain primitive | Sigstore (signing and transparency log for artefacts and commits) | sigstore.dev |
| Inference runtime | Ollama | github.com/ollama/ollama |
| Inference runtime | llama.cpp | github.com/ggerganov/llama.cpp |
| Vector store | Chroma (vector database) | trychroma.com |
| Indian AI ecosystem | IndiaAI Mission (Government of India) | indiaai.gov.in |
About the Author
I am a CISA and CISSP-certified governance practitioner. My day-to-day work spans technology risk, audit defensibility, and cross-border regulatory intelligence across the UK (FCA, PRA), India (RBI, SEBI, IFSCA), Southeast Asia (MAS), and the Gulf (CBUAE), with working knowledge of the EU AI Act's financial services implications.
My current research sits at the intersection of audit-defensible AI deployment patterns and supervisory expectations in regulated firms: sovereign open-weights deployment, supply-chain provenance for self-hosted inference, and the resilience properties of inference and agent pipelines in firms with cross-border regulatory perimeters.
Sentinel Engine is the sovereign model deployment I run on my own hardware, currently in beta. The compartment discipline above is the one Sentinel is built on: the model is pinned and treated as replaceable, while the corpus, prompts, policies, and signed logs are held in compartments a model swap does not touch. A swap of the kind described here is one Sentinel has already run: the generation model moved from Mistral to Qwen after smoke-testing four candidates, with the corpus and prompts untouched. Commit signing runs on a self-hosted signer, which keeps the attestation root inside the same perimeter as everything else. The Qwen-to-Llama move is the same discipline applied to the next swap.
Corrections, counterexamples, and build ideas welcome.
Practitioner opinion. Not legal or regulatory advice. No vendor relationships. Full disclosures.