
Move the model, keep the learnings — the sovereign stack is compartments, not layers

By Sachin Mehta • 7 June 2026 • 12 min read

Model Portability • Sovereign AI • Operational Resilience • GitOps • DORA

This note sits beside the five-part AI Governance in Regulated Firms series, where an interactive walkthrough steps the same sovereign stack from the perimeter to the time axis.

An earlier piece asked three questions of a self-hosted model: whose hash backstops the weights, whose key signs the inference log, whose pin authority holds the runtime. This piece answers a question that the earlier one provoked. If those are the right questions for the model, what are they for everything around the model? The answer reframes the stack. Hash, key, and pin are not three layers. They are three properties you ask of every compartment. Once the stack is seen that way, the model becomes the cheap, swappable piece, and the durable value sits in the compartments a model swap never touches. That is a resilience property: the stack survives the loss or replacement of its most visible component because of how it is built, not because anyone hopes the component never has to change.

The question, asked of the whole stack

The supply-chain framing in "Whose hash, whose key, whose pin" treated the model weights as the artefact and asked who stands behind each guarantee. The framing is correct, and it is incomplete in one specific way: it reads as though hash, key, and pin describe three things, namely the weights, the log, and the runtime. They do not. They describe three properties, and every component in a self-hosted AI deployment has all three whether or not anyone has named them.

Integrity is the hash property: is this artefact the one I intended, unaltered. Attestation is the key property: can I prove what this component did, signed by something only I control. Version control is the pin property: is the running version the one I approved, unable to change without me. A model has all three. So does the inference runtime. So does the orchestration code, the guardrail layer, the embedding model, the vector store, and the signing infrastructure itself. The earlier piece asked the three questions of one compartment. The resilient posture asks them of every compartment, and writes down the answer for each.

This resolves a question that the supply-chain piece left ambiguous. If guardrails and the control plane need governing, do they fall under pin? Partly. The control plane has a version, so pin governs which version of the gateway, the agent loop, the prompt templates, and the guardrail configuration is running. But a guardrail also has behaviour, and behaviour is not a version question. Whether the guardrail that was meant to fire actually fired is an attestation question, answered by a signed decision log, which is the key property. Do not collapse the control plane into pin. Pin tells you which version of each compartment is live. Key tells you what each compartment did. They are different questions, and a regulated deployment needs both answered for the control plane, not only for the model.
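The split between pin and key can be made concrete with a signed decision record. What follows is a minimal sketch, not a description of any particular deployment: the field names are hypothetical, and HMAC-SHA256 with a firm-held secret stands in for whatever signing scheme the firm actually uses (an asymmetric signature would be the more likely production choice). The record carries the guardrail version, which is the pin answer, and the signature over the decision, which is the key answer.

```python
import hashlib
import hmac
import json


def _canonical(record: dict) -> bytes:
    # Stable serialisation so the same record always yields the same bytes.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()


def sign_decision(key: bytes, record: dict) -> dict:
    """Attach an HMAC-SHA256 tag to one guardrail decision record."""
    tag = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    return {"record": record, "tag": tag}


def verify_decision(key: bytes, entry: dict) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(key, _canonical(entry["record"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, entry["tag"])


# Hypothetical record: field names and values are illustrative only.
entry = sign_decision(
    b"firm-held-secret",
    {"guardrail": "pii-filter", "guardrail_version": "1.4.2", "fired": True},
)
```

Editing the `fired` field after the fact makes `verify_decision` return False, which is the property a supervisor's question about guardrail behaviour depends on.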

The compartments

A sovereign deployment is easier to govern, and far easier to recover, when it is drawn as a set of compartments rather than a single system. Each compartment is a separable piece with its own integrity, attestation, and version question, and crucially its own cost to replace. Drawing them this way exposes something the single-system view hides: the compartments are not equally swappable, and the cheap one to swap is the model itself.

Six compartments, three properties each, one swap-cost gradient. The model is the cheap piece to replace.

The same picture in a table, with the question to write down for each compartment and whether the compartment survives a model swap unchanged.

Compartment	Questions to write down	Swap cost	Survives a model swap?
Model weights	Whose hash	Low	This is the piece being swapped
Inference runtime	Whose pin	Low to moderate	Yes, largely model-agnostic
Control plane and guardrails	Whose pin for the version, whose key for proof it fired	Moderate; prompts re-validate per model	Mostly, with prompt re-validation
Embedding model and vector store	Whose hash for corpus integrity, whose pin for the embedding-model version	High; a full re-index if the embedding model changes	Only if the embedding model is held fixed
Signing keys	Whose key	High; rotation is a governed event	Yes, model-independent
Hardware and firmware	Outside the three questions, a named gap	Very high	Not applicable
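One way to "write down the answer for each" is a small compartment register that can be checked mechanically. This is a sketch under assumptions: the class, the owner names, and the two sample rows are hypothetical, and a real register would carry every row of the table above.

```python
from dataclasses import dataclass

PROPERTIES = ("hash", "key", "pin")


@dataclass(frozen=True)
class Compartment:
    name: str
    swap_cost: str
    owners: dict  # property name -> named owner, or None if nobody has answered


def unanswered(register):
    """Return (compartment, property) pairs that have no named owner."""
    gaps = []
    for compartment in register:
        for prop in PROPERTIES:
            if not compartment.owners.get(prop):
                gaps.append((compartment.name, prop))
    return gaps


# Hypothetical entries; owner names are placeholders, not recommendations.
register = [
    Compartment("model weights", "low",
                {"hash": "platform team", "key": "platform team", "pin": "change board"}),
    Compartment("embedding model and vector store", "high",
                {"hash": "data team", "key": None, "pin": None}),
]

gaps = unanswered(register)
```

Here `gaps` lists the key and pin questions for the embedding compartment, which is the compartment the swap-cost column says is most expensive to get wrong.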

Where the value actually sits

Read the swap-cost column again and the strategic point falls out. The model is cheap to replace. What is expensive to replace is everything the firm has accumulated around it: the document corpus and its embeddings, the prompt library tuned over months of production, the guardrail policies calibrated against real failure cases, the evaluation suite that defines what good output means for this firm, and the signed audit logs that prove the system behaved. None of that lives in the model. All of it lives in compartments a model swap does not touch.

This inverts the instinct that the model is the asset. The model is the commodity. A new open-weights model of comparable capability arrives every few months, and the firm can adopt it for the cost of a download and a verification run. The corpus, the prompts, the policies, and the evals are the durable capital, because they encode the firm's specific knowledge and the firm's specific definition of acceptable behaviour, and they took real time to build. A balance sheet that treats the model as the AI investment is looking at the cheap part.

Keep your learnings out of the model, so the model stays a thing you can replace. The architecture that makes the model disposable is the architecture that makes the firm sovereign.

ṛtaPulse research, June 2026

The sovereignty argument follows directly. Sovereignty is not which model you run. It is the ability to change which model you run, on your terms, without losing what you have built. A firm that can swap its generation model in a controlled week holds an option a firm with its value fused into one vendor's model does not. That option is what regulators are reaching for when they ask about concentration risk and exit strategy, even though a self-hosted open-weights model has no third-party provider to exit in the literal sense. The firm that compartmentalises has a practical exit from any single model, any single model-origin jurisdiction, and any single vendor's licensing terms, with its accumulated assets intact. One honest boundary: a decisive capability gap can still make a new model worth adopting on its merits, and the architecture exists precisely so the firm can take that gain without re-pouring its foundations.

Moving from Qwen to Llama, honestly

Take the concrete case. A firm runs a self-hosted assistant on one open-weights model and decides to move to another, say from a Qwen-family model to a Llama-family model, because of a capability gain, a licensing change, or a shift in the geopolitics of model origin. What moves with it, and what does not?

Field note, not theory: I have run a model swap on the deployment I operate. The generation model moved from Mistral to Qwen, after smoke-testing four candidate models and selecting the one that fit the extraction task best. The corpus and the prompts carried across untouched, because the value was never in the generation model; what changed was one compartment. The Qwen-to-Llama direction below is the same discipline applied to the next move I would make, and the failure modes named are the ones a real swap surfaced, not paper hazards.

Three categories of accumulated value have three different portability properties, and conflating them is where migration plans go wrong.

The corpus and its embeddings port, with one condition. The vector store is built by an embedding model, not by the generation model. Swapping the generation model leaves the embeddings valid, so the corpus carries over without a re-index, but only if the embedding model is held fixed across the swap. Change the embedding model and every vector in the store is stale, forcing a full re-index of the entire corpus. The embedding model, not the generation model, is the real lock-in in most deployments, and it is the compartment most teams never think to pin.
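Pinning the embedding model can be as simple as recording, alongside the index, which embedding model built it, and refusing to serve queries when that record disagrees with the pinned state. A minimal sketch, assuming a hypothetical manifest with three fields; none of these names come from a specific vector store.

```python
class StaleIndexError(RuntimeError):
    """Raised when the vector store was built by a different embedding model."""


# Hypothetical manifest fields that together identify the embedding model.
CONTRACT_FIELDS = ("embedding_model", "embedding_model_sha256", "dimensions")


def check_index_contract(index_manifest: dict, pinned: dict) -> None:
    """Compare what built the index with what is pinned; raise on any mismatch."""
    mismatches = {
        field: (index_manifest.get(field), pinned.get(field))
        for field in CONTRACT_FIELDS
        if index_manifest.get(field) != pinned.get(field)
    }
    if mismatches:
        raise StaleIndexError(f"index contract broken: {mismatches}")
```

Run at startup and before any re-index job, a check of this kind turns a quiet embedding-model change into a loud failure instead of a store full of stale vectors.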

The prompts and policies port, with re-validation. Prompt templates and guardrail configurations are text, so they copy across trivially. Whether they still work is another matter. A prompt tuned to one model's instruction-following habits can degrade on another, and a guardrail regex calibrated against one model's output distribution can miss on another. The artefacts move; their fitness does not move with them. Every swap needs the prompts and guardrails re-validated against the new model before the swap is trusted.

Fine-tuning does not port at all. Any model-specific fine-tune or adapter is tied to the base model's weights and architecture. Move from Qwen to Llama and the fine-tune is re-done from scratch against the new base, or it is abandoned. This is the sharpest reason to keep the firm's knowledge in the corpus and the prompts rather than baked into a fine-tune: value in a fine-tune is value welded to one model, and it does not survive the swap that compartmentalisation is supposed to make cheap.

The gate that turns all of this from hope into discipline is a parity evaluation. Before any model swap goes live, the firm runs its evaluation suite against the new model and the old model on the same inputs and compares. A swap is trustworthy when the new model matches or beats the old one on the firm's own measures, not when the new model benchmarks well in someone else's report. No eval suite, no defensible swap. The eval suite is therefore a compartment in its own right, and one of the durable assets the firm is protecting. The honest dependency: an eval suite only catches what it covers, so a thin suite passes a regression it never tested for, including the case where a new generation model reads the same retrieved context differently. Coverage of the eval suite is the real single point of failure of the whole portability claim, which is the strongest reason to treat it as a first-class asset and not an afterthought.
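The parity gate can be sketched in a few lines. The assumptions are loud ones: the models are plain callables standing in for real inference calls, the scorer is exact match where a firm would plug in its own measures, and the pass rule (no per-case regression and no drop in the mean, within a tolerance) is one possible reading of "matches or beats", not the only one.

```python
from statistics import mean


def exact_match(output: str, expected: str) -> float:
    # Stand-in scorer; a real suite would use the firm's own measures.
    return 1.0 if output.strip() == expected.strip() else 0.0


def parity_gate(cases, old_model, new_model, score=exact_match, tolerance=0.0):
    """Run both models on the same inputs and compare case by case."""
    if not cases:
        raise ValueError("no eval cases: the swap cannot be defended")
    old_scores = [score(old_model(c["input"]), c["expected"]) for c in cases]
    new_scores = [score(new_model(c["input"]), c["expected"]) for c in cases]
    regressions = [
        c["id"]
        for c, old, new in zip(cases, old_scores, new_scores)
        if new < old - tolerance
    ]
    return {
        "old_mean": mean(old_scores),
        "new_mean": mean(new_scores),
        "regressions": regressions,
        "passed": not regressions and mean(new_scores) >= mean(old_scores) - tolerance,
    }
```

The gate only sees the cases it is given, so a passing result says nothing about behaviour the suite never exercised, which is the coverage dependency named above.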

Git as the spine, with its limit named

Compartments are only manageable if their state is written down somewhere a person can read, change, and undo. That place is version control. Every declarative piece of the stack belongs in one signed repository: the model hash manifest, the runtime pin, the orchestration code, the prompt templates, the guardrail configurations, and the embedding-model version. Binaries do not go in git: the weight files and the runtime images are too large and belong in an artefact registry. Git holds the references to them, the hashes and the pinned versions, not the blobs themselves. The repository becomes the single declared truth of what the deployment is.
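The reference-not-blob split implies a verification step: the manifest in git names each artefact and its expected hash, and something checks the files pulled from the registry against it. A minimal sketch, assuming a hypothetical manifest that maps relative paths to SHA-256 hex digests.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte weights never load into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest: dict, root: Path) -> list:
    """Return the artefacts that are missing or whose hash differs from the manifest."""
    failures = []
    for relative_path, expected in manifest.items():
        artefact = root / relative_path
        if not artefact.is_file() or sha256_file(artefact) != expected:
            failures.append(relative_path)
    return failures
```

An empty list means every artefact on disk is the one the repository declares; anything else is a reason to stop before the runtime loads it.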

Two of the three properties operationalise here at once. Pin is the version of every compartment, captured as the committed state. Key reappears as signed commits: when each change to the declared state is signed, the change history is itself attestable, and the firm's evidence chain for what the system was on any given date is the commit log. A supervisor asking what ran in production on a date in the retention window is answered by a signed commit, not by recollection. One limit to keep honest: a signed commit attests who changed the declared state and when, not that the change was correct or safe. A signed bad configuration is still a bad configuration, so signing sits alongside review and the parity eval, it does not replace them.
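Whether the commit log really is a signed evidence chain can be checked rather than assumed. Git's log format placeholder `%G?` prints a one-letter signature status per commit (`G` for a good signature, `N` for none, and other letters for bad, expired, revoked, or uncheckable signatures). The sketch below parses that output; the policy of accepting only `G` and the sample hashes are assumptions for illustration.

```python
def unsigned_commits(log_output: str) -> list:
    """Parse output of: git log --format='%h %G?'

    Returns (commit, status) pairs for every commit whose status is not 'G'.
    """
    flagged = []
    for line in log_output.splitlines():
        if not line.strip():
            continue
        commit, status = line.split(maxsplit=1)
        if status.strip() != "G":
            flagged.append((commit, status.strip()))
    return flagged


# Hypothetical log output; the hashes are made up.
sample = "a1b2c3d G\ne4f5a6b N\n0c1d2e3 U\n"
flagged = unsigned_commits(sample)
```

A non-empty result marks a gap in the attestation history. It says nothing about whether the signed changes were correct, which is the limit stated above.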

Revert is a control-plane safety net, not a universal undo.

The limit you must state out loud: Git revert restores declared state. It does not restore the world. Re-embedding a corpus with a new embedding model, rotating a signing key, and migrating a database are stateful changes, and reverting the commit that triggered them does not undo them. Those changes need their own snapshots and a forward-fix runbook, not a revert. Sell git as the audit and rollback spine for configuration, and say plainly that the changes most likely to hurt are precisely the ones it cannot reverse. That honesty is the difference between a resilience layer and a comfort blanket.
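One way to keep that limit visible is to flag, before merge, any change that touches a path whose effect a revert cannot undo. A minimal sketch, assuming a hypothetical repository layout; the paths and the mapping are illustrative, not a convention from any tool.

```python
# Hypothetical path prefixes mapped to the stateful effect a change there triggers.
STATEFUL_PATHS = {
    "pins/embedding-model.yaml": "re-embeds the corpus",
    "keys/": "rotates a signing key",
    "migrations/": "migrates a database",
}


def needs_snapshot(changed_paths):
    """Return the changed paths that need a snapshot and a forward-fix runbook."""
    flagged = {}
    for path in changed_paths:
        for prefix, effect in STATEFUL_PATHS.items():
            if path.startswith(prefix):
                flagged[path] = effect
    return flagged
```

A change set that touches only prompt templates returns an empty dict and can rely on revert. One that touches `keys/` is flagged, and should not merge without a snapshot reference.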

The seams are where sovereignty leaks

Compartmentalisation does not remove complexity. It moves complexity to the contracts between compartments, and those seams are where a stack that is sovereign on paper leaks in practice. Three seams deserve a name.

The model-to-control-plane seam is the prompt contract. The control plane assumes the model responds to a given instruction format, and a model swap can break that assumption silently while every component still reports healthy. The embedding-to-corpus seam is the index contract. The corpus is only valid for the embedding model that built it, and a quiet change to the embedding model invalidates the store without any single component failing. The hardware-to-everything seam is the firmware floor named in the supply-chain piece, where GPU firmware, the BMC, and microcode sit beneath every signature in the stack and outside any pin authority the firm can exercise. None of these seams shows up as a broken compartment. Each shows up as a system that passes every component check and produces wrong answers. Name the seams at design time, because they are cheaper to find on a diagram than under supervisory pressure.

A question for your board's risk register

For each material AI workload the firm runs self-hosted, ask one question: could the firm replace the underlying model inside a controlled week without losing its corpus, its prompts, its policies, its evaluation suite, or its audit history? If the answer is yes, the firm holds a model exit strategy and its value is compartmentalised. If the answer is no, the firm's accumulated value is fused into one model, and that is a concentration the risk register should carry by name.

Regulatory frame note: The control plane and the data compartment are ICT assets, and under DORA they fall inside the Article 9 lifecycle integrity obligation. Model portability is not a literal Article 28 third-party exit, because a self-hosted open-weights model is not a third-party provider, but supervisors increasingly read model-concentration risk through the same lens of documented substitutability. PRA SS2/21 deployment-environment attestation extends to the control plane and its signed change record, not only to the model. Equivalents apply under the RBI IT and IT-outsourcing directions, MAS TRM and Notice 655, and HKMA TM-G-1. The compartment architecture generalises across jurisdictions; framework mapping and materiality thresholds differ, and this is not a substitute for counsel.

The model is the part everyone watches and the part that matters least to whether the firm stays sovereign. Watch the compartments around it, write down the three questions for each, keep the value out of the model, and the day the model has to change becomes a controlled week rather than a crisis. That is resilience as an architectural property: the stack holds because of how the pieces are separated, not because the most visible piece never has to move.

Sources and further reading

Credit where it is due. This piece builds on the ones below and on the frameworks named. Live links go to source-of-record where available; regulatory texts update mid-cycle, so read at the source, not from this table.

Category	Source	Link
Builds on	ṛtaPulse: Whose hash, whose key, whose pin: supply chain is the sovereignty question (May 2026)	rtapulse.com/ai-augmented-governance/field-notes/whose-hash-whose-key-whose-pin
Related, data layer	ṛtaPulse: Two hundred and fifty documents (corpus integrity, RAG provenance)	rtapulse.com/ai-augmented-governance/field-notes/two-hundred-and-fifty-documents
Related, agent layer	ṛtaPulse: Every server is code in your context (host as integrity boundary)	rtapulse.com/resilience-engineering/field-notes/every-server-is-code-in-your-context
Related, time axis	ṛtaPulse: Proof with an expiry date (crypto-agility and migration)	rtapulse.com/ai-augmented-governance/field-notes/proof-with-an-expiry-date
Regulatory framework	EU DORA (Regulation 2022/2554), Article 9 ICT integrity, Article 28 third-party ICT and exit strategy	eur-lex.europa.eu
Regulatory framework	UK PRA SS2/21 (Outsourcing and third party risk management)	bankofengland.co.uk
Regulatory framework	MAS Technology Risk Management Guidelines; Notice 655	mas.gov.sg
Regulatory framework	RBI Master Direction on IT Governance, Risk, Controls and Assurance Practices (2023)	rbi.org.in
Regulatory framework	HKMA TM-G-1 (General Principles for Technology Risk Management)	hkma.gov.hk
Supply-chain primitive	Sigstore (signing and transparency log for artefacts and commits)	sigstore.dev
Inference runtime	Ollama	github.com/ollama/ollama
Inference runtime	llama.cpp	github.com/ggerganov/llama.cpp
Vector store	Chroma (vector database)	trychroma.com
Indian AI ecosystem	IndiaAI Mission (Government of India)	indiaai.gov.in

About the Author

I am a CISA and CISSP-certified governance practitioner. My day-to-day work spans technology risk, audit defensibility, and cross-border regulatory intelligence across the UK (FCA, PRA), India (RBI, SEBI, IFSCA), Southeast Asia (MAS), and the Gulf (CBUAE), with working knowledge of the EU AI Act's financial services implications.

My current research sits at the intersection of audit-defensible AI deployment patterns and supervisory expectations in regulated firms: sovereign open-weights deployment, supply-chain provenance for self-hosted inference, and the resilience properties of inference and agent pipelines in firms with cross-border regulatory perimeters.

A footnote on Sentinel Engine

Sentinel Engine is the sovereign model deployment I run from my own hardware, currently in beta. The compartment discipline above is the one Sentinel is built on: the model is pinned and treated as replaceable, while the corpus, prompts, policies, and signed logs are held in compartments a model swap does not touch. The model swap described here is one Sentinel has run: the generation model moved from Mistral to Qwen after smoke-testing four candidates, with the corpus and prompts untouched. Commit signing runs on a self-hosted signer, which keeps the attestation root inside the same perimeter as everything else. The Qwen-to-Llama move is the same discipline applied to the next swap.


Collaborate

Corrections, counterexamples, and build ideas welcome.

Disclosures

Practitioner opinion. Not legal or regulatory advice. No vendor relationships. Full disclosures.
