Why UK firms need a live model change risk register for frontier AI releases


10 May 2026 | By Ashley Marshall


UK firms need a live model change risk register because frontier model upgrades can materially alter capability, refusal behaviour, tool use, cost, latency and data risk inside production workflows. The register gives leaders an auditable way to decide what changed, who owns it, what was tested and whether the risk is acceptable.

Frontier model releases are no longer background supplier updates. If they change how live workflows behave, they need the same discipline as any other production change.

Frontier model releases are now operational change events

The important shift for UK firms is not that frontier models are improving. It is that the improvement now lands directly inside live work. A model release can change how a customer service agent escalates a complaint, how a coding assistant edits a repository, how a finance team summarises exceptions, or how an analyst interprets a regulatory document. That makes the release a production change, even when no one in the business has deployed new application code.

OpenAI's April 2026 GPT-5.5 release is a useful example. The company says the model is designed for complex real-world work, including researching online, analysing information, creating documents and spreadsheets, writing code, and moving across tools until a task is done. It also says GPT-5.5 and GPT-5.5 Pro became available in the API on 24 April 2026, with additional deployment safeguards. For a board, risk committee or operations lead, that is not a product announcement to file away. It is a control question: which of our live workflows will inherit different behaviour because a supplier has changed the model underneath them?

The point is not to slow adoption to a crawl. It is to stop treating frontier model upgrades as harmless background improvements. If a SaaS supplier swapped a payments rules engine, changed its fraud scoring thresholds or pushed a material CRM automation change, most firms would expect release notes, testing, a rollback plan and an owner. A model change risk register applies the same discipline to AI. It records what changed, where that model is used, what could break, what evidence has been collected, who accepted the residual risk, and what will trigger rollback or extra review.

What this means in practice is simple: procurement, IT, data protection, security and operational owners need one shared view of model changes. The register should include models accessed directly through APIs from providers such as OpenAI, Anthropic, Google (Gemini) or Mistral, embedded copilots such as Microsoft 365 Copilot or GitHub Copilot, and AI features inside line-of-business tools. Without that live map, the business may be running production processes on a new frontier model before anyone has checked whether the decision logic, refusal behaviour, latency, output format, data handling or safety posture has changed.

Sources referenced include OpenAI's GPT-5.5 release note and OpenAI's GPT-5.5 system card.

The risk is measurable: capability, cost and autonomy are changing quickly

The strongest argument for a live register is that the underlying capability curve is now too fast for annual policy review. The UK's National Cyber Security Centre reported in April 2026 that the AI Security Institute evaluated seven frontier models released before March 2026 on multi-step cyber attack scenarios. On a 32-step enterprise network attack, estimated to take a human expert about 14 hours end to end, the best-performing model averaged 15.6 steps with extended processing time, and its single best run reached 22 of the 32 steps. NCSC also noted that, without extended processing time, best-model performance rose from fewer than two steps 18 months earlier to 9.8 steps, with a full attempt costing around £65.

Those figures matter beyond cyber security. They show that frontier releases can materially change what a model can do inside a workflow. If an AI assistant can take more steps, hold context for longer, use tools more reliably and keep working with less supervision, then the operating model around it has to change. A workflow that was safe because it required frequent human steering may become riskier when the model becomes more persistent. A workflow that previously failed quickly may now complete an action chain that reaches customer data, production code, supplier portals or financial reports.

The same OpenAI release notes report GPT-5.5 scoring 82.7 percent on Terminal-Bench 2.0, 78.7 percent on OSWorld-Verified and 81.8 percent on CyberGym. OpenAI also says more than 85 percent of the company uses Codex every week, and cites internal uses across finance, communications, marketing, data science and product management. These are vendor claims, so they should not be treated as independent assurance. But they are exactly the sort of supplier evidence that should trigger a register entry, a targeted evaluation plan and an acceptance decision before live dependency expands.

What this means in practice: every frontier model release should be triaged against the workflows it touches. A low-risk summarisation use case might only need prompt regression tests, output sampling and owner sign-off. A workflow that writes code, drafts regulated communications, recommends vulnerability fixes or touches personal data needs a stronger gate: pre-release sandbox tests, benchmark comparison against previous model output, red-team prompts relevant to the business, monitoring after rollout and a named rollback route.
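The triage rule in the paragraph above can be sketched in a few lines of Python. This is a minimal sketch, assuming each workflow can be described by a handful of yes/no risk flags. The `Workflow` class, its field names and the rule that higher-risk workflows keep the light gates and add the stronger ones are our own illustrative assumptions, not a prescribed standard.

```python
from dataclasses import dataclass

# Gates named in the article for a low-risk summarisation use case.
LIGHT_GATES = ["prompt regression tests", "output sampling", "owner sign-off"]

# Stronger gates named in the article for higher-risk workflows.
STRONG_GATES = [
    "pre-release sandbox tests",
    "benchmark comparison against previous model output",
    "business-specific red-team prompts",
    "post-rollout monitoring",
    "named rollback route",
]


@dataclass
class Workflow:
    """Hypothetical risk flags for one workflow that depends on a model."""
    name: str
    writes_code: bool = False
    drafts_regulated_comms: bool = False
    recommends_vuln_fixes: bool = False
    touches_personal_data: bool = False


def required_gates(wf: Workflow) -> list[str]:
    """Return the gates a new model release must pass before this workflow adopts it."""
    high_risk = (
        wf.writes_code
        or wf.drafts_regulated_comms
        or wf.recommends_vuln_fixes
        or wf.touches_personal_data
    )
    # Assumption: higher-risk workflows keep the light gates and add the strong ones.
    return LIGHT_GATES + STRONG_GATES if high_risk else list(LIGHT_GATES)


print(required_gates(Workflow("meeting summaries")))
print(len(required_gates(Workflow("repo assistant", writes_code=True))))  # 8
```

Keeping the flags explicit means a reviewer can see why a workflow landed in the stronger tier, which is the evidence an auditor is likely to ask for.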

The cyber figures above are from NCSC analysis of frontier AI cyber capability. For broader context, see our perspective on AI governance operating models.

UK governance already points in this direction

No UK law requires a "model change risk register" by name, and that can mislead executives into thinking this is optional paperwork. It is better to read the existing governance signals together. The ICO's accountability guidance says organisations are responsible for complying with the UK GDPR and must be able to demonstrate compliance. It lists measures such as maintaining documentation, implementing appropriate security, carrying out DPIAs for high-risk processing, recording breaches and reviewing measures when necessary. It also states that accountability obligations are ongoing, not a one-off exercise.

The government's Data and AI Ethics Framework makes the same practical point for public sector projects, and it is useful for private sector governance teams too. It says teams should regularly revisit the framework throughout a project, especially when they make changes. It defines accountability as having appropriate governance, oversight and routes to challenge decisions. It defines safety as ensuring data-driven systems such as AI are robust, secure and safe at every stage of their life cycle. A frontier model release inside a live workflow is plainly a life-cycle change.

The Department for Work and Pensions' public Artificial Intelligence Security Policy is more operational. It says DWP promotes AI in a measured and controlled manner, sets responsibilities for data protection, accountability, accuracy and transparency, and states that significant changes to an approved AI tool's use case must be approved by a relevant governance board. It also requires the DPIA process where an AI tool processes personal data, is introduced into an existing process involving personal data, or produces output that affects people. Most firms do not need to copy DWP policy word for word, but the direction is clear: approval, evidence and change control matter.

For private UK firms, the register becomes the connective tissue between these obligations. It can link a model release to the relevant DPIA, supplier assessment, security review, data processing agreement, equality or bias review, risk acceptance and board reporting. It also gives the business a defensible audit trail if a customer, regulator, insurer or client asks how it knew a model upgrade was safe enough to use in production.

Relevant UK guidance includes the ICO accountability guidance, the Data and AI Ethics Framework and the DWP Artificial Intelligence Security Policy.

A useful register tracks decisions, not just model names

A weak register is just an inventory: model name, vendor, date and owner. That is better than nothing, but it will not help during an incident or assurance review. A useful register is a live decision record. It should show the model version, release date, deployment route, affected workflows, business owner, technical owner, data owner, supplier evidence reviewed, internal tests run, risks identified, controls applied, residual risk rating, acceptance decision, review date and rollback or containment plan.
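The difference between an inventory line and a decision record can be made concrete as a data structure. This is a minimal sketch, assuming a Python dataclass whose fields follow the list in the paragraph above; the split into "inventory" and "decision" fields, the free-text field types and the `missing_decision_fields` helper are our own illustrative choices, and the model identifier in the example is hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class RegisterEntry:
    # Inventory fields: roughly what a weak register already holds.
    model_version: str
    release_date: date
    deployment_route: str
    affected_workflows: list[str]
    business_owner: str
    technical_owner: str
    data_owner: str
    # Decision fields: what turns an inventory line into a decision record.
    supplier_evidence_reviewed: Optional[str] = None
    internal_tests_run: Optional[str] = None
    risks_identified: Optional[str] = None
    controls_applied: Optional[str] = None
    residual_risk_rating: Optional[str] = None
    acceptance_decision: Optional[str] = None
    review_date: Optional[date] = None
    rollback_plan: Optional[str] = None


DECISION_FIELDS = (
    "supplier_evidence_reviewed",
    "internal_tests_run",
    "risks_identified",
    "controls_applied",
    "residual_risk_rating",
    "acceptance_decision",
    "review_date",
    "rollback_plan",
)


def missing_decision_fields(entry: RegisterEntry) -> list[str]:
    """Return decision fields that are still empty for this entry."""
    return [name for name in DECISION_FIELDS if not getattr(entry, name)]


entry = RegisterEntry(
    model_version="example-model-2026-04",  # hypothetical identifier
    release_date=date(2026, 4, 24),
    deployment_route="direct API",
    affected_workflows=["complaint triage"],
    business_owner="Head of Customer Service",
    technical_owner="Platform Lead",
    data_owner="Data Protection Officer",
)
print(missing_decision_fields(entry))  # all eight decision fields are still empty
```

An entry with any missing decision field is, in the article's terms, still only an inventory line, so a check like this could run before a release is marked as accepted.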

For example, a firm using Microsoft 365 Copilot for internal knowledge work, GitHub Copilot for development, OpenAI or Anthropic through an API, and a CRM with embedded AI summarisation needs to know which workflows are pinned to a fixed model, which are on automatic upgrade, and which are hidden behind a supplier-controlled abstraction. The most dangerous phrase is often 'the vendor manages that'. The vendor may manage the platform, but the firm still owns its client commitments, data protection duties, service quality, cyber resilience and operational risk appetite.

The register should also separate change types. A capability change might make the model better at tool use or longer chains of reasoning. A safety change might alter refusal behaviour, content filters or cyber safeguards. A commercial change might affect cost, rate limits or data residency options. A product change might add memory, connectors, file access or agentic actions. Each change has a different risk profile. Tighter cyber classifiers, for instance, can be positive for misuse reduction but can also block legitimate security work, break support workflows or create inconsistent user experiences if no one tests them before rollout.

What this means in practice: make the register part of the release calendar, not a quarterly spreadsheet. Ask suppliers for advance release notes, deprecation timelines, model card or system card updates, security statements, data processing changes and audit evidence. For API use, record whether you can pin model versions and whether fallback models are allowed. For SaaS use, record the admin controls available, whether opt-in previews are enabled, and which departments can turn on new AI features without central review.
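Recording upgrade mode per workflow lets the register answer a simple question: which high-impact workflows can change underneath us without an internal release? This is a minimal sketch, assuming three upgrade modes drawn from the examples above and a hypothetical low, medium or high impact scale. Treating a permitted fallback model as an unplanned change, even for a pinned workflow, is our own assumption.

```python
from dataclasses import dataclass
from enum import Enum


class UpgradeMode(Enum):
    PINNED = "pinned to a fixed model version"
    AUTO_UPGRADE = "automatic upgrade"
    SUPPLIER_MANAGED = "supplier-controlled abstraction"


@dataclass
class ModelDependency:
    workflow: str
    supplier: str
    mode: UpgradeMode
    impact: str  # hypothetical scale: "low", "medium" or "high"
    fallback_allowed: bool = False


def needs_release_watch(deps: list[ModelDependency]) -> list[str]:
    """Return high-impact workflows whose underlying model can change without an internal release."""
    return [
        d.workflow
        for d in deps
        if d.impact == "high"
        # A permitted fallback to another model is also treated as an unplanned change.
        and (d.mode is not UpgradeMode.PINNED or d.fallback_allowed)
    ]


deps = [
    ModelDependency("claims summaries", "API supplier A", UpgradeMode.PINNED, "high"),
    ModelDependency("code review", "Copilot tool", UpgradeMode.AUTO_UPGRADE, "high"),
    ModelDependency("CRM call notes", "CRM vendor", UpgradeMode.SUPPLIER_MANAGED, "medium"),
    ModelDependency("client letters", "API supplier B", UpgradeMode.PINNED, "high", fallback_allowed=True),
]
print(needs_release_watch(deps))  # ['code review', 'client letters']
```

The supplier names are placeholders. The point is that "the vendor manages that" becomes a visible `SUPPLIER_MANAGED` value rather than an unrecorded assumption.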

The counterargument is speed, but uncontrolled speed is false economy

The most common objection is predictable: frontier AI moves too quickly for formal change control, and teams need the latest model to stay competitive. There is truth in that. If governance becomes a monthly committee that blocks every experiment, staff will route around it. Engineers will use personal accounts, analysts will paste data into unauthorised tools, and departments will buy AI features inside existing SaaS contracts without telling IT. A register should not be used as a brake on low-risk experimentation.

The mistake is to confuse experimentation with production dependency. A live risk register can support speed precisely because it tells teams what is already approved, what evidence is needed for a new release, and where the escalation threshold sits. The fastest organisations will not be those with no governance. They will be those with pre-agreed lanes: safe sandbox use, supervised internal use, client-facing draft use, human-approved production use, and autonomous production use. Each lane should have a different evidence standard.
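The five lanes can be expressed as an ordered scale where each lane inherits the evidence required by the lanes below it. This is a minimal sketch under that assumption. The specific evidence items attached to each lane are hypothetical examples loosely drawn from elsewhere in this article, and each firm would set its own.

```python
from enum import IntEnum


class Lane(IntEnum):
    SANDBOX = 1
    SUPERVISED_INTERNAL = 2
    CLIENT_FACING_DRAFT = 3
    HUMAN_APPROVED_PRODUCTION = 4
    AUTONOMOUS_PRODUCTION = 5


# Hypothetical evidence added at each lane; a lane also inherits every lane below it.
EVIDENCE_ADDED = {
    Lane.SANDBOX: {"register entry"},
    Lane.SUPERVISED_INTERNAL: {"owner approval"},
    Lane.CLIENT_FACING_DRAFT: {"prompt regression tests", "output sampling"},
    Lane.HUMAN_APPROVED_PRODUCTION: {"DPIA check", "security review", "rollback plan"},
    Lane.AUTONOMOUS_PRODUCTION: {"tool permission review", "post-release monitoring"},
}


def evidence_required(lane: Lane) -> set[str]:
    """Collect the evidence for this lane and every less demanding lane."""
    required: set[str] = set()
    for step in Lane:
        if step <= lane:
            required |= EVIDENCE_ADDED[step]
    return required


def evidence_gap(lane: Lane, collected: set[str]) -> set[str]:
    """Return evidence still missing before a release can run in the given lane."""
    return evidence_required(lane) - collected


collected = {"register entry", "owner approval", "prompt regression tests", "output sampling"}
print(evidence_gap(Lane.CLIENT_FACING_DRAFT, collected))  # set()
print(sorted(evidence_gap(Lane.HUMAN_APPROVED_PRODUCTION, collected)))  # ['DPIA check', 'rollback plan', 'security review']
```

Because the gap is computed rather than debated, a team can see immediately what it must collect to move a release up a lane, which is how pre-agreed lanes support speed.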

There is also a misconception that vendor safety work removes the need for local controls. Microsoft said in May 2026 that it is working with the UK's AI Security Institute on methods for evaluating high-risk capabilities and the effectiveness of safeguards. OpenAI says GPT-5.5 went through full predeployment safety evaluations, targeted red-teaming for advanced cybersecurity and biology, and feedback from nearly 200 early-access partners. Those are useful signals, but they are not a substitute for testing your own workflow, with your own prompts, data, users, integrations, permissions and failure consequences.

The practical compromise is tiered change control. Low-risk use cases get lightweight logging and sample checks. Medium-risk workflows get regression tests, owner approval and monitoring. High-risk workflows get a formal change assessment, legal or data protection input, security review, human fallback and rollback criteria. This is how firms keep pace without pretending every model release is either harmless or catastrophic. It is also how leadership avoids the unhelpful binary of 'approve AI' or 'ban AI'.

The Microsoft and AISI partnership is described in statements from the AI Security Institute and Microsoft.

How to start before the next release lands

Most firms can build a first version of the register in two weeks. Start with the live workflows, not the tools. Ask: where does AI influence a customer outcome, employee decision, operational process, security control, software release, legal review, financial analysis or public communication? Then map which model or supplier capability sits behind each workflow. This will reveal uncomfortable gaps, especially where AI features are embedded inside SaaS tools that procurement treated as ordinary product updates.

The minimum viable register should include ten fields: workflow name, model or AI feature, supplier, current version or release channel, upgrade mode, data involved, impact level, owner, evidence required and next review date. Add richer fields once the habit is established. The strongest registers also include an issue log for post-release drift: hallucination patterns, changed tone, broken templates, new refusals, latency spikes, cost anomalies, connector errors, prompt injection incidents and human override rates. These are operational signals, not abstract ethics notes.
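Several of the drift signals in the issue log can be checked mechanically against a pre-release baseline. This is a minimal sketch, assuming every tracked metric is "higher is worse" and using a hypothetical 25 percent tolerance. The metric names and numbers in the example are illustrative only, not measurements.

```python
# Hypothetical tolerance: flag a metric that worsens by more than 25% after a release.
DEFAULT_TOLERANCE = 0.25


def drift_signals(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = DEFAULT_TOLERANCE,
) -> list[str]:
    """Return metrics that rose by more than `tolerance` relative to the pre-release baseline.

    Assumes every metric is "higher is worse", e.g. latency, cost, refusal rate
    or human override rate.
    """
    flagged = []
    for metric, before in baseline.items():
        after = current.get(metric)
        if after is None:
            continue  # not yet measured since the release
        if before == 0:
            if after > 0:
                flagged.append(metric)  # any occurrence is new behaviour
        elif (after - before) / before > tolerance:
            flagged.append(metric)
    return flagged


# Illustrative numbers only, not measurements.
baseline = {"p95_latency_s": 4.0, "cost_per_task_gbp": 0.10, "refusal_rate": 0.02, "human_override_rate": 0.05}
current = {"p95_latency_s": 6.0, "cost_per_task_gbp": 0.11, "refusal_rate": 0.02, "human_override_rate": 0.09}
print(drift_signals(baseline, current))  # ['p95_latency_s', 'human_override_rate']
```

Qualitative signals such as changed tone or broken templates still need human sampling; a check like this only covers the numeric ones.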

Then create a release review rhythm. Assign someone to monitor supplier release notes for OpenAI, Anthropic, Google, Microsoft, GitHub and any vertical SaaS platforms in use. Tie that monitoring to service management tooling such as Jira Service Management, ServiceNow, Azure DevOps or a controlled SharePoint register. Require business owners to confirm whether a release affects their process. Security should review tool access and logging. Data protection should review personal data use and DPIA triggers. Finance should review consumption and cost changes. Operations should confirm fallback plans.
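The review rhythm above can be reduced to a routine that turns one supplier release into a list of review tasks for the roles the paragraph names. This is a minimal sketch, assuming the register already knows which workflows a release touches. It deliberately returns plain `(reviewer, task)` pairs rather than calling Jira Service Management, ServiceNow, Azure DevOps or SharePoint; raising them as tickets would go through whatever integration the firm already uses. Pulling in data protection only where personal data is involved is our assumption, and the model identifier is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AffectedWorkflow:
    name: str
    business_owner: str
    uses_personal_data: bool


def review_tasks(model: str, workflows: list[AffectedWorkflow]) -> list[tuple[str, str]]:
    """Return (reviewer, task) pairs for one supplier release, following the roles in the text."""
    tasks = [
        ("Security", f"Review tool access and logging for {model}"),
        ("Finance", f"Review consumption and cost changes for {model}"),
    ]
    for wf in workflows:
        tasks.append((wf.business_owner, f"Confirm whether {model} affects '{wf.name}'"))
        tasks.append(("Operations", f"Confirm fallback plan for '{wf.name}'"))
        # Assumption: data protection is only pulled in where personal data is involved.
        if wf.uses_personal_data:
            tasks.append(("Data protection", f"Check personal data use and DPIA triggers for '{wf.name}'"))
    return tasks


workflows = [
    AffectedWorkflow("complaint triage", "Head of Customer Service", uses_personal_data=True),
    AffectedWorkflow("release notes drafting", "Engineering Manager", uses_personal_data=False),
]
for reviewer, task in review_tasks("example-model-2026-04", workflows):
    print(f"{reviewer}: {task}")
```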

Finally, test the register through a tabletop exercise. Pick a recent frontier release and ask what would happen if output quality changed overnight, refusals increased, a connector gained new access, or a model began taking longer multi-step actions than expected. Could the business identify affected workflows in an hour? Could it pause the feature? Could it explain the decision to a client or regulator? If the answer is no, the firm does not yet have production control over its AI estate. It has enthusiasm, useful tools and a blind spot.
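The first two tabletop questions, which workflows are affected and whether they can be paused, are queries a register should answer in seconds. This is a minimal sketch, assuming each register entry records the model it depends on and whether a tested pause control exists. The `Entry` fields and model names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    workflow: str
    model: str
    owner: str
    can_pause: bool  # a tested admin control or kill switch exists


def tabletop_report(register: list[Entry], model: str) -> dict[str, list[str]]:
    """Answer the first tabletop questions: what is affected, who owns it, and what cannot be paused?"""
    affected = [e for e in register if e.model == model]
    return {
        "affected": [e.workflow for e in affected],
        "owners": sorted({e.owner for e in affected}),
        "cannot_pause": [e.workflow for e in affected if not e.can_pause],
    }


register = [
    Entry("complaint triage", "vendor-model-x", "Customer Service", can_pause=True),
    Entry("contract review", "vendor-model-x", "Legal", can_pause=False),
    Entry("code suggestions", "vendor-model-y", "Engineering", can_pause=True),
]
print(tabletop_report(register, "vendor-model-x"))
```

Any workflow that appears under `cannot_pause` is a concrete finding from the exercise: the firm depends on that model but has no tested way to stop it.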

Frequently Asked Questions

What is a model change risk register?

It is a live record of material AI model changes, the workflows they affect, the evidence reviewed, the risks identified, the controls applied and the person who accepted the residual risk.

Is this only needed for firms building their own AI systems?

No. It is often more important when firms use third-party AI through APIs, copilots or SaaS products, because supplier-controlled changes can affect live workflows without an internal software release.

Which model changes should trigger a register entry?

Any change that could affect output quality, autonomy, refusal behaviour, tool access, data processing, latency, cost, security controls, compliance evidence or user permissions should be logged and triaged.

How is this different from an AI inventory?

An inventory lists what AI tools exist. A change risk register records decisions about specific releases and upgrades, including testing, ownership, risk acceptance and rollback plans.

Does UK regulation require this exact document?

Not by name. But UK GDPR accountability, DPIA duties, security governance and public sector AI guidance all point towards ongoing evidence, review and control when AI systems change.

Should every new model release go to a governance board?

No. Use tiers. Low-risk internal use may only need logging and sample checks. High-risk workflows involving personal data, customer impact, security actions or autonomous tool use need stronger review.

Can vendor system cards and release notes be enough evidence?

They are useful inputs, not complete assurance. You still need local testing against your workflows, prompts, data, integrations, users and failure consequences.

Who should own the register?

The best owner is usually a shared AI governance or risk function, with named business, technical, security and data protection owners for each workflow.