Data Residency Questions To Ask Before Putting AI Workflows Into Production In The UK

The Sovereign Cloud

4 July 2026 | By Ashley Marshall

Quick Answer

Before putting AI workflows into production in the UK, ask where every class of data is stored, processed, logged, backed up, accessed and supported. The practical test is whether you can explain the full route of personal data, confidential records and model telemetry across cloud regions, vendors, subprocessors and incident processes without relying on sales language.

Data residency is not a hosting checkbox. Once an AI workflow is in production, prompts, files, logs, embeddings, human reviews and supplier support paths can all become part of the data map.

Start with the workflow, not the vendor region

The first data residency question is not "is this hosted in the UK?" It is "what data moves through this workflow from start to finish?" That sounds basic, but it is where many AI rollouts go wrong. A production AI workflow may take data from a CRM, retrieve policy documents from SharePoint, create embeddings in a vector database, call a model endpoint, store prompts and responses for evaluation, send traces to an observability tool, trigger an action in a ticketing system and create logs for support. One cloud region label does not describe that chain.

For UK businesses, the right starting point is a data movement map. List the data classes first: personal data, special category data, customer records, employee records, commercial contracts, operational instructions, prompts, retrieved documents, embeddings, outputs, evaluation data, audit logs, error traces and backups. Then ask where each class is stored, where it is processed, who can access it, which support teams can see it, how long it is retained and whether it is used to improve a model or service. This is especially important for tools such as Microsoft Copilot Studio, Azure OpenAI, AWS Bedrock, Google Vertex AI, OpenAI Enterprise, Anthropic on cloud marketplaces, Salesforce Einstein, ServiceNow Now Assist and private retrieval-augmented generation (RAG) stacks.

The ICO's AI and data protection guidance is useful because it frames AI as a lifecycle issue, not a single procurement event. It stresses accountability, governance, transparency, lawfulness, fairness, accuracy, security, data minimisation and individual rights. Those principles apply to each data movement in the workflow. The NCSC's cloud security principles make a similar point from a security angle: you should analyse both the cloud service and the company that runs it, and weigh the evidence the provider supplies.

What this means in practice: before approving production, require a one-page data residency map for the workflow. It should show the primary cloud region, failover region, model endpoint, vector database, logging destination, analytics destination, backup location, support access route and subprocessors. If the supplier cannot name a specific location or party for every entry on the map, the workflow is not ready for production.
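One way to keep that one-page map honest is to hold it as a small structured record and check it for blanks before sign-off. The sketch below is illustrative only: the class name, field names and the example workflow name are assumptions made for this example, not part of any supplier's tooling.

```python
from dataclasses import dataclass, fields


@dataclass
class ResidencyMap:
    """One record per workflow. Field names are hypothetical labels for the map entries."""
    workflow: str
    primary_region: str = ""
    failover_region: str = ""
    model_endpoint: str = ""
    vector_database: str = ""
    logging_destination: str = ""
    analytics_destination: str = ""
    backup_location: str = ""
    support_access_route: str = ""
    subprocessors: str = ""


def unanswered(residency_map: ResidencyMap) -> list[str]:
    """Return the names of map entries that are still blank."""
    return [
        f.name
        for f in fields(residency_map)
        if not str(getattr(residency_map, f.name)).strip()
    ]


if __name__ == "__main__":
    # Hypothetical draft: only two entries have been answered so far.
    draft = ResidencyMap(
        workflow="complaints-summariser",
        primary_region="UK region (named in order form)",
        model_endpoint="UK regional endpoint",
    )
    for entry in unanswered(draft):
        print(f"Still unanswered: {entry}")
```

A blank entry here is a prompt for a supplier question, not a verdict. The check only shows that an answer exists, not that the answer is true.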

Sources: ICO guidance on AI and data protection and NCSC cloud security principles.

Ask whether UK residency is contractual, technical or merely configurable

UK data residency can mean several different things. It may mean the customer can choose a UK region for storage. It may mean processing happens in a UK region except during failover. It may mean primary application data stays in the UK, while telemetry, abuse monitoring, billing records, support tickets or model safety logs go elsewhere. It may mean the supplier has a UK tenant option, but some AI features still call a global service. These distinctions matter because a production AI workflow often exposes more than the final business record.

The practical question is whether the residency claim is contractual, technical or only configurable. A contractual commitment should appear in the data processing agreement, service terms, subprocessor list, product documentation and order form. A technical commitment should be enforceable through region selection, network controls, tenant settings, customer-managed keys, private endpoints, data loss prevention rules and logs. A configurable commitment means the customer can set the service up correctly, but could also set it up badly. That is not necessarily a problem, but it changes who owns the control.

The UK Government's Technology Code of Practice, last updated in July 2025, tells public sector teams to consider public cloud first, make things secure, make privacy integral, make better use of data and define a purchasing strategy that considers commercial and technology aspects and contractual limitations. Those points translate well to private sector AI procurement. The question is not whether cloud is acceptable. The question is whether the business has matched the residency requirement to the risk of the workflow.

Named examples help. Azure OpenAI customers choose deployment regions, but they still need to check product-specific data handling and logging commitments. AWS Bedrock offers regional model invocation, but the customer still needs to understand model provider terms, CloudTrail, guardrail logging and cross-region resilience. Google Vertex AI has regional services, but teams still need to check feature availability by region and data logging settings. A UK-hosted SaaS product may still rely on US analytics, CDN, support or AI inference providers.

What this means in practice: ask suppliers to state the residency commitment in three columns: data at rest, data in processing and operational metadata. Then ask for the exact product documentation and contract clause that supports each answer.
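The three-column request can be tracked as a simple checklist that flags any column lacking a contract clause or a documentation reference. This is a minimal sketch under stated assumptions: the column identifiers, the Commitment class and the evidence_gaps function are names invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# The three columns suppliers are asked to answer (identifiers are hypothetical).
COLUMNS = ("data_at_rest", "data_in_processing", "operational_metadata")


@dataclass
class Commitment:
    column: str                             # one of COLUMNS
    location: str                           # where the supplier says the data sits
    contract_clause: Optional[str] = None   # reference to the supporting clause
    documentation: Optional[str] = None     # reference to the product documentation


def evidence_gaps(commitments: List[Commitment]) -> List[str]:
    """List every column that is unanswered or lacks supporting evidence."""
    by_column = {c.column: c for c in commitments}
    problems = []
    for column in COLUMNS:
        commitment = by_column.get(column)
        if commitment is None:
            problems.append(f"{column}: no answer")
            continue
        if not commitment.contract_clause:
            problems.append(f"{column}: no contract clause")
        if not commitment.documentation:
            problems.append(f"{column}: no product documentation")
    return problems
```

An empty list means every column has both kinds of evidence attached. Whether that evidence actually supports the claim still needs a human reader.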

Source: GOV.UK Technology Code of Practice.

Treat international transfer rules as a production control

Data residency and international transfers are related, but they are not the same thing. A system can be hosted in the UK and still involve restricted transfers if personal data is accessed, processed or supported from another country. A workflow can also process data in an adequate country while relying on subprocessors elsewhere. The ICO's international transfer guidance is clear that organisations need to understand when the rules on transferring personal information to other countries apply, when there is a restricted transfer, and what safeguards are being used.

Before production, ask the supplier to identify every country where personal data may be stored, processed or accessed. Do not stop at the main cloud region. Include support access, incident response, diagnostics, security monitoring, model safety review, human annotation, backup operations, disaster recovery, subprocessors and affiliates. Then ask which transfer mechanism applies: UK adequacy regulations, the UK International Data Transfer Agreement, the UK Addendum to the EU standard contractual clauses, binding corporate rules, or an exception. If a transfer risk assessment is needed, ask who completes it, who reviews it and what supplementary measures are in place.
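The country-by-country answers can be recorded per activity so that any access outside the UK without a named mechanism stands out. The sketch below is a simplification and assumes its own labels: the AccessPath class, the mechanism keys and the use of "GB" as the home country code are illustrative choices. Whether a restricted transfer actually exists remains a legal judgement that code cannot make.

```python
from dataclasses import dataclass
from typing import List, Optional

# Short labels for the mechanisms named above (labels are hypothetical).
MECHANISMS = {
    "adequacy",      # UK adequacy regulations
    "IDTA",          # UK International Data Transfer Agreement
    "UK Addendum",   # UK Addendum to the EU standard contractual clauses
    "BCR",           # binding corporate rules
    "exception",
}


@dataclass
class AccessPath:
    activity: str                    # e.g. "support access", "backup operations"
    country: str                     # where data is stored, processed or accessed
    mechanism: Optional[str] = None  # one of MECHANISMS, if the supplier named one


def unresolved_transfers(paths: List[AccessPath], home: str = "GB") -> List[AccessPath]:
    """Return paths outside the home country with no recognised mechanism recorded."""
    return [
        path
        for path in paths
        if path.country != home and path.mechanism not in MECHANISMS
    ]
```

Each path returned is a question to send back to the supplier: which mechanism applies, and who completed the transfer risk assessment.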

This is not paperwork for its own sake. AI workflows often create new data that did not exist before: prompts, summaries, classifications, embeddings, risk scores, tool-call traces and evaluation records. Some of that data may contain personal data even if the original use case was described as operational automation. A customer support summariser can reveal health, financial distress or vulnerability information. A sales assistant can expose commercially sensitive negotiations. An HR copilot can process employee relations notes. Residency questions must cover the new artefacts as well as the source records.

The common misconception is that "UK hosted" means "no international transfer issue". That is unsafe. The right question is whether any personal data leaves the UK by storage, processing, remote access or onward support. If the answer is yes, the business needs to know the mechanism and the residual risk. If the supplier says no, the business should ask for the product documentation, subprocessor schedule and logging evidence that proves it.

Source: ICO international transfers guidance.

Interrogate logs, embeddings and evaluation data before go-live

The data most likely to be missed is not the customer record itself. It is the AI operational data created around it. Prompts, completions, retrieved chunks, embeddings, red-team outputs, latency traces, error logs, human feedback, evaluation sets, screenshots, support tickets and incident notes can all contain sensitive information. They are also the data teams most often keep because they need to debug, improve and evidence the system. That makes them residency and retention questions, not only engineering details.

The NCSC's cloud security principles include asset protection and resilience, governance, operational security, personnel security, supply chain security, secure user management, audit information and secure use of the service. Those principles are directly relevant to AI logs. Ask where logs are stored, whether they are encrypted, who can search them, whether supplier staff can access raw prompts, whether logs are copied into third-party observability platforms, whether embeddings can be deleted, whether vector stores are backed up, and whether deleted source documents remain in derived indexes.
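The question about deleted source documents lingering in derived indexes can be checked mechanically if the vector store records which document each chunk came from. The sketch below assumes exactly that, and assumes the index can be exported as pairs of chunk ID and source document ID. The function name and data shape are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def orphaned_chunks(
    index_entries: Iterable[Tuple[str, str]],
    live_document_ids: Iterable[str],
) -> List[str]:
    """Return chunk IDs whose source document no longer exists.

    index_entries: (chunk_id, source_document_id) pairs exported from the index.
    live_document_ids: IDs of documents that still exist in the source system.
    """
    live: Set[str] = set(live_document_ids)
    return [
        chunk_id
        for chunk_id, document_id in index_entries
        if document_id not in live
    ]
```

Any chunk returned is derived data that has outlived its source, so it belongs in the deletion route and in the conversation about backups of the vector store.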

Recent UK cyber statistics make the supplier and logging angle more urgent. The Cyber Security Breaches Survey 2025 found that 43 percent of UK businesses and 30 percent of charities identified a cyber security breach or attack in the previous 12 months. Among organisations that experienced a breach or attack, phishing was reported by 85 percent of businesses and 86 percent of charities. The same survey found that only 14 percent of businesses reviewed risks posed by immediate suppliers, and only 7 percent reviewed the wider supply chain. That gap is exactly where AI production risk can hide.

What this means in practice: add an AI logging schedule to the production checklist. For each log type, record location, purpose, retention period, access group, deletion route, encryption, masking, export path and incident review owner. Decide what must never be logged, such as payment card details, secrets, privileged legal advice, raw special category data unless justified, or credentials. Then test it. Send seeded data through the workflow and confirm whether it appears in logs, traces, vector stores and analytics dashboards.
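The seeded-data test can be sketched as a unique marker planted in test input and then searched for in whatever each destination lets you export. How the marker is sent through the workflow and how each destination is exported depends on the stack, so both are left out. The function names and the marker prefix are assumptions for illustration.

```python
import uuid
from typing import Dict, Iterable, List


def make_canary(prefix: str = "RESIDENCY-CANARY") -> str:
    """Build a unique, obviously fake marker to plant in test input."""
    return f"{prefix}-{uuid.uuid4().hex}"


def sinks_containing(canary: str, sinks: Dict[str, Iterable[str]]) -> List[str]:
    """Return the names of destinations where the marker turned up.

    sinks maps a destination name (logs, traces, vector store, dashboard export)
    to the text records exported from it after the test run.
    """
    return sorted(
        name
        for name, records in sinks.items()
        if any(canary in record for record in records)
    )


if __name__ == "__main__":
    canary = make_canary()
    # Hypothetical exports collected after sending the canary through the workflow.
    exported = {
        "application_logs": [f"prompt received: {canary}"],
        "traces": ["span ok"],
        "vector_store": [],
    }
    print(sinks_containing(canary, exported))
```

A plain substring search will miss markers that have been masked, hashed or embedded, so a clean result is evidence rather than proof. Use synthetic markers only, never real personal data.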

Sources: NCSC cloud security principles and Cyber Security Breaches Survey 2025.

Put supplier due diligence into operational language

Supplier questionnaires often fail because they ask broad questions and receive broad answers. "Are you GDPR compliant?" is not a production control. "Where is our prompt and retrieved source text stored after a failed model call from the complaints workflow?" is a production control. The difference is specificity. AI workflows need due diligence that follows the data path, not generic assurance language.

Start with twelve direct questions. Which cloud regions are used for storage, inference, logs, backups and disaster recovery? Which subprocessors can access our data and from which countries? Are prompts and outputs used for model training, service improvement or human review? Can we disable retention of prompts and outputs? Can we use customer-managed encryption keys? Can we route traffic through private networking? Can we delete embeddings when the source document is deleted? Are logs searchable by supplier support teams? What happens during failover? What evidence can you provide for region confinement? How quickly will you notify us of subprocessor changes? What audit reports, penetration tests and incident summaries are available?

The UK Government's AI Playbook for the public sector, published in February 2025, is a useful model for the tone of procurement. It tells teams to use AI lawfully, ethically and responsibly, use AI securely, manage the full AI lifecycle, use the right tool for the job and work with commercial colleagues from the start. It also explicitly says teams should seek data protection advice early and understand risks such as prompt injection, data poisoning, hallucinations and cyber attacks. Private sector teams should take the same cross-functional approach.

Procurement should also reflect the operating model. If the workflow is built on Microsoft 365 and Azure, involve Microsoft tenant administrators, security, data protection and operations. If it uses AWS Bedrock or SageMaker, involve cloud platform owners and network teams. If it uses a specialist SaaS vendor, require a data flow diagram and a subprocessor map. If it uses open-source models on UK infrastructure, ask about patching, weights provenance, vulnerability scanning, abuse monitoring and who supports the stack at 2am.

The practical outcome should be a supplier evidence pack: data flow map, transfer mechanism, subprocessor list, region controls, logging schedule, retention matrix, incident process, security evidence and exit plan. Without that pack, production approval is guesswork.

Source: GOV.UK AI Playbook for the UK Government.

Decide what residency level the business actually needs

The final question is commercial as much as technical: what level of residency does this workflow need? Not every AI use case needs UK-only infrastructure. A low-risk internal drafting assistant may be acceptable on a mainstream cloud service with strong contractual safeguards, good access controls and clear logging rules. A workflow processing patient records, vulnerable customer data, regulated financial decisions, legal case files, public sector data or sensitive intellectual property may justify UK region confinement, private networking, customer-managed keys, stricter support access and documented transfer risk assessment.

This is where the misconception cuts both ways. Some teams assume data residency is unnecessary because "the big cloud providers are compliant". Others assume every workload needs sovereign cloud, which can slow useful adoption and create unnecessary cost. The better approach is to classify workflows by data sensitivity, decision impact, regulatory exposure, operational criticality and public trust. Then apply the residency pattern that matches the risk.

There are practical patterns. For standard business productivity, Microsoft 365 Copilot with tenant controls may be enough if the organisation already governs permissions well. For regulated workflow automation, Azure OpenAI in a chosen region, AWS Bedrock in a specific region or Google Vertex AI with controlled logging may fit. For highly sensitive workflows, a UK-hosted private RAG stack, on-prem inference, dedicated cloud tenancy or a sovereign cloud option may be worth the complexity. For some use cases, the right answer is not to put personal data into the AI layer at all, but to use synthetic data, redaction, retrieval filters or human-mediated review.
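Matching a workflow to a pattern can be made repeatable with a simple scoring rule. The sketch below is one possible approach, not a standard: the 1 to 3 scale, the rule that the highest single factor sets the tier, and the wording of each tier are all assumptions made for illustration.

```python
from typing import Dict

# The five classification factors (identifiers are hypothetical).
FACTORS = (
    "data_sensitivity",
    "decision_impact",
    "regulatory_exposure",
    "operational_criticality",
    "public_trust",
)

# Illustrative tiers loosely following the patterns described above.
PATTERNS = {
    1: "mainstream cloud service with tenant controls and clear logging rules",
    2: "chosen cloud region with controlled logging",
    3: "UK-hosted private stack, on-prem inference, dedicated tenancy or sovereign cloud",
}


def residency_tier(scores: Dict[str, int]) -> int:
    """Return the tier set by the highest-scoring factor (each scored 1 to 3)."""
    missing = [factor for factor in FACTORS if factor not in scores]
    if missing:
        raise ValueError(f"missing scores for: {', '.join(missing)}")
    for factor in FACTORS:
        if scores[factor] not in (1, 2, 3):
            raise ValueError(f"{factor} must be scored 1, 2 or 3")
    return max(scores[factor] for factor in FACTORS)


if __name__ == "__main__":
    # Hypothetical scoring for an internal drafting assistant.
    drafting_assistant = dict.fromkeys(FACTORS, 1)
    tier = residency_tier(drafting_assistant)
    print(tier, PATTERNS[tier])
```

Taking the maximum rather than an average is deliberate in this sketch: one high-risk factor, such as regulatory exposure, should not be diluted by low scores elsewhere.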

Precise Impact AI's view is that residency decisions should be made at workflow level. A blanket policy either blocks too much or allows too much. Ask the questions, document the answer, and turn the answer into production controls. The strongest AI programmes do not ask suppliers for reassurance after the fact. They define the data boundary first, test it before go-live and keep checking it when vendors change models, regions, features and subprocessors. For related planning, see how to run AI data residency reviews in supplier management and sovereign AI backup plans for model and workflow portability.

Source: GOV.UK guidelines for AI procurement.

Frequently Asked Questions

Is UK data residency mandatory for every AI workflow?

No. UK law does not require every AI workflow to be UK-only. The requirement depends on the data, sector, transfer mechanism, contract, risk level and whether the organisation can meet UK GDPR and security obligations.

Does UK hosting mean there is no international transfer?

Not necessarily. Remote support, subprocessors, failover, diagnostics, analytics, human review or logging can still involve access or processing outside the UK. Ask for the full data map.

What is the most important data residency question to ask a supplier?

Ask where prompts, source documents, embeddings, outputs, logs, backups and support data are stored, processed and accessed, broken down by country and subprocessor.

Should prompts and outputs be retained?

Only where there is a defined purpose, retention period and access control. Some workflows need prompt and output retention for audit and evaluation, but others should minimise or disable retention.

Do embeddings count for data residency purposes?

They can. Embeddings may be derived from personal or confidential data and can be linked back to source material in a retrieval system, so they need location, access, retention and deletion controls.

How should SMEs approach supplier due diligence?

Start with a simple evidence pack: data flow diagram, region commitments, subprocessor list, transfer mechanism, logging schedule, retention policy, support access rules and exit plan.

What if the supplier will not answer detailed residency questions?

Treat that as a risk signal. For low-risk use it may be acceptable with limits, but for production workflows handling sensitive or regulated data, lack of evidence should block approval.

Is sovereign cloud always the best answer?

No. Sovereign cloud can be the right answer for high-sensitivity workflows, but many AI use cases can run safely on mainstream cloud if data boundaries, logging, transfers and access controls are well governed.