AI Red-Team Evidence for UK Customer-Facing Agent Deployments

14 June 2026 | By Ashley Marshall

Quick Answer: AI Red-Team Evidence for UK Customer-Facing Agent Deployments

UK businesses deploying customer-facing AI agents need red-team evidence that shows how the agent behaves under adversarial prompts, unsafe customer requests, data leakage attempts, tool misuse, prompt injection, escalation failures and unexpected model changes. The evidence should connect testing results to business controls, launch gates, monitoring, incident response and accountable ownership.

A customer-facing AI agent should not go live because a demo looked polished. It should go live because the business has evidence that it behaves safely under pressure.

Red-team evidence is becoming a launch requirement

Customer-facing AI agents are moving from support experiments into live service channels. They answer complaints, triage bookings, draft refunds, retrieve policy wording, recommend next steps, update CRM records and hand customers to human teams. That changes the assurance problem. A chatbot that only drafts internal copy can be managed with lighter controls. An agent that talks to customers and touches operational systems needs evidence that it can withstand misuse, confusion, edge cases and hostile input.

The urgency is practical, not theoretical. Ofcom reported that ChatGPT had 1.8 billion UK visits in the first eight months of 2025, up from 368 million in the same period of 2024, and that about 30% of searches now show AI overviews. Customers are being trained by the wider internet to expect instant AI-assisted answers. The commercial pressure on UK firms is obvious: reduce contact centre load, improve out-of-hours service, speed up case handling and keep customers from waiting in queues. The risk is that speed becomes the only launch metric.

Red-team evidence gives leaders a better question than whether the agent works. It asks how the agent fails. Can a customer persuade it to reveal another person's data? Does it invent refund rights? Can it be manipulated through a pasted email, malicious PDF or poisoned knowledge-base article? Does it escalate complaints when regulation, vulnerability, safeguarding, legal threat or financial loss appears? Does it call tools only inside its authority? Does it refuse safely when the right answer is no?

For UK deployments, this evidence should sit alongside data protection impact assessments, cyber risk reviews, operational readiness checks and customer journey testing. It should be understandable to product, legal, security, customer operations and senior management. A red-team report that only lists clever prompts is not enough. The useful output is a launch decision record: what was tested, what broke, what was fixed, what remains accepted risk, who owns it and what monitoring will catch the next failure.

Sources: Ofcom Online Nation 2025 and GOV.UK AI Cyber Security Code of Practice.

What UK guidance already expects you to prove

The UK already has enough guidance to turn red-team work into evidence rather than theatre. DSIT's AI Cyber Security Code of Practice, published on GOV.UK, sets out 13 principles for the cyber security of AI systems. It covers staff awareness, secure design, risk management, human responsibility, asset tracking, infrastructure, supply chain, documentation of data, models and prompts, testing and evaluation, user processes, updates, monitoring and disposal. For a customer-facing agent, those principles map directly onto launch gates.

Principle 9, appropriate testing and evaluation, is the obvious anchor. But the stronger approach is to use the full Code as the test frame. If Principle 2 says the AI system should be designed for security as well as functionality and performance, the red team should test whether security weakens when customer journeys get messy. If Principle 8 says data, models and prompts should be documented, the test report should identify the model version, system prompt, tool permissions, retrieval sources and configuration used during testing. If Principle 12 says system and user actions should be logged, the test should prove that failed attacks and suspicious behaviours are visible after the event.
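The Principle 12 point can be checked mechanically. The sketch below is a minimal illustration, assuming a hypothetical log export in which each event carries a conversation ID and an event type. The `LogEvent` shape and the event type names are invented for the example and do not come from the Code or from any logging product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEvent:
    conversation_id: str
    event_type: str  # hypothetical values, e.g. "refusal", "tool_call_denied"


# Assumption: these event types count as a visible trace of an attack attempt.
SECURITY_EVENT_TYPES = {"refusal", "tool_call_denied", "injection_flagged"}


def unlogged_attacks(attack_conversation_ids, log_events):
    """Return red-team conversations that left no security-relevant log event."""
    seen = {
        event.conversation_id
        for event in log_events
        if event.event_type in SECURITY_EVENT_TYPES
    }
    return sorted(set(attack_conversation_ids) - seen)
```

Any conversation ID returned here is an attack that the agent may have resisted but that nobody could reconstruct afterwards, which is a logging finding in its own right.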

The ICO's agentic AI work adds the data protection angle. Its Tech Futures report says agentic systems increase automation, use contextual information, operate through natural language and can process personal information beyond what is necessary if design choices are poor. It also highlights risks around broad purposes, special category inferences, transparency, information rights and cyber security. For customer-facing deployments, red-team testing should therefore include privacy abuse cases, not only security jailbreaks.

In practice, this means the evidence pack should answer four questions. First, what harms are plausible for this exact customer journey? Second, what controls stop those harms before the customer is affected? Third, what logs and alerts prove the controls worked? Fourth, who has authority to pause the agent if the red-team findings or live telemetry show unacceptable risk? That is the business angle. A firm that can answer those questions can move faster through internal approval, because legal, data protection, security and operations are looking at the same evidence rather than arguing from separate checklists.

Sources: GOV.UK AI Cyber Security Code of Practice and ICO Tech Futures: Agentic AI.

The red-team scope must match the agent's authority

The most common mistake is to red-team the model while under-testing the deployment. Customer-facing risk rarely lives inside the model alone. It appears when the model is connected to a refund tool, CRM write access, a product knowledge base, a policy store, customer history, web browsing, email, payments, scheduling or a case management workflow. The question is not simply whether GPT, Claude, Gemini, Mistral, Llama or another model can be jailbroken. The question is what a successful jailbreak can do in your business.

Scope the red team around the agent's authority. A read-only FAQ agent needs tests for misinformation, unsafe advice, privacy leakage, source poisoning and escalation failure. A customer service agent with account access needs tests for identity confusion, cross-customer data exposure, excessive disclosure, unauthorised changes and social engineering. An agent that can issue refunds, change bookings, generate quotes or trigger complaints workflows needs tests for tool misuse, approval bypass, hidden instructions, fraud pathways and rollback. If the agent serves regulated customers, vulnerable customers or complaint scenarios, the scope should include those harms explicitly.
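One way to keep scope tied to authority is to derive the test list from the agent's permission level. The sketch below is illustrative only. The three `Authority` levels and the test names follow the paragraph above, but the enum, the cumulative rule (a higher level inherits the tests of the lower levels) and the `SENSITIVE_CONTEXT_TESTS` labels are assumptions made for the example.

```python
from enum import IntEnum


class Authority(IntEnum):
    READ_ONLY_FAQ = 1
    ACCOUNT_ACCESS = 2
    TRANSACTIONAL = 3  # refunds, bookings, quotes, complaints workflows


TESTS_BY_AUTHORITY = {
    Authority.READ_ONLY_FAQ: [
        "misinformation", "unsafe advice", "privacy leakage",
        "source poisoning", "escalation failure",
    ],
    Authority.ACCOUNT_ACCESS: [
        "identity confusion", "cross-customer data exposure",
        "excessive disclosure", "unauthorised changes", "social engineering",
    ],
    Authority.TRANSACTIONAL: [
        "tool misuse", "approval bypass", "hidden instructions",
        "fraud pathways", "rollback",
    ],
}

# Assumption: extra scope when regulated or vulnerable customers, or complaints, are involved.
SENSITIVE_CONTEXT_TESTS = [
    "regulated customer harm", "vulnerable customer harm", "complaint handling harm",
]


def required_tests(authority, sensitive_context=False):
    """Collect tests for this authority level and every level below it."""
    tests = []
    for level in Authority:  # iterates in definition order
        if level <= authority:
            tests.extend(TESTS_BY_AUTHORITY[level])
    if sensitive_context:
        tests.extend(SENSITIVE_CONTEXT_TESTS)
    return tests
```

The cumulative rule is a design choice, not a requirement from any guidance: an agent that can issue refunds can still give wrong answers, so it keeps the FAQ-level tests as well.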

Good red-team scenarios are also operationally realistic. They use actual policy documents, sample tickets, transcripts, call summaries, web pages, customer records, tool permissions and escalation routes. They test indirect prompt injection through documents and messages the agent may retrieve. They test multi-turn manipulation, where the first prompt looks harmless and the harmful action appears later. They test boundary confusion, such as a customer asking for legal, medical, financial or HR advice that the business does not provide. They test persistence, memory and context carryover where those features exist.

Named tooling can help, but it does not replace judgement. Microsoft PyRIT, garak, OWASP guidance, Lakera, HiddenLayer, Protect AI, Promptfoo, LangSmith, Langfuse, OpenTelemetry traces and vendor-native safety evaluation tools can all support testing. The business still has to define unacceptable outcomes. For a contact centre, that might include an unauthorised refund, a wrong eligibility statement, a missed vulnerable customer signal or a disclosure of personal data that breaches UK GDPR. For a software firm, it might be credential leakage or support guidance that weakens security. The evidence must connect test cases to those business harms.

Recent AI security evidence makes shallow testing risky

Recent AI Security Institute findings are a useful warning against shallow assurance. AISI's Frontier AI Trends Report says model safeguards are improving, and that there was a 40x difference in expert effort required to jailbreak two models released six months apart for certain malicious request categories. That is encouraging. The same report also says AISI managed to find vulnerabilities in every system it tested. For customer-facing agents, both parts matter. Better safeguards help, but they do not remove the need to test your deployment, your prompts, your retrieval sources and your tool design.

AISI's research also shows why evidence needs to be refreshed. Its report says success rates on self-replication evaluations rose from 5% to 60% between 2023 and 2025, while its cyber work notes rapid progress in autonomous cyber capability. In March 2026, AISI tested seven large language models on two custom cyber ranges. On one corporate network range, average steps completed at a 10 million token budget rose from 1.7 for GPT-4o in August 2024 to 9.8 for Opus 4.6 in February 2026, and the best single run completed 22 of 32 steps. Increasing inference-time compute from 10 million to 100 million tokens produced gains of up to 59%.

Those figures are not a prediction that a customer service agent will suddenly attack its own employer. They are evidence that AI capability and agent scaffolding are moving quickly enough to make one-off testing stale. A model update, a larger context window, a new memory feature, a new connector, a better planning scaffold or a more generous token budget can change what the system can do. If the red-team report is six months old and the deployed agent has changed twice since then, the report is more historical than operational.

The practical control is versioned evidence. Tie each red-team result to model version, prompt version, retrieval corpus, tool permissions, guardrail version, evaluation dataset and deployment environment. Require a retest after material changes. Keep a regression suite of the failures that mattered most, then run it before launch and after updates. This is not bureaucracy. It is how the business avoids treating a fast-moving AI system as if it were static software. It also gives procurement and compliance a defensible answer when a supplier says its model has improved: improved for what, under which tests, and with which residual risks?
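Versioned evidence can be as simple as a fingerprint stored with each red-team result. The following is a minimal sketch, assuming the seven attributes listed above are available as strings. The `DeploymentConfig` class, its field names and the example values are hypothetical, and the sketch treats any change as material, which a real policy may want to relax.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class DeploymentConfig:
    model_version: str
    prompt_version: str
    retrieval_corpus: str
    tool_permissions: tuple
    guardrail_version: str
    evaluation_dataset: str
    environment: str


def fingerprint(config):
    """Stable SHA-256 hash of the configuration a red-team run was executed against."""
    payload = json.dumps(asdict(config), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_retest(tested_fingerprint, live_config):
    """True when the live configuration differs from the one that was tested."""
    return fingerprint(live_config) != tested_fingerprint


# Illustrative values only.
tested = DeploymentConfig(
    model_version="model-2026-05",
    prompt_version="prompt-v7",
    retrieval_corpus="kb-snapshot-2026-06-01",
    tool_permissions=("lookup_order", "create_ticket"),
    guardrail_version="guardrails-v3",
    evaluation_dataset="regression-suite-v12",
    environment="staging",
)
evidence_fingerprint = fingerprint(tested)
live = replace(tested, prompt_version="prompt-v8")  # a prompt change after testing
```

Here `needs_retest(evidence_fingerprint, live)` returns `True` because the prompt version changed after the evidence was recorded, which is the signal to run the regression suite again.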

Sources: AISI Frontier AI Trends Report and AISI cyber range evaluation.

The counterargument: red teaming can become security theatre

There is a fair objection to all this: red teaming can become security theatre. A team runs a workshop, finds some colourful jailbreaks, writes a dramatic report and everyone feels more serious about AI risk. Two months later the product team ships a different prompt, adds a new connector and changes the escalation path. The red-team evidence is technically true but practically detached from the live system. That is not assurance. It is a snapshot without control.

The answer is to define what evidence must do before testing starts. Red-team findings should be triaged like product and security defects. Each finding needs severity, affected journey, exploit path, business harm, affected control, owner, mitigation, residual risk and retest result. A finding that can cause disclosure of another customer's information should not be treated like a funny jailbreak. A finding that causes the agent to promise a refund the business cannot honour should involve customer operations and legal, not only engineering. A finding that bypasses tool approval should block launch until fixed or formally accepted at the right level.
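The triage fields above translate naturally into a structured record. The sketch below is one possible shape, not a standard schema. The `Finding` class, the severity labels and the rule that only high and critical findings block launch are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    title: str
    severity: str  # assumed scale: "low", "medium", "high", "critical"
    affected_journey: str
    exploit_path: str
    business_harm: str
    affected_control: str
    owner: str
    mitigation: str
    residual_risk: str
    retest_passed: Optional[bool] = None  # None means not yet retested
    accepted_by: Optional[str] = None     # name or role of the formal risk acceptor


# Assumption: which severities act as launch blockers.
BLOCKING_SEVERITIES = {"high", "critical"}


def blocks_launch(finding):
    """A serious finding blocks launch until it is fixed and retested, or formally accepted."""
    if finding.severity not in BLOCKING_SEVERITIES:
        return False
    if finding.retest_passed:
        return False
    return finding.accepted_by is None
```

Keeping the acceptance field explicit means an unresolved high-severity finding cannot pass quietly: someone with a name has to sign for it.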

It is also important to test defences, not only attacks. Does retrieval filtering stop the wrong source from reaching the model? Does the agent ask for human approval when policy requires it? Does the UI label AI-generated content clearly enough? Does the escalation path preserve conversation history for the human team? Do monitoring alerts fire when the agent makes repeated refusals, attempts unusual tool calls or receives prompt-injection patterns? Can the business pause the agent without taking down the wider customer portal?
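The monitoring question can be tested by replaying red-team conversations through the alert rules. The sketch below is a simplified illustration, assuming events arrive as `(event_type, detail)` pairs for a single conversation. The event names, the refusal threshold and the tool allowlist are hypothetical and would need to match the real telemetry.

```python
from collections import Counter

REFUSAL_THRESHOLD = 3                               # hypothetical threshold
ALLOWED_TOOLS = {"lookup_order", "create_ticket"}   # hypothetical allowlist


def alerts_for(events):
    """Return the alerts that should fire for one conversation's events."""
    counts = Counter(event_type for event_type, _ in events)
    alerts = []
    if counts["refusal"] >= REFUSAL_THRESHOLD:
        alerts.append("repeated_refusals")
    if any(t == "tool_call" and detail not in ALLOWED_TOOLS for t, detail in events):
        alerts.append("unusual_tool_call")
    if counts["injection_pattern"] > 0:
        alerts.append("prompt_injection_pattern")
    return alerts
```

A red-team run that produces no alerts for a conversation known to contain an attack is evidence that the monitoring control, not only the agent, needs work.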

The commercial balance is proportionate testing. A low-risk public FAQ assistant does not need the same exercise as an agent that can issue refunds or advise on regulated products. But every customer-facing agent needs some adversarial evidence before launch. Keep it small for small risk, deep for high risk, and repeatable for anything that will scale. Red-team evidence should reduce friction by giving decision-makers a shared basis for approval, not create endless delay. When done well, it helps the business ship faster because the risk discussion becomes specific, testable and owned.

The evidence pack leaders should ask for before go-live

A credible go-live pack for a UK customer-facing AI agent should be concise, specific and usable. It should start with an agent summary: purpose, customer journeys, data categories, models, vendors, subprocessors, hosting, retrieval sources, tools, permissions, escalation routes, retention rules and accountable owners. That summary gives leaders the context to understand the red-team results. Without it, findings become abstract technical observations.

The second layer is the red-team evidence itself. Include the tested versions, test dates, test team, threat model, scenario catalogue, success criteria, pass and fail results, severity ratings and mitigations. The scenario catalogue should cover jailbreaks, unsafe advice, hallucinated policy, cross-customer disclosure, prompt injection, malicious uploads, source poisoning, social engineering, vulnerable customer signals, complaint handling, tool misuse, escalation failure, denial of service patterns and data retention mistakes. For high-risk deployments, include independent testing, or at least testing by a team that is separate from the builders.
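A simple coverage check helps show that the scenario catalogue is complete before the results are reviewed. The sketch below assumes each scenario is a dictionary with a `category` key, which is a hypothetical format. The category names are taken from the list above.

```python
REQUIRED_CATEGORIES = (
    "jailbreaks", "unsafe advice", "hallucinated policy",
    "cross-customer disclosure", "prompt injection", "malicious uploads",
    "source poisoning", "social engineering", "vulnerable customer signals",
    "complaint handling", "tool misuse", "escalation failure",
    "denial of service patterns", "data retention mistakes",
)


def coverage_gaps(scenarios):
    """Return required categories that no scenario in the catalogue covers."""
    covered = {scenario["category"] for scenario in scenarios}
    return [category for category in REQUIRED_CATEGORIES if category not in covered]
```

An empty result only shows that every category has at least one scenario. Whether those scenarios are deep enough for the agent's authority is still a judgement for the test team.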

The third layer is operational readiness. Show that monitoring and incident response work. The ICO's May 2026 cyber guidance says organisations should monitor suspicious activity, abnormal API usage and unexpected data transfers, maintain incident response plans and apply least privilege. The Bank of England, FCA and HM Treasury said on 15 May 2026 that regulated firms should plan for frontier AI cyber risk across governance, vulnerability management, third parties, access management, data protection, response and recovery. Even outside financial services, those are sensible headings for a customer-facing agent that can affect trust.

Finally, include a launch decision. List blocked issues, accepted risks, compensating controls, required monitoring, rollback plan, support team training, customer communications and review dates. Decide who can suspend the agent if telemetry or complaints show harm. For a business, this is where red-team evidence becomes valuable. It shortens debate, supports accountable launch, helps insurers and clients understand control maturity, and gives customer operations a playbook when something unexpected happens. The worst position is not finding problems. The worst position is finding them after customers, regulators or journalists have already done the testing for you.

Sources: ICO guidance on AI-powered cyber threats and Bank of England, FCA and HM Treasury statement.

Frequently Asked Questions

What is AI red-team evidence?

It is documented proof of how an AI system behaved under adversarial, unusual and high-risk test scenarios. For customer-facing agents, it should show test scope, attack paths, failures, mitigations, retests, residual risk and the launch decision.

Why does a customer-facing AI agent need red-team testing?

Because it interacts with real customers and may use business data, retrieval sources and tools. Testing helps find risks such as data leakage, unsafe advice, hallucinated policy, prompt injection, tool misuse and failed escalation before customers are affected.

Is vendor red-team evidence enough?

Usually not on its own. Vendor evidence is useful, but the business also needs testing for its own prompts, documents, permissions, workflows, customer journeys, escalation rules and operational controls.

Which UK guidance is relevant?

Useful sources include the GOV.UK AI Cyber Security Code of Practice, ICO guidance on agentic AI and AI-powered cyber threats, NCSC AI security guidance, Ofcom evidence on AI adoption, and sector guidance such as FCA and Bank of England operational resilience material.

What should a red-team scenario catalogue include?

It should include jailbreak attempts, indirect prompt injection, malicious uploads, source poisoning, privacy abuse, cross-customer disclosure, unsafe advice, fraud attempts, tool approval bypass, escalation failure and customer vulnerability signals.

How often should an AI agent be retested?

Retest before launch, after material model or prompt changes, after adding connectors or tools, after changing retrieval sources, after major policy updates and after serious incidents or near misses.

Can small businesses do this proportionately?

Yes. A small public FAQ agent can use a lighter test pack. An agent touching personal data, payments, complaints, regulated advice, HR or customer records needs deeper evidence and clearer launch gates.

What is the biggest red-team misconception?

The biggest misconception is that red teaming is only about clever jailbreak prompts. The real value is proving that business controls, logging, escalation, permissions and incident response work when the agent is under pressure.