AI supplier SLAs need more than uptime promises


1 May 2026 | By Ashley Marshall

Quick Answer: AI supplier SLAs need more than uptime promises

AI supplier SLAs should still include uptime, but they also need measurable accuracy standards, named escalation paths, audit evidence, model change controls, and usage-based cost guardrails. Without those controls, the buyer carries the operational, regulatory, and financial risk even when the supplier has technically met its availability target.

AI can be available, fast, and still wrong. Supplier SLAs now need to prove accuracy, escalation, and cost control before the first incident happens.

The old uptime promise is too narrow for AI suppliers

Most supplier SLAs were designed for services that either work or do not work. A cloud database is reachable, an email platform is sending, a helpdesk system is responding within a defined number of seconds. AI changes that pattern. A model can be fully available and still give the wrong answer, cite a non-existent policy, leak sensitive context into the wrong workflow, or burn through a monthly budget in a week. That is why a 99.9 percent uptime clause is no longer enough when the supplier is providing a model, an AI agent, a retrieval layer, a workflow automation platform, or a managed AI service.

The shift is not theoretical. The FCA says it wants safe and responsible AI adoption in UK financial markets and, importantly, that it does not plan to introduce a separate AI rulebook. Instead, it expects firms to use existing outcome-focused frameworks. That makes procurement discipline more important, not less. If the regulator expects the buyer to remain accountable for outcomes, the supplier contract has to measure more than whether the service endpoint was online. See the FCA position here: AI and the FCA: our approach.

What this means in practice is simple. AI SLAs should include evidence thresholds for factual accuracy, groundedness, refusal behaviour, latency, incident escalation, audit access, model change notification, cost ceilings, and human override. A supplier may resist this and argue that model behaviour is probabilistic. That is true, but it is not a reason to avoid measurement. It is a reason to define measurement properly. For example, a customer support assistant can be tested against a golden set of policy questions, edge cases, and known failure modes. A finance analysis assistant can be tested against reconciled source data. An AI agent can be measured on task completion, unsafe action prevention, and handoff quality. Uptime remains necessary. It is just no longer the main control.

Accuracy needs its own service level objective

Accuracy is not a vague aspiration. It should be written into the service level model as a measurable service level objective. The current research direction supports this. A 2025 AgentSLA paper argues that AI agents require quality models and SLA definitions beyond traditional performance and availability, because there is not yet a clear consensus on how to define quality for autonomous AI components. The paper frames the issue well: service consumers still need minimum quality expectations, but AI agents introduce new quality characteristics that have to be made explicit. Source: AgentSLA: Towards a Service Level Agreement for AI Agents.

For a business buyer, this turns into a practical procurement question: accurate against what? A model should not be measured against a generic benchmark if the business risk sits inside your contracts, policies, product catalogue, claims process, pricing rules, or customer commitments. The SLA should define the test corpus, the expected answer standard, the tolerance for uncertainty, and the rule for escalation when the confidence level is too low. It should also separate different failure types. A hallucinated citation is different from a calculation error. A missed compliance warning is different from a slightly clumsy summary. Treating them all as one accuracy percentage hides the failures that matter most.
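
As a minimal sketch of what separating failure types could look like, the Python below scores a golden test set per failure category instead of as one accuracy percentage. The category names, the limits in MAX_FAILURE_RATE, and the GoldenResult structure are illustrative assumptions, not terms taken from any real contract or tool.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical per-category limits; a real SLA would negotiate both the
# categories and the numbers.
MAX_FAILURE_RATE = {
    "hallucinated_citation": 0.00,
    "missed_compliance_warning": 0.00,
    "calculation_error": 0.01,
    "clumsy_summary": 0.10,
}


@dataclass
class GoldenResult:
    case_id: str
    failure_type: Optional[str]  # None means the answer met the expected standard


def failure_rates(results):
    """Return the share of golden-set cases that failed, per failure type."""
    total = len(results)
    if total == 0:
        return {ftype: 0.0 for ftype in MAX_FAILURE_RATE}
    counts = Counter(r.failure_type for r in results if r.failure_type)
    return {ftype: counts.get(ftype, 0) / total for ftype in MAX_FAILURE_RATE}


def breaches(results):
    """List the failure types whose rate exceeds the agreed limit."""
    rates = failure_rates(results)
    return sorted(f for f, rate in rates.items() if rate > MAX_FAILURE_RATE[f])


# Illustrative run: 95 good answers, 4 clumsy summaries, 1 invented citation.
results = (
    [GoldenResult(f"ok-{i}", None) for i in range(95)]
    + [GoldenResult(f"sum-{i}", "clumsy_summary") for i in range(4)]
    + [GoldenResult("cit-1", "hallucinated_citation")]
)
overall_accuracy = sum(r.failure_type is None for r in results) / len(results)
print(overall_accuracy, breaches(results))
```

In this made-up run the headline accuracy is 95 percent, which sounds healthy, yet the single hallucinated citation still breaches its zero-tolerance limit. That is the kind of failure a blended percentage would hide.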

Good AI supplier SLAs also need monitoring rights. The buyer should be able to review sampled outputs, failure logs, prompt injection attempts, retrieval misses, and human override decisions. This is especially important where the supplier controls the model, orchestration layer, or managed prompt stack. Without audit evidence, the buyer is effectively accepting a black box promise. That might be fine for a low-risk productivity tool. It is not enough for customer-facing advice, regulated workflows, HR screening, finance operations, claims handling, legal triage, cyber monitoring, or anything that could create customer harm. The service level should therefore define accuracy evidence, sampling frequency, reporting format, and remediation times.

Escalation is the control that stops small AI errors becoming incidents

Escalation clauses matter because AI failures often begin quietly. A traditional outage is visible quickly: dashboards turn red, users complain, transactions fail. AI degradation can be subtler. A retrieval system starts missing a newly updated policy. A model update changes refusal behaviour. An agent takes longer paths through tools and creates duplicated work. A customer service assistant gives technically fluent but wrong advice. If the contract only says the platform will be available, the supplier can meet the SLA while the buyer absorbs the operational risk.

The FCA and Bank of England direction on third party reporting reinforces this point. In PS26/2, the FCA says many reported incidents originate at third parties and that firms are increasingly reliant on the services those third parties provide. The new single FCA, PRA and Bank of England regimes for operational incident and third party reporting apply from 18 March 2027. The FCA also says that third parties are supplying services through transformative technologies like AI, and that regulators need detailed, accurate and consistently structured data to supervise operational resilience. Source: PS26/2: Operational incident and third party reporting.

What this means in practice is that AI supplier SLAs need named escalation paths, not just a support inbox. Define severity levels for harmful output, repeated low confidence answers, data exposure, unsafe tool use, uncontrolled spend, model drift, and service outage. Define who is contacted, how quickly, through which channel, and what evidence must be provided. A serious AI incident may need the supplier's engineering lead, security contact, model operations owner, and account manager involved at once. It may also need a freeze on model changes, a rollback to a previous prompt or model version, temporary routing to a human queue, or suspension of an automated action. The SLA should make those rights explicit before the first incident, not during it.
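
One way to make those rights concrete is to write the escalation matrix down as data that both sides can review. The sketch below is an assumption-heavy illustration: the incident types come from the paragraph above, but the severity numbers, response times, channels, and the EscalationRule structure are hypothetical placeholders to be negotiated.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EscalationRule:
    severity: int                  # 1 is most urgent
    contacts: Tuple[str, ...]      # named roles on the supplier side
    channel: str
    respond_within_minutes: int
    buyer_rights: Tuple[str, ...]  # actions the buyer may require immediately


# Hypothetical matrix: every number and channel here is a placeholder.
ESCALATION_MATRIX = {
    "data_exposure": EscalationRule(
        1, ("security contact", "engineering lead", "account manager"),
        "phone bridge", 15, ("freeze model changes", "suspend automated actions")),
    "unsafe_tool_use": EscalationRule(
        1, ("engineering lead", "model operations owner"),
        "phone bridge", 15, ("suspend automated actions", "route to human queue")),
    "service_outage": EscalationRule(
        1, ("engineering lead", "account manager"),
        "phone bridge", 15, ("route to human queue",)),
    "harmful_output": EscalationRule(
        2, ("model operations owner", "account manager"),
        "incident ticket", 60, ("roll back prompt or model version",)),
    "uncontrolled_spend": EscalationRule(
        2, ("account manager", "engineering lead"),
        "incident ticket", 60, ("emergency throttling",)),
    "model_drift": EscalationRule(
        3, ("model operations owner",),
        "incident ticket", 240, ("freeze model changes",)),
    "repeated_low_confidence": EscalationRule(
        3, ("model operations owner",),
        "incident ticket", 240, ("route to human queue",)),
}

# Design choice in this sketch: an incident nobody classified in advance is
# treated as severity 1 rather than dropped into a general support inbox.
FALLBACK_RULE = EscalationRule(
    1, ("engineering lead", "security contact", "model operations owner",
        "account manager"),
    "phone bridge", 15, ("freeze model changes",))


def route(incident_type: str) -> EscalationRule:
    """Look up who is contacted, how fast, and what the buyer may demand."""
    return ESCALATION_MATRIX.get(incident_type, FALLBACK_RULE)


rule = route("model_drift")
print(rule.severity, rule.contacts, rule.respond_within_minutes)
```

The fallback rule is the part worth arguing about in negotiation: it decides what happens to the incident type that no one anticipated.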

Cost controls belong in the SLA, not only in the finance report

AI cost risk is different from software subscription risk. A standard SaaS bill is usually predictable. AI usage can scale with tokens, images, tool calls, retrieval volume, hosted compute, vector storage, evaluation runs, or agent loops. A successful adoption push can increase spend quickly. A poorly designed workflow can do the same with no business benefit at all. If the SLA does not define cost controls, the organisation may discover the problem only when the monthly invoice arrives.

This is where procurement and finance need to work together. The SLA should include usage caps, budget alerts, per-workflow cost allocation, rate limits, approval thresholds for new models, and a clear process for emergency throttling. It should also define what happens when the supplier changes model pricing, deprecates a model, changes context window limits, or moves workloads to a different hosting pattern. In a multi-vendor environment using OpenAI, Microsoft Azure AI Foundry, Google Vertex AI, Anthropic, AWS Bedrock, Salesforce Einstein, ServiceNow, or specialist vertical AI tools, these details decide whether AI remains an investment or becomes an uncontrolled operating cost.

There is also a governance angle. IBM's 2025 Cost of a Data Breach report found that 63 percent of organisations lacked AI governance policies to manage AI or prevent the proliferation of shadow AI, and that the average global cost of a data breach was USD 4.4 million. Source: IBM Cost of a Data Breach Report 2025. Those figures are not an SLA template, but they underline the same point: unmanaged AI is not just a technical issue. It creates cost, security, and accountability exposure. For supplier SLAs, the lesson is to make spend visible at the point of operation. A weekly finance report is too late if an agent has already looped through thousands of tool calls or a new use case has shifted from pilot volume to production volume without approval.

Security and resilience clauses must cover the AI life cycle

AI supplier risk is not limited to uptime or data hosting. The AI life cycle includes design, training or tuning, retrieval configuration, prompt management, model deployment, monitoring, maintenance, and retirement. Each stage can introduce supplier dependencies. A vendor may use a foundation model from another provider, a vector database from a cloud platform, a labelling partner, a moderation API, an analytics layer, or a managed orchestration service. The buyer's risk register needs to understand that chain, and the SLA needs to make the supplier accountable for the parts it controls.

The NCSC and DSIT have been clear that AI brings novel security vulnerabilities such as prompt injection and data poisoning alongside standard cyber risks. In May 2025, the NCSC highlighted an ETSI specification setting baseline cyber security requirements for AI models and systems. It described the specification as the first global standard setting minimum security requirements across the AI life cycle, with 13 core security principles grouped into secure design, secure development, secure deployment, secure maintenance, and secure end of life. Source: NCSC on the new ETSI AI security standard.

For an SLA, this means the buyer should ask for evidence of secure development practices, model and prompt change control, vulnerability disclosure, incident notification, access control, data retention rules, logging, and secure deletion at the end of the contract. It also means testing the supplier's resilience claims. The FCA's operational resilience observations, made one year after the transition period ended on 31 March 2025, noted that firms were testing third party providers and supply chain vulnerabilities more rigorously, including jointly with third parties in some cases. Source: FCA operational resilience observations. AI suppliers should expect the same pressure. The buyer should not accept a glossy trust page as a substitute for operational evidence.

The counterargument is right, but incomplete

The common pushback is that AI is probabilistic, so contractual accuracy promises are unrealistic. There is a fair point here. No serious supplier should promise perfect answers from a general-purpose language model. Some use cases are subjective. Some prompts are ambiguous. Some source data is incomplete. Some outputs require judgement rather than deterministic calculation. A crude contractual clause that says the model must be 99 percent accurate can create false comfort and endless dispute.

But that is not an argument against AI SLAs. It is an argument for better ones. Accuracy should be scoped by use case, risk level, source material, evaluation method, and required fallback. A low-risk drafting assistant might need user satisfaction, data protection, and availability measures. A regulated advice assistant needs evidence of groundedness, citation accuracy, refusal behaviour, escalation thresholds, and audit logs. An autonomous finance operations agent needs authority limits, reconciliation checks, exception queues, and spend controls. A coding assistant needs security scanning, licence controls, and human review. Different risks need different objectives.

The practical answer is to build a tiered AI SLA schedule. Tier one covers platform basics: availability, latency, support response, data residency, and security certifications. Tier two covers AI behaviour: accuracy tests, hallucination controls, bias testing where relevant, grounding, explainability, and model change notification. Tier three covers operational controls: escalation, rollback, human handoff, cost alerts, usage caps, audit rights, and incident reporting. Tier four covers continuous improvement: monthly failure review, evaluation set updates, supplier roadmap disclosure, and remediation commitments. This is more work than signing a standard SaaS order form, but it is proportionate to the risk AI now carries.
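
The four tiers can also be kept as a simple checklist and compared against a supplier's draft. The sketch below assumes the draft SLA has already been reduced to a set of control names by a human reviewer; SLA_SCHEDULE and missing_controls are hypothetical names, and the control labels simply restate the tiers described above.

```python
# The tier contents mirror the schedule described in the text; the dictionary
# layout and function name are illustrative assumptions.
SLA_SCHEDULE = {
    "tier 1: platform basics": {
        "availability", "latency", "support response",
        "data residency", "security certifications",
    },
    "tier 2: AI behaviour": {
        "accuracy tests", "hallucination controls", "bias testing",
        "grounding", "explainability", "model change notification",
    },
    "tier 3: operational controls": {
        "escalation", "rollback", "human handoff", "cost alerts",
        "usage caps", "audit rights", "incident reporting",
    },
    "tier 4: continuous improvement": {
        "monthly failure review", "evaluation set updates",
        "supplier roadmap disclosure", "remediation commitments",
    },
}


def missing_controls(draft_controls):
    """Return, per tier, the controls a draft SLA does not yet cover."""
    covered = set(draft_controls)
    gaps = {}
    for tier, required in SLA_SCHEDULE.items():
        absent = sorted(required - covered)
        if absent:
            gaps[tier] = absent
    return gaps


# A typical standard SaaS order form: tier one only, plus basic escalation.
draft = SLA_SCHEDULE["tier 1: platform basics"] | {"escalation"}
for tier, absent in missing_controls(draft).items():
    print(tier, "->", ", ".join(absent))
```

Run against a draft that covers only platform basics, the output makes the gap visible tier by tier, which tends to be a more productive starting point for negotiation than a general complaint that the SLA is too thin.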

For leadership teams, the buying question has changed. Do not ask only, "Can this supplier keep the service online?" Ask, "Can this supplier prove the system is accurate enough for our use case, escalate failures quickly, keep costs under control, and support our accountability when something goes wrong?" If the answer is not clear in the SLA, the risk has not disappeared. It has simply been transferred back to you.

Frequently Asked Questions

Is uptime still important in an AI supplier SLA?

Yes. Availability and latency still matter, especially for production workflows. The point is that uptime is now only one layer. AI SLAs also need behavioural, operational, and commercial controls.

How should accuracy be measured for an AI supplier?

Measure accuracy against the buyer's own use case, source material, and risk profile. Use golden test sets, sampled production outputs, citation checks, and separate categories for hallucination, calculation error, policy error, and unsafe action.

Can a supplier realistically guarantee AI accuracy?

Not as a blanket promise across every possible prompt. A sensible SLA scopes accuracy to defined workflows, evidence sets, confidence thresholds, and required human escalation when the system is uncertain.

What escalation terms should be included?

Include severity levels, named supplier contacts, response times, evidence requirements, rollback rights, human handoff rules, customer notification support, and temporary suspension rights for unsafe automated actions.

Why do cost controls belong in the SLA?

AI spend often scales with usage, tokens, tool calls, retrieval, hosted compute, or agent loops. SLA-level cost controls make budget alerts, caps, throttling, and pricing change notification enforceable rather than informal.

Which UK regulatory expectations are relevant?

For regulated firms, FCA, PRA, and Bank of England operational resilience and third party reporting expectations are highly relevant. More broadly, NCSC and DSIT AI security guidance is useful for procurement and supplier assurance.

What evidence should buyers ask suppliers to provide?

Ask for evaluation results, monitoring logs, model and prompt change records, incident reports, security controls, access controls, data retention details, sub-processor lists, and proof of remediation after failures.

Should smaller businesses use the same SLA approach?

Yes, but proportionately. A small business does not need a bank-grade assurance pack for every tool, but it should still define accuracy expectations, support escalation, data controls, and spending limits for any important AI workflow.