AI Evidence Retention Schedules for UK Firms

18 June 2026 | By Ashley Marshall

Quick Answer

UK firms should treat prompts, model outputs, retrieval traces, tool calls and human reviews as AI evidence records when they help explain or defend a business process. Retention should be purpose-based, documented, minimised and reviewed, not set as one blanket logging period.

The audit trail for AI is being created before most firms have decided what it is. Prompts, logs and retrieved documents need retention rules before they become a liability.

Your AI log is already a business record

Most UK firms are now creating a new class of evidence without naming it properly. Every customer prompt, system prompt, retrieved document, tool call, model response, human override and incident note can become part of the story of how an AI-assisted decision was made. Treating that material as temporary technical exhaust is convenient, but it is also risky. The better framing is simpler: if the artefact helps prove what the system did, why it did it, what data it used, who approved it, or how a complaint should be investigated, it belongs in an AI evidence retention schedule.

The UK GDPR does not give firms a universal number of days for AI logs. The ICO says organisations must not keep personal data for longer than they need it, must justify retention by purpose, and should use retention policies or schedules wherever possible. That matters because AI prompts and retrieval traces often contain personal data, confidential business information, special category hints, customer complaints, staff names or commercially sensitive documents. A prompt history in Microsoft Copilot, ChatGPT Enterprise, Azure OpenAI, Gemini, Claude, ServiceNow, Salesforce Einstein or a custom RAG tool can quickly become a mixed record containing operational evidence and data protection risk.

In practice, the question is not, "Should we keep everything or delete everything?" It is, "Which AI artefacts are needed for security, audit, complaints, quality assurance, regulatory assurance and model improvement, and how long is each category justified?" A short-lived debug trace might only need days. A customer-affecting decision trace may need to align with complaint, contract or regulated record periods. A retrieved document copy may not need to be retained if the source document is already controlled in SharePoint, iManage, NetDocuments, Confluence, Google Drive or a records management platform. The schedule should separate these categories explicitly rather than hiding them in a generic IT logging policy.

The starting point is the ICO storage limitation guidance, which says retention periods should be documented where possible and that old personal data should be erased or anonymised when no longer needed. See the ICO guidance on storage limitation and, for broader AI programmes, its guidance on AI security and data minimisation. Firms that have already built acceptable use policies should extend the same discipline into evidence design. A useful companion is our note on AI input provenance logs for automated decisions, because provenance and retention are two halves of the same control.

The schedule needs more than one clock

A common mistake is to set one AI logging period and apply it everywhere. That feels tidy, but it rarely survives contact with real operations. Prompts, logs and retrieved documents serve different purposes. They also carry different privacy, confidentiality and evidential risks. A mature schedule should therefore have several clocks running at once: transient observability logs, security investigation logs, user activity records, evaluation evidence, customer decision evidence, incident evidence, model change evidence and source document records.

For example, raw prompt and completion logs used to debug a new internal assistant may only need a short retention period, especially if they contain free text entered by staff. Aggregated evaluation metrics can often be retained longer because they are less identifying and help show the system is improving. Security logs may need longer if they are used to detect prompt injection, credential misuse, unusual retrieval patterns or repeated probing. Customer complaint evidence may need to be retained for the life of the complaint and any related limitation period. Retrieved documents should be treated carefully: if the AI system stores the full document chunk, you may have duplicated the retention obligation. If it stores only a source identifier, version ID and retrieval score, the evidential value may be preserved without keeping a second copy of the underlying content.
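The idea of several clocks can be sketched as a lookup from evidence category to retention period. This is a minimal illustration only: the category names and the periods in RETENTION_CLOCKS are placeholder assumptions, not recommended values, and each firm would need to justify its own periods by purpose.

```python
from datetime import datetime, timedelta, timezone

# Illustrative periods only: each firm must justify its own by purpose.
RETENTION_CLOCKS = {
    "debug_trace": timedelta(days=14),
    "security_log": timedelta(days=365),
    "evaluation_metric": timedelta(days=730),
}


def is_due_for_disposal(category: str, created_at: datetime, now: datetime) -> bool:
    """Return True when a record has outlived the clock for its category."""
    # An unknown category raises KeyError, so unclassified artefacts are
    # surfaced rather than silently kept or deleted.
    return now - created_at > RETENTION_CLOCKS[category]


now = datetime(2026, 6, 18, tzinfo=timezone.utc)
created = now - timedelta(days=30)
print(is_due_for_disposal("debug_trace", created, now))   # True
print(is_due_for_disposal("security_log", created, now))  # False
```

The same 30-day-old record is overdue under one clock and well within another, which is why a single logging period rarely fits every artefact.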

NCSC guidance makes the security case clearly. Its machine learning security principles say organisations should be able to audit system use, inputs and outputs, and have appropriate log data to investigate a compromise even if it is not identified immediately. It also links repeated querying and unusual input behaviour to attacks such as prompt injection and model evasion. See NCSC guidance on monitoring and logging user activity and its broader guidance on secure AI operation and maintenance.

In practice, a retention schedule should be designed with your security, legal, compliance, records, product and operational teams in the room. The policy owner may be data protection or records management, but the retention purpose comes from the workflow. A helpdesk agent copilot, a credit decision assistant, a legal research tool, a board paper summariser and a production code agent do not need the same evidence package. Each use case should state the artefacts retained, the purpose, the retention trigger, the period, the deletion method, the owner, the access controls and the exception process for legal hold or incident preservation.

UK rules reward purpose, not hoarding

The strongest argument for retaining AI evidence is accountability. The strongest argument against retaining it is also accountability. UK firms have to prove they used AI responsibly, but they also have to prove they did not keep personal data, confidential records or sensitive prompt histories for vague future convenience. That is the tension at the centre of AI evidence retention. It cannot be resolved by saying "logs are useful". It has to be resolved by matching each evidence type to a lawful, documented purpose.

The ICO is explicit that UK GDPR does not set fixed retention periods for different types of personal data. The organisation has to justify the period based on purpose, document standard retention periods where possible, review data, and erase or anonymise personal data when it is no longer needed. It also warns that holding more personal data than needed creates storage, security and subject access request burdens. For AI systems, this is not theoretical. Prompt logs can contain more sensitive material than the business intended to collect because users paste emails, contracts, case notes, screenshots, customer identifiers, HR details or medical information into tools that were never designed as records repositories.

Public sector records guidance points in the same direction. The section 46 Code of Practice under the Freedom of Information Act says public authorities should know what information they hold, why they hold it, how sensitive it is and how it should be managed. It also says information can become a liability if not properly managed, and that records provide evidence of activities. Although many private firms are outside the Code, the principles are useful for any organisation that expects to defend an AI-assisted process to a regulator, auditor, client, insurer or court. See the statutory Code of Practice on records management and GOV.UK guidance on using AI to manage the digital heap.

The practical implication is that AI retention schedules should use purpose labels that a non-technical reviewer can understand. "LLM trace retained for 30 days" is weak. "Prompt, response and retrieval metadata retained for 90 days to investigate misuse, security events and quality defects" is stronger. "Decision evidence retained for six years because it forms part of a regulated customer file" may be stronger still, if the business has mapped the relevant obligation. The point is not to invent legal certainty where none exists. The point is to make the decision traceable, reviewable and proportionate.

Financial services shows why evidence design is urgent

Financial services is a useful stress test because adoption is high, the consequences are tangible and regulators already expect evidence. The Bank of England and FCA 2024 survey found that 75 percent of responding firms were already using AI, with another 10 percent planning to use it over the next three years. Foundation models accounted for 17 percent of all AI use cases. Respondents also expected the median number of AI use cases to rise from 9 to 21 over three years. That is not a lab experiment. It is operational infrastructure spreading through firms that already have conduct, resilience, outsourcing, data protection and senior manager accountability obligations.

The same survey found that four of the top five current AI risks identified by respondents were data related, including data privacy and protection, data quality and data security. The FCA has also said it does not plan to introduce a separate AI rulebook, and will rely on existing frameworks, while taking an evidence-based view of benefits and risks. See the Bank of England and FCA report on AI in UK financial services and the FCA page on its approach to AI. In January 2026, the House of Commons Treasury Committee went further, recommending that the FCA publish practical guidance by the end of 2026 on how consumer protection rules apply to AI, and on the accountability and assurance expected from senior managers for AI harms.

This is where evidence retention becomes a board issue rather than a logging decision. If an AI-assisted workflow affects affordability assessment, fraud investigation, complaints triage, vulnerable customer handling, suitability review or claims processing, the firm will need more than screenshots and good intentions. It will need to reconstruct which model version was used, which policy or knowledge article was retrieved, whether the retrieved source was current, what the system prompt required, what the user asked, what the model returned, what the human accepted or rejected, and which control failed if harm occurred.

The leading misconception is that retaining less is always safer because it reduces breach impact. Sometimes it does. But over-deletion can also destroy the evidence needed to investigate a customer complaint, detect systemic bias, prove a human reviewed a recommendation, or show a regulator that the firm had control. The answer is evidence minimisation, not evidence avoidance. Keep the minimum artefacts needed to prove the control worked, protect them properly, and delete or anonymise the rest on schedule.
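One way to picture evidence minimisation is to store a redacted prompt alongside a hash of the original, rather than the raw text. The sketch below is illustrative only: the two patterns are assumptions that catch email addresses and long digit runs, not a complete personal data detector, and minimise_prompt is a hypothetical helper name. A hash of personal data is pseudonymised rather than anonymised, so the resulting record would still need its own retention period.

```python
import hashlib
import re

# Assumed patterns for illustration; real masking rules need far wider coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LONG_NUMBER = re.compile(r"\b\d{8,}\b")


def minimise_prompt(prompt: str) -> dict:
    """Return a redacted copy of the prompt plus a fingerprint of the original."""
    redacted = EMAIL.sub("[EMAIL]", prompt)
    redacted = LONG_NUMBER.sub("[NUMBER]", redacted)
    return {
        "redacted_prompt": redacted,
        # The hash lets an investigator confirm that a disputed prompt
        # matches this record without the raw text being retained.
        "sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
    }


record = minimise_prompt("Refund jo@example.com for account 12345678")
print(record["redacted_prompt"])  # Refund [EMAIL] for account [NUMBER]
```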

Retrieved documents need their own retention logic

Prompts and model outputs get most of the attention, but retrieved documents are often the bigger governance problem. A retrieval-augmented generation system can surface HR policies, customer files, support tickets, board papers, contract clauses, product specifications, medical notes, source code, risk registers or regulatory manuals. The user may see only a neat answer, but the system may have stored chunks, embeddings, snippets, citations, document IDs, access control context and reranking data. Each of those artefacts has a different evidential value and a different retention risk.

The first design question is whether the AI platform needs to keep a copy of retrieved content at all. In many cases, the better evidence record is metadata: source system, document ID, version, timestamp, chunk ID, retrieval score, access decision and user identity. That can prove which material influenced the answer without creating a shadow archive of sensitive documents. If the answer itself includes extracted text, then the output may still contain personal or confidential data and must be scheduled accordingly. If the retrieval layer writes full chunks into logs for debugging, those logs should normally have a shorter, tightly controlled retention period and masking rules.
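A metadata-only retrieval record along these lines might look like the sketch below. The field names mirror the list above. RetrievalTrace and build_trace are hypothetical names, and chunk_sha256 is an added assumption: a fingerprint that lets a reviewer check later whether the source chunk has changed, without the chunk text being written to the evidence store.

```python
import hashlib
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RetrievalTrace:
    """Metadata-only evidence of one retrieval; the chunk text is never stored."""
    source_system: str
    document_id: str
    version: str
    chunk_id: str
    retrieval_score: float
    access_granted: bool
    user_id: str
    retrieved_at: str
    chunk_sha256: str  # assumption: fingerprint only, not the content


def build_trace(chunk_text: str, **metadata) -> RetrievalTrace:
    """Hash the chunk, then discard the text and keep only the metadata."""
    digest = hashlib.sha256(chunk_text.encode("utf-8")).hexdigest()
    return RetrievalTrace(chunk_sha256=digest, **metadata)


trace = build_trace(
    "Refunds over 100 pounds need manager approval.",
    source_system="sharepoint",
    document_id="POL-0042",
    version="3.1",
    chunk_id="POL-0042#7",
    retrieval_score=0.83,
    access_granted=True,
    user_id="agent-117",
    retrieved_at=datetime(2026, 6, 18, 9, 30, tzinfo=timezone.utc).isoformat(),
)
print(asdict(trace)["document_id"])  # POL-0042
```

The record still contains a user identity, so it remains personal data and needs a scheduled retention period of its own.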

GOV.UK guidance on managing the digital heap is useful here because it reminds organisations that unstructured data includes documents, posts, chat messages, emails, images and recordings, and that repositories often contain some material that should be retained and some that should be deleted. HMRC records policy is also a practical example of lifecycle thinking: records managed by third parties are in scope, owners should be designated, and digital records need metadata that documents authority, status, structure and integrity. See the GOV.UK material on content lifecycle management and HMRC retention and disposal policy.

In practice, a RAG evidence schedule should distinguish source records from retrieval traces. The source record remains governed by its original system of record. The retrieval trace proves how the AI used it. The generated answer becomes a new record if it is relied on in a business process. That distinction helps firms avoid two bad outcomes: deleting all traces and being unable to explain an answer, or retaining every chunk forever and creating an uncontrolled evidence lake.

A workable schedule starts with seven fields

The schedule does not need to start as a 60-page policy. It can start as a control table that the business can actually maintain. The first field is evidence category: prompt, completion, system prompt, retrieval metadata, retrieved content copy, tool call, human review, model configuration, evaluation result, incident record or complaint record. The second is use case: customer support copilot, adviser assistant, HR policy bot, legal research assistant, coding agent, finance reconciliation workflow or board pack summariser. The third is purpose: security monitoring, audit trail, quality improvement, complaint handling, legal defence, regulated record, incident response or model evaluation.

The fourth field is retention period and trigger. Be specific: 30 days from creation, 90 days from session close, one year from model retirement, six years from customer relationship end, or until legal hold release. The fifth is content minimisation: store full prompt, redacted prompt, hash, metadata only, sampled output, source ID only or aggregated metric. The sixth is access and location: who can view it, whether it sits in Datadog, Splunk, Azure Monitor, Microsoft Purview, Google Cloud Logging, AWS CloudTrail, OpenTelemetry, LangSmith, Langfuse, Arize Phoenix, Humanloop, Giskard, Galileo or an internal evidence store. The seventh is disposal: automatic deletion, anonymisation, backup expiry, vendor deletion certification or records review.
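The seven fields can be sketched as a single row of the control table. This is an illustration under stated assumptions, not a reference design: ScheduleEntry is a hypothetical name, the fourth field is split into retention_days and trigger so the disposal date can be computed, and the legal_hold flag is added to show how a hold suspends the clock.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class ScheduleEntry:
    evidence_category: str    # field 1
    use_case: str             # field 2
    purpose: str              # field 3
    retention_days: int       # field 4: period
    trigger: str              # field 4: event that starts the clock
    minimisation: str         # field 5
    access_and_location: str  # field 6
    disposal: str             # field 7
    legal_hold: bool = False  # assumption: a hold suspends disposal

    def disposal_date(self, trigger_date: date) -> Optional[date]:
        """Return the date disposal falls due, or None while a hold applies."""
        if self.legal_hold:
            return None
        return trigger_date + timedelta(days=self.retention_days)


entry = ScheduleEntry(
    evidence_category="prompt",
    use_case="customer support copilot",
    purpose="security monitoring",
    retention_days=90,
    trigger="session close",
    minimisation="redacted prompt",
    access_and_location="security team, internal evidence store",
    disposal="automatic deletion",
)
print(entry.disposal_date(date(2026, 6, 18)))  # 2026-09-16
```

Keeping the trigger as a named event rather than assuming creation time matters because periods such as "one year from model retirement" cannot be computed from the record's own timestamp.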

The counterargument is that this slows teams down. Product teams want observability, engineers want traces, compliance wants assurance and support teams want replay. All fair. But a schedule actually makes deployment faster once the pattern exists. It gives procurement something to ask vendors. It gives engineering default logging profiles. It gives DPOs a basis for DPIAs. It gives security a reason to retain the traces that matter. It gives records teams a bridge between traditional schedules and AI artefacts. It also reduces the number of awkward late questions after a pilot has already filled a vendor console with customer data.

A good first implementation is to choose three live AI workflows and classify every artefact they create. Then write default retention profiles: low-risk internal assistant, confidential internal assistant, customer-affecting assistant and regulated decision support. Review those profiles against the ICO, NCSC and sector rules, then put them into vendor configuration and operational runbooks. If you need a broader governance wrapper, connect the schedule to your AI register, DPIA, vendor risk review and incident response playbook. Our guide to AI evidence logs as a compliance layer covers the adjacent evidence pack that boards usually need.
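The classification step can be checked mechanically once the workflows and schedule are written down. The sketch below is a hypothetical gap check: the workflow names and artefact labels are illustrative assumptions, and unclassified_artefacts simply reports any artefact a workflow produces that the schedule does not yet cover.

```python
def unclassified_artefacts(
    workflow_artefacts: dict[str, set[str]],
    scheduled_categories: set[str],
) -> dict[str, list[str]]:
    """Map each workflow to the artefacts it creates that have no schedule entry."""
    gaps = {}
    for workflow, artefacts in workflow_artefacts.items():
        missing = sorted(artefacts - scheduled_categories)
        if missing:
            gaps[workflow] = missing
    return gaps


# Illustrative inventory of three live workflows.
workflows = {
    "helpdesk_copilot": {"prompt", "completion", "retrieval_metadata"},
    "coding_agent": {"prompt", "tool_call"},
    "hr_policy_bot": {"prompt", "completion"},
}
scheduled = {"prompt", "completion", "retrieval_metadata"}

print(unclassified_artefacts(workflows, scheduled))  # {'coding_agent': ['tool_call']}
```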

Frequently Asked Questions

Do UK firms have to keep every AI prompt?

No. UK GDPR does not require blanket retention of every prompt. Firms should keep prompts only where there is a clear purpose, such as security investigation, complaint handling, audit evidence or quality assurance, and should minimise, redact or delete when that purpose ends.

How long should AI logs be retained?

There is no universal UK retention period for AI logs. A practical schedule may keep short-lived debug logs for days or weeks, security logs for longer, and customer-affecting decision evidence in line with complaint, contract or regulated record periods.

Are prompts personal data under UK GDPR?

They can be. A prompt is personal data if it identifies or can reasonably identify an individual, either directly or when combined with other information. Staff often paste customer, employee or case details into prompts, so firms should assume some prompt logs may contain personal data.

Should retrieved documents be copied into AI evidence logs?

Usually not by default. In many RAG systems, retaining source document ID, version, timestamp, chunk reference and retrieval score gives enough evidence without creating a second copy of sensitive content. Full content copies should have a specific justification and a shorter, access-controlled retention period.

What should be in an AI evidence retention schedule?

At minimum, include evidence category, use case, retention purpose, retention trigger and period, minimisation approach, owner, storage location, access controls, deletion method, legal hold process and vendor deletion requirements.

How does this affect Microsoft Copilot or ChatGPT Enterprise rollouts?

The firm should understand where prompts, responses, file references and audit logs are stored, which admin controls exist, whether logs can be exported or deleted, and how vendor settings map to the organisation's own retention schedule and DPIA.

Is deleting AI logs quickly always safer?

Not always. Short retention reduces breach and subject access risk, but deleting too quickly can leave the firm unable to investigate misuse, complaints, prompt injection, bias, fraud or system defects. The safer approach is evidence minimisation with justified retention.

Who should own AI evidence retention?

Ownership is usually shared. Records management or data protection may own the schedule, but security, legal, compliance, engineering, product owners and business process owners should define the evidence purpose and operational controls for each use case.