A study of 20,000 active public GitHub repositories, measuring how out-of-date AI agent context files (CLAUDE.md, .cursorrules, AGENTS.md) are relative to the codebases they describe. The answer, measured two ways, is the same: static markdown can't keep pace with active development.
Four findings follow.
The full dataset, methodology, and source seeds are sent to your inbox once you sign up. Anyone can reproduce the study.
We measured drift between AI agent context files and the codebases they describe two ways. The first, by codebase velocity: how busy is the underlying repo, and does the file keep pace? The second, by content age: how long ago was the file’s content last meaningfully edited, and how much has the codebase changed since? The two cuts ask different questions. They land in the same place.
Among actively maintained repos with a CLAUDE.md, we grouped files by how many commits the underlying codebase shipped in the past 90 days. The pattern is clean: the busier the codebase, the more likely the file has drifted past what GitHub's diff API can describe in a single response.
[Chart: architectural_lag_capped · n=232 actively maintained repos · CLAUDE.md cohort. Source: Memex CLAUDE.md Staleness Study, May 2026. Filtered to repos that passed the active filter (≥1 commit in 30 days, ≥50 commits in 90 days, and either ≥3 contributors or ≥500 lifetime commits) and had a CLAUDE.md with measurable staleness. The 500+ tier is at GitHub's pagination cap; actual values may be higher.]
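In pipeline terms, this cut reduces to a few API calls: the active filter over enriched repo metadata, and a drift check against GitHub's compare endpoint, which returns at most 250 commits and 300 changed files in a single response. A minimal sketch, where the field names (`commits_30d`, `lifetime_commits`) are illustrative enrichment outputs rather than the study's actual pipeline:

```python
import requests

API = "https://api.github.com"
HEADERS = {"Accept": "application/vnd.github+json"}  # add an auth token for real rate limits

def is_actively_maintained(repo: dict) -> bool:
    """Active filter from the study: >=1 commit in 30 days, >=50 in 90 days,
    and either >=3 contributors or >=500 lifetime commits.
    Field names are assumed outputs of an earlier enrichment pass."""
    return (
        repo["commits_30d"] >= 1
        and repo["commits_90d"] >= 50
        and (repo["contributors"] >= 3 or repo["lifetime_commits"] >= 500)
    )

def drift_exceeds_single_response(owner: str, repo: str,
                                  last_touch_sha: str, default_branch: str) -> bool:
    """True when the diff between the file's last-edit commit and the branch tip
    overflows what one compare response can describe (250 commits / 300 files)."""
    url = f"{API}/repos/{owner}/{repo}/compare/{last_touch_sha}...{default_branch}"
    resp = requests.get(url, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    cmp = resp.json()
    return cmp["total_commits"] > 250 or len(cmp.get("files", [])) >= 300
```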
As codebase activity rises, the median time since the file was last edited actually falls — busy teams are updating their CLAUDE.md more recently. The drift rate triples anyway. The file is moving more often, but the codebase is moving so much more often that the file can’t keep up.
It is not that teams are neglecting these files. Teams in the high-velocity tier are editing their CLAUDE.md within 18 days, on average — three weeks fresher than the slower tier. They’re trying. The medium itself can’t keep pace with the codebases it’s describing. A static markdown file in a fast-moving repo is, at any given moment, more likely to misrepresent the codebase than to describe it.
For an AI agent reading these files, this means: the more important an accurate context file would be (because the codebase is moving fast and you really need to know what shipped this week), the less likely it is that the file is actually accurate. The signal-to-noise ratio inverts at exactly the wrong moment.
The same pattern shows up when we slice the data a different way. Here we group CLAUDE.md files by how recently their content was last meaningfully edited, ignoring codebase velocity. The trajectory is the same: the older the content, the more drift accumulates. The rate plateaus at roughly 25–55% once content is more than a quarter old.
[Chart: architectural_lag_capped · n=338 across 9 cohorts. Source: Memex CLAUDE.md Staleness Study, May 2026. Cohorts grouped by the date of the commit that last touched the file. The Apr 2026 cohort has the lowest drift simply because those files haven't had time to drift yet; they were just edited.]
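Cohorting by content age needs only the file's last-touch commit, which GitHub's commits endpoint returns directly when filtered by path (newest first). A minimal sketch; the month-bucketing helper is our own convention, assumed to match the chart's cohorts:

```python
import requests

API = "https://api.github.com"
HEADERS = {"Accept": "application/vnd.github+json"}

def last_touch(owner: str, repo: str, path: str = "CLAUDE.md") -> dict:
    """Most recent commit that touched the file."""
    resp = requests.get(
        f"{API}/repos/{owner}/{repo}/commits",
        params={"path": path, "per_page": 1},
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()
    commit = resp.json()[0]
    return {"sha": commit["sha"], "date": commit["commit"]["committer"]["date"]}

def cohort(last_touch_date_iso: str) -> str:
    """Bucket by calendar month of the last-touch commit, e.g. '2026-04'."""
    return last_touch_date_iso[:7]
```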
Across the random sample, 26% of CLAUDE.md files and 20% of AGENTS.md files clear this drift threshold. Within the older cohorts, the rate sustains at 25–55%. By the time a file's content is more than a quarter old, the odds that it still describes the code approach a coin flip.
Static context files do not survive contact with active development.
Two slices of the same data, one consistent answer. The medium isn't keeping pace with the work it's meant to describe, not for lack of effort from the teams but because of the structure of the medium itself.
We identified every public GitHub repository that contained any of CLAUDE.md, .cursorrules, or AGENTS.md, had at least 50 stars, and had at least one push between May 2025 and April 2026.
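A sketch of that discovery pass, assuming GitHub's code-search `filename:` qualifier and standard repository metadata; pagination, the search API's result caps, and the monthly windowing behind the table below are elided:

```python
import requests

API = "https://api.github.com"
HEADERS = {"Accept": "application/vnd.github+json"}  # code search requires auth

TARGET_FILES = ("CLAUDE.md", ".cursorrules", "AGENTS.md")

def repos_containing(filename: str) -> set:
    """Code search for repos containing the file. Real runs must page through
    results and work around the search API's per-query result caps."""
    resp = requests.get(
        f"{API}/search/code",
        params={"q": f"filename:{filename}", "per_page": 100},
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()
    return {item["repository"]["full_name"] for item in resp.json()["items"]}

def meets_corpus_criteria(repo: dict) -> bool:
    """>=50 stars and at least one push inside the study window.
    The 20,000-repo enrichment sample additionally drops forks and archived repos."""
    return (
        repo["stargazers_count"] >= 50
        and "2025-05-01" <= repo["pushed_at"][:10] <= "2026-04-30"
    )
```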
The result: 241,046 repos.
| Month | New active repos with these files |
|---|---|
| 2025-05 | 8,248 |
| 2025-06 | 8,490 |
| 2025-07 | 8,872 |
| 2025-08 | 9,174 |
| 2025-09 | 9,622 |
| 2025-10 | 10,805 |
| 2025-11 | 12,284 |
| 2025-12 | 14,117 |
| 2026-01 | 18,180 |
| 2026-02 | 21,333 |
| 2026-03 | 34,255 |
| 2026-04 | 85,673 |
April 2026 alone produced 36% of the entire 12-month corpus. The May-2025-to-April-2026 growth rate is just over 10x. The trajectory is still accelerating.
This isn’t a fringe pattern. It’s the dominant pattern of how engineering teams are configuring AI agents on public codebases in 2026 — and the same teams adopting these files at this velocity are, on average, accumulating drift between what the files say and what the code does.
Every repo in the 20,000-repo enrichment sample meets four criteria, applied at the GitHub API level: at least 50 stars, not a fork, not archived, and at least one push between May 2025 and April 2026. Within those criteria, the sample profile is:
| Metric | Median | P75 | P90 | P95 | Max |
|---|---|---|---|---|---|
| Stars (all sampled repos) | 141 | 342 | 887 | 1,485 | 175,747 |
| Contributors (where populated) | 7 | 24 | 57 | 95 | n/a |
Among repos that contained at least one of the three target files and had a populated contributor count, 93% had three or more contributors. The “side project” framing doesn’t fit; these are real teams.
Top primary languages: Python (21%), TypeScript (11%), JavaScript (8%), C++ (5%), Go (5%), Java (4%), Rust (4%), C (4%), C# (3%). Long tail across 80+ languages.
This is a study of active, popular, public open-source projects — not abandoned repos, not private codebases, not weekend projects. Read the findings against this population.
For 100 files in the high-staleness cohort, we used Claude Sonnet to extract two structured outputs per file: the architectural, security, and product claims made in the file, and a summary of what had structurally changed in the repo since the file was last touched. Both outputs are published in the dataset for direct reader inspection. We didn’t classify any individual repo as “drifted” — we published what each file claims and what each codebase did, and let readers compare.
Separately, we ran a line-by-line classification across all 467 files in the cohort, sorting each line into one of seven categories.
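A sketch of the per-line classification call using the Anthropic Python SDK; the prompt wording, the parsing, and the model string are illustrative, not the study's exact configuration:

```python
import anthropic

CATEGORIES = ["architecture", "workflow", "style", "meta",
              "security", "product", "unclassifiable"]

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def classify_lines(file_text: str) -> list:
    """Ask the model for one category label per line, in order.
    Prompt and response parsing are simplified for illustration."""
    lines = file_text.splitlines()
    numbered = "\n".join(f"{i}: {line}" for i, line in enumerate(lines))
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumed model string
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": (
                "Classify each numbered line of this AI agent context file into "
                f"exactly one of: {', '.join(CATEGORIES)}. "
                "Reply with one 'index: category' pair per line.\n\n" + numbered
            ),
        }],
    )
    labels = {}
    for row in msg.content[0].text.splitlines():
        idx, _, cat = row.partition(":")
        if idx.strip().isdigit() and cat.strip() in CATEGORIES:
            labels[int(idx)] = cat.strip()
    # Any line the model skipped falls back to "unclassifiable"
    return [labels.get(i, "unclassifiable") for i in range(len(lines))]
```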
The aggregate distribution, averaged across all 467 files:
| Category | Mean % | What it covers |
|---|---|---|
| Architecture | 23.1% | Module boundaries, design patterns, framework choices |
| Workflow | 24.6% | Build commands, test instructions, dev setup |
| Style | 6.4% | Formatting, naming, code style |
| Meta | 12.2% | Instructions to the agent (“be concise”, “always cite sources”) |
| Security | 1.3% | Auth, secret handling, what not to log |
| Product | 3.6% | Domain logic, business rules |
| Unclassifiable | 28.3% | Headings, separators, generic intros |
The original hypothesis was that these files would be dominated by formatting rules and miss everything important. The data tells a sharper story.
A quarter of the average file’s content is about how the codebase is structured. That’s actually a reasonable amount.
Security gets 1.3% of the average file, and 342 of 467 classified files (73.2%) contain zero security instructions of any kind. Three out of four AI agent context files give the agent no guidance on what not to log, what to authenticate, what to validate, or what to keep out of source control. The files exist; they just don't cover the half of the problem most likely to surface in a production incident.
Product and domain knowledge get 3.6% of the average file, and 254 of 467 files (54.4%) contain zero product or domain instructions. The things that make a product unique, the rules of the business, are mostly not in these files.
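Both zero-coverage figures fall directly out of the per-line labels. A sketch, assuming the classifier's output is a mapping from file path to its list of line categories:

```python
def zero_coverage_rate(labels_by_file: dict, category: str) -> float:
    """Share of files with no lines at all in the given category."""
    empty = sum(1 for labels in labels_by_file.values() if category not in labels)
    return empty / len(labels_by_file)

# Against the study's 467-file cohort this yields roughly:
#   zero_coverage_rate(corpus, "security") -> 0.732
#   zero_coverage_rate(corpus, "product")  -> 0.544
```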
What fills the gap? Workflow commands. A quarter of the average file is build instructions and test commands. The files are doing the equivalent of telling the agent how to ride the bike before telling it where the road is.
To make this concrete, here is one of the validation extractions from the dataset.
[Extraction excerpt, garbled in rendering; recoverable fragments reference deno install (for node_modules), src/index.html, content_result, secret handling, and copa, nushell, rio-backend, and sugarloaf for terminal rendering.]
The file describes a build system, a runtime, and a security model. The codebase has since shipped a security hardening pass, a mobile rebuild, and substantial reliability changes. An agent reading this file is being told about a state of the system the maintainers have already moved past. Crucially, the security guidance the file does include (more than most files have: 4 out of 44 lines) doesn't reflect the post-hardening state.
This isn’t unusual. We could have picked any of the 100 validation extractions and shown a similar gap. The full set is published in the dataset.
This study is not a critique of the maintainers. The teams building these projects are doing the right thing in writing the files at all. Every CLAUDE.md, .cursorrules, and AGENTS.md represents a maintainer who took the time to think about what an AI agent should know. The structural problem is not their effort; it’s that the medium they’re working in (a static markdown file in a repo) doesn’t have the properties needed to keep pace with how fast their teams are shipping.
This study is also not making a claim about whether any specific agent is or isn’t drifting in any specific session. We measured the gap between documented architecture and current code. We did not measure agent behaviour against either. The behavioural question — what happens when an agent acts on stale context — is the next study, and it requires different methodology.
Want the full picture? Get the methodology, the dataset, and early access to Memex.