A study of 20,000 active public GitHub repositories, measuring how out-of-date AI agent context files (CLAUDE.md, .cursorrules, AGENTS.md) are relative to the codebases they describe. The answer, measured two ways, is the same: static markdown can't keep pace with active development.
Four findings follow.
The full dataset, methodology, and source seeds are sent to your inbox once you sign up. Anyone can reproduce the study.
We measured drift between AI agent context files and the codebases they describe two ways. The first, by codebase velocity: how busy is the underlying repo, and does the file keep pace? The second, by content age: how long ago was the file’s content last meaningfully edited, and how much has the codebase changed since? The two cuts ask different questions. They land in the same place.
Among actively maintained repos with a CLAUDE.md, we grouped files by how many commits the underlying codebase shipped in the past 90 days. The pattern is clean: the busier the codebase, the more likely the file has drifted past what GitHub's diff API can describe in a single response.
[Chart: architectural_lag_capped · n=232 actively maintained repos · CLAUDE.md cohort. Source: Memex CLAUDE.md Staleness Study, May 2026. Filtered to repos that passed the active filter (≥1 commit in 30 days, ≥50 commits in 90 days, and either ≥3 contributors or ≥500 lifetime commits) and had a CLAUDE.md with measurable staleness. The 500+ tier is at GitHub's pagination cap; actual values may be higher.]
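In pipeline terms, this cut reduces to a few API calls: the active filter over enriched repo metadata, and a drift check against GitHub's compare endpoint, which returns at most 250 commits and 300 changed files in a single response. A minimal sketch, where the field names (`commits_30d`, `lifetime_commits`) are illustrative enrichment outputs rather than the study's actual pipeline:

```python
import requests

API = "https://api.github.com"
HEADERS = {"Accept": "application/vnd.github+json"}  # add an auth token for real rate limits

def is_actively_maintained(repo: dict) -> bool:
    """Active filter from the study: >=1 commit in 30 days, >=50 in 90 days,
    and either >=3 contributors or >=500 lifetime commits.
    Field names are assumed outputs of an earlier enrichment pass."""
    return (
        repo["commits_30d"] >= 1
        and repo["commits_90d"] >= 50
        and (repo["contributors"] >= 3 or repo["lifetime_commits"] >= 500)
    )

def drift_exceeds_single_response(owner: str, repo: str,
                                  last_touch_sha: str, default_branch: str) -> bool:
    """True when the diff between the file's last-edit commit and the branch tip
    overflows what one compare response can describe (250 commits / 300 files)."""
    url = f"{API}/repos/{owner}/{repo}/compare/{last_touch_sha}...{default_branch}"
    resp = requests.get(url, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    cmp = resp.json()
    return cmp["total_commits"] > 250 or len(cmp.get("files", [])) >= 300
```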
As codebase activity rises, the median time since the file was last edited actually falls — busy teams are updating their CLAUDE.md more recently. The drift rate triples anyway. The file is moving more often, but the codebase is moving so much more often that the file can’t keep up.
It is not that teams are neglecting these files. Teams in the high-velocity tier are editing their CLAUDE.md within 18 days, on average — three weeks fresher than the slower tier. They’re trying. The medium itself can’t keep pace with the codebases it’s describing. A static markdown file in a fast-moving repo is, at any given moment, more likely to misrepresent the codebase than to describe it.
For an AI agent reading these files, this means: the more important an accurate context file would be (because the codebase is moving fast and you really need to know what shipped this week), the less likely it is that the file is actually accurate. The signal-to-noise ratio inverts at exactly the wrong moment.
The same pattern shows up when we slice the data a different way. Here we group CLAUDE.md files by how recently their content was last meaningfully edited, ignoring codebase velocity. The trajectory is the same: the older the content, the more drift accumulates. The rate plateaus at roughly 25–55% once content is more than a quarter old.
[Chart: architectural_lag_capped · n=338 across 9 cohorts. Source: Memex CLAUDE.md Staleness Study, May 2026. Cohorts grouped by the date of the commit that last touched the file. The Apr 2026 cohort has the lowest drift simply because those files haven't had time to drift yet; they were just edited.]
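Cohorting by content age needs only the file's last-touch commit, which GitHub's commits endpoint returns directly when filtered by path (newest first). A minimal sketch; the month-bucketing helper is our own convention, assumed to match the chart's cohorts:

```python
import requests

API = "https://api.github.com"
HEADERS = {"Accept": "application/vnd.github+json"}

def last_touch(owner: str, repo: str, path: str = "CLAUDE.md") -> dict:
    """Most recent commit that touched the file."""
    resp = requests.get(
        f"{API}/repos/{owner}/{repo}/commits",
        params={"path": path, "per_page": 1},
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()
    commit = resp.json()[0]
    return {"sha": commit["sha"], "date": commit["commit"]["committer"]["date"]}

def cohort(last_touch_date_iso: str) -> str:
    """Bucket by calendar month of the last-touch commit, e.g. '2026-04'."""
    return last_touch_date_iso[:7]
```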
Across the random sample, 26% of CLAUDE.md files and 20% of AGENTS.md files clear this drift threshold. Within the older cohorts, the rate sustains at 25–55%. By the time a file's content is more than a quarter old, the odds that it still describes the code approach a coin flip.
Static context files do not survive contact with active development.
Two slices of the same data, one consistent answer. The medium isn't keeping pace with the work it's meant to describe, not for lack of effort from the teams but because of the structure of the medium itself.
We identified every public GitHub repository that contained any of CLAUDE.md, .cursorrules, or AGENTS.md, had at least 50 stars, and had at least one push between May 2025 and April 2026.
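A sketch of that discovery pass, assuming GitHub's code-search `filename:` qualifier and standard repository metadata; pagination, the search API's result caps, and the monthly windowing behind the table below are elided:

```python
import requests

API = "https://api.github.com"
HEADERS = {"Accept": "application/vnd.github+json"}  # code search requires auth

TARGET_FILES = ("CLAUDE.md", ".cursorrules", "AGENTS.md")

def repos_containing(filename: str) -> set:
    """Code search for repos containing the file. Real runs must page through
    results and work around the search API's per-query result caps."""
    resp = requests.get(
        f"{API}/search/code",
        params={"q": f"filename:{filename}", "per_page": 100},
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()
    return {item["repository"]["full_name"] for item in resp.json()["items"]}

def meets_corpus_criteria(repo: dict) -> bool:
    """>=50 stars and at least one push inside the study window.
    The 20,000-repo enrichment sample additionally drops forks and archived repos."""
    return (
        repo["stargazers_count"] >= 50
        and "2025-05-01" <= repo["pushed_at"][:10] <= "2026-04-30"
    )
```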
The result: 241,046 repos.
| Month | New active repos with these files |
|---|---|
| 2025-05 | 8,248 |
| 2025-06 | 8,490 |
| 2025-07 | 8,872 |
| 2025-08 | 9,174 |
| 2025-09 | 9,622 |
| 2025-10 | 10,805 |
| 2025-11 | 12,284 |
| 2025-12 | 14,117 |
| 2026-01 | 18,180 |
| 2026-02 | 21,333 |
| 2026-03 | 34,255 |
| 2026-04 | 85,673 |
April 2026 alone produced 36% of the entire 12-month corpus. The May-2025-to-April-2026 growth rate is just over 10x. The trajectory is still accelerating.
This isn’t a fringe pattern. It’s the dominant pattern of how engineering teams are configuring AI agents on public codebases in 2026 — and the same teams adopting these files at this velocity are, on average, accumulating drift between what the files say and what the code does.
Every repo in the 20,000-repo enrichment sample meets four criteria, applied at the GitHub API level: at least 50 stars, not a fork, not archived, and at least one push between May 2025 and April 2026. Within those criteria, the sample profile is:
| Metric | Median | P75 | P90 | P95 | Max |
|---|---|---|---|---|---|
| Stars (all sampled repos) | 141 | 342 | 887 | 1,485 | 175,747 |
| Contributors (where populated) | 7 | 24 | 57 | 95 | n/a |
Among repos that contained at least one of the three target files and had a populated contributor count, 93% had three or more contributors. The “side project” framing doesn’t fit; these are real teams.
Top primary languages: Python (21%), TypeScript (11%), JavaScript (8%), C++ (5%), Go (5%), Java (4%), Rust (4%), C (4%), C# (3%). Long tail across 80+ languages.
This is a study of active, popular, public open-source projects — not abandoned repos, not private codebases, not weekend projects. Read the findings against this population.
For 100 files in the high-staleness cohort, we used Claude Sonnet to extract two structured outputs per file: the architectural, security, and product claims made in the file, and a summary of what had structurally changed in the repo since the file was last touched. Both outputs are published in the dataset for direct reader inspection. We didn’t classify any individual repo as “drifted” — we published what each file claims and what each codebase did, and let readers compare.
Separately, we ran a line-by-line classification across all 467 files in the cohort, sorting each line into one of seven categories.
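A sketch of the per-line classification call using the Anthropic Python SDK; the prompt wording, the parsing, and the model string are illustrative, not the study's exact configuration:

```python
import anthropic

CATEGORIES = ["architecture", "workflow", "style", "meta",
              "security", "product", "unclassifiable"]

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def classify_lines(file_text: str) -> list:
    """Ask the model for one category label per line, in order.
    Prompt and response parsing are simplified for illustration."""
    lines = file_text.splitlines()
    numbered = "\n".join(f"{i}: {line}" for i, line in enumerate(lines))
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumed model string
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": (
                "Classify each numbered line of this AI agent context file into "
                f"exactly one of: {', '.join(CATEGORIES)}. "
                "Reply with one 'index: category' pair per line.\n\n" + numbered
            ),
        }],
    )
    labels = {}
    for row in msg.content[0].text.splitlines():
        idx, _, cat = row.partition(":")
        if idx.strip().isdigit() and cat.strip() in CATEGORIES:
            labels[int(idx)] = cat.strip()
    # Any line the model skipped falls back to "unclassifiable"
    return [labels.get(i, "unclassifiable") for i in range(len(lines))]
```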
The aggregate distribution, averaged across all 467 files:
| Category | Mean % | What it covers |
|---|---|---|
| Architecture | 23.1% | Module boundaries, design patterns, framework choices |
| Workflow | 24.6% | Build commands, test instructions, dev setup |
| Style | 6.4% | Formatting, naming, code style |
| Meta | 12.2% | Instructions to the agent (“be concise”, “always cite sources”) |
| Security | 1.3% | Auth, secret handling, what not to log |
| Product | 3.6% | Domain logic, business rules |
| Unclassifiable | 28.3% | Headings, separators, generic intros |
The original hypothesis was that these files would be dominated by formatting rules and miss everything important. The data tells a sharper story.
A quarter of the average file’s content is about how the codebase is structured. That’s actually a reasonable amount.
Security gets 1.3% of the average file, and 342 of 467 classified files (73.2%) contain zero security instructions of any kind. Three out of four AI agent context files give the agent no guidance on what not to log, what to authenticate, what to validate, or what to keep out of source control. The files exist; they just don't cover the half of the problem most likely to surface in a production incident.
Product and domain knowledge get 3.6% of the average file, and 254 of 467 files (54.4%) contain zero product or domain instructions. The things that make a product unique, the rules of the business, are mostly not in these files.
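Both zero-coverage figures fall directly out of the per-line labels. A sketch, assuming the classifier's output is a mapping from file path to its list of line categories:

```python
def zero_coverage_rate(labels_by_file: dict, category: str) -> float:
    """Share of files with no lines at all in the given category."""
    empty = sum(1 for labels in labels_by_file.values() if category not in labels)
    return empty / len(labels_by_file)

# Against the study's 467-file cohort this yields roughly:
#   zero_coverage_rate(corpus, "security") -> 0.732
#   zero_coverage_rate(corpus, "product")  -> 0.544
```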
What fills the gap? Workflow commands. A quarter of the average file is build instructions and test commands. The files are doing the equivalent of telling the agent how to ride the bike before telling it where the road is.
To make this concrete, here is one of the validation extractions from the dataset.
[Extraction excerpt, garbled in rendering; recoverable fragments reference deno install (for node_modules), src/index.html, content_result, secret handling, and copa, nushell, rio-backend, and sugarloaf for terminal rendering.]
The file describes a build system, a runtime, and a security model. The codebase has since shipped a security hardening pass, a mobile rebuild, and substantial reliability changes. An agent reading this file is being told about a state of the system the maintainers have already moved past. Crucially, the security guidance the file does include (more than most files have: 4 out of 44 lines) doesn't reflect the post-hardening state.
This isn’t unusual. We could have picked any of the 100 validation extractions and shown a similar gap. The full set is published in the dataset.
This study is not a critique of the maintainers. The teams building these projects are doing the right thing in writing the files at all. Every CLAUDE.md, .cursorrules, and AGENTS.md represents a maintainer who took the time to think about what an AI agent should know. The structural problem is not their effort; it’s that the medium they’re working in (a static markdown file in a repo) doesn’t have the properties needed to keep pace with how fast their teams are shipping.
This study is also not making a claim about whether any specific agent is or isn’t drifting in any specific session. We measured the gap between documented architecture and current code. We did not measure agent behaviour against either. The behavioural question — what happens when an agent acts on stale context — is the next study, and it requires different methodology.
Want the full picture? Get the methodology, the dataset, and early access to Memex.