// Extreme AI Programming · #04

The Prompt Is the Work

By Barrie Hadfield · Tuesday 19 May 2026 · 10 min read

A decade of open-source documentation. The skill the industry never named. Translators, not developers. Three people, one login form. Why the prompter matters more than the prompt.

Somewhere in the mid-2000s I got drawn into open source, and the role I kept ending up in, almost by accident, was writing documentation. Not code comments or terse README files, but the serious stuff. Onboarding guides. Architectural overviews. The docs site. I had a reasonable amount of success, in the sense that the people who read them came away understanding the projects well enough to actually contribute.

What I noticed in the course of doing this is that writing clearly about software is a meaningfully different skill from writing software. The two overlap, obviously, but they overlap less than you would expect. Plenty of excellent engineers cannot explain what they have built to save their lives. Plenty of people who could not write production code to a professional standard are remarkably good at describing what code does, how it fits together, and why the choices in front of you matter.

These are two distinct abilities, and until recently the second one mattered only at the edges of the work: in teaching, in onboarding, in writing the documentation nobody quite got paid for. Anyone whose job was to ship code could safely under-invest in it. That has changed completely in the last two years, and the reason it has changed is worth being precise about.

That second ability is a skill in its own right, though it is rarely treated as one. It is the skill of having a proper vocabulary for software, and of being able to deploy that vocabulary precisely enough that someone else can build a working mental model from what you have said. The difference between a class and an instance of a class. Coupling and cohesion. Interface and implementation. These distinctions are not obscure; they are the basic grammar of how software is put together. Deploying them correctly, in context, in full prose, while walking an inexperienced person through a real system, is surprisingly hard, and the people who can do it well have usually spent years at it.

This is the skill that is now the most economically productive one in serious software work.

When you sit down to work with a modern coding agent, the entire interface is an exercise in describing a software outcome. You are not typing the code; the agent is going to type the code. What you are doing is translating a mental model of software into prose precise enough that a recipient who does not share your context can build the same mental model on the other side. The agent is an unusually demanding recipient: it takes what you have said exactly as you have said it and proceeds accordingly. If your brief is precise but subtly misaligned with the codebase, the output will be precisely wrong, which is worse than vague.

The reason articulation has become this kind of bottleneck is worth naming directly. Jon Ayre, who runs the Strategic Advisory practice at Equal Experts, gave me the cleanest available framing in a comment on the first article in this series. The large language model sitting inside Cursor and Claude Code is not a developer; what it does is translation. Free-form English in, code out. The thinking happens before the prompt, not during it.

In a follow-up exchange Jon sharpened the point. The model is doing translation coupled with retrieval by keyword: a single phrase in English can expand into an entire coded function because the phrase is the agreed shorthand for that function. His analogy was a designer asking for “three paragraphs of lorem ipsum here”. The phrase fetches a thing that already exists; the model is not inventing it. Asking the agent to “write me a binary search” requires the prompter to know the thing exists, to know what it does, and to know it is the right solution to the problem in front of them. The model cannot supply any of those three.
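To make the retrieval point concrete, here is a minimal sketch in TypeScript of the thing that phrase fetches. The model is not inventing it; it is the agreed shorthand made code, and the three things the prompter had to know — that it exists, what it does, and when it applies — are exactly the things the sketch takes for granted.

```typescript
// A standard binary search over a sorted array: returns the index of
// `target`, or -1 if it is absent. The prompter supplies what the model
// cannot: that this exists, what its contract is, and that a sorted
// input is a precondition.
function binarySearch(sorted: number[], target: number): number {
  let lo = 0;
  let hi = sorted.length - 1;
  while (lo <= hi) {
    const mid = lo + Math.floor((hi - lo) / 2);
    if (sorted[mid] === target) return mid;
    if (sorted[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return -1;
}
```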

If the model is doing translation coupled to keyword retrieval, then the words you choose are doing a great deal more work than they would in any other written register most of us know. Every load-bearing word in a prompt is a key turning a different lock. “Build” fetches one set of patterns; “refactor” fetches another; “tighten” fetches a third. “Scalable” recruits a different idiom from “simple”. These are not stylistic choices. They are switches, and the model is too good at translating across them for any of us to be casual about which one we typed.

Which is where almost every published guide to this craft stops. Pick the right words. Choose the right verbs. Be precise. All true, all useful, none of it the part of the picture I have been wanting to write about. The most consequential variable in any prompt loop is the one almost nobody is naming, and it is not the prompt. It is the person sitting at the keyboard.

The login form

Imagine three people typing into the same agent against the same feature. None of them is doing anything wrong. Each is using the tool in good faith. The differences in what they get back are a function of the vocabulary each one brought to the keyboard, and almost nothing else.

The first is a founder or business owner. The requirement, in their head, is a business outcome: we need to know who our users are, so we can charge them, retain them, and show them their own data. They type something close to this: “We need users to be able to sign up and log in, so we know who is using the product and can attach their activity to their account.” The prompt is clear and articulate, but the lexicon is business-side. The why is fully present; the how is left entirely to the agent.

The second is a product manager or a product designer. They have the vocabulary the founder does not: of user flows, of jobs to be done, of the visual elements that make a login screen feel like a login screen. Their prompt reads more like: “Add a login screen as the first page after the marketing site. Email and password fields, a ‘forgot password’ link, a ‘sign in with Google’ button. Land the user on their dashboard after sign-in, and treat returning users with a remembered-email shortcut.” This is a different prompt entirely. It is describing the application from the visible surface inward.

The third is a senior developer or architect. She sees the same requirement and starts thinking in terms the other two could not have used. Do we want a username-and-password flow at all, or are we going single sign-on? If single sign-on, which provider: Google, Microsoft, an enterprise OIDC provider of some kind? Where does the session token live and for how long? How does this slot into the authentication primitives the rest of the system already uses, so we are not now maintaining two parallel ways of knowing who a user is? Is this the third login form being introduced this month, or are we converging on the helper the team has quietly been aligning on? Her prompt, when it lands, names every one of those decisions, picks the answer she wants to each, and hands the agent the constraints to honour: DRY against the existing helpers, faithful to the SOLID principles the codebase has been built to, and conforming to the team’s authentication standard.

All three prompts produce a working login form. The agent is capable enough that none falls over. But the three forms are not the same form. The founder gets the generic implementation: the model’s most plausible defaults, assembled from the training data, with no knowledge of the codebase and no concern for sustainability beyond what works tonight. The product manager gets a more specific implementation, faithful to the workflow described and visually correct, but still operating outside the codebase’s existing architecture. The senior developer gets the implementation she already knew was right, expressed in enough detail that the agent could not have produced anything else.
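A hedged sketch of that difference in code may help. Every name below is hypothetical, invented for illustration and standing in for whatever the real codebase would have: the first function is the shape the founder’s prompt tends to produce, self-contained and built from plausible defaults; the second is the shape the senior developer’s prompt produces, thin because the decisions live in helpers the team already owns.

```typescript
import { randomUUID } from "node:crypto";

type Identity = { userId: string };
type Session = { token: string; userId: string; expiresAt: number };

// Hypothetical stand-ins for the primitives an existing codebase would own.
const authHelpers = {
  async verifyCredentials(email: string, password: string): Promise<Identity> {
    if (email === "" || password === "") throw new Error("Invalid credentials");
    return { userId: "u-" + email }; // a real helper would check a store
  },
};
const sessions = {
  DEFAULT_TTL_MS: 60 * 60 * 1000,
  async create(id: Identity, opts: { ttlMs: number }): Promise<Session> {
    return { token: randomUUID(), userId: id.userId, expiresAt: Date.now() + opts.ttlMs };
  },
};

// The founder's prompt tends to yield this shape: working tonight, with
// its own invented token format — a second way of knowing who a user is.
async function loginGeneric(email: string, password: string): Promise<string> {
  if (email === "" || password === "") throw new Error("Invalid credentials");
  const payload = { email, exp: Date.now() + 86_400_000 }; // ad-hoc lifetime
  return Buffer.from(JSON.stringify(payload)).toString("base64");
}

// The senior developer's prompt yields this shape: almost nothing of its
// own, because the decisions live in the helpers the team already trusts.
async function loginConforming(email: string, password: string): Promise<Session> {
  const identity = await authHelpers.verifyCredentials(email, password);
  return sessions.create(identity, { ttlMs: sessions.DEFAULT_TTL_MS });
}
```

Neither function is wrong on its own terms; the second is simply the only one that still makes sense once the codebase has an opinion.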

I do not want to draw a verdict here about which of the three is using the tool correctly. All three are using it well, given what they brought. The founder is keeping momentum on a product that may not survive the quarter. The product manager is prototyping an interaction faster than any old-style design tool would have allowed. The developer is compressing a week of careful implementation into an afternoon. These are all good reasons to be at the keyboard.

The point is sharper than which role is doing it best. Output is correlated with input, and input is a function of what the person typing actually knows about the system they are asking for. The tool is now embedded at every level of the work, and each of us gets back exactly what our vocabulary instructed it to produce.

Beyond vibe coding

There is a story that has dominated the discussion of AI coding for the last eighteen months, and it goes roughly like this. One person sits at a keyboard. They prompt an agent. The agent produces working software. The person becomes the entire engineering function on their own. The industry has given this a name, vibe coding, and it has become the dominant mental picture of what AI-assisted development looks like.

I think vibe coding is the scourge of our profession right now. It is not that the products it produces never work; they often do, on a small enough scale. The problem is that it obliterates the specialist roles that every piece of consequential software still needs to have: the founder’s instinct for whether this maps to a real customer, the product manager’s care for how the interaction should land, the designer’s sense of what the surface should look like, and the developer’s discipline around the architectural choices that determine whether tonight’s working code is still working in six months. Vibe coding flattens all of those into one person, working alone, in front of one agent, with no scrutiny and no second voice.

In older language, this is cowboy coding. The agent does not change the substance of that picture. It just lets one cowboy do it faster.

The case for serious AI-native engineering is not the cowboy with an agent. It is the same case the discipline has always made for itself: software of any real consequence is built by teams whose facets complement each other. The new variable is that those facets now express themselves to a shared instrument, and the discipline of the team is precisely the discipline of orchestrating those vocabularies coherently.

In the picture I am working towards, the founder articulates the first hypothesis to the agent in business vocabulary and produces a thin prototype, good enough to put in front of real users. The product manager and designer, working from what those users said, then articulate the interaction in product vocabulary and produce something visually correct and workflow-faithful. The senior developer, taking the prototype and the interaction as inputs, articulates the implementation in architectural vocabulary and produces the version of the feature that will actually live in production. None of them, working alone, would have produced that final result. The agent is the same agent throughout. The articulation it receives, and the artefact it produces, change at each step.

The interesting unit, in an AI-first team, is neither the prompt nor the prompter. It is the orchestrating discipline that decides which facet of the team supplies which articulation, at which point in the loop, against the same instrument. That orchestration is the work the previous generation of engineering practices was already attempting. It is just that an agent now sits in the middle of every articulation and responds, in real time, to whichever vocabulary lands in its window.

Why this changes the discipline

Working well with these instruments is not a tips-and-tricks discipline. It is the older discipline of saying exactly what you mean, exposed for the first time as a unit of economic production. Ten years ago, the difference between “write a function that handles every edge case” and “write a function that handles edge cases” was a sentence in a code review. Now it is the difference between two implementations the agent will defend with equal fluency. The precision of individual words is suddenly load-bearing in a way that was previously reserved for poetry, contract law and code itself.
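A hedged illustration, with a made-up example, of how far apart those two sentences can land. Both functions below are hypothetical; the agent would defend either with equal fluency, and only the wording of the prompt decides which one you get.

```typescript
// "Handles edge cases": whitespace and the empty string are covered.
function parseQuantity(input: string): number | null {
  const trimmed = input.trim();
  if (trimmed === "") return null;
  const n = Number(trimmed);
  return Number.isInteger(n) && n > 0 ? n : null;
}

// "Handles every edge case": also rejects exponent and hex notation
// ("1e3" and "0x1f" both pass the version above), explicit signs, and
// magnitudes past Number.MAX_SAFE_INTEGER, by insisting on plain digits.
function parseQuantityStrict(input: string): number | null {
  const trimmed = input.trim();
  if (!/^[0-9]+$/.test(trimmed)) return null;         // digits only
  const n = Number(trimmed);
  return Number.isSafeInteger(n) && n > 0 ? n : null; // no overflow past 2^53 - 1
}
```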

The discipline this asks of us, across every role, is to be honest about how much our lived experience and specific knowledge are doing inside what looks, on the surface, like casual typing into a chat box. The thing the model cannot supply for itself is the thing that determines, more than anything else in the loop, what we get back.

When I started writing onboarding guides for open-source projects in the mid-2000s, I did not think I was building a skill that would become the most economically productive thing I knew how to do. I was just trying to make sure the next contributor to the project understood what was going on. The same activity, sentence by sentence and word by word, is what an entire industry is now rediscovering and giving a new name to.

What I actually do

The argument above is the chapter. The five short habits below are what the argument means at a keyboard, drawn from a year of running an AI-native company. Better write-ups of all of them exist in the references at the foot; these are the practical residue.

  • Voice-dictate the long briefings; type the short ones. Speaking forces a different and almost always richer composition than typing does. The benefit is to you, not to the model.
  • Open meaningful work by asking the model to interview you. “Before you write anything, identify the three or four decisions in this brief that will most constrain the implementation, and ask me about each of them in turn.” The questions you do not think to ask yourself are the ones that bite later.
  • Suggest a direction with your reasoning; let the model push back. Specifying every detail produces compliant, slightly mechanical work. Naming the option and the alternatives invites the intelligence the model actually has.
  • Challenge whatever comes back, at least once, before accepting it. “What is the strongest case against this approach?” The first answer is a draft. Drafts benefit from review.
  • Treat the context window as a finite resource, not a container. When the agent starts repeating itself, contradicting earlier decisions, or drifting from the brief, ask it to produce a handoff and start a fresh window with the handoff at the top. Long contexts degrade in use, often well before they are full.

Where to read more

The prompting craft has been written up by people who have specialised in it further than I have. The four I would recommend reading first:

  • Anthropic’s prompt engineering overview. The canonical reference for working with Claude, with specific testable claims about long-context placement.
  • Eugene Yan, Prompting Fundamentals and How to Apply Them Effectively. Strong on n-shot prompting, structured scratchpads and the primacy of evaluation over prompt tinkering.
  • Lilian Weng, Prompt Engineering. The technical survey that hardens the academic vocabulary: zero-shot, few-shot, chain-of-thought, self-consistency, tree-of-thoughts.
  • Hamel Husain, Fuck You, Show Me The Prompt. The case for reading, owning and measuring the actual prompts your tools are sending, against the comforting abstraction of frameworks.

For the academic version, the most comprehensive source is Sander Schulhoff and colleagues, The Prompt Report: A Systematic Survey of Prompting Techniques. Fifty-eight techniques, seventy-six pages. Useful as a reference, harder to read straight through.

The thing those guides take for granted, and the thing this chapter has been about, is the person at the keyboard: the vocabulary they brought, the role they sit in, the team they belong to. The prompt is the visible artefact of all of that. The prompt is not the work. The prompter is. The team behind the prompter is what determines whether the work amounts to anything.

The work was always in the words. We just did not have to be this honest about it before.

— Barrie

I am co-founder and CEO of Mindset AI, where we are building Memex AI, a decision and knowledge layer for AI-native engineering teams. This series is the thinking that shapes our product. I will flag it explicitly when an article touches something we build. Most of it is simply where the industry is going, with or without us.