
From Prompts to Process

The current approach to AI-assisted development — write better prompts, craft better rules files — addresses the wrong layer. What’s actually needed is workflow engineering.

The Prompt Trap

The dominant strategy for improving AI-assisted development is to optimize the input. Write a more detailed prompt. Craft a better rules file. Add more context to the system message. Provide clearer examples.

This addresses the wrong layer. You can perfect the instructions and still get work that violates your conventions, drifts from your specifications, or solves the wrong problem, because there’s nothing between the prompt and the output that enforces discipline.

The core problem with AI-assisted development today isn’t that developers give bad instructions. It’s the absence of a system around those instructions. The input quality matters, but the real bottleneck is that there’s no workflow to catch what falls through.

Consider the parallel to software engineering itself. Writing code has always been possible. People were doing it in the 1960s without version control, without CI/CD, without code review. The code worked, but the process didn’t scale. It took decades to transform “writing code” into “software engineering”:1 a practice with structure, accountability, traceability, and continuous improvement.

AI-assisted development is at that same inflection point. The capability exists. The discipline doesn’t. And no amount of better prompting will substitute for building it.

Rules Files Don’t Work

Here is a truth every developer who has spent serious time with AI coding assistants has discovered: rules files are treated as suggestions. The AI follows them when it feels like it, and the violations are discovered during review, if they are discovered at all.

The difference between “please follow these rules” and “your work will be reviewed by independent specialists who can reject it” is the difference between a recommendation and an institution. Recommendations rely on the goodwill and attention of the recipient. Institutions create accountability structures that function even when goodwill and attention fail.

Workflow Engineering

The shift from prompt engineering to workflow engineering is the core insight.

Instead of trying to make a single AI session smarter, workflow engineering operates one level of abstraction higher and asks: how do I build a system where AI-assisted development reliably produces quality results?

Structure, accountability, traceability, and continuous improvement didn’t make individual code-writing sessions better, but they made the overall practice of building software dramatically more reliable.

Workflow engineering means defining formal work units sized to what an AI can complete thoroughly, curating context deliberately rather than dumping everything or providing nothing, building verification systems that catch violations independently of the AI’s own attention, and creating institutional memory so that knowledge persists across ephemeral sessions. None of these are about the prompt, but all of them dramatically affect the outcome.
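The four pillars above can be made concrete with a minimal sketch. Everything here is illustrative: the names (`WorkUnit`, `MemoryLog`, `run_unit`) are hypothetical and do not refer to any existing tool; the point is only that work units, curated context, independent checks, and persistent memory are structural objects, not prompt text.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class WorkUnit:
    """A formal work unit sized for one AI session to complete thoroughly."""
    goal: str
    context_files: list[str]              # deliberately curated, not "everything"
    acceptance_checks: list[Callable]     # verification independent of the AI

@dataclass
class MemoryLog:
    """Institutional memory that persists across ephemeral sessions."""
    entries: list[str] = field(default_factory=list)

    def record(self, lesson: str) -> None:
        self.entries.append(lesson)

def run_unit(unit: WorkUnit, produce: Callable, memory: MemoryLog) -> bool:
    """Run one unit: produce output, verify it independently, record the outcome."""
    output = produce(unit)                                   # the AI session itself
    passed = all(check(output) for check in unit.acceptance_checks)
    memory.record(f"{unit.goal}: {'passed' if passed else 'rejected'}")
    return passed
```

Note that the checks run after production and outside the AI's own attention: a unit can fail verification even when the prompt was followed to the letter.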

The Three Domains

The workflow engineering approach reveals three distinct domains of value that the current conversation about AI-assisted development has collapsed into one.

The preceding pages explored two of them: specification-layer tooling (the tools for creating, maintaining, and cross-checking design documents) and compilation automation (the structured lifecycle that translates specifications into implementation). Both are substantial, and both are underserved by current tooling.

But the workflow lens also reveals a third domain that the current conversation largely overlooks: the cultural infrastructure of human-AI collaboration.

This is the continuous work of building a shared operating context between a human engineer and the AI agents they work with: how the AI behaves, what it prioritizes, what it knows about your project’s particular values and sensitivities, how it responds to ambiguity, and (most critically) the shared language through which all participants coordinate.

An objection surfaces here: isn’t this just better prompting? You’re loading context, rules, and onboarding materials into the AI’s input. That’s a prompt.

The distinction matters. Giving a new employee detailed task instructions is prompting. Onboarding them into the company’s culture, vocabulary, and way of thinking is something else entirely. Both involve telling them things. One is transactional and improves the next output. The other is structural, because it shapes how every future interaction works. The Prompt Trap is about optimizing instructions, but this third domain is about building the relationship within which instructions are understood.

Most of the industry treats AI as a tool you use. This third domain treats it as a collaborator you manage, and management requires different skills and infrastructure than tool usage. Almost no one is building for it yet.

Each of these domains requires its own approach and tooling. Collapsing them into a single conversation about “AI coding tools” obscures the real work that needs to happen in each one.

The Seed of a Shared Language

A discipline is defined less by its tools than by its vocabulary. Software engineering became a discipline not when it got compilers and debuggers, but when it developed a shared culture of concepts, metaphors, and shorthand that compressed hard-won lessons into transmissible form.

“Technical debt”2 gave entire organizations a metaphor that changed how non-engineers reasoned about code quality. “Code smell”3 gave engineers permission to trust their intuition before they could articulate the problem. Dijkstra’s “goto considered harmful”4 compressed a design philosophy into three words that are still invoked decades later. Brooks’s “no silver bullet”5 remains the standard response to magical thinking about productivity. These aren’t instructions or best practices. They’re cultural artifacts, the shared language of a discipline.

New domains generate this vocabulary naturally, because novel phenomena need names before they can be discussed. Social media created “ghosted,” “ratioed,” and “shadow-banned” to describe interactions that simply didn’t exist until there was a domain in which they could manifest. The terms made the phenomena visible and discussable.

Human-AI collaboration is a new domain, and its indigenous vocabulary is just beginning to form. What do you call a unit of work sized to fit within an AI’s effective attention span? What do you call the blocking priority that preempts all other work when an urgent issue surfaces during execution? What do you call the shared documents that allow agents who can never speak to each other directly to coordinate across sessions? These phenomena exist now. Most of them don’t have names yet.

Naming matters because it enables coordination. When every agent in a system (human and AI alike) shares a conceptual vocabulary, they can operate from the same mental model without rebuilding it from scratch each session. The shared language becomes a trellis: a structure that doesn’t do the growing itself, but gives the vines a shape to form around. Without it, each session starts from nothing. With it, the work compounds.

This is what the cultural layer of human-AI collaboration actually looks like in practice: a shared conceptual framework that orients every participant in the system, not a rules file that the AI may or may not follow. Building that framework (recognizing the phenomena, naming them, and refining the vocabulary through sustained practice) is work that barely anyone has started. It may be the most important work of the three domains, because the other two depend on it.

Trust Through Verification

Karpathy observed that in the traditional programming paradigm, the capability frontier is governed by what you can specify — if you can write the algorithm, the computer will execute it faithfully. In the AI paradigm, the frontier shifts: what matters is what you can verify.6 A task that can be verified can be optimized. A task that cannot be verified is left to hope and generalization.

This principle, originally about which tasks AI can master through training, extends directly to the production use of AI in development. The quality ceiling of AI-assisted work is determined by verification quality, not prompt quality. You can only trust AI output to the degree that you can independently confirm it. This is the defining constraint of the paradigm, and it connects directly to the Prompt Trap: better prompting optimizes the wrong variable. The binding constraint is verification.

AI-assisted development presents a trust dilemma with no comfortable resolution. You can review everything the AI produces, which erases the speed advantage that justified using it in the first place. Or you can trust the output and accept the risk of subtle bugs, convention violations, and specification drift accumulating silently. Neither option scales.

The way out is to stop thinking of verification as a human activity. A single generalist reviewer scanning for everything will inevitably miss things, the same way a single developer doing “code review” misses issues that a dedicated security auditor, test reviewer, and standards checker working independently would catch. Specialization produces thoroughness. And when each specialist must provide concrete evidence that they actually performed their review, proof of work rather than just a verdict, accountability becomes structural rather than aspirational.
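A sketch of that review structure, under stated assumptions: the reviewer names, the `Review` record, and the two toy checks (scanning for `eval(` and for over-long lines) are hypothetical stand-ins for real specialist audits. What the sketch preserves is the shape of the argument: independent reviewers, each required to return evidence, with acceptance gated on unanimity.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Review:
    specialist: str
    approved: bool
    evidence: list[str]   # concrete findings proving the review actually ran

def security_review(code: str) -> Review:
    # Toy stand-in for a real security audit: flag any use of eval().
    findings = [line for line in code.splitlines() if "eval(" in line]
    return Review("security", approved=not findings,
                  evidence=findings or ["scanned for eval(): none found"])

def standards_review(code: str) -> Review:
    # Toy stand-in for a conventions check: flag lines over 100 characters.
    long_lines = [line for line in code.splitlines() if len(line) > 100]
    return Review("standards", approved=not long_lines,
                  evidence=long_lines or ["all lines within 100 chars"])

def gate(code: str, reviewers: list[Callable[[str], Review]]) -> bool:
    """Accept only if every independent specialist approves, with evidence."""
    reviews = [review(code) for review in reviewers]
    if any(not rv.evidence for rv in reviews):
        raise ValueError("a verdict without evidence is not a review")
    return all(rv.approved for rv in reviews)
```

The design choice worth noticing is that `gate` rejects an empty evidence list outright: a bare "looks good" verdict is treated as a protocol violation, not a pass.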

This creates a level of quality assurance that would be exhausting for a human to perform on every iteration, delivered automatically and consistently. The cost of thoroughness drops to nearly zero, which means you can afford to be thorough every time, not only when the stakes feel high enough to justify the effort.

Footnotes

  1. P. Naur and B. Randell, Eds., “Software Engineering: Report on a conference sponsored by the NATO Science Committee,” Garmisch, Germany, 7–11 Oct. 1968. The conference is widely credited with coining the term “software engineering” to frame the discipline’s foundational crisis: that software development needed the rigor of established engineering practices to manage growing system complexity.

  2. W. Cunningham coined “technical debt” in 1992; see M. Fowler, “TechnicalDebt,” martinfowler.com. [Online]. Available: https://martinfowler.com/bliki/TechnicalDebt.html

  3. M. Fowler, Refactoring: Improving the Design of Existing Code. Reading, MA, USA: Addison-Wesley, 1999. Fowler popularized the term “code smell,” originally coined by Kent Beck, to describe surface-level indicators of deeper structural problems — giving engineers a shared vocabulary for intuitions they already had.

  4. E. W. Dijkstra, “Go to statement considered harmful,” Commun. ACM, vol. 11, no. 3, pp. 147–148, Mar. 1968. The letter, whose provocative title was chosen by editor Niklaus Wirth, became one of the most cited and debated publications in computing history.

  5. F. P. Brooks Jr., “No silver bullet: Essence and accident in software engineering,” Computer, vol. 20, no. 4, pp. 10–19, Apr. 1987. Brooks argued that no single technology or management technique would yield an order-of-magnitude improvement in software productivity, a position that has held for nearly four decades.

  6. A. Karpathy, “Verifiability,” Nov. 17, 2025. [Online]. Available: https://karpathy.bearblog.dev/verifiability/. Karpathy’s framework distinguishes Software 1.0 (hand-written programs, bounded by specifiability) from Software 2.0 (neural networks trained via gradient descent, bounded by verifiability). AI-assisted development via LLM prompting — what Karpathy elsewhere terms Software 3.0 — inherits and amplifies the verification constraint.