
The Agent Team

Meet the specialized AI agents that prepare, validate, and enforce quality in your WSD workflow.

The Team Metaphor

When you run a workscope cycle, you interact with one AI assistant — the User Agent. But the User Agent is not working alone. Behind the scenes, it coordinates with a team of Special Agents, each one a domain expert responsible for a specific aspect of the workflow. The User Agent is the generalist executor; the Special Agents are the specialists it consults.

This is a hub-and-spoke model. The User Agent sits at the center, relaying information between you, the developer, and the various Special Agents. Special Agents cannot talk to each other directly; they work through the User Agent as their communication channel. When the Context-Librarian identifies relevant documentation, it tells the User Agent, which absorbs that context. When the Rule-Enforcer finds a standards violation, it tells the User Agent, which must fix the problem or escalate to you.
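The hub-and-spoke relay can be pictured as a thin dispatch layer. The sketch below is illustrative only: the agent names match this guide, but the `consult` function, the `UserAgent` class, and the message shapes are hypothetical stand-ins, not WSD's actual implementation.

```python
# Hypothetical sketch of the hub-and-spoke relay: Special Agents never
# call each other; every message passes through the User Agent (the hub).

def consult(agent_name: str, request: dict) -> dict:
    """Stand-in for invoking a Special Agent. Each call is a fresh,
    memoryless invocation (agents are ephemeral)."""
    # A real system would spawn a new agent session here.
    return {"agent": agent_name, "findings": f"{agent_name} reviewed {request['topic']}"}

class UserAgent:
    """The hub: relays between the developer and the Special Agents."""

    def gather_context(self, workscope: str) -> list[dict]:
        reports = []
        # The User Agent fans out to each specialist in turn...
        for specialist in ("Context-Librarian", "Codebase-Surveyor"):
            reports.append(consult(specialist, {"topic": workscope}))
        # ...and absorbs their findings; specialists never see each other.
        return reports

hub = UserAgent()
reports = hub.gather_context("caching feature")
assert all(r["agent"] in ("Context-Librarian", "Codebase-Surveyor") for r in reports)
```

The design point the sketch captures is that the specialists have no channel to one another: the hub is the only integration point, which keeps each specialist's role narrow.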

Every agent in this system is ephemeral. The User Agent you work with in one session is retired when that session ends. The next session brings a fresh User Agent with no memory of the previous one — it inherits context through the documentation system, not through personal recollection. Special Agents are even more transient: they start fresh on every single invocation, even within the same session. The Context-Librarian that curates your context during preparation has no memory of doing so when it is consulted again during closure.

This ephemerality is useful. Fresh agents don’t carry biases from earlier work or rationalize past decisions. Each invocation brings clean judgment applied to the current state of the project. The continuity that matters — what was done, what was decided, what remains — lives in the documentation, not in the agents.

The Preparation Agents

Before the User Agent begins executing your workscope, three Special Agents prepare it for the work ahead. This happens during /wsd:prepare, and the result is an AI assistant that enters execution with curated, relevant context rather than a noisy dump of everything in the project.

The Context-Librarian is the documentation expert. It knows the landscape of your project’s documentation — the core specifications, feature overviews, tickets, workbench artifacts, and reference materials. Given a workscope assignment, the Context-Librarian identifies exactly which documents are relevant and provides a prioritized list of files for the User Agent to read. A workscope about implementing a caching feature gets the caching specification, the relevant design decisions, and any open tickets related to caching… not the authentication docs, not the deployment guide, not the full history of every feature ever built.

The Codebase-Surveyor does the same thing for source code. It identifies which code files are relevant to the current workscope without explaining their implementation. For a caching workscope, it might surface the configuration module, the existing storage layer, and the relevant test files. For a documentation-only workscope, it might provide immediate sign-off if there is no code to survey. The Codebase-Surveyor complements the Context-Librarian: one handles documents, the other handles code.

The Project-Bootstrapper educates the User Agent about your project’s rules and conventions before work begins. It reviews the standards, the design decisions, and the behavioral expectations that govern how agents should work on your project. This pre-education dramatically reduces the chance that the User Agent violates a rule: it defines what success looks like and makes the consequences of violation explicit, so problems are prevented up front rather than surfacing expensively during quality assurance.

Together, these three agents transform a blank-slate AI assistant into one that understands what documents matter, what code is relevant, and what rules apply, all before a single line of work is performed. The principle behind this preparation is context engineering: the quality of AI output depends on the quality of the context you provide, not the quantity.[1][2] Dumping everything into a session produces noise; providing nothing produces hallucination. The preparation agents solve this by curating precisely what is relevant and excluding what is not. The Context Engineering guide explores this principle in depth, including how it works, how WSD implements it, and how you can apply it beyond WSD’s automation.
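The three preparation roles compose naturally into a single briefing step. This is a minimal sketch under stated assumptions: the function names mirror the agents described above, but their signatures and the example file paths and rules are invented for illustration.

```python
# Hypothetical sketch of the prepare phase: three specialists each
# contribute one slice of a curated context, which the User Agent merges.

def context_librarian(workscope: str) -> list[str]:
    # Returns a prioritized list of relevant documents (illustrative data).
    return ["specs/caching.md", "tickets/cache-invalidation.md"]

def codebase_surveyor(workscope: str) -> list[str]:
    # Returns relevant source files, or an empty list (immediate sign-off
    # for documentation-only workscopes).
    return ["src/config.py", "src/storage.py", "tests/test_storage.py"]

def project_bootstrapper() -> list[str]:
    # Returns the rules the work will be judged against.
    return ["All public functions require tests",
            "No direct DB access from handlers"]

def prepare(workscope: str) -> dict:
    """Assemble a curated briefing: relevance over volume."""
    return {
        "documents": context_librarian(workscope),
        "code": codebase_surveyor(workscope),
        "rules": project_bootstrapper(),
    }

briefing = prepare("implement caching")
assert briefing["rules"]  # the agent knows the quality bar before work begins
```

Note what the briefing excludes: everything not returned by a specialist simply never enters the session, which is the curation half of context engineering.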

The QA Gauntlet

After the User Agent completes its assigned work, it does not simply hand the results to you. The work must first pass through a gauntlet of Special Agents, each independently reviewing it from a different perspective.

The Documentation-Steward verifies that the implementation matches its specifications. If a feature overview says the function should accept three parameters, but the implementation accepts two, the Documentation-Steward catches the discrepancy. It compares code against specs and flags any drift. Specifications in WSD are not aspirational documents that gradually become outdated. They are the source of truth, and the code must match them.

The Rule-Enforcer audits the completed work against the project’s established standards. It checks coding conventions, architectural principles, and the behavioral rules that govern how the project should be built. Where the Documentation-Steward asks “does this match the spec?”, the Rule-Enforcer asks “does this follow the rules?”

The Test-Guardian runs the test suite and verifies coverage. It confirms that new code has appropriate tests, that existing tests still pass, and that no regressions have been introduced. It provides the actual test runner output as evidence — not a summary or a claim, but the real results.

The Health-Inspector runs a comprehensive code quality check covering build integrity, type safety, security analysis, dependency auditing, documentation quality, code quality, and formatting. It provides the complete health check summary table as evidence of its review.

Each of these agents can reject the work. When a rejection happens, the User Agent must address the problem — fix the code, update the spec, add the missing tests — and resubmit. This creates an adversarial loop where quality issues are caught and resolved before the results ever reach you. The QA gauntlet is a real quality gate that the User Agent must pass.
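The fix-and-resubmit loop described above can be sketched as a simple control flow. Everything here is a hypothetical illustration: the reviewer names come from this guide, but the `review` function, the `failing` bookkeeping, and the round limit are invented to show the shape of the loop, not WSD's mechanics.

```python
# Hypothetical sketch of the QA gauntlet: each reviewer independently
# approves or rejects; any rejection sends the work back for fixes.

REVIEWERS = ["Documentation-Steward", "Rule-Enforcer",
             "Test-Guardian", "Health-Inspector"]

def review(reviewer: str, work: dict) -> dict:
    # Stand-in verdict: reject work that still carries a known issue
    # in this reviewer's domain.
    rejected = reviewer in work["failing"]
    return {"reviewer": reviewer, "approved": not rejected}

def run_gauntlet(work: dict, max_rounds: int = 3) -> bool:
    for _ in range(max_rounds):
        verdicts = [review(r, work) for r in REVIEWERS]
        rejections = [v for v in verdicts if not v["approved"]]
        if not rejections:
            return True  # all four gates passed
        # The User Agent must address each rejection and resubmit.
        for v in rejections:
            work["failing"].discard(v["reviewer"])
    return False  # unresolved after repeated rounds: escalate

work = {"failing": {"Test-Guardian"}}  # one outstanding problem
assert run_gauntlet(work) is True      # fixed on resubmission
```

The essential property is that approval is conjunctive: one rejection from any reviewer blocks delivery, no matter how many others approved.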

Veto Power

Two agents in the QA gauntlet (the Documentation-Steward and the Rule-Enforcer) hold a special authority: veto power. When either of these agents rejects work, the User Agent cannot override the decision. It cannot dismiss the objection or rationalize its way around it.

The User Agent has exactly two options when facing a veto: fix the problem, or escalate to you. This is a deliberate constraint. Without veto power, the enforcement model degrades into a suggestion system, the same kind that makes rules files ineffective. The AI acknowledges the feedback, weighs it against its own judgment, and sometimes decides to proceed anyway. Veto power removes that possibility.
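The two-option constraint can be made concrete with a small decision function. This is a sketch under assumptions: the veto-holding agent names are from this guide, but the outcome labels and the non-veto branch are hypothetical illustrations of the contrast, not WSD's actual dispatch logic.

```python
# Hypothetical sketch of veto handling: for the two veto-holding agents,
# the User Agent has exactly two moves: fix, or escalate to the human.

VETO_AGENTS = {"Documentation-Steward", "Rule-Enforcer"}

def handle_rejection(agent: str, can_fix: bool) -> str:
    if agent in VETO_AGENTS:
        # No third option exists: the User Agent cannot override a veto.
        return "fix" if can_fix else "escalate_to_developer"
    # Feedback from other agents is still addressed, but without
    # the hard no-override guarantee.
    return "fix" if can_fix else "resubmit_with_justification"

assert handle_rejection("Rule-Enforcer", can_fix=False) == "escalate_to_developer"
assert handle_rejection("Documentation-Steward", can_fix=True) == "fix"
```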

This means that when you receive work from a workscope, you know it has survived scrutiny from agents that had the authority to stop it. The Documentation-Steward confirmed that the code matches its specifications. The Rule-Enforcer confirmed that the project’s standards were followed. These are gates the work had to pass through.

You, as the developer, remain the ultimate authority. If you disagree with a veto, you can override it. But the system ensures that overriding requires a conscious, deliberate decision from a human — not a quiet judgment call made by the AI in the middle of a long session.

Proof of Work

Trust in the QA gauntlet depends on knowing that the agents actually performed their reviews. An agent that claims “all tests pass” without running the test suite provides no value — it is an attestation without evidence.

WSD addresses this through a proof-of-work requirement. Agents must include actual tool output in their reports, not just verdicts. The Test-Guardian must show the real test runner summary — the line that says something like “140 passed in 0.23s” — proving it executed the test suite. The Health-Inspector must show the complete health check summary table with pass/fail status for each quality dimension. If an agent provides an approval without this evidence, the User Agent is required to reject the approval and demand that the agent run its checks and produce the actual output.
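A proof-of-work gate is mechanically simple: look for evidence of real tool output before accepting a verdict. The sketch below assumes a pytest-style summary line as the evidence format; the `accept_approval` function and the exact pattern are illustrative, not WSD's actual check.

```python
import re

# Hypothetical sketch of the proof-of-work check: an approval counts
# only if the report contains real tool output, e.g. a pytest-style
# summary line such as "140 passed in 0.23s".

SUMMARY = re.compile(r"\d+ passed in \d+(\.\d+)?s")

def accept_approval(report: str) -> bool:
    """Reject any approval that lacks evidence of an actual test run."""
    return bool(SUMMARY.search(report))

assert accept_approval("All checks green. 140 passed in 0.23s") is True
assert accept_approval("Everything looks fine, approved!") is False  # no evidence
```

A generic "looks good" verdict fails this gate by construction, which is exactly the failure mode the requirement targets.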

This is a system of watching the watchers. The Special Agents review the User Agent’s work. The User Agent verifies that the Special Agents actually did their jobs. And you, the developer, can see the evidence in the work journal, which records the full exchange. At no point in the chain does anyone have to take a claim on faith.

The proof-of-work system exists because AI agents, like any system under pressure to produce results, can take shortcuts. They can claim to have run tools without running them.[3] They can provide generic approvals that sound thorough but reflect no actual analysis.[4]

Requiring concrete evidence of real tool output and real test results makes these shortcuts visible and rejectable.

The Enforcement Model

WSD’s approach to ensuring quality operates on three layers that work together to create genuine accountability.

The first layer is pre-loading expectations. Before the User Agent begins work, the Project-Bootstrapper educates it about the project’s rules, conventions, and standards. The Bootstrapper actively briefs the agent on what it will be judged against, what common mistakes to avoid, and what the quality bar looks like. The second layer is post-execution compliance checking. After the work is done, the Special Agents confirm results and compel alignment.

The third layer is the threat of rejection itself. This is subtler but significant: telling an AI agent upfront that its work will be scrutinized by domain experts who can reject it observably improves adherence to standards.[5] The agent knows that cutting corners or ignoring a rule will not slide by unnoticed — there are specific agents whose job is to catch exactly those lapses. This anticipatory effect amplifies the impact of the other two layers.

These three layers complement each other. Pre-loading without enforcement is just education — helpful but not binding. Enforcement without pre-loading catches violations after the fact, which is more expensive to fix. The threat of rejection without actual enforcement is an empty bluff. Together, they create a system where rules are understood before work begins, verified after work completes, and taken seriously throughout because the consequences are real.

The practical result is that the rules you define for your project — your coding standards, your architectural decisions, your conventions — function as genuine constraints rather than hopeful suggestions. When you add a new rule, it is not a note that the AI might read and might follow. It is a requirement that will be taught, checked, and enforced.

But enforcement is only half the story. Equally important is how the rules themselves are organized, because the prevailing industry practice of cramming everything into a single rules file undermines even the best enforcement mechanism. The next guide covers WSD’s distributed approach to this problem: five specialized locations for different types of guidance, each with its own purpose, lifecycle, and relevance scope.

Footnotes

  1. Anthropic, “Effective context engineering for AI agents,” Anthropic Engineering Blog, Sep. 2025. [Online]. Available: https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents

  2. K. Hong, A. Troynikov, and J. Huber, “Context Rot: How Increasing Input Tokens Impacts LLM Performance,” Chroma Research, Jul. 2025. [Online]. Available: https://research.trychroma.com/context-rot

  3. R. C. Gray, “Claude Code Phantom Reads Bug Investigation,” GitHub, Jan. 2026. [Online]. Available: https://github.com/rcgray/claude-code-bug-phantom-reads-17407

  4. M. Sharma et al., “Towards Understanding Sycophancy in Language Models,” arXiv:2310.13548, Oct. 2023. [Online]. Available: https://arxiv.org/abs/2310.13548

  5. The general principle that evaluation awareness improves model behavior is supported by Y. Bai et al., “Constitutional AI: Harmlessness from AI Feedback,” arXiv:2212.08073, Dec. 2022. The specific application to pre-announced adversarial review in agentic coding workflows is based on WSD’s own observations and remains a candidate for formal study.