How to Design a Planner-Executor-Reviewer Agent Workflow

A useful AI-agent workflow is not just a bigger prompt. It is a system of roles, handoffs, checks, and stop conditions. The planner-executor-reviewer pattern is a practical way to split agent work into three responsibilities: deciding what should be done, doing the work, and deciding whether the result is good enough to ship.

This pattern is especially useful for developer-facing work: writing code, maintaining content sites, running research pipelines, updating documentation, or operating a small automation system. It does not make agents magically correct. It gives you a structure where mistakes are easier to find before they reach production.

AI-assisted disclosure: this guide was produced with AI assistance and reviewed against KnowToAct editorial gates for source quality, practical usefulness, and risk controls.

The core idea

A planner-executor-reviewer workflow separates strategy from implementation and quality control.

The planner defines scope, constraints, acceptance criteria, and the order of work.
The executor performs the work using tools, files, APIs, or code.
The reviewer checks the result against the original requirements and quality standards.
An optional acceptance gate decides whether to publish, deploy, merge, or escalate to a human.

Anthropic's guide to building effective agents distinguishes predictable workflows from more autonomous agents and recommends simple, composable patterns before adding complexity. That advice matters here: the planner-executor-reviewer pattern is not meant to be a giant swarm. It is a minimal structure for work that is too risky or too complex for a single unchecked agent call.

Workflow diagram

User goal
   |
   v
Planner agent
   |-- writes task brief
   |-- defines constraints
   |-- defines acceptance criteria
   v
Executor agent
   |-- performs implementation or research
   |-- records commands, files, sources, and uncertainty
   v
Reviewer agent
   |-- checks spec compliance
   |-- checks quality and risks
   |-- requests changes or approves
   v
Acceptance gate
   |-- runs deterministic checks
   |-- publishes, deploys, merges, or escalates

The important point is that each stage leaves an artifact. A written plan, diff, source list, test log, or review note is more reliable than a later agent's recollection of what happened in the conversation.
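The flow in the diagram can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: the `Plan`, `Result`, and `Review` types, the `run_pipeline` function, and the stub agents are all hypothetical names, and the stubs stand in for real model calls.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Plan:
    goal: str
    acceptance_criteria: list[str]


@dataclass
class Result:
    output: str
    notes: list[str] = field(default_factory=list)  # commands, sources, uncertainty


@dataclass
class Review:
    approved: bool
    issues: list[str] = field(default_factory=list)


def run_pipeline(
    goal: str,
    planner: Callable[[str], Plan],
    executor: Callable[[Plan], Result],
    reviewer: Callable[[Plan, Result], Review],
) -> dict:
    """Run each stage once and return every artifact it produced."""
    plan = planner(goal)
    result = executor(plan)
    review = reviewer(plan, result)
    return {"plan": plan, "result": result, "review": review}


# Stub agents standing in for real model calls (assumption for illustration).
def stub_planner(goal: str) -> Plan:
    return Plan(goal=goal, acceptance_criteria=["output mentions the goal"])


def stub_executor(plan: Plan) -> Result:
    return Result(output=f"Draft for: {plan.goal}", notes=["no sources consulted"])


def stub_reviewer(plan: Plan, result: Result) -> Review:
    ok = plan.goal in result.output
    return Review(approved=ok, issues=[] if ok else ["output does not mention the goal"])


artifacts = run_pipeline("update the docs", stub_planner, stub_executor, stub_reviewer)
print(artifacts["review"].approved)  # True
```

Returning every artifact, rather than only the final output, is the design choice that matters here: a later stage or a human can inspect the plan and the executor's notes without replaying the conversation.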

When this pattern is worth using

Use this workflow when the output has external consequences or when quality failures are costly. Good examples include:

publishing a technical article to a public website;
modifying production code;
creating a GitHub Actions workflow;
updating Cloudflare Pages deployment settings;
researching tools where hallucinated sources would damage trust;
generating docs that developers will follow step by step.

A single-agent workflow is often enough for quick brainstorming, throwaway scripts, or private notes. Adding more agents increases cost, latency, and coordination overhead. The pattern earns its keep only when the review and acceptance gates catch enough errors to justify that overhead.

Role boundaries

The pattern works best when each agent has a narrow job. If every agent can rewrite the goal, change the implementation, approve its own output, and deploy to production, the workflow is just a single agent with extra steps.

Role	Primary job	Should write down	Should not do
Planner	Convert a goal into a scoped task	Brief, constraints, dependencies, acceptance criteria	Make unreviewed production changes
Executor	Produce the artifact	Files changed, commands run, sources used, known uncertainty	Approve its own work
Reviewer	Compare the result with the brief	Spec gaps, quality issues, risk notes	Rewrite the goal without saying so
Acceptance gate	Decide publish/deploy/merge/escalate	Final checklist and verification output	Skip deterministic checks

OpenAI's Agents SDK documents concepts such as agents, tools, handoffs, guardrails, and tracing. Those concepts map well to this table: handoffs define how work moves between roles, guardrails constrain behavior, and tracing helps you inspect what happened when a multi-step workflow fails.
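One way to make the "Should not do" column enforceable is a per-role allowlist that is checked before any tool call. The sketch below is independent of any particular SDK. The role names follow the table, while the tool names and the `authorize` helper are hypothetical.

```python
# Hypothetical per-role tool allowlist; the tool names are illustrative only.
ALLOWED_TOOLS = {
    "planner": {"read_file", "write_brief"},
    "executor": {"read_file", "write_file", "run_tests"},
    "reviewer": {"read_file", "write_review"},
    "acceptance_gate": {"read_file", "run_checks", "deploy"},
}


def authorize(role: str, tool: str) -> bool:
    """Return True only if the role is explicitly allowed to use the tool."""
    return tool in ALLOWED_TOOLS.get(role, set())


print(authorize("executor", "run_tests"))      # True
print(authorize("executor", "deploy"))         # False: only the gate may deploy
print(authorize("unknown_role", "read_file"))  # False: unknown roles get nothing
```

Denying by default means a new role or a misspelled role name fails closed instead of silently inheriting permissions.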

Handoff artifacts

A planner-executor-reviewer workflow needs explicit handoffs. For a content article, the planner should not simply say "write a good post." It should produce a task contract.

task: upgrade-agent-workflow-article
owner: writer-agent
section: agents
objective: publish a practical guide to planner-executor-reviewer workflows
audience: developers building AI-agent automation systems
constraints:
  - avoid unsupported productivity claims
  - include at least five reliable sources
  - include human approval rules for risky actions
  - no affiliate recommendations in this article
acceptance_criteria:
  - 1500-2500 words
  - includes workflow diagram, role table, and checklist
  - passes content validation and build
  - reviewer confirms no hallucinated sources
escalation:
  - unclear legal, security, or production-deployment advice
  - missing or unreachable primary sources

For software work, the same idea becomes a different contract: files to modify, tests to run, expected failure mode, and rollback plan. For operations work, it may include API scopes, allowed commands, and a manual approval point.
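A task contract is only useful if the executor refuses to start without one. The following is a minimal sketch, assuming the contract has already been parsed into a Python dictionary (for example, from YAML). The required field list and the `validate_contract` helper are assumptions for illustration, not a fixed schema.

```python
# Fields this sketch treats as mandatory; adjust to your own contract format.
REQUIRED_FIELDS = ("task", "owner", "objective", "constraints",
                   "acceptance_criteria", "escalation")


def validate_contract(contract: dict) -> list[str]:
    """Return a list of problems; an empty list means the contract is usable."""
    return [f"missing or empty field: {name}"
            for name in REQUIRED_FIELDS
            if not contract.get(name)]


contract = {
    "task": "upgrade-agent-workflow-article",
    "owner": "writer-agent",
    "objective": "publish a practical guide to planner-executor-reviewer workflows",
    "constraints": ["avoid unsupported productivity claims"],
    "acceptance_criteria": [],  # deliberately empty to show a rejected contract
    "escalation": ["missing or unreachable primary sources"],
}

problems = validate_contract(contract)
print(problems)  # ['missing or empty field: acceptance_criteria']
```

An empty list of acceptance criteria is treated the same as a missing field, since a reviewer has nothing to check against in either case.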

Reviewer gates should combine AI review with deterministic checks

Reviewer agents are useful, but they are not a guarantee of correctness. They can miss bugs, over-trust the executor, or reinforce a flawed plan. Treat reviewer output as one signal, not the only control.

Stronger gates combine model review with external checks:

unit tests and integration tests;
link checks and source validation;
static analysis or schema validation;
GitHub Actions CI jobs;
Cloudflare Pages preview deployments;
human approval for high-risk changes.

GitHub Actions is useful because it turns acceptance criteria into repeatable workflow steps. A reviewer may say a change looks good, but CI can still catch a broken build. Cloudflare Pages preview deployments can provide a visible artifact for final inspection before a production release.
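The idea that reviewer approval is one signal among several can be sketched as a gate that accepts only when the reviewer approved and every deterministic check passed. The `acceptance_gate` function and the two example checks are hypothetical; in practice the checks would wrap real test runs, link checkers, or CI results.

```python
from typing import Callable


def acceptance_gate(reviewer_approved: bool,
                    checks: dict[str, Callable[[], bool]]) -> tuple[str, list[str]]:
    """Accept only if every deterministic check passes AND the reviewer approved."""
    reasons = [f"check failed: {name}" for name, check in checks.items() if not check()]
    if not reviewer_approved:
        reasons.append("reviewer did not approve")
    return ("accept", []) if not reasons else ("reject", reasons)


draft = "word " * 1200  # stand-in for an article body of 1200 words

checks = {
    "word_count": lambda: 1500 <= len(draft.split()) <= 2500,
    "has_disclosure": lambda: True,  # placeholder for a real content check
}

decision, reasons = acceptance_gate(reviewer_approved=True, checks=checks)
print(decision, reasons)  # reject ['check failed: word_count']
```

Here the reviewer approved, but the draft is still rejected because it falls short of the word-count criterion. The gate also collects every failure instead of stopping at the first, so one run reports the full list of problems.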

Example: publishing a technical article

A content workflow can use the pattern like this:

Planner writes the article brief, search intent, required sources, and quality gates.
Researcher or executor gathers official docs, standards, and credible engineering sources.
Writer produces the draft using only the brief and source notes.
Fact-checker verifies that claims are supported by sources.
Reviewer checks usefulness, SEO quality, anti-spam risks, and compliance disclosures.
Acceptance gate runs content validation, build, sitemap/RSS generation, and preview checks.
The article is published only after all gates pass.

This is slower than asking one model to write a post. It can also be safer for a site that wants long-term trust, because review gates make mistakes easier to catch before publication. A low-quality AI article can hurt both readers and search visibility; a source-backed article with a checklist, examples, and clear limitations is more likely to be useful.

Failure modes and fixes

Failure mode	What it looks like	Practical fix
Vague planning	Executor receives a broad goal with no acceptance criteria	Require a written task contract before execution
Reviewer rubber-stamp	Reviewer says "looks good" without checking sources or tests	Use a review checklist and require evidence
Excessive agency	Executor can deploy, delete, or spend money without approval	Apply least privilege and manual approval for risky actions
Context loss	Later agents do not know why a decision was made	Store decisions in files, issues, or task artifacts
Infinite iteration	Agents keep revising without a stop condition	Define max attempts and escalation rules
Hallucinated sources	Article cites nonexistent docs or unsupported claims	Require reachable URLs and source-to-claim mapping
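The fix for infinite iteration in the table above can be sketched as a bounded revise loop. This is a minimal sketch under stated assumptions: `revise_until_approved`, the stub executor, and the stub reviewer are hypothetical, and the reviewer is assumed to return a list of issues that is empty on approval.

```python
from typing import Callable


def revise_until_approved(execute: Callable[[list[str]], str],
                          review: Callable[[str], list[str]],
                          max_attempts: int = 3) -> dict:
    """Retry with reviewer feedback, then escalate instead of looping forever."""
    feedback: list[str] = []
    for attempt in range(1, max_attempts + 1):
        draft = execute(feedback)
        issues = review(draft)
        if not issues:
            return {"status": "approved", "attempts": attempt, "draft": draft}
        feedback = issues  # the next attempt sees what the reviewer flagged
    return {"status": "escalate", "attempts": max_attempts, "issues": feedback}


# Stub agents standing in for real model calls (assumption for illustration).
def stub_execute(feedback: list[str]) -> str:
    return "draft with sources" if feedback else "draft"


def stub_review(draft: str) -> list[str]:
    return [] if "sources" in draft else ["add sources"]


outcome = revise_until_approved(stub_execute, stub_review)
print(outcome["status"], outcome["attempts"])  # approved 2
```

When the attempt budget runs out, the function returns the unresolved issues along with an `escalate` status, so a human receives the reviewer's last objections instead of a bare failure.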

OWASP's LLM application guidance highlights risks such as prompt injection, sensitive information disclosure, insecure output handling, and excessive agency. Those risks become more serious when agents can use tools. The fix is not to avoid automation entirely; it is to limit permissions, validate inputs and outputs, log actions, and require human approval for high-impact steps.

Human approval rules

Do not let a reviewer agent be the only approval layer for irreversible actions. Require human approval when a task involves:

production deployments;
financial transactions;
legal, medical, tax, or compliance advice;
security-sensitive code or credentials;
deleting data;
contacting users or customers;
publishing claims about people, companies, or products;
changing DNS, billing, or account permissions.
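These rules are easier to enforce when they are written as data and not left to an agent's judgment. The sketch below assumes each proposed action has already been tagged with one or more categories. The category labels and the `needs_human_approval` helper are hypothetical shorthand for the list above.

```python
# Short labels for the categories listed above; the names are hypothetical.
HIGH_RISK = frozenset({
    "production_deploy", "financial_transaction", "regulated_advice",
    "security_sensitive", "data_deletion", "customer_contact",
    "public_claims", "account_or_dns_change",
})


def needs_human_approval(categories: set[str]) -> bool:
    """Return True if any category of the action is on the high-risk list."""
    return not HIGH_RISK.isdisjoint(categories)


print(needs_human_approval({"docs_update"}))                   # False
print(needs_human_approval({"docs_update", "data_deletion"}))  # True
```

A single high-risk tag is enough to require approval, so a mostly routine task that also deletes data is not waved through.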

The NIST AI Risk Management Framework emphasizes governance, measurement, and managing AI risks in context. A small developer project does not need enterprise bureaucracy, but it still benefits from clear roles, logs, and escalation rules.

What this pattern does not solve

This pattern does not eliminate hallucinations, security risks, unclear requirements, or bad judgment. It only creates clearer places to catch and correct them. If the original goal is wrong, if the sources are weak, or if the tools have excessive permissions, adding more agents can make the failure harder to debug. Treat the pattern as a control system, not a correctness guarantee.

Implementation notes for Hermes-style teams

In a Hermes-based workflow, you can represent these roles with profiles, prompts, skills, or explicit task instructions. A practical starting team might be:

a planner for scope, architecture, and acceptance criteria;
a researcher for source collection and notes;
a writer or executor for drafts and implementation work;
a fact-checker for claim verification;
a reviewer for quality and risk review;
an acceptance gate for final checks and release decisions.

Those role names are examples, not required components or endorsements. The important design choice is that planning, execution, review, and final acceptance are not all performed by the same unchecked actor.

Start with one workflow and one repository. Store stable project facts in the repo, not only in chat memory. Put repeatable procedures into checklists or scripts. Use CI as the final gate whenever possible.

Publication readiness checklist

Before publishing an AI-assisted technical artifact, check:

The planner wrote a task contract with acceptance criteria.
The executor recorded files changed, commands run, sources used, and uncertainty.
The reviewer checked both spec compliance and quality.
Factual claims have reliable sources or are framed as recommendations.
No agent approved its own work as the final authority.
Tests, build, source checks, or link checks passed.
Risky actions have a human approval path.
The output includes disclosure when AI assistance materially contributed.
The system has a rollback or correction path.

Start small

The biggest mistake is building a complicated swarm before the basic loop works. Begin with one planner, one executor, one reviewer, and one acceptance checklist. Run the process on a real task. Record what failed. Tighten the handoffs. Only then add specialized agents, cron jobs, memory layers, or analytics-driven planning.

A planner-executor-reviewer workflow is not about making agents look organized. It is about making the work inspectable. When every stage leaves evidence, you can improve the system instead of guessing why the last run went wrong.

Sources

Anthropic Engineering: Building Effective Agents: https://www.anthropic.com/engineering/building-effective-agents
Hermes Agent Documentation: https://hermes-agent.nousresearch.com/docs
OpenAI Agents SDK Documentation: https://openai.github.io/openai-agents-python/
GitHub Actions Documentation: https://docs.github.com/en/actions
Cloudflare Pages Git Integration: https://developers.cloudflare.com/pages/configuration/git-integration/
NIST AI Risk Management Framework: https://www.nist.gov/itl/ai-risk-management-framework
OWASP Top 10 for LLM Applications: https://genai.owasp.org/llm-top-10/
