Pipelines¶
Every Developer command runs a pipeline. A pipeline is a sequence of stages connected by routing functions. The pattern is always the same: call the agent, look at what it returned, decide what to do next.
Stages¶
A stage is a single agent invocation. It's defined as a Stage dataclass with these fields:
| Field | What it does |
|---|---|
| `template` | Jinja2 template name for the prompt |
| `session` | Session strategy: `new`, `resume:<stage>`, or `fork:<stage>` |
| `output` | Pydantic model for structured output (or `None` for free-form) |
| `tools` | List of tools the agent can use (Read, Glob, Grep, Bash, etc.) |
| `max_turns` | Maximum number of turns the agent gets |
| `route` | Function that takes the context dict and returns the next stage name (or `None` to stop) |
| `pre` | Optional hook that runs before the agent call |
| `post` | Optional hook that runs after the agent call |
| `disallowed_tools` | Tools to explicitly block |
Each pipeline defines its stages as a Python dict in its command file. The shared `run_stage()` and `run_pipeline()` functions in `commands/common.py` handle execution.
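As a rough sketch of what that looks like, here is a minimal `Stage` dataclass built from the fields in the table above, plus a two-stage dict. The defaults, template names, stage names, and routing logic are illustrative assumptions, not the actual definitions:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

# Minimal stand-in for the real Stage dataclass; field names follow the
# table above, but defaults and exact types are assumptions.
@dataclass
class Stage:
    template: str
    session: str = "new"
    output: Optional[type] = None
    tools: list[str] = field(default_factory=list)
    max_turns: int = 10
    route: Optional[Callable[[dict[str, Any]], Optional[str]]] = None
    pre: Optional[Callable[[dict[str, Any]], None]] = None
    post: Optional[Callable[[dict[str, Any]], None]] = None
    disallowed_tools: list[str] = field(default_factory=list)

def route_after_evaluate(ctx: dict[str, Any]) -> Optional[str]:
    # Route on the agent's structured output: stop when approved,
    # otherwise loop back for another implementation pass.
    return None if ctx.get("approved") else "implement"

# Hypothetical stages dict for a command file.
STAGES = {
    "implement": Stage(
        template="implement.md.j2",
        session="new",
        tools=["Read", "Glob", "Grep", "Bash"],
        max_turns=30,
        route=lambda ctx: "evaluate",
    ),
    "evaluate": Stage(
        template="evaluate.md.j2",
        session="resume:implement",  # keeps the implementation context
        max_turns=10,
        route=route_after_evaluate,
    ),
}
```

Routing lives in plain Python functions, so a pipeline's control flow is readable in one place rather than buried in prompts.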
How a stage runs¶
1. Pre-hook runs if defined (inject extra context, set up files)
2. Template is rendered with the current context dict
3. Agent is called with the rendered prompt, allowed tools, and output model
4. Agent's structured output gets merged back into the context
5. Post-hook runs if defined (capture diffs, run quality tools, commit code)
6. Routing function decides the next stage
Sessions¶
Stages can share conversational context through sessions.
- `new` -- fresh conversation, no memory of prior stages
- `resume:<stage>` -- picks up where the named stage left off, full context preserved
- `fork:<stage>` -- branches from the named stage's conversation without mutating it
Most pipelines use `new` for the first stage, then `resume` for follow-ups. This means the evaluation stage can reference things the implementation stage saw, without re-reading the entire codebase.
If a `resume` target is missing (the referenced stage didn't run), `run_stage` logs a warning and falls back to a fresh session.
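That fallback can be sketched as a small resolver. The helper name, the shape of the `sessions` map, and the return convention are all assumptions for illustration:

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)

def resolve_session(strategy: str, sessions: dict[str, str]) -> Optional[str]:
    """Map a stage's session strategy to a prior session id, or None for fresh."""
    if strategy == "new":
        return None  # no prior session: start fresh
    mode, _, target = strategy.partition(":")  # "resume:plan" -> ("resume", "plan")
    session_id = sessions.get(target)
    if session_id is None:
        # Referenced stage never ran: warn and fall back to a fresh session.
        logger.warning("stage %r has no recorded session; starting fresh", target)
        return None
    # "resume" continues the conversation in place; "fork" would branch
    # from it without mutating the original (provider-specific).
    return session_id
```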
The safety boundary¶
This is the part that matters. The agent can read code, write files, and run shell commands. But it cannot:
- Run `git add`, `git commit`, `git push`, or any destructive git command
- Create pull requests or issues
- Post comments on GitHub
All of that happens in Python post-hooks and pipeline glue code. The agent returns structured data (a PR title, a commit message, a list of findings). Python takes that data and does the actual operations.
The boundary is enforced at the provider level. The Claude provider has a regex that blocks destructive git commands before they reach the shell.
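The actual pattern in the Claude provider is not reproduced here, but a guard of that shape might look like the following sketch (the command list and function name are assumptions):

```python
import re

# Illustrative provider-level guard: reject shell commands containing
# destructive git subcommands before they ever reach the shell.
DESTRUCTIVE_GIT = re.compile(
    r"\bgit\s+(add|commit|push|reset|rebase|clean)\b"
)

def is_blocked(command: str) -> bool:
    # search(), not match(): the git call may appear anywhere in a
    # compound command like "cd repo && git push".
    return bool(DESTRUCTIVE_GIT.search(command))
```

Checking the raw command string keeps the rule in one place, independent of which prompt or stage produced the command.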
Quality gates¶
The resolve and address pipelines run quality tools after implementation. If they fail, the agent gets up to three attempts to fix the issues. After three failed attempts, the resolve pipeline opens a draft PR noting the remaining problems. The address pipeline pushes what it has and notes the failures in its response comment.
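The retry loop described above can be sketched as follows; `run_quality_tools` and `run_fix_stage` are hypothetical stand-ins, and what happens after the final failure (draft PR vs. push-with-note) is left to the caller:

```python
from typing import Any, Callable

def quality_gate(
    ctx: dict[str, Any],
    run_quality_tools: Callable[[dict[str, Any]], list[str]],
    run_fix_stage: Callable[[dict[str, Any]], None],
    max_attempts: int = 3,
) -> bool:
    """Run quality tools; give the agent up to max_attempts fix passes."""
    failures = run_quality_tools(ctx)
    attempts = 0
    while failures and attempts < max_attempts:
        ctx["quality_failures"] = failures  # surfaced to the fix prompt
        run_fix_stage(ctx)
        attempts += 1
        failures = run_quality_tools(ctx)
    # True if clean; False means the pipeline ships with a caveat
    # (draft PR or a noted failure, depending on the pipeline).
    return not failures
```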
Metrics¶
Every stage records timing, turn count, tool calls, and token usage. These get aggregated into a PipelineMetrics object and appended to PR bodies and comments as a collapsible <details> block. Good for understanding cost and spotting stages that take longer than expected.
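Rendering that block is plain string work. The per-stage field names below are assumptions based on what each stage records (timing, turns, tool calls, tokens), not the real `PipelineMetrics` schema:

```python
from typing import Any

def metrics_details(stages: list[dict[str, Any]]) -> str:
    """Render aggregated stage metrics as a collapsible <details> block."""
    rows = "\n".join(
        f"| {s['name']} | {s['seconds']:.1f}s | {s['turns']} "
        f"| {s['tool_calls']} | {s['tokens']} |"
        for s in stages
    )
    return (
        "<details>\n<summary>Pipeline metrics</summary>\n\n"
        "| Stage | Time | Turns | Tool calls | Tokens |\n"
        "|---|---|---|---|---|\n"
        f"{rows}\n\n</details>"
    )
```

Appending it to PR bodies keeps the numbers visible without cluttering the review.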