Pipelines

Every Developer command runs a pipeline. A pipeline is a sequence of stages connected by routing functions. The pattern is always the same: call the agent, look at what it returned, decide what to do next.

Stages

A stage is a single agent invocation. It's defined as a Stage dataclass with these fields:

  • template -- Jinja2 template name for the prompt
  • session -- session strategy: new, resume:<stage>, or fork:<stage>
  • output -- Pydantic model for structured output, or None for free-form
  • tools -- list of tools the agent can use (Read, Glob, Grep, Bash, etc.)
  • max_turns -- how many turns the agent gets
  • route -- function that takes the context dict and returns the next stage name, or None to stop
  • pre -- optional hook that runs before the agent call
  • post -- optional hook that runs after the agent call
  • disallowed_tools -- tools to explicitly block

Each pipeline defines its stages as a Python dict in its command file. The shared run_stage() and run_pipeline() functions in commands/common.py handle execution.
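A minimal sketch of what such a dataclass and stage dict might look like. The field names come from the table above; the Stage definition itself, the template names, and the two-stage layout are illustrative assumptions, not the actual code in commands/common.py:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Stage:
    template: str                                   # Jinja2 template name for the prompt
    session: str = "new"                            # "new", "resume:<stage>", or "fork:<stage>"
    output: Optional[type] = None                   # Pydantic model, or None for free-form
    tools: list[str] = field(default_factory=list)  # tools the agent can use
    max_turns: int = 10                             # how many turns the agent gets
    route: Callable[[dict], Optional[str]] = lambda ctx: None
    pre: Optional[Callable[[dict], None]] = None    # runs before the agent call
    post: Optional[Callable[[dict], None]] = None   # runs after the agent call
    disallowed_tools: list[str] = field(default_factory=list)

# A hypothetical two-stage pipeline: implement, then evaluate in the
# same conversation.
STAGES = {
    "implement": Stage(
        template="implement.j2",
        tools=["Read", "Glob", "Grep", "Bash"],
        route=lambda ctx: "evaluate",
    ),
    "evaluate": Stage(
        template="evaluate.j2",
        session="resume:implement",
        route=lambda ctx: None,  # None stops the pipeline
    ),
}
```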

How a stage runs

  1. Pre-hook runs if defined (inject extra context, set up files)
  2. Template is rendered with the current context dict
  3. Agent is called with the rendered prompt, allowed tools, and output model
  4. Agent's structured output gets merged back into the context
  5. Post-hook runs if defined (capture diffs, run quality tools, commit code)
  6. Routing function decides the next stage
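The six steps above amount to a small driver loop. A sketch under stated assumptions: render_template and call_agent here are stand-ins for the real Jinja2 rendering and provider call, and the stage object is assumed to carry the fields from the table above:

```python
def render_template(template: str, context: dict) -> str:
    # Stand-in for real Jinja2 rendering of the named template.
    return f"{template} with {context}"

def call_agent(prompt: str, tools: list, output, max_turns: int) -> dict:
    # Stand-in for the real provider call; returns structured output as a dict.
    return {"last_prompt": prompt}

def run_stage(stage, context: dict) -> dict:
    """Run one stage: hooks, render, call, merge."""
    if stage.pre:
        stage.pre(context)                              # 1. pre-hook
    prompt = render_template(stage.template, context)   # 2. render the template
    result = call_agent(prompt, tools=stage.tools,      # 3. call the agent
                        output=stage.output,
                        max_turns=stage.max_turns)
    context.update(result)                              # 4. merge structured output
    if stage.post:
        stage.post(context)                             # 5. post-hook
    return context

def run_pipeline(stages: dict, first: str, context: dict) -> dict:
    name = first
    while name is not None:
        context = run_stage(stages[name], context)
        name = stages[name].route(context)              # 6. route to the next stage
    return context
```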

Sessions

Stages can share conversational context through sessions.

  • new -- fresh conversation, no memory of prior stages
  • resume:<stage> -- picks up where the named stage left off, full context preserved
  • fork:<stage> -- branches from the named stage's conversation without mutating it

Most pipelines use new for the first stage, then resume for follow-ups. This means the evaluation stage can reference things the implementation stage saw, without re-reading the entire codebase.

If a resume target is missing (the referenced stage didn't run), run_stage logs a warning and falls back to a fresh session.
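The strategy strings and the missing-target fallback can be sketched as a small resolver. The function name and the sessions mapping (stage name to the session ID it produced) are assumptions for illustration:

```python
import logging

log = logging.getLogger(__name__)

def resolve_session(strategy: str, sessions: dict) -> tuple:
    """Map a session strategy string to (session_id, fork_flag).

    Returns (None, False) for a fresh conversation with no prior context.
    """
    if strategy == "new":
        return None, False
    mode, _, target = strategy.partition(":")   # "resume:plan" -> ("resume", "plan")
    session_id = sessions.get(target)
    if session_id is None:
        # The referenced stage never ran: warn and fall back to a fresh session.
        log.warning("session target %r missing, starting fresh", target)
        return None, False
    return session_id, mode == "fork"           # fork branches without mutating
```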

The safety boundary

This is the part that matters. The agent can read code, write files, and run shell commands. But it cannot:

  • Run git add, git commit, git push, or any destructive git command
  • Create pull requests or issues
  • Post comments on GitHub

All of that happens in Python post-hooks and pipeline glue code. The agent returns structured data (a PR title, a commit message, a list of findings). Python takes that data and does the actual operations.

The boundary is enforced at the provider level. The Claude provider has a regex that blocks destructive git commands before they reach the shell.
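A sketch of what such a pre-shell guard might look like. The pattern and function below are illustrative, not the provider's actual regex:

```python
import re

# Illustrative pattern: match git subcommands that mutate local history
# or remote state.
DESTRUCTIVE_GIT = re.compile(
    r"\bgit\s+(add|commit|push|reset|rebase|clean)\b"
)

def check_bash_command(command: str) -> None:
    """Reject a destructive git command before it reaches the shell."""
    if DESTRUCTIVE_GIT.search(command):
        raise PermissionError(f"blocked destructive git command: {command!r}")
```

Read-only commands like git status or git log pass through; anything matching the blocklist raises before execution.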

Quality gates

The resolve and address pipelines run quality tools after implementation. If they fail, the agent gets up to three attempts to fix the issues. After three failed attempts, the resolve pipeline opens a draft PR noting the remaining problems. The address pipeline pushes what it has and notes the failures in its response comment.
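The retry behavior can be sketched as a bounded loop. The helper names here are assumptions; callers decide what to do with the remaining failures (resolve opens a draft PR, address notes them in its comment):

```python
MAX_FIX_ATTEMPTS = 3

def enforce_quality_gate(run_tools, ask_agent_to_fix) -> tuple:
    """Run quality tools, giving the agent up to three attempts at fixes.

    run_tools() returns a list of failures (empty means passing);
    ask_agent_to_fix(failures) gives the agent one more pass at the issues.
    Returns (passed, remaining_failures).
    """
    failures = run_tools()
    attempts = 0
    while failures and attempts < MAX_FIX_ATTEMPTS:
        ask_agent_to_fix(failures)   # agent attempts a fix
        failures = run_tools()       # re-check after the fix
        attempts += 1
    return (not failures, failures)
```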

Metrics

Every stage records timing, turn count, tool calls, and token usage. These get aggregated into a PipelineMetrics object and appended to PR bodies and comments as a collapsible <details> block. Good for understanding cost and spotting stages that take longer than expected.
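A sketch of how per-stage metrics might be collected and rendered into that collapsible block. The field names and layout are assumptions, not the actual PipelineMetrics definition:

```python
from dataclasses import dataclass

@dataclass
class StageMetrics:
    name: str
    seconds: float
    turns: int
    tool_calls: int
    tokens: int

def render_metrics(stages: list) -> str:
    """Render stage metrics as a <details> block for a PR body or comment."""
    rows = "\n".join(
        f"| {s.name} | {s.seconds:.1f}s | {s.turns} | {s.tool_calls} | {s.tokens} |"
        for s in stages
    )
    total = sum(s.tokens for s in stages)
    return (
        f"<details><summary>Pipeline metrics ({total} tokens)</summary>\n\n"
        "| Stage | Time | Turns | Tool calls | Tokens |\n"
        "|---|---|---|---|---|\n"
        f"{rows}\n\n</details>"
    )
```

Per-stage rows make it easy to spot the stage that dominates cost or wall-clock time.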