Pipelines

Every Developer command runs a pipeline. A pipeline is a sequence of stages connected by routing functions. The pattern is always the same: call the agent, look at what it returned, decide what to do next.

Stages

A stage is a single agent invocation. It's defined as a Stage dataclass with these fields:

  • template -- Jinja2 template name for the prompt
  • session -- session strategy: new, resume:<stage>, or fork:<stage>
  • output -- Pydantic model for structured output, or None for free-form
  • tools -- list of tools the agent can use (Read, Glob, Grep, Bash, etc.)
  • max_turns -- how many turns the agent gets
  • route -- function that takes the context dict and returns the next stage name, or None to stop
  • pre -- optional hook that runs before the agent call
  • post -- optional hook that runs after the agent call
  • disallowed_tools -- tools to explicitly block

Each pipeline defines its stages as a Python dict in its command file. The shared run_stage() and run_pipeline() functions in commands/common.py handle execution.
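A minimal sketch of what such a dataclass and stage dict might look like. The field names come from the table above; the Stage definition itself, the template names, and the two-stage layout are illustrative assumptions, not the actual code in commands/common.py:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Stage:
    template: str                                   # Jinja2 template name for the prompt
    session: str = "new"                            # "new", "resume:<stage>", or "fork:<stage>"
    output: Optional[type] = None                   # Pydantic model, or None for free-form
    tools: list[str] = field(default_factory=list)  # tools the agent can use
    max_turns: int = 10                             # how many turns the agent gets
    route: Callable[[dict], Optional[str]] = lambda ctx: None
    pre: Optional[Callable[[dict], None]] = None    # runs before the agent call
    post: Optional[Callable[[dict], None]] = None   # runs after the agent call
    disallowed_tools: list[str] = field(default_factory=list)

# A hypothetical two-stage pipeline: implement, then evaluate in the
# same conversation.
STAGES = {
    "implement": Stage(
        template="implement.j2",
        tools=["Read", "Glob", "Grep", "Bash"],
        route=lambda ctx: "evaluate",
    ),
    "evaluate": Stage(
        template="evaluate.j2",
        session="resume:implement",
        route=lambda ctx: None,  # None stops the pipeline
    ),
}
```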

How a stage runs

  1. Pre-hook runs if defined (inject extra context, set up files)
  2. Template is rendered with the current context dict
  3. Agent is called with the rendered prompt, allowed tools, and output model
  4. Agent's structured output gets merged back into the context
  5. Post-hook runs if defined (capture diffs, run quality tools, commit code)
  6. Routing function decides the next stage
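The six steps above amount to a small driver loop. A sketch under stated assumptions: render_template and call_agent here are stand-ins for the real Jinja2 rendering and provider call, and the stage object is assumed to carry the fields from the table above:

```python
def render_template(template: str, context: dict) -> str:
    # Stand-in for real Jinja2 rendering of the named template.
    return f"{template} with {context}"

def call_agent(prompt: str, tools: list, output, max_turns: int) -> dict:
    # Stand-in for the real provider call; returns structured output as a dict.
    return {"last_prompt": prompt}

def run_stage(stage, context: dict) -> dict:
    """Run one stage: hooks, render, call, merge."""
    if stage.pre:
        stage.pre(context)                              # 1. pre-hook
    prompt = render_template(stage.template, context)   # 2. render the template
    result = call_agent(prompt, tools=stage.tools,      # 3. call the agent
                        output=stage.output,
                        max_turns=stage.max_turns)
    context.update(result)                              # 4. merge structured output
    if stage.post:
        stage.post(context)                             # 5. post-hook
    return context

def run_pipeline(stages: dict, first: str, context: dict) -> dict:
    name = first
    while name is not None:
        context = run_stage(stages[name], context)
        name = stages[name].route(context)              # 6. route to the next stage
    return context
```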

Sessions

Stages can share conversational context through sessions.

  • new -- fresh conversation, no memory of prior stages
  • resume:<stage> -- picks up where the named stage left off, full context preserved
  • fork:<stage> -- branches from the named stage's conversation without mutating it

Most pipelines use new for the first stage, then resume for follow-ups. This means the evaluation stage can reference things the implementation stage saw, without re-reading the entire codebase.

If a resume target is missing (the referenced stage didn't run), run_stage logs a warning and falls back to a fresh session.
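The strategy strings and the missing-target fallback can be sketched as a small resolver. The function name and the sessions mapping (stage name to the session ID it produced) are assumptions for illustration:

```python
import logging

log = logging.getLogger(__name__)

def resolve_session(strategy: str, sessions: dict) -> tuple:
    """Map a session strategy string to (session_id, fork_flag).

    Returns (None, False) for a fresh conversation with no prior context.
    """
    if strategy == "new":
        return None, False
    mode, _, target = strategy.partition(":")   # "resume:plan" -> ("resume", "plan")
    session_id = sessions.get(target)
    if session_id is None:
        # The referenced stage never ran: warn and fall back to a fresh session.
        log.warning("session target %r missing, starting fresh", target)
        return None, False
    return session_id, mode == "fork"           # fork branches without mutating
```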

The safety boundary

This is the part that matters. The agent can read code, write files, and run shell commands. But it cannot:

  • Run git add, git commit, git push, or any destructive git command
  • Create pull requests or issues
  • Post comments on GitHub

All of that happens in Python post-hooks and pipeline glue code. The agent returns structured data (a PR title, a commit message, a list of findings). Python takes that data and does the actual operations.

The boundary is enforced at the provider level. The Claude provider has a regex that blocks destructive git commands before they reach the shell.
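A sketch of what such a pre-shell guard might look like. The pattern and function below are illustrative, not the provider's actual regex:

```python
import re

# Illustrative pattern: match git subcommands that mutate local history
# or remote state.
DESTRUCTIVE_GIT = re.compile(
    r"\bgit\s+(add|commit|push|reset|rebase|clean)\b"
)

def check_bash_command(command: str) -> None:
    """Reject a destructive git command before it reaches the shell."""
    if DESTRUCTIVE_GIT.search(command):
        raise PermissionError(f"blocked destructive git command: {command!r}")
```

Read-only commands like git status or git log pass through; anything matching the blocklist raises before execution.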

Quality gates

The resolve and address pipelines run quality tools after implementation. If they fail, the agent gets up to three attempts to fix the issues. After three failed attempts, the resolve pipeline opens a draft PR noting the remaining problems. The address pipeline pushes what it has and notes the failures in its response comment.
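The retry behavior can be sketched as a bounded loop. The helper names here are assumptions; callers decide what to do with the remaining failures (resolve opens a draft PR, address notes them in its comment):

```python
MAX_FIX_ATTEMPTS = 3

def enforce_quality_gate(run_tools, ask_agent_to_fix) -> tuple:
    """Run quality tools, giving the agent up to three attempts at fixes.

    run_tools() returns a list of failures (empty means passing);
    ask_agent_to_fix(failures) gives the agent one more pass at the issues.
    Returns (passed, remaining_failures).
    """
    failures = run_tools()
    attempts = 0
    while failures and attempts < MAX_FIX_ATTEMPTS:
        ask_agent_to_fix(failures)   # agent attempts a fix
        failures = run_tools()       # re-check after the fix
        attempts += 1
    return (not failures, failures)
```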

Metrics

Every stage records timing, turn count, tool calls, and token usage. These get aggregated into a PipelineMetrics object and appended to PR bodies and comments as a collapsible <details> block. Good for understanding cost and spotting stages that take longer than expected.
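A sketch of how per-stage metrics might be collected and rendered into that collapsible block. The field names and layout are assumptions, not the actual PipelineMetrics definition:

```python
from dataclasses import dataclass

@dataclass
class StageMetrics:
    name: str
    seconds: float
    turns: int
    tool_calls: int
    tokens: int

def render_metrics(stages: list) -> str:
    """Render stage metrics as a <details> block for a PR body or comment."""
    rows = "\n".join(
        f"| {s.name} | {s.seconds:.1f}s | {s.turns} | {s.tool_calls} | {s.tokens} |"
        for s in stages
    )
    total = sum(s.tokens for s in stages)
    return (
        f"<details><summary>Pipeline metrics ({total} tokens)</summary>\n\n"
        "| Stage | Time | Turns | Tool calls | Tokens |\n"
        "|---|---|---|---|---|\n"
        f"{rows}\n\n</details>"
    )
```

Per-stage rows make it easy to spot the stage that dominates cost or wall-clock time.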