Architecture

The core split

Developer is built around one idea: the agent reads and reasons, Python acts. The agent can explore code, understand context, write files, and produce structured output. But it can never push code, create PRs, post comments, or file issues. That's Python's job.

This isn't just a preference. It's enforced at the AgentProvider base class level. AgentProvider provides a concrete check_bash_command method that blocks destructive git commands with a regex filter before they reach the shell:

import re

_DENIED_GIT_PATTERN = re.compile(
    r"\bgit\s+"
    r"(add|commit|push|reset|checkout|merge|rebase|stash|cherry-pick|revert|tag|branch\s+-[dDmM])\b"
)

Every provider inherits this method. The ClaudeProvider wires it into its can_use_tool permission callback, and any new provider must integrate the same check into its own tool-permission layer.
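A minimal sketch of what that check amounts to. The exact check_bash_command signature is an assumption; here it simply returns whether a command is allowed:

```python
import re

# Same deny pattern as above; a command is rejected before it ever
# reaches the shell if it matches.
_DENIED_GIT_PATTERN = re.compile(
    r"\bgit\s+"
    r"(add|commit|push|reset|checkout|merge|rebase|stash|cherry-pick|revert|tag|branch\s+-[dDmM])\b"
)


def check_bash_command(command: str) -> bool:
    """Return False for git commands that mutate history or the remote."""
    return _DENIED_GIT_PATTERN.search(command) is None
```

Read-only git commands (`git status`, `git log`, `git diff`) pass through; anything that commits, pushes, or rewrites history is blocked.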

The agent works in a git worktree. It can read anything, write files, run quality tools. When it's done, Python inspects what changed, runs git diff, and decides whether to commit and push based on the pipeline's routing logic.

Provider abstraction

The pipeline code doesn't talk to Claude directly. It talks to an AgentProvider interface defined in core/agent.py:

from abc import ABC
from pathlib import Path

from pydantic import BaseModel


class AgentProvider(ABC):
    async def run(
        self,
        prompt: str,
        *,
        cwd: str | Path,
        allowed_tools: list[str],
        max_turns: int = 30,
        output_model: type[BaseModel] | None = None,
        resume: str | None = None,
        fork_session: bool = False,
        disallowed_tools: list[str] | None = None,
    ) -> RunResult: ...

RunResult carries the structured output, a session ID (for resuming later), and metrics. Implementations include ClaudeProvider (core/providers/claude.py) and PiProvider (core/providers/pi.py).

The claude-agent-sdk is an optional extra, installed via developer[claude]; the Pi provider is available the same way. Install only the provider extras you need.

Adding a new provider means implementing AgentProvider and wiring it into get_provider(). The pipeline stages, templates, routing functions -- none of that changes.

Provider selection is configured in developer.yaml. The provider field sets the default provider for all stages. The providers field allows per-stage overrides, so you can use different providers for different stages (e.g., Pi for triage and Claude for implementation). The CLI --provider flag overrides both, forcing a single provider for the entire run.
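A configuration fragment in that shape might look like this. The stage names are illustrative, not the project's actual stage keys:

```yaml
# Hypothetical developer.yaml fragment -- stage names are made up.
provider: claude          # default for every stage

providers:                # per-stage overrides
  triage: pi              # cheaper model for the triage decision
  implement: claude       # stronger model for writing code
```

Running with --provider claude would ignore the providers map entirely and force Claude everywhere.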

Worktrees

Developer clones target repos into a repos/ directory at the project root (gitignored). On each run, it pulls latest from the remote. For actual work, it creates git worktrees from these clones.

Worktrees give you isolation. Multiple pipelines can run against the same repo concurrently without stepping on each other. Each worktree is a separate working directory with its own checked-out branch, but they share the same git objects. Cheap to create, cheap to clean up.

The resolve pipeline creates a worktree, does its work, and when the pipeline is done, Python commits and pushes from that worktree. The review pipeline creates a read-only worktree just so the agent has files to explore.
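The underlying git mechanics look like this. The sketch builds a throwaway repo so it is self-contained; the worktree path and branch name are illustrative:

```shell
# Sketch of the worktree lifecycle, using a throwaway repo.
set -e
repo=$(mktemp -d)/demo
git init -q "$repo" && cd "$repo"
git -c user.name=bot -c user.email=bot@example.com commit -q --allow-empty -m "init"

# One worktree per pipeline run: its own directory and branch,
# but sharing the clone's object store.
git worktree add -q ../wt-issue-123 -b fix/issue-123
git worktree list

# Cleanup is a single command once the pipeline finishes.
git worktree remove ../wt-issue-123
```

The add/remove pair is what makes concurrent pipelines cheap: no second clone, no network traffic.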

Structured output

Most stages use Pydantic models for structured output. The agent returns JSON that gets validated against the model. If validation fails, the agent gets another turn to fix the output.
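In outline, the validate-and-retry loop works like this. This is a minimal sketch with a plain function standing in for a Pydantic model, and the attempt limit is an assumption:

```python
import json


def validate_triage(data: dict) -> dict:
    # Stand-in for TriageResult validation: require the expected fields.
    if not isinstance(data.get("decision"), str) or "reasoning" not in data:
        raise ValueError("missing decision/reasoning")
    return data


def run_with_retries(turns: list[str], max_attempts: int = 3) -> dict:
    """Give the agent another turn whenever its JSON fails validation."""
    for raw in turns[:max_attempts]:
        try:
            return validate_triage(json.loads(raw))
        except (json.JSONDecodeError, ValueError):
            continue  # in the real pipeline, the error is fed back to the agent
    raise RuntimeError("structured output never validated")


# First turn has a wrong type, second turn is valid -> second attempt wins.
result = run_with_retries(['{"decision": 1}', '{"decision": "fix", "reasoning": "clear bug"}'])
```

The key property is that a malformed turn costs one retry, not a failed pipeline.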

The models live in src/developer/models/. Some examples:

  • TriageResult -- the triage decision and reasoning
  • EvalResult -- pass or needs_context
  • PRResult -- branch name, commit message, PR body
  • AuditReport -- list of findings with titles, descriptions, and labels
  • FeedbackAnalysis -- actionable items and out-of-scope items

Routing functions inspect these structured outputs to decide what happens next. This is how a single pipeline definition handles multiple paths through the flow.
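A routing function in this style might look like the following. The decision values and stage names are illustrative, and the dataclass is a simplified stand-in for the real model in src/developer/models/:

```python
from dataclasses import dataclass


@dataclass
class TriageResult:
    # Simplified stand-in for the real Pydantic model.
    decision: str       # e.g. "fix", "wont_fix", "needs_context"
    reasoning: str


def route_after_triage(result: TriageResult) -> str:
    """Pick the next stage based on the structured triage output."""
    if result.decision == "fix":
        return "implement"
    if result.decision == "needs_context":
        return "ask_for_context"
    return "close_issue"


next_stage = route_after_triage(TriageResult(decision="fix", reasoning="clear repro"))
```

Because routing reads validated fields rather than free-form text, the branch points stay deterministic even though the stages themselves are agent-driven.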

Templates

Prompts are Jinja2 templates in src/developer/templates/. Each pipeline has its own subdirectory. Two shared includes show up in most templates:

  • _personality.md.j2 -- writing style rules (no AI tells, conversational tone)
  • _labels.md.j2 -- canonical GitHub label set

Templates receive the pipeline's context dict, which accumulates data as stages run. A later stage's template can reference anything an earlier stage produced.
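The accumulation works roughly like this, sketched with str.format standing in for Jinja2 rendering and made-up context keys:

```python
# Context starts with the trigger data and grows as stages complete.
context: dict = {"issue_title": "Crash on empty config"}

# Stage 1 writes its structured output into the shared context.
context["triage"] = {"decision": "fix", "reasoning": "reproducible"}

# A later stage's template can reference anything earlier stages produced.
template = "Implement a fix for '{issue_title}'. Triage said: {reasoning}"
prompt = template.format(issue_title=context["issue_title"],
                         reasoning=context["triage"]["reasoning"])
```

The same dict flows through routing functions, so prompts and routing decisions draw on one shared record of the run.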

Metrics

Every stage records:

  • Duration (wall clock)
  • Turn count (how many agent turns)
  • Tool call count
  • Input and output tokens
  • Cache hit tokens

These get collected into a PipelineMetrics object and formatted as a collapsible <details> block that Python appends to PR bodies and comments. Useful for tracking cost and spotting slow stages.
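The formatting might look roughly like this; the field names and table layout are illustrative, not the real PipelineMetrics schema:

```python
def format_metrics(stages: dict[str, dict]) -> str:
    """Render per-stage metrics as a collapsible markdown block."""
    rows = [
        f"| {name} | {m['seconds']:.1f}s | {m['turns']} | {m['tokens_in']} | {m['tokens_out']} |"
        for name, m in stages.items()
    ]
    return "\n".join([
        "<details><summary>Pipeline metrics</summary>",
        "",
        "| Stage | Duration | Turns | Input tokens | Output tokens |",
        "| --- | --- | --- | --- | --- |",
        *rows,
        "",
        "</details>",
    ])


block = format_metrics({"triage": {"seconds": 12.3, "turns": 4,
                                   "tokens_in": 9000, "tokens_out": 800}})
```

The <details> wrapper keeps the table out of the way on GitHub until someone expands it.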

GitHub integration

Developer authenticates as a GitHub App using JWT-based auth. The App ID, private key, and installation ID come from .env. On startup, it exchanges the JWT for a short-lived installation token and passes that to githubkit for API calls.

The bot's commit email is constructed from the app's slug and user ID, so commits show up as authored by the bot account. PRs, comments, and issues are all created through the GitHub API, never through git.
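That email follows GitHub's standard noreply convention, {user_id}+{login}@users.noreply.github.com, where a bot's login is its app slug plus a [bot] suffix. A sketch with made-up values:

```python
def bot_commit_identity(slug: str, user_id: int) -> tuple[str, str]:
    """Build the bot's git author name and noreply email.

    Follows GitHub's convention: {user_id}+{login}@users.noreply.github.com,
    where a GitHub App's login is "{slug}[bot]". The inputs here are made up.
    """
    login = f"{slug}[bot]"
    return login, f"{user_id}+{login}@users.noreply.github.com"


name, email = bot_commit_identity("developer-app", 123456)
```

Commits authored with this identity link back to the bot account in the GitHub UI, just as if the bot had pushed them itself.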