Building AI agents with Graphite

Greg Foster
Graphite software engineer

Based on OpenAI's framework, an AI agent is a system that autonomously orchestrates workflows rather than just responding to single commands: it executes multi-step tasks on a user's behalf, going beyond static LLM calls or simple chatbots. Agents typically combine:

  • Multi-step workflows that the agent orchestrates autonomously
  • Ability to react to events, take corrective action, retry, chain tools
  • Guardrails to prevent harmful or unintended actions

Use agents when:

  • Workflows are complex, multi-turn, and multistage (e.g., booking, triaging, code changes)
  • Tasks require dynamic decision-making and tool use (e.g., API calls, database updates)
  • Automation frees your team from repetitive or error-prone flows, with safety controls

OpenAI's guide offers a framework for picking candidate use cases effectively.

Start by designing the logic: decompose your workflow into discrete steps, each with clear inputs, outputs, success signals, and error paths.

  1. Trigger (event or schedule)
  2. Action (call LLM, call external service, call tools)
  3. Decision logic (branching based on LLM output)
  4. Error handling (retry, fallback, guardrail)
  5. Termination / report (summarize what happened)
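The loop above can be sketched in a few lines of Python. This is a minimal illustration, not a production agent: `call_llm` is a stub standing in for a real provider SDK, and the function names are hypothetical.

```python
def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM call; swap in your provider's SDK here.
    return f"summary: {prompt}"

def run_agent(event: str, max_retries: int = 2) -> dict:
    """One pass through steps 2-5: action, decision, error handling, report."""
    for attempt in range(1, max_retries + 2):
        try:
            output = call_llm(event)                 # 2. Action
            if output.startswith("summary:"):        # 3. Decision logic
                return {"status": "ok", "result": output, "attempts": attempt}
            raise ValueError("unexpected LLM output")
        except ValueError:
            continue                                 # 4. Error handling: retry
    return {"status": "failed", "attempts": max_retries + 1}  # 5. Termination / report
```

The trigger (step 1) would be whatever calls `run_agent`, such as a webhook handler or a scheduler.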

Apply strong guardrails at each point: verify outputs, ensure compliance, detect harmful language—OpenAI emphasizes safety and predictability.

Use Graphite's CLI ("gt" commands) to manage and split your agent's code into reviewable diffs. Build incrementally, testing each component (trigger, action handlers, safety logic) as a separate stacked PR for review clarity.

As you build your agent, Graphite Agent:

  • Reviews your workflow code for logic bugs, security risks, and style issues.
  • Flags problems like inverted conditions, unclear names, or unexpected side effects.
  • Integrates seamlessly with GitHub for inline feedback and 1-click fixes.

Graphite's embedded chat helps you review agent code and reason about agent behavior directly in the PR.

Track agent development velocity, PR feedback loops, and performance. Stacked diffs plus Insights give visibility into your agent's evolution.

As a worked example, consider a Slack support-triage agent:

  1. Slack event triggers a new support request
  2. Agent summarizes ticket via LLM
  3. Checks urgency via keyword/LLM scoring
  4. Routes to appropriate team (e.g., ops, security) or triggers escalation
  5. Logs actions, sends acknowledgement, retries if Slack call fails
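Steps 2–4 of that flow might be sketched as below. The keyword list, thresholds, and handler names are illustrative assumptions; the summarization stub stands in for a real LLM call.

```python
URGENT_KEYWORDS = {"outage", "down", "breach", "urgent"}

def summarize_ticket(text: str) -> str:
    # Placeholder for the LLM summarization call in step 2.
    return text.strip()[:80]

def score_urgency(text: str) -> float:
    # Step 3: crude keyword scoring; an LLM score could be blended in here.
    hits = sum(1 for kw in URGENT_KEYWORDS if kw in text.lower())
    return min(1.0, hits / 2)

def route(score: float) -> str:
    # Step 4: escalate above the threshold, otherwise send to ops.
    return "escalation" if score >= 0.5 else "ops"

def handle_event(event: dict) -> dict:
    summary = summarize_ticket(event["text"])
    urgency = score_urgency(event["text"])
    return {"summary": summary, "urgency": urgency, "team": route(urgency)}
```

Logging, acknowledgement, and Slack retries (step 5) would wrap `handle_event` at the integration boundary.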
| Step | Description |
| --- | --- |
| 1. Sketch modules | Define fetch_event, summarize_ticket, score, route, acknowledge, error_handler |
| 2. Split diffs | Use stacked diffs: one PR for fetch_event, the next for summarize_ticket, and so on. Review incrementally |
| 3. Code modules | Each handler is a function: fetch_event() listens to Slack; summarize_ticket(text) returns a summary |
| 4. Add guardrails | Enforce max length, filter profanity, fall back to a default team when unsure |
| 5. Graphite Agent review | Graphite Agent flags inverted logic (e.g., "if score < 0.5" where "> 0.5" was intended) and suggests clearer names |
| 6. Test & observe | Send test events, watch logs, track errors. Insights show PR cycle duration and feedback speed |
| 7. Finalize stack | Merge the stack once all review feedback is resolved and agent behavior is verified |

Break your AI agent into logical components using Graphite's stacked diff workflow:

  • Separate concerns: Create individual PRs for trigger logic, LLM integration, error handling, and monitoring
  • Test incrementally: Each stacked diff can be tested independently before combining
  • Review complexity: Smaller PRs make it easier for Graphite Agent to catch agent-specific issues like infinite loops or missing error handling

Graphite Agent understands your entire codebase context, making it particularly valuable for AI agents:

  • Guardrail validation: Graphite Agent can flag missing input validation, unsafe API calls, or insufficient error handling in agent workflows
  • Logic consistency: Catches inverted conditions, missing edge cases, and potential race conditions in multi-step agent processes
  • Security patterns: Identifies potential vulnerabilities in agent interactions with external services or data processing

When building complex agent workflows:

  • Behavioral analysis: Use Graphite Chat to discuss agent decision-making logic and edge cases
  • Prompt engineering: Collaborate on LLM prompt optimization and response validation strategies
  • Integration testing: Debug how your agent interacts with external APIs and services

Track your agent development velocity and quality:

  • Development metrics: Monitor how quickly you can iterate on agent components
  • Review efficiency: Track how Graphite Agent's feedback improves your agent code quality
  • Deployment readiness: Use PR cycle metrics to ensure your agent is thoroughly tested before deployment

Customize Graphite Agent's rules for AI agent development:

  • Error handling: Enforce comprehensive error handling for all external service calls
  • Logging standards: Require structured logging for agent decisions and actions
  • Resource management: Ensure proper cleanup of connections, memory, and temporary data
  • Testing requirements: Mandate unit tests for each agent component and integration tests for workflows

You can use any language with OpenAI-compatible HTTP or SDK support; common choices include Python, JavaScript (Node.js), and TypeScript. Graphite's CLI and Graphite Agent are language-agnostic, as long as the code lives in a supported repo.

Implement multi-layer validation—check LLM output length, filter profanity, verify predictions against thresholds, fallback on failures, and log context. Graphite Agent can help flag logic gaps or unexpected behavior in how guardrails are implemented.
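Those layers can be run as an ordered chain, falling back on the first failure. This is a minimal sketch with assumed limits, a toy profanity check, and a caller-supplied log list.

```python
FALLBACK = "[unable to produce a safe summary]"

def validate_output(text: str, confidence: float, log: list[str]) -> str:
    """Run each validation layer in order; fall back on the first failure."""
    layers = [
        (len(text) <= 500, "length check failed"),
        ("damn" not in text.lower(), "profanity detected"),
        (confidence >= 0.7, "confidence below threshold"),
    ]
    for passed, reason in layers:
        if not passed:
            log.append(reason)  # record context for later debugging
            return FALLBACK
    return text
```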

Yes. You can chain or multiplex calls—e.g., use one LLM for summarization, another for severity scoring, or call internal/external APIs. Just segment each step clearly in your stacked diffs and test them independently.
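A chained two-model pipeline might look like the following. Both model calls are stubbed here; the point is that the second call consumes the first call's output, and each step can live in its own stacked diff.

```python
def summarize(text: str) -> str:
    # Model 1 (stubbed): condense the raw ticket to its first sentence.
    return text.split(".")[0].strip()

def severity(summary: str) -> float:
    # Model 2 (stubbed): score severity from the summary, not the raw text.
    return 0.9 if "outage" in summary.lower() else 0.2

def triage(text: str) -> dict:
    s = summarize(text)
    return {"summary": s, "severity": severity(s)}
```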

Use mock events for each step, run unit tests for handlers, simulate failure cases (e.g., API timeouts), and monitor outcomes via logs. Insights will highlight slow or failing PR cycles—great for optimizing dev workflow. You can also iterate quickly with stacked diffs to fix issues in isolated modules.
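One way to simulate failure cases is to inject a test double for the external service. The mock class and retry helper below are hypothetical names used for illustration.

```python
class MockSlack:
    """Test double: yields canned events and can simulate one timeout."""
    def __init__(self, events, fail_first: bool = False):
        self.events = list(events)
        self.fail_first = fail_first

    def next_event(self):
        if self.fail_first:
            self.fail_first = False
            raise TimeoutError("simulated Slack API timeout")
        return self.events.pop(0)

def fetch_with_retry(client, retries: int = 1):
    """Handler under test: retry on timeout, return None if attempts run out."""
    for _ in range(retries + 1):
        try:
            return client.next_event()
        except TimeoutError:
            continue
    return None
```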

Deploy agent components in incremental stages—use canary or beta flags, log every decision, track failure rates, and roll back quickly if needed. Graphite integrates with your repo's CI/CD and gives observability so you can see how code changes affect agent behavior.

Design your handlers to be stateless and idempotent, use queues or async processing where needed, and monitor latency, error rates, and cost per action. Insights helps highlight where performance bottlenecks or high-latency flows are creeping in.
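Idempotency is often achieved with a deduplication key, so that queue redeliveries under load cannot double-act. A minimal sketch, using an in-memory set where production code would use a shared store such as Redis:

```python
processed: set[str] = set()  # in production: a shared store (e.g., Redis)

def handle(event_id: str, payload: str) -> str:
    """Idempotent handler: replaying the same event id is a no-op."""
    if event_id in processed:
        return "skipped"
    processed.add(event_id)
    return f"handled:{payload}"
```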
