Developer Guide
How to Build an AI Agent: Architecture, Frameworks, and Production Best Practices
Vendor-neutral architecture patterns, design decisions, and a production checklist, with minimal illustrative sketches rather than framework-specific code.
Before You Build
Before writing any code, answer these three questions honestly. They will save you weeks of wasted effort.
What specific problem are you solving?
"Improve customer support" is too vague. "Deflect 40% of tier-1 tickets about order status, shipping, and returns" is specific enough to build against.
Does it actually need an agent?
If the task has predictable inputs and outputs, a rule-based workflow, API integration, or simple RAG pipeline is cheaper, faster, and more reliable. Agents add value when reasoning, tool selection, or multi-step planning is required.
What is the success metric?
Define it before building. Deflection rate? Resolution time? Cost per interaction? Accuracy on a test set? Without a metric, you cannot evaluate whether the agent is working or iterate effectively.
Five Architecture Patterns
Simple Tool-Calling Agent
A single LLM with access to a set of tools (APIs, databases, functions). The model decides which tool to call based on the user's input, executes the call, and returns the result. No planning loop, no memory beyond the current conversation.
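A minimal sketch of this pattern, with the LLM's tool-selection step stubbed out (`pick_tool` is a placeholder; a real implementation would parse structured tool-call output from a model's function-calling API, and `get_order_status` is a hypothetical tool):

```python
def get_order_status(order_id: str) -> str:
    """Hypothetical tool: look up an order in a backing store."""
    return f"Order {order_id} is in transit."

TOOLS = {"get_order_status": get_order_status}

def pick_tool(user_input: str) -> tuple[str, dict]:
    """Stand-in for the LLM's tool-selection step."""
    if "order" in user_input.lower():
        return "get_order_status", {"order_id": "A-1001"}
    return "", {}

def run_agent(user_input: str) -> str:
    name, args = pick_tool(user_input)
    if name in TOOLS:
        return TOOLS[name](**args)  # execute the chosen tool call
    return "I can't help with that."
```

Note there is exactly one decision point and one tool execution per turn; that is what keeps this pattern simple and predictable.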
RAG Agent
Retrieval-augmented generation. The agent searches a knowledge base (vector store, document index) to find relevant context, then generates a response grounded in the retrieved documents. Can be combined with tool calling for actions.
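A toy sketch of the retrieve-then-generate flow. Retrieval here is naive keyword overlap over an in-memory list, and generation is a stub; a production system would use embeddings, a vector store, and an LLM call conditioned on the retrieved context:

```python
DOCS = [
    "Returns are accepted within 30 days of delivery.",
    "Standard shipping takes 3-5 business days.",
    "Tier-1 support handles order status questions.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    # Rank documents by word overlap with the query (embedding stand-in).
    words = set(query.lower().split())
    ranked = sorted(DOCS, key=lambda d: -len(words & set(d.lower().split())))
    return ranked[:k]

def answer(query: str) -> str:
    context = retrieve(query)
    # Stand-in for an LLM call grounded in the retrieved context.
    return f"Based on our docs: {context[0]}"
```

The key property to preserve in any real implementation is that the response is generated from the retrieved documents, not from the model's parametric memory alone.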
ReAct Agent
The Reason-Act loop. The agent alternates between reasoning (thinking about what to do next) and acting (executing a tool call or action). After each action, it observes the result and reasons about the next step. Continues until the task is complete or a stopping condition is met.
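The loop structure can be sketched as follows. The `reason` function is a stand-in for an LLM call that, given the trajectory so far, returns the next action or a finish signal; the step budget is the stopping condition the text mentions:

```python
def reason(observations: list[str]) -> str:
    """Hypothetical reasoning step: next action, or 'finish' when done."""
    if not observations:
        return "lookup_shipping"
    return "finish"

def act(action: str) -> str:
    """Hypothetical tool execution."""
    if action == "lookup_shipping":
        return "Shipping takes 3-5 business days."
    return ""

def react_loop(max_steps: int = 5) -> list[str]:
    observations = []
    for _ in range(max_steps):            # stopping condition: step budget
        action = reason(observations)     # Reason about what to do next
        if action == "finish":
            break
        observations.append(act(action))  # Act, then observe the result
    return observations
```

The `max_steps` bound matters in practice: without it, a confused agent can loop indefinitely.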
Plan-and-Execute Agent
Two-phase approach. A planner agent first creates a structured plan (ordered list of steps with dependencies). An executor agent then works through the steps. If a step fails, the planner revises the remaining plan. Separates strategic thinking from tactical execution.
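A compressed sketch of the two phases, assuming both the planner and the revision step would normally be separate LLM calls (the step names and the simulated failure are invented for illustration):

```python
def plan(task: str) -> list[str]:
    """Planner: produce an ordered step list for the task."""
    return ["fetch_order", "check_carrier", "draft_reply"]

def execute(step: str) -> bool:
    """Executor: run one step; simulate a failure on one of them."""
    return step != "check_carrier"

def revise(remaining: list[str]) -> list[str]:
    """Planner revision: replace the failed step with a fallback."""
    return ["use_cached_carrier_data"] + remaining

def run(task: str) -> list[str]:
    steps, done = plan(task), []
    while steps:
        step = steps.pop(0)
        if execute(step):
            done.append(step)
        else:
            steps = revise(steps)  # planner revises the remaining plan
    return done
```

The separation pays off when tasks are long: the executor can use a cheaper model, and a single revision call is cheaper than replanning from scratch.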
Multi-Agent Orchestration
A team of specialized agents coordinated by an orchestrator. Each agent has its own prompt, tools, and expertise. The orchestrator routes tasks, manages shared state, and synthesizes results. Can use different models for different agents (cheap models for simple tasks, expensive models for reasoning).
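A routing sketch under strong simplifications: each "agent" is a plain function, and the orchestrator's routing decision (normally an LLM call) is a keyword check. The agent names are illustrative:

```python
def billing_agent(task: str) -> str:
    return f"[billing] handled: {task}"

def research_agent(task: str) -> str:
    return f"[research] handled: {task}"

AGENTS = {"billing": billing_agent, "research": research_agent}

def route(task: str) -> str:
    """Stand-in for the orchestrator's LLM routing decision."""
    return "billing" if "refund" in task else "research"

def orchestrate(tasks: list[str]) -> list[str]:
    # A real orchestrator would also manage shared state and
    # synthesize these per-agent results into one response.
    return [AGENTS[route(t)](t) for t in tasks]
```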
Key Design Decisions
Memory
Stateless (cheapest, simplest), session memory (conversation context), episodic memory (past interactions via vector store), or semantic memory (learned facts). Choose the minimum complexity that meets your requirements.
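As one example of the session-memory tier, a bounded buffer of recent turns keeps context cost fixed; the episodic and semantic tiers would layer a vector store on top of this:

```python
from collections import deque

class SessionMemory:
    """Keep only the last N conversation turns as context."""

    def __init__(self, max_turns: int = 3):
        self.turns = deque(maxlen=max_turns)  # oldest turns drop off

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))

    def context(self) -> str:
        return "\n".join(f"{r}: {t}" for r, t in self.turns)

mem = SessionMemory(max_turns=2)
mem.add("user", "Where is my order?")
mem.add("agent", "It shipped yesterday.")
mem.add("user", "When will it arrive?")  # evicts the first turn
```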
Model Selection
Use the cheapest model that achieves the required accuracy. GPT-4o-mini or Claude 3.5 Haiku for simple tool calls. GPT-4o or Claude 3.5 Sonnet for complex reasoning. Consider different models for different agent roles in multi-agent systems.
Tool Design
Keep tool descriptions clear and specific. Test with 5, 10, 15, and 20 tools to find where accuracy degrades. Group related tools. Consider dynamic tool loading (only surface relevant tools per query).
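Dynamic tool loading can be sketched like this, with keyword matching standing in for the embedding similarity a real implementation would use (the tool names and keyword sets are invented):

```python
TOOL_KEYWORDS = {
    "get_order_status": {"order", "status", "tracking"},
    "issue_refund": {"refund", "money", "charge"},
    "search_kb": {"how", "policy", "help"},
}

def relevant_tools(query: str, limit: int = 2) -> list[str]:
    """Surface only tools relevant to this query, capped at `limit`."""
    words = set(query.lower().split())
    scored = [(len(words & kws), name) for name, kws in TOOL_KEYWORDS.items()]
    scored.sort(reverse=True)
    return [name for score, name in scored[:limit] if score > 0]
```

Shrinking the tool list the model sees per query is what keeps accuracy from degrading as the total tool count grows.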
Human-in-the-Loop
Decide where human approval is required before building. High-impact actions (sending emails, modifying data) should require approval. Low-impact actions (searching, summarizing) can run autonomously.
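A minimal approval gate, assuming a hard-coded high-impact set and an approval callback (in production the callback would pause the agent and wait on a human decision, e.g. via a ticketing queue):

```python
HIGH_IMPACT = {"send_email", "update_record", "issue_refund"}

def execute_action(action: str, approve) -> str:
    """Run low-impact actions directly; gate high-impact ones."""
    if action in HIGH_IMPACT and not approve(action):
        return f"{action}: blocked pending approval"
    return f"{action}: executed"
```

Usage: `execute_action("send_email", approve=ask_human)` where `ask_human` is whatever approval channel your system provides.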
Error Handling
Every tool call can fail. Define retry strategies, fallback behaviors, and escalation paths. Set maximum retries per tool and per task. Log errors for post-mortem analysis.
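A retry wrapper sketch with bounded attempts, exponential backoff, and an escalation fallback; the delay values are placeholders and should be tuned to each tool's latency profile:

```python
import time

def call_with_retry(tool, max_retries: int = 3, base_delay: float = 0.01):
    """Call `tool`, retrying on failure; escalate after the last attempt."""
    for attempt in range(max_retries):
        try:
            return tool()
        except Exception as exc:
            if attempt == max_retries - 1:
                return f"escalate: {exc}"          # escalation path
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff

attempts = []
def flaky():
    """Simulated tool that fails once, then succeeds."""
    attempts.append(1)
    if len(attempts) < 2:
        raise RuntimeError("timeout")
    return "ok"
```

Calling `call_with_retry(flaky)` returns `"ok"` after one retry; a tool that keeps failing returns the escalation string instead of raising.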
Cost Management
Set per-request and per-session token budgets. Implement circuit breakers that stop agent loops before they consume excessive resources. Monitor cost per interaction and alert on anomalies.
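A token-budget circuit breaker can be as simple as the sketch below; the token counts are invented, and in practice they would come from the model API's usage metadata:

```python
class TokenBudget:
    """Per-session token budget acting as a circuit breaker."""

    def __init__(self, limit: int):
        self.limit, self.spent = limit, 0

    def charge(self, tokens: int) -> bool:
        """Record usage; return False once the budget is exhausted."""
        self.spent += tokens
        return self.spent <= self.limit

budget = TokenBudget(limit=1000)
steps = 0
while budget.charge(400):  # each agent step "costs" 400 tokens here
    steps += 1             # ... agent reasoning/tool call would run here ...
```

The loop stops after the budget is exceeded regardless of whether the agent believes it is done, which is exactly the property that prevents runaway loops.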
Production Readiness Checklist
What separates a demo from a production agent. Every item on this list has caused a production incident when skipped.
Observability
- ▢ Full request/response logging
- ▢ Token usage tracking per request
- ▢ Latency monitoring per step
- ▢ Error rate dashboards
- ▢ Tool call success/failure rates
Evaluation
- ▢ Automated test suite with golden examples
- ▢ Accuracy measurement on held-out test set
- ▢ Regression testing on prompt changes
- ▢ A/B testing framework for improvements
- ▢ User satisfaction measurement
Guardrails
- ▢ Input validation and sanitization
- ▢ Output content filtering
- ▢ Confidence thresholds for responses
- ▢ PII detection and redaction
- ▢ Prompt injection defenses
Reliability
- ▢ Graceful degradation on model API failure
- ▢ Retry logic with exponential backoff
- ▢ Circuit breakers on token budget
- ▢ Fallback to simpler model if primary unavailable
- ▢ Human escalation path
Safety and Governance
AI agents have more power than chatbots because they can take actions. That power comes with responsibility. Every production agent should address these five areas.
Prompt Injection Defense
Agents that process user input are vulnerable to prompt injection. Separate system prompts from user content. Validate tool call parameters. Never execute arbitrary code from user input.
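Two of these defenses sketched concretely: user content stays in its own message role (never concatenated into the system prompt), and tool parameters are validated against a strict pattern before execution. The message shape and the order-ID format are illustrative assumptions:

```python
import re

ORDER_ID_PATTERN = re.compile(r"^[A-Z]-\d{4}$")

def build_messages(system_prompt: str, user_input: str) -> list[dict]:
    # User text is isolated in its own role; it can never rewrite
    # the system prompt, only appear as data alongside it.
    return [{"role": "system", "content": system_prompt},
            {"role": "user", "content": user_input}]

def safe_get_order(order_id: str) -> str:
    # Validate the tool parameter before touching any backing system.
    if not ORDER_ID_PATTERN.match(order_id):
        raise ValueError("rejected: invalid order_id parameter")
    return f"status for {order_id}"
```

Parameter validation is the last line of defense: even if an injected instruction convinces the model to call a tool, malformed or out-of-scope arguments are rejected before execution.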
Permission Scoping
Agents should have the minimum permissions needed. A support agent does not need write access to the billing system. A research agent does not need email-sending capability. Scope aggressively.
Audit Logging
Every agent action should be logged with timestamp, user context, tool called, parameters, and result. This is non-negotiable for regulated industries and critical for debugging.
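One structured record per action might look like the sketch below; the field names are illustrative, not a standard schema:

```python
import json
import datetime

def audit_record(user: str, tool: str, params: dict, result: str) -> str:
    """Build one JSON audit line for a single agent action."""
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "user": user,
        "tool": tool,
        "params": params,  # redact PII here before logging in production
        "result": result,
    }
    return json.dumps(record)  # append this line to an append-only log
```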
PII Handling
If the agent processes personal data, ensure compliance with GDPR, CCPA, or relevant regulations. Implement PII detection and redaction in logs. Consider data residency requirements for LLM API calls.
Compliance
Regulated industries (finance, healthcare, legal) have specific requirements for AI systems. Consult compliance teams early. Document model selection, training data, and decision-making processes.