The Orchestrator Pattern
Why Single Agents Hit Limits
Single agents run into three limits: they lose context over long sessions, they cannot parallelize work, and they conflate generation with evaluation. Multi-agent systems address these limits by splitting responsibilities across specialized agents.
The Orchestrator pattern uses one coordinator agent to delegate work to specialist agents and merge their results. It is the most structured and predictable multi-agent pattern.
Architecture
              +------------------+
              |   ORCHESTRATOR   |
              |  (Coordinator)   |
              +--------+---------+
                       |
       +---------------+---------------+
       |               |               |
 +-----+-----+   +-----+-----+   +-----+-----+
 |  Agent:   |   |  Agent:   |   |  Agent:   |
 | UI Tests  |   | API Tests |   | Perf Tests|
 +-----------+   +-----------+   +-----------+
The orchestrator:
- Receives a feature specification or test plan
- Analyzes it to determine which test types are needed
- Delegates scenarios to specialist agents
- Collects and merges results
- Reports a unified test outcome
Implementation
import json

class OrchestratorAgent:
    def __init__(self, specialists: dict[str, Agent]):
        self.specialists = specialists  # {"ui": UIAgent, "api": APIAgent, ...}
        self.llm = get_llm()

    def plan_and_execute(self, feature_spec: str) -> TestReport:
        # Step 1: Analyze the spec and create a test plan
        plan_json = self.llm.generate(f"""
        Given this feature specification:
        {feature_spec}

        Determine which test types are needed:
        - UI tests (if there are user-facing changes)
        - API tests (if there are endpoint changes)
        - Performance tests (if there are SLA requirements)
        - Security tests (if there are auth/data changes)

        Output a JSON plan: {{"ui": [...scenarios], "api": [...scenarios], ...}}
        """)
        plan = json.loads(plan_json)  # the LLM returns a string; parse it before iterating

        # Step 2: Delegate to specialists
        results = {}
        for agent_type, scenarios in plan.items():
            if agent_type not in self.specialists:
                results[agent_type] = {"skipped": f"No specialist for {agent_type}"}
                continue
            specialist = self.specialists[agent_type]
            results[agent_type] = specialist.execute_scenarios(scenarios)

        # Step 3: Merge and resolve conflicts
        return self.merge_results(results)

    def merge_results(self, results: dict) -> TestReport:
        """Merge results from multiple specialists into a unified report."""
        all_tests = []
        all_failures = []
        all_coverage = {}
        for agent_type, agent_results in results.items():
            if isinstance(agent_results, dict) and "skipped" in agent_results:
                continue
            all_tests.extend(agent_results.tests)
            all_failures.extend(agent_results.failures)
            all_coverage[agent_type] = agent_results.coverage
        return TestReport(
            total_tests=len(all_tests),
            total_failures=len(all_failures),
            results_by_type=results,
            coverage=all_coverage,
            overall_status="FAIL" if all_failures else "PASS",
        )
Specialist Agent Design
Each specialist agent is optimized for its domain:
UI Test Specialist
class UITestSpecialist(Agent):
    def __init__(self, browser, llm):
        self.browser = browser
        self.llm = llm

    def execute_scenarios(self, scenarios: list[str]) -> AgentResults:
        results = []
        for scenario in scenarios:
            # Each scenario is a natural language description
            # e.g., "Verify the checkout button is disabled when cart is empty"
            result = self.react_loop(
                objective=scenario,
                tools=["navigate", "click", "type", "text", "screenshot"],
                max_steps=20
            )
            results.append(result)
        return AgentResults(tests=results, failures=[r for r in results if r.failed])
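The `react_loop` above is the specialist's inner engine. A minimal sketch of its shape, assuming the real implementation prompts an LLM each step to pick the next tool call; here a scripted `policy` function stands in for the LLM, and `StepResult` is a hypothetical result type:

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    failed: bool
    steps: list

def react_loop(objective, tools, policy, max_steps=20):
    """Observe -> act until the policy declares pass/fail or the budget runs out."""
    history = []
    for _ in range(max_steps):
        action = policy(objective, history)  # real agent: LLM chooses a tool call
        if action["tool"] not in tools and action["tool"] not in ("pass", "fail"):
            return StepResult(failed=True, steps=history)  # hallucinated tool
        history.append(action)
        if action["tool"] == "pass":
            return StepResult(failed=False, steps=history)
        if action["tool"] == "fail":
            return StepResult(failed=True, steps=history)
    return StepResult(failed=True, steps=history)  # step budget exhausted

# Scripted policy: navigate, read the page text, then assert success
script = iter([{"tool": "navigate"}, {"tool": "text"}, {"tool": "pass"}])
result = react_loop(
    "Verify the checkout button is disabled when cart is empty",
    ["navigate", "click", "type", "text", "screenshot"],
    lambda objective, history: next(script),
)
```

The `max_steps` cap matters: without it, a confused agent can loop on the same element indefinitely, and exhausting the budget is itself reported as a failure.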
API Test Specialist
import json
import httpx

class APITestSpecialist(Agent):
    def __init__(self, base_url: str, llm):
        self.base_url = base_url
        self.llm = llm

    def execute_scenarios(self, scenarios: list[str]) -> AgentResults:
        results = []
        for scenario in scenarios:
            # e.g., "Verify POST /orders returns 400 when quantity is 0"
            result = self.execute_api_test(scenario)
            results.append(result)
        return AgentResults(tests=results, failures=[r for r in results if r.failed])

    def execute_api_test(self, scenario: str) -> TestResult:
        # Use the LLM to determine the HTTP request and the expected response
        spec = json.loads(self.llm.generate(f"""
        Scenario: {scenario}
        Base URL: {self.base_url}

        Generate the test as JSON with two keys:
        {{"request": {{"method": "...", "path": "...", "headers": {{...}}, "body": {{...}}}},
         "expected": {{"status": ..., "body_contains": [...], "body_not_contains": [...]}}}}
        """))
        request_spec = spec["request"]

        # Execute and evaluate
        response = httpx.request(
            method=request_spec["method"],
            url=f"{self.base_url}{request_spec['path']}",
            headers=request_spec.get("headers", {}),
            json=request_spec.get("body"),
        )
        return self.evaluate_response(response, spec["expected"])
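The `evaluate_response` helper is not shown above; one plausible sketch, assuming it simply checks the status code and the `body_contains` / `body_not_contains` lists from the expectation (`TestResult` and `FakeResponse` are hypothetical stand-ins; the real code would receive an `httpx.Response`):

```python
from dataclasses import dataclass

@dataclass
class TestResult:
    passed: bool
    reasons: list

def evaluate_response(response, expected) -> TestResult:
    reasons = []
    if response.status_code != expected["status"]:
        reasons.append(f"status {response.status_code} != {expected['status']}")
    body = response.text
    for needle in expected.get("body_contains", []):
        if needle not in body:
            reasons.append(f"missing from body: {needle}")
    for needle in expected.get("body_not_contains", []):
        if needle in body:
            reasons.append(f"unexpected in body: {needle}")
    return TestResult(passed=not reasons, reasons=reasons)

# Stub standing in for an httpx.Response
@dataclass
class FakeResponse:
    status_code: int
    text: str

result = evaluate_response(
    FakeResponse(400, '{"error": "quantity must be positive"}'),
    {"status": 400, "body_contains": ["quantity"], "body_not_contains": ["created"]},
)
```

Collecting every mismatch as a reason, instead of failing on the first, gives the orchestrator richer failure detail to merge into the report.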
Orchestrator Communication Protocol
The orchestrator and specialists communicate through structured messages:
from dataclasses import dataclass

@dataclass
class TaskAssignment:
    """Message from orchestrator to specialist."""
    task_id: str
    agent_type: str        # "ui", "api", "perf", "security"
    scenarios: list[str]   # Natural language test scenarios
    constraints: dict      # Time budget, step limits, etc.
    priority: int          # 0=low, 1=normal, 2=high

@dataclass
class TaskResult:
    """Message from specialist to orchestrator."""
    task_id: str
    agent_type: str
    tests_executed: int
    tests_passed: int
    tests_failed: int
    failures: list[dict]   # {scenario, reason, screenshot}
    duration_seconds: float
    tokens_used: int
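With the messages defined, the round-trip is straightforward to sketch. Below, `run_specialist` is a hypothetical stand-in for real dispatch; it shows how the orchestrator can fan `TaskAssignment` messages out concurrently and collect one `TaskResult` per task.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

@dataclass
class TaskAssignment:
    task_id: str
    agent_type: str
    scenarios: list
    constraints: dict = field(default_factory=dict)
    priority: int = 1

@dataclass
class TaskResult:
    task_id: str
    agent_type: str
    tests_executed: int
    tests_passed: int
    tests_failed: int
    failures: list
    duration_seconds: float
    tokens_used: int

def run_specialist(task: TaskAssignment) -> TaskResult:
    start = time.monotonic()
    # A real specialist executes the scenarios here; this stub passes everything
    n = len(task.scenarios)
    return TaskResult(task.task_id, task.agent_type, n, n, 0, [],
                      time.monotonic() - start, tokens_used=0)

tasks = [
    TaskAssignment("t1", "ui", ["empty-cart checkout"]),
    TaskAssignment("t2", "api", ["POST /orders 400", "GET /orders 200"]),
]
with ThreadPoolExecutor() as pool:
    results = list(pool.map(run_specialist, tasks))
```

Keeping the messages as plain dataclasses also means they serialize cleanly to JSON, so specialists can just as easily run in separate processes or on separate machines.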
When to Use the Orchestrator Pattern
Best for:
- Multi-layer testing (UI + API + performance in one run)
- Feature-level test orchestration (one feature, multiple test types)
- Centralized reporting across test domains
- Teams with distinct testing specialties
Risks:
- The orchestrator becomes a bottleneck if it makes poor delegation decisions
- Single point of failure: if the orchestrator agent fails, all testing stops
- Over-engineering: for simple test suites, a single agent is simpler
Mitigation:
- Always log the delegation rationale for debugging
- Implement fallback: if a specialist fails, the orchestrator continues with others
- Set per-specialist timeouts so one slow specialist does not block the report
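The last two mitigations compose naturally. One way to realize them, assuming specialists expose `execute_scenarios` as above (`delegate_with_timeouts` and the stub specialists are hypothetical): a hung or crashing specialist is recorded as skipped rather than blocking the merged report.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def delegate_with_timeouts(plan, specialists, timeout_s=0.5):
    results = {}
    with ThreadPoolExecutor() as pool:
        futures = {
            agent_type: pool.submit(specialists[agent_type].execute_scenarios, scenarios)
            for agent_type, scenarios in plan.items()
            if agent_type in specialists
        }
        for agent_type, future in futures.items():
            try:
                results[agent_type] = future.result(timeout=timeout_s)
            except FutureTimeout:
                results[agent_type] = {"skipped": f"{agent_type} exceeded {timeout_s}s"}
            except Exception as exc:
                # Fallback: one failed specialist does not stop the run
                results[agent_type] = {"skipped": f"{agent_type} crashed: {exc}"}
    return results

class FastSpecialist:
    def execute_scenarios(self, scenarios):
        return {"passed": len(scenarios)}

class SlowSpecialist:
    def execute_scenarios(self, scenarios):
        time.sleep(2)  # simulates a hung browser session
        return scenarios

results = delegate_with_timeouts(
    {"ui": ["a"], "perf": ["b"]},
    {"ui": FastSpecialist(), "perf": SlowSpecialist()},
    timeout_s=0.2,
)
```

The skipped entries use the same shape the orchestrator's `merge_results` already recognizes, so timed-out specialists flow through the existing merge path unchanged.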
Orchestrator vs Manual Test Coordination
| Aspect | Manual Coordination | Orchestrator Agent |
|---|---|---|
| Test planning | QA lead writes test plan | Agent analyzes spec, generates plan |
| Delegation | Lead assigns to team members | Agent delegates to specialists |
| Parallel execution | Depends on team availability | All specialists run concurrently |
| Result aggregation | Lead manually collects results | Agent merges automatically |
| Consistency | Varies by team member | Consistent (same prompts, same standards) |
| Speed | Hours to days | Minutes |
| Adaptability | Manual re-planning | Agent re-plans if specialists report issues |
Key Takeaway
The Orchestrator pattern is the most predictable multi-agent architecture. It maps naturally to how QA teams already work (lead + specialists) but executes at machine speed. The key design decision is how much intelligence to put in the orchestrator (plan generation, conflict resolution) versus the specialists (domain expertise, test execution).