The Orchestrator Pattern
Why Single Agents Hit Limits
Single agents run into three limits: they lose context over long sessions, they cannot parallelize work, and they conflate generation with evaluation. Multi-agent systems address these limits by splitting responsibilities across specialized agents.
The Orchestrator pattern uses one coordinator agent to delegate work to specialist agents and merge their results. It is the most structured and predictable multi-agent pattern.
Architecture
              +------------------+
              |   ORCHESTRATOR   |
              |  (Coordinator)   |
              +--------+---------+
                       |
       +---------------+---------------+
       |               |               |
 +-----+-----+   +-----+-----+   +-----+-----+
 |  Agent:   |   |  Agent:   |   |  Agent:   |
 | UI Tests  |   | API Tests |   | Perf Tests|
 +-----------+   +-----------+   +-----------+
The orchestrator:
- Receives a feature specification or test plan
- Analyzes it to determine which test types are needed
- Delegates scenarios to specialist agents
- Collects and merges results
- Reports a unified test outcome
Implementation
import json

class OrchestratorAgent:
    def __init__(self, specialists: dict[str, Agent]):
        self.specialists = specialists  # {"ui": UIAgent, "api": APIAgent, ...}
        self.llm = get_llm()

    def plan_and_execute(self, feature_spec: str) -> TestReport:
        # Step 1: Analyze the spec and create a test plan
        plan_json = self.llm.generate(f"""
        Given this feature specification:
        {feature_spec}

        Determine which test types are needed:
        - UI tests (if there are user-facing changes)
        - API tests (if there are endpoint changes)
        - Performance tests (if there are SLA requirements)
        - Security tests (if there are auth/data changes)

        Output a JSON plan: {{"ui": [...scenarios], "api": [...scenarios], ...}}
        """)
        plan = json.loads(plan_json)  # the LLM returns a string; parse it before iterating

        # Step 2: Delegate to specialists
        results = {}
        for agent_type, scenarios in plan.items():
            if agent_type not in self.specialists:
                results[agent_type] = {"skipped": f"No specialist for {agent_type}"}
                continue
            specialist = self.specialists[agent_type]
            results[agent_type] = specialist.execute_scenarios(scenarios)

        # Step 3: Merge and resolve conflicts
        return self.merge_results(results)

    def merge_results(self, results: dict) -> TestReport:
        """Merge results from multiple specialists into a unified report."""
        all_tests = []
        all_failures = []
        all_coverage = {}
        for agent_type, agent_results in results.items():
            if isinstance(agent_results, dict) and "skipped" in agent_results:
                continue
            all_tests.extend(agent_results.tests)
            all_failures.extend(agent_results.failures)
            all_coverage[agent_type] = agent_results.coverage
        return TestReport(
            total_tests=len(all_tests),
            total_failures=len(all_failures),
            results_by_type=results,
            coverage=all_coverage,
            overall_status="FAIL" if all_failures else "PASS",
        )
Specialist Agent Design
Each specialist agent is optimized for its domain:
UI Test Specialist
class UITestSpecialist(Agent):
    def __init__(self, browser, llm):
        self.browser = browser
        self.llm = llm

    def execute_scenarios(self, scenarios: list[str]) -> AgentResults:
        results = []
        for scenario in scenarios:
            # Each scenario is a natural language description
            # e.g., "Verify the checkout button is disabled when cart is empty"
            result = self.react_loop(
                objective=scenario,
                tools=["navigate", "click", "type", "text", "screenshot"],
                max_steps=20
            )
            results.append(result)
        return AgentResults(tests=results, failures=[r for r in results if r.failed])
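The `react_loop` above is the specialist's inner engine. A minimal sketch of its shape, assuming the real implementation prompts an LLM each step to pick the next tool call; here a scripted `policy` function stands in for the LLM, and `StepResult` is a hypothetical result type:

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    failed: bool
    steps: list

def react_loop(objective, tools, policy, max_steps=20):
    """Observe -> act until the policy declares pass/fail or the budget runs out."""
    history = []
    for _ in range(max_steps):
        action = policy(objective, history)  # real agent: LLM chooses a tool call
        if action["tool"] not in tools and action["tool"] not in ("pass", "fail"):
            return StepResult(failed=True, steps=history)  # hallucinated tool
        history.append(action)
        if action["tool"] == "pass":
            return StepResult(failed=False, steps=history)
        if action["tool"] == "fail":
            return StepResult(failed=True, steps=history)
    return StepResult(failed=True, steps=history)  # step budget exhausted

# Scripted policy: navigate, read the page text, then assert success
script = iter([{"tool": "navigate"}, {"tool": "text"}, {"tool": "pass"}])
result = react_loop(
    "Verify the checkout button is disabled when cart is empty",
    ["navigate", "click", "type", "text", "screenshot"],
    lambda objective, history: next(script),
)
```

The `max_steps` cap matters: without it, a confused agent can loop on the same element indefinitely, and exhausting the budget is itself reported as a failure.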
API Test Specialist
import json
import httpx

class APITestSpecialist(Agent):
    def __init__(self, base_url: str, llm):
        self.base_url = base_url
        self.llm = llm

    def execute_scenarios(self, scenarios: list[str]) -> AgentResults:
        results = []
        for scenario in scenarios:
            # e.g., "Verify POST /orders returns 400 when quantity is 0"
            result = self.execute_api_test(scenario)
            results.append(result)
        return AgentResults(tests=results, failures=[r for r in results if r.failed])

    def execute_api_test(self, scenario: str) -> TestResult:
        # Use the LLM to determine the HTTP request and the expected response
        spec = json.loads(self.llm.generate(f"""
        Scenario: {scenario}
        Base URL: {self.base_url}

        Generate the test as JSON with two keys:
        {{"request": {{"method": "...", "path": "...", "headers": {{...}}, "body": {{...}}}},
         "expected": {{"status": ..., "body_contains": [...], "body_not_contains": [...]}}}}
        """))
        request_spec = spec["request"]

        # Execute and evaluate
        response = httpx.request(
            method=request_spec["method"],
            url=f"{self.base_url}{request_spec['path']}",
            headers=request_spec.get("headers", {}),
            json=request_spec.get("body"),
        )
        return self.evaluate_response(response, spec["expected"])
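The `evaluate_response` helper is not shown above; one plausible sketch, assuming it simply checks the status code and the `body_contains` / `body_not_contains` lists from the expectation (`TestResult` and `FakeResponse` are hypothetical stand-ins; the real code would receive an `httpx.Response`):

```python
from dataclasses import dataclass

@dataclass
class TestResult:
    passed: bool
    reasons: list

def evaluate_response(response, expected) -> TestResult:
    reasons = []
    if response.status_code != expected["status"]:
        reasons.append(f"status {response.status_code} != {expected['status']}")
    body = response.text
    for needle in expected.get("body_contains", []):
        if needle not in body:
            reasons.append(f"missing from body: {needle}")
    for needle in expected.get("body_not_contains", []):
        if needle in body:
            reasons.append(f"unexpected in body: {needle}")
    return TestResult(passed=not reasons, reasons=reasons)

# Stub standing in for an httpx.Response
@dataclass
class FakeResponse:
    status_code: int
    text: str

result = evaluate_response(
    FakeResponse(400, '{"error": "quantity must be positive"}'),
    {"status": 400, "body_contains": ["quantity"], "body_not_contains": ["created"]},
)
```

Collecting every mismatch as a reason, instead of failing on the first, gives the orchestrator richer failure detail to merge into the report.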
Orchestrator Communication Protocol
The orchestrator and specialists communicate through structured messages:
from dataclasses import dataclass

@dataclass
class TaskAssignment:
    """Message from orchestrator to specialist."""
    task_id: str
    agent_type: str        # "ui", "api", "perf", "security"
    scenarios: list[str]   # Natural language test scenarios
    constraints: dict      # Time budget, step limits, etc.
    priority: int          # 0=low, 1=normal, 2=high

@dataclass
class TaskResult:
    """Message from specialist to orchestrator."""
    task_id: str
    agent_type: str
    tests_executed: int
    tests_passed: int
    tests_failed: int
    failures: list[dict]   # {scenario, reason, screenshot}
    duration_seconds: float
    tokens_used: int
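With the messages defined, the round-trip is straightforward to sketch. Below, `run_specialist` is a hypothetical stand-in for real dispatch; it shows how the orchestrator can fan `TaskAssignment` messages out concurrently and collect one `TaskResult` per task.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

@dataclass
class TaskAssignment:
    task_id: str
    agent_type: str
    scenarios: list
    constraints: dict = field(default_factory=dict)
    priority: int = 1

@dataclass
class TaskResult:
    task_id: str
    agent_type: str
    tests_executed: int
    tests_passed: int
    tests_failed: int
    failures: list
    duration_seconds: float
    tokens_used: int

def run_specialist(task: TaskAssignment) -> TaskResult:
    start = time.monotonic()
    # A real specialist executes the scenarios here; this stub passes everything
    n = len(task.scenarios)
    return TaskResult(task.task_id, task.agent_type, n, n, 0, [],
                      time.monotonic() - start, tokens_used=0)

tasks = [
    TaskAssignment("t1", "ui", ["empty-cart checkout"]),
    TaskAssignment("t2", "api", ["POST /orders 400", "GET /orders 200"]),
]
with ThreadPoolExecutor() as pool:
    results = list(pool.map(run_specialist, tasks))
```

Keeping the messages as plain dataclasses also means they serialize cleanly to JSON, so specialists can just as easily run in separate processes or on separate machines.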
When to Use the Orchestrator Pattern
Best for:
- Multi-layer testing (UI + API + performance in one run)
- Feature-level test orchestration (one feature, multiple test types)
- Centralized reporting across test domains
- Teams with distinct testing specialties
Risks:
- The orchestrator becomes a bottleneck if it makes poor delegation decisions
- Single point of failure: if the orchestrator agent fails, all testing stops
- Over-engineering: for simple test suites, a single agent is simpler
Mitigation:
- Always log the delegation rationale for debugging
- Implement fallback: if a specialist fails, the orchestrator continues with others
- Set per-specialist timeouts so one slow specialist does not block the report
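The last two mitigations compose naturally. One way to realize them, assuming specialists expose `execute_scenarios` as above (`delegate_with_timeouts` and the stub specialists are hypothetical): a hung or crashing specialist is recorded as skipped rather than blocking the merged report.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def delegate_with_timeouts(plan, specialists, timeout_s=0.5):
    results = {}
    with ThreadPoolExecutor() as pool:
        futures = {
            agent_type: pool.submit(specialists[agent_type].execute_scenarios, scenarios)
            for agent_type, scenarios in plan.items()
            if agent_type in specialists
        }
        for agent_type, future in futures.items():
            try:
                results[agent_type] = future.result(timeout=timeout_s)
            except FutureTimeout:
                results[agent_type] = {"skipped": f"{agent_type} exceeded {timeout_s}s"}
            except Exception as exc:
                # Fallback: one failed specialist does not stop the run
                results[agent_type] = {"skipped": f"{agent_type} crashed: {exc}"}
    return results

class FastSpecialist:
    def execute_scenarios(self, scenarios):
        return {"passed": len(scenarios)}

class SlowSpecialist:
    def execute_scenarios(self, scenarios):
        time.sleep(2)  # simulates a hung browser session
        return scenarios

results = delegate_with_timeouts(
    {"ui": ["a"], "perf": ["b"]},
    {"ui": FastSpecialist(), "perf": SlowSpecialist()},
    timeout_s=0.2,
)
```

The skipped entries use the same shape the orchestrator's `merge_results` already recognizes, so timed-out specialists flow through the existing merge path unchanged.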
Orchestrator vs Manual Test Coordination
| Aspect | Manual Coordination | Orchestrator Agent |
|---|---|---|
| Test planning | QA lead writes test plan | Agent analyzes spec, generates plan |
| Delegation | Lead assigns to team members | Agent delegates to specialists |
| Parallel execution | Depends on team availability | All specialists run concurrently |
| Result aggregation | Lead manually collects results | Agent merges automatically |
| Consistency | Varies by team member | Consistent (same prompts, same standards) |
| Speed | Hours to days | Minutes |
| Adaptability | Manual re-planning | Agent re-plans if specialists report issues |
Key Takeaway
The Orchestrator pattern is the most predictable multi-agent architecture. It maps naturally to how QA teams already work (lead + specialists) but executes at machine speed. The key design decision is how much intelligence to put in the orchestrator (plan generation, conflict resolution) versus the specialists (domain expertise, test execution).