Practical ReAct Implementation
From Prototype to Production
The core ReAct loop is straightforward to understand but requires careful engineering to use in production. This file covers the practical concerns: error handling, prompt optimization, history management, and testing the test agent itself.
Robust Error Handling
A production ReAct agent must handle three categories of errors:
Category 1: Agent Errors (the agent makes a bad decision)
```python
class AgentDecisionError(Exception):
    """Agent produced an unparseable or invalid action."""
    pass

def parse_decision(self, raw_response: str, allow_retry: bool = True) -> Action:
    """Parse the LLM's response into a structured action."""
    raw = raw_response.strip()

    # Handle malformed responses
    if not raw:
        raise AgentDecisionError("Empty response from LLM")

    # Try to parse known action formats
    for prefix in ["NAVIGATE", "CLICK", "TYPE", "ASSERT", "DONE"]:
        if raw.upper().startswith(prefix):
            return Action(type=prefix, payload=raw[len(prefix):].strip())

    # If no known action matches, ask the LLM to retry -- once. The
    # allow_retry flag prevents unbounded recursion if the retry also fails.
    if not allow_retry:
        raise AgentDecisionError(f"Unparseable action after retry: {raw!r}")

    retry_prompt = f"""
Your previous response was not a valid action: "{raw}"
Please respond with exactly one of:
- NAVIGATE <url>
- CLICK <selector>
- TYPE <selector> <text>
- ASSERT <condition>
- DONE <pass|fail> <reason>
"""
    retry_response = self.llm.generate(retry_prompt)
    return self.parse_decision(retry_response, allow_retry=False)  # One retry only
```
Category 2: Environment Errors (the system under test fails)
```python
def execute_with_recovery(self, action: Action) -> ActionResult:
    """Execute an action with environment error recovery."""
    try:
        return self.execute(action)
    except ElementNotFoundError as e:
        # Try alternative selectors (for CLICK/TYPE, the payload holds the selector)
        alternatives = self.browser.find_similar(action.payload)
        if alternatives:
            return ActionResult(
                status="recovered",
                message=f"Original selector '{action.payload}' not found. "
                        f"Found alternatives: {alternatives}",
                alternatives=alternatives,
            )
        return ActionResult(status="failed", message=str(e))
    except TimeoutError:
        # Take a screenshot for debugging
        screenshot = self.browser.screenshot(f"timeout_step_{self.step_count}.png")
        return ActionResult(
            status="timeout",
            message="Action timed out",
            screenshot=screenshot,
        )
    except NavigationError as e:
        # Page failed to load
        return ActionResult(
            status="nav_error",
            message=f"Navigation failed: {e}",
            url=self.browser.current_url(),
        )
```
Category 3: Infrastructure Errors (the agent infrastructure fails)
```python
def run_with_infrastructure_safety(self, objective: str) -> TestResult:
    """Run a test with infrastructure-level safety nets."""
    try:
        return self.run(objective)
    except LLMRateLimitError:
        return TestResult(status="ABORTED", reason="LLM rate limit exceeded")
    except LLMTimeoutError:
        return TestResult(status="ABORTED", reason="LLM response timeout")
    except BrowserCrashError:
        return TestResult(status="ABORTED", reason="Browser process crashed")
    except Exception as e:
        # Catch-all for unexpected errors
        return TestResult(
            status="ERROR",
            reason=f"Unexpected infrastructure error: {type(e).__name__}: {e}",
            screenshot=self.safe_screenshot(),
        )
```
Prompt Optimization for Speed and Quality
The THINK phase prompt is the most critical component. Every extra token in the prompt costs money and latency.
Minimal Prompt (Fast, Cheap, Less Accurate)
```python
minimal_prompt = f"""
Goal: {objective}
URL: {observation.url}
Text: {observation.text[:500]}
Last action: {self.history[-1] if self.history else 'none'}
Next action (NAVIGATE/CLICK/TYPE/ASSERT/DONE):"""
```
Use when: Smoke tests, high-confidence flows, cost-sensitive environments.
Standard Prompt (Balanced)
```python
standard_prompt = f"""
Objective: {objective}
Current URL: {observation.url}
Page content (truncated): {observation.text[:2000]}
Console errors: {observation.errors}
History: {self.history[-5:]}
What should I do next? Choose one:
- NAVIGATE <url>
- CLICK <selector>
- TYPE <selector> <text>
- ASSERT <condition>
- DONE <pass|fail> <reason>
"""
```
Use when: Standard testing, staging environments.
Detailed Prompt (Slow, Expensive, Most Accurate)
```python
detailed_prompt = f"""
You are a senior QA engineer testing: {objective}

CURRENT STATE:
URL: {observation.url}
Title: {observation.title}
Visible text: {observation.text[:3000]}
Visible buttons: {observation.buttons}
Form fields: {observation.form_fields}
Console errors: {observation.errors}
Network failures: {observation.network_errors}

TEST HISTORY (last 10 steps):
{json.dumps(self.history[-10:], indent=2)}

CONSTRAINTS:
- Steps remaining: {self.max_steps - self.step_count}
- Allowed domains: {self.config.allowed_domains}
- Do not click delete/remove buttons unless the objective requires it

REASONING REQUIRED:
1. Analyze the current state
2. Identify what has changed since the last step
3. Decide the most efficient next action toward the objective
4. Explain your reasoning in one sentence

ACTION (choose one):
NAVIGATE <url>
CLICK <selector>
TYPE <selector> <text>
ASSERT <condition>
DONE <pass|fail> <reason>
"""
```
Use when: Complex flows, exploratory testing, debugging sessions.
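The three tiers differ mainly in how much context they spend tokens on. One way to avoid maintaining three separate prompt templates is to parameterize a single builder by a tier's context budget. A sketch of that idea; the tier names, field names, and numbers are illustrative (chosen to mirror the three prompts above), not part of any library:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptTier:
    text_chars: int     # how much page text to include
    history_steps: int  # how many history entries to include
    reasoning: bool     # whether to ask for an explicit reasoning step

TIERS = {
    "minimal":  PromptTier(text_chars=500,  history_steps=1,  reasoning=False),
    "standard": PromptTier(text_chars=2000, history_steps=5,  reasoning=False),
    "detailed": PromptTier(text_chars=3000, history_steps=10, reasoning=True),
}

def build_prompt(tier_name: str, objective: str, url: str,
                 text: str, history: list) -> str:
    """Assemble a THINK prompt from a tier's context budget."""
    tier = TIERS[tier_name]
    parts = [f"Objective: {objective}",
             f"URL: {url}",
             f"Text: {text[:tier.text_chars]}"]
    if tier.history_steps:
        parts.append(f"History: {history[-tier.history_steps:]}")
    if tier.reasoning:
        parts.append("Explain your reasoning in one sentence, then act.")
    parts.append("ACTION (NAVIGATE/CLICK/TYPE/ASSERT/DONE):")
    return "\n".join(parts)
```

This makes the speed/quality tradeoff a one-line config change per environment rather than a template rewrite.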
History Management Strategies
The history buffer directly affects agent performance. Too little history causes loops; too much wastes tokens.
Strategy 1: Fixed Window
```python
# Keep the last N steps
self.history = self.history[-5:]
```
Simple and predictable. Works well for tests under 15 steps.
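Since too little history is what causes loops, a fixed window pairs well with an explicit loop check: if the same action dominates the recent window, abort rather than burn steps. A minimal sketch, assuming history entries are dicts with an `"action"` key as in the examples above:

```python
from collections import Counter

def is_looping(history: list[dict], window: int = 5, threshold: int = 3) -> bool:
    """Heuristic loop detector: flag when one action repeats `threshold`
    times within the last `window` steps."""
    recent = [step.get("action", "") for step in history[-window:]]
    if not recent:
        return False
    _, count = Counter(recent).most_common(1)[0]
    return count >= threshold
```

Call it once per iteration of the loop and return a TIMEOUT-style result when it fires, instead of waiting for `max_steps`.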
Strategy 2: Summarized History
```python
def summarize_history(self) -> str:
    """Compress history into a summary to save tokens."""
    if len(self.history) <= 3:
        return json.dumps(self.history)

    # Keep first step, a summary of the middle, and the last 2 steps
    summary = f"Started at {self.history[0]['state_url']}. "
    summary += f"Took {len(self.history) - 3} intermediate steps. "
    if any(h.get('result') == 'error' for h in self.history[1:-2]):
        summary += "Encountered errors along the way. "
    summary += f"Recent actions: {json.dumps(self.history[-2:])}"
    return summary
```
Saves tokens while preserving context for long-running agents.
Strategy 3: Milestone-Based History
```python
def milestone_history(self) -> list[dict]:
    """Keep only steps that represent significant state changes."""
    milestones = []
    last_url = None
    for step in self.history:
        # URL changed = navigation milestone
        if step.get("state_url") != last_url:
            milestones.append(step)
            last_url = step.get("state_url")
        # Error occurred = error milestone
        elif step.get("result") == "error":
            milestones.append(step)
        # Assertion made = assertion milestone
        elif "ASSERT" in step.get("action", ""):
            milestones.append(step)
    return milestones[-5:]  # Keep last 5 milestones
```
Best for long multi-page workflows where only navigation and assertions matter.
Testing the Test Agent
An often-overlooked concern: how do you test your ReAct agent itself?
Unit Testing the Decision Parser
```python
class TestDecisionParser:
    def test_parses_navigate_command(self):
        action = agent.parse_decision("NAVIGATE https://example.com")
        assert action.type == "NAVIGATE"
        assert action.payload == "https://example.com"

    def test_parses_click_with_complex_selector(self):
        action = agent.parse_decision('CLICK button[data-testid="submit"]')
        assert action.type == "CLICK"
        assert action.payload == 'button[data-testid="submit"]'

    def test_handles_malformed_response(self):
        with pytest.raises(AgentDecisionError):
            agent.parse_decision("I think we should click the button")

    def test_handles_empty_response(self):
        with pytest.raises(AgentDecisionError):
            agent.parse_decision("")
```
Integration Testing with Mock LLM
```python
class TestReActAgentFlow:
    def test_completes_login_flow(self, mock_browser):
        """Agent should complete a login flow with a scripted LLM."""
        mock_llm = ScriptedLLM([
            'NAVIGATE https://app.example.com/login',
            'TYPE input[name=email] test@test.com',
            'TYPE input[name=password] secret',
            'CLICK button[type=submit]',
            'DONE pass "Successfully navigated to dashboard"',
        ])
        agent = ReActTestAgent(llm=mock_llm, browser=mock_browser)
        result = agent.run("Log in with valid credentials")
        assert result.status == "pass"
        assert result.steps_taken == 5

    def test_times_out_on_stuck_flow(self, mock_browser):
        """Agent should time out if it cannot complete the objective."""
        mock_llm = ScriptedLLM([
            'CLICK #nonexistent-button',  # Repeated forever
        ] * 20)
        agent = ReActTestAgent(llm=mock_llm, browser=mock_browser, max_steps=5)
        result = agent.run("Click the submit button")
        assert result.status == "TIMEOUT"
        assert result.steps_taken == 5
```
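The tests above assume a `ScriptedLLM` helper that plays back canned responses instead of calling a model. A minimal sketch of one, with the repeat-last-response behavior the timeout test relies on:

```python
class ScriptedLLM:
    """Scripted stand-in for an LLM client: returns canned responses in
    order, repeating the last one if the agent asks for more steps than
    were scripted."""

    def __init__(self, responses: list[str]):
        self.responses = responses
        self.index = 0

    def generate(self, prompt: str) -> str:
        # Clamp to the last response so a stuck agent keeps getting the
        # same (bad) action until max_steps trips.
        response = self.responses[min(self.index, len(self.responses) - 1)]
        self.index += 1
        return response
```

Because it implements the same `generate(prompt)` interface as the real client, the agent under test needs no changes.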
Regression Testing Agent Behavior
Record agent decisions on a known-good run, then replay to detect regressions:
```python
class TestAgentRegression:
    def test_login_agent_matches_baseline(self, real_browser, real_llm):
        """Agent decisions should not drift from the established baseline."""
        agent = ReActTestAgent(llm=real_llm, browser=real_browser)
        result = agent.run("Log in with test@test.com / password123")

        # The decision sequence from this run
        decisions = [h["action"] for h in agent.history]

        # Compare against a baseline saved from a known-good run
        baseline = load_baseline("login_test_baseline.json")

        # Allow some flexibility (exact decisions may vary between runs)
        assert result.status == baseline["status"]
        assert len(decisions) <= baseline["max_steps"] * 1.5  # Allow 50% more steps
```
Performance Benchmarks
| Configuration | Steps | LLM Latency | Total Time | Token Cost |
|---|---|---|---|---|
| Minimal prompt, Haiku | 8 | 100ms/step | 3s | $0.005 |
| Standard prompt, Sonnet | 12 | 300ms/step | 8s | $0.04 |
| Detailed prompt, Opus | 15 | 500ms/step | 15s | $0.15 |
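Numbers like those in the table come from timing repeated runs of each configuration. A small harness for collecting them, as a sketch: `run_fn` is assumed to be any callable that runs the agent and returns a result object with a `steps_taken` attribute (as in the tests above); token cost would come from your LLM client's usage metadata, which is provider-specific and omitted here:

```python
import time

def benchmark(run_fn, objective: str, trials: int = 3) -> dict:
    """Time repeated agent runs and report per-run and per-step averages."""
    timings, steps = [], []
    for _ in range(trials):
        start = time.perf_counter()
        result = run_fn(objective)
        timings.append(time.perf_counter() - start)
        steps.append(result.steps_taken)
    return {
        "avg_total_s": sum(timings) / trials,
        "avg_steps": sum(steps) / trials,
        "avg_s_per_step": sum(timings) / max(sum(steps), 1),
    }
```

Run it once per prompt/model configuration on the same objective to fill in a comparison table for your own application; absolute numbers will differ from the table above.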
Key Takeaway
A production ReAct agent requires more than the core loop. It needs robust error handling (agent errors, environment errors, infrastructure errors), optimized prompts for the right speed/quality tradeoff, smart history management to prevent token waste, and -- critically -- tests for the agent itself. The engineering around the loop is as important as the loop itself.