# The ReAct Core Loop for Testing

## From Scripts to Agents: A Fundamental Shift
Traditional test automation is imperative: you write step-by-step scripts that execute deterministically. Agentic testing is fundamentally different. An agent observes the state of the system, reasons about what to do next, acts, and evaluates the outcome -- all within a loop that can adapt to unexpected situations.
ReAct (Reason + Act) is the foundational pattern for agentic systems. Originally described for general-purpose LLM agents, it maps directly to testing workflows.
## The Four Phases
```
+-----------------------------------------------------+
|                                                     |
|   OBSERVE --> THINK --> ACT --> EVALUATE --> LOOP   |
|      |          |        |         |                |
|   Read page   Decide   Execute  Check if            |
|   state,      what to  the      assertion           |
|   logs,       test     action   passed or           |
|   errors      next              failed              |
|                                                     |
+-----------------------------------------------------+
```
### Phase 1: OBSERVE
The agent gathers information about the current state of the system under test. In browser testing, this means reading the page content, URL, console errors, and network activity. In API testing, it means reading response status codes, headers, and bodies.
```python
from dataclasses import dataclass

@dataclass
class Observation:
    url: str                          # Current page URL
    text: str                         # Visible text content (truncated)
    errors: list[str]                 # Console errors
    network_requests: list[dict]      # Recent HTTP requests
    screenshot_path: str | None       # Path to current screenshot
    page_title: str                   # Document title
```
**Key insight:** The quality of the observation determines the quality of the agent's reasoning. A rich observation (text + errors + network) produces better decisions than a sparse one (just the URL).
### Phase 2: THINK
The agent analyzes the observation and decides what to do next. This is where the LLM's reasoning ability is essential. It considers:
- The test objective (what are we trying to verify?)
- The current state (where are we now?)
- The history (what have we already tried?)
- The constraints (how many steps remain? what actions are allowed?)
```python
prompt = f"""
Objective: {test_objective}
Current URL: {observation.url}
Page content (truncated): {observation.text[:2000]}
Console errors: {observation.errors}
History (last 5 actions): {self.history[-5:]}

What should I do next? Choose one:
- NAVIGATE <url>
- CLICK <selector>
- TYPE <selector> <text>
- ASSERT <condition>
- DONE <pass|fail> <reason>
"""
decision = self.llm.generate(prompt)
```
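The model's output is free text, so in practice it pays to validate the decision against the allowed action grammar before acting on it. A minimal sketch (the `parse_decision` helper and its argument-count table are illustrative assumptions, not part of the prompt format above):

```python
# Allowed verbs mapped to how many arguments each expects.
# TYPE and DONE take a trailing free-text argument, so we split at most
# once more after the verb and first argument.
ACTIONS = {
    "NAVIGATE": 1,
    "CLICK": 1,
    "TYPE": 2,
    "ASSERT": 1,
    "DONE": 2,
}

def parse_decision(decision: str) -> tuple[str, list[str]]:
    """Split an LLM decision into (verb, args), rejecting anything
    outside the allowed action grammar."""
    line = decision.strip().splitlines()[0]  # models sometimes append prose
    verb, _, rest = line.partition(" ")
    if verb not in ACTIONS:
        raise ValueError(f"Unknown action: {verb!r}")
    args = rest.split(" ", ACTIONS[verb] - 1) if rest else []
    if len(args) != ACTIONS[verb]:
        raise ValueError(
            f"{verb} expects {ACTIONS[verb]} argument(s), got {len(args)}")
    return verb, args
```

Rejecting malformed decisions early lets the agent re-prompt instead of executing garbage against the system under test.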
### Phase 3: ACT
The agent executes the decided action against the system under test. This is the only phase that changes state.
```python
def execute(self, decision: str) -> ActionResult:
    # ActionResult.success()/failure() are assumed helpers on the result type.
    if decision.startswith("NAVIGATE"):
        url = decision.split(" ", 1)[1]
        self.browser.navigate(url)
        return ActionResult.success()
    elif decision.startswith("CLICK"):
        selector = decision.split(" ", 1)[1]
        self.browser.click(selector)
        return ActionResult.success()
    elif decision.startswith("TYPE"):
        _, selector, text = decision.split(" ", 2)
        self.browser.type(selector, text)
        return ActionResult.success()
    elif decision.startswith("ASSERT"):
        condition = decision.split(" ", 1)[1]
        return self.evaluate_assertion(condition)
    elif decision.startswith("DONE"):
        return self.parse_final_result(decision)
    # The model produced something outside the action grammar
    return ActionResult.failure(f"Unrecognized action: {decision!r}")
```
### Phase 4: EVALUATE
After acting, the agent evaluates whether the action succeeded and whether the test objective is met. If the objective is not yet met, the loop continues.
```python
if result.is_terminal:
    return TestResult(
        status=result.status,
        reason=result.reason,
        steps_taken=self.step_count,
        history=self.history,
    )
# Otherwise, loop back to OBSERVE
```
## A Complete ReAct Agent Implementation
```python
class ReActTestAgent:
    def __init__(self, llm, browser, max_steps=20):
        self.llm = llm
        self.browser = browser
        self.max_steps = max_steps
        self.history = []

    def run(self, test_objective: str) -> TestResult:
        """Execute a test using the ReAct loop."""
        self.history = []  # Start each run with a clean history
        for step in range(self.max_steps):
            # OBSERVE
            observation = self.browser.get_state()

            # THINK
            prompt = f"""
Objective: {test_objective}
Current URL: {observation.url}
Page content (truncated): {observation.text[:2000]}
Console errors: {observation.errors}
History (last 5 actions): {self.history[-5:]}

What should I do next? Choose one:
- NAVIGATE <url>
- CLICK <selector>
- TYPE <selector> <text>
- ASSERT <condition>
- DONE <pass|fail> <reason>
"""
            decision = self.llm.generate(prompt)

            # ACT
            action_result = self.execute(decision)
            self.history.append({
                "step": step,
                "observation": observation.summary,
                "decision": decision,
                "result": action_result,
            })

            # EVALUATE
            if decision.startswith("DONE"):
                return self.parse_final_result(decision)

        return TestResult(
            status="TIMEOUT",
            reason=f"Exceeded {self.max_steps} steps",
        )
```
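To see the loop run end to end without a real model or browser, the sketch below wires a compressed version of the agent to hand-written stubs. `MiniReActAgent`, `StubLLM`, and `StubBrowser` are illustrative assumptions, and the ACT phase is elided to keep the sketch short:

```python
from dataclasses import dataclass

@dataclass
class TestResult:
    status: str
    reason: str
    steps_taken: int

class MiniReActAgent:
    """A trimmed-down ReAct loop: observe, think, record, evaluate."""
    def __init__(self, llm, browser, max_steps=20):
        self.llm, self.browser, self.max_steps = llm, browser, max_steps
        self.history = []

    def run(self, objective: str) -> TestResult:
        for step in range(self.max_steps):
            obs = self.browser.get_state()                   # OBSERVE
            decision = self.llm.generate(
                f"Objective: {objective}\nURL: {obs.url}\n"
                f"History: {self.history[-5:]}")             # THINK
            self.history.append(decision)                    # ACT (elided)
            if decision.startswith("DONE"):                  # EVALUATE
                _, verdict, reason = decision.split(" ", 2)
                return TestResult(verdict, reason, step + 1)
        return TestResult("TIMEOUT",
                          f"Exceeded {self.max_steps} steps",
                          self.max_steps)

class StubBrowser:
    class _State:
        url = "https://app.example.com/login"
    def get_state(self):
        return self._State()

class StubLLM:
    """Scripted responses standing in for a real model."""
    def __init__(self):
        self.turn = 0
    def generate(self, prompt: str) -> str:
        self.turn += 1
        if self.turn == 1:
            return "TYPE input[name=email] test@example.com"
        return "DONE pass Login form renders as expected"

agent = MiniReActAgent(llm=StubLLM(), browser=StubBrowser(), max_steps=5)
result = agent.run("Verify the login form renders")
```

Swapping the stubs for a real LLM client and a browser driver turns this skeleton into the full agent above.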
## Why ReAct Matters for Testing
The key insight: the agent does not follow a fixed script. It adapts its next action based on what it observes. This creates capabilities that deterministic scripts cannot match:
| Capability | Script-Based Testing | ReAct Agent Testing |
|---|---|---|
| Missing element | Test fails immediately | Agent looks for alternative selector |
| Unexpected popup | Test fails or blocks | Agent dismisses popup and continues |
| Slow page load | Timeout (fixed wait) | Agent observes loading state, waits intelligently |
| Changed layout | Test fails (selector broken) | Agent reasons about new layout |
| Error state | Test follows happy path only | Agent investigates the error |
| New feature | Test ignores it | Agent notices and can explore |
### Example: Adaptive Error Investigation

```
Step 1: OBSERVE — Login page loaded, email and password fields visible
Step 2: THINK   — "I should fill in valid credentials and click login"
Step 3: ACT     — Type email, type password, click submit
Step 4: OBSERVE — Error banner appeared: "Account locked after 5 failed attempts"
Step 5: THINK   — "Unexpected error. This is not a test failure, it is test
                   environment state. The account was locked by a previous test
                   run. I should report this as an environment issue."
Step 6: ACT     — DONE fail "Account locked - environment state issue, not a bug"
```
A deterministic script would simply fail with "Expected dashboard URL, got login URL." The agent provides diagnostic information.
## The History Buffer
The history buffer is critical for preventing infinite loops and enabling learning within a test run.
```python
# Keep the last N steps to fit in the context window
self.history = self.history[-5:]

# Each history entry looks like:
{
    "step": 3,
    "observation": "Page shows login form with email/password fields",
    "decision": "TYPE input[name=email] test@example.com",
    "result": "success",
}
```
**Without history:** The agent might click the same button repeatedly, not remembering it already tried that.

**With history:** The agent reasons, "I already tried clicking the submit button and it did not navigate. Let me check if there is a validation error I missed."
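One practical use of the buffer is loop detection: if the last few decisions are identical, the agent is almost certainly stuck. A minimal sketch (the `is_looping` helper and its window size are illustrative assumptions):

```python
def is_looping(history: list[dict], window: int = 3) -> bool:
    """Return True if the last `window` decisions are identical -- a sign
    the agent is stuck repeating an action that does not change state."""
    recent = [entry["decision"] for entry in history[-window:]]
    return len(recent) == window and len(set(recent)) == 1
```

An agent can check this before each THINK phase and, when it fires, either inject a "you are repeating yourself" hint into the prompt or terminate with a diagnostic result.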
## Limitations of ReAct for Testing

- **Non-deterministic.** The same test objective may take different paths on different runs. This is a feature for exploration but a problem for regression testing.
- **Expensive.** Each step requires an LLM call. A 20-step test costs 20x more in tokens than a script.
- **Slow.** LLM latency (100-500ms per call) adds up. A 20-step agent takes 2-10 seconds just for reasoning, plus action execution time.
- **Hallucination risk.** The agent might "see" elements that do not exist or misinterpret page content.
- **Debugging difficulty.** When a ReAct agent fails, you have to trace through its reasoning history to understand why. This is harder than debugging a linear script.
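A small trace formatter softens the debugging problem: rendering the history as readable steps is the agent equivalent of reading a stack trace. A sketch, assuming history entries shaped like the example above (the `format_trace` helper is an illustrative assumption):

```python
def format_trace(history: list[dict]) -> str:
    """Render the agent's step history as a readable trace for
    post-mortem debugging."""
    lines = []
    for entry in history:
        lines.append(f"step {entry['step']}: saw {entry['observation']!r}")
        lines.append(f"  -> decided {entry['decision']!r} ({entry['result']})")
    return "\n".join(lines)
```

Printing this trace on failure gives reviewers the observation-decision pairs in order, which is usually enough to see where the agent's reasoning went wrong.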
## When to Use ReAct vs Scripts
| Use ReAct | Use Scripts |
|---|---|
| Exploratory testing | Regression testing |
| New feature discovery | Known flow verification |
| Self-healing test suites | CI/CD gate tests |
| Complex multi-step workflows | Simple CRUD operations |
| Environments with frequent UI changes | Stable environments |
## Key Takeaway
The ReAct loop is the building block of all agentic testing. It transforms testing from "follow these exact steps" to "achieve this objective by adapting to what you observe." Master this pattern and you understand the foundation of every agentic testing tool on the market.