# Reporting and Observability

## What to Capture

### Per-Test Artifacts

| Artifact | Format | Purpose |
|---|---|---|
| Commands executed | Text log | Reproducibility |
| Screenshots (each step) | PNG | Visual timeline |
| Page text (each step) | Text | Agent-readable state |
| Current URL (each step) | Text | Navigation tracking |
| Timing | JSON | Performance analysis |
| Pass/Fail result | JSON | CI gate |
| Error details | Text | Debugging |
### Per-Suite Artifacts

| Artifact | Format | Purpose |
|---|---|---|
| Summary report | JSON + text | Quick overview |
| Failure screenshots | PNG directory | Human review |
| Agent reasoning log | Markdown | Understanding agent decisions |
| Performance metrics | JSON | Trend analysis |
## Command Logging

Every `vibe-check` command the agent executes should be logged:

```text
[2026-02-09T14:23:01Z] vibe-check navigate https://app.example.com/login
  → exit 0, 1.2s
[2026-02-09T14:23:02Z] vibe-check type "input[name=email]" "user@test.com"
  → exit 0, 0.3s
[2026-02-09T14:23:03Z] vibe-check type "input[name=password]" "***"
  → exit 0, 0.2s
[2026-02-09T14:23:03Z] vibe-check click "button[type=submit]"
  → exit 0, 0.4s
[2026-02-09T14:23:04Z] vibe-check wait ".dashboard-title"
  → exit 0, 0.8s
[2026-02-09T14:23:05Z] vibe-check text ".dashboard-title"
  → exit 0, 0.1s
  → output: "Welcome, User"
```
This log enables:
- Replay — run the exact same commands to reproduce an issue
- Timing analysis — identify slow steps
- Debugging — see exactly what the agent did
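Each of those uses falls out of a small parser over the command log. A sketch in Python, assuming the exact log format shown above (a timestamped command line followed by a `→ exit N, X.Xs` result line):

```python
import re

# Line patterns for the log format shown above (the format is an assumption).
CMD_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] vibe-check (?P<cmd>.+)$")
RESULT_RE = re.compile(r"^\s*→ exit (?P<exit>\d+), (?P<secs>[\d.]+)s$")

def parse_command_log(text):
    """Pair each logged command with its exit code and duration."""
    entries = []
    current = None
    for line in text.splitlines():
        m = CMD_RE.match(line)
        if m:
            current = {"timestamp": m["ts"], "command": m["cmd"]}
            continue
        m = RESULT_RE.match(line)
        if m and current:
            current["exit_code"] = int(m["exit"])
            current["duration_s"] = float(m["secs"])
            entries.append(current)
            current = None
    return entries

log = """\
[2026-02-09T14:23:01Z] vibe-check navigate https://app.example.com/login
  → exit 0, 1.2s
[2026-02-09T14:23:05Z] vibe-check text ".dashboard-title"
  → exit 0, 0.1s
"""
entries = parse_command_log(log)
slowest = max(entries, key=lambda e: e["duration_s"])
print(slowest["command"])  # the slowest step, for timing analysis
```

From here, replay is a matter of re-running each `command` in order, and timing analysis is a sort on `duration_s`.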
## JSON Report Format

```json
{
  "suite": "authentication",
  "timestamp": "2026-02-09T14:23:00Z",
  "environment": {
    "base_url": "https://staging.example.com",
    "browser": "Chrome for Testing 131",
    "vibium_version": "0.1.7",
    "mode": "oneshot",
    "headless": true
  },
  "summary": {
    "total": 8,
    "passed": 7,
    "failed": 1,
    "skipped": 0,
    "duration_ms": 18500
  },
  "tests": [
    {
      "name": "login_valid_credentials",
      "status": "pass",
      "duration_ms": 2300,
      "steps": [
        {
          "command": "navigate https://staging.example.com/login",
          "duration_ms": 1200,
          "exit_code": 0,
          "screenshot": "screenshots/login_valid_step1.png"
        },
        {
          "command": "type \"input[name=email]\" \"test@example.com\"",
          "duration_ms": 300,
          "exit_code": 0
        }
      ]
    },
    {
      "name": "login_expired_account",
      "status": "fail",
      "duration_ms": 5100,
      "error": {
        "message": "Expected text 'Account expired' but got 'Welcome, User'",
        "step": 5,
        "command": "text \".error-message\"",
        "screenshot": "failures/login_expired_account/screenshot.png",
        "page_text": "failures/login_expired_account/page_text.txt",
        "page_url": "https://staging.example.com/dashboard"
      }
    }
  ]
}
```
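A downstream consumer can work directly against this schema — for example, a triage step that lists only the failures. A minimal Python sketch (the schema above is the only assumption):

```python
import json

def failing_tests(report):
    """Return (name, error message, screenshot path) for each failed test."""
    return [
        (t["name"], t["error"]["message"], t["error"].get("screenshot"))
        for t in report["tests"]
        if t["status"] == "fail"
    ]

report = json.loads("""
{
  "suite": "authentication",
  "summary": {"total": 2, "passed": 1, "failed": 1, "skipped": 0},
  "tests": [
    {"name": "login_valid_credentials", "status": "pass", "duration_ms": 2300},
    {"name": "login_expired_account", "status": "fail", "duration_ms": 5100,
     "error": {"message": "Expected text 'Account expired' but got 'Welcome, User'",
               "screenshot": "failures/login_expired_account/screenshot.png"}}
  ]
}
""")
for name, message, shot in failing_tests(report):
    print(f"{name}: {message} ({shot})")
```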
## Console Summary Output

Human-readable output for CI logs:

```text
Authentication Test Suite
=========================
Target:  https://staging.example.com
Browser: Chrome 131 (headless, oneshot)
Started: 2026-02-09 14:23:00 UTC

PASS  login_valid_credentials    2.3s
PASS  login_invalid_password     1.8s
PASS  login_empty_form           1.5s
PASS  login_sql_injection        1.2s
FAIL  login_expired_account      5.1s
      Expected: "Account expired"
      Got:      "Welcome, User"
      Screenshot: failures/login_expired_account/screenshot.png
PASS  signup_new_user            3.2s
PASS  signup_duplicate_email     2.1s
PASS  logout                     1.3s

=========================
Results: 7 passed, 1 failed (18.5s)
```
## Failure Analysis

When a test fails, the agent (or framework) should capture:

### 1. Visual State (Screenshot)

```bash
vibe-check screenshot -o "failures/${TEST_NAME}/screenshot.png"
```

### 2. Textual State (Page Content)

```bash
vibe-check text > "failures/${TEST_NAME}/page_text.txt"
```

### 3. Location (URL)

```bash
vibe-check url > "failures/${TEST_NAME}/current_url.txt"
```

### 4. Console Errors (JavaScript)

```bash
vibe-check eval "JSON.stringify(window.__console_errors || [])" > "failures/${TEST_NAME}/console_errors.json"
```

### 5. Agent Reasoning (Optional)

If running with Claude Code, the agent's reasoning about the failure is captured in conversation history. This can be extracted as a markdown file explaining what happened.
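The captured artifacts land on disk in the `failures/<test_name>/` layout that the JSON report references. A Python sketch of the bookkeeping side (the function name is illustrative; nothing here is vibe-check-specific — it just writes the directory structure used above):

```python
from pathlib import Path

def save_failure_artifacts(root, test_name, page_text, url, screenshot_png=b""):
    """Write failure artifacts using the failures/<test>/ layout
    that the JSON report's error paths point at."""
    d = Path(root) / "failures" / test_name
    d.mkdir(parents=True, exist_ok=True)
    (d / "page_text.txt").write_text(page_text)
    (d / "current_url.txt").write_text(url)
    if screenshot_png:
        (d / "screenshot.png").write_bytes(screenshot_png)
    return d

out = save_failure_artifacts(
    "/tmp/report-demo", "login_expired_account",
    "Welcome, User", "https://staging.example.com/dashboard")
print(out)  # the per-test failure directory
```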
## Performance Tracking

### Per-Command Timing

```bash
# Wrapper function that times each vibe-check invocation
timed_vibe() {
  local start end exit_code duration_ms
  start=$(date +%s%N)
  vibe-check "$@"
  exit_code=$?
  end=$(date +%s%N)
  duration_ms=$(( (end - start) / 1000000 ))
  echo "[timing] vibe-check $* → ${duration_ms}ms (exit $exit_code)" >> timing.log
  return "$exit_code"
}
```
### Trend Analysis

Track test durations over time to catch performance regressions:

```json
{
  "date": "2026-02-09",
  "test": "checkout_flow",
  "duration_ms": 4500,
  "previous_avg_ms": 3200,
  "regression": true,
  "delta_pct": 40.6
}
```

Alert when a test consistently runs well above its rolling average. The example above flags a 40% jump; pick a threshold (commonly in the 25-50% range) that balances sensitivity against noise.
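The alert rule reduces to a one-line comparison. A Python sketch with the threshold as a parameter:

```python
def is_regression(duration_ms, rolling_avg_ms, threshold_pct=50.0):
    """Flag a run that exceeds its rolling average by more than threshold_pct."""
    if rolling_avg_ms <= 0:
        return False  # no baseline yet
    delta_pct = (duration_ms - rolling_avg_ms) / rolling_avg_ms * 100
    return delta_pct > threshold_pct

# checkout_flow from the example above: 4500ms vs a 3200ms rolling average
print(is_regression(4500, 3200, threshold_pct=25))  # → True (40.6% over average)
print(is_regression(4500, 3200))                    # → False at the default 50% threshold
```

In practice the "consistently" part matters: require the flag to fire on several consecutive runs before alerting, so one slow CI runner doesn't page anyone.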
## Integration with CI Dashboards

### JUnit XML Output (for Jenkins/GitHub/GitLab)

```xml
<?xml version="1.0" encoding="UTF-8"?>
<testsuites>
  <testsuite name="authentication" tests="8" failures="1" time="18.5">
    <testcase name="login_valid_credentials" time="2.3"/>
    <testcase name="login_expired_account" time="5.1">
      <failure message="Expected 'Account expired' but got 'Welcome, User'">
        Screenshot: failures/login_expired_account/screenshot.png
        URL: https://staging.example.com/dashboard
      </failure>
    </testcase>
  </testsuite>
</testsuites>
```
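The JUnit file can be generated from the JSON report with nothing beyond the standard library. A Python sketch, assuming the report schema shown earlier:

```python
import xml.etree.ElementTree as ET

def junit_xml(report):
    """Render a JSON report (schema as above, an assumption) as JUnit XML."""
    suites = ET.Element("testsuites")
    summary = report["summary"]
    suite = ET.SubElement(
        suites, "testsuite",
        name=report["suite"],
        tests=str(summary["total"]),
        failures=str(summary["failed"]),
        time=f'{summary["duration_ms"] / 1000:.1f}')
    for t in report["tests"]:
        case = ET.SubElement(suite, "testcase", name=t["name"],
                             time=f'{t["duration_ms"] / 1000:.1f}')
        if t["status"] == "fail":
            failure = ET.SubElement(case, "failure",
                                    message=t["error"]["message"])
            failure.text = "Screenshot: " + t["error"].get("screenshot", "")
    return ET.tostring(suites, encoding="unicode")

report = {
    "suite": "authentication",
    "summary": {"total": 2, "passed": 1, "failed": 1, "duration_ms": 7400},
    "tests": [
        {"name": "login_valid_credentials", "status": "pass", "duration_ms": 2300},
        {"name": "login_expired_account", "status": "fail", "duration_ms": 5100,
         "error": {"message": "Expected 'Account expired' but got 'Welcome, User'",
                   "screenshot": "failures/login_expired_account/screenshot.png"}},
    ],
}
print(junit_xml(report))
```

Write the result to a file and point the CI runner's JUnit collector at it (e.g. a `junit` artifact path in GitLab or a test-report action in GitHub).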
## Interview Talking Point
"Our reporting captures three things for every test: a command log for reproducibility, screenshots for visual debugging, and structured JSON for programmatic analysis. On failure, we also capture page text and URL so the AI agent can re-analyze what went wrong. We output JUnit XML for CI dashboard integration and track timing trends to catch performance regressions. The key insight is that AI-driven tests need richer failure artifacts than traditional tests, because the agent may need to reason about the failure to determine if it's a real bug or a test environment issue."