# Reporting and Observability

## What to Capture

### Per-Test Artifacts

| Artifact | Format | Purpose |
|---|---|---|
| Commands executed | Text log | Reproducibility |
| Screenshots (each step) | PNG | Visual timeline |
| Page text (each step) | Text | Agent-readable state |
| Current URL (each step) | Text | Navigation tracking |
| Timing | JSON | Performance analysis |
| Pass/Fail result | JSON | CI gate |
| Error details | Text | Debugging |
### Per-Suite Artifacts

| Artifact | Format | Purpose |
|---|---|---|
| Summary report | JSON + text | Quick overview |
| Failure screenshots | PNG directory | Human review |
| Agent reasoning log | Markdown | Understanding agent decisions |
| Performance metrics | JSON | Trend analysis |
## Command Logging

Every `vibe-check` command the agent executes should be logged:

```text
[2026-02-09T14:23:01Z] vibe-check navigate https://app.example.com/login
  → exit 0, 1.2s
[2026-02-09T14:23:02Z] vibe-check type "input[name=email]" "user@test.com"
  → exit 0, 0.3s
[2026-02-09T14:23:03Z] vibe-check type "input[name=password]" "***"
  → exit 0, 0.2s
[2026-02-09T14:23:03Z] vibe-check click "button[type=submit]"
  → exit 0, 0.4s
[2026-02-09T14:23:04Z] vibe-check wait ".dashboard-title"
  → exit 0, 0.8s
[2026-02-09T14:23:05Z] vibe-check text ".dashboard-title"
  → exit 0, 0.1s
  → output: "Welcome, User"
```
This log enables:
- Replay — run the exact same commands to reproduce an issue
- Timing analysis — identify slow steps
- Debugging — see exactly what the agent did
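Each of those uses falls out of a small parser over the command log. A sketch in Python, assuming the exact log format shown above (a timestamped command line followed by a `→ exit N, X.Xs` result line):

```python
import re

# Line patterns for the log format shown above (the format is an assumption).
CMD_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] vibe-check (?P<cmd>.+)$")
RESULT_RE = re.compile(r"^\s*→ exit (?P<exit>\d+), (?P<secs>[\d.]+)s$")

def parse_command_log(text):
    """Pair each logged command with its exit code and duration."""
    entries = []
    current = None
    for line in text.splitlines():
        m = CMD_RE.match(line)
        if m:
            current = {"timestamp": m["ts"], "command": m["cmd"]}
            continue
        m = RESULT_RE.match(line)
        if m and current:
            current["exit_code"] = int(m["exit"])
            current["duration_s"] = float(m["secs"])
            entries.append(current)
            current = None
    return entries

log = """\
[2026-02-09T14:23:01Z] vibe-check navigate https://app.example.com/login
  → exit 0, 1.2s
[2026-02-09T14:23:05Z] vibe-check text ".dashboard-title"
  → exit 0, 0.1s
"""
entries = parse_command_log(log)
slowest = max(entries, key=lambda e: e["duration_s"])
print(slowest["command"])  # the slowest step, for timing analysis
```

From here, replay is a matter of re-running each `command` in order, and timing analysis is a sort on `duration_s`.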
## JSON Report Format

```json
{
  "suite": "authentication",
  "timestamp": "2026-02-09T14:23:00Z",
  "environment": {
    "base_url": "https://staging.example.com",
    "browser": "Chrome for Testing 131",
    "vibium_version": "0.1.7",
    "mode": "oneshot",
    "headless": true
  },
  "summary": {
    "total": 8,
    "passed": 7,
    "failed": 1,
    "skipped": 0,
    "duration_ms": 18500
  },
  "tests": [
    {
      "name": "login_valid_credentials",
      "status": "pass",
      "duration_ms": 2300,
      "steps": [
        {
          "command": "navigate https://staging.example.com/login",
          "duration_ms": 1200,
          "exit_code": 0,
          "screenshot": "screenshots/login_valid_step1.png"
        },
        {
          "command": "type \"input[name=email]\" \"test@example.com\"",
          "duration_ms": 300,
          "exit_code": 0
        }
      ]
    },
    {
      "name": "login_expired_account",
      "status": "fail",
      "duration_ms": 5100,
      "error": {
        "message": "Expected text 'Account expired' but got 'Welcome, User'",
        "step": 5,
        "command": "text \".error-message\"",
        "screenshot": "failures/login_expired_account/screenshot.png",
        "page_text": "failures/login_expired_account/page_text.txt",
        "page_url": "https://staging.example.com/dashboard"
      }
    }
  ]
}
```
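A downstream consumer can work directly against this schema — for example, a triage step that lists only the failures. A minimal Python sketch (the schema above is the only assumption):

```python
import json

def failing_tests(report):
    """Return (name, error message, screenshot path) for each failed test."""
    return [
        (t["name"], t["error"]["message"], t["error"].get("screenshot"))
        for t in report["tests"]
        if t["status"] == "fail"
    ]

report = json.loads("""
{
  "suite": "authentication",
  "summary": {"total": 2, "passed": 1, "failed": 1, "skipped": 0},
  "tests": [
    {"name": "login_valid_credentials", "status": "pass", "duration_ms": 2300},
    {"name": "login_expired_account", "status": "fail", "duration_ms": 5100,
     "error": {"message": "Expected text 'Account expired' but got 'Welcome, User'",
               "screenshot": "failures/login_expired_account/screenshot.png"}}
  ]
}
""")
for name, message, shot in failing_tests(report):
    print(f"{name}: {message} ({shot})")
```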
## Console Summary Output

Human-readable output for CI logs:

```text
Authentication Test Suite
=========================
Target:  https://staging.example.com
Browser: Chrome 131 (headless, oneshot)
Started: 2026-02-09 14:23:00 UTC

PASS  login_valid_credentials    2.3s
PASS  login_invalid_password     1.8s
PASS  login_empty_form           1.5s
PASS  login_sql_injection        1.2s
FAIL  login_expired_account      5.1s
      Expected: "Account expired"
      Got:      "Welcome, User"
      Screenshot: failures/login_expired_account/screenshot.png
PASS  signup_new_user            3.2s
PASS  signup_duplicate_email     2.1s
PASS  logout                     1.3s

=========================
Results: 7 passed, 1 failed (18.5s)
```
## Failure Analysis

When a test fails, the agent (or framework) should capture:

### 1. Visual State (Screenshot)

```bash
vibe-check screenshot -o "failures/${TEST_NAME}/screenshot.png"
```

### 2. Textual State (Page Content)

```bash
vibe-check text > "failures/${TEST_NAME}/page_text.txt"
```

### 3. Location (URL)

```bash
vibe-check url > "failures/${TEST_NAME}/current_url.txt"
```

### 4. Console Errors (JavaScript)

```bash
vibe-check eval "JSON.stringify(window.__console_errors || [])" > "failures/${TEST_NAME}/console_errors.json"
```

### 5. Agent Reasoning (Optional)

If running with Claude Code, the agent's reasoning about the failure is captured in conversation history. This can be extracted as a markdown file explaining what happened.
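The captured artifacts land on disk in the `failures/<test_name>/` layout that the JSON report references. A Python sketch of the bookkeeping side (the function name is illustrative; nothing here is vibe-check-specific — it just writes the directory structure used above):

```python
from pathlib import Path

def save_failure_artifacts(root, test_name, page_text, url, screenshot_png=b""):
    """Write failure artifacts using the failures/<test>/ layout
    that the JSON report's error paths point at."""
    d = Path(root) / "failures" / test_name
    d.mkdir(parents=True, exist_ok=True)
    (d / "page_text.txt").write_text(page_text)
    (d / "current_url.txt").write_text(url)
    if screenshot_png:
        (d / "screenshot.png").write_bytes(screenshot_png)
    return d

out = save_failure_artifacts(
    "/tmp/report-demo", "login_expired_account",
    "Welcome, User", "https://staging.example.com/dashboard")
print(out)  # the per-test failure directory
```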
## Performance Tracking

### Per-Command Timing

```bash
# Wrapper function that times each vibe-check invocation
timed_vibe() {
  local start end exit_code duration_ms
  start=$(date +%s%N)
  vibe-check "$@"
  exit_code=$?
  end=$(date +%s%N)
  duration_ms=$(( (end - start) / 1000000 ))
  echo "[timing] vibe-check $* → ${duration_ms}ms (exit $exit_code)" >> timing.log
  return "$exit_code"
}
```
### Trend Analysis

Track test durations over time to catch performance regressions:

```json
{
  "date": "2026-02-09",
  "test": "checkout_flow",
  "duration_ms": 4500,
  "previous_avg_ms": 3200,
  "regression": true,
  "delta_pct": 40.6
}
```

Alert when a test consistently runs well above its rolling average. The example above flags a 40% jump; pick a threshold (commonly in the 25-50% range) that balances sensitivity against noise.
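The alert rule reduces to a one-line comparison. A Python sketch with the threshold as a parameter:

```python
def is_regression(duration_ms, rolling_avg_ms, threshold_pct=50.0):
    """Flag a run that exceeds its rolling average by more than threshold_pct."""
    if rolling_avg_ms <= 0:
        return False  # no baseline yet
    delta_pct = (duration_ms - rolling_avg_ms) / rolling_avg_ms * 100
    return delta_pct > threshold_pct

# checkout_flow from the example above: 4500ms vs a 3200ms rolling average
print(is_regression(4500, 3200, threshold_pct=25))  # → True (40.6% over average)
print(is_regression(4500, 3200))                    # → False at the default 50% threshold
```

In practice the "consistently" part matters: require the flag to fire on several consecutive runs before alerting, so one slow CI runner doesn't page anyone.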
## Integration with CI Dashboards

### JUnit XML Output (for Jenkins/GitHub/GitLab)

```xml
<?xml version="1.0" encoding="UTF-8"?>
<testsuites>
  <testsuite name="authentication" tests="8" failures="1" time="18.5">
    <testcase name="login_valid_credentials" time="2.3"/>
    <testcase name="login_expired_account" time="5.1">
      <failure message="Expected 'Account expired' but got 'Welcome, User'">
        Screenshot: failures/login_expired_account/screenshot.png
        URL: https://staging.example.com/dashboard
      </failure>
    </testcase>
  </testsuite>
</testsuites>
```
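The JUnit file can be generated from the JSON report with nothing beyond the standard library. A Python sketch, assuming the report schema shown earlier:

```python
import xml.etree.ElementTree as ET

def junit_xml(report):
    """Render a JSON report (schema as above, an assumption) as JUnit XML."""
    suites = ET.Element("testsuites")
    summary = report["summary"]
    suite = ET.SubElement(
        suites, "testsuite",
        name=report["suite"],
        tests=str(summary["total"]),
        failures=str(summary["failed"]),
        time=f'{summary["duration_ms"] / 1000:.1f}')
    for t in report["tests"]:
        case = ET.SubElement(suite, "testcase", name=t["name"],
                             time=f'{t["duration_ms"] / 1000:.1f}')
        if t["status"] == "fail":
            failure = ET.SubElement(case, "failure",
                                    message=t["error"]["message"])
            failure.text = "Screenshot: " + t["error"].get("screenshot", "")
    return ET.tostring(suites, encoding="unicode")

report = {
    "suite": "authentication",
    "summary": {"total": 2, "passed": 1, "failed": 1, "duration_ms": 7400},
    "tests": [
        {"name": "login_valid_credentials", "status": "pass", "duration_ms": 2300},
        {"name": "login_expired_account", "status": "fail", "duration_ms": 5100,
         "error": {"message": "Expected 'Account expired' but got 'Welcome, User'",
                   "screenshot": "failures/login_expired_account/screenshot.png"}},
    ],
}
print(junit_xml(report))
```

Write the result to a file and point the CI runner's JUnit collector at it (e.g. a `junit` artifact path in GitLab or a test-report action in GitHub).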
## Interview Talking Point
"Our reporting captures three things for every test: a command log for reproducibility, screenshots for visual debugging, and structured JSON for programmatic analysis. On failure, we also capture page text and URL so the AI agent can re-analyze what went wrong. We output JUnit XML for CI dashboard integration and track timing trends to catch performance regressions. The key insight is that AI-driven tests need richer failure artifacts than traditional tests, because the agent may need to reason about the failure to determine if it's a real bug or a test environment issue."