# CI/CD Integration for AI-Driven Browser Tests

## Principles
- **Headless always** — No display in CI
- **Oneshot mode** — Fresh browser per test for isolation
- **Artifacts on failure** — Screenshots, page text, agent logs
- **Deterministic timeouts** — No infinite waits
- **Exit codes matter** — CI gates on pass/fail
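The last two principles can be sketched as a small Bash wrapper, assuming GNU `timeout` is available on the runner. `run_with_timeout` is a hypothetical helper, and `true`/`sleep` stand in for real `vibe-check` invocations:

```bash
# Give every command a hard deadline and preserve its exit code so the
# pipeline can gate on it. A sketch, not part of the vibe-check CLI.
run_with_timeout() {
  secs="$1"; shift
  if timeout "$secs" "$@"; then
    echo "PASS"
  else
    rc=$?
    # GNU timeout exits 124 when the deadline is hit
    if [ "$rc" -eq 124 ]; then
      echo "TIMEOUT after ${secs}s"
    else
      echo "FAIL (exit $rc)"
    fi
    return "$rc"
  fi
}

run_with_timeout 30 true            # e.g. vibe-check navigate "$URL" --headless
run_with_timeout 1 sleep 3 || true  # deadline hit: the step fails, CI never hangs
```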
## GitHub Actions Configuration

### Basic Setup
```yaml
name: Browser Tests
on: [push, pull_request]

jobs:
  browser-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install Vibium
        run: npm install -g vibium

      - name: Install Chrome dependencies (Linux)
        run: |
          sudo apt-get update
          sudo apt-get install -y \
            libgbm1 libnss3 libatk-bridge2.0-0 \
            libdrm2 libxkbcommon0 libxcomposite1 \
            libxdamage1 libxfixes3 libxrandr2 libasound2

      - name: Run browser tests
        env:
          VIBIUM_ONESHOT: 1
        run: |
          vibe-check navigate https://staging.example.com --headless
          vibe-check text "h1" --headless | grep -q "Welcome"

      - name: Upload failure artifacts
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: test-failures
          path: failures/
```
### Full Test Suite
```yaml
name: Full Test Suite
on:
  push:
    branches: [main]
  pull_request:

jobs:
  browser-tests:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    strategy:
      fail-fast: false
      matrix:
        test-group: [auth, dashboard, checkout, settings]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install dependencies
        run: |
          npm install -g vibium
          sudo apt-get update && sudo apt-get install -y \
            libgbm1 libnss3 libatk-bridge2.0-0 libdrm2 \
            libxkbcommon0 libxcomposite1 libxdamage1 \
            libxfixes3 libxrandr2 libasound2

      - name: Run ${{ matrix.test-group }} tests
        env:
          VIBIUM_ONESHOT: 1
          TEST_BASE_URL: ${{ secrets.STAGING_URL }}
        run: ./run-tests.sh ${{ matrix.test-group }}

      - name: Upload screenshots
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: screenshots-${{ matrix.test-group }}
          path: screenshots/

      - name: Upload failure details
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: failures-${{ matrix.test-group }}
          path: failures/
```
## Test Runner Script

### run-tests.sh
```bash
#!/bin/bash
set -euo pipefail

TEST_GROUP="${1:-all}"
BASE_URL="${TEST_BASE_URL:-http://localhost:3000}"
FAILURES_DIR="failures"
SCREENSHOTS_DIR="screenshots"
RESULTS_FILE="results.json"

mkdir -p "$FAILURES_DIR" "$SCREENSHOTS_DIR"

PASSED=0
FAILED=0
TOTAL=0

run_test() {
  local test_name="$1"
  local test_script="$2"
  TOTAL=$((TOTAL + 1))
  echo -n "  $test_name ... "

  # Execute test, capture output
  if output=$(bash -c "$test_script" 2>&1); then
    echo "PASS"
    PASSED=$((PASSED + 1))
  else
    echo "FAIL"
    FAILED=$((FAILED + 1))
    # Capture failure artifacts
    mkdir -p "$FAILURES_DIR/$test_name"
    echo "$output" > "$FAILURES_DIR/$test_name/output.txt"
    vibe-check screenshot -o "$FAILURES_DIR/$test_name/screenshot.png" --headless 2>/dev/null || true
    vibe-check text --headless > "$FAILURES_DIR/$test_name/page_text.txt" 2>/dev/null || true
    vibe-check url --headless > "$FAILURES_DIR/$test_name/current_url.txt" 2>/dev/null || true
  fi
}

echo "Running $TEST_GROUP tests against $BASE_URL"
echo "=========================================="

# Load and run tests for the group
case $TEST_GROUP in
  auth)
    run_test "login_valid" "
      vibe-check navigate $BASE_URL/login --headless &&
      vibe-check type 'input[name=email]' 'test@example.com' --headless &&
      vibe-check type 'input[name=password]' 'password123' --headless &&
      vibe-check click 'button[type=submit]' --headless &&
      vibe-check wait 'h1' --headless &&
      vibe-check text 'h1' --headless | grep -q 'Dashboard'
    "
    run_test "login_invalid" "
      vibe-check navigate $BASE_URL/login --headless &&
      vibe-check type 'input[name=email]' 'wrong@example.com' --headless &&
      vibe-check type 'input[name=password]' 'wrongpass' --headless &&
      vibe-check click 'button[type=submit]' --headless &&
      vibe-check wait '.error' --headless &&
      vibe-check text '.error' --headless | grep -qi 'invalid'
    "
    ;;
  dashboard)
    run_test "dashboard_loads" "
      vibe-check navigate $BASE_URL/dashboard --headless &&
      vibe-check wait '.metrics' --headless &&
      vibe-check text '.metric-count' --headless | grep -qE '[0-9]+'
    "
    ;;
  *)
    echo "Unknown test group: $TEST_GROUP"
    exit 1
    ;;
esac

echo ""
echo "=========================================="
echo "Results: $PASSED passed, $FAILED failed, $TOTAL total"

# Exit with failure if any tests failed
[ "$FAILED" -eq 0 ]
```
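The script declares `RESULTS_FILE` but never writes it. A minimal sketch of the summary it could emit, with example counter values; the JSON shape is an assumption, not anything vibe-check produces:

```bash
# Write the machine-readable summary that RESULTS_FILE reserves a name for.
# In run-tests.sh these counters would already be set by run_test.
PASSED=12 FAILED=1 TOTAL=13
printf '{"passed": %d, "failed": %d, "total": %d}\n' \
  "$PASSED" "$FAILED" "$TOTAL" > results.json
cat results.json
```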
## Docker Configuration

### Dockerfile for Test Runner
```dockerfile
FROM node:20-slim

# Chrome dependencies
RUN apt-get update && apt-get install -y \
    libgbm1 libnss3 libatk-bridge2.0-0 \
    libdrm2 libxkbcommon0 libxcomposite1 \
    libxdamage1 libxfixes3 libxrandr2 \
    libasound2 fonts-liberation \
    && rm -rf /var/lib/apt/lists/*

# Install Vibium
RUN npm install -g vibium

# Pre-download Chrome
RUN vibium install

# Copy test scripts
WORKDIR /tests
COPY . .

ENV VIBIUM_ONESHOT=1

CMD ["./run-tests.sh", "all"]
```
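A typical local build-and-run invocation, assuming the staging URL and the `browser-tests` tag are placeholders for your own values:

```bash
# Build the test image, then run one group against staging, mounting the
# failures directory so artifacts survive the container.
docker build -t browser-tests .
docker run --rm \
  -e TEST_BASE_URL=https://staging.example.com \
  -v "$(pwd)/failures:/tests/failures" \
  browser-tests ./run-tests.sh auth
```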
### docker-compose.yml (with test app)
```yaml
version: '3.8'

services:
  app:
    build: ./app
    ports:
      - "3000:3000"
    healthcheck:
      test: curl -f http://localhost:3000/health
      interval: 5s
      timeout: 3s
      retries: 5

  tests:
    build:
      context: ./tests
      dockerfile: Dockerfile
    depends_on:
      app:
        condition: service_healthy
    environment:
      - TEST_BASE_URL=http://app:3000
      - VIBIUM_ONESHOT=1
    volumes:
      - ./test-results:/tests/failures
```
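In CI, the stack is usually brought up in one shot and gated on the tests container:

```bash
# Start app + tests together; --exit-code-from makes `docker compose`
# exit with the tests service's status, so CI can gate on it directly.
docker compose up --build --abort-on-container-exit --exit-code-from tests
```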
## Parallel Execution in CI

### Matrix Strategy (GitHub Actions)
```yaml
strategy:
  fail-fast: false
  matrix:
    test-group: [auth, dashboard, checkout, settings, admin]
```
Each test group runs as a separate job with its own browser instance. `fail-fast: false` ensures all groups run even if one fails.
### Within a Single Job
```bash
# Run 4 tests in parallel
cat test_list.txt | xargs -P4 -I{} bash -c '
  VIBIUM_ONESHOT=1 ./run-single-test.sh "{}"
'
```
### Resource Limits
| Workers | RAM Needed | CPU Needed |
|---|---|---|
| 1 | ~200MB | 1 core |
| 4 | ~800MB | 4 cores |
| 8 | ~1.6GB | 8 cores |
CI runner recommendation: 4 parallel workers on a standard 2-core runner. Headless test steps spend most of their time waiting on page loads, so they are I/O-bound rather than CPU-bound, and workers can oversubscribe cores.
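A hedged sizing heuristic for an arbitrary Linux runner, using the ~200MB-per-worker figure from the table above and a 2x core oversubscription assumption:

```bash
# Pick a worker count from available cores and memory (Linux-only: /proc).
cores=$(nproc)
mem_mb=$(awk '/MemAvailable/ {print int($2/1024)}' /proc/meminfo)
by_cpu=$(( cores * 2 ))        # I/O-bound workers can oversubscribe cores
by_mem=$(( mem_mb / 200 ))     # ~200MB per headless Chrome worker
workers=$(( by_cpu < by_mem ? by_cpu : by_mem ))
if [ "$workers" -lt 1 ]; then workers=1; fi
echo "workers=$workers"
```

The result can be passed straight to the parallel runner, e.g. `xargs -P"$workers"`.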
## Artifact Management

### What to Capture
| Artifact | When | Size | Value |
|---|---|---|---|
| Screenshots (per step) | Always | ~100KB each | High — visual timeline |
| Screenshots (on failure) | On failure | ~100KB each | Critical — debugging |
| Page text (on failure) | On failure | ~1-10KB | High — agent can re-analyze |
| Console logs | Always | ~1-50KB | Medium — JavaScript errors |
| Test results JSON | Always | ~1-5KB | High — programmatic analysis |
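The "programmatic analysis" row can be exercised with a tiny gate step. The `{"passed":...,"failed":...,"total":...}` shape is an assumed convention, and `sed` is used to avoid a `jq` dependency:

```bash
# Parse the failure count out of a results summary and gate on it.
echo '{"passed": 12, "failed": 1, "total": 13}' > results.json
failed=$(sed -E 's/.*"failed": ([0-9]+).*/\1/' results.json)
if [ "$failed" -gt 0 ]; then
  echo "gate: FAIL ($failed failing)"
else
  echo "gate: PASS"
fi
```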
### Retention Policy
```yaml
# GitHub Actions
- uses: actions/upload-artifact@v4
  with:
    name: test-results
    path: results/
    retention-days: 30  # Keep for 30 days
```
## Interview Talking Point
> "Our CI pipeline runs browser tests using the vibe-check skill in headless oneshot mode — each test gets a fresh Chrome instance for isolation. We use GitHub Actions matrix strategy to parallelize test groups across 4-5 jobs. On failure, we capture screenshots, page text, and the current URL as artifacts. The test runner script is a simple Bash orchestrator that pipes vibe-check commands and checks exit codes. The whole setup — from install to first test — takes about 30 seconds in CI, and each test runs in 2-5 seconds including browser startup."