# CI/CD Integration for AI-Driven Browser Tests

## Principles
- **Headless always** — No display in CI
- **Oneshot mode** — Fresh browser per test for isolation
- **Artifacts on failure** — Screenshots, page text, agent logs
- **Deterministic timeouts** — No infinite waits
- **Exit codes matter** — CI gates on pass/fail
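The last two principles can be sketched as a small Bash wrapper, assuming GNU `timeout` is available on the runner. `run_with_timeout` is a hypothetical helper, and `true`/`sleep` stand in for real `vibe-check` invocations:

```bash
# Give every command a hard deadline and preserve its exit code so the
# pipeline can gate on it. A sketch, not part of the vibe-check CLI.
run_with_timeout() {
  secs="$1"; shift
  if timeout "$secs" "$@"; then
    echo "PASS"
  else
    rc=$?
    # GNU timeout exits 124 when the deadline is hit
    if [ "$rc" -eq 124 ]; then
      echo "TIMEOUT after ${secs}s"
    else
      echo "FAIL (exit $rc)"
    fi
    return "$rc"
  fi
}

run_with_timeout 30 true            # e.g. vibe-check navigate "$URL" --headless
run_with_timeout 1 sleep 3 || true  # deadline hit: the step fails, CI never hangs
```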
## GitHub Actions Configuration

### Basic Setup
```yaml
name: Browser Tests
on: [push, pull_request]

jobs:
  browser-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install Vibium
        run: npm install -g vibium

      - name: Install Chrome dependencies (Linux)
        run: |
          sudo apt-get update
          sudo apt-get install -y \
            libgbm1 libnss3 libatk-bridge2.0-0 \
            libdrm2 libxkbcommon0 libxcomposite1 \
            libxdamage1 libxfixes3 libxrandr2 libasound2

      - name: Run browser tests
        env:
          VIBIUM_ONESHOT: 1
        run: |
          vibe-check navigate https://staging.example.com --headless
          vibe-check text "h1" --headless | grep -q "Welcome"

      - name: Upload failure artifacts
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: test-failures
          path: failures/
```
### Full Test Suite
```yaml
name: Full Test Suite
on:
  push:
    branches: [main]
  pull_request:

jobs:
  browser-tests:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    strategy:
      fail-fast: false
      matrix:
        test-group: [auth, dashboard, checkout, settings]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install dependencies
        run: |
          npm install -g vibium
          sudo apt-get update && sudo apt-get install -y \
            libgbm1 libnss3 libatk-bridge2.0-0 libdrm2 \
            libxkbcommon0 libxcomposite1 libxdamage1 \
            libxfixes3 libxrandr2 libasound2

      - name: Run ${{ matrix.test-group }} tests
        env:
          VIBIUM_ONESHOT: 1
          TEST_BASE_URL: ${{ secrets.STAGING_URL }}
        run: ./run-tests.sh ${{ matrix.test-group }}

      - name: Upload screenshots
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: screenshots-${{ matrix.test-group }}
          path: screenshots/

      - name: Upload failure details
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: failures-${{ matrix.test-group }}
          path: failures/
```
## Test Runner Script

### run-tests.sh
```bash
#!/bin/bash
set -euo pipefail

TEST_GROUP="${1:-all}"
BASE_URL="${TEST_BASE_URL:-http://localhost:3000}"
FAILURES_DIR="failures"
SCREENSHOTS_DIR="screenshots"
RESULTS_FILE="results.json"

mkdir -p "$FAILURES_DIR" "$SCREENSHOTS_DIR"

PASSED=0
FAILED=0
TOTAL=0

run_test() {
  local test_name="$1"
  local test_script="$2"
  TOTAL=$((TOTAL + 1))
  echo -n "  $test_name ... "

  # Execute test, capture output
  if output=$(bash -c "$test_script" 2>&1); then
    echo "PASS"
    PASSED=$((PASSED + 1))
  else
    echo "FAIL"
    FAILED=$((FAILED + 1))
    # Capture failure artifacts
    mkdir -p "$FAILURES_DIR/$test_name"
    echo "$output" > "$FAILURES_DIR/$test_name/output.txt"
    vibe-check screenshot -o "$FAILURES_DIR/$test_name/screenshot.png" --headless 2>/dev/null || true
    vibe-check text --headless > "$FAILURES_DIR/$test_name/page_text.txt" 2>/dev/null || true
    vibe-check url --headless > "$FAILURES_DIR/$test_name/current_url.txt" 2>/dev/null || true
  fi
}

echo "Running $TEST_GROUP tests against $BASE_URL"
echo "=========================================="

# Load and run tests for the group
case $TEST_GROUP in
  auth)
    run_test "login_valid" "
      vibe-check navigate $BASE_URL/login --headless &&
      vibe-check type 'input[name=email]' 'test@example.com' --headless &&
      vibe-check type 'input[name=password]' 'password123' --headless &&
      vibe-check click 'button[type=submit]' --headless &&
      vibe-check wait 'h1' --headless &&
      vibe-check text 'h1' --headless | grep -q 'Dashboard'
    "
    run_test "login_invalid" "
      vibe-check navigate $BASE_URL/login --headless &&
      vibe-check type 'input[name=email]' 'wrong@example.com' --headless &&
      vibe-check type 'input[name=password]' 'wrongpass' --headless &&
      vibe-check click 'button[type=submit]' --headless &&
      vibe-check wait '.error' --headless &&
      vibe-check text '.error' --headless | grep -qi 'invalid'
    "
    ;;
  dashboard)
    run_test "dashboard_loads" "
      vibe-check navigate $BASE_URL/dashboard --headless &&
      vibe-check wait '.metrics' --headless &&
      vibe-check text '.metric-count' --headless | grep -qE '[0-9]+'
    "
    ;;
  *)
    echo "Unknown test group: $TEST_GROUP"
    exit 1
    ;;
esac

echo ""
echo "=========================================="
echo "Results: $PASSED passed, $FAILED failed, $TOTAL total"

# Exit with failure if any tests failed
[ "$FAILED" -eq 0 ]
```
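The script declares `RESULTS_FILE` but never writes it. A minimal sketch of the summary it could emit, with example counter values; the JSON shape is an assumption, not anything vibe-check produces:

```bash
# Write the machine-readable summary that RESULTS_FILE reserves a name for.
# In run-tests.sh these counters would already be set by run_test.
PASSED=12 FAILED=1 TOTAL=13
printf '{"passed": %d, "failed": %d, "total": %d}\n' \
  "$PASSED" "$FAILED" "$TOTAL" > results.json
cat results.json
```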
## Docker Configuration

### Dockerfile for Test Runner
```dockerfile
FROM node:20-slim

# Chrome dependencies
RUN apt-get update && apt-get install -y \
    libgbm1 libnss3 libatk-bridge2.0-0 \
    libdrm2 libxkbcommon0 libxcomposite1 \
    libxdamage1 libxfixes3 libxrandr2 \
    libasound2 fonts-liberation \
    && rm -rf /var/lib/apt/lists/*

# Install Vibium
RUN npm install -g vibium

# Pre-download Chrome
RUN vibium install

# Copy test scripts
WORKDIR /tests
COPY . .

ENV VIBIUM_ONESHOT=1

CMD ["./run-tests.sh", "all"]
```
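A typical local build-and-run invocation, assuming the staging URL and the `browser-tests` tag are placeholders for your own values:

```bash
# Build the test image, then run one group against staging, mounting the
# failures directory so artifacts survive the container.
docker build -t browser-tests .
docker run --rm \
  -e TEST_BASE_URL=https://staging.example.com \
  -v "$(pwd)/failures:/tests/failures" \
  browser-tests ./run-tests.sh auth
```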
### docker-compose.yml (with test app)
```yaml
version: '3.8'

services:
  app:
    build: ./app
    ports:
      - "3000:3000"
    healthcheck:
      test: curl -f http://localhost:3000/health
      interval: 5s
      timeout: 3s
      retries: 5

  tests:
    build:
      context: ./tests
      dockerfile: Dockerfile
    depends_on:
      app:
        condition: service_healthy
    environment:
      - TEST_BASE_URL=http://app:3000
      - VIBIUM_ONESHOT=1
    volumes:
      - ./test-results:/tests/failures
```
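In CI, the stack is usually brought up in one shot and gated on the tests container:

```bash
# Start app + tests together; --exit-code-from makes `docker compose`
# exit with the tests service's status, so CI can gate on it directly.
docker compose up --build --abort-on-container-exit --exit-code-from tests
```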
## Parallel Execution in CI

### Matrix Strategy (GitHub Actions)
```yaml
strategy:
  fail-fast: false
  matrix:
    test-group: [auth, dashboard, checkout, settings, admin]
```
Each test group runs as a separate job with its own browser instance. `fail-fast: false` ensures all groups run even if one fails.
### Within a Single Job
```bash
# Run 4 tests in parallel
cat test_list.txt | xargs -P4 -I{} bash -c '
  VIBIUM_ONESHOT=1 ./run-single-test.sh "{}"
'
```
### Resource Limits
| Workers | RAM Needed | CPU Needed |
|---|---|---|
| 1 | ~200MB | 1 core |
| 4 | ~800MB | 4 cores |
| 8 | ~1.6GB | 8 cores |
CI runner recommendation: 4 parallel workers on a standard 2-core runner. Headless test steps spend most of their time waiting on page loads, so they are I/O-bound rather than CPU-bound, and workers can oversubscribe cores.
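A hedged sizing heuristic for an arbitrary Linux runner, using the ~200MB-per-worker figure from the table above and a 2x core oversubscription assumption:

```bash
# Pick a worker count from available cores and memory (Linux-only: /proc).
cores=$(nproc)
mem_mb=$(awk '/MemAvailable/ {print int($2/1024)}' /proc/meminfo)
by_cpu=$(( cores * 2 ))        # I/O-bound workers can oversubscribe cores
by_mem=$(( mem_mb / 200 ))     # ~200MB per headless Chrome worker
workers=$(( by_cpu < by_mem ? by_cpu : by_mem ))
if [ "$workers" -lt 1 ]; then workers=1; fi
echo "workers=$workers"
```

The result can be passed straight to the parallel runner, e.g. `xargs -P"$workers"`.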
## Artifact Management

### What to Capture
| Artifact | When | Size | Value |
|---|---|---|---|
| Screenshots (per step) | Always | ~100KB each | High — visual timeline |
| Screenshots (on failure) | On failure | ~100KB each | Critical — debugging |
| Page text (on failure) | On failure | ~1-10KB | High — agent can re-analyze |
| Console logs | Always | ~1-50KB | Medium — JavaScript errors |
| Test results JSON | Always | ~1-5KB | High — programmatic analysis |
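The "programmatic analysis" row can be exercised with a tiny gate step. The `{"passed":...,"failed":...,"total":...}` shape is an assumed convention, and `sed` is used to avoid a `jq` dependency:

```bash
# Parse the failure count out of a results summary and gate on it.
echo '{"passed": 12, "failed": 1, "total": 13}' > results.json
failed=$(sed -E 's/.*"failed": ([0-9]+).*/\1/' results.json)
if [ "$failed" -gt 0 ]; then
  echo "gate: FAIL ($failed failing)"
else
  echo "gate: PASS"
fi
```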
### Retention Policy
```yaml
# GitHub Actions
- uses: actions/upload-artifact@v4
  with:
    name: test-results
    path: results/
    retention-days: 30  # Keep for 30 days
```
## Interview Talking Point
> "Our CI pipeline runs browser tests using the vibe-check skill in headless oneshot mode — each test gets a fresh Chrome instance for isolation. We use GitHub Actions matrix strategy to parallelize test groups across 4-5 jobs. On failure, we capture screenshots, page text, and the current URL as artifacts. The test runner script is a simple Bash orchestrator that pipes vibe-check commands and checks exit codes. The whole setup — from install to first test — takes about 30 seconds in CI, and each test runs in 2-5 seconds including browser startup."