Daemon Architecture: Process Model and Browser Lifecycle

Two Modes of Operation

Vibium operates in two modes, chosen per-command:

Daemon Mode (Default)

A background process keeps Chrome running between commands. Fast and stateful.

# First command: daemon starts, Chrome launches
vibe-check navigate https://example.com    # ~2s (cold start)

# Subsequent commands: reuse existing browser
vibe-check text "h1"                       # ~100ms (hot)
vibe-check click "a"                       # ~200ms (hot)
vibe-check screenshot -o shot.png          # ~300ms (hot)

# Explicit control
vibe-check daemon status                   # Check if running
vibe-check daemon stop                     # Kill daemon + browser

Best for:

Interactive testing sessions
Agent-driven multi-step flows
Development and debugging

Oneshot Mode

Fresh browser per command. Isolated and stateless.

# Each command: launch → execute → teardown
vibe-check navigate https://example.com --oneshot    # ~2s
vibe-check text "h1" --oneshot                       # ~2s (new browser!)

# Or via environment variable
VIBIUM_ONESHOT=1 vibe-check navigate https://example.com

Best for:

CI/CD pipelines
Parallel test execution
Tests that need clean state

The Process Tree

When Vibium launches a browser session:

clicker (Go binary, ~10MB)
  └── chromedriver
        └── Chrome for Testing (main browser process)
              ├── chrome_crashpad_handler (crash reporting)
              ├── GPU helper
              ├── Network helper
              ├── Storage helper
              ├── Renderer helper (one per tab/frame)
              └── ...more helpers

A single session spawns 8-12 OS processes. This matters for:

Resource planning (memory, CPU)
Process cleanup (killing one doesn't kill all)
CI environments (container limits)
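
To check the process count on a particular machine, the tree can be walked from `ps` output. This is a minimal sketch, assuming input rows of `PID PPID` as printed by `ps -eo pid,ppid`. `count_tree` and `CLICKER_PID` are hypothetical names used for illustration, not part of Vibium.

```shell
# count_tree ROOT_PID: reads "PID PPID" rows on stdin and prints how many
# listed processes belong to the tree rooted at ROOT_PID (root included).
count_tree() {
    awk -v root="$1" '
        $1 ~ /^[0-9]+$/ { parent[$1] = $2 }   # numeric rows only; skips the header
        END {
            n = 0
            for (pid in parent) {
                # Follow the parent chain until it reaches the root or leaves the table.
                p = pid
                hops = 0
                while (p != root && (p in parent) && hops < 64) {
                    p = parent[p]
                    hops++
                }
                if (p == root) n++
            }
            print n
        }'
}

# Usage (CLICKER_PID is assumed to hold the PID of the running clicker process):
#   ps -eo pid,ppid | count_tree "$CLICKER_PID"
```

Reading the table from stdin keeps the helper independent of platform differences in `ps`, and makes it easy to try against a hand-written process table.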

The Zombie Problem

What Goes Wrong

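When chromedriver is killed, its children (Chrome and its helpers) are reparented to PID 1 (launchd on macOS, init on Linux) before the cleanup code can reach them. They become orphans (still running, but no longer attached to anything that will shut them down), not zombies in the strict sense: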

Before kill:                    After chromedriver dies:
clicker                         clicker
  └── chromedriver              (gone - killed)
        └── Chrome                Chrome (parent = PID 1, orphaned!)
              ├── GPU helper        ├── GPU helper
              └── Renderer          └── Renderer

Why Not Kill Chrome First?

The naive approach, "just kill Chrome before chromedriver", is unreliable for several reasons:

A signal can interrupt the DELETE /session request to chromedriver
The HTTP request can time out
Chromedriver can die before Chrome has fully terminated
The cleanup sequence races with the OS's own process management

The Solution: Three-Phase Cleanup

Implemented in launcher.go:Close():

// Phase 1: Polite request
// Send DELETE /session to chromedriver → asks Chrome to quit gracefully
// Best effort, may fail

// Phase 2: Kill process tree
func killProcessTree(pid int) {
    descendants := getDescendants(pid) // recursive pgrep -P
    // Walk the list in reverse so the deepest children are killed first.
    // Errors are ignored on purpose: a process may already have exited.
    for i := len(descendants) - 1; i >= 0; i-- {
        _ = syscall.Kill(descendants[i], syscall.SIGKILL)
    }
    // Kill the root of the tree last.
    _ = syscall.Kill(pid, syscall.SIGKILL)
}

// Phase 3: Orphan sweep
// Find Chrome/chromedriver processes with parent PID 1
// These escaped the tree kill, so terminate them
killOrphanedChromeProcesses()
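
The deepest-first ordering in Phase 2 can be reproduced from the command line when debugging. The sketch below is an illustration, not the code in launcher.go: `kill_order` is a hypothetical helper that assumes `PID PPID` rows on stdin (for example from `ps -eo pid,ppid`) and prints the tree so that every child appears before its parent.

```shell
# kill_order ROOT_PID: reads "PID PPID" rows on stdin and prints the PIDs of
# the tree rooted at ROOT_PID, children before parents, ROOT_PID last.
kill_order() {
    awk -v root="$1" '
        $1 ~ /^[0-9]+$/ { ppid[$1] = $2 }   # numeric rows only; skips the header
        END {
            # Breadth-first walk from the root: a parent is queued before its children.
            queue[1] = root
            head = 1
            tail = 1
            while (head <= tail) {
                cur = queue[head++]
                for (pid in ppid)
                    if (ppid[pid] == cur && pid != cur) queue[++tail] = pid
            }
            # Print the queue backwards so each child precedes its parent.
            for (i = tail; i >= 1; i--) print queue[i]
        }'
}

# Usage: preview the order first, then pipe it to kill only if it looks right.
#   ps -eo pid,ppid | kill_order "$CHROMEDRIVER_PID"
```

Printing the list instead of killing directly keeps the helper safe to run; `CHROMEDRIVER_PID` in the usage line is a placeholder for a PID you have looked up yourself.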

What If the Clicker Itself Dies?

| Scenario | What Happens | Cleanup |
| --- | --- | --- |
| Normal exit | `Close()` runs all three phases | Automatic |
| Ctrl+C (SIGINT) | Signal handler calls `KillAll()` | Automatic |
| `kill -9` (SIGKILL) | Nothing can intercept this | Orphans remain until next session |
| System crash | Process table wiped | OS handles it |
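
A test script that drives the CLI can use the same idea as the signal handler in the table. The following is a minimal sketch under stated assumptions: `run_with_cleanup` and the `VIBE` override variable are hypothetical, and the only Vibium command used is `daemon stop`, which appears elsewhere in this document.

```shell
# VIBE names the CLI to call. It is overridable so the sketch can be tried
# against a stub; the variable is an assumption, not a Vibium setting.
VIBE="${VIBE:-vibe-check}"

# run_with_cleanup CMD [ARGS...]: run CMD in a subshell and stop the daemon
# when that subshell exits, whether normally, on Ctrl+C, or on SIGTERM.
run_with_cleanup() {
    (
        # EXIT covers every trappable way out; SIGKILL never reaches a trap.
        trap '"$VIBE" daemon stop >/dev/null 2>&1 || true' EXIT
        # Turn INT/TERM into an ordinary exit so the EXIT trap still runs.
        trap 'exit 130' INT
        trap 'exit 143' TERM
        "$@"
    )
}

# Usage:
#   run_with_cleanup ./run_all_tests.sh
```

The exit status of the wrapped command is passed through, so CI still sees test failures. As with the Go handler, nothing here helps against `kill -9`.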

During development, `make double-tap` kills all Chrome for Testing and chromedriver processes:

# Manual cleanup during development
make double-tap

# Debugging: check for orphans
pgrep -lf 'Chrome for Testing'
pgrep -lf chromedriver

# Check parent PIDs (orphans have PPID = 1)
ps -o pid,ppid,comm -p $(pgrep -f 'Chrome for Testing')
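
The PPID check can be turned into a filter that lists only the orphans. This is a sketch rather than Vibium's own sweep: `find_orphans` is a hypothetical helper that assumes rows of `PID PPID COMMAND` on stdin and matches the same two process names used in the commands above.

```shell
# find_orphans: reads "PID PPID COMMAND..." rows on stdin and prints the PID of
# every Chrome for Testing or chromedriver process whose parent is PID 1.
find_orphans() {
    awk '$2 == 1 && /Chrome for Testing|chromedriver/ { print $1 }'
}

# Usage:
#   ps -eo pid,ppid,comm | find_orphans
```

Helper processes whose parent is still a live Chrome process are not reported. They go away when that parent is terminated.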

Daemon Communication

The daemon listens on a local WebSocket. CLI commands connect to it:

vibe-check click "button"
    │
    ▼
Daemon (clicker binary, persistent)
    │
    ▼ WebSocket (BiDi)
Chrome (persistent)

If the daemon isn't running, the first command starts it automatically. This is the "auto-launch" behavior that makes the tool feel seamless.
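
Scripts that want to make the auto-launch step explicit can check before starting. The sketch below relies on an assumption the source does not confirm: that `daemon status` exits non-zero when no daemon is running. `ensure_daemon` and the `VIBE` override are hypothetical names.

```shell
# VIBE names the CLI to call; overridable so the sketch can be tried with a stub.
VIBE="${VIBE:-vibe-check}"

# ensure_daemon: start the daemon only when the status check fails.
# ASSUMPTION: `daemon status` exits non-zero when no daemon is running.
ensure_daemon() {
    if "$VIBE" daemon status >/dev/null 2>&1; then
        echo "already running"
    else
        # Report success only if the start command itself succeeded.
        "$VIBE" daemon start >/dev/null 2>&1 && echo "started"
    fi
}
```

If the exit-code assumption does not hold for your version, parse the output of `daemon status` instead.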

Implications for Test Frameworks

Daemon Mode for Test Suites

# Start of test suite: ensure clean state
vibe-check daemon stop 2>/dev/null
vibe-check daemon start

# Run tests (all share the same browser)
run_test "login_test"
run_test "dashboard_test"
run_test "checkout_test"

# Cleanup
vibe-check daemon stop

Pro: Fast, no browser restart between tests
Con: State leaks between tests (cookies, localStorage, etc.)

Oneshot Mode for Isolated Tests

# Each test gets a fresh browser
VIBIUM_ONESHOT=1 run_test "login_test"
VIBIUM_ONESHOT=1 run_test "dashboard_test"

Pro: Perfect isolation, no state leaks
Con: Slow, ~2s overhead per test for browser launch

Hybrid: Daemon + Manual State Reset

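# Keep daemon for speed, but reset state between tests
vibe-check daemon start

for test in login_test dashboard_test checkout_test; do
    # Clear storage while still on the app's origin, then leave the page
    vibe-check eval "localStorage.clear(); sessionStorage.clear()"
    # Expire each visible cookie; the date needs a time component to parse.
    # document.cookie cannot see HttpOnly cookies, so those survive this reset.
    vibe-check eval "document.cookie.split(';').forEach(c => document.cookie = c.trim().split('=')[0] + '=;expires=Thu, 01 Jan 1970 00:00:00 GMT')"
    vibe-check navigate "about:blank"
    run_test "$test"
done

vibe-check daemon stop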

Pro: Fast + mostly isolated
Con: Manual state management, may miss server-side session state

CI/CD Considerations

Headless Mode

# No display needed
vibe-check navigate https://example.com --headless

Container Environments

# Chrome needs these dependencies on Linux
RUN apt-get update && apt-get install -y libgbm1 libnss3 libatk-bridge2.0-0 \
    libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 \
    libxfixes3 libxrandr2 libasound2

Parallel Execution

Each daemon instance uses a separate port. For parallel tests, use oneshot mode or explicitly manage daemon instances.
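
For the oneshot route, parallel runs can be driven with plain background jobs. This is a minimal sketch: `run_parallel` is a hypothetical helper, and the command it is given (such as the `run_test` function used in the examples above) is assumed to return non-zero on failure.

```shell
# run_parallel CMD NAME...: run CMD once per NAME as a background job with
# VIBIUM_ONESHOT=1 exported, wait for all jobs, and return the failure count.
run_parallel() {
    cmd="$1"
    shift
    pids=""
    for name in "$@"; do
        # The subshell keeps the exported variable from leaking into the caller.
        ( VIBIUM_ONESHOT=1; export VIBIUM_ONESHOT; "$cmd" "$name" ) &
        pids="$pids $!"
    done
    failed=0
    for pid in $pids; do
        # wait returns the exit status of that specific job.
        wait "$pid" || failed=$((failed + 1))
    done
    return "$failed"
}

# Usage:
#   run_parallel run_test login_test dashboard_test checkout_test
```

Each job gets its own browser and process tree, so the per-session process count applies once per concurrent job when sizing CI containers.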

Interview Talking Point

"Vibium's daemon architecture is a pragmatic trade-off between speed and isolation. The daemon keeps Chrome alive between commands — reducing per-command latency from ~2 seconds to ~100ms — while oneshot mode gives you fresh-browser isolation for CI. The interesting technical challenge is process cleanup: Chrome spawns 8-12 child processes, and killing the driver process orphans them. Vibium solves this with a three-phase cleanup: graceful shutdown request, recursive process tree kill, then an orphan sweep for any that escaped. For our test framework, we use daemon mode during development for speed and oneshot in CI for reliability."