Agent Skills for driving Browser for UI Test Automation: The Complete QA Engineer's Guide

Prepared for: QA Engineer gentle introduction to the roles where heavy AI usage is expected. Focus: Driving a browser through agent skills (not MCP servers), building an AI-augmented test automation framework, and speaking about it credibly with architect-level developers.

1. Foundations: How Agent Skills Work — `01-foundations/`

Skill Anatomy — SKILL.md structure with annotated examples
Skill Lifecycle — Discovery, selection, invocation, execution, teardown
Token Economics — Why skills cost dozens of tokens vs thousands for MCP

2. Vibium Deep Dive — `02-vibium-deep-dive/`

CLI Command Reference — All 22 commands with examples and edge cases
Actionability Checks — The five checks with actual JavaScript source code
Daemon Architecture — Process model, cleanup, zombie prevention
Extension Commands — How vibium:find, vibium:click, vibium:type work over BiDi

3. Skills vs MCP: When and Why — `03-skills-vs-mcp/`

Architectural Comparison — Deep side-by-side with diagrams
Token Budget Analysis — Real numbers: how many tokens each approach costs
Hybrid Strategies — Using both together in one framework

4. Building an AI Test Automation Framework — `04-building-test-framework/`

Architecture Decisions — ADRs for the framework
Test Patterns — Patterns for AI-driven tests (navigation, forms, assertions, data extraction)
CI/CD Integration — Running in GitHub Actions, handling headless mode, artifacts
Reporting and Observability — What to capture, how to present results
Self-Healing Strategies — How agents recover from broken selectors

5. WebDriver BiDi Protocol — `05-webdriver-bidi/`

Protocol Overview — Message format, sessions, commands vs events
Evolution from Selenium — Historical context from WebDriver to BiDi
Vibium Extension Commands — How vibium:find/click/type extend the protocol

6. Interview Preparation — `06-interview-preparation/`

Architect QA Scenarios — 20 questions with detailed answers
Framework Presentation — How to present your framework in 5, 15, and 30 minutes
Buzzword Decoder — What people actually mean by "agentic testing", "self-healing", "ReAct pattern"

7. Competitive Landscape — `07-competitive-landscape/`

Tool Comparison Matrix — Detailed feature-by-feature comparison
When to Use What — Decision framework for choosing the right tool
Future Directions — Where the industry is heading (AI locators, Cortex/Retina, video recording)

1. Foundations

Subfolder: `01-foundations/`

What Are Agent Skills?

Agent skills are reusable capability packages for AI coding agents. Unlike MCP servers that expose tool schemas over a protocol (adding thousands of tokens to context), skills inject procedural knowledge — a markdown file that teaches the agent how to accomplish domain-specific tasks using existing tools (Bash, Read, Write, etc.).

The critical insight: Skills don't add new tools. They teach the agent how to use existing tools for a specific domain.

The SKILL.md Contract

Every skill is defined by a single SKILL.md file with two sections:

---
name: vibe-check        # Lowercase, hyphens only, max 64 chars
description: |           # THE selection signal — Claude reads this to decide if the skill applies
  Browser automation via CLI. Navigate pages, click elements,
  fill forms, take screenshots, extract text.
allowed-tools: Bash      # What tools the skill may use
---

# Instructions for the agent (markdown content)
The `vibe-check` CLI automates Chrome via the command line...

How Selection Works

There is no algorithmic routing. Claude receives a formatted list of all available skills inside the Skill tool description. When a user asks something like "take a screenshot of this page," Claude's language model matches intent to skill descriptions through its forward pass — no embeddings, no classifiers, just comprehension.

How Invocation Works

When Claude decides to invoke a skill:

A visible "loading" message appears to the user
The SKILL.md content is injected as a hidden system message into conversation context
Tool permissions from allowed-tools are temporarily granted
Claude executes the skill's instructions using available tools (primarily Bash for CLI skills)
Permissions revert when the skill completes

Key Files to Read

01-foundations/01-skill-anatomy.md — Full breakdown of SKILL.md structure with annotated examples
01-foundations/02-skill-lifecycle.md — Discovery, selection, invocation, execution, teardown
01-foundations/03-token-economics.md — Why skills cost dozens of tokens vs thousands for MCP

2. Vibium Deep Dive

Subfolder: `02-vibium-deep-dive/`

What Is Vibium?

Vibium is browser automation infrastructure built by the creator of Selenium and Appium. It's a single Go binary (~10MB) that:

Launches and manages Chrome via WebDriver BiDi
Exposes 22 CLI commands for browser control
Runs as a daemon (browser persists between commands) or oneshot (fresh browser per command)
Also provides an MCP server and JS/Python client libraries

The `vibe-check` Skill

The vibe-check skill (from skills/vibe-check/SKILL.md) is a single SKILL.md file that teaches an AI agent all 22 CLI commands. When installed, the agent can drive Chrome through Bash:

# The agent executes these through the Bash tool
vibe-check navigate https://app.example.com/login
vibe-check type "input[name=email]" "user@test.com"
vibe-check type "input[name=password]" "secret123"
vibe-check click "button[type=submit]"
vibe-check wait "h1"
vibe-check text "h1"                    # → "Welcome, User"
vibe-check screenshot -o dashboard.png

Architecture: Sense → Think → Act

Vibium's roadmap follows a robotics control loop:

Layer	Component	Status	Purpose
Act	Clicker	Shipped (V1)	Browser automation via BiDi
Sense	Retina	V2 planned	Chrome extension that observes everything
Think	Cortex	V2 planned	SQLite-backed memory + navigation planning

The Five Actionability Checks

Before any interaction, Vibium verifies server-side (in Go, not in client code):

Visible — Element has non-zero size, not display:none or visibility:hidden
Stable — Position unchanged over 50ms (catches animations)
ReceivesEvents — Not obscured by another element (elementFromPoint check)
Enabled — Not disabled, aria-disabled, or inside disabled <fieldset>
Editable — (for type only) Accepts text input, not readonly

These run in a polling loop with 100ms intervals until all pass or timeout (default 30s).

Daemon vs Oneshot

Mode	How	Best For
Daemon (default)	Background process keeps browser alive	Interactive sessions, chaining commands
Oneshot	Fresh browser per command, torn down after	CI pipelines, isolated test runs

Key Files to Read

02-vibium-deep-dive/01-cli-command-reference.md — All 22 commands with examples and edge cases
02-vibium-deep-dive/02-actionability-checks.md — The five checks with actual JavaScript source code
02-vibium-deep-dive/03-daemon-architecture.md — Process model, cleanup, zombie prevention
02-vibium-deep-dive/04-extension-commands.md — How vibium:find, vibium:click, vibium:type work over BiDi

3. Skills vs MCP: When and Why

Subfolder: `03-skills-vs-mcp/`

The Core Trade-off

Dimension	Skills (CLI)	MCP Server
Token cost	Dozens (SKILL.md ~200-500 lines)	Thousands (tool schemas + accessibility trees)
State management	Stateless between commands (daemon handles browser state)	Persistent connection with rich introspection
Setup	`npx skills add <repo>`	`claude mcp add <name> -- <command>`
How agent interacts	Bash tool executes CLI commands	Dedicated MCP tools (`browser_click`, etc.)
Error handling	Exit codes + stderr	Structured JSON error responses
Best for	High-throughput agents balancing many tasks	Exploratory automation, self-healing loops

When to Use Skills (CLI Approach)

Your agent is doing more than just browser work (writing code, running tests, reading files)
You need to minimize context window consumption
You want simple, composable commands that chain with other CLI tools
You're in a CI/CD pipeline where token costs matter
Playwright's own docs now acknowledge CLI+Skills is more token-efficient

When to Use MCP

Exploratory testing where the agent needs rich page introspection
Self-healing test flows that require iterative reasoning over DOM structure
Long-running autonomous workflows with continuous browser context
You need accessibility tree analysis for semantic understanding

Key Files to Read

03-skills-vs-mcp/01-architectural-comparison.md — Deep side-by-side with diagrams
03-skills-vs-mcp/02-token-budget-analysis.md — Real numbers: how many tokens each approach costs
03-skills-vs-mcp/03-hybrid-strategies.md — Using both together in one framework

4. Building an AI Test Automation Framework

Subfolder: `04-building-test-framework/`

Framework Architecture Overview

┌─────────────────────────────────────────────────────┐
│                    Test Runner                      │
│  (pytest / Jest / custom orchestrator)              │
├─────────────────────────────────────────────────────┤
│              AI Agent Layer (Claude Code)           │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────┐  │
│  │ vibe-check  │  │ Test Skills  │  │ Reporting  │  │
│  │ skill       │  │ (custom)     │  │ skill      │  │
│  └──────┬──────┘  └──────┬───────┘  └─────┬──────┘  │
├─────────┼────────────────┼────────────────┼─────────┤
│         │    Bash Tool   │                │         │
│         ▼                ▼                ▼         │
│  ┌─────────────┐  ┌─────────────┐  ┌────────────┐   │
│  │ Vibium CLI  │  │ Test Utils  │  │ Report Gen │   │
│  │ (vibe-check)│  │ (scripts)   │  │ (scripts)  │   │
│  └──────┬──────┘  └─────────────┘  └────────────┘   │
│         │                                           │
│         ▼                                           │
│  ┌─────────────┐                                    │
│  │  Chrome     │                                    │
│  │  (BiDi)     │                                    │
│  └─────────────┘                                    │
└─────────────────────────────────────────────────────┘

Core Design Principles

Agent as orchestrator, not executor — The AI agent decides what to test and how to interact, the framework handles mechanics
Natural language test definitions — Tests describe intent ("verify login works with valid credentials") not implementation
Self-healing selectors — When a selector fails, the agent uses vibe-check find-all and vibe-check text to reason about alternatives
Screenshot-driven debugging — Every failure produces a screenshot + page text for the agent to analyze

Key Files to Read

04-building-test-framework/01-architecture-decisions.md — ADRs for the framework
04-building-test-framework/02-test-patterns.md — Patterns for AI-driven tests (navigation, forms, assertions, data extraction)
04-building-test-framework/03-ci-cd-integration.md — Running in GitHub Actions, handling headless mode, artifacts
04-building-test-framework/04-reporting-and-observability.md — What to capture, how to present results
04-building-test-framework/05-self-healing-strategies.md — How agents recover from broken selectors

5. WebDriver BiDi Protocol

Subfolder: `05-webdriver-bidi/`

Why This Matters for Your Interview

WebDriver BiDi is the W3C standard that Vibium is built on. Understanding it shows you know the layer below the tools — critical for architect-level conversations.

Evolution: WebDriver → CDP → BiDi

Protocol	Year	Transport	Direction	Owned By
WebDriver	2018 (W3C)	HTTP+JSON	Request/Response	W3C
CDP	2017	WebSocket	Bidirectional	Google
BiDi	2021+	WebSocket+JSON	Bidirectional	W3C

BiDi combines the best of both: standardized like WebDriver, bidirectional like CDP, cross-browser by design.

How Vibium Uses BiDi

Client (JS/Python/CLI)
    │
    ▼ WebSocket
Clicker (Go binary) ← BiDi Proxy
    │
    ▼ WebSocket
Chrome (BiDi endpoint)

Vibium's Go binary sits as a proxy between clients and Chrome. It:

Routes standard BiDi commands directly to Chrome
Intercepts custom vibium:* extension commands and handles them server-side
Implements actionability checks by sending JavaScript evaluation commands to Chrome

Key Files to Read

05-webdriver-bidi/01-protocol-overview.md — Message format, sessions, commands vs events
05-webdriver-bidi/02-evolution-from-selenium.md — Historical context from WebDriver to BiDi
05-webdriver-bidi/03-vibium-extension-commands.md — How vibium:find/click/type extend the protocol

6. Interview Preparation

Subfolder: `06-interview-preparation/`

What Architects Will Ask About

"Why not just use Playwright/Selenium directly?" — You need to articulate the agent-native advantage
"How do you handle flaky tests with AI?" — Self-healing selectors, intelligent retries, screenshot analysis
"What's your CI/CD strategy?" — Headless mode, oneshot daemon, artifact collection, parallelization
"How does this scale?" — Token costs, daemon pooling, test isolation
"What about test maintenance?" — The key selling point: natural language tests + agent reasoning reduce maintenance burden by 60-85%

Key Talking Points (The "Three Levels" Framework)

When discussing your framework, structure answers at three levels:

Level 1 — The What (for PMs and non-technical stakeholders):

"We use AI agents that can drive a real browser just like a human would. They read pages, click buttons, fill forms, and verify results — but they do it through natural language instructions instead of brittle code."

Level 2 — The How (for senior engineers):

"The agent uses the vibe-check CLI skill which gives it 22 browser commands via Bash. Commands auto-wait for elements to be actionable using five server-side checks. The browser runs as a daemon for speed or oneshot for isolation. Under the hood it's WebDriver BiDi over WebSocket to Chrome."

Level 3 — The Why (for architects):

"We chose CLI skills over MCP for browser control because of token economics — a SKILL.md costs dozens of tokens vs thousands for MCP tool schemas. The skill approach means our agent's context window stays available for reasoning about test logic, analyzing failures, and writing code. The BiDi protocol is W3C-standardized, avoiding CDP vendor lock-in. Actionability is implemented server-side in Go so it's consistent across all client languages."

Key Files to Read

06-interview-preparation/01-architect-qa-scenarios.md — 20 questions with detailed answers
06-interview-preparation/02-framework-presentation.md — How to present your framework in 5, 15, and 30 minutes
06-interview-preparation/03-buzzword-decoder.md — What people actually mean by "agentic testing", "self-healing", "ReAct pattern"

7. Competitive Landscape

Subfolder: `07-competitive-landscape/`

The Current Map (2026)

Tool	Approach	Best For	Limitation
Vibium	CLI skill + BiDi	AI agent integration, token efficiency	Young ecosystem, Chrome-only (V1)
Playwright MCP	MCP server + accessibility tree	Rich page understanding, exploratory testing	High token cost, context bloat
browser-use	Python library + vision models	Visual testing, complex UIs	Slow (vision API calls), expensive
agent-browser (Vercel)	Snapshot + Refs	Minimal context usage (93% reduction)	New, limited ecosystem
Selenium 4+	WebDriver + BiDi support	Enterprise legacy integration	Heavy, complex setup
testRigor	NL-first commercial platform	Non-technical testers	Vendor lock-in, cost

Key Files to Read

07-competitive-landscape/01-tool-comparison-matrix.md — Detailed feature-by-feature comparison
07-competitive-landscape/02-when-to-use-what.md — Decision framework for choosing the right tool
07-competitive-landscape/03-future-directions.md — Where the industry is heading (AI locators, Cortex/Retina, video recording)

Quick Reference: The vibe-check Skill Commands

Navigation

Command	Purpose
`vibe-check navigate <url>`	Go to a page
`vibe-check url`	Print current URL
`vibe-check title`	Print page title

Reading Content

Command	Purpose
`vibe-check text`	Get all page text
`vibe-check text "<selector>"`	Get text of a specific element
`vibe-check html`	Get page HTML
`vibe-check find "<selector>"`	Element info (tag, text, bounding box)
`vibe-check find-all "<selector>"`	All matching elements
`vibe-check eval "<js>"`	Run JavaScript and print result
`vibe-check screenshot -o file.png`	Capture screenshot

Interaction

Command	Purpose
`vibe-check click "<selector>"`	Click an element
`vibe-check type "<selector>" "<text>"`	Type into an input
`vibe-check hover "<selector>"`	Hover over an element
`vibe-check scroll [direction]`	Scroll page
`vibe-check keys "<combo>"`	Press keys (Enter, Ctrl+a, etc.)
`vibe-check select "<selector>" "<value>"`	Pick a dropdown option

Waiting

Command	Purpose
`vibe-check wait "<selector>"`	Wait for element (visible/hidden/attached)

Tabs

Command	Purpose
`vibe-check tabs`	List open tabs
`vibe-check tab-new [url]`	Open new tab
`vibe-check tab-switch <index\|url>`	Switch tab
`vibe-check tab-close [index]`	Close tab

Daemon

Command	Purpose
`vibe-check daemon start`	Start background browser
`vibe-check daemon status`	Check if running
`vibe-check daemon stop`	Stop daemon

Reading Order Recommendation

For efficient preparation, read in this order:

Start here: 01-foundations/01-skill-anatomy.md — understand what you're working with
Then: 03-skills-vs-mcp/01-architectural-comparison.md — the key architectural decision
Then: 02-vibium-deep-dive/01-cli-command-reference.md — the actual tool
Then: 02-vibium-deep-dive/02-actionability-checks.md — the "how it works under the hood" that impresses architects
Then: 04-building-test-framework/01-architecture-decisions.md — your framework design
Then: 05-webdriver-bidi/01-protocol-overview.md — the standard beneath everything
Then: 06-interview-preparation/01-architect-qa-scenarios.md — practice answers
Finally: 07-competitive-landscape/01-tool-comparison-matrix.md — know the alternatives

Source Repository

All analysis based on: VibiumDev/vibium (Apache 2.0, 2.6k+ stars, v0.1.7)

Key primary sources:

skills/vibe-check/SKILL.md — The skill definition
docs/explanation/actionability.md — Actionability checks with source code
docs/explanation/internals.md — Architecture internals
docs/explanation/webdriver-bidi.md — BiDi protocol explanation
V2-ROADMAP.md — Future direction

Additional sources:

Agent Skills for driving Browser for UI Test Automation: The Complete QA Engineer's Guide

Table of Contents

1. Foundations: How Agent Skills Work — 01-foundations/

2. Vibium Deep Dive — 02-vibium-deep-dive/

3. Skills vs MCP: When and Why — 03-skills-vs-mcp/

4. Building an AI Test Automation Framework — 04-building-test-framework/

5. WebDriver BiDi Protocol — 05-webdriver-bidi/

6. Interview Preparation — 06-interview-preparation/

7. Competitive Landscape — 07-competitive-landscape/

1. Foundations

What Are Agent Skills?

The SKILL.md Contract

How Selection Works

How Invocation Works

Key Files to Read

2. Vibium Deep Dive

What Is Vibium?

The vibe-check Skill

Architecture: Sense → Think → Act

The Five Actionability Checks

Daemon vs Oneshot

Key Files to Read

3. Skills vs MCP: When and Why

The Core Trade-off

When to Use Skills (CLI Approach)

When to Use MCP

Key Files to Read

4. Building an AI Test Automation Framework

Framework Architecture Overview

Core Design Principles

Key Files to Read

5. WebDriver BiDi Protocol

Why This Matters for Your Interview

Evolution: WebDriver → CDP → BiDi

How Vibium Uses BiDi

Key Files to Read

6. Interview Preparation

What Architects Will Ask About

Key Talking Points (The "Three Levels" Framework)

Key Files to Read

7. Competitive Landscape

The Current Map (2026)

Key Files to Read

Quick Reference: The vibe-check Skill Commands

Navigation

Reading Content

Interaction

Waiting

Tabs

Daemon

Reading Order Recommendation

Source Repository

1. Foundations: How Agent Skills Work — `01-foundations/`

2. Vibium Deep Dive — `02-vibium-deep-dive/`

3. Skills vs MCP: When and Why — `03-skills-vs-mcp/`

4. Building an AI Test Automation Framework — `04-building-test-framework/`

5. WebDriver BiDi Protocol — `05-webdriver-bidi/`

6. Interview Preparation — `06-interview-preparation/`

7. Competitive Landscape — `07-competitive-landscape/`

The `vibe-check` Skill