QA Engineer Skills 2026QA-2026Agent Skills for driving Browser for UI Test Automation: The Complete QA Engineer's Guide

Agent Skills for driving Browser for UI Test Automation: The Complete QA Engineer's Guide

Prepared for: QA Engineer gentle introduction to the roles where heavy AI usage is expected. Focus: Driving a browser through agent skills (not MCP servers), building an AI-augmented test automation framework, and speaking about it credibly with architect-level developers.


Table of Contents

1. Foundations: How Agent Skills Work01-foundations/

2. Vibium Deep Dive02-vibium-deep-dive/

3. Skills vs MCP: When and Why03-skills-vs-mcp/

4. Building an AI Test Automation Framework04-building-test-framework/

5. WebDriver BiDi Protocol05-webdriver-bidi/

6. Interview Preparation06-interview-preparation/

7. Competitive Landscape07-competitive-landscape/


1. Foundations

Subfolder: `01-foundations/`

What Are Agent Skills?

Agent skills are reusable capability packages for AI coding agents. Unlike MCP servers that expose tool schemas over a protocol (adding thousands of tokens to context), skills inject procedural knowledge — a markdown file that teaches the agent how to accomplish domain-specific tasks using existing tools (Bash, Read, Write, etc.).

The critical insight: Skills don't add new tools. They teach the agent how to use existing tools for a specific domain.

The SKILL.md Contract

Every skill is defined by a single SKILL.md file with two sections:

---
name: vibe-check        # Lowercase, hyphens only, max 64 chars
description: |           # THE selection signal — Claude reads this to decide if the skill applies
  Browser automation via CLI. Navigate pages, click elements,
  fill forms, take screenshots, extract text.
allowed-tools: Bash      # What tools the skill may use
---

# Instructions for the agent (markdown content)
The `vibe-check` CLI automates Chrome via the command line...

How Selection Works

There is no algorithmic routing. Claude receives a formatted list of all available skills inside the Skill tool description. When a user asks something like "take a screenshot of this page," Claude's language model matches intent to skill descriptions through its forward pass — no embeddings, no classifiers, just comprehension.

How Invocation Works

When Claude decides to invoke a skill:

  1. A visible "loading" message appears to the user
  2. The SKILL.md content is injected as a hidden system message into conversation context
  3. Tool permissions from allowed-tools are temporarily granted
  4. Claude executes the skill's instructions using available tools (primarily Bash for CLI skills)
  5. Permissions revert when the skill completes

Key Files to Read

  • 01-foundations/01-skill-anatomy.md — Full breakdown of SKILL.md structure with annotated examples
  • 01-foundations/02-skill-lifecycle.md — Discovery, selection, invocation, execution, teardown
  • 01-foundations/03-token-economics.md — Why skills cost dozens of tokens vs thousands for MCP

2. Vibium Deep Dive

Subfolder: `02-vibium-deep-dive/`

What Is Vibium?

Vibium is browser automation infrastructure built by the creator of Selenium and Appium. It's a single Go binary (~10MB) that:

  • Launches and manages Chrome via WebDriver BiDi
  • Exposes 22 CLI commands for browser control
  • Runs as a daemon (browser persists between commands) or oneshot (fresh browser per command)
  • Also provides an MCP server and JS/Python client libraries

The vibe-check Skill

The vibe-check skill (from skills/vibe-check/SKILL.md) is a single SKILL.md file that teaches an AI agent all 22 CLI commands. When installed, the agent can drive Chrome through Bash:

# The agent executes these through the Bash tool
vibe-check navigate https://app.example.com/login
vibe-check type "input[name=email]" "user@test.com"
vibe-check type "input[name=password]" "secret123"
vibe-check click "button[type=submit]"
vibe-check wait "h1"
vibe-check text "h1"                    # → "Welcome, User"
vibe-check screenshot -o dashboard.png

Architecture: Sense → Think → Act

Vibium's roadmap follows a robotics control loop:

Layer Component Status Purpose
Act Clicker Shipped (V1) Browser automation via BiDi
Sense Retina V2 planned Chrome extension that observes everything
Think Cortex V2 planned SQLite-backed memory + navigation planning

The Five Actionability Checks

Before any interaction, Vibium verifies server-side (in Go, not in client code):

  1. Visible — Element has non-zero size, not display:none or visibility:hidden
  2. Stable — Position unchanged over 50ms (catches animations)
  3. ReceivesEvents — Not obscured by another element (elementFromPoint check)
  4. Enabled — Not disabled, aria-disabled, or inside disabled <fieldset>
  5. Editable — (for type only) Accepts text input, not readonly

These run in a polling loop with 100ms intervals until all pass or timeout (default 30s).

Daemon vs Oneshot

Mode How Best For
Daemon (default) Background process keeps browser alive Interactive sessions, chaining commands
Oneshot Fresh browser per command, torn down after CI pipelines, isolated test runs

Key Files to Read

  • 02-vibium-deep-dive/01-cli-command-reference.md — All 22 commands with examples and edge cases
  • 02-vibium-deep-dive/02-actionability-checks.md — The five checks with actual JavaScript source code
  • 02-vibium-deep-dive/03-daemon-architecture.md — Process model, cleanup, zombie prevention
  • 02-vibium-deep-dive/04-extension-commands.md — How vibium:find, vibium:click, vibium:type work over BiDi

3. Skills vs MCP: When and Why

Subfolder: `03-skills-vs-mcp/`

The Core Trade-off

Dimension Skills (CLI) MCP Server
Token cost Dozens (SKILL.md ~200-500 lines) Thousands (tool schemas + accessibility trees)
State management Stateless between commands (daemon handles browser state) Persistent connection with rich introspection
Setup npx skills add <repo> claude mcp add <name> -- <command>
How agent interacts Bash tool executes CLI commands Dedicated MCP tools (browser_click, etc.)
Error handling Exit codes + stderr Structured JSON error responses
Best for High-throughput agents balancing many tasks Exploratory automation, self-healing loops

When to Use Skills (CLI Approach)

  • Your agent is doing more than just browser work (writing code, running tests, reading files)
  • You need to minimize context window consumption
  • You want simple, composable commands that chain with other CLI tools
  • You're in a CI/CD pipeline where token costs matter
  • Playwright's own docs now acknowledge CLI+Skills is more token-efficient

When to Use MCP

  • Exploratory testing where the agent needs rich page introspection
  • Self-healing test flows that require iterative reasoning over DOM structure
  • Long-running autonomous workflows with continuous browser context
  • You need accessibility tree analysis for semantic understanding

Key Files to Read

  • 03-skills-vs-mcp/01-architectural-comparison.md — Deep side-by-side with diagrams
  • 03-skills-vs-mcp/02-token-budget-analysis.md — Real numbers: how many tokens each approach costs
  • 03-skills-vs-mcp/03-hybrid-strategies.md — Using both together in one framework

4. Building an AI Test Automation Framework

Subfolder: `04-building-test-framework/`

Framework Architecture Overview

┌─────────────────────────────────────────────────────┐
│                    Test Runner                      │
│  (pytest / Jest / custom orchestrator)              │
├─────────────────────────────────────────────────────┤
│              AI Agent Layer (Claude Code)           │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────┐  │
│  │ vibe-check  │  │ Test Skills  │  │ Reporting  │  │
│  │ skill       │  │ (custom)     │  │ skill      │  │
│  └──────┬──────┘  └──────┬───────┘  └─────┬──────┘  │
├─────────┼────────────────┼────────────────┼─────────┤
│         │    Bash Tool   │                │         │
│         ▼                ▼                ▼         │
│  ┌─────────────┐  ┌─────────────┐  ┌────────────┐   │
│  │ Vibium CLI  │  │ Test Utils  │  │ Report Gen │   │
│  │ (vibe-check)│  │ (scripts)   │  │ (scripts)  │   │
│  └──────┬──────┘  └─────────────┘  └────────────┘   │
│         │                                           │
│         ▼                                           │
│  ┌─────────────┐                                    │
│  │  Chrome     │                                    │
│  │  (BiDi)     │                                    │
│  └─────────────┘                                    │
└─────────────────────────────────────────────────────┘

Core Design Principles

  1. Agent as orchestrator, not executor — The AI agent decides what to test and how to interact, the framework handles mechanics
  2. Natural language test definitions — Tests describe intent ("verify login works with valid credentials") not implementation
  3. Self-healing selectors — When a selector fails, the agent uses vibe-check find-all and vibe-check text to reason about alternatives
  4. Screenshot-driven debugging — Every failure produces a screenshot + page text for the agent to analyze

Key Files to Read

  • 04-building-test-framework/01-architecture-decisions.md — ADRs for the framework
  • 04-building-test-framework/02-test-patterns.md — Patterns for AI-driven tests (navigation, forms, assertions, data extraction)
  • 04-building-test-framework/03-ci-cd-integration.md — Running in GitHub Actions, handling headless mode, artifacts
  • 04-building-test-framework/04-reporting-and-observability.md — What to capture, how to present results
  • 04-building-test-framework/05-self-healing-strategies.md — How agents recover from broken selectors

5. WebDriver BiDi Protocol

Subfolder: `05-webdriver-bidi/`

Why This Matters for Your Interview

WebDriver BiDi is the W3C standard that Vibium is built on. Understanding it shows you know the layer below the tools — critical for architect-level conversations.

Evolution: WebDriver → CDP → BiDi

Protocol Year Transport Direction Owned By
WebDriver 2018 (W3C) HTTP+JSON Request/Response W3C
CDP 2017 WebSocket Bidirectional Google
BiDi 2021+ WebSocket+JSON Bidirectional W3C

BiDi combines the best of both: standardized like WebDriver, bidirectional like CDP, cross-browser by design.

How Vibium Uses BiDi

Client (JS/Python/CLI)
    │
    ▼ WebSocket
Clicker (Go binary) ← BiDi Proxy
    │
    ▼ WebSocket
Chrome (BiDi endpoint)

Vibium's Go binary sits as a proxy between clients and Chrome. It:

  • Routes standard BiDi commands directly to Chrome
  • Intercepts custom vibium:* extension commands and handles them server-side
  • Implements actionability checks by sending JavaScript evaluation commands to Chrome

Key Files to Read

  • 05-webdriver-bidi/01-protocol-overview.md — Message format, sessions, commands vs events
  • 05-webdriver-bidi/02-evolution-from-selenium.md — Historical context from WebDriver to BiDi
  • 05-webdriver-bidi/03-vibium-extension-commands.md — How vibium:find/click/type extend the protocol

6. Interview Preparation

Subfolder: `06-interview-preparation/`

What Architects Will Ask About

  1. "Why not just use Playwright/Selenium directly?" — You need to articulate the agent-native advantage
  2. "How do you handle flaky tests with AI?" — Self-healing selectors, intelligent retries, screenshot analysis
  3. "What's your CI/CD strategy?" — Headless mode, oneshot daemon, artifact collection, parallelization
  4. "How does this scale?" — Token costs, daemon pooling, test isolation
  5. "What about test maintenance?" — The key selling point: natural language tests + agent reasoning reduce maintenance burden by 60-85%

Key Talking Points (The "Three Levels" Framework)

When discussing your framework, structure answers at three levels:

Level 1 — The What (for PMs and non-technical stakeholders):

"We use AI agents that can drive a real browser just like a human would. They read pages, click buttons, fill forms, and verify results — but they do it through natural language instructions instead of brittle code."

Level 2 — The How (for senior engineers):

"The agent uses the vibe-check CLI skill which gives it 22 browser commands via Bash. Commands auto-wait for elements to be actionable using five server-side checks. The browser runs as a daemon for speed or oneshot for isolation. Under the hood it's WebDriver BiDi over WebSocket to Chrome."

Level 3 — The Why (for architects):

"We chose CLI skills over MCP for browser control because of token economics — a SKILL.md costs dozens of tokens vs thousands for MCP tool schemas. The skill approach means our agent's context window stays available for reasoning about test logic, analyzing failures, and writing code. The BiDi protocol is W3C-standardized, avoiding CDP vendor lock-in. Actionability is implemented server-side in Go so it's consistent across all client languages."

Key Files to Read

  • 06-interview-preparation/01-architect-qa-scenarios.md — 20 questions with detailed answers
  • 06-interview-preparation/02-framework-presentation.md — How to present your framework in 5, 15, and 30 minutes
  • 06-interview-preparation/03-buzzword-decoder.md — What people actually mean by "agentic testing", "self-healing", "ReAct pattern"

7. Competitive Landscape

Subfolder: `07-competitive-landscape/`

The Current Map (2026)

Tool Approach Best For Limitation
Vibium CLI skill + BiDi AI agent integration, token efficiency Young ecosystem, Chrome-only (V1)
Playwright MCP MCP server + accessibility tree Rich page understanding, exploratory testing High token cost, context bloat
browser-use Python library + vision models Visual testing, complex UIs Slow (vision API calls), expensive
agent-browser (Vercel) Snapshot + Refs Minimal context usage (93% reduction) New, limited ecosystem
Selenium 4+ WebDriver + BiDi support Enterprise legacy integration Heavy, complex setup
testRigor NL-first commercial platform Non-technical testers Vendor lock-in, cost

Key Files to Read

  • 07-competitive-landscape/01-tool-comparison-matrix.md — Detailed feature-by-feature comparison
  • 07-competitive-landscape/02-when-to-use-what.md — Decision framework for choosing the right tool
  • 07-competitive-landscape/03-future-directions.md — Where the industry is heading (AI locators, Cortex/Retina, video recording)

Quick Reference: The vibe-check Skill Commands

Navigation

Command Purpose
vibe-check navigate <url> Go to a page
vibe-check url Print current URL
vibe-check title Print page title

Reading Content

Command Purpose
vibe-check text Get all page text
vibe-check text "<selector>" Get text of a specific element
vibe-check html Get page HTML
vibe-check find "<selector>" Element info (tag, text, bounding box)
vibe-check find-all "<selector>" All matching elements
vibe-check eval "<js>" Run JavaScript and print result
vibe-check screenshot -o file.png Capture screenshot

Interaction

Command Purpose
vibe-check click "<selector>" Click an element
vibe-check type "<selector>" "<text>" Type into an input
vibe-check hover "<selector>" Hover over an element
vibe-check scroll [direction] Scroll page
vibe-check keys "<combo>" Press keys (Enter, Ctrl+a, etc.)
vibe-check select "<selector>" "<value>" Pick a dropdown option

Waiting

Command Purpose
vibe-check wait "<selector>" Wait for element (visible/hidden/attached)

Tabs

Command Purpose
vibe-check tabs List open tabs
vibe-check tab-new [url] Open new tab
vibe-check tab-switch <index|url> Switch tab
vibe-check tab-close [index] Close tab

Daemon

Command Purpose
vibe-check daemon start Start background browser
vibe-check daemon status Check if running
vibe-check daemon stop Stop daemon

Reading Order Recommendation

For efficient preparation, read in this order:

  1. Start here: 01-foundations/01-skill-anatomy.md — understand what you're working with
  2. Then: 03-skills-vs-mcp/01-architectural-comparison.md — the key architectural decision
  3. Then: 02-vibium-deep-dive/01-cli-command-reference.md — the actual tool
  4. Then: 02-vibium-deep-dive/02-actionability-checks.md — the "how it works under the hood" that impresses architects
  5. Then: 04-building-test-framework/01-architecture-decisions.md — your framework design
  6. Then: 05-webdriver-bidi/01-protocol-overview.md — the standard beneath everything
  7. Then: 06-interview-preparation/01-architect-qa-scenarios.md — practice answers
  8. Finally: 07-competitive-landscape/01-tool-comparison-matrix.md — know the alternatives

Source Repository

All analysis based on: VibiumDev/vibium (Apache 2.0, 2.6k+ stars, v0.1.7)

Key primary sources:

Additional sources: