Skills vs MCP: Architectural Comparison

The Fundamental Difference

MCP connects Claude to external capabilities (tools, data, APIs). Skills teach Claude how to use those capabilities for specific domains.

If MCP provides the "kitchen and ingredients," Skills provide the "recipes."

They are complementary layers, not competing alternatives. For browser automation specifically, though, choosing one, the other, or both determines token cost, how much page state the agent can see, and how much detail it gets when a step fails.

How Each Approach Works

MCP Server Approach (e.g., Playwright MCP)

┌────────────────┐    stdio/SSE     ┌──────────────────┐    CDP/BiDi   ┌─────────┐
│  Claude Code   │◄────────────────►│  MCP Server      │◄─────────────►│ Browser │
│                │                  │  (playwright-mcp)│               │         │
│  Tools:        │                  │                  │               │         │
│  - browser_    │                  │  Runs:           │               │         │
│    navigate    │                  │  - Browser pool  │               │         │
│  - browser_    │                  │  - A11y snapshots│               │         │
│    click       │                  │  - State mgmt    │               │         │
│  - browser_    │                  │                  │               │         │
│    screenshot  │                  │                  │               │         │
│  (15-25 tools) │                  │                  │               │         │
└────────────────┘                  └──────────────────┘               └─────────┘

What's loaded per API call:

15-25 tool schemas (~5,000-12,500 tokens)
Accessibility tree per page (~2,000-10,000 tokens)
Tool call/response JSON (~200 tokens each)

Skill Approach (e.g., vibe-check)

┌────────────────┐                   ┌──────────────────┐    BiDi      ┌─────────┐
│  Claude Code   │                   │  Clicker Daemon  │◄────────────►│ Browser │
│                │     Bash tool     │  (Go binary)     │              │         │
│  Context:      │────────────────►  │                  │              │         │
│  - SKILL.md    │  "vibe-check      │  Runs:           │              │         │
│    (~1K tokens)│   click 'button'" │  - BiDi proxy    │              │         │
│                │                   │  - Actionability │              │         │
│  Tools:        │◄────────────────  │  - Auto-wait     │              │         │
│  - Bash        │  "clicked: true"  │                  │              │         │
│  (1 tool)      │                   │                  │              │         │
└────────────────┘                   └──────────────────┘              └─────────┘

What's loaded per API call:

SKILL.md content (~1,000 tokens, once)
Skill description in tool list (~50 tokens)
Bash tool schema (~200 tokens, shared with all Bash usage)
Command output as plain text (~50-200 tokens)

Detailed Comparison

Token Economics

Metric	MCP (Playwright)	Skill (vibe-check)
Tool schema overhead	~5,000-12,500 tokens/turn	~50 tokens/turn
Per-interaction cost	~5,000-10,000 tokens	~130 tokens
Context after 20 steps	~60-80% consumed	~2% consumed
Remaining for reasoning	Limited	Abundant
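
The percentages in this table can be sanity-checked with a little arithmetic. The sketch below assumes a 200,000-token context window (an assumption; the comparison does not state a window size), takes the per-interaction figures from the table, and uses a hypothetical helper named `context_pct`.

```shell
#!/bin/sh
# Rough context-budget check for a 20-step browser session.
# WINDOW is an assumption; the comparison above does not state a window size.
WINDOW=200000
STEPS=20

# context_pct USED_TOKENS WINDOW_TOKENS -> percentage of the window, one decimal
context_pct() {
  awk -v used="$1" -v window="$2" 'BEGIN { printf "%.1f", used * 100 / window }'
}

# MCP: ~5,000-10,000 tokens per interaction (figures from the table)
mcp_low=$((STEPS * 5000))
mcp_high=$((STEPS * 10000))

# Skill: SKILL.md loaded once (~1,000 tokens) plus ~130 tokens per interaction
skill_total=$((1000 + STEPS * 130))

echo "MCP:   $(context_pct "$mcp_low" "$WINDOW")% to $(context_pct "$mcp_high" "$WINDOW")%"
echo "Skill: $(context_pct "$skill_total" "$WINDOW")%"
```

Under these assumptions the skill session totals 3,600 tokens, about 1.8% of the window, which matches the table's ~2%. The MCP session spans 100,000 to 200,000 tokens, and the table's ~60-80% falls inside that band.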

State Management

Aspect	MCP	Skill
Browser session	Server-managed, persistent	Daemon-managed, persistent
Page state awareness	Accessibility tree (rich, semantic)	Text/screenshot (simple, visual)
Cross-command continuity	Automatic (MCP server holds context)	Automatic (daemon holds browser)
State inspection	Structured data (roles, labels, refs)	Text output + screenshots

Error Handling

Aspect	MCP	Skill
Error format	Structured JSON	Exit code + stderr text
Error detail	Element state, selector info, page context	Actionability failure reason
Agent recovery	Rich context for reasoning	Less context, but enough for scripted flows with known selectors
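
For the skill column, recovery comes down to reading an exit code and stderr. A minimal sketch of that pattern follows. `run_step` is a hypothetical wrapper, not part of vibe-check, and it only distinguishes zero from non-zero exit status because the specific exit codes are not documented here.

```shell
#!/bin/sh
# run_step CMD [ARGS...]
# Runs one command, passes its stdout through, and on failure prints a single
# line containing the exit code and the captured stderr for the agent to read.
run_step() {
  err_file=$(mktemp) || return 1
  "$@" 2>"$err_file"
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "step failed (exit $status): $(cat "$err_file")" >&2
  fi
  rm -f "$err_file"
  return "$status"
}

# Intended use (assumes vibe-check is installed; not executed here):
#   run_step vibe-check click "#submit" || exit 1
```

Collapsing the failure into one line keeps the error cheap to load into context, which is the trade-off the table describes: less detail than a structured JSON error, but little token overhead.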

Setup & Maintenance

Aspect	MCP	Skill
Installation	`claude mcp add name -- command`	`npx skills add repo --skill name`
Runtime dependency	MCP server process must be running	Daemon auto-starts on first command
Updates	Update MCP server package	Re-install skill (or auto-update)
Configuration	Server config file (ports, browser path, etc.)	Minimal (--headless, --oneshot flags)

Decision Matrix: When to Use What

Use Skills (CLI) When:

Your agent juggles many tasks — writing code, running tests, AND driving a browser. Token budget must be shared.
You know the selectors — your tests target specific CSS selectors, not "find something that looks like a login button."
You're in CI/CD — token costs compound across hundreds of test runs. CLI commands are cheap.
You want composability — CLI commands chain with other tools (grep, jq, awk) via pipes.
Your tests are procedural — "navigate here, click that, type this, verify that" flows.

# Skills shine here: scripted, known selectors, fast
vibe-check navigate https://app.example.com/login
vibe-check type "#email" "test@example.com"
vibe-check type "#password" "secret"
vibe-check click "#submit"
vibe-check wait ".dashboard"
vibe-check text ".welcome-msg"  # → "Welcome, Test User"
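
Because each command prints plain text, ordinary shell tools can serve as assertions, which is the composability point in the list above. In the sketch below, `assert_contains` is a hypothetical helper, and the piped usage assumes `vibe-check text` writes the element's text to stdout, as the example above suggests.

```shell
#!/bin/sh
# assert_contains EXPECTED
# Reads text on stdin and succeeds only if it contains EXPECTED (fixed string).
assert_contains() {
  if grep -qF -- "$1"; then
    echo "PASS: found '$1'"
  else
    echo "FAIL: '$1' not found" >&2
    return 1
  fi
}

# Intended use (assumes vibe-check is installed; not executed here):
#   vibe-check text ".welcome-msg" | assert_contains "Welcome, Test User"
```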

Use MCP When:

Exploratory testing — you're discovering the UI, not testing known flows.
Accessibility analysis — you need semantic understanding of page structure (ARIA roles, labels, landmarks).
Self-healing loops — the agent needs to iteratively query page structure when selectors break.
Long autonomous sessions — the agent runs for minutes, making decisions based on rich page context.
Non-developer users — MCP allows "click the big blue button" without knowing CSS selectors.

# MCP shines here: exploratory, semantic understanding
User: "Go to our app and check if all form fields have proper labels"
Agent: *uses accessibility tree to find unlabeled inputs*
Agent: *reasons about ARIA attributes*
Agent: *reports accessibility violations*

Use Both When:

Hybrid framework — MCP for discovery/exploration phases, skill for execution phases.
Different test types — Accessibility tests via MCP, functional tests via skill.
Development vs CI — MCP during development (rich feedback), skill in CI (fast, cheap).

The Playwright Team's Own Assessment

From Playwright's official documentation (2025/2026):

"Coding agents increasingly favor CLI-based workflows exposed as SKILLs over MCP because CLI invocations are more token-efficient — they avoid loading large tool schemas and verbose accessibility trees into the model context."

"MCP remains relevant for specialized agentic loops that benefit from persistent state, rich introspection, and iterative reasoning over page structure."

This is not a third-party opinion — it's the team that builds the MCP server acknowledging the trade-off.

Architecture Diagram: Hybrid Approach

                        ┌──────────────────────────┐
                        │      Claude Code Agent    │
                        │                           │
                        │  ┌────────┐ ┌──────────┐ │
                        │  │ Skills │ │ MCP Tools│ │
                        │  │(vibe-  │ │(browser_ │ │
                        │  │ check) │ │ a11y)    │ │
                        │  └───┬────┘ └────┬─────┘ │
                        └──────┼───────────┼───────┘
                               │           │
                    Bash tool  │           │ MCP protocol
                               │           │
                        ┌──────▼──┐  ┌─────▼────────┐
                        │ Vibium  │  │ Playwright   │
                        │ Daemon  │  │ MCP Server   │
                        └────┬────┘  └──────┬───────┘
                             │              │
                        BiDi │         CDP  │
                             │              │
                        ┌────▼──────────────▼──────┐
                        │        Chrome            │
                        └──────────────────────────┘

In this hybrid model:

vibe-check handles all interaction commands (navigate, click, type, screenshot) — cheap
Playwright MCP handles accessibility analysis and page discovery — rich but expensive
Each is used where it's strongest
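
One way to picture the split is a small dispatcher that maps a task category to an interface. This is only an illustration of the routing idea: `pick_interface` and the category names `a11y-audit` and `discovery` are hypothetical and belong to neither tool.

```shell
#!/bin/sh
# pick_interface TASK_KIND
# Maps a task category to the interface the hybrid model above would use.
# Category names are illustrative only.
pick_interface() {
  case "$1" in
    navigate|click|type|screenshot) echo "skill" ;;  # cheap interaction commands
    a11y-audit|discovery)           echo "mcp" ;;    # rich page introspection
    *) echo "unknown task kind: $1" >&2; return 1 ;;
  esac
}

for task in navigate click a11y-audit; do
  echo "$task -> $(pick_interface "$task")"
done
```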

Interview Talking Point

"We evaluated both MCP and CLI-skill approaches for browser automation. The key trade-off is between richness and efficiency. MCP gives you semantic page understanding via accessibility trees and structured tool schemas, but at a cost of 5,000-10,000+ tokens per interaction. CLI skills give you 22 browser commands for about 130 tokens per interaction, roughly 40 to 75 times cheaper. For a 20-step test, MCP might consume 60-80% of the context window while skills use about 2%. We chose skills as the primary interface for automated test execution, with MCP available for exploratory phases where semantic page understanding matters. Even Playwright's own team acknowledges that CLI+Skills is more token-efficient for coding agents."