QA Engineer Skills 2026

Token Economics: Why Skills Win on Cost

The Context Window is a Shared Resource

Every token in the context window competes for the same limited space. A coding agent typically needs context for:

Consumer                            Typical Tokens    Priority
System prompt                       2,000-5,000       Fixed
Conversation history                10,000-50,000     Grows
File contents (code being edited)   5,000-30,000      Essential
Tool definitions (MCP, built-in)    2,000-15,000      Fixed per tool
Agent reasoning                     5,000-20,000      Essential
──────────────────────────────────────────────────────
Total budget                        ~200,000

When you add browser automation, you're competing with code editing, test analysis, and reasoning for that same budget.
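To make the squeeze concrete, here is a minimal sketch that sums mid-range values from the table above (all figures are the article's illustrative estimates, not measurements) to show the headroom left before any browser tooling is added:

```python
# Mid-range estimates from the table above (illustrative, not measured).
CONTEXT_WINDOW = 200_000

consumers = {
    "system prompt": 3_500,
    "conversation history": 30_000,
    "file contents": 17_500,
    "tool definitions": 8_500,
    "agent reasoning": 12_500,
}

used = sum(consumers.values())
headroom = CONTEXT_WINDOW - used
print(used, headroom)  # 72000 128000
```

Roughly a third of the window is spoken for before the first browser command runs, and history keeps growing from there.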


MCP Token Cost: The Tool Schema Tax

Every MCP tool exposes a JSON schema that's loaded into every API request:

{
  "name": "browser_click",
  "description": "Click an element on the page",
  "inputSchema": {
    "type": "object",
    "properties": {
      "selector": {
        "type": "string",
        "description": "CSS selector for the element to click"
      },
      "timeout": {
        "type": "number",
        "description": "Maximum wait time in milliseconds",
        "default": 30000
      },
      "force": {
        "type": "boolean",
        "description": "Force click even if element is not actionable",
        "default": false
      }
    },
    "required": ["selector"]
  }
}

A typical Playwright MCP server exposes 15-25 tools. Each tool definition costs 200-500 tokens. That's 3,000-12,500 tokens just for the tool schemas — loaded on EVERY API call.
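To see where the schema tax comes from, here is a back-of-envelope estimator using a crude ~4-characters-per-token heuristic (an assumption, not a real tokenizer) applied to the browser_click schema above:

```python
import json

# Crude heuristic: ~4 characters per token. Good enough for
# back-of-envelope sizing; a real tokenizer will differ somewhat.
def estimate_tokens(obj) -> int:
    return len(json.dumps(obj)) // 4

# The browser_click schema from above.
browser_click = {
    "name": "browser_click",
    "description": "Click an element on the page",
    "inputSchema": {
        "type": "object",
        "properties": {
            "selector": {
                "type": "string",
                "description": "CSS selector for the element to click",
            },
            "timeout": {
                "type": "number",
                "description": "Maximum wait time in milliseconds",
                "default": 30000,
            },
            "force": {
                "type": "boolean",
                "description": "Force click even if element is not actionable",
                "default": False,
            },
        },
        "required": ["selector"],
    },
}

per_tool = estimate_tokens(browser_click)  # ~100+ tokens for one small tool
per_server = 20 * per_tool                 # a 20-tool server pays this every call
print(per_tool, per_server)
```

Even this deliberately small schema lands in the low hundreds of tokens; real-world tool schemas with more parameters and longer descriptions run higher, which is how a full server reaches the 3,000-12,500 range.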

Plus, MCP responses often include accessibility trees (structured DOM representations):

[role="main"] Main Content
  [role="navigation"] Nav
    [role="link"] "Home" [ref=1]
    [role="link"] "About" [ref=2]
  [role="heading"] "Welcome" [ref=3]
  [role="textbox"] "Search..." [ref=4]
  [role="button"] "Submit" [ref=5]
  ... (hundreds of elements)

A single accessibility snapshot can cost 2,000-10,000 tokens depending on page complexity.

MCP Total Cost Per Turn

Tool schemas:          ~5,000 tokens  (fixed, every turn)
Accessibility tree:    ~5,000 tokens  (per page interaction)
Tool call/response:    ~200 tokens    (per command)
─────────────────────────────────────
Total per interaction: ~10,200 tokens

Over a 20-step test:

20 steps × 10,200 tokens = ~204,000 tokens (may exceed context window!)

Skill Token Cost: The Markdown Injection

The vibe-check SKILL.md is ~100 lines, which translates to roughly 800-1,200 tokens — loaded once when the skill is invoked, then persists in context.
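Once the skill is loaded, each browser step is just a short shell invocation. The commands below are purely illustrative (hypothetical subcommand names, not vibe-check's actual interface):

```shell
# Hypothetical skill-backed CLI calls (command names are illustrative,
# not vibe-check's actual interface). Each line is a few dozen tokens,
# and output comes back as plain text rather than an accessibility tree.
vibe-check goto "https://example.com/login"
vibe-check fill "#email" "qa@example.com"
vibe-check click "#submit"
```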

Skill Total Cost Per Turn

SKILL.md injection:    ~1,000 tokens  (once, when the skill is invoked)
Skill description:     ~50 tokens     (in tool listing, every turn)
Bash command:          ~30 tokens     (per command)
Command output:        ~100 tokens    (per command, just text)
─────────────────────────────────────
Total per interaction: ~180 tokens    (after the one-time injection)

Over a 20-step test:

Initial:    ~1,000 tokens
20 steps:   20 × 180 = ~3,600 tokens
Total:      ~4,600 tokens

Side-by-Side Comparison

Metric                     MCP (Playwright)   Skill (vibe-check)   Ratio
Fixed overhead per turn    ~5,000 tokens      ~50 tokens           100x cheaper
Per interaction            ~10,200 tokens     ~180 tokens          ~57x cheaper
20-step test total         ~204,000 tokens    ~4,600 tokens        ~44x cheaper
Context window remaining   ~0 (exhausted)     ~195,000 tokens      N/A

The skill approach leaves roughly 98% of the context window available for reasoning, code analysis, and file contents. The MCP approach can exhaust the entire context window on browser interactions alone.
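The totals can be re-derived from the per-item figures above (the article's rough estimates, not measurements):

```python
# Recomputing the 20-step totals from the per-item estimates above.
# All figures are the article's rough estimates, not measurements.
STEPS = 20

mcp_per_step = 5_000 + 5_000 + 200   # tool schemas + a11y snapshot + call/response
mcp_total = STEPS * mcp_per_step

skill_initial = 1_000                # one-time SKILL.md injection
# The ~50-token tool-listing entry recurs every turn, like MCP schemas do.
skill_per_step = 50 + 30 + 100       # listing entry + bash command + output
skill_total = skill_initial + STEPS * skill_per_step

print(mcp_total, skill_total, round(mcp_total / skill_total))
```

The exact ratio moves with page complexity and command output length, but the order-of-magnitude gap is robust: the dominant MCP costs (schemas, accessibility snapshots) simply have no counterpart on the skill side.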


Why This Matters for Test Automation

A typical test automation session involves:

  • Reading test specifications
  • Writing test code
  • Running browser interactions
  • Analyzing results
  • Debugging failures
  • Generating reports

If browser interactions consume 50-100% of your context budget (MCP approach), the agent can't effectively do the other tasks. It starts "forgetting" earlier parts of the conversation as the context compresses.

With the skill approach, browser interactions consume ~2% of the context budget, leaving the agent free to:

  • Hold an entire test suite in context
  • Reason about multi-page flows
  • Compare expected vs actual results with full detail
  • Maintain conversation history across long sessions

The Playwright Team's Own Acknowledgment

From the official Playwright documentation (2025/2026):

"Coding agents increasingly favor CLI-based workflows exposed as SKILLs over MCP because CLI invocations are more token-efficient — they avoid loading large tool schemas and verbose accessibility trees into the model context."

This isn't a third-party opinion — it's from the team that builds the MCP server itself.


When MCP Token Cost Is Justified

MCP's higher token cost buys you:

  1. Structured page understanding — The accessibility tree gives the agent semantic knowledge about what's on the page (role, labels, relationships)
  2. Iterative exploration — The agent can repeatedly query page structure to find elements
  3. Rich error context — MCP returns structured errors with element state information
  4. Session continuity — MCP maintains browser session state on the server side

For exploratory testing where you don't know what you're looking for, this is valuable. For scripted test automation where you already know the selectors and the flow, it's wasted budget.


Interview Talking Point

"When we evaluated browser automation approaches, token economics was the deciding factor. Our test sessions typically run 30-50 browser commands alongside code analysis and test writing. With MCP, that would consume our entire 200K context window on browser interactions alone. With the CLI skill approach, browser commands take about 2% of our budget, leaving 98% for reasoning about test logic, analyzing failures, and maintaining context across the entire session. The Playwright team themselves acknowledged this trade-off in their docs."