QA Engineer Skills 2026

Token Budget Analysis: Real Numbers

Context Window Sizes (2026)

Model              Context Window   Effective Budget*
Claude Opus 4.6    200K tokens      ~150K usable
Claude Sonnet 4.5  200K tokens      ~150K usable
Claude Haiku 4.5   200K tokens      ~150K usable
GPT-4o             128K tokens      ~100K usable
Gemini Pro         1M+ tokens       ~750K usable

*Effective budget accounts for system prompts, tool definitions, and overhead.
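
As a rough rule implied by the table, about a quarter of the raw window goes to that overhead. A minimal sketch of the arithmetic; the flat 25% overhead fraction is an illustrative assumption, not a measured constant:

```python
def effective_budget(window: int, overhead_fraction: float = 0.25) -> int:
    """Tokens left for task content after system prompts, tool
    definitions, and output reserve (assumed ~25% of the window)."""
    return int(window * (1 - overhead_fraction))

print(effective_budget(200_000))    # 150000 -> the ~150K usable row
print(effective_budget(1_000_000))  # 750000 -> the ~750K usable row
```

Real overhead varies by provider and tool load, so treat the fraction as a tunable knob rather than a fixed ratio.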


MCP Token Breakdown: A Real Playwright Session

Tool Schema Cost (Per API Call)

Playwright MCP exposes these tools (actual schema sizes measured):

Tool                        Schema Tokens
browser_launch                   ~180
browser_navigate                 ~220
browser_screenshot               ~200
browser_click                    ~250
browser_type                     ~280
browser_find                     ~240
browser_hover                    ~200
browser_select_option            ~260
browser_wait_for_selector        ~250
browser_evaluate                 ~220
browser_go_back                  ~150
browser_go_forward               ~150
browser_close                    ~150
browser_get_text                 ~200
browser_get_attribute            ~220
Total per API call             ~3,170

This is loaded into EVERY API request, even when the agent isn't doing browser work.
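
Summing the per-tool sizes in the table shows why this dominates: the full schema set is paid on every turn, whether or not a browser tool is used. A quick tally, with per-tool counts copied from the table above:

```python
# Per-tool schema sizes (tokens), as measured in the table above.
PLAYWRIGHT_TOOL_SCHEMAS = {
    "browser_launch": 180, "browser_navigate": 220, "browser_screenshot": 200,
    "browser_click": 250, "browser_type": 280, "browser_find": 240,
    "browser_hover": 200, "browser_select_option": 260,
    "browser_wait_for_selector": 250, "browser_evaluate": 220,
    "browser_go_back": 150, "browser_go_forward": 150, "browser_close": 150,
    "browser_get_text": 200, "browser_get_attribute": 220,
}

per_call = sum(PLAYWRIGHT_TOOL_SCHEMAS.values())
turns = 20
print(per_call)          # 3170 tokens, attached to every single API call
print(per_call * turns)  # 63400 tokens of pure schema overhead in 20 turns
```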

Accessibility Tree Cost (Per Page Read)

A typical web page accessibility snapshot:

Page Complexity        Elements    A11y Tree Tokens
Simple (landing page)  20-50       ~500-1,500
Medium (form page)     50-200      ~1,500-5,000
Complex (dashboard)    200-500     ~5,000-15,000
Data-heavy (table)     500-2,000   ~15,000-50,000+
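
The ranges above work out to roughly 25-30 tokens per element (role, name, state, nesting). A back-of-envelope estimator; the 28-token midpoint is an assumed figure for illustration, not a measured constant:

```python
def a11y_tokens(elements: int, tokens_per_element: int = 28) -> int:
    """Rough accessibility-tree cost: ~25-30 tokens per element."""
    return elements * tokens_per_element

print(a11y_tokens(40))    # 1120  -> simple landing page
print(a11y_tokens(120))   # 3360  -> medium form page
print(a11y_tokens(350))   # 9800  -> complex dashboard
print(a11y_tokens(1500))  # 42000 -> data-heavy table
```

Each estimate lands inside the corresponding range in the table, which is all a planning heuristic needs.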

A Realistic 20-Step Login Test via MCP

Step 1:  browser_launch             → 150 tokens (call) + 100 (response)
Step 2:  browser_navigate           → 200 tokens + 100
Step 3:  [a11y tree loaded]         → 3,000 tokens (login form)
Step 4:  browser_type (email)       → 180 tokens + 80
Step 5:  browser_type (password)    → 180 tokens + 80
Step 6:  browser_click (submit)     → 160 tokens + 80
Step 7:  [a11y tree loaded]         → 5,000 tokens (dashboard)
Step 8:  browser_get_text (heading) → 150 tokens + 80
Step 9:  browser_screenshot         → 150 tokens + 100
Step 10: browser_navigate (profile) → 200 tokens + 100
Step 11: [a11y tree loaded]         → 4,000 tokens (profile page)
Step 12: browser_get_text (name)    → 150 tokens + 80
Step 13: browser_click (edit)       → 160 tokens + 80
Step 14: [a11y tree loaded]         → 4,500 tokens (edit form)
Step 15: browser_type (phone)       → 180 tokens + 80
Step 16: browser_click (save)       → 160 tokens + 80
Step 17: [a11y tree loaded]         → 4,000 tokens (profile updated)
Step 18: browser_get_text (success) → 150 tokens + 80
Step 19: browser_screenshot         → 150 tokens + 100
Step 20: browser_close              → 120 tokens + 60

Tool schemas (loaded every turn):    3,170 × 20 = 63,400 tokens
A11y trees:                          20,500 tokens
Tool calls + responses:              ~3,700 tokens
────────────────────────────────────────────────────
TOTAL:                               ~87,600 tokens

That's ~58% of usable context on a simple login test.
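
The trace can be re-tallied programmatically. Per-step call/response and tree sizes below are copied from the trace; the per-call schema figure is the sum of the per-tool sizes measured earlier (about 3,170 tokens):

```python
SCHEMA_PER_CALL = 3_170   # sum of the per-tool schema sizes measured above
A11Y_TREES = [3_000, 5_000, 4_000, 4_500, 4_000]   # steps 3, 7, 11, 14, 17
CALLS = [150, 200, 180, 180, 160, 150, 150, 200,
         150, 160, 180, 160, 150, 150, 120]
RESPONSES = [100, 100, 80, 80, 80, 80, 100, 100,
             80, 80, 80, 80, 80, 100, 60]

turns = 20
schemas = SCHEMA_PER_CALL * turns       # 63,400: re-sent on every turn
trees = sum(A11Y_TREES)                 # 20,500
io = sum(CALLS) + sum(RESPONSES)        # 3,720
print(f"total ~{schemas + trees + io:,} tokens")  # total ~87,620 tokens
```

Note that schema overhead alone is three times the cost of every accessibility tree combined.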


Skill Token Breakdown: The Same Test via vibe-check

Skill Loading Cost (Once)

Component                       Tokens
SKILL.md injection              ~1,000
Skill description in tool list  ~50 per turn

The Same 20-Step Login Test via Skill

Step 0:  Skill invoked (SKILL.md loaded)  → 1,000 tokens (once)

Step 1:  Bash("vibe-check daemon start")           → 30 + 20 = 50
Step 2:  Bash("vibe-check navigate https://...")    → 40 + 20 = 60
Step 3:  Bash("vibe-check type '#email' 'user'")   → 45 + 20 = 65
Step 4:  Bash("vibe-check type '#pass' 'secret'")  → 45 + 20 = 65
Step 5:  Bash("vibe-check click '#submit'")         → 35 + 20 = 55
Step 6:  Bash("vibe-check wait '.dashboard'")       → 35 + 20 = 55
Step 7:  Bash("vibe-check text 'h1'")               → 30 + 30 = 60
Step 8:  Bash("vibe-check screenshot -o s1.png")    → 40 + 20 = 60
Step 9:  Bash("vibe-check navigate .../profile")    → 40 + 20 = 60
Step 10: Bash("vibe-check text '.name'")            → 30 + 30 = 60
Step 11: Bash("vibe-check click '#edit'")           → 35 + 20 = 55
Step 12: Bash("vibe-check type '#phone' '555'")     → 40 + 20 = 60
Step 13: Bash("vibe-check click '#save'")           → 35 + 20 = 55
Step 14: Bash("vibe-check wait '.success'")         → 35 + 20 = 55
Step 15: Bash("vibe-check text '.success'")         → 35 + 30 = 65
Step 16: Bash("vibe-check screenshot -o s2.png")    → 40 + 20 = 60
Steps 17-20: verification + cleanup                 → ~200

Skill description (per turn): 50 × 20 =              1,000 tokens
Skill initial load (SKILL.md):                       1,000 tokens
Commands + responses:                               ~1,200 tokens
(Bash tool schema, ~200 per turn, is shared with all
shell work and not counted as browser-specific)
────────────────────────────────────────────────────
TOTAL:                                              ~3,200 tokens (browser-specific)

That's ~2% of usable context.
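
The same tally for the skill-based run, using the component table and per-step estimates above (the ~200-token figure for steps 17-20 is the trace's own estimate):

```python
turns = 20
description_overhead = 50 * turns      # short skill blurb in the tool list, each turn
skill_load = 1_000                     # SKILL.md injected once, on first use
steps = [50, 60, 65, 65, 55, 55, 60, 60,
         60, 60, 55, 60, 55, 55, 65, 60]
commands = sum(steps) + 200            # + verification/cleanup (steps 17-20)
total = description_overhead + skill_load + commands
print(f"browser-specific total ~{total:,} tokens")  # ~3,140, i.e. the ~3,200 figure
```

Notice that the two fixed costs (description and SKILL.md load) are nearly two-thirds of the total; the actual browser commands are almost free.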


Side-by-Side Summary

Metric                      MCP             Skill          Difference
20-step test total          ~87,600 tokens  ~3,200 tokens  ~27x cheaper
Context consumed            ~58%            ~2%            56 pts more available
Context remaining           ~62K tokens     ~147K tokens   ~2.4x more headroom
Cost at $15/M input tokens  ~$1.31          ~$0.05         ~27x cheaper
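
At list pricing the gap compounds quickly across a CI day. A sketch assuming $15 per million input tokens and a hypothetical 500 runs per day; the round 88K and 3.2K figures are the ballpark per-run totals from this analysis:

```python
PRICE_PER_INPUT_TOKEN = 15 / 1_000_000   # assumed: $15 per million input tokens

def run_cost(tokens: int) -> float:
    """Dollar cost of one test run's input tokens."""
    return tokens * PRICE_PER_INPUT_TOKEN

mcp_tokens, skill_tokens = 88_000, 3_200   # ballpark per-run totals
runs_per_day = 500                          # hypothetical CI volume
print(f"per run: ${run_cost(mcp_tokens):.2f} vs ${run_cost(skill_tokens):.2f}")
print(f"per day: ${run_cost(mcp_tokens) * runs_per_day:,.2f} "
      f"vs ${run_cost(skill_tokens) * runs_per_day:,.2f}")
```

Per run the difference is pocket change; at CI scale it is hundreds of dollars a day.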

What You Do with the Saved Context

The ~147K tokens of context left free by using skills instead of MCP can hold:

Content Approximate Tokens What It Enables
10 source files (~200 lines each) ~30,000 Agent reads and modifies code
Full test suite definition ~10,000 Agent understands all tests
Error analysis + debugging ~20,000 Agent reasons about failures
Conversation history ~50,000 Agent remembers earlier context
Total additional capacity ~110,000 A richer, more capable agent

When Token Cost Doesn't Matter

If you're using Gemini with a 1M+ context window, the token efficiency argument is much weaker. You can afford MCP's overhead and still have plenty of context.

However:

  • API cost still matters (you pay per token)
  • Latency scales with tokens (more tokens = slower responses)
  • Quality can degrade with very long contexts (attention dilution)

So even with large windows, keeping things lean is still beneficial.


Interview Talking Point

"I've done the math on token costs. A 20-step test via MCP consumes roughly 88,000 tokens, about 58% of a 200K window's usable budget, primarily from tool schemas loaded on every turn plus accessibility trees. The same test via a CLI skill costs about 3,200 tokens, roughly 2% of the window: a ~27x reduction. More importantly, it means the agent keeps about 147K tokens available for reasoning about test logic, analyzing failures, and maintaining context across the session. At $15 per million input tokens, each run costs about $1.31 via MCP versus $0.05 via skills. Across hundreds of CI runs per day, that's the difference between a viable approach and an unsustainable one."