# Token Budget Analysis: Real Numbers

## Context Window Sizes (2026)
| Model | Context Window | Effective Budget* |
|---|---|---|
| Claude Opus 4.6 | 200K tokens | ~150K usable |
| Claude Sonnet 4.5 | 200K tokens | ~150K usable |
| Claude Haiku 4.5 | 200K tokens | ~150K usable |
| GPT-4o | 128K tokens | ~100K usable |
| Gemini Pro | 1M+ tokens | ~750K usable |
*Effective budget accounts for system prompts, tool definitions, and overhead.
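The "effective budget" column is just the advertised window minus a reserve for that overhead. A quick sketch, where the ~25% reserve is this table's working assumption rather than a vendor-published figure:

```python
def effective_budget(window_tokens: int, reserve_fraction: float = 0.25) -> int:
    """Usable context after reserving a fraction of the window for the
    system prompt, tool schemas, and response headroom (~25% assumed)."""
    return int(window_tokens * (1 - reserve_fraction))

print(effective_budget(200_000))    # 150000 -> the "~150K usable" rows
print(effective_budget(128_000))    # 96000, i.e. roughly the ~100K figure
print(effective_budget(1_000_000))  # 750000 -> "~750K usable"
```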
## MCP Token Breakdown: A Real Playwright Session

### Tool Schema Cost (Per API Call)

Playwright MCP exposes these tools (actual schema sizes measured):
| Tool | Schema Tokens |
|---|---|
| browser_launch | ~180 |
| browser_navigate | ~220 |
| browser_screenshot | ~200 |
| browser_click | ~250 |
| browser_type | ~280 |
| browser_find | ~240 |
| browser_hover | ~200 |
| browser_select_option | ~260 |
| browser_wait_for_selector | ~250 |
| browser_evaluate | ~220 |
| browser_go_back | ~150 |
| browser_go_forward | ~150 |
| browser_close | ~150 |
| browser_get_text | ~200 |
| browser_get_attribute | ~220 |
| **Total per API call** | **~3,170** |
This is loaded into EVERY API request, even when the agent isn't doing browser work.
### Accessibility Tree Cost (Per Page Read)

A typical web page accessibility snapshot:
| Page Complexity | Elements | A11y Tree Tokens |
|---|---|---|
| Simple (landing page) | 20-50 | ~500-1,500 |
| Medium (form page) | 50-200 | ~1,500-5,000 |
| Complex (dashboard) | 200-500 | ~5,000-15,000 |
| Data-heavy (table) | 500-2000 | ~15,000-50,000+ |
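The ranges above work out to roughly 25-30 tokens per accessible element (role, name, state, and tree indentation all cost tokens). A rough estimator under that assumption; the per-element figure is back-derived from this table, not measured independently:

```python
def estimate_a11y_tokens(element_count: int, tokens_per_element: int = 28) -> int:
    """Rough accessibility-snapshot cost, assuming ~28 tokens/element
    (an inference from the ranges in the table above)."""
    return element_count * tokens_per_element

print(estimate_a11y_tokens(35))   # 980  -> inside the simple-page range
print(estimate_a11y_tokens(350))  # 9800 -> inside the dashboard range
```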
### A Realistic 20-Step Login Test via MCP
```
Step 1: browser_launch → 150 tokens (call) + 100 (response)
Step 2: browser_navigate → 200 tokens + 100
Step 3: [a11y tree loaded] → 3,000 tokens (login form)
Step 4: browser_type (email) → 180 tokens + 80
Step 5: browser_type (password) → 180 tokens + 80
Step 6: browser_click (submit) → 160 tokens + 80
Step 7: [a11y tree loaded] → 5,000 tokens (dashboard)
Step 8: browser_get_text (heading) → 150 tokens + 80
Step 9: browser_screenshot → 150 tokens + 100
Step 10: browser_navigate (profile) → 200 tokens + 100
Step 11: [a11y tree loaded] → 4,000 tokens (profile page)
Step 12: browser_get_text (name) → 150 tokens + 80
Step 13: browser_click (edit) → 160 tokens + 80
Step 14: [a11y tree loaded] → 4,500 tokens (edit form)
Step 15: browser_type (phone) → 180 tokens + 80
Step 16: browser_click (save) → 160 tokens + 80
Step 17: [a11y tree loaded] → 4,000 tokens (profile updated)
Step 18: browser_get_text (success) → 150 tokens + 80
Step 19: browser_screenshot → 150 tokens + 100
Step 20: browser_close → 120 tokens + 60

Tool schemas (loaded every turn): 3,170 × 20 = 63,400 tokens
A11y trees: 20,500 tokens
Tool calls + responses: ~3,700 tokens
────────────────────────────────────────────────────
TOTAL: ~87,600 tokens
```

That's ~58% of usable context on a simple login test.
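Re-adding the trace as a sketch; every input below is one of the estimates listed above, not a fresh measurement:

```python
# Per-tool schema sizes from the table, loaded on every request
schema_tokens = [180, 220, 200, 250, 280, 240, 200, 260,
                 250, 220, 150, 150, 150, 200, 220]
per_call_schema = sum(schema_tokens)              # per-request schema cost
turns = 20
a11y_trees = [3_000, 5_000, 4_000, 4_500, 4_000]  # five page snapshots
calls_and_responses = 3_700                       # sum of per-step call/response figures

total = per_call_schema * turns + sum(a11y_trees) + calls_and_responses
print(per_call_schema)               # 3170
print(total)                         # 87600
print(round(100 * total / 150_000))  # 58 -> percent of a ~150K usable window
```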
## Skill Token Breakdown: The Same Test via vibe-check

### Skill Loading Cost (Once)

| Component | Tokens |
|---|---|
| SKILL.md injection | ~1,000 |
| Skill description in tool list | ~50 per turn |

### The Same 20-Step Login Test via Skill
```
Step 0: Skill invoked (SKILL.md loaded) → 1,000 tokens (once)
Step 1: Bash("vibe-check daemon start") → 30 + 20 = 50
Step 2: Bash("vibe-check navigate https://...") → 40 + 20 = 60
Step 3: Bash("vibe-check type '#email' 'user'") → 45 + 20 = 65
Step 4: Bash("vibe-check type '#pass' 'secret'") → 45 + 20 = 65
Step 5: Bash("vibe-check click '#submit'") → 35 + 20 = 55
Step 6: Bash("vibe-check wait '.dashboard'") → 35 + 20 = 55
Step 7: Bash("vibe-check text 'h1'") → 30 + 30 = 60
Step 8: Bash("vibe-check screenshot -o s1.png") → 40 + 20 = 60
Step 9: Bash("vibe-check navigate .../profile") → 40 + 20 = 60
Step 10: Bash("vibe-check text '.name'") → 30 + 30 = 60
Step 11: Bash("vibe-check click '#edit'") → 35 + 20 = 55
Step 12: Bash("vibe-check type '#phone' '555'") → 40 + 20 = 60
Step 13: Bash("vibe-check click '#save'") → 35 + 20 = 55
Step 14: Bash("vibe-check wait '.success'") → 35 + 20 = 55
Step 15: Bash("vibe-check text '.success'") → 35 + 30 = 65
Step 16: Bash("vibe-check screenshot -o s2.png") → 40 + 20 = 60
Steps 17–20: verification + cleanup → ~200

Skill description (per turn): 50 × 20 = 1,000 tokens
Bash tool schema: ~200 per turn (shared across all tasks, not browser-specific; not counted here)
Skill initial load: 1,000 tokens
Commands + responses: ~1,200 tokens
────────────────────────────────────────────────────
TOTAL: ~3,200 tokens (browser-specific)
That's ~2% of usable context.
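The same tally for the skill path, again using only the estimated figures listed above:

```python
skill_load = 1_000      # SKILL.md injected once, on first invocation
description = 50 * 20   # ~50-token tool-list entry, present every turn
commands = 1_200        # sum of the per-step command/response figures

total = skill_load + description + commands
print(total)                            # 3200
print(round(100 * total / 150_000, 1))  # 2.1 -> percent of a ~150K usable window
```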
## Side-by-Side Summary
| Metric | MCP | Skill | Difference |
|---|---|---|---|
| 20-step test total | ~87,600 tokens | ~3,200 tokens | ~27x cheaper |
| Context consumed | ~58% | ~2% | 56 points more available |
| Context remaining | ~62K tokens | ~147K tokens | ~2.4x more headroom |
| Cost at $15/M tokens (input) | ~$1.31 | ~$0.05 | ~27x cheaper |
## What You Do with the Saved Context

Using skills instead of MCP leaves roughly 147K tokens of usable context free. That headroom can hold:
| Content | Approximate Tokens | What It Enables |
|---|---|---|
| 10 source files (~200 lines each) | ~30,000 | Agent reads and modifies code |
| Full test suite definition | ~10,000 | Agent understands all tests |
| Error analysis + debugging | ~20,000 | Agent reasons about failures |
| Conversation history | ~50,000 | Agent remembers earlier context |
| Total additional capacity | ~110,000 | A richer, more capable agent |
## When Token Cost Doesn't Matter
If you're using Gemini with a 1M+ context window, the token efficiency argument is much weaker. You can afford MCP's overhead and still have plenty of context.
However:
- API cost still matters (you pay per token)
- Latency scales with tokens (more tokens = slower responses)
- Quality can degrade with very long contexts (attention dilution)
So even with large windows, keeping things lean is still beneficial.
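To put the per-token pricing in CI terms, a sketch: the 500 runs/day volume is a hypothetical, and the per-run totals are rounded from this document's estimates:

```python
PRICE_PER_M_INPUT = 15.0  # $/1M input tokens, the rate assumed throughout

def run_cost_usd(tokens_per_run: int) -> float:
    """Input-token cost of one test run at the assumed rate."""
    return tokens_per_run * PRICE_PER_M_INPUT / 1_000_000

runs_per_day = 500  # illustrative CI volume
print(round(run_cost_usd(90_000) * runs_per_day, 2))  # 675.0 -> MCP path, $/day
print(round(run_cost_usd(3_200) * runs_per_day, 2))   # 24.0  -> skill path, $/day
```

At that volume the gap is hundreds of dollars per day, which is why the lean approach matters even when context itself is plentiful.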
## Interview Talking Point

"I've done the math on token costs. A 20-step test via MCP consumes about 88,000 tokens, roughly 58% of the ~150K usable context in a 200K window, primarily from tool schemas loaded every turn and from accessibility trees. The same test via a CLI skill costs about 3,200 tokens, around 2% of the window. That's a 27x reduction. More importantly, it means the agent has roughly 147K tokens available for reasoning about test logic, analyzing failures, and maintaining context across the session. At $15 per million input tokens, each test run costs about $1.31 via MCP versus $0.05 via skills. Across hundreds of CI runs per day, that's the difference between a viable approach and an unsustainable one."