WebDriver BiDi Protocol Overview
What It Is
WebDriver BiDi (Bidirectional) is a W3C specification for browser automation. It combines:
- The standardization of WebDriver (W3C, cross-browser)
- The bidirectionality of CDP (WebSocket, events, real-time)
Message Format
All communication is JSON over WebSocket.
Client → Browser (Commands)
{
"id": 1,
"method": "browsingContext.navigate",
"params": {
"context": "context-id-123",
"url": "https://example.com",
"wait": "complete"
}
}
Browser → Client (Responses)
{
"id": 1,
"type": "success",
"result": {
"navigation": "nav-id-456",
"url": "https://example.com"
}
}
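The id field is what ties a response back to the command that caused it. The sketch below shows one way a client could assign ids and match responses; CommandTracker and its method names are hypothetical helpers, not part of any BiDi library, and the error fields (error, message) follow the error response shape in the spec.

```python
import itertools
import json


class CommandTracker:
    """Assigns ids to outgoing commands and matches responses back to them."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}  # command id -> method name of the in-flight command

    def build(self, method, params=None):
        """Return the JSON text for a command and remember its id."""
        command_id = next(self._ids)
        self._pending[command_id] = method
        return json.dumps({"id": command_id, "method": method, "params": params or {}})

    def resolve(self, raw_response):
        """Match a response to its command and return (method, result)."""
        message = json.loads(raw_response)
        method = self._pending.pop(message["id"])
        if message.get("type") == "error":
            raise RuntimeError(
                f"{method} failed: {message.get('error')}: {message.get('message')}"
            )
        return method, message.get("result")
```

A real client would also need timeouts for commands that never get a response; this sketch leaves that out.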
Browser → Client (Events — Unsolicited)
{
"type": "event",
"method": "log.entryAdded",
"params": {
"level": "error",
"text": "Uncaught TypeError: Cannot read property 'foo' of undefined",
"timestamp": 1707500000000
}
}
Events are pushed by the browser without a matching request, once the client has subscribed to them. This is the "bidirectional" part.
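A client reading from the socket has to tell responses and events apart before it can do anything else. The sketch below is one minimal way to do that; classify_message is a hypothetical helper name. It assumes the type field that the spec puts on browser messages (success, error, or event) and falls back to "has a method but no id" for events.

```python
import json


def classify_message(raw):
    """Return "response", "event", or "unknown" for one incoming JSON message."""
    message = json.loads(raw)
    kind = message.get("type")
    if kind in ("success", "error"):
        return "response"
    # Events carry a method name and no command id.
    if kind == "event" or ("method" in message and "id" not in message):
        return "event"
    return "unknown"
```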
Core Modules
Session Module
Manages the connection between client and browser.
// Create session
{"id": 1, "method": "session.new", "params": {"capabilities": {}}}
// Subscribe to events
{"id": 2, "method": "session.subscribe", "params": {"events": ["log.entryAdded"]}}
Browsing Context Module
Manages tabs, iframes, and navigation.
// Navigate
{"id": 3, "method": "browsingContext.navigate", "params": {"context": "ctx-1", "url": "https://example.com"}}
// Get tree (all tabs and iframes)
{"id": 4, "method": "browsingContext.getTree", "params": {}}
// Create new tab
{"id": 5, "method": "browsingContext.create", "params": {"type": "tab"}}
// Close tab
{"id": 6, "method": "browsingContext.close", "params": {"context": "ctx-1"}}
Script Module
Execute JavaScript in the browser context.
// Evaluate expression
{"id": 7, "method": "script.evaluate", "params": {
"expression": "document.title",
"target": {"context": "ctx-1"},
"awaitPromise": true
}}
// Call function (more powerful - can pass arguments)
{"id": 8, "method": "script.callFunction", "params": {
"functionDeclaration": "(selector) => document.querySelector(selector)?.textContent",
"arguments": [{"type": "string", "value": "h1"}],
"target": {"context": "ctx-1"},
"awaitPromise": true
}}
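Arguments to script.callFunction are not raw JSON values; each one is wrapped in a typed object, as the {"type": "string", "value": "h1"} entry above shows. The sketch below converts a few Python values into that shape. The function names are hypothetical, and only null, booleans, finite numbers, strings, and arrays are covered; the spec defines more types (objects, dates, special numbers such as NaN) that this sketch rejects.

```python
import math


def to_local_value(value):
    """Wrap a Python value in the typed form used for script arguments."""
    if value is None:
        return {"type": "null"}
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return {"type": "boolean", "value": value}
    if isinstance(value, (int, float)):
        if isinstance(value, float) and not math.isfinite(value):
            raise ValueError("NaN and Infinity need a special encoding, not handled here")
        return {"type": "number", "value": value}
    if isinstance(value, str):
        return {"type": "string", "value": value}
    if isinstance(value, (list, tuple)):
        return {"type": "array", "value": [to_local_value(v) for v in value]}
    raise TypeError(f"unsupported argument type: {type(value).__name__}")


def build_call_function(command_id, context, function_declaration, args):
    """Build a script.callFunction command as a plain dict."""
    return {
        "id": command_id,
        "method": "script.callFunction",
        "params": {
            "functionDeclaration": function_declaration,
            "arguments": [to_local_value(a) for a in args],
            "target": {"context": context},
            "awaitPromise": True,
        },
    }
```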
Input Module
Simulate user input (mouse, keyboard).
// Click at coordinates
{"id": 9, "method": "input.performActions", "params": {
"context": "ctx-1",
"actions": [{
"type": "pointer",
"id": "mouse",
"actions": [
{"type": "pointerMove", "x": 200, "y": 300},
{"type": "pointerDown", "button": 0},
{"type": "pointerUp", "button": 0}
]
}]
}}
// Type text
{"id": 10, "method": "input.performActions", "params": {
"context": "ctx-1",
"actions": [{
"type": "key",
"id": "keyboard",
"actions": [
{"type": "keyDown", "value": "H"},
{"type": "keyUp", "value": "H"},
{"type": "keyDown", "value": "i"},
{"type": "keyUp", "value": "i"}
]
}]
}}
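Writing the keyDown/keyUp and pointer sequences by hand gets tedious quickly. The sketch below generates the same action lists shown above from higher-level inputs. The helper names are hypothetical, and the key helper only handles plain characters (no modifiers or special keys).

```python
def click_actions(x, y, button=0):
    """Pointer actions for a single click at viewport coordinates (x, y)."""
    return [
        {"type": "pointerMove", "x": x, "y": y},
        {"type": "pointerDown", "button": button},
        {"type": "pointerUp", "button": button},
    ]


def key_actions(text):
    """A keyDown/keyUp pair for every character in text."""
    actions = []
    for char in text:
        actions.append({"type": "keyDown", "value": char})
        actions.append({"type": "keyUp", "value": char})
    return actions


def perform_actions(command_id, context, source_type, source_id, actions):
    """Wrap one input source's actions in an input.performActions command."""
    return {
        "id": command_id,
        "method": "input.performActions",
        "params": {
            "context": context,
            "actions": [{"type": source_type, "id": source_id, "actions": actions}],
        },
    }
```

For example, perform_actions(10, "ctx-1", "key", "keyboard", key_actions("Hi")) builds the same typing command as the one written out above.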
Network Module (Future)
Intercept and inspect network requests.
// Register an intercept (matching requests pause at this phase until the client continues them)
{"id": 11, "method": "network.addIntercept", "params": {
"phases": ["beforeRequestSent"]
}}
// Event: request made
{"type": "event", "method": "network.beforeRequestSent", "params": {
"request": {"url": "https://api.example.com/data", "method": "GET"}
}}
Log Module
Console log events pushed from the browser.
// Subscribe
{"id": 12, "method": "session.subscribe", "params": {"events": ["log.entryAdded"]}}
// Event received
{"type": "event", "method": "log.entryAdded", "params": {
"level": "error",
"text": "TypeError: x is not a function",
"source": {"realm": "realm-1"},
"timestamp": 1707500000000
}}
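As a small illustration of consuming these events, the sketch below turns one log.entryAdded event into a printable line. format_log_entry is a hypothetical helper; it relies only on the level, text, and timestamp fields shown above and treats the timestamp as milliseconds since the Unix epoch.

```python
from datetime import datetime, timezone


def format_log_entry(event):
    """Render one log.entryAdded event as a single human-readable line."""
    params = event["params"]
    # The timestamp is in milliseconds; datetime expects seconds.
    when = datetime.fromtimestamp(params["timestamp"] / 1000, tz=timezone.utc)
    return f"{when.isoformat()} [{params['level'].upper()}] {params['text']}"
```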
Sessions
Standard Session (created over HTTP, then WebSocket)
Client Browser
│ │
│── HTTP POST /session ───────────────────────►│
│ {"capabilities": {"alwaysMatch": {"webSocketUrl": true}}} │
│◄── HTTP 200 ─────────────────────────────────│
│ {"value": {"sessionId": "...", "capabilities": {"webSocketUrl": "ws://..."}}}
│ │
│── WebSocket connect ────────────────────────►│
│◄── WebSocket connected ──────────────────────│
│ │
│ Now use BiDi commands... │
BiDi-Only Session (direct WebSocket)
Client Browser
│ │
│── WebSocket connect to ws://.../ ───────────►│
│◄── WebSocket connected ──────────────────────│
│ │
│── session.new ──────────────────────────────►│
│◄── session.new result ───────────────────────│
│ │
│ Now use BiDi commands... │
Key Concepts
Browsing Context
A "browsing context" is a tab or iframe. Each has a unique ID.
Browser
├── Context "ctx-1" (Tab 1 - https://example.com)
│ ├── Context "ctx-2" (iframe - ad)
│ └── Context "ctx-3" (iframe - widget)
└── Context "ctx-4" (Tab 2 - https://docs.example.com)
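The result of browsingContext.getTree mirrors this nesting: each entry has a context id, a url, and a children list. The sketch below walks such a result and yields every context with its depth. flatten_contexts is a hypothetical helper, and the iframe URLs in SAMPLE_TREE are placeholders invented for the example.

```python
# Shaped like a browsingContext.getTree result; iframe URLs are placeholders.
SAMPLE_TREE = {
    "contexts": [
        {
            "context": "ctx-1",
            "url": "https://example.com",
            "children": [
                {"context": "ctx-2", "url": "https://example.com/ad", "children": []},
                {"context": "ctx-3", "url": "https://example.com/widget", "children": []},
            ],
        },
        {"context": "ctx-4", "url": "https://docs.example.com", "children": []},
    ]
}


def flatten_contexts(contexts, depth=0):
    """Yield (depth, context_id, url) for every context, parents before children."""
    for info in contexts:
        yield depth, info["context"], info["url"]
        # "children" can be null or missing when the tree depth was limited.
        yield from flatten_contexts(info.get("children") or [], depth + 1)


if __name__ == "__main__":
    for depth, context_id, url in flatten_contexts(SAMPLE_TREE["contexts"]):
        print("  " * depth + f"{context_id} {url}")
```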
User Context
User contexts provide isolation (like incognito profiles):
// Create isolated context
{"id": 1, "method": "browser.createUserContext", "params": {}}
// Create tab in isolated context
{"id": 2, "method": "browsingContext.create", "params": {
"type": "tab",
"userContext": "user-ctx-1"
}}
Event Subscriptions
Clients must explicitly subscribe to events they want:
// Subscribe to console logs and network events ("contexts" is optional and limits delivery to specific tabs)
{"id": 1, "method": "session.subscribe", "params": {
"events": ["log.entryAdded", "network.beforeRequestSent"],
"contexts": ["ctx-1"]
}}
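A client often wants to track locally what it has subscribed to, for example to ignore stray events. The sketch below builds a subscribe command and checks whether an event name is covered by a set of subscriptions. Both helper names are hypothetical. The check assumes the spec's rule that a bare module name such as log subscribes to every event in that module, and it ignores per-context filtering.

```python
def build_subscribe(command_id, events, contexts=None):
    """Build a session.subscribe command; contexts is optional."""
    params = {"events": list(events)}
    if contexts:
        params["contexts"] = list(contexts)
    return {"id": command_id, "method": "session.subscribe", "params": params}


def is_subscribed(event_method, subscribed):
    """True if event_method matches a subscribed event name or module name."""
    module = event_method.split(".", 1)[0]
    return event_method in subscribed or module in subscribed
```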
Browser Support (2026)
| Browser | BiDi Support | Driver |
|---|---|---|
| Chrome | Full | chromedriver |
| Firefox | Full (native) | No separate driver needed |
| Edge | Full (Chromium-based) | edgedriver |
| Safari | Partial | safaridriver |
How Vibium Fits In
Vibium's clicker binary is a BiDi proxy that sits between clients and Chrome:
Client ──WebSocket──► Clicker Proxy ──WebSocket──► Chrome
│
├── Forwards standard BiDi commands
├── Intercepts vibium:* extension commands
├── Runs actionability checks via script.callFunction
└── Performs clicks via input.performActions
The proxy pattern means:
- Clients don't need to know Chrome's BiDi endpoint directly
- Custom commands (vibium:click, etc.) are transparent to the client
- The proxy can add features (auto-wait, screenshots) without changing the protocol
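The forward-or-intercept decision can be illustrated with a few lines of routing logic. This is only a sketch of the idea, written in Python for consistency with the other examples; it is not Vibium's actual Go implementation, and route_command and EXTENSION_PREFIX are names made up for the example.

```python
import json

# Assumed prefix for extension commands, taken from the vibium:* naming above.
EXTENSION_PREFIX = "vibium:"


def route_command(raw):
    """Return ("intercept", name) for extension commands, else ("forward", method)."""
    command = json.loads(raw)
    method = command.get("method", "")
    if method.startswith(EXTENSION_PREFIX):
        # Handled inside the proxy, e.g. by issuing several standard BiDi commands.
        return "intercept", method[len(EXTENSION_PREFIX):]
    # Everything else passes through to the browser unchanged.
    return "forward", method
```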
Interview Talking Point
"WebDriver BiDi is the W3C successor to both classic WebDriver and CDP. It uses WebSocket for bidirectional JSON messaging — the client can send commands AND the browser can push events. The protocol is organized into modules: session, browsing context, script, input, network, and log. Vibium uses BiDi as its transport layer, with the Go binary acting as a proxy that forwards standard commands and intercepts custom
vibium:*extensions. The key advantage over CDP is standardization — BiDi is a W3C spec with buy-in from all major browser vendors, not a Google-internal protocol that can change without notice."