Vibium Extension Commands Over BiDi
The Extension Mechanism
The WebDriver BiDi specification explicitly allows implementation-defined extension modules. Their names must contain a colon, with the part before the colon acting as an implementation-specific prefix:
standard: browsingContext.navigate
extension: vibium:click
This is not a hack — it's a designed extension point in the W3C spec.
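As a rough illustration, a router could tell extension commands from standard ones by looking for the prefix. This is only a sketch: the helper name isExtensionMethod is hypothetical and does not come from the Vibium source.

```go
package main

import "strings"

// isExtensionMethod reports whether a BiDi method name carries a vendor
// prefix, meaning non-empty text on both sides of a colon.
// Hypothetical helper for illustration only.
func isExtensionMethod(method string) bool {
	prefix, rest, found := strings.Cut(method, ":")
	return found && prefix != "" && rest != ""
}
```

With this check, "vibium:click" counts as an extension command and "browsingContext.navigate" does not.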
The Three Vibium Extension Commands
vibium:find
Purpose: Wait for an element to exist in the DOM.
Request:
{
  "id": 1,
  "method": "vibium:find",
  "params": {
    "context": "browsing-context-id",
    "selector": "button.submit",
    "timeout": 30000
  }
}
What the proxy does internally:
- Start a polling loop (100ms interval)
- On each iteration, send script.callFunction to Chrome (awaitPromise is a required parameter of this command):
  {
    "method": "script.callFunction",
    "params": {
      "functionDeclaration": "(s) => !!document.querySelector(s)",
      "arguments": [{"type": "string", "value": "button.submit"}],
      "target": {"context": "browsing-context-id"},
      "awaitPromise": false
    }
  }
- If the result is true, return success. If it is false, retry until the timeout expires.
Success Response:
{"id": 1, "type": "success", "result": {"found": true}}
Timeout Response:
{
  "id": 1,
  "type": "error",
  "error": {"error": "timeout", "message": "timeout after 30s waiting for 'button.submit'"}
}
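One way to produce the two response shapes shown above is with a pair of small structs. The type and function names here (response, errorBody, successJSON, errorJSON) are assumptions made for illustration and may not match the proxy's own helpers.

```go
package main

import "encoding/json"

// errorBody mirrors the nested "error" object in the timeout response.
type errorBody struct {
	Error   string `json:"error"`
	Message string `json:"message"`
}

// response covers both shapes; unused fields are omitted from the output.
type response struct {
	ID     int             `json:"id"`
	Type   string          `json:"type"`
	Result map[string]bool `json:"result,omitempty"`
	Error  *errorBody      `json:"error,omitempty"`
}

// successJSON encodes a success response such as {"found": true}.
func successJSON(id int, result map[string]bool) ([]byte, error) {
	return json.Marshal(response{ID: id, Type: "success", Result: result})
}

// errorJSON encodes an error response with a code and a message.
func errorJSON(id int, code, message string) ([]byte, error) {
	return json.Marshal(response{
		ID:    id,
		Type:  "error",
		Error: &errorBody{Error: code, Message: message},
	})
}
```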
vibium:click
Purpose: Wait for an element to be actionable, then click it.
Request:
{
  "id": 2,
  "method": "vibium:click",
  "params": {
    "context": "browsing-context-id",
    "selector": "button.submit",
    "timeout": 30000
  }
}
What the proxy does internally:
Step 1: Find element (vibium:find behavior)
→ script.callFunction: document.querySelector(selector) exists?
→ Repeat until found or timeout
Step 2: Check Visible
→ script.callFunction: getBoundingClientRect + getComputedStyle
→ Must have non-zero size, not hidden
Step 3: Check Stable
→ script.callFunction: getBoundingClientRect at T
→ Wait 50ms
→ script.callFunction: getBoundingClientRect at T+50ms
→ Compare: must be identical
Step 4: Check ReceivesEvents
→ script.callFunction: elementFromPoint at center
→ Must hit the target element (or its child)
Step 5: Check Enabled
→ script.callFunction: check disabled, aria-disabled, fieldset
→ Must not be disabled
Step 6: Get bounding box
→ script.callFunction: getBoundingClientRect
→ Calculate center coordinates: (x + width/2, y + height/2)
Step 7: Perform click
→ input.performActions:
    pointerMove to (centerX, centerY)
    pointerDown button 0
    pointerUp button 0
All steps 1-5 are in a polling loop. If any check fails, wait 100ms and restart from step 1.
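Steps 3, 6 and 7 can be sketched as follows: compare two rect samples for stability, compute the center, and build the pointer source for input.performActions. The rect type and function names are hypothetical, and rounding the coordinates to integers is an assumption of this sketch rather than something the source specifies.

```go
package main

import "math"

// rect holds the fields of getBoundingClientRect that the steps use.
type rect struct{ X, Y, Width, Height float64 }

// center returns the click point from step 6: (x + width/2, y + height/2).
func (r rect) center() (float64, float64) {
	return r.X + r.Width/2, r.Y + r.Height/2
}

// stable implements the comparison in step 3: two samples taken 50ms
// apart must be identical.
func stable(a, b rect) bool { return a == b }

// clickActions builds the "actions" array for input.performActions:
// one mouse pointer source that moves to the center, presses and releases.
func clickActions(box rect) []map[string]any {
	cx, cy := box.center()
	return []map[string]any{{
		"type":       "pointer",
		"id":         "mouse",
		"parameters": map[string]any{"pointerType": "mouse"},
		"actions": []map[string]any{
			{"type": "pointerMove", "x": int(math.Round(cx)), "y": int(math.Round(cy))},
			{"type": "pointerDown", "button": 0},
			{"type": "pointerUp", "button": 0},
		},
	}}
}
```

The returned slice would be sent as the "actions" parameter alongside the browsing context ID.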
Success Response:
{"id": 2, "type": "success", "result": {"clicked": true}}
vibium:type
Purpose: Wait for an element to be actionable AND editable, then type text.
Request:
{
  "id": 3,
  "method": "vibium:type",
  "params": {
    "context": "browsing-context-id",
    "selector": "input[name=email]",
    "text": "user@example.com",
    "timeout": 30000
  }
}
What the proxy does internally:
Same as vibium:click steps 1-5, PLUS:
Step 5b: Check Editable
→ script.callFunction: check readonly, aria-readonly, input type
→ Must accept text input
Step 6: Focus element
→ script.callFunction: document.querySelector(selector).focus()
Step 7: Clear existing text (if any)
→ input.performActions: Ctrl+A, then Delete
Step 8: Type text character by character
→ input.performActions:
    For each character in "user@example.com":
        keyDown character
        keyUp character
Character-by-character typing triggers all expected DOM events: keydown, keypress, input, keyup. This is critical for:
- Form validation that runs on input events
- Autocomplete that triggers on keystroke
- Character count limits
- Real-time search
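The per-character key sequence from step 8 might be built like this. The keyAction type and keyActionsFor function are hypothetical names; the resulting list would form the "actions" of a key input source passed to input.performActions.

```go
package main

// keyAction is one entry in a key source's action list.
type keyAction struct {
	Type  string `json:"type"`  // "keyDown" or "keyUp"
	Value string `json:"value"` // a single character
}

// keyActionsFor expands text into a keyDown/keyUp pair per character.
// Ranging over a string yields runes, so multi-byte characters stay intact.
func keyActionsFor(text string) []keyAction {
	actions := make([]keyAction, 0, 2*len(text))
	for _, ch := range text {
		actions = append(actions,
			keyAction{Type: "keyDown", Value: string(ch)},
			keyAction{Type: "keyUp", Value: string(ch)},
		)
	}
	return actions
}
```

For "user@example.com" (16 characters) this yields 32 actions in strict down/up order.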
Why Not Standard BiDi Commands?
Standard BiDi has input.performActions for clicks and keyboard input. Why add custom commands?
Without Extension Commands (Client Must Implement)
Client Chrome
│ │
│─ script.callFunction ─────────►│ (check if element exists)
│◄─ result: false ───────────────│
│ wait 100ms │
│─ script.callFunction ─────────►│ (check again)
│◄─ result: true ────────────────│
│─ script.callFunction ─────────►│ (check visible)
│◄─ result ──────────────────────│
│─ script.callFunction ─────────►│ (check stable T1)
│◄─ result ──────────────────────│
│ wait 50ms │
│─ script.callFunction ─────────►│ (check stable T2)
│◄─ result ──────────────────────│
│─ script.callFunction ─────────►│ (check receivesEvents)
│◄─ result ──────────────────────│
│─ script.callFunction ─────────►│ (check enabled)
│◄─ result ──────────────────────│
│─ script.callFunction ─────────►│ (get bounding box)
│◄─ result ──────────────────────│
│─ input.performActions ─────────►│ (click)
│◄─ result ──────────────────────│
Nine round trips in this trace, and eight even when the element is found on the first poll, each adding network latency if the client is remote.
With Extension Commands (Proxy Handles)
Client Proxy Chrome
│ │ │
│─ vibium:click ───►│ │
│ │─ script.call ───►│ (all checks happen locally)
│ │◄─ result ────────│
│ │─ script.call ───►│
│ │◄─ result ────────│
│ │ ... (local loop)│
│ │─ input.perform ─►│
│ │◄─ result ────────│
│◄─ success ────────│ │
Client sends 1 message, gets 1 response. All the complexity is in the proxy, which communicates with Chrome over a local WebSocket (essentially zero latency).
The Implementation in Go
Located in clicker/internal/proxy/router.go:
// Imports needed by this excerpt: "encoding/json", "fmt", "time".

const pollInterval = 100 * time.Millisecond

func (r *Router) OnClientMessage(msg []byte) {
	var req BiDiMessage
	if err := json.Unmarshal(msg, &req); err != nil {
		// Malformed JSON: report it instead of dispatching an empty request.
		r.sendError(req.ID, "invalid argument", err.Error())
		return
	}
	switch req.Method {
	case "vibium:find":
		r.handleVibiumFind(req)
	case "vibium:click":
		r.handleVibiumClick(req)
	case "vibium:type":
		r.handleVibiumType(req)
	default:
		r.forwardToChrome(msg) // Standard command: pass through
	}
}

func (r *Router) handleVibiumClick(req BiDiMessage) {
	selector := req.Params.Selector
	timeout := req.Params.Timeout
	deadline := time.Now().Add(time.Duration(timeout) * time.Millisecond)

	// Actionability checks, each run via script.callFunction.
	checks := []struct {
		name string
		pass func(selector string) bool
	}{
		{"visible", r.checkVisible},
		{"stable", r.checkStable},
		{"receivesEvents", r.checkReceivesEvents},
		{"enabled", r.checkEnabled},
	}

	lastFailedCheck := "none" // most recent failing check, reported on timeout
	for {
		if time.Now().After(deadline) {
			r.sendError(req.ID, "timeout", fmt.Sprintf(
				"timeout after %dms waiting for '%s': check '%s' failed",
				timeout, selector, lastFailedCheck))
			return
		}

		failed := ""
		for _, c := range checks {
			if !c.pass(selector) {
				failed = c.name
				break
			}
		}
		if failed != "" {
			lastFailedCheck = failed
			time.Sleep(pollInterval) // wait, then restart from the first check
			continue
		}

		// All checks passed: perform the click
		box := r.getBoundingBox(selector)
		r.performClick(box.CenterX, box.CenterY)
		r.sendSuccess(req.ID, map[string]bool{"clicked": true})
		return
	}
}
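The router code reads req.ID, req.Method and req.Params, but the source does not show the message type itself. One plausible shape is sketched below; the field layout and the decode helper are assumptions, and a real implementation would likely keep the params of standard commands as raw JSON, since they differ per command.

```go
package main

import "encoding/json"

// Params holds the parameters shared by the three vibium: commands.
// Assumed layout; Text is only used by vibium:type.
type Params struct {
	Context  string `json:"context"`
	Selector string `json:"selector"`
	Text     string `json:"text,omitempty"`
	Timeout  int    `json:"timeout"` // milliseconds
}

// BiDiMessage is a guess at the envelope the router decodes.
type BiDiMessage struct {
	ID     int    `json:"id"`
	Method string `json:"method"`
	Params Params `json:"params"`
}

// decode parses one client message into a BiDiMessage.
func decode(msg []byte) (BiDiMessage, error) {
	var req BiDiMessage
	err := json.Unmarshal(msg, &req)
	return req, err
}
```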
Interview Talking Point
"Vibium uses WebDriver BiDi's extension mechanism (vibium:find, vibium:click, vibium:type) to push actionability logic into the proxy server. Without this, each click would require 8-9 WebSocket round trips between client and browser, most of them actionability checks. With extension commands, the client sends one message and gets one response. The proxy handles the polling loop locally, where latency is negligible. This is a key architectural insight: by co-locating the intelligence with the browser connection, you get both simpler clients and lower latency. And it's not a protocol hack: BiDi explicitly supports extension modules with the colon naming convention."