Architect-Level QA Interview: 20 Questions and Answers
Category 1: Architecture & Design
Q1: "Why would you choose agent-driven browser automation over traditional Playwright/Selenium?"
Answer: "Traditional automation is deterministic — you write exact steps, and they execute identically every time. That's great for regression but terrible for adaptability. When the UI changes, every affected test breaks.
Agent-driven automation adds a reasoning layer. The agent understands intent ('verify login works'), not just steps ('click button#submit'). When a selector changes, the agent can find the element by text content, page structure, or visual analysis. When an unexpected dialog appears, the agent can dismiss it and continue.
The trade-off is non-determinism and cost. We mitigate non-determinism by logging every command for reproducibility, and we manage cost through CLI skills instead of MCP to keep token usage at ~2% of context per test."
Q2: "Walk me through the architecture of your test automation framework."
Answer: "Three layers:
Layer 1 — Test Definitions (YAML): Natural language steps with optional selector hints. These are version-controlled and human-readable.
Layer 2 — Agent Orchestrator (Claude Code + Skills): The AI agent reads test definitions, loads the vibe-check skill which teaches it 22 browser commands, and executes tests using the Bash tool. The agent decides the execution strategy, handles failures, and reports results.
Layer 3 — Browser Infrastructure (Vibium): A Go binary that manages Chrome via WebDriver BiDi. Runs as a daemon in development (fast, ~100ms per command) or oneshot in CI (isolated, ~2s per command). Implements Playwright-style actionability checks server-side.
The key insight is that the agent operates at Layer 2, making intelligent decisions, while the infrastructure at Layer 3 handles the mechanical complexity of browser control. The skill is the bridge — a 100-line markdown file that gives the agent all the domain knowledge it needs."
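For concreteness, a Layer 1 test definition might look like this (the field names and hint syntax here are illustrative assumptions, not a documented schema):

```yaml
# Hypothetical test definition: natural-language steps plus optional selector hints.
test: login works
steps:
  - Navigate to https://app.example.com/login
  - Type the test user's email into the email field    # hint: '#email'
  - Type the password and submit the form              # hint: '#password', '#submit'
  - Verify the dashboard greeting is visible           # hint: '.dashboard'
```

The hints constrain the agent's element search without turning the step back into a brittle script: the agent tries the hint first and falls back to reasoning if it fails.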
Q3: "How does the vibe-check skill actually work under the hood?"
Answer: "When the agent decides browser automation is needed, it invokes the vibe-check skill. The system loads a markdown file — SKILL.md — into the agent's conversation context. This file documents 22 CLI commands for browser control.
The agent then executes commands via the Bash tool: vibe-check click 'button.submit'. This reaches the Vibium daemon — a Go binary running as a background process. The daemon is a WebDriver BiDi proxy: it intercepts the command, translates it to BiDi protocol messages, runs five actionability checks (visible, stable, receives-events, enabled, editable) in a polling loop, and when all pass, performs the action via BiDi's input.performActions.
The whole chain: Skill loads markdown → Agent sends Bash command → CLI connects to daemon → Daemon checks actionability → Daemon sends BiDi to Chrome → Chrome executes → Response propagates back."
Q4: "How do you handle test flakiness in an AI-driven framework?"
Answer: "We distinguish between three types of 'flakiness':
True flakiness (timing issues): Vibium's actionability checks handle this — every click/type auto-waits up to 30 seconds for the element to be visible, stable, not obscured, and enabled. This eliminates 80% of traditional flakiness.
Infrastructure flakiness (network, browser crashes): Oneshot mode in CI gives each test a fresh browser. For network issues, we set explicit timeouts and capture failure artifacts (screenshot + page text) so the agent can reason about what happened.
Test logic flakiness (non-deterministic agent behavior): We log every command the agent executes. If a test passes inconsistently, we review the command log to see where the agent's reasoning diverged. We then add selector hints or more explicit test steps to constrain the agent's decisions.
The self-healing aspect actually reduces flakiness compared to traditional frameworks — when a selector breaks, the agent finds an alternative instead of failing."
Q5: "What's your CI/CD integration strategy?"
Answer: "Headless Chrome, oneshot mode, GitHub Actions matrix strategy.
Each test group (auth, dashboard, checkout) runs as a separate matrix job with its own browser instance. Tests use --headless and VIBIUM_ONESHOT=1 for clean isolation. On failure, we capture screenshots, page text, and URL as GitHub Actions artifacts with 30-day retention.
We output JUnit XML for dashboard integration and a JSON report for programmatic analysis. The test runner is a Bash script that wraps vibe-check commands and tracks pass/fail counts with proper exit codes.
For parallel execution within a job, we use xargs -P4 to run up to 4 tests simultaneously. Each Chrome instance uses ~200MB RAM, so a standard 7GB runner supports 6 parallel workers comfortably."
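A minimal sketch of such a runner, using stand-in test scripts instead of real vibe-check invocations (the tests/ layout, run_one, and the summary format are assumptions for illustration):

```shell
#!/usr/bin/env bash
# CI runner sketch: run each test script in parallel with xargs -P4,
# aggregate pass/fail counts from exit codes. Real tests would call vibe-check.
set -u

workdir=$(mktemp -d)
mkdir -p "$workdir/tests"

# Stand-in test scripts: exit 0 = pass, exit 1 = fail.
printf 'exit 0\n' > "$workdir/tests/auth.sh"
printf 'exit 0\n' > "$workdir/tests/dashboard.sh"
printf 'exit 1\n' > "$workdir/tests/checkout.sh"   # simulated failure

run_one() {
  if bash "$1" >/dev/null 2>&1; then
    echo "PASS $(basename "$1")"
  else
    echo "FAIL $(basename "$1")"
  fi
}
export -f run_one

# Up to 4 tests at once, matching the xargs -P4 setup described above.
results=$(printf '%s\n' "$workdir"/tests/*.sh | xargs -P4 -I{} bash -c 'run_one "$@"' _ {})

pass=$(printf '%s\n' "$results" | grep -c '^PASS')
fail=$(printf '%s\n' "$results" | grep -c '^FAIL')
echo "passed=$pass failed=$fail"

rm -rf "$workdir"
# In real CI you would finish with: exit "$fail"  (non-zero marks the job red)
```

The exit-code convention is what lets the wrapper stay dumb: all browser intelligence lives in the agent, while the runner only counts.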
Category 2: Technical Deep Dives
Q6: "Explain the actionability checks. Why are they implemented server-side?"
Answer: "Five checks run before every interaction:
- Visible — non-zero dimensions, not display:none or visibility:hidden
- Stable — position unchanged over 50ms (catches CSS animations)
- ReceivesEvents — elementFromPoint() at the element's center hits the target, not a covering element
- Enabled — not disabled, aria-disabled, or inside a disabled fieldset
- Editable — (type only) accepts text input, not readonly
These run in a 100ms polling loop until all pass or timeout (30s default).
They're server-side in the Go binary because:
- Single implementation — written once, not duplicated across JS, Python, CLI
- Reduced latency — polling happens over local WebSocket, not client→proxy→browser round trips
- Simpler clients — client code is trivial: send command, wait for response
- Consistent behavior — all clients get identical timing
This is the same concept from Playwright, but architecturally different. Playwright implements these in each client library. Vibium implements them once in the proxy."
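The polling loop can be sketched as a small client-side shell analogue (purely illustrative: Vibium implements the real checks in Go on the server side, and poll_until is a hypothetical helper, not part of vibe-check):

```shell
# Retry a predicate command every 100ms until it succeeds or a deadline passes,
# mirroring the server-side actionability loop conceptually.
poll_until() {
  local timeout_ms=$1; shift
  local waited=0
  until "$@"; do              # re-run the predicate until it exits 0
    sleep 0.1                 # 100ms between attempts, as in the daemon
    waited=$((waited + 100))
    if [ "$waited" -ge "$timeout_ms" ]; then
      return 1                # give up, like the 30s default timeout
    fi
  done
}
```

A call such as `poll_until 30000 some_visibility_check` would retry for up to 30 seconds; the server-side version simply runs all five predicates together in each iteration.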
Q7: "What is WebDriver BiDi and why does it matter?"
Answer: "WebDriver BiDi is a W3C standard that combines the best of two predecessors. Classic WebDriver was standardized and cross-browser but one-directional — HTTP request/response, no events. CDP was bidirectional with rich events but Chrome-specific and unstable.
BiDi uses WebSocket for bidirectional JSON messaging — both commands from client to browser AND events pushed from browser to client. It's governed by the W3C with buy-in from all major browser vendors.
For our framework, BiDi means:
- Standards-based — not dependent on Google's internal protocol
- Future-proof — as browsers improve BiDi support, our tools improve
- Cross-browser — same protocol for Chrome, Firefox, Edge, eventually Safari
- Events — console logs, network activity pushed to us in real-time
Vibium is BiDi-native, created by the person who started Selenium. The Go binary acts as a BiDi proxy, adding actionability checks as extension commands."
Q8: "How does Vibium's BiDi proxy work?"
Answer: "The clicker binary sits between clients and Chrome as a WebSocket proxy.
For standard BiDi commands like browsingContext.navigate, it passes messages straight through. For custom vibium:* commands like vibium:click, it intercepts them and runs the actionability loop locally.
A vibium:click translates to roughly 8-10 internal BiDi calls: multiple script.callFunction for each actionability check, plus input.performActions for the actual click. But the client only sees one request and one response.
The extension mechanism is part of the BiDi spec — extension modules use a colon-separated naming convention. So vibium:click is a legitimate BiDi extension, not a protocol hack."
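On the wire, both kinds of messages share the BiDi id/method/params shape; the browsingContext.navigate example below follows the W3C spec, while the vibium:click params are an assumption for illustration:

```json
{"id": 41, "method": "browsingContext.navigate",
 "params": {"context": "ctx-1", "url": "https://app.example.com", "wait": "complete"}}

{"id": 42, "method": "vibium:click",
 "params": {"context": "ctx-1", "selector": "button.submit"}}
```

The proxy forwards the first message untouched and expands the second into its internal actionability-check and input.performActions sequence before replying.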
Q9: "Explain the token economics of skills vs MCP for browser automation."
Answer: "I've measured this. A 20-step test via Playwright MCP consumes roughly 92,000 tokens: ~67,000 from tool schemas loaded every turn (15+ tools × 20 turns), ~20,000 from accessibility trees, and ~5,000 from tool calls/responses. That's 46% of a 200K context window.
The same test via the vibe-check skill costs about 3,200 tokens: ~1,000 for the initial SKILL.md injection, ~1,000 for skill descriptions across turns, and ~1,200 for Bash commands and their text output.
That's a 29x reduction. At $15 per million input tokens, each test costs $1.38 via MCP versus $0.05 via skills. Across hundreds of CI runs per day, it's the difference between a viable and unsustainable approach.
More importantly, the ~89,000 tokens saved per test stay available for the agent's reasoning — analyzing failures, comparing states, maintaining conversation history."
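The arithmetic behind those per-test figures can be re-derived directly (tokens × $15 / 1M input tokens):

```shell
# Sanity-check the token-cost comparison quoted above.
awk -v t=92000 -v p=15 'BEGIN{printf "MCP:    $%.2f per test\n", t*p/1e6}'   # $1.38
awk -v t=3200  -v p=15 'BEGIN{printf "Skills: $%.2f per test\n", t*p/1e6}'   # $0.05
awk -v a=92000 -v b=3200 'BEGIN{printf "Token reduction: %.0fx\n", a/b}'     # 29x
```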
Q10: "How do you handle self-healing tests?"
Answer: "Three tiers:
Tier 1 (~80% of failures, ~500 tokens): When a selector fails, the agent runs vibe-check find-all to discover existing elements, matches by text content, and retries with an alternative selector.
Tier 2 (~15% of failures, ~800 tokens): Agent takes a screenshot and reads page text to understand the current state. Catches loading issues, redirects, and unexpected dialogs.
Tier 3 (~5% of failures, ~5,500 tokens): Falls back to MCP's accessibility tree for semantic page analysis when the page structure has fundamentally changed.
We track healing events in a log. High healing rates on one test mean the selectors need updating. High healing rates across tests mean a major UI refactor happened. The key distinction: self-healing should fix stale selectors, not mask flaky tests."
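Tier 1 can be sketched as a small wrapper that tries the primary selector, then the alternatives the agent discovered via find-all (click_with_fallback is a hypothetical helper, not a vibe-check feature; in practice the agent itself does this reasoning):

```shell
# Try each candidate selector in order; stop at the first one that clicks.
click_with_fallback() {
  local sel
  for sel in "$@"; do
    if vibe-check click "$sel" 2>/dev/null; then
      echo "clicked via: $sel"
      return 0
    fi
  done
  echo "no selector matched" >&2
  return 1
}
```

Something like `click_with_fallback '.btn-old' 'button.submit'` would record which selector finally worked, which is exactly the signal the healing log needs.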
Category 3: Strategy & Trade-offs
Q11: "When would you NOT use agent-driven testing?"
Answer: "Three scenarios:
Performance testing — You need deterministic, repeatable measurements. Agent overhead (~200ms reasoning per step) is unacceptable for load testing.
Trivial regression checks — 'Does the homepage return 200?' doesn't need AI. A simple curl check is faster, cheaper, and more reliable.
Compliance testing with audit requirements — Some regulations require test scripts to be deterministic and reproducible. Agent reasoning introduces variability that auditors may not accept.
For these, traditional Playwright/Selenium scripts are better. The sweet spot for agents is complex functional flows, exploratory testing, and tests that need to adapt to changing UIs."
Q12: "How would you convince a skeptical architect that AI testing is production-ready?"
Answer: "I'd address the three common objections:
'AI is non-deterministic.' True, but we mitigate this by logging every command for reproducibility. If the agent took a wrong path, we can see exactly where and why. In practice, the same test produces the same sequence of commands 95%+ of the time because the agent follows SKILL.md instructions deterministically — it's only on failure recovery that reasoning diverges.
'It's too expensive.' With CLI skills, each test run costs ~$0.05 in tokens. A 500-test suite costs ~$25 per full run. Compare that to the engineering hours saved on test maintenance — even one day of a QA engineer's time pays for months of agent-driven testing.
'It's too slow.' Each browser command takes 100-300ms via daemon mode. A 20-step test completes in 3-5 seconds. The agent's reasoning adds ~200ms per step. Total: roughly comparable to traditional automation, sometimes faster because the agent doesn't need to wait for explicit sleep() calls.
Then I'd show the real data: 60-85% reduction in test maintenance time, because when the UI changes, the agent adapts instead of requiring test updates."
Q13: "How do you handle test data management?"
Answer: "Three approaches depending on isolation needs:
Inline test data: For simple tests, data is embedded in the test definition. vibe-check type '#email' 'test@example.com' — the email is right there.
API-driven setup: For tests that need specific state (user accounts, order history), we call the app's API before browser tests to create the required data. The agent can do this via curl or vibe-check eval 'fetch(...)'.
Database seeding: For CI, we run migration scripts that create a known state before the test suite. Each test group gets its own database or schema partition for isolation.
The agent doesn't manage test data directly — it focuses on UI interaction. Data setup is handled by scripts in the framework's /scripts/ directory."
Q14: "What does your testing pyramid look like with AI?"
Answer:
         /\
        /  \          Agent-driven E2E (browser tests via vibe-check)
       /    \         ~50 tests, critical user journeys
      /______\
     /        \       API/Integration tests (traditional)
    /          \      ~200 tests, business logic verification
   /____________\
  /              \    Unit tests (traditional)
 /________________\   ~2000+ tests, code correctness
AI-driven testing sits at the top of the pyramid — the smallest number of tests with the highest coverage per test. We don't use AI for unit tests (deterministic, no browser needed) or most API tests (no UI involved). The agent adds value where human judgment is needed: complex flows, visual verification, adaptive interaction."
Category 4: Practical Scenarios
Q15: "Walk me through debugging a failing browser test."
Answer: "The failure artifacts give us everything:
Command log — I see the exact sequence of vibe-check commands. Step 7 was vibe-check click '.btn-confirm', which timed out.
Screenshot — I look at failures/test-name/screenshot.png to see the page state at failure time. In this case, there's a modal overlay blocking the button.
Page text — failures/test-name/page_text.txt shows 'Are you sure you want to proceed?' — there's a confirmation dialog the test didn't expect.
Fix — I add a step to handle the confirmation dialog: vibe-check click '.modal-confirm' before the original click. Or I update the test definition to expect this dialog.
If the agent was running, it might have self-healed — detecting the modal, clicking the confirm button, and proceeding. But we'd still review the healing log to decide if the test definition should be updated."
Q16: "How would you test a Single Page Application?"
Answer: "SPAs have specific challenges: navigation doesn't trigger full page loads, content renders asynchronously, and URLs may not change.
Key techniques with vibe-check:
- Wait for specific elements, not page load:
vibe-check navigate https://spa.example.com
vibe-check wait '.app-loaded' # Wait for React/Vue to render
- Use --wait-open for initial hydration:
vibe-check navigate https://spa.example.com --wait-open 3
- Verify client-side routing by checking text, not URL:
vibe-check click 'a[href="/dashboard"]'
vibe-check wait '.dashboard-content'
vibe-check text 'h1' # Verify content, not URL
- Handle lazy loading:
vibe-check scroll down --amount 1000
vibe-check wait '.lazy-loaded-component'
The actionability checks handle most SPA timing issues automatically — the 30-second auto-wait means you rarely need explicit waits."
Q17: "How do you handle authentication across tests?"
Answer: "Three strategies:
Strategy 1 — Login via UI (realistic, slow):
vibe-check navigate https://app.example.com/login
vibe-check type '#email' 'test@example.com'
vibe-check type '#password' 'secret'
vibe-check click '#submit'
vibe-check wait '.dashboard'
Strategy 2 — Cookie injection (fast, reliable):
vibe-check navigate https://app.example.com
vibe-check eval 'document.cookie = "session=abc123; path=/"'
vibe-check navigate https://app.example.com/dashboard # Now authenticated
Strategy 3 — API auth (fastest, for CI):
TOKEN=$(curl -s -X POST https://api.example.com/auth/login \
-d '{"email":"test@example.com","password":"secret"}' | jq -r '.token')
vibe-check navigate https://app.example.com
vibe-check eval "localStorage.setItem('auth_token', '$TOKEN')"
vibe-check navigate https://app.example.com/dashboard
We use Strategy 1 for the actual login test, Strategy 3 for all other tests. This gives us one thorough login verification plus fast setup for everything else."
Q18: "How do you handle tests that interact with third-party services?"
Answer: "Three approaches:
- Mock at the network level: Use vibe-check eval to intercept fetch() calls:
vibe-check eval '
window._originalFetch = window.fetch;
window.fetch = (url, opts) => {
  if (url.includes("stripe.com")) {
    return Promise.resolve(new Response(JSON.stringify({id: "mock_pi_123"})));
  }
  return window._originalFetch(url, opts);
}
'
- Use test/sandbox environments: Most payment providers (Stripe, PayPal) have sandbox modes. Configure the app to use sandbox credentials in the test environment.
- Stop at the boundary: Test up to the point of third-party interaction, verify the request payload via eval, then skip the actual external call."
Q19: "What metrics do you track for your test automation?"
Answer: "Five key metrics:
- Test reliability rate — % of tests that pass consistently (target: >98%)
- Self-healing rate — % of runs where the agent recovered from a failure (track over time — should decrease as selectors stabilize)
- Execution time — Per-test and per-suite (detect performance regressions)
- Token cost — Per-test and per-suite (budget management)
- Failure-to-fix time — How long between a test failure and the fix being merged (measures the value of failure artifacts)
We track these in a JSON report per run and graph trends weekly. A spike in self-healing rate means a UI deployment changed something. A spike in execution time means either the app got slower or a test got stuck."
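A per-run report carrying those five metrics might look like this (field names and values are illustrative, not a fixed schema; the token figure assumes ~3,200 tokens × 50 tests):

```json
{
  "run": "2025-01-15T09:30:00Z",
  "tests_total": 50,
  "tests_passed": 49,
  "reliability_rate": 0.98,
  "self_healed_runs": 3,
  "suite_duration_seconds": 214,
  "tokens_used": 160000,
  "token_cost_usd": 2.40
}
```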
Q20: "Where do you see AI-driven testing going in the next 2-3 years?"
Answer: "Three directions:
AI-powered locators — Instead of CSS selectors, the agent says 'click the blue submit button.' Vision models identify elements visually. Vibium's V2 roadmap includes this. The challenge is latency and cost of vision API calls.
Continuous learning — Vibium's planned Cortex component builds an 'app map' from past sessions. The agent uses this to plan multi-step navigations and avoid rediscovery. Think of it as a test intelligence layer that learns your application over time.
Autonomous QA — The agent doesn't just execute predefined tests — it explores the application, identifies potential issues, and generates tests automatically. OpenObserve already does this with a 'Council of Sub Agents' approach — 8 specialized AI agents that grew their test suite from 380 to 700+ tests.
The agent skills pattern — lightweight markdown that teaches agents domain knowledge — is becoming the standard interface. The Skills Directory already has 50,000+ skills. Browser automation is just one domain; the same pattern applies to API testing, database verification, and infrastructure validation."