QA Wiki and Runbooks

Building a Knowledge Base That Your Team Actually Uses

The difference between a useful QA knowledge base and a graveyard of outdated documents is not the tool you use -- it is the discipline you bring to creating, organizing, and maintaining it. A great QA wiki reduces onboarding time from weeks to days, prevents repeated mistakes, preserves institutional knowledge, and makes the entire team more efficient. A bad QA wiki is worse than no wiki at all, because people waste time following outdated instructions.

Building a QA Knowledge Base

What to Document

Not everything needs to be documented. Over-documentation is as harmful as under-documentation because it creates noise that obscures the signal. Use the following criteria to decide what belongs in the knowledge base:

Document If...	Do Not Document If...
Someone asks the same question more than twice	The information is easily discoverable in code or tool UI
The process has more than 3 steps	The process changes so frequently that docs cannot keep up
Getting it wrong has significant consequences	The information is temporary (use a ticket or message instead)
A team member leaving would create a knowledge gap	The information is already well-documented externally
The task requires context that is not obvious from the tools	The audience would never look in a wiki for this

Knowledge Base Structure

A well-organized knowledge base mirrors how people look for information: by task, not by document type.

Recommended top-level structure:

QA Knowledge Base/
├── Getting Started/
│   ├── Onboarding checklist
│   ├── Environment setup
│   ├── Tool access and accounts
│   └── Team norms and processes
├── Testing Guides/
│   ├── By feature area/
│   │   ├── Checkout testing guide
│   │   ├── Search testing guide
│   │   └── Payment testing guide
│   ├── By test type/
│   │   ├── Exploratory testing guide
│   │   ├── Regression testing guide
│   │   └── Performance testing guide
│   └── By platform/
│       ├── Web testing guide
│       ├── Mobile testing guide
│       └── API testing guide
├── Runbooks/
│   ├── Environment management
│   ├── Test data management
│   ├── Deployment verification
│   └── Incident response
├── Tools and Infrastructure/
│   ├── Test framework guide
│   ├── CI/CD pipeline guide
│   ├── Test environment guide
│   └── Monitoring and alerting guide
├── Standards and Templates/
│   ├── Bug report template
│   ├── Test plan template
│   ├── Test case conventions
│   └── Severity definitions
└── Decision Log/
    ├── Why we chose Playwright over Cypress
    ├── Why we use risk-based testing for releases
    └── Why we moved to contract testing

Runbooks for Common QA Tasks

A runbook is a step-by-step procedure for a specific task. Unlike a guide (which explains concepts), a runbook is a checklist that someone can follow without prior knowledge.

Runbook Format

Every runbook should follow a consistent structure:

# Runbook: [Task Name]

## Purpose
One sentence: what this runbook helps you do.

## Prerequisites
What you need before starting (access, tools, knowledge).

## Steps

### Step 1: [Action]
Specific instruction with exact commands or UI steps.

Expected result: what you should see after this step.

### Step 2: [Action]
...

## Troubleshooting
Common issues and how to resolve them.

## Contacts
Who to ask if you get stuck.

## Last Verified
Date this runbook was last tested: [YYYY-MM-DD]
Owner: [Name]
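
A consistent structure is easier to keep when it is checked automatically. The following is a minimal sketch, assuming runbooks are stored as Markdown files that use the `## ` section headings from the template above; the function name `findMissingSections` is hypothetical and not part of any tool.

```typescript
// Section headings the runbook template above requires.
const REQUIRED_SECTIONS = [
  "Purpose",
  "Prerequisites",
  "Steps",
  "Troubleshooting",
  "Contacts",
  "Last Verified",
];

// Returns the required "## " sections that are missing from a runbook.
function findMissingSections(markdown: string): string[] {
  const present = new Set(
    markdown
      .split("\n")
      .filter((line) => line.startsWith("## "))
      .map((line) => line.slice(3).trim().toLowerCase()),
  );
  return REQUIRED_SECTIONS.filter(
    (section) => !present.has(section.toLowerCase()),
  );
}

// Hypothetical draft that stops after the Steps section.
const draft = [
  "# Runbook: Reset Staging Database",
  "## Purpose",
  "Reset staging data to a clean state.",
  "## Prerequisites",
  "VPN connected.",
  "## Steps",
  "### Step 1: Stop the workers",
].join("\n");

console.log(findMissingSections(draft));
// → [ 'Troubleshooting', 'Contacts', 'Last Verified' ]
```

A check like this could run in CI on every pull request that touches a runbook, so incomplete drafts are caught before they are published.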

Example: Environment Setup Runbook

# Runbook: Setting Up the QA Test Environment

## Purpose
Set up a local test environment for running E2E tests against
the staging backend.

## Prerequisites
- macOS or Linux (Windows users: use WSL2)
- Node.js 20+ installed
- Access to the GitHub organization (request via IT portal)
- VPN connected (required for staging access)

## Steps

### Step 1: Clone the Repositories

git clone git@github.com:company/webapp.git
git clone git@github.com:company/e2e-tests.git


Expected: both repos cloned without permission errors.

### Step 2: Install Dependencies

cd e2e-tests
npm ci
npx playwright install


Expected: all packages installed, browsers downloaded.

### Step 3: Configure Environment Variables

cp .env.example .env


Edit `.env` and set:
- `BASE_URL=https://staging.example.com`
- `TEST_USER_EMAIL=qa-test@example.com`
- `TEST_USER_PASSWORD=` (get from 1Password vault "QA Shared")

### Step 4: Verify Setup

npx playwright test --project=chromium tests/smoke.spec.ts


Expected: smoke tests pass (typically 12 tests, ~30 seconds).

## Troubleshooting

| Issue | Cause | Fix |
|---|---|---|
| "Connection refused" on staging URL | VPN not connected | Connect to VPN and retry |
| Browser download fails | Corporate firewall | Use `PLAYWRIGHT_DOWNLOAD_HOST` env var (see wiki) |
| Tests timeout | Staging environment is down | Check #staging-status Slack channel |
| Permission denied on clone | SSH key not configured | Follow the SSH setup guide (link) |

## Contacts
- Environment issues: @devops-team in Slack
- Test framework issues: @qa-platform in Slack

## Last Verified
2026-01-15 by Alice Chen
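
Step 3 of the example runbook is a common place for setup to go wrong, because an empty or missing variable only shows up later as a failing test. As a rough sketch, a preflight check along these lines could report unset variables before the smoke tests run. It assumes a simple `KEY=value` file format; the helper names `parseEnv` and `findUnsetKeys` are hypothetical.

```typescript
// Variables the runbook's Step 3 asks you to set in .env.
const REQUIRED_ENV_KEYS = ["BASE_URL", "TEST_USER_EMAIL", "TEST_USER_PASSWORD"];

// Parses simple KEY=value lines; blank lines and # comments are skipped.
function parseEnv(contents: string): Map<string, string> {
  const values = new Map<string, string>();
  for (const rawLine of contents.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const separator = line.indexOf("=");
    if (separator === -1) continue;
    values.set(
      line.slice(0, separator).trim(),
      line.slice(separator + 1).trim(),
    );
  }
  return values;
}

// Returns required keys that are absent or left empty.
function findUnsetKeys(contents: string): string[] {
  const values = parseEnv(contents);
  return REQUIRED_ENV_KEYS.filter((key) => !values.get(key));
}

// The state of .env right after copying .env.example and filling in two values.
const sampleEnv = [
  "BASE_URL=https://staging.example.com",
  "TEST_USER_EMAIL=qa-test@example.com",
  "TEST_USER_PASSWORD=",
].join("\n");

console.log(findUnsetKeys(sampleEnv)); // → [ 'TEST_USER_PASSWORD' ]
```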

Essential Runbooks for QA Teams

Runbook	Purpose	Who Uses It
Environment setup	Get new team members productive quickly	New hires, returning team members
Test data management	Create, refresh, and manage test data	All QA engineers
Deployment verification	Verify a deployment to staging or production	QA engineers, on-call engineers
Test suite maintenance	Fix flaky tests, update selectors, manage test dependencies	Automation engineers
Incident response for QA	QA-specific steps during an incident	QA engineers during incidents
Release testing	Step-by-step process for release verification	QA engineers before releases
Test environment reset	Reset environments to a clean state	All QA engineers
Access provisioning	How to request and grant access to QA tools	QA leads, new hires

Troubleshooting Guides

Troubleshooting guides document common issues and their solutions. They save hours of debugging by capturing solutions that would otherwise live only in someone's head.

Troubleshooting Guide Format

# Troubleshooting: [System or Area]

## Symptom: [What the person observes]

### Possible Cause 1: [Most likely cause]
**How to verify:** [Steps to confirm this is the cause]
**Fix:** [Steps to resolve]

### Possible Cause 2: [Second most likely cause]
**How to verify:** [Steps to confirm]
**Fix:** [Steps to resolve]

### If None of the Above Work
Escalate to [team/person] with the following information:
- [What to include in the escalation]

Example: Flaky Test Troubleshooting

# Troubleshooting: Flaky E2E Tests

## Symptom: A test passes locally but fails in CI

### Possible Cause 1: Timing issue
**How to verify:** Add a `console.log(Date.now())` before the
failing assertion. Check if the element is rendering later in CI.
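**Fix:** Wait for the condition instead of a fixed delay, for example
`await expect(page.locator('.element')).toBeVisible()` (or
`await page.waitForSelector('.element')`).
Do NOT use `page.waitForTimeout()`, as it masks the real issue.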

### Possible Cause 2: Screen size difference
**How to verify:** Check the viewport size in CI config vs local.
CI runs at 1280x720; local may be different.
**Fix:** Set viewport explicitly in the test or test config.

### Possible Cause 3: Test data dependency
**How to verify:** Run the test in isolation (`--grep "test name"`).
If it passes alone but fails in the suite, another test is
modifying shared data.
**Fix:** Ensure each test creates its own data and cleans up.

### Possible Cause 4: Parallel execution race condition
**How to verify:** Run with `--workers=1`. If it passes, the
issue is parallelism-related.
**Fix:** Isolate test data per worker or use test-level locks.
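
One way to apply the fixes for causes 3 and 4 is to derive every piece of test data from the worker and the test that uses it. The sketch below is an illustration under assumptions, not a prescribed implementation: `testDataKey` and `testUserEmail` are hypothetical helpers, and the worker index is assumed to come from the test runner (in Playwright, `testInfo.workerIndex`).

```typescript
// Builds a test-data identifier that is unique per run, worker, and test, so
// parallel workers never read or modify each other's records.
function testDataKey(
  workerIndex: number,
  testTitle: string,
  runId: string,
): string {
  const slug = testTitle
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
  return `${runId}-w${workerIndex}-${slug}`;
}

// Derives a per-test account email from the key (the domain is a placeholder).
function testUserEmail(
  workerIndex: number,
  testTitle: string,
  runId: string,
): string {
  return `qa+${testDataKey(workerIndex, testTitle, runId)}@example.com`;
}

console.log(testUserEmail(0, "Checkout: applies coupon", "run42"));
// → qa+run42-w0-checkout-applies-coupon@example.com
console.log(testUserEmail(1, "Checkout: applies coupon", "run42"));
// → qa+run42-w1-checkout-applies-coupon@example.com
```

Because the key includes the run identifier, leftover records from an aborted run can also be found and cleaned up by prefix.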

Decision Logs

Decision logs record why certain choices were made. They are invaluable when a new team member asks "why do we do it this way?" or when the team revisits a decision months later.

Decision Log Format

# Decision: [Title]

**Date:** [YYYY-MM-DD]
**Decision-makers:** [Names]
**Status:** Accepted / Superseded by [link]

## Context
What situation prompted this decision?

## Options Considered

### Option A: [Name]
- Pros: ...
- Cons: ...

### Option B: [Name]
- Pros: ...
- Cons: ...

## Decision
Which option was chosen and why.

## Consequences
What trade-offs were accepted. What follow-up actions are needed.
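
If decision logs are kept as structured data (for example in a repository next to the tests), the format above maps naturally onto a typed record. The following sketch is one possible shape. The type names, the `validateDecision` helper, and the rule that at least two options must be recorded are assumptions made for illustration; the sample values are borrowed from the Playwright example in this section.

```typescript
// Status mirrors the template: accepted, or superseded by a linked record.
type DecisionStatus =
  | { kind: "accepted" }
  | { kind: "superseded"; supersededBy: string };

interface DecisionOption {
  name: string;
  pros: string[];
  cons: string[];
}

interface DecisionRecord {
  title: string;
  date: string; // YYYY-MM-DD
  decisionMakers: string[];
  status: DecisionStatus;
  context: string;
  options: DecisionOption[];
  decision: string;
  consequences: string[];
}

// Lists problems that make a record incomplete; an empty list means none found.
function validateDecision(record: DecisionRecord): string[] {
  const problems: string[] = [];
  if (!/^\d{4}-\d{2}-\d{2}$/.test(record.date)) {
    problems.push("date must be YYYY-MM-DD");
  }
  if (record.options.length < 2) {
    problems.push("record at least two options considered");
  }
  if (record.decision.trim() === "") {
    problems.push("decision is empty");
  }
  if (
    record.status.kind === "superseded" &&
    record.status.supersededBy.trim() === ""
  ) {
    problems.push("superseded records must link to their replacement");
  }
  return problems;
}

const sampleRecord: DecisionRecord = {
  title: "Choosing Playwright over Cypress for E2E Testing",
  date: "2025-09-15",
  decisionMakers: ["QA Lead", "Engineering Manager", "Senior SDET"],
  status: { kind: "accepted" },
  context: "The Cypress suite has grown to 350 tests and takes 45 min to run.",
  options: [
    { name: "Stay with Cypress", pros: ["No migration cost"], cons: ["Performance ceiling"] },
    { name: "Migrate to Playwright", pros: ["Faster execution"], cons: ["Migration cost (~3 weeks)"] },
  ],
  decision: "Migrate fully to Playwright (Option B).",
  consequences: ["3-week migration sprint dedicated to converting tests"],
};

console.log(validateDecision(sampleRecord)); // → []
```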

Example Decision Log

# Decision: Choosing Playwright over Cypress for E2E Testing

**Date:** 2025-09-15
**Decision-makers:** QA Lead, Engineering Manager, Senior SDET
**Status:** Accepted

## Context
Our Cypress test suite has grown to 350 tests and is experiencing
significant pain points: limited browser coverage (we need Safari, and
Cypress's WebKit support is experimental), no native multi-tab support,
and slow execution (45 min for full suite).

## Options Considered

### Option A: Stay with Cypress
- Pros: No migration cost, team is familiar, large community
- Cons: No production-ready Safari support, single-tab only, performance ceiling

### Option B: Migrate to Playwright
- Pros: Multi-browser (including WebKit, Safari's engine), multi-tab, faster execution,
  better debugging tools, auto-wait reduces flakiness
- Cons: Migration cost (~3 weeks), team retraining needed

### Option C: Use both (Cypress for existing, Playwright for new)
- Pros: No migration risk, gradual transition
- Cons: Two frameworks to maintain, double the learning curve

## Decision
Migrate fully to Playwright (Option B). The migration cost is
justified by the long-term benefits of multi-browser support
and performance improvements.

## Consequences
- 3-week migration sprint dedicated to converting tests
- Team training sessions scheduled for weeks 1-2
- Cypress will be fully removed after migration is verified
- Expected 40% reduction in suite execution time (from 45 min to roughly 27 min)

Searchability and Discoverability

A knowledge base is useless if nobody can find information in it. Invest in discoverability.

Tagging and Categorization

Strategy	Implementation
Consistent naming conventions	`runbook-*`, `guide-*`, `troubleshoot-*` prefixes
Tags / labels	Tag pages by area (checkout, payment), type (runbook, guide), audience (new hire, senior)
Cross-references	Link related pages to each other ("See also: ...")
Glossary	Define acronyms and domain terms in one central glossary page
Landing pages	Create index pages for each major section with descriptions
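
Naming conventions tend to drift unless something checks them. The sketch below shows one way to flag page names that do not follow the prefixes in the table. It assumes page identifiers are lowercase kebab-case slugs, which is an assumption made here rather than a rule from the table, and `followsNamingConvention` is a hypothetical helper.

```typescript
// Prefixes from the naming convention row above.
const PAGE_PREFIXES = ["runbook-", "guide-", "troubleshoot-"];

// True when a slug is lowercase kebab-case and starts with a known prefix
// followed by at least one more word.
function followsNamingConvention(slug: string): boolean {
  const isKebabCase = /^[a-z0-9]+(-[a-z0-9]+)*$/.test(slug);
  const hasPrefix = PAGE_PREFIXES.some(
    (prefix) => slug.startsWith(prefix) && slug.length > prefix.length,
  );
  return isKebabCase && hasPrefix;
}

const slugs = ["runbook-environment-setup", "Database Runbook", "guide-"];
console.log(slugs.filter((slug) => !followsNamingConvention(slug)));
// → [ 'Database Runbook', 'guide-' ]
```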

Search Optimization

Use descriptive titles: "How to Reset the Staging Database" not "Database Runbook"
Include keywords people search for: If people search for "flaky tests," make sure that phrase appears in the relevant document
Write introductory paragraphs: Many search tools give extra weight to a page's title and opening text, so state what the page covers in its first paragraph
Use headings that match questions: "How do I set up the test environment?" as a heading makes the page findable for that exact question

Wiki Maintenance: Preventing Documentation Rot

Documentation rot is the gradual accumulation of outdated, inaccurate, and irrelevant content that makes the entire knowledge base untrustworthy.

The Documentation Rot Cycle

New docs written → Team relies on docs → Processes change →
Docs become outdated → Team stops trusting docs →
Team stops reading docs → Team stops writing docs →
Knowledge lives only in heads → New docs written (poorly, hastily)

Breaking the Cycle

Practice	Frequency	Impact
Ownership assignment	Once (update as needed)	Every page has someone responsible
Quarterly review	Every 3 months	Owner verifies accuracy of their pages
New hire validation	With each onboarding	Newcomers flag inaccuracies in real-time
Archive unused pages	Quarterly	Reduce noise; archive pages with no views in 6 months
"Last verified" dates	On every page	Readers know how current the information is
Feedback mechanism	Always active	A simple "Was this helpful? Report an issue" link on every page
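
Several of these practices can be turned into a report that is generated before each quarterly review. The following is a minimal sketch, assuming page metadata (owner, last verified date, last viewed date) can be exported from the wiki tool. The `WikiPage` shape and the 90-day and 180-day thresholds are illustrative approximations of "every 3 months" and "no views in 6 months".

```typescript
interface WikiPage {
  title: string;
  owner: string | null;
  lastVerified: string; // YYYY-MM-DD
  lastViewed: string | null; // YYYY-MM-DD, null if never viewed
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Whole days between an ISO date (parsed as UTC midnight) and `today`.
function daysSince(isoDate: string, today: Date): number {
  return Math.floor((today.getTime() - Date.parse(isoDate)) / DAY_MS);
}

// Flags a page according to the practices above: missing owner, overdue
// review (> 90 days since last verified), or archive candidate (no views
// in roughly 6 months).
function auditPage(page: WikiPage, today: Date): string[] {
  const flags: string[] = [];
  if (page.owner === null) flags.push("no owner");
  if (daysSince(page.lastVerified, today) > 90) flags.push("review overdue");
  if (page.lastViewed === null || daysSince(page.lastViewed, today) > 180) {
    flags.push("archive candidate");
  }
  return flags;
}

const reviewDate = new Date("2026-06-01");
const examplePage: WikiPage = {
  title: "Environment setup",
  owner: "Alice Chen",
  lastVerified: "2026-01-15",
  lastViewed: "2026-05-20",
};

console.log(auditPage(examplePage, reviewDate)); // → [ 'review overdue' ]
```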

The Page Health Checklist

Run this checklist quarterly for every active page:

Is the information accurate?
Are all commands and URLs current?
Are screenshots current?
Is the page still needed? (Check view counts)
Does the page have an owner?
Is the "last verified" date updated?
Are all links working?
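
The link check is the easiest item on this list to automate. As a rough sketch, assuming pages are written in Markdown and internal links use page slugs as targets, the functions below find internal links that point at pages that no longer exist. External links are skipped because verifying them requires network access. Both function names are hypothetical.

```typescript
// Extracts link targets from Markdown inline links: [text](target).
function extractLinkTargets(markdown: string): string[] {
  const targets: string[] = [];
  const pattern = /\[[^\]]*\]\(([^)\s]+)\)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    targets.push(match[1]);
  }
  return targets;
}

// Returns internal link targets that are not in the list of known pages.
// Targets starting with http:// or https:// are treated as external.
function findBrokenInternalLinks(
  markdown: string,
  knownPages: string[],
): string[] {
  return extractLinkTargets(markdown).filter(
    (target) =>
      !/^https?:\/\//.test(target) && knownPages.indexOf(target) === -1,
  );
}

const pageBody =
  "See [SSH setup](guide-ssh-setup) and [VPN](guide-vpn), " +
  "or the [Playwright docs](https://playwright.dev).";

console.log(findBrokenInternalLinks(pageBody, ["guide-ssh-setup"]));
// → [ 'guide-vpn' ]
```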

Tools Comparison

Tool	Type	Best For	Strengths	Weaknesses
Confluence	Enterprise wiki	Atlassian ecosystem teams	Jira integration, permissions, structured spaces	Slow, cluttered UI, search is weak, expensive
Notion	Modern wiki/database	Small to mid-size teams	Flexible databases, clean UI, templates	Limited permissions granularity, export limitations
GitHub Wiki	Git-based wiki	Developer-centric teams	Version controlled, free, integrated with repo	Limited features, no built-in search, clunky editor
Obsidian	Local-first markdown	Personal knowledge management, small teams	Fast, extensible, works offline, Markdown-native	No built-in collaboration, requires sync setup
GitBook	Documentation platform	Public-facing or customer docs	Clean design, Git sync, versioning	Limited customization, paid for teams
MkDocs + Material	Static site generator	Technical documentation	Fast, beautiful, extensible, docs-as-code	Requires deployment setup, developer-oriented
Docusaurus	Static site generator	Product documentation	Versioning, React-based, Meta-backed	Heavier setup, React knowledge helpful

Choosing the Right Tool

If your team...	Consider...
Already uses Atlassian (Jira, Bitbucket)	Confluence (integration value outweighs UI pain)
Values simplicity and flexibility	Notion
Wants docs-as-code with version control	MkDocs + Material or Docusaurus
Needs a lightweight free solution	GitHub Wiki or repository Markdown files
Has a single person managing QA docs	Obsidian (for drafting) + a shared platform (for publishing)
Needs customer-facing documentation	GitBook or Docusaurus

Hands-On Exercise

Audit your current QA knowledge base. Create an inventory of what exists, what is outdated, and what is missing.
Write one runbook for a QA task that currently lives only in someone's head (environment setup, test data refresh, or deployment verification).
Create a troubleshooting guide for the most common issue your QA team encounters.
Write a decision log for the most recent significant technical decision your team made.
Implement a quarterly documentation review process: assign owners, set review dates, and create a tracking mechanism.

Interview Talking Point: "I treat documentation as a first-class engineering deliverable, not an afterthought. On my teams, I have established docs-as-code practices where test documentation lives in the repository alongside the tests, goes through PR review, and is validated by CI -- including link checking and format linting. I build QA knowledge bases structured around tasks rather than document types, with runbooks for common procedures, troubleshooting guides for known issues, and decision logs that capture why we made certain testing choices. I assign ownership to every document and run quarterly freshness reviews, because outdated documentation is worse than no documentation. When team members leave, I conduct structured knowledge extraction sessions to capture domain knowledge before it walks out the door. The result is a team that onboards new engineers in days instead of weeks, avoids repeating past mistakes, and maintains institutional knowledge regardless of personnel changes."