QA Wiki and Runbooks
Building a Knowledge Base That Your Team Actually Uses
The difference between a useful QA knowledge base and a graveyard of outdated documents is not the tool you use -- it is the discipline you bring to creating, organizing, and maintaining it. A great QA wiki reduces onboarding time from weeks to days, prevents repeated mistakes, preserves institutional knowledge, and makes the entire team more efficient. A bad QA wiki is worse than no wiki at all, because people waste time following outdated instructions.
Building a QA Knowledge Base
What to Document
Not everything needs to be documented. Over-documentation is as harmful as under-documentation because it creates noise that obscures the signal. Use the criteria below to decide what belongs in the wiki:
| Document If... | Do Not Document If... |
|---|---|
| Someone asks the same question more than twice | The information is easily discoverable in code or tool UI |
| The process has more than 3 steps | The process changes so frequently that docs cannot keep up |
| Getting it wrong has significant consequences | The information is temporary (use a ticket or message instead) |
| A team member leaving would create a knowledge gap | The information is already well-documented externally |
| The task requires context that is not obvious from the tools | The audience would never look in a wiki for this |
Knowledge Base Structure
A well-organized knowledge base mirrors how people look for information: by task, not by document type.
Recommended top-level structure:
QA Knowledge Base/
├── Getting Started/
│   ├── Onboarding checklist
│   ├── Environment setup
│   ├── Tool access and accounts
│   └── Team norms and processes
├── Testing Guides/
│   ├── By feature area/
│   │   ├── Checkout testing guide
│   │   ├── Search testing guide
│   │   └── Payment testing guide
│   ├── By test type/
│   │   ├── Exploratory testing guide
│   │   ├── Regression testing guide
│   │   └── Performance testing guide
│   └── By platform/
│       ├── Web testing guide
│       ├── Mobile testing guide
│       └── API testing guide
├── Runbooks/
│   ├── Environment management
│   ├── Test data management
│   ├── Deployment verification
│   └── Incident response
├── Tools and Infrastructure/
│   ├── Test framework guide
│   ├── CI/CD pipeline guide
│   ├── Test environment guide
│   └── Monitoring and alerting guide
├── Standards and Templates/
│   ├── Bug report template
│   ├── Test plan template
│   ├── Test case conventions
│   └── Severity definitions
└── Decision Log/
    ├── Why we chose Playwright over Cypress
    ├── Why we use risk-based testing for releases
    └── Why we moved to contract testing
Runbooks for Common QA Tasks
A runbook is a step-by-step procedure for a specific task. Unlike a guide (which explains concepts), a runbook is a checklist that someone can follow without prior knowledge.
Runbook Format
Every runbook should follow a consistent structure:
# Runbook: [Task Name]
## Purpose
One sentence: what this runbook helps you do.
## Prerequisites
What you need before starting (access, tools, knowledge).
## Steps
### Step 1: [Action]
Specific instruction with exact commands or UI steps.
Expected result: what you should see after this step.
### Step 2: [Action]
...
## Troubleshooting
Common issues and how to resolve them.
## Contacts
Who to ask if you get stuck.
## Last Verified
Date this runbook was last tested: [YYYY-MM-DD]
Owner: [Name]
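A consistent structure is easier to keep if it is checked automatically. The following is a minimal sketch, assuming runbooks are stored as Markdown and use the exact `##` section names from the format above. The function name `missingRunbookSections` and the sample runbook are hypothetical, chosen only for illustration.

```typescript
// Section headings taken from the runbook format above.
const REQUIRED_RUNBOOK_SECTIONS: string[] = [
  "Purpose",
  "Prerequisites",
  "Steps",
  "Troubleshooting",
  "Contacts",
  "Last Verified",
];

// Returns the required "## " sections that are absent from a runbook's Markdown.
function missingRunbookSections(markdown: string): string[] {
  // Collect the text of every level-2 heading ("## Heading").
  const found: string[] = [];
  for (const line of markdown.split("\n")) {
    const match = /^##\s+(.+?)\s*$/.exec(line);
    if (match) {
      found.push(match[1].toLowerCase());
    }
  }
  return REQUIRED_RUNBOOK_SECTIONS.filter(
    (section) => found.indexOf(section.toLowerCase()) === -1
  );
}

// Usage: a hypothetical draft that has not yet filled in three sections.
const draftRunbook = [
  "# Runbook: Reset Staging Database",
  "## Purpose",
  "Restore staging to a known state.",
  "## Steps",
  "### Step 1: Stop the app",
  "## Contacts",
  "@qa-platform",
].join("\n");

console.log(missingRunbookSections(draftRunbook));
```

For the draft above, the check reports Prerequisites, Troubleshooting, and Last Verified as missing. A check like this could run in CI on every runbook change, though how it is wired in depends on where your team keeps its docs.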
Example: Environment Setup Runbook
# Runbook: Setting Up the QA Test Environment
## Purpose
Set up a local test environment for running E2E tests against
the staging backend.
## Prerequisites
- macOS or Linux (Windows users: use WSL2)
- Node.js 20+ installed
- Access to the GitHub organization (request via IT portal)
- VPN connected (required for staging access)
## Steps
### Step 1: Clone the Repositories
git clone git@github.com:company/webapp.git
git clone git@github.com:company/e2e-tests.git
Expected: both repos cloned without permission errors.
### Step 2: Install Dependencies
cd e2e-tests
npm ci
npx playwright install
Expected: all packages installed, browsers downloaded.
### Step 3: Configure Environment Variables
cp .env.example .env
Edit `.env` and set:
- `BASE_URL=https://staging.example.com`
- `TEST_USER_EMAIL=qa-test@example.com`
- `TEST_USER_PASSWORD=` (get from 1Password vault "QA Shared")
### Step 4: Verify Setup
npx playwright test --project=chromium tests/smoke.spec.ts
Expected: smoke tests pass (typically 12 tests, ~30 seconds).
## Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| "Connection refused" on staging URL | VPN not connected | Connect to VPN and retry |
| Browser download fails | Corporate firewall | Use `PLAYWRIGHT_DOWNLOAD_HOST` env var (see wiki) |
| Tests timeout | Staging environment is down | Check #staging-status Slack channel |
| Permission denied on clone | SSH key not configured | Follow the SSH setup guide (link) |
## Contacts
- Environment issues: @devops-team in Slack
- Test framework issues: @qa-platform in Slack
## Last Verified
2026-01-15 by Alice Chen
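A runbook step such as "Configure Environment Variables" can be backed by a small check that fails fast when a value is still empty. The sketch below assumes the `.env` file uses plain `KEY=value` lines without quoting. The helper names `parseEnvFile` and `unsetEnvKeys` are hypothetical, and the key list simply mirrors Step 3 of the example runbook.

```typescript
// Variables the example runbook asks you to set in `.env`.
const REQUIRED_ENV_KEYS: string[] = [
  "BASE_URL",
  "TEST_USER_EMAIL",
  "TEST_USER_PASSWORD",
];

// Parses simple KEY=value lines; ignores blank lines and "#" comments.
// Assumption: values are not quoted and contain no inline comments.
function parseEnvFile(text: string): Record<string, string> {
  const values: Record<string, string> = {};
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.charAt(0) === "#") continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // not a KEY=value line
    values[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return values;
}

// Returns the required keys that are missing or left empty.
function unsetEnvKeys(text: string): string[] {
  const values = parseEnvFile(text);
  return REQUIRED_ENV_KEYS.filter((key) => !values[key]);
}

// Usage: the password has been left blank, as in a fresh copy of .env.example.
const sampleEnv = [
  "# copied from .env.example",
  "BASE_URL=https://staging.example.com",
  "TEST_USER_EMAIL=qa-test@example.com",
  "TEST_USER_PASSWORD=",
].join("\n");

console.log(unsetEnvKeys(sampleEnv));
```

Here the only key reported is `TEST_USER_PASSWORD`, which points the reader back to the 1Password step instead of leaving them with an opaque login failure in the smoke tests.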
Essential Runbooks for QA Teams
| Runbook | Purpose | Who Uses It |
|---|---|---|
| Environment setup | Get new team members productive quickly | New hires, returning team members |
| Test data management | Create, refresh, and manage test data | All QA engineers |
| Deployment verification | Verify a deployment to staging or production | QA engineers, on-call engineers |
| Test suite maintenance | Fix flaky tests, update selectors, manage test dependencies | Automation engineers |
| Incident response for QA | QA-specific steps during an incident | QA engineers during incidents |
| Release testing | Step-by-step process for release verification | QA engineers before releases |
| Test environment reset | Reset environments to a clean state | All QA engineers |
| Access provisioning | How to request and grant access to QA tools | QA leads, new hires |
Troubleshooting Guides
Troubleshooting guides document common issues and their solutions. They save hours of debugging by capturing solutions that would otherwise live only in someone's head.
Troubleshooting Guide Format
# Troubleshooting: [System or Area]
## Symptom: [What the person observes]
### Possible Cause 1: [Most likely cause]
**How to verify:** [Steps to confirm this is the cause]
**Fix:** [Steps to resolve]
### Possible Cause 2: [Second most likely cause]
**How to verify:** [Steps to confirm]
**Fix:** [Steps to resolve]
### If None of the Above Work
Escalate to [team/person] with the following information:
- [What to include in the escalation]
Example: Flaky Test Troubleshooting
# Troubleshooting: Flaky E2E Tests
## Symptom: A test passes locally but fails in CI
### Possible Cause 1: Timing issue
**How to verify:** Add a `console.log(Date.now())` before the
failing assertion. Check if the element is rendering later in CI.
**Fix:** Wait for the element explicitly: `await page.waitForSelector('.element')`.
Do NOT use `page.waitForTimeout()`; a fixed delay masks the real issue.
### Possible Cause 2: Screen size difference
**How to verify:** Check the viewport size in CI config vs local.
CI runs at 1280x720; local may be different.
**Fix:** Set viewport explicitly in the test or test config.
### Possible Cause 3: Test data dependency
**How to verify:** Run the test in isolation (`--grep "test name"`).
If it passes alone but fails in the suite, another test is
modifying shared data.
**Fix:** Ensure each test creates its own data and cleans up.
### Possible Cause 4: Parallel execution race condition
**How to verify:** Run with `--workers=1`. If it passes, the
issue is parallelism-related.
**Fix:** Isolate test data per worker or use test-level locks.
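One way to isolate test data per worker, as suggested for Possible Causes 3 and 4, is to derive every record name from the worker and the test that created it. The following is a sketch only: `workerScopedName` is a hypothetical helper, and it assumes the caller passes in the worker index (in Playwright this is available as `testInfo.workerIndex`).

```typescript
// Builds a test-data identifier that is unique per worker and per test,
// so parallel tests never read or write the same record.
function workerScopedName(
  base: string,
  workerIndex: number,
  testTitle: string
): string {
  // Reduce the title to a lowercase slug of letters, digits, and hyphens.
  const slug = testTitle
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return `${base}-w${workerIndex}-${slug}`;
}

// Usage: two workers running the same test get distinct cart names.
console.log(workerScopedName("cart", 0, "Adds item to cart"));
console.log(workerScopedName("cart", 1, "Adds item to cart"));
```

The two calls yield `cart-w0-adds-item-to-cart` and `cart-w1-adds-item-to-cart`. Because the name is deterministic, cleanup code can find and delete exactly the records a given test created, even after a crashed run.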
Decision Logs
Decision logs record why certain choices were made. They are invaluable when a new team member asks "why do we do it this way?" or when the team revisits a decision months later.
Decision Log Format
# Decision: [Title]
**Date:** [YYYY-MM-DD]
**Decision-makers:** [Names]
**Status:** Accepted / Superseded by [link]
## Context
What situation prompted this decision?
## Options Considered
### Option A: [Name]
- Pros: ...
- Cons: ...
### Option B: [Name]
- Pros: ...
- Cons: ...
## Decision
Which option was chosen and why.
## Consequences
What trade-offs were accepted. What follow-up actions are needed.
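If decision logs are kept as structured data (for example, front matter in a docs-as-code setup), the format above maps naturally onto a type plus a small completeness check. This is a sketch under that assumption. The type names, the rule that at least two options must be listed, and the sample record are all illustrative choices, not part of the format itself.

```typescript
// Fields mirror the decision log format above.
interface DecisionOption {
  name: string;
  pros: string[];
  cons: string[];
}

interface DecisionRecord {
  title: string;
  date: string; // expected as YYYY-MM-DD
  decisionMakers: string[];
  status: "Accepted" | "Superseded";
  supersededBy?: string; // link to the newer decision, when superseded
  context: string;
  options: DecisionOption[];
  decision: string;
  consequences: string;
}

// Returns human-readable problems; an empty array means the record looks complete.
function decisionRecordProblems(record: DecisionRecord): string[] {
  const problems: string[] = [];
  if (!/^\d{4}-\d{2}-\d{2}$/.test(record.date)) {
    problems.push("date must be YYYY-MM-DD");
  }
  if (record.decisionMakers.length === 0) {
    problems.push("at least one decision-maker is required");
  }
  // Assumption: a log with a single option does not show what was weighed.
  if (record.options.length < 2) {
    problems.push("list at least two options considered");
  }
  if (record.status === "Superseded" && !record.supersededBy) {
    problems.push("superseded decisions must link to their replacement");
  }
  if (record.decision.trim() === "") {
    problems.push("decision section is empty");
  }
  return problems;
}

// Usage: a hypothetical draft with a malformed date, one option, and no link.
const draftDecision: DecisionRecord = {
  title: "Adopt contract testing",
  date: "15/09/2025",
  decisionMakers: ["QA Lead"],
  status: "Superseded",
  context: "Integration tests were slow and brittle.",
  options: [
    { name: "Contract tests", pros: ["Fast feedback"], cons: ["New tooling"] },
  ],
  decision: "Adopt contract tests for service boundaries.",
  consequences: "Teams need training.",
};

console.log(decisionRecordProblems(draftDecision));
```

The draft above produces three problems: the date format, the single option, and the missing link to the superseding decision.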
Example Decision Log
# Decision: Choosing Playwright over Cypress for E2E Testing
**Date:** 2025-09-15
**Decision-makers:** QA Lead, Engineering Manager, Senior SDET
**Status:** Accepted
## Context
Our Cypress test suite has grown to 350 tests and is experiencing
significant pain points: no stable Safari (WebKit) support, which we need,
no native multi-tab support, and slow execution (45 min for full suite).
## Options Considered
### Option A: Stay with Cypress
- Pros: No migration cost, team is familiar, large community
- Cons: No stable Safari support, single-tab only, performance ceiling
### Option B: Migrate to Playwright
- Pros: Multi-browser (including WebKit, Safari's engine), multi-tab,
faster execution, better debugging tools, auto-wait reduces flakiness
- Cons: Migration cost (~3 weeks), team retraining needed
### Option C: Use both (Cypress for existing, Playwright for new)
- Pros: No migration risk, gradual transition
- Cons: Two frameworks to maintain, double the learning curve
## Decision
Migrate fully to Playwright (Option B). The migration cost is
justified by the long-term benefits of multi-browser support
and performance improvements.
## Consequences
- 3-week migration sprint dedicated to converting tests
- Team training sessions scheduled for weeks 1-2
- Cypress will be fully removed after migration is verified
- Expected 40% reduction in suite execution time
Searchability and Discoverability
A knowledge base is useless if nobody can find information in it. Invest in discoverability.
Tagging and Categorization
| Strategy | Implementation |
|---|---|
| Consistent naming conventions | runbook-*, guide-*, troubleshoot-* prefixes |
| Tags / labels | Tag pages by area (checkout, payment), type (runbook, guide), audience (new hire, senior) |
| Cross-references | Link related pages to each other ("See also: ...") |
| Glossary | Define acronyms and domain terms in one central glossary page |
| Landing pages | Create index pages for each major section with descriptions |
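Naming conventions only help if they are applied consistently, and a short check can flag pages that drifted. The sketch below assumes page slugs use the `runbook-`, `guide-`, and `troubleshoot-` prefixes from the table. The function name and the sample slugs are hypothetical.

```typescript
type PageType = "runbook" | "guide" | "troubleshoot";

// Prefixes from the naming convention row in the table above.
const PAGE_PREFIXES: PageType[] = ["runbook", "guide", "troubleshoot"];

// Returns the page type implied by the slug's prefix, or null if none matches.
function pageTypeFromSlug(slug: string): PageType | null {
  for (const prefix of PAGE_PREFIXES) {
    if (slug.indexOf(prefix + "-") === 0) {
      return prefix;
    }
  }
  return null;
}

// Usage: list the pages that do not follow the convention.
const sampleSlugs = [
  "runbook-staging-db-reset",
  "guide-checkout-testing",
  "troubleshoot-flaky-e2e",
  "Database Runbook",
];

const unprefixedSlugs = sampleSlugs.filter(
  (slug) => pageTypeFromSlug(slug) === null
);
console.log(unprefixedSlugs);
```

Only "Database Runbook" is flagged here. The same lookup could drive automatic tagging, so the page type never has to be entered by hand.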
Search Optimization
- Use descriptive titles: "How to Reset the Staging Database" not "Database Runbook"
- Include keywords people search for: If people search for "flaky tests," make sure that phrase appears in the relevant document
- Write introductory paragraphs: Most search tools index the first paragraph heavily
- Use headings that match questions: "How do I set up the test environment?" as a heading makes the page findable for that exact question
Wiki Maintenance: Preventing Documentation Rot
Documentation rot is the gradual accumulation of outdated, inaccurate, and irrelevant content that makes the entire knowledge base untrustworthy.
The Documentation Rot Cycle
New docs written → Team relies on docs → Processes change →
Docs become outdated → Team stops trusting docs →
Team stops reading docs → Team stops writing docs →
Knowledge lives only in heads → New docs written (poorly, hastily)
Breaking the Cycle
| Practice | Frequency | Impact |
|---|---|---|
| Ownership assignment | Once (update as needed) | Every page has someone responsible |
| Quarterly review | Every 3 months | Owner verifies accuracy of their pages |
| New hire validation | With each onboarding | Newcomers flag inaccuracies in real time |
| Archive unused pages | Quarterly | Reduce noise; archive pages with no views in 6 months |
| "Last verified" dates | On every page | Readers know how current the information is |
| Feedback mechanism | Always active | A simple "Was this helpful? Report an issue" link on every page |
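The quarterly review and archive rules above can be expressed as a small triage function. This is a minimal sketch, assuming each page exposes a "last verified" date and a "last viewed" date (the latter would have to come from your wiki's analytics). The thresholds of 92 and 183 days are approximations of "every 3 months" and "6 months", and all names and sample pages are hypothetical.

```typescript
interface WikiPageStats {
  title: string;
  lastVerified: string; // YYYY-MM-DD, from the page's "Last Verified" field
  lastViewed: string; // YYYY-MM-DD, assumed to come from wiki analytics
}

type ReviewAction = "ok" | "review" | "archive";

const DAY_MS = 24 * 60 * 60 * 1000;
const REVIEW_AFTER_DAYS = 92; // roughly one quarter
const ARCHIVE_AFTER_DAYS = 183; // roughly six months

// Whole days between two ISO dates (date-only strings are parsed as UTC).
function daysBetween(fromIso: string, toIso: string): number {
  return Math.floor((Date.parse(toIso) - Date.parse(fromIso)) / DAY_MS);
}

// Archive wins over review: an unread page is not worth re-verifying.
function reviewAction(page: WikiPageStats, todayIso: string): ReviewAction {
  if (daysBetween(page.lastViewed, todayIso) > ARCHIVE_AFTER_DAYS) {
    return "archive";
  }
  if (daysBetween(page.lastVerified, todayIso) > REVIEW_AFTER_DAYS) {
    return "review";
  }
  return "ok";
}

// Usage with three hypothetical pages, evaluated on 2026-06-01.
const reviewDate = "2026-06-01";
const wikiPages: WikiPageStats[] = [
  { title: "runbook-environment-setup", lastVerified: "2026-01-15", lastViewed: "2026-05-20" },
  { title: "guide-checkout-testing", lastVerified: "2026-04-10", lastViewed: "2026-05-30" },
  { title: "runbook-legacy-ci", lastVerified: "2025-03-01", lastViewed: "2025-10-01" },
];

for (const page of wikiPages) {
  console.log(`${page.title}: ${reviewAction(page, reviewDate)}`);
}
```

With these inputs the first page is due for review, the second is fine, and the third is an archive candidate. The output is meant as a worklist for page owners, not as a trigger for automatic deletion.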
The Page Health Checklist
Run this checklist quarterly for every active page:
- Is the information accurate?
- Are all commands and URLs current?
- Are screenshots current?
- Is the page still needed? (Check view counts)
- Does the page have an owner?
- Is the "last verified" date updated?
- Are all links working?
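The "Are all links working?" item is the easiest part of the checklist to automate. The sketch below covers only the first half of that job: pulling link targets out of a Markdown page so that a separate step can check them. It assumes inline `[text](target)` links. Reference-style links and URLs containing parentheses are not handled, and the function names and sample page (including the `/health` path) are hypothetical.

```typescript
// Extracts the targets of inline Markdown links: [text](target "optional title").
// Image links (![alt](src)) are matched too, which also surfaces screenshots.
function extractLinkTargets(markdown: string): string[] {
  const pattern = /\[[^\]]*\]\(([^)\s]+)[^)]*\)/g;
  const targets: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    targets.push(match[1]);
  }
  return targets;
}

// External links need an HTTP check; the rest are paths inside the wiki.
function isExternalLink(target: string): boolean {
  return /^https?:\/\//i.test(target);
}

// Usage: split a page's links into external and internal targets.
const samplePage = [
  "See the [SSH setup guide](guide-ssh-setup.md) first.",
  "Then check [staging status](https://staging.example.com/health).",
].join("\n");

const allTargets = extractLinkTargets(samplePage);
console.log(allTargets.filter(isExternalLink));
console.log(allTargets.filter((target) => !isExternalLink(target)));
```

Internal targets can be checked against the list of existing pages, while external ones need a network request, so keeping the two lists separate lets the cheap check run on every change and the slow one run on a schedule.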
Tools Comparison
| Tool | Type | Best For | Strengths | Weaknesses |
|---|---|---|---|---|
| Confluence | Enterprise wiki | Atlassian ecosystem teams | Jira integration, permissions, structured spaces | Slow, cluttered UI, weak search, expensive |
| Notion | Modern wiki/database | Small to mid-size teams | Flexible databases, clean UI, templates | Limited permissions granularity, export limitations |
| GitHub Wiki | Git-based wiki | Developer-centric teams | Version controlled, free, integrated with repo | Limited features, no built-in search, clunky editor |
| Obsidian | Local-first markdown | Personal knowledge management, small teams | Fast, extensible, works offline, Markdown-native | No built-in collaboration, requires sync setup |
| GitBook | Documentation platform | Public-facing or customer docs | Clean design, Git sync, versioning | Limited customization, paid for teams |
| MkDocs + Material | Static site generator | Technical documentation | Fast, beautiful, extensible, docs-as-code | Requires deployment setup, developer-oriented |
| Docusaurus | Static site generator | Product documentation | Versioning, React-based, Meta-backed | Heavier setup, React knowledge helpful |
Choosing the Right Tool
| If your team... | Consider... |
|---|---|
| Already uses Atlassian (Jira, Bitbucket) | Confluence (integration value outweighs UI pain) |
| Values simplicity and flexibility | Notion |
| Wants docs-as-code with version control | MkDocs + Material or Docusaurus |
| Needs a lightweight free solution | GitHub Wiki or repository Markdown files |
| Has a single person managing QA docs | Obsidian (for drafting) + a shared platform (for publishing) |
| Needs customer-facing documentation | GitBook or Docusaurus |
Hands-On Exercise
- Audit your current QA knowledge base. Create an inventory of what exists, what is outdated, and what is missing.
- Write one runbook for a QA task that currently lives only in someone's head (environment setup, test data refresh, or deployment verification).
- Create a troubleshooting guide for the most common issue your QA team encounters.
- Write a decision log for the most recent significant technical decision your team made.
- Implement a quarterly documentation review process: assign owners, set review dates, and create a tracking mechanism.
Interview Talking Point: "I treat documentation as a first-class engineering deliverable, not an afterthought. On my teams, I have established docs-as-code practices where test documentation lives in the repository alongside the tests, goes through PR review, and is validated by CI -- including link checking and format linting. I build QA knowledge bases structured around tasks rather than document types, with runbooks for common procedures, troubleshooting guides for known issues, and decision logs that capture why we made certain testing choices. I assign ownership to every document and run quarterly freshness reviews, because outdated documentation is worse than no documentation. When team members leave, I conduct structured knowledge extraction sessions to capture domain knowledge before it walks out the door. The result is a team that onboards new engineers in days instead of weeks, avoids repeating past mistakes, and maintains institutional knowledge regardless of personnel changes."