Performance and Chaos Engineering
A modern QA architect does not just verify correctness; they also verify that systems perform under pressure and recover gracefully from failure. This chapter covers AI-driven load profiling, modern load testing tooling, chaos engineering principles, performance budgets in CI, and the unique challenges of performance-testing LLM-powered features.
Chapter Contents
1. Load Testing — 01-load-testing/
- AI-Driven Profiling — Replace guesswork with production-derived traffic models
- k6 and Locust — Hands-on scripting with two of the most widely used open-source load testing tools
- Tool Comparison — Decision matrix for choosing the right load testing tool
2. Chaos Engineering — 02-chaos-engineering/
- Principles and Cycle — The scientific method applied to system resilience
- Litmus Experiments — Writing and running chaos experiments on Kubernetes
- Chaos Tools — Chaos Monkey, Litmus, Gremlin, Chaos Mesh, and more
3. Performance Budgets — 03-performance-budgets/
- Lighthouse CI — Enforcing frontend performance budgets automatically
- Web Vitals in CI — GitHub Actions workflows for performance gates
4. LLM Performance — 04-llm-performance/
- LLM Metrics — Time to first token (TTFT), tokens per second, cold start latency, rate limit headroom
- Load Testing LLM Endpoints — k6 scripts for AI endpoint stress testing
5. SRE Skills — 05-sre-skills/
- SLO, SLI, Error Budgets — The SRE framework QA architects must master
- Game Days — Designing and running incident response exercises
6. Cloud-Native Performance — 06-cloud-native/
- Serverless Performance — Cold starts, concurrency limits, and testing strategies
- Kubernetes Scaling — Validating HPA behavior and architecture-specific testing
Why This Matters
Performance and resilience are two sides of the same coin. Performance testing tells you how fast the system runs under load; chaos engineering tells you what happens when that load arrives during a failure. Together, they give QA architects a complete picture of system quality under real-world conditions.
Core principle: If you cannot measure it in CI, it will regress in production.
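One way to make this principle concrete is a latency-budget gate that runs after a load test in CI. The sketch below is a minimal, hypothetical example: `percentile` and `budget_gate` are illustrative names, not part of k6, Locust, or Lighthouse CI, and it assumes latency samples in milliseconds have already been exported from the load-testing tool.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile; pct must be in (0, 100]."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank method: smallest observed value with at least pct% of samples at or below it.
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]


def budget_gate(samples_ms: list[float], budget_ms: float, pct: float = 95.0) -> int:
    """Hypothetical CI gate: return exit code 0 if the percentile is within budget, else 1."""
    observed = percentile(samples_ms, pct)
    within_budget = observed <= budget_ms
    verdict = "PASS" if within_budget else "FAIL"
    print(f"p{pct:g} = {observed:g} ms (budget {budget_ms:g} ms): {verdict}")
    return 0 if within_budget else 1
```

In a CI step, calling `sys.exit(budget_gate(latencies, budget_ms=300))` would fail the job whenever p95 latency exceeds 300 ms. The nearest-rank method is used here because it always returns a real observed sample; load-testing tools may interpolate percentiles, so their reported values can differ slightly. k6 can also enforce comparable limits natively through its `thresholds` option.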