Deployment Strategies
Where Tests Fit in Deployment
Deployment is the moment tests face reality. Different deployment strategies create different risk profiles, and each demands a different testing approach. Understanding these strategies helps QA engineers design tests that provide the right level of confidence at the right time.
Blue-Green Deployment
How It Works
Two identical environments exist: blue (current production) and green (new version). The new version is deployed to green. Once validated, traffic switches from blue to green. If anything goes wrong, switch back instantly.
            Load Balancer
              /        \
     Blue (v2.3)    Green (v2.4)
     [current]      [new, being tested]
                        ↑
              Run full test suite here
              before switching traffic
Where Tests Run
- Before the switch: Run your full browser test suite, integration tests, and performance smoke tests against the green environment
- After the switch: Run smoke tests against production to verify the switch was clean
- Rollback trigger: If post-switch smoke tests fail, switch back to blue immediately
# Example: Blue-green deployment with testing gates
deploy-to-green:
  runs-on: ubuntu-latest
  steps:
    - run: ./deploy.sh green

test-green:
  needs: deploy-to-green
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - run: npm ci
    - run: npx playwright test --project=regression
      env:
        BASE_URL: https://green.example.com

switch-traffic:
  needs: test-green
  runs-on: ubuntu-latest
  steps:
    - run: ./switch-traffic.sh blue-to-green

smoke-production:
  needs: switch-traffic
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - run: npm ci
    - run: npx playwright test --project=smoke
      env:
        BASE_URL: https://production.example.com
Risk Level: Low
Instant rollback by switching traffic back to blue. The full test suite runs against the new version before any real users see it.
QA Implications
- You need a complete, reliable test suite that can run against an isolated environment
- Tests must be environment-agnostic (configurable via BASE_URL)
- Test data in the green environment must match production-like conditions
- The test suite must finish in a reasonable time (blocking deployment for 2 hours is unacceptable)
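One way to keep tests environment-agnostic is to resolve the target URL from configuration in a single place, failing fast on a bad value instead of letting every test time out. A minimal sketch (the URLs and the localhost default are assumptions, not part of any framework):

```javascript
// Resolve the target environment from BASE_URL, with a local default.
// Validating up front means a typo fails immediately, not per-test.
function resolveBaseUrl(env = process.env) {
  const url = env.BASE_URL || 'http://localhost:3000';
  try {
    return new URL(url).origin; // normalizes trailing slashes, paths
  } catch {
    throw new Error(`Invalid BASE_URL: ${url}`);
  }
}

console.log(resolveBaseUrl({ BASE_URL: 'https://green.example.com/' }));
```

The same function then feeds the blue, green, or production URL into the test runner without any other code change.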
Canary Deployment
How It Works
The new version is deployed to a small subset of servers (the "canary"). A small percentage of real traffic (typically 1-5%) is routed to the canary. If metrics look good, traffic gradually increases until the canary becomes the new production.
               Load Balancer
               /     |     \
           v2.3    v2.3    v2.4 (canary)
          <-- 95% ------>   5% of traffic
                              ↑
                   Monitor metrics here:
                   - Error rate
                   - Latency p99
                   - Business metrics
Where Tests Run
- Before canary deployment: Run the standard test suite against a staging environment
- During canary: Rely on synthetic monitoring and real-time metrics rather than full test suites -- the canary is serving real traffic
- Promotion decision: Based on metric comparison between canary and baseline
# Example: Canary monitoring
canary-monitoring:
  runs-on: ubuntu-latest
  steps:
    - name: Wait for canary stabilization
      run: sleep 300  # 5 minutes for metrics to accumulate
    - name: Compare canary metrics
      run: |
        CANARY_ERROR_RATE=$(curl -s "$METRICS_API/canary/error-rate")
        BASELINE_ERROR_RATE=$(curl -s "$METRICS_API/baseline/error-rate")
        if (( $(echo "$CANARY_ERROR_RATE > $BASELINE_ERROR_RATE * 1.1" | bc -l) )); then
          echo "Canary error rate ($CANARY_ERROR_RATE) exceeds baseline ($BASELINE_ERROR_RATE) by >10%"
          exit 1
        fi
    - name: Run synthetic smoke tests
      run: npx playwright test --project=smoke
      env:
        BASE_URL: https://canary.example.com
Risk Level: Medium
Real users hit the new code early, but only a small percentage. If the canary is bad, only 5% of users are affected, and rollback is automatic.
QA Implications
- Synthetic monitoring tests must be lightweight and fast (they run against live traffic)
- Focus on metrics: error rates, latency percentiles, business conversion rates
- You need good observability (logging, metrics, tracing) to detect canary issues
- Test the rollback mechanism itself -- a canary that cannot roll back is useless
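The promotion decision in the pipeline above boils down to a small comparison. A sketch of that logic as a function (the 10% tolerance mirrors the shell example; the zero-baseline fallback is an assumption you would tune for your system):

```javascript
// Decide whether to promote a canary by comparing its error rate to the
// baseline, allowing a relative tolerance (default 10%, as in the YAML).
function shouldPromoteCanary(canaryErrorRate, baselineErrorRate, tolerance = 0.1) {
  // A relative comparison is meaningless against a zero baseline,
  // so fall back to a small absolute threshold instead.
  if (baselineErrorRate === 0) {
    return canaryErrorRate <= tolerance / 100; // e.g. 0.1% absolute
  }
  return canaryErrorRate <= baselineErrorRate * (1 + tolerance);
}

console.log(shouldPromoteCanary(0.011, 0.010)); // within 10% of baseline
console.log(shouldPromoteCanary(0.020, 0.010)); // double the baseline: block
```

Keeping the decision in one tested function also makes it easy to add more signals later (latency percentiles, conversion rates) without touching the pipeline YAML.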
Rolling Deployment
How It Works
Instances are updated one at a time (or in small batches) across the fleet. At any given moment during the rollout, some instances run the old version and some run the new version.
Start: [v2.3] [v2.3] [v2.3] [v2.3]
Step 1: [v2.4] [v2.3] [v2.3] [v2.3] ← Health check on v2.4
Step 2: [v2.4] [v2.4] [v2.3] [v2.3] ← Health check on v2.4
Step 3: [v2.4] [v2.4] [v2.4] [v2.3] ← Health check on v2.4
Step 4: [v2.4] [v2.4] [v2.4] [v2.4] ← Done
Where Tests Run
- Before deployment: Standard test suite against staging
- During rollout: Health checks on each updated instance before proceeding
- After full rollout: Smoke tests against production
Risk Level: Medium
Mixed versions run simultaneously during the rollout. This can cause issues if the old and new versions have incompatible database schemas or API contracts.
QA Implications
- Tests must account for backward compatibility during the rollout window
- Health check endpoints must be meaningful (not just returning 200)
- Database migrations must be backward-compatible (expand-contract pattern)
- API versioning becomes important -- can old clients talk to new servers and vice versa?
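A health check that unconditionally returns 200 only proves the process is running, not that it can serve traffic. One way to make it meaningful is to aggregate probes of real dependencies. A sketch (the probe names are hypothetical stubs; a real service would ping its actual database and downstream APIs):

```javascript
// Aggregate dependency probes into a single health verdict.
// Each probe is an async function returning true/false; a thrown
// error counts as unhealthy rather than crashing the endpoint.
async function healthCheck(probes) {
  const results = await Promise.all(
    Object.entries(probes).map(async ([name, probe]) => {
      try {
        return [name, await probe()];
      } catch {
        return [name, false];
      }
    })
  );
  const healthy = results.every(([, ok]) => ok);
  return { status: healthy ? 200 : 503, checks: Object.fromEntries(results) };
}

// Usage with stubbed probes: one healthy dependency, one failing.
healthCheck({
  database: async () => true,
  paymentsApi: async () => false,
}).then((r) => console.log(JSON.stringify(r)));
```

During a rolling deployment, a 503 from this check on a freshly updated instance is what stops the rollout before the next batch is touched.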
Strategy Comparison
| Strategy | How It Works | Where Tests Run | Risk Level | Rollback Speed | Best For |
|---|---|---|---|---|---|
| Blue-Green | Two environments; traffic switches | Full suite against green before switch | Low | Instant | Teams with reliable test suites |
| Canary | Small % of traffic to new version | Monitoring and synthetic tests | Medium | Fast (automatic) | High-traffic apps with good observability |
| Rolling | Instances update one at a time | Health checks per instance | Medium | Slow (roll forward/back) | Stateless services with large fleets |
Feature Flags as a Testing Strategy
Feature flags decouple deployment from release. You deploy the code but keep new features hidden behind flags, then enable them gradually.
// Feature flag check in application code
if (featureFlags.isEnabled('new-checkout-flow', { userId })) {
  return newCheckoutFlow();
} else {
  return legacyCheckoutFlow();
}
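To make the snippet above concrete, here is a minimal in-memory sketch of what a `featureFlags` object might look like, with a stable per-user percentage rollout. This is an illustration only, not how any particular flag service works; production systems typically use a dedicated flag service:

```javascript
// Minimal in-memory feature flag store. A flag's rule is either a
// boolean (globally on/off) or { percentage } for a gradual rollout.
function createFeatureFlags(config) {
  return {
    isEnabled(flag, { userId } = {}) {
      const rule = config[flag];
      if (!rule) return false;                 // unknown flags default to off
      if (typeof rule === 'boolean') return rule;
      // Hash flag + user id into a stable 0-99 bucket so the same user
      // always gets the same answer as the rollout percentage grows.
      const bucket = [...`${flag}:${userId}`]
        .reduce((h, c) => ((h * 31 + c.charCodeAt(0)) >>> 0), 0) % 100;
      return bucket < rule.percentage;
    },
  };
}

const featureFlags = createFeatureFlags({ 'new-checkout-flow': { percentage: 50 } });
console.log(featureFlags.isEnabled('new-checkout-flow', { userId: 'u123' }));
```

The stable hashing matters for testing: a user who saw the new checkout flow yesterday should still see it today, or your session-level tests will flake.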
QA implications of feature flags:
- Test both states of every flag (enabled and disabled)
- Test flag combinations if features interact
- Verify that disabling a flag cleanly reverts to the old behavior
- Clean up old flags -- stale flags create technical debt and increase test complexity
// Test both flag states
test('checkout flow with new feature enabled', async () => {
  await setFeatureFlag('new-checkout-flow', true);
  // ... test new behavior
});

test('checkout flow with new feature disabled', async () => {
  await setFeatureFlag('new-checkout-flow', false);
  // ... test legacy behavior
});
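When flags interact, the states to cover multiply quickly: n interacting flags mean 2^n combinations. A small helper like this (a sketch, not part of any flag SDK) enumerates them so a parameterized test can loop over each one:

```javascript
// Enumerate every on/off combination of a set of interacting flags,
// e.g. to drive one parameterized test run per combination.
function flagCombinations(flagNames) {
  return Array.from({ length: 2 ** flagNames.length }, (_, i) =>
    Object.fromEntries(
      flagNames.map((name, bit) => [name, Boolean((i >> bit) & 1)])
    )
  );
}

const combos = flagCombinations(['new-checkout-flow', 'express-shipping']);
console.log(combos.length); // 4 combinations for 2 flags
```

The exponential growth is also the argument for the cleanup bullet above: every stale flag you delete halves the combination space.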
Testing the Deployment Pipeline Itself
The deployment pipeline is software. It can have bugs. Test it.
What to verify:
- Rollback works correctly (trigger a rollback and verify the old version is restored)
- Health checks actually detect unhealthy instances (deploy a broken version and verify the pipeline stops)
- Smoke tests run against the correct environment (not accidentally testing staging when you think you are testing production)
- Notifications fire on failure (break the pipeline intentionally and verify Slack/email alerts)
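The rollback check in the first bullet can itself be automated: record the current version, deploy, roll back, and assert the reported version reverted. A sketch with a stubbed environment (the `/version` endpoint and the helper names are assumptions; in practice `fetchVersion` would hit your real deployment):

```javascript
// Verify that a rollback actually restores the previous version.
// deploy/rollback/fetchVersion are injected so the logic is testable
// against a stub before pointing it at a real environment.
async function verifyRollback({ deploy, rollback, fetchVersion }) {
  const before = await fetchVersion();
  await deploy();
  await rollback();
  const after = await fetchVersion();
  if (after !== before) {
    throw new Error(`Rollback failed: expected ${before}, got ${after}`);
  }
  return true;
}

// Usage with an in-memory stub standing in for real deploy scripts.
let current = 'v2.3';
verifyRollback({
  deploy: async () => { current = 'v2.4'; },
  rollback: async () => { current = 'v2.3'; },
  fetchVersion: async () => current,
}).then((ok) => console.log('rollback verified:', ok));
```

Running this periodically against a staging environment turns "we think rollback works" into a pipeline check that fails loudly when it stops being true.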
Hands-On Exercise
- Identify which deployment strategy your team uses. If you do not know, ask your DevOps/platform team.
- Map out where tests currently run in your deployment process. Are there gaps?
- Write a smoke test suite that can run against any environment (configurable via BASE_URL)
- If using canary deployments, identify the key metrics you should monitor during rollout
- Test your rollback procedure. Can you roll back a deployment in under 5 minutes?
- If your team uses feature flags, write tests that cover both flag states for a current feature