Deployment Strategies
Where Tests Fit in Deployment
Deployment is the moment tests face reality. Different deployment strategies create different risk profiles, and each demands a different testing approach. Understanding these strategies helps QA engineers design tests that provide the right level of confidence at the right time.
Blue-Green Deployment
How It Works
Two identical environments exist: blue (current production) and green (new version). The new version is deployed to green. Once validated, traffic switches from blue to green. If anything goes wrong, switch back instantly.
            Load Balancer
              /        \
     Blue (v2.3)    Green (v2.4)
     [current]      [new, being tested]
                        ↑
              Run full test suite here
              before switching traffic
Where Tests Run
- Before the switch: Run your full browser test suite, integration tests, and performance smoke tests against the green environment
- After the switch: Run smoke tests against production to verify the switch was clean
- Rollback trigger: If post-switch smoke tests fail, switch back to blue immediately
# Example: Blue-green deployment with testing gates
deploy-to-green:
  runs-on: ubuntu-latest
  steps:
    - run: ./deploy.sh green

test-green:
  needs: deploy-to-green
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - run: npm ci
    - run: npx playwright test --project=regression
      env:
        BASE_URL: https://green.example.com

switch-traffic:
  needs: test-green
  runs-on: ubuntu-latest
  steps:
    - run: ./switch-traffic.sh blue-to-green

smoke-production:
  needs: switch-traffic
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - run: npm ci
    - run: npx playwright test --project=smoke
      env:
        BASE_URL: https://production.example.com
Risk Level: Low
Instant rollback by switching traffic back to blue. The full test suite runs against the new version before any real users see it.
QA Implications
- You need a complete, reliable test suite that can run against an isolated environment
- Tests must be environment-agnostic (configurable via BASE_URL)
- Test data in the green environment must match production-like conditions
- The test suite must finish in a reasonable time (blocking deployment for 2 hours is unacceptable)
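One way to keep tests environment-agnostic is to resolve the target URL from configuration in a single place, failing fast on a bad value instead of letting every test time out. A minimal sketch (the URLs and the localhost default are assumptions, not part of any framework):

```javascript
// Resolve the target environment from BASE_URL, with a local default.
// Validating up front means a typo fails immediately, not per-test.
function resolveBaseUrl(env = process.env) {
  const url = env.BASE_URL || 'http://localhost:3000';
  try {
    return new URL(url).origin; // normalizes trailing slashes, paths
  } catch {
    throw new Error(`Invalid BASE_URL: ${url}`);
  }
}

console.log(resolveBaseUrl({ BASE_URL: 'https://green.example.com/' }));
```

The same function then feeds the blue, green, or production URL into the test runner without any other code change.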
Canary Deployment
How It Works
The new version is deployed to a small subset of servers (the "canary"). A small percentage of real traffic (typically 1-5%) is routed to the canary. If metrics look good, traffic gradually increases until the canary becomes the new production.
               Load Balancer
               /     |     \
           v2.3    v2.3    v2.4 (canary)
          <-- 95% ------>   5% of traffic
                              ↑
                   Monitor metrics here:
                   - Error rate
                   - Latency p99
                   - Business metrics
Where Tests Run
- Before canary deployment: Run the standard test suite against a staging environment
- During canary: Rely on synthetic monitoring and real-time metrics rather than full test suites -- the canary is serving real traffic
- Promotion decision: Based on metric comparison between canary and baseline
# Example: Canary monitoring
canary-monitoring:
  runs-on: ubuntu-latest
  steps:
    - name: Wait for canary stabilization
      run: sleep 300  # 5 minutes for metrics to accumulate
    - name: Compare canary metrics
      run: |
        CANARY_ERROR_RATE=$(curl -s "$METRICS_API/canary/error-rate")
        BASELINE_ERROR_RATE=$(curl -s "$METRICS_API/baseline/error-rate")
        if (( $(echo "$CANARY_ERROR_RATE > $BASELINE_ERROR_RATE * 1.1" | bc -l) )); then
          echo "Canary error rate ($CANARY_ERROR_RATE) exceeds baseline ($BASELINE_ERROR_RATE) by >10%"
          exit 1
        fi
    - name: Run synthetic smoke tests
      run: npx playwright test --project=smoke
      env:
        BASE_URL: https://canary.example.com
Risk Level: Medium
Real users hit the new code early, but only a small percentage. If the canary is bad, only 5% of users are affected, and rollback is automatic.
QA Implications
- Synthetic monitoring tests must be lightweight and fast (they run against live traffic)
- Focus on metrics: error rates, latency percentiles, business conversion rates
- You need good observability (logging, metrics, tracing) to detect canary issues
- Test the rollback mechanism itself -- a canary that cannot roll back is useless
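The promotion decision in the pipeline above boils down to a small comparison. A sketch of that logic as a function (the 10% tolerance mirrors the shell example; the zero-baseline fallback is an assumption you would tune for your system):

```javascript
// Decide whether to promote a canary by comparing its error rate to the
// baseline, allowing a relative tolerance (default 10%, as in the YAML).
function shouldPromoteCanary(canaryErrorRate, baselineErrorRate, tolerance = 0.1) {
  // A relative comparison is meaningless against a zero baseline,
  // so fall back to a small absolute threshold instead.
  if (baselineErrorRate === 0) {
    return canaryErrorRate <= tolerance / 100; // e.g. 0.1% absolute
  }
  return canaryErrorRate <= baselineErrorRate * (1 + tolerance);
}

console.log(shouldPromoteCanary(0.011, 0.010)); // within 10% of baseline
console.log(shouldPromoteCanary(0.020, 0.010)); // double the baseline: block
```

Keeping the decision in one tested function also makes it easy to add more signals later (latency percentiles, conversion rates) without touching the pipeline YAML.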
Rolling Deployment
How It Works
Instances are updated one at a time (or in small batches) across the fleet. At any given moment during the rollout, some instances run the old version and some run the new version.
Start: [v2.3] [v2.3] [v2.3] [v2.3]
Step 1: [v2.4] [v2.3] [v2.3] [v2.3] ← Health check on v2.4
Step 2: [v2.4] [v2.4] [v2.3] [v2.3] ← Health check on v2.4
Step 3: [v2.4] [v2.4] [v2.4] [v2.3] ← Health check on v2.4
Step 4: [v2.4] [v2.4] [v2.4] [v2.4] ← Done
Where Tests Run
- Before deployment: Standard test suite against staging
- During rollout: Health checks on each updated instance before proceeding
- After full rollout: Smoke tests against production
Risk Level: Medium
Mixed versions run simultaneously during the rollout. This can cause issues if the old and new versions have incompatible database schemas or API contracts.
QA Implications
- Tests must account for backward compatibility during the rollout window
- Health check endpoints must be meaningful (not just returning 200)
- Database migrations must be backward-compatible (expand-contract pattern)
- API versioning becomes important -- can old clients talk to new servers and vice versa?
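A health check that unconditionally returns 200 only proves the process is running, not that it can serve traffic. One way to make it meaningful is to aggregate probes of real dependencies. A sketch (the probe names are hypothetical stubs; a real service would ping its actual database and downstream APIs):

```javascript
// Aggregate dependency probes into a single health verdict.
// Each probe is an async function returning true/false; a thrown
// error counts as unhealthy rather than crashing the endpoint.
async function healthCheck(probes) {
  const results = await Promise.all(
    Object.entries(probes).map(async ([name, probe]) => {
      try {
        return [name, await probe()];
      } catch {
        return [name, false];
      }
    })
  );
  const healthy = results.every(([, ok]) => ok);
  return { status: healthy ? 200 : 503, checks: Object.fromEntries(results) };
}

// Usage with stubbed probes: one healthy dependency, one failing.
healthCheck({
  database: async () => true,
  paymentsApi: async () => false,
}).then((r) => console.log(JSON.stringify(r)));
```

During a rolling deployment, a 503 from this check on a freshly updated instance is what stops the rollout before the next batch is touched.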
Strategy Comparison
| Strategy | How It Works | Where Tests Run | Risk Level | Rollback Speed | Best For |
|---|---|---|---|---|---|
| Blue-Green | Two environments; traffic switches | Full suite against green before switch | Low | Instant | Teams with reliable test suites |
| Canary | Small % of traffic to new version | Monitoring and synthetic tests | Medium | Fast (automatic) | High-traffic apps with good observability |
| Rolling | Instances update one at a time | Health checks per instance | Medium | Slow (roll forward/back) | Stateless services with large fleets |
Feature Flags as a Testing Strategy
Feature flags decouple deployment from release. You deploy the code but keep new features hidden behind flags, then enable them gradually.
// Feature flag check in application code
if (featureFlags.isEnabled('new-checkout-flow', { userId })) {
  return newCheckoutFlow();
} else {
  return legacyCheckoutFlow();
}
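To make the snippet above concrete, here is a minimal in-memory sketch of what a `featureFlags` object might look like, with a stable per-user percentage rollout. This is an illustration only, not how any particular flag service works; production systems typically use a dedicated flag service:

```javascript
// Minimal in-memory feature flag store. A flag's rule is either a
// boolean (globally on/off) or { percentage } for a gradual rollout.
function createFeatureFlags(config) {
  return {
    isEnabled(flag, { userId } = {}) {
      const rule = config[flag];
      if (!rule) return false;                 // unknown flags default to off
      if (typeof rule === 'boolean') return rule;
      // Hash flag + user id into a stable 0-99 bucket so the same user
      // always gets the same answer as the rollout percentage grows.
      const bucket = [...`${flag}:${userId}`]
        .reduce((h, c) => ((h * 31 + c.charCodeAt(0)) >>> 0), 0) % 100;
      return bucket < rule.percentage;
    },
  };
}

const featureFlags = createFeatureFlags({ 'new-checkout-flow': { percentage: 50 } });
console.log(featureFlags.isEnabled('new-checkout-flow', { userId: 'u123' }));
```

The stable hashing matters for testing: a user who saw the new checkout flow yesterday should still see it today, or your session-level tests will flake.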
QA implications of feature flags:
- Test both states of every flag (enabled and disabled)
- Test flag combinations if features interact
- Verify that disabling a flag cleanly reverts to the old behavior
- Clean up old flags -- stale flags create technical debt and increase test complexity
// Test both flag states
test('checkout flow with new feature enabled', async () => {
  await setFeatureFlag('new-checkout-flow', true);
  // ... test new behavior
});

test('checkout flow with new feature disabled', async () => {
  await setFeatureFlag('new-checkout-flow', false);
  // ... test legacy behavior
});
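When flags interact, the states to cover multiply quickly: n interacting flags mean 2^n combinations. A small helper like this (a sketch, not part of any flag SDK) enumerates them so a parameterized test can loop over each one:

```javascript
// Enumerate every on/off combination of a set of interacting flags,
// e.g. to drive one parameterized test run per combination.
function flagCombinations(flagNames) {
  return Array.from({ length: 2 ** flagNames.length }, (_, i) =>
    Object.fromEntries(
      flagNames.map((name, bit) => [name, Boolean((i >> bit) & 1)])
    )
  );
}

const combos = flagCombinations(['new-checkout-flow', 'express-shipping']);
console.log(combos.length); // 4 combinations for 2 flags
```

The exponential growth is also the argument for the cleanup bullet above: every stale flag you delete halves the combination space.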
Testing the Deployment Pipeline Itself
The deployment pipeline is software. It can have bugs. Test it.
What to verify:
- Rollback works correctly (trigger a rollback and verify the old version is restored)
- Health checks actually detect unhealthy instances (deploy a broken version and verify the pipeline stops)
- Smoke tests run against the correct environment (not accidentally testing staging when you think you are testing production)
- Notifications fire on failure (break the pipeline intentionally and verify Slack/email alerts)
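The rollback check in the first bullet can itself be automated: record the current version, deploy, roll back, and assert the reported version reverted. A sketch with a stubbed environment (the `/version` endpoint and the helper names are assumptions; in practice `fetchVersion` would hit your real deployment):

```javascript
// Verify that a rollback actually restores the previous version.
// deploy/rollback/fetchVersion are injected so the logic is testable
// against a stub before pointing it at a real environment.
async function verifyRollback({ deploy, rollback, fetchVersion }) {
  const before = await fetchVersion();
  await deploy();
  await rollback();
  const after = await fetchVersion();
  if (after !== before) {
    throw new Error(`Rollback failed: expected ${before}, got ${after}`);
  }
  return true;
}

// Usage with an in-memory stub standing in for real deploy scripts.
let current = 'v2.3';
verifyRollback({
  deploy: async () => { current = 'v2.4'; },
  rollback: async () => { current = 'v2.3'; },
  fetchVersion: async () => current,
}).then((ok) => console.log('rollback verified:', ok));
```

Running this periodically against a staging environment turns "we think rollback works" into a pipeline check that fails loudly when it stops being true.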
Hands-On Exercise
- Identify which deployment strategy your team uses. If you do not know, ask your DevOps/platform team.
- Map out where tests currently run in your deployment process. Are there gaps?
- Write a smoke test suite that can run against any environment (configurable via BASE_URL)
- If using canary deployments, identify the key metrics you should monitor during rollout
- Test your rollback procedure. Can you roll back a deployment in under 5 minutes?
- If your team uses feature flags, write tests that cover both flag states for a current feature