
Canary Deployments

What Is a Canary Deployment?

A canary deployment routes a small percentage of production traffic to the new version while the old version serves the rest. Unlike feature flags (which toggle features at the application level), canary deployments operate at the infrastructure level -- users do not know they are hitting a different version.

The name comes from the "canary in a coal mine" -- a small group of users acts as an early warning system. If the canary version shows degraded metrics, traffic is shifted back to the stable version before most users are affected.
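The routing decision itself is simple; a minimal Python sketch of the weighted split a load balancer or service mesh applies (the 5% weight is illustrative):

```python
import random

def pick_version(canary_weight: float) -> str:
    """Route one request: 'canary' with probability canary_weight, else 'stable'.

    This mimics the weighted split applied at the infrastructure level;
    the application never sees the decision.
    """
    return "canary" if random.random() < canary_weight else "stable"

# Rolling back is just setting the weight to 0 -- which is why canary
# rollbacks take seconds rather than a redeploy.
canary_weight = 0.05  # 5% of traffic to the new version
counts = {"stable": 0, "canary": 0}
for _ in range(10_000):
    counts[pick_version(canary_weight)] += 1
print(counts)
```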


Deployment Strategy Comparison

Strategy     | Traffic Split           | Rollback Speed           | Infrastructure Cost           | Observability Need
Canary       | 1-10% new, rest old     | Seconds (shift traffic)  | Low (few new pods)            | High (compare metrics)
Blue-Green   | 100% switch             | Seconds (DNS/LB switch)  | High (2x infrastructure)      | Medium
Rolling      | Gradual pod replacement | Minutes (scale down new) | Low                           | Medium
Shadow/Dark  | 0% user-facing (mirror) | N/A (no user impact)     | Medium (duplicate processing) | High

When to Use Each

  • Canary: Default choice for critical services where you want statistical validation before full rollout
  • Blue-Green: When you need instant, complete rollback capability (e.g., database schema changes)
  • Rolling: For non-critical services where gradual replacement is sufficient
  • Shadow: For validating a complete rewrite against production traffic without user impact

Canary Analysis with Kayenta (Spinnaker)

Kayenta is Netflix's automated canary analysis tool, integrated with Spinnaker. It compares metrics between the canary and baseline versions and produces a statistical judgment.

{
  "canaryConfig": {
    "name": "checkout-service-canary",
    "judge": {
      "judgeConfigurations": {},
      "name": "NetflixACAJudge-v1.0"
    },
    "metrics": [
      {
        "name": "error_rate",
        "groups": ["Errors"],
        "query": {
          "type": "prometheus",
          "customInlineTemplate": "sum(rate(http_requests_total{status=~\"5..\",app=\"checkout\",version=\"${scope}\"}[5m])) / sum(rate(http_requests_total{app=\"checkout\",version=\"${scope}\"}[5m]))"
        },
        "analysisConfigurations": {
          "canary": {
            "direction": "increase",
            "critical": true,
            "mustHaveData": true
          }
        },
        "scopeName": "default"
      },
      {
        "name": "latency_p99",
        "groups": ["Latency"],
        "query": {
          "type": "prometheus",
          "customInlineTemplate": "histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket{app=\"checkout\",version=\"${scope}\"}[5m])) by (le))"
        },
        "analysisConfigurations": {
          "canary": {
            "direction": "increase",
            "critical": true
          }
        },
        "scopeName": "default"
      },
      {
        "name": "saturation_cpu",
        "groups": ["Saturation"],
        "query": {
          "type": "prometheus",
          "customInlineTemplate": "avg(rate(container_cpu_usage_seconds_total{app=\"checkout\",version=\"${scope}\"}[5m]))"
        },
        "analysisConfigurations": {
          "canary": {
            "direction": "increase",
            "critical": false
          }
        },
        "scopeName": "default"
      }
    ],
    "classifier": {
      "groupWeights": {
        "Errors": 40,
        "Latency": 35,
        "Saturation": 25
      }
    }
  }
}

How Kayenta Scores Canaries

  1. Collect metrics from both canary and baseline for the analysis window
  2. Compare distributions using the Mann-Whitney U test (non-parametric)
  3. Score each metric as Pass, Marginal, or Fail
  4. Apply group weights (Errors 40%, Latency 35%, Saturation 25%)
  5. Produce a final score (0-100). Typically, >70 = promote, <50 = rollback, 50-70 = extend observation
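The comparison in step 2 and the weighting in steps 4-5 can be sketched in pure Python. This illustrates the statistical idea, not Kayenta's actual implementation; the shift cutoffs for Pass/Marginal/Fail and the Pass=1.0/Marginal=0.5/Fail=0.0 mapping are illustrative assumptions:

```python
def mann_whitney_u(baseline, canary):
    """Mann-Whitney U statistic for the canary sample (ties get average ranks)."""
    combined = sorted((value, idx) for idx, value in enumerate(baseline + canary))
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1  # extend over a tie group
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[combined[k][1]] = avg_rank
        i = j + 1
    n_canary = len(canary)
    rank_sum_canary = sum(ranks[len(baseline):])
    return rank_sum_canary - n_canary * (n_canary + 1) / 2

def classify(baseline, canary, direction="increase"):
    """Pass/Marginal/Fail from how far U drifts from its null mean (n1*n2/2)."""
    n1, n2 = len(baseline), len(canary)
    shift = (mann_whitney_u(baseline, canary) - n1 * n2 / 2) / (n1 * n2)
    if direction == "increase":  # bigger canary values are bad for this metric
        if shift > 0.25:
            return "Fail"
        if shift > 0.10:
            return "Marginal"
    return "Pass"

def final_score(results, weights):
    """Weighted 0-100 score: Pass=1.0, Marginal=0.5, Fail=0.0 per metric group."""
    value = {"Pass": 1.0, "Marginal": 0.5, "Fail": 0.0}
    weighted = sum(weights[g] * value[r] for g, r in results.items())
    return 100 * weighted / sum(weights.values())

# Example: latency regressed slightly, everything else clean.
score = final_score(
    {"Errors": "Pass", "Latency": "Marginal", "Saturation": "Pass"},
    {"Errors": 40, "Latency": 35, "Saturation": 25},
)
print(score)  # 82.5 -> above 70, so promote
```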

Canary with Argo Rollouts (Kubernetes-Native)

For teams not using Spinnaker, Argo Rollouts provides Kubernetes-native canary deployments:

# argo-canary-rollout.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: checkout-service
spec:
  replicas: 10
  strategy:
    canary:
      canaryService: checkout-canary
      stableService: checkout-stable
      trafficRouting:
        istio:
          virtualService:
            name: checkout-vsvc
            routes:
              - primary
      steps:
        - setWeight: 5       # 5% to canary
        - pause: { duration: 5m }
        - analysis:
            templates:
              - templateName: canary-analysis
            args:
              - name: canary-hash
                valueFrom:
                  podTemplateHashValue: Latest
        - setWeight: 25      # 25% to canary
        - pause: { duration: 10m }
        - analysis:
            templates:
              - templateName: canary-analysis
            args:
              - name: canary-hash
                valueFrom:
                  podTemplateHashValue: Latest
        - setWeight: 50      # 50% to canary
        - pause: { duration: 10m }
        - analysis:
            templates:
              - templateName: canary-analysis
            args:
              - name: canary-hash
                valueFrom:
                  podTemplateHashValue: Latest
        - setWeight: 100     # promote canary to stable

---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: canary-analysis
spec:
  args:
    - name: canary-hash
  metrics:
    - name: error-rate
      interval: 1m
      successCondition: result[0] < 0.01
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090
          # The rollouts-pod-template-hash pod label reaches Prometheus with
          # hyphens rewritten to underscores (label names cannot contain "-").
          query: |
            sum(rate(http_requests_total{status=~"5..",app="checkout",
              rollouts_pod_template_hash="{{args.canary-hash}}"}[5m]))
            /
            sum(rate(http_requests_total{app="checkout",
              rollouts_pod_template_hash="{{args.canary-hash}}"}[5m]))
    - name: latency-p99
      interval: 1m
      successCondition: result[0] < 0.5
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090
          query: |
            histogram_quantile(0.99, sum(rate(
              http_request_duration_seconds_bucket{app="checkout",
                rollouts_pod_template_hash="{{args.canary-hash}}"}[5m])) by (le))

Key Decisions for Canary Deployments

How Much Traffic to Send to Canary?

Traffic Percentage | Use Case                                                            | Risk Level
1%                 | High-risk changes (payment, auth)                                   | Very low
5%                 | Standard feature releases                                           | Low
10%                | Low-risk changes with high confidence                               | Low
25%                | Changes that need more traffic volume for statistical significance | Medium

How Long to Observe?

The observation window depends on traffic volume and the statistical significance required:

  • High traffic services (>1000 rps): 10-15 minutes provides enough data points
  • Medium traffic (100-1000 rps): 30-60 minutes
  • Low traffic (<100 rps): 2-6 hours (consider synthetic traffic augmentation)
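These windows fall out of simple arithmetic: to judge an error-rate change, the canary must see enough requests for errors to show up at all. A sketch, where the target of 100 error events and the 0.5% baseline error rate are illustrative assumptions:

```python
def observation_hours(total_rps: float, canary_fraction: float,
                      baseline_error_rate: float = 0.005,
                      target_error_events: int = 100) -> float:
    """Hours the canary must run to observe `target_error_events` errors.

    total_rps:           requests/second across the whole service
    canary_fraction:     share of traffic routed to the canary (e.g. 0.05)
    baseline_error_rate: expected error rate when nothing is wrong
    """
    canary_rps = total_rps * canary_fraction
    expected_errors_per_second = canary_rps * baseline_error_rate
    return target_error_events / expected_errors_per_second / 3600

# High-traffic service: 2000 rps with a 5% canary -> minutes, not hours.
print(round(observation_hours(2000, 0.05) * 60, 1))  # ~3.3 minutes
# Low-traffic service: 100 rps with a 1% canary -> hours.
print(round(observation_hours(100, 0.01), 1))  # ~5.6 hours
```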

What Metrics to Compare?

At minimum, compare these between canary and baseline:

  1. Error rate (critical -- always include)
  2. Latency percentiles (p50, p95, p99)
  3. Saturation metrics (CPU, memory per pod)
  4. Business metrics (conversion rate, revenue per request -- if available in real-time)
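A minimal automated gate over such a metric set might look like the following sketch; the metric names, the 10% degradation budget, and treating conversion rate as higher-is-better are illustrative assumptions, not a recommendation:

```python
# Metrics where a higher canary value is good rather than bad.
HIGHER_IS_BETTER = {"conversion_rate"}

def degradation(name: str, baseline: float, canary: float) -> float:
    """Relative degradation of the canary vs. the baseline (positive = worse)."""
    if baseline == 0:
        return 0.0 if canary == 0 else float("inf")
    delta = (canary - baseline) / baseline
    return -delta if name in HIGHER_IS_BETTER else delta

def canary_verdict(baseline: dict, canary: dict, budget: float = 0.10) -> str:
    """'rollback' if any metric degrades past the budget, else 'promote'."""
    for name in baseline:
        if degradation(name, baseline[name], canary[name]) > budget:
            return "rollback"
    return "promote"

baseline = {"error_rate": 0.010, "latency_p99": 0.40, "conversion_rate": 0.050}
healthy  = {"error_rate": 0.010, "latency_p99": 0.41, "conversion_rate": 0.050}
broken   = {"error_rate": 0.010, "latency_p99": 0.40, "conversion_rate": 0.040}
print(canary_verdict(baseline, healthy))  # promote
print(canary_verdict(baseline, broken))   # rollback (conversion dropped 20%)
```

Note that the business metric fails the canary here even though every technical metric is clean: "technically fast but functionally broken" is exactly the case averages-only gates miss.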

Common Canary Pitfalls

Pitfall                   | Problem                                                  | Solution
Insufficient traffic      | Cannot reach statistical significance                    | Increase canary percentage or observation window
Only checking averages    | Masks tail latency regressions                           | Compare p95 and p99, not just the mean
No automatic rollback     | Human delay allows more users to be affected             | Configure automatic rollback on metric thresholds
Ignoring business metrics | Technically fast but functionally broken                 | Include conversion rate and error count in canary analysis
Same-version canary       | Canary always passes because it is identical to baseline | Verify the canary is actually running the new version
Cache warming effects     | Canary starts slow due to cold caches                    | Allow a warm-up period before starting metric comparison

Canary Deployment Checklist

Before enabling canary deployments:

  • Metrics pipeline can differentiate traffic by version (labels, headers, or pod identity)
  • Automated rollback is configured (not just manual intervention)
  • Analysis window is long enough for your traffic volume
  • Both canary and baseline are monitored by the same dashboards
  • Alert routing accounts for canary failures (do not page for expected experiments)
  • The team understands that a rollback is a success, not a failure -- you caught a problem before it reached all users