# Synthetic Monitoring

## What Is Synthetic Monitoring?
Synthetic monitoring means running automated tests continuously against production -- not just during deployments, but 24/7. These are not load tests; they are lightweight probes that verify critical user journeys remain functional, detecting issues between deployments that no CI pipeline would catch.
Examples of what synthetic monitoring catches:
- Third-party service degradation (payment provider down)
- Certificate expiration
- CDN misconfigurations
- DNS resolution failures
- Regional connectivity issues
- Database connection pool exhaustion
- Slow memory leaks that accumulate over hours
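All of the failure modes above can be surfaced by a lightweight probe. As a minimal sketch (the function names and latency budget are illustrative, and `fetch`/`AbortSignal.timeout` assume Node 18+), a probe makes one timed request and classifies the outcome:

```typescript
// Minimal shape of a synthetic probe: one timed request, classified
// against a latency budget. Names are illustrative, not a real library.
type ProbeStatus = 'pass' | 'degraded' | 'fail';

// Pure classification logic, so it can be unit-tested without a network.
function classify(httpStatus: number, latencyMs: number, budgetMs: number): ProbeStatus {
  if (httpStatus < 200 || httpStatus >= 400) return 'fail'; // hard failure
  if (latencyMs > budgetMs) return 'degraded';              // works, but too slow
  return 'pass';
}

// One probe run against a URL (Node 18+ global fetch assumed).
async function probeOnce(url: string, budgetMs: number): Promise<ProbeStatus> {
  const start = Date.now();
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(budgetMs * 2) });
    return classify(res.status, Date.now() - start, budgetMs);
  } catch {
    return 'fail'; // timeout, DNS failure, TLS error, connection refused
  }
}
```

Keeping the classification pure makes the probe trivial to test and to reuse across journeys.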
## Synthetic Monitoring Architecture
```
                        +-------------------+
                        |  Scheduler (cron) |  -- Every 5 minutes from 3 regions
                        +---------+---------+
                                  |
          +-----------------------+-----------------------+
          v                       v                       v
+-------------------+   +-------------------+   +--------------------+
|  us-east-1 runner |   |  eu-west-1 runner |   |  ap-south-1 runner |
|  - Login flow     |   |  - Login flow     |   |  - Login flow      |
|  - Search flow    |   |  - Search flow    |   |  - Search flow     |
|  - Checkout flow  |   |  - Checkout flow  |   |  - Checkout flow   |
+---------+---------+   +---------+---------+   +----------+---------+
          |                       |                        |
          +-----------------------+------------------------+
                                  |
                                  v
                      +----------------------+
                      |  Metrics / Alerting  |  <-- Grafana, Datadog, PagerDuty
                      |  - Pass/fail status  |
                      |  - Response times    |
                      |  - Screenshot diffs  |
                      +----------------------+
```
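The fan-in at the bottom of the diagram is ultimately an aggregation policy. A sketch (names are illustrative) that distinguishes a global outage from a regional one:

```typescript
// Per-region result for one journey, as reported by each runner.
type RegionResult = { region: string; passed: boolean };

// Aggregate per-region probe results into a single alert decision.
function aggregate(results: RegionResult[]): 'ok' | 'regional-outage' | 'global-outage' {
  const failed = results.filter((r) => !r.passed);
  if (failed.length === 0) return 'ok';
  if (failed.length === results.length) return 'global-outage';
  return 'regional-outage'; // e.g. passing in us-east-1, failing in ap-south-1
}
```

A regional outage often warrants a different severity (and a different on-call rotation) than a global one, which is why the aggregation step is worth making explicit.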
## Playwright Synthetic Monitor Script
```typescript
// synthetic/checkout-flow.spec.ts
// Runs every 5 minutes against production
import { test, expect } from '@playwright/test';

test.describe('Production Checkout Flow', () => {
  test.setTimeout(30_000); // 30s hard timeout for synthetic tests

  test('complete purchase of a test product', async ({ page }) => {
    // Step 1: Navigate and verify homepage
    const startTime = Date.now();
    await page.goto('https://store.example.com');
    await expect(page.locator('h1')).toContainText('Welcome');
    const homepageLoadTime = Date.now() - startTime;
    console.log(`METRIC homepage_load_ms=${homepageLoadTime}`);

    // Step 2: Search for the dedicated test product
    await page.fill('[data-testid="search-input"]', 'synthetic-test-product');
    await page.click('[data-testid="search-button"]');
    await expect(page.locator('[data-testid="search-results"]')).toBeVisible();

    // Step 3: Add to cart
    await page.click(
      '[data-testid="product-card"]:first-child [data-testid="add-to-cart"]'
    );
    await expect(page.locator('[data-testid="cart-count"]')).toHaveText('1');

    // Step 4: Begin checkout, using a test payment method that never charges
    await page.click('[data-testid="cart-icon"]');
    await page.click('[data-testid="checkout-button"]');
    await page.fill('[data-testid="card-number"]', '4242424242424242');
    await page.fill('[data-testid="card-expiry"]', '12/28');
    await page.fill('[data-testid="card-cvc"]', '123');
    await page.click('[data-testid="place-order"]');

    // Step 5: Verify order confirmation
    await expect(page.locator('[data-testid="order-confirmation"]')).toBeVisible({
      timeout: 15_000,
    });
    await expect(page.locator('[data-testid="order-id"]')).toBeVisible();

    const totalFlowTime = Date.now() - startTime;
    console.log(`METRIC checkout_flow_total_ms=${totalFlowTime}`);

    // Assert the performance budget
    expect(totalFlowTime).toBeLessThan(15_000);
  });
});
```
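The spec above emits `METRIC name=value` lines on stdout. A small, hypothetical helper can turn captured runner output into structured points for whatever metrics backend you forward them to:

```typescript
// Parse the METRIC lines logged by the synthetic spec into structured
// data points. The line format (METRIC name=value) matches the
// console.log calls in the checkout spec above.
interface MetricPoint {
  name: string;
  value: number;
}

function parseMetricLines(stdout: string): MetricPoint[] {
  const points: MetricPoint[] = [];
  for (const line of stdout.split('\n')) {
    // Matches lines like: METRIC checkout_flow_total_ms=4213
    const m = line.match(/^METRIC\s+(\w+)=(\d+(?:\.\d+)?)$/);
    if (m) points.push({ name: m[1], value: Number(m[2]) });
  }
  return points;
}
```

Non-matching lines (Playwright's own reporter output, stack traces) are simply skipped, so the parser can run over raw captured stdout.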
## Running Synthetics as a Cron Job in CI
```yaml
# .github/workflows/synthetic-monitor.yml
name: Synthetic Monitoring

on:
  schedule:
    - cron: '*/5 * * * *' # every 5 minutes
  workflow_dispatch: {} # allow manual trigger

jobs:
  synthetic-us-east:
    runs-on: ubuntu-latest # GitHub-hosted runner (US East)
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - name: Install Playwright browsers
        run: npx playwright install --with-deps chromium
      - name: Run synthetic tests
        run: npx playwright test synthetic/ --reporter=json
        env:
          BASE_URL: https://store.example.com
          SYNTHETIC_MODE: true
      - name: Push metrics to Datadog
        if: always()
        run: |
          python scripts/push_synthetic_metrics.py \
            --results playwright-report/results.json \
            --region us-east-1 \
            --datadog-api-key ${{ secrets.DATADOG_API_KEY }}
      - name: Alert on failure
        if: failure()
        run: |
          curl -X POST "${{ secrets.PAGERDUTY_EVENTS_URL }}" \
            -H "Content-Type: application/json" \
            -d '{
              "routing_key": "${{ secrets.PD_ROUTING_KEY }}",
              "event_action": "trigger",
              "payload": {
                "summary": "Synthetic checkout flow FAILED in us-east-1",
                "severity": "critical",
                "source": "github-actions-synthetic"
              }
            }'
```
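The workflow delegates metric shipping to `scripts/push_synthetic_metrics.py` (not shown here). As an illustration only, here is a TypeScript sketch of the kind of payload such a script assembles, shaped like the series body Datadog's v2 metrics API expects; the metric names and tags are assumptions:

```typescript
// Sketch of a metrics payload for a synthetic run. Shaped like the
// Datadog v2 series body; metric names and tags are illustrative.
interface SeriesPoint { timestamp: number; value: number }
interface Series { metric: string; type: number; points: SeriesPoint[]; tags: string[] }

function buildSeries(
  region: string,
  passed: boolean,
  durationMs: number,
  nowSec: number
): { series: Series[] } {
  const tags = [`region:${region}`, 'source:synthetic'];
  return {
    series: [
      // type 3 = gauge in the v2 metric intake enum
      { metric: 'synthetic.checkout.success', type: 3, points: [{ timestamp: nowSec, value: passed ? 1 : 0 }], tags },
      { metric: 'synthetic.checkout.duration_ms', type: 3, points: [{ timestamp: nowSec, value: durationMs }], tags },
    ],
  };
}
```

Emitting success as a 0/1 gauge makes the availability percentage a simple average over any window in the dashboard.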
## Designing Effective Synthetic Tests

### What to Monitor
| Journey | Priority | Frequency | Timeout |
|---|---|---|---|
| Homepage load | Critical | Every 1 min | 10s |
| User login | Critical | Every 2 min | 15s |
| Product search | High | Every 5 min | 15s |
| Add to cart | High | Every 5 min | 15s |
| Checkout flow | Critical | Every 5 min | 30s |
| API health endpoints | Critical | Every 30s | 5s |
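The schedule above can live in code as a small config map that a DIY scheduler could consume; the journey keys and field names are illustrative:

```typescript
// The monitoring schedule from the table, as data. Keys are illustrative.
interface JourneyConfig {
  priority: 'critical' | 'high';
  intervalSec: number;
  timeoutMs: number;
}

const journeys: Record<string, JourneyConfig> = {
  'homepage-load':  { priority: 'critical', intervalSec: 60,  timeoutMs: 10_000 },
  'user-login':     { priority: 'critical', intervalSec: 120, timeoutMs: 15_000 },
  'product-search': { priority: 'high',     intervalSec: 300, timeoutMs: 15_000 },
  'add-to-cart':    { priority: 'high',     intervalSec: 300, timeoutMs: 15_000 },
  'checkout-flow':  { priority: 'critical', intervalSec: 300, timeoutMs: 30_000 },
  'api-health':     { priority: 'critical', intervalSec: 30,  timeoutMs: 5_000 },
};

// A journey is due when at least one interval has elapsed since its last run.
function isDue(cfg: JourneyConfig, lastRunSec: number, nowSec: number): boolean {
  return nowSec - lastRunSec >= cfg.intervalSec;
}
```

Centralizing the schedule in data rather than in scattered cron expressions makes it easy to review frequencies and timeouts in one place.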
### Best Practices
**Use dedicated test data.** Create a "synthetic-test-product" that is always in stock, never on sale, and easily identified in analytics filters.

**Tag synthetic traffic.** Add a header or query parameter (`?synthetic=true`) so analytics and billing systems can filter out synthetic activity.

**Run from multiple regions.** A test that passes in us-east-1 but fails in ap-south-1 immediately identifies a regional issue.

**Keep tests simple.** Synthetic tests should be the simplest possible path through a critical flow. Complex multi-branch test logic belongs in CI, not in production monitoring.

**Set aggressive timeouts.** If a checkout flow takes more than 30 seconds in production, it is effectively broken regardless of whether it eventually succeeds.

**Capture screenshots on failure.** Attach screenshots to alert payloads so the on-call engineer can see what the user would see.
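The tagging practice is enforced on the consuming side with a small filter; the header and parameter names here are illustrative:

```typescript
// Filter used by analytics/billing pipelines to drop synthetic traffic.
// The header name and query parameter are illustrative conventions.
const SYNTHETIC_HEADER = 'x-synthetic-test';

function isSyntheticRequest(url: string, headers: Record<string, string>): boolean {
  // Header takes precedence: the monitor sets it on every request.
  if ((headers[SYNTHETIC_HEADER] ?? '').toLowerCase() === 'true') return true;
  // Fall back to the ?synthetic=true query parameter.
  return new URL(url).searchParams.get('synthetic') === 'true';
}
```

In a Playwright monitor, the header side of this convention could be set once with `page.setExtraHTTPHeaders` so every request in the journey is tagged.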
## Synthetic Monitoring Platforms
| Platform | Type | Browser Support | Regions | Cost |
|---|---|---|---|---|
| Datadog Synthetic | SaaS | Chrome, API | 100+ locations | Per-test pricing |
| Grafana Synthetic | SaaS / Self-hosted | Chrome, API | 30+ locations | Included in Grafana Cloud |
| Checkly | SaaS | Playwright-native | 20+ locations | Per-check pricing |
| GitHub Actions (DIY) | Self-hosted | Any (Playwright) | GitHub runner regions | Runner minutes |
| AWS CloudWatch Synthetics | SaaS | Chrome (Puppeteer) | All AWS regions | Per-canary pricing |
### Choosing a Platform
- Checkly is the best choice for teams already using Playwright, as it runs Playwright scripts natively
- Datadog Synthetic integrates seamlessly if you already use Datadog for monitoring
- GitHub Actions DIY is cost-effective for small teams willing to build their own reporting pipeline
- Grafana Synthetic is ideal for teams invested in the Grafana ecosystem
## Metrics to Track from Synthetic Monitoring
| Metric | Purpose | Alert Threshold |
|---|---|---|
| Pass/fail rate per journey | Overall health | Any failure |
| Availability (% of passing runs) | SLO compliance | < 99.5% over 7 days |
| Response time per step | Performance tracking | > 2x baseline |
| Regional availability | Geographic health | Any region < 99% |
| Screenshot diff score | Visual regression | > 10% pixel difference |
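The availability row translates directly into code. A sketch of the SLO check (at a 5-minute cadence, a 7-day window holds roughly 2016 runs):

```typescript
// Availability as a percentage of passing synthetic runs over a window.
function availabilityPct(passed: number, total: number): number {
  if (total === 0) return 100; // no runs yet: no evidence of failure
  return (passed / total) * 100;
}

// SLO check against the 99.5% seven-day threshold from the table.
function breachesSlo(passed: number, total: number, thresholdPct = 99.5): boolean {
  return availabilityPct(passed, total) < thresholdPct;
}
```

At 2016 runs per week, the 99.5% threshold tolerates about ten failed runs; the eleventh failure breaches the SLO.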
Synthetic monitoring is the night watch of your production environment. When your team is asleep, synthetic tests are verifying that critical user journeys still work.