# Synthetic Monitoring

## What Is Synthetic Monitoring?
Synthetic monitoring means running automated tests continuously against production -- not just during deployments, but 24/7. These are not load tests; they are lightweight probes that verify critical user journeys remain functional, detecting issues between deployments that no CI pipeline would catch.
Examples of what synthetic monitoring catches:
- Third-party service degradation (payment provider down)
- Certificate expiration
- CDN misconfigurations
- DNS resolution failures
- Regional connectivity issues
- Database connection pool exhaustion
- Slow memory leaks that accumulate over hours
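All of the failure modes above can be surfaced by a lightweight probe. As a minimal sketch (the function names and latency budget are illustrative, and `fetch`/`AbortSignal.timeout` assume Node 18+), a probe makes one timed request and classifies the outcome:

```typescript
// Minimal shape of a synthetic probe: one timed request, classified
// against a latency budget. Names are illustrative, not a real library.
type ProbeStatus = 'pass' | 'degraded' | 'fail';

// Pure classification logic, so it can be unit-tested without a network.
function classify(httpStatus: number, latencyMs: number, budgetMs: number): ProbeStatus {
  if (httpStatus < 200 || httpStatus >= 400) return 'fail'; // hard failure
  if (latencyMs > budgetMs) return 'degraded';              // works, but too slow
  return 'pass';
}

// One probe run against a URL (Node 18+ global fetch assumed).
async function probeOnce(url: string, budgetMs: number): Promise<ProbeStatus> {
  const start = Date.now();
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(budgetMs * 2) });
    return classify(res.status, Date.now() - start, budgetMs);
  } catch {
    return 'fail'; // timeout, DNS failure, TLS error, connection refused
  }
}
```

Keeping the classification pure makes the probe trivial to test and to reuse across journeys.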
## Synthetic Monitoring Architecture
```
                        +-------------------+
                        |  Scheduler (cron) |  -- Every 5 minutes from 3 regions
                        +---------+---------+
                                  |
          +-----------------------+-----------------------+
          v                       v                       v
+-------------------+   +-------------------+   +--------------------+
|  us-east-1 runner |   |  eu-west-1 runner |   |  ap-south-1 runner |
|  - Login flow     |   |  - Login flow     |   |  - Login flow      |
|  - Search flow    |   |  - Search flow    |   |  - Search flow     |
|  - Checkout flow  |   |  - Checkout flow  |   |  - Checkout flow   |
+---------+---------+   +---------+---------+   +----------+---------+
          |                       |                        |
          +-----------------------+------------------------+
                                  |
                                  v
                      +----------------------+
                      |  Metrics / Alerting  |  <-- Grafana, Datadog, PagerDuty
                      |  - Pass/fail status  |
                      |  - Response times    |
                      |  - Screenshot diffs  |
                      +----------------------+
```
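The fan-in at the bottom of the diagram is ultimately an aggregation policy. A sketch (names are illustrative) that distinguishes a global outage from a regional one:

```typescript
// Per-region result for one journey, as reported by each runner.
type RegionResult = { region: string; passed: boolean };

// Aggregate per-region probe results into a single alert decision.
function aggregate(results: RegionResult[]): 'ok' | 'regional-outage' | 'global-outage' {
  const failed = results.filter((r) => !r.passed);
  if (failed.length === 0) return 'ok';
  if (failed.length === results.length) return 'global-outage';
  return 'regional-outage'; // e.g. passing in us-east-1, failing in ap-south-1
}
```

A regional outage often warrants a different severity (and a different on-call rotation) than a global one, which is why the aggregation step is worth making explicit.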
## Playwright Synthetic Monitor Script
```typescript
// synthetic/checkout-flow.spec.ts
// Runs every 5 minutes against production
import { test, expect } from '@playwright/test';

test.describe('Production Checkout Flow', () => {
  test.setTimeout(30_000); // 30s hard timeout for synthetic tests

  test('complete purchase of a test product', async ({ page }) => {
    // Step 1: Navigate and verify homepage
    const startTime = Date.now();
    await page.goto('https://store.example.com');
    await expect(page.locator('h1')).toContainText('Welcome');
    const homepageLoadTime = Date.now() - startTime;
    console.log(`METRIC homepage_load_ms=${homepageLoadTime}`);

    // Step 2: Search for the dedicated test product
    await page.fill('[data-testid="search-input"]', 'synthetic-test-product');
    await page.click('[data-testid="search-button"]');
    await expect(page.locator('[data-testid="search-results"]')).toBeVisible();

    // Step 3: Add to cart
    await page.click(
      '[data-testid="product-card"]:first-child [data-testid="add-to-cart"]'
    );
    await expect(page.locator('[data-testid="cart-count"]')).toHaveText('1');

    // Step 4: Begin checkout, using a test payment method that never charges
    await page.click('[data-testid="cart-icon"]');
    await page.click('[data-testid="checkout-button"]');
    await page.fill('[data-testid="card-number"]', '4242424242424242');
    await page.fill('[data-testid="card-expiry"]', '12/28');
    await page.fill('[data-testid="card-cvc"]', '123');
    await page.click('[data-testid="place-order"]');

    // Step 5: Verify order confirmation
    await expect(page.locator('[data-testid="order-confirmation"]')).toBeVisible({
      timeout: 15_000,
    });
    await expect(page.locator('[data-testid="order-id"]')).toBeVisible();

    const totalFlowTime = Date.now() - startTime;
    console.log(`METRIC checkout_flow_total_ms=${totalFlowTime}`);

    // Assert the performance budget
    expect(totalFlowTime).toBeLessThan(15_000);
  });
});
```
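The spec above emits `METRIC name=value` lines on stdout. A small, hypothetical helper can turn captured runner output into structured points for whatever metrics backend you forward them to:

```typescript
// Parse the METRIC lines logged by the synthetic spec into structured
// data points. The line format (METRIC name=value) matches the
// console.log calls in the checkout spec above.
interface MetricPoint {
  name: string;
  value: number;
}

function parseMetricLines(stdout: string): MetricPoint[] {
  const points: MetricPoint[] = [];
  for (const line of stdout.split('\n')) {
    // Matches lines like: METRIC checkout_flow_total_ms=4213
    const m = line.match(/^METRIC\s+(\w+)=(\d+(?:\.\d+)?)$/);
    if (m) points.push({ name: m[1], value: Number(m[2]) });
  }
  return points;
}
```

Non-matching lines (Playwright's own reporter output, stack traces) are simply skipped, so the parser can run over raw captured stdout.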
## Running Synthetics as a Cron Job in CI
```yaml
# .github/workflows/synthetic-monitor.yml
name: Synthetic Monitoring

on:
  schedule:
    - cron: '*/5 * * * *' # every 5 minutes
  workflow_dispatch: {} # allow manual trigger

jobs:
  synthetic-us-east:
    runs-on: ubuntu-latest # GitHub-hosted runner (US East)
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - name: Install Playwright browsers
        run: npx playwright install --with-deps chromium
      - name: Run synthetic tests
        run: npx playwright test synthetic/ --reporter=json
        env:
          BASE_URL: https://store.example.com
          SYNTHETIC_MODE: true
      - name: Push metrics to Datadog
        if: always()
        run: |
          python scripts/push_synthetic_metrics.py \
            --results playwright-report/results.json \
            --region us-east-1 \
            --datadog-api-key ${{ secrets.DATADOG_API_KEY }}
      - name: Alert on failure
        if: failure()
        run: |
          curl -X POST "${{ secrets.PAGERDUTY_EVENTS_URL }}" \
            -H "Content-Type: application/json" \
            -d '{
              "routing_key": "${{ secrets.PD_ROUTING_KEY }}",
              "event_action": "trigger",
              "payload": {
                "summary": "Synthetic checkout flow FAILED in us-east-1",
                "severity": "critical",
                "source": "github-actions-synthetic"
              }
            }'
```
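The workflow delegates metric shipping to `scripts/push_synthetic_metrics.py` (not shown here). As an illustration only, here is a TypeScript sketch of the kind of payload such a script assembles, shaped like the series body Datadog's v2 metrics API expects; the metric names and tags are assumptions:

```typescript
// Sketch of a metrics payload for a synthetic run. Shaped like the
// Datadog v2 series body; metric names and tags are illustrative.
interface SeriesPoint { timestamp: number; value: number }
interface Series { metric: string; type: number; points: SeriesPoint[]; tags: string[] }

function buildSeries(
  region: string,
  passed: boolean,
  durationMs: number,
  nowSec: number
): { series: Series[] } {
  const tags = [`region:${region}`, 'source:synthetic'];
  return {
    series: [
      // type 3 = gauge in the v2 metric intake enum
      { metric: 'synthetic.checkout.success', type: 3, points: [{ timestamp: nowSec, value: passed ? 1 : 0 }], tags },
      { metric: 'synthetic.checkout.duration_ms', type: 3, points: [{ timestamp: nowSec, value: durationMs }], tags },
    ],
  };
}
```

Emitting success as a 0/1 gauge makes the availability percentage a simple average over any window in the dashboard.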
## Designing Effective Synthetic Tests

### What to Monitor
| Journey | Priority | Frequency | Timeout |
|---|---|---|---|
| Homepage load | Critical | Every 1 min | 10s |
| User login | Critical | Every 2 min | 15s |
| Product search | High | Every 5 min | 15s |
| Add to cart | High | Every 5 min | 15s |
| Checkout flow | Critical | Every 5 min | 30s |
| API health endpoints | Critical | Every 30s | 5s |
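The schedule above can live in code as a small config map that a DIY scheduler could consume; the journey keys and field names are illustrative:

```typescript
// The monitoring schedule from the table, as data. Keys are illustrative.
interface JourneyConfig {
  priority: 'critical' | 'high';
  intervalSec: number;
  timeoutMs: number;
}

const journeys: Record<string, JourneyConfig> = {
  'homepage-load':  { priority: 'critical', intervalSec: 60,  timeoutMs: 10_000 },
  'user-login':     { priority: 'critical', intervalSec: 120, timeoutMs: 15_000 },
  'product-search': { priority: 'high',     intervalSec: 300, timeoutMs: 15_000 },
  'add-to-cart':    { priority: 'high',     intervalSec: 300, timeoutMs: 15_000 },
  'checkout-flow':  { priority: 'critical', intervalSec: 300, timeoutMs: 30_000 },
  'api-health':     { priority: 'critical', intervalSec: 30,  timeoutMs: 5_000 },
};

// A journey is due when at least one interval has elapsed since its last run.
function isDue(cfg: JourneyConfig, lastRunSec: number, nowSec: number): boolean {
  return nowSec - lastRunSec >= cfg.intervalSec;
}
```

Centralizing the schedule in data rather than in scattered cron expressions makes it easy to review frequencies and timeouts in one place.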
### Best Practices
**Use dedicated test data.** Create a "synthetic-test-product" that is always in stock, never on sale, and easily identified in analytics filters.

**Tag synthetic traffic.** Add a header or query parameter (`?synthetic=true`) so analytics and billing systems can filter out synthetic activity.

**Run from multiple regions.** A test that passes in us-east-1 but fails in ap-south-1 immediately identifies a regional issue.

**Keep tests simple.** Synthetic tests should be the simplest possible path through a critical flow. Complex multi-branch test logic belongs in CI, not in production monitoring.

**Set aggressive timeouts.** If a checkout flow takes more than 30 seconds in production, it is effectively broken regardless of whether it eventually succeeds.

**Capture screenshots on failure.** Attach screenshots to alert payloads so the on-call engineer can see what the user would see.
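The tagging practice is enforced on the consuming side with a small filter; the header and parameter names here are illustrative:

```typescript
// Filter used by analytics/billing pipelines to drop synthetic traffic.
// The header name and query parameter are illustrative conventions.
const SYNTHETIC_HEADER = 'x-synthetic-test';

function isSyntheticRequest(url: string, headers: Record<string, string>): boolean {
  // Header takes precedence: the monitor sets it on every request.
  if ((headers[SYNTHETIC_HEADER] ?? '').toLowerCase() === 'true') return true;
  // Fall back to the ?synthetic=true query parameter.
  return new URL(url).searchParams.get('synthetic') === 'true';
}
```

In a Playwright monitor, the header side of this convention could be set once with `page.setExtraHTTPHeaders` so every request in the journey is tagged.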
## Synthetic Monitoring Platforms
| Platform | Type | Browser Support | Regions | Cost |
|---|---|---|---|---|
| Datadog Synthetic | SaaS | Chrome, API | 100+ locations | Per-test pricing |
| Grafana Synthetic | SaaS / Self-hosted | Chrome, API | 30+ locations | Included in Grafana Cloud |
| Checkly | SaaS | Playwright-native | 20+ locations | Per-check pricing |
| GitHub Actions (DIY) | Self-hosted | Any (Playwright) | GitHub runner regions | Runner minutes |
| AWS CloudWatch Synthetics | SaaS | Chrome (Puppeteer) | All AWS regions | Per-canary pricing |
### Choosing a Platform
- Checkly is the best choice for teams already using Playwright, as it runs Playwright scripts natively
- Datadog Synthetic integrates seamlessly if you already use Datadog for monitoring
- GitHub Actions DIY is cost-effective for small teams willing to build their own reporting pipeline
- Grafana Synthetic is ideal for teams invested in the Grafana ecosystem
## Metrics to Track from Synthetic Monitoring
| Metric | Purpose | Alert Threshold |
|---|---|---|
| Pass/fail rate per journey | Overall health | Any failure |
| Availability (% of passing runs) | SLO compliance | < 99.5% over 7 days |
| Response time per step | Performance tracking | > 2x baseline |
| Regional availability | Geographic health | Any region < 99% |
| Screenshot diff score | Visual regression | > 10% pixel difference |
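The availability row translates directly into code. A sketch of the SLO check (at a 5-minute cadence, a 7-day window holds roughly 2016 runs):

```typescript
// Availability as a percentage of passing synthetic runs over a window.
function availabilityPct(passed: number, total: number): number {
  if (total === 0) return 100; // no runs yet: no evidence of failure
  return (passed / total) * 100;
}

// SLO check against the 99.5% seven-day threshold from the table.
function breachesSlo(passed: number, total: number, thresholdPct = 99.5): boolean {
  return availabilityPct(passed, total) < thresholdPct;
}
```

At 2016 runs per week, the 99.5% threshold tolerates about ten failed runs; the eleventh failure breaches the SLO.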
Synthetic monitoring is the night watch of your production environment. When your team is asleep, synthetic tests are verifying that critical user journeys still work.