Ephemeral Environments
What Ephemeral Environments Are
Ephemeral environments are disposable copies of your entire infrastructure stack, spun up for each pull request and torn down after merge. Instead of sharing a single staging environment across all developers (with its inevitable "who broke staging?" moments), each PR gets its own isolated world.
The developer pushes a PR. CI creates a complete environment -- VPC, database, queues, app containers, API gateway. Tests run against this real infrastructure at a unique preview URL. The PR comment shows test results and a link to the preview. When the PR merges or closes, everything is destroyed.
Architecture
PR #142 opened
|
+-- Terraform workspace: pr-142
| +-- VPC (isolated)
| +-- RDS (small instance, seeded test data)
| +-- ECS service (app containers)
| +-- S3 buckets (prefixed pr-142-)
| +-- API Gateway (pr-142.preview.example.com)
|
+-- Run full test suite against pr-142.preview.example.com
| +-- Unit tests (already passed in CI)
| +-- Integration tests (real database, real queues)
| +-- E2E browser tests (Playwright against preview URL)
| +-- Security scan (DAST against live endpoints)
|
+-- PR comment with test results + preview URL
|
PR #142 merged
|
+-- terraform workspace select pr-142 && terraform destroy -auto-approve
Terraform Workspaces for Ephemeral Environments
Terraform workspaces provide lightweight isolation. Each workspace maintains its own state file, so resources created in one workspace do not affect another.
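The isolation is visible in where state lands. With the local backend, non-default workspaces keep state under `terraform.tfstate.d/`; remote backends such as S3 namespace the key with `env:/<workspace>/` instead. A minimal sketch of the path a PR workspace would use:

```shell
# Where Terraform keeps state for a non-default workspace (local backend).
# An S3 backend would use the key "env:/pr-142/<configured key>" instead.
WORKSPACE="pr-142"
STATE_PATH="terraform.tfstate.d/${WORKSPACE}/terraform.tfstate"
echo "$STATE_PATH"
```

Because each workspace reads and writes only its own state file, `terraform destroy` in `pr-142` cannot touch resources tracked by any other workspace.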
Workspace-Aware Configuration
# main.tf -- workspace-aware naming
locals {
  env_name = terraform.workspace == "default" ? "production" : terraform.workspace
  prefix   = "myapp-${local.env_name}"

  # Use smaller instances for PR environments
  is_ephemeral = terraform.workspace != "default"
}

resource "aws_s3_bucket" "data" {
  bucket = "${local.prefix}-data"
  # Each PR gets its own bucket: myapp-pr-142-data
}

resource "aws_db_instance" "main" {
  identifier        = "${local.prefix}-db"
  instance_class    = local.is_ephemeral ? "db.t4g.micro" : "db.r6g.xlarge"
  allocated_storage = local.is_ephemeral ? 20 : 500

  # PR environments use minimal instance sizes to control cost;
  # skip the final snapshot and backups entirely for ephemeral environments
  skip_final_snapshot     = local.is_ephemeral
  backup_retention_period = local.is_ephemeral ? 0 : 7
}

resource "aws_ecs_service" "app" {
  name          = "${local.prefix}-app"
  desired_count = local.is_ephemeral ? 1 : 3
  # PR environments need only one replica
}

# DNS entry for the preview URL
resource "aws_route53_record" "preview" {
  count   = local.is_ephemeral ? 1 : 0
  zone_id = data.aws_route53_zone.preview.zone_id
  name    = "${local.env_name}.preview.example.com"
  type    = "CNAME"
  records = [aws_lb.app.dns_name]
  ttl     = 60
}

output "preview_url" {
  value = local.is_ephemeral ? "https://${local.env_name}.preview.example.com" : ""
}
CI Pipeline for Ephemeral Environments
#!/bin/bash
# scripts/ephemeral-env.sh
set -euo pipefail

ACTION="$1"  # "create" or "destroy"
# GITHUB_REF for a PR looks like "refs/pull/142/merge"; anchor the match
# on "pull/" so stray digits elsewhere in the ref cannot match.
PR_NUM=$(echo "$GITHUB_REF" | grep -oP 'pull/\K\d+')
WORKSPACE="pr-${PR_NUM}"

case "$ACTION" in
  create)
    cd terraform/
    terraform workspace new "$WORKSPACE" 2>/dev/null || \
      terraform workspace select "$WORKSPACE"
    terraform apply -auto-approve \
      -var="pr_number=${PR_NUM}" \
      -var="git_sha=${GITHUB_SHA:0:8}"
    PREVIEW_URL=$(terraform output -raw preview_url)
    echo "Preview environment ready: $PREVIEW_URL"

    # Wait for health check
    for i in $(seq 1 30); do
      if curl -sf "$PREVIEW_URL/healthz" > /dev/null; then
        echo "Environment is healthy"
        break
      fi
      echo "Waiting for environment to be ready... ($i/30)"
      sleep 10
    done

    # Run tests; the "|| TEST_RESULT=$?" form keeps "set -e" from
    # aborting the script before we can report the failure to the PR
    cd ..
    TEST_RESULT=0
    npx playwright test --base-url "$PREVIEW_URL" || TEST_RESULT=$?

    # Post results to PR
    if [ "$TEST_RESULT" -eq 0 ]; then
      gh pr comment "$PR_NUM" --body \
        "Preview: $PREVIEW_URL | Tests: PASSED"
    else
      gh pr comment "$PR_NUM" --body \
        "Preview: $PREVIEW_URL | Tests: FAILED (see CI logs)"
    fi
    ;;
  destroy)
    cd terraform/
    terraform workspace select "$WORKSPACE"
    terraform destroy -auto-approve
    terraform workspace select default
    terraform workspace delete "$WORKSPACE"
    echo "Environment pr-${PR_NUM} destroyed"
    ;;
  *)
    echo "Usage: $0 {create|destroy}" >&2
    exit 1
    ;;
esac
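The script is typically wired to PR lifecycle events so creation and teardown happen automatically. A minimal sketch of the triggering workflow, assuming the script lives at `scripts/ephemeral-env.sh` as above (job and workflow names are illustrative):

```yaml
# .github/workflows/ephemeral.yml -- illustrative wiring, not a complete workflow
name: Ephemeral Environment
on:
  pull_request:
    types: [opened, synchronize, reopened, closed]
jobs:
  ephemeral:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Create or destroy
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}  # for the gh pr comment calls
        run: |
          if [ "${{ github.event.action }}" = "closed" ]; then
            ./scripts/ephemeral-env.sh destroy
          else
            ./scripts/ephemeral-env.sh create
          fi
```

The `closed` event fires on both merge and close, so teardown covers abandoned PRs as well as merged ones.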
Pulumi Stacks for Ephemeral Environments
Pulumi stacks are the equivalent of Terraform workspaces but with programmatic control:
// index.ts -- Pulumi stack per PR
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

const stack = pulumi.getStack(); // e.g., "pr-142"
const isEphemeral = stack.startsWith("pr-");

const db = new aws.rds.Instance("main", {
  instanceClass: isEphemeral ? "db.t4g.micro" : "db.r6g.xlarge",
  allocatedStorage: isEphemeral ? 20 : 500,
  skipFinalSnapshot: isEphemeral, // No snapshot needed for PR envs
  identifier: `myapp-${stack}-db`,
  engine: "postgres",
  engineVersion: "16",
});

const service = new aws.ecs.Service("app", {
  desiredCount: isEphemeral ? 1 : 3,
  // ... other configuration
});

export const dbEndpoint = db.endpoint;
export const previewUrl = isEphemeral
  ? pulumi.interpolate`https://${stack}.preview.example.com`
  : undefined;
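The per-PR lifecycle mirrors the Terraform workspace flow. A sketch of the CLI side, shown for shape rather than as a runnable script (the stack name is illustrative):

```shell
# Illustrative Pulumi stack lifecycle for a PR (requires the pulumi CLI).
pulumi stack init "pr-142"      # create the stack; each stack has its own state
pulumi up --yes                 # provision the PR environment
# ... run tests against the preview URL ...
pulumi destroy --yes            # tear everything down on merge/close
pulumi stack rm --yes "pr-142"  # delete the stack and its state
```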
Cost Control Strategies
Ephemeral environments can be expensive if not managed carefully. A single PR environment with an RDS instance, ECS service, and load balancer costs roughly $5-15/day. With 20 active PRs, that is $100-300/day.
| Strategy | Implementation | Savings |
|---|---|---|
| Auto-destroy after N hours | GitHub Action cron job + terraform destroy | 60-80% |
| Minimal instance sizes | Conditional sizing based on workspace name | 70-90% |
| Shared read-only resources | Reference production VPC, DNS zone via data sources | 20-30% |
| Spot instances for compute | capacity_type = "SPOT" for EKS nodes | 60-70% |
| Scheduled scale-to-zero | Lambda that scales down PR envs outside business hours | 50-60% |
| TTL tags on all resources | Automated cleanup of resources older than 48 hours | Prevents zombie resources |
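The back-of-envelope math above is worth scripting against your own PR counts; the per-environment figures here are the rough estimates from the text:

```shell
# Rough daily-cost envelope: per-environment cost range times active PR count.
PER_ENV_LOW=5    # USD/day, low estimate for one PR environment
PER_ENV_HIGH=15  # USD/day, high estimate
ACTIVE_PRS=20
echo "Daily cost: \$$((PER_ENV_LOW * ACTIVE_PRS)) - \$$((PER_ENV_HIGH * ACTIVE_PRS))"
# prints: Daily cost: $100 - $300
```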
Auto-Destroy Cron Job
# .github/workflows/cleanup-ephemeral.yml
name: Cleanup Stale Ephemeral Environments
on:
  schedule:
    - cron: '0 2 * * *'  # Daily at 2 AM UTC
  workflow_dispatch:

jobs:
  cleanup:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Find stale environments
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}  # gh CLI needs a token
        run: |
          cd terraform/
          terraform init
          terraform workspace list | grep "pr-" | while read ws; do
            # Strip the leading whitespace and the "*" current-workspace marker
            WS_NAME=$(echo "$ws" | tr -d ' *')
            PR_NUM=$(echo "$WS_NAME" | grep -oP '\d+')
            # Check if the PR is still open
            PR_STATE=$(gh pr view "$PR_NUM" --json state -q '.state' 2>/dev/null || echo "UNKNOWN")
            if [ "$PR_STATE" != "OPEN" ]; then
              echo "Destroying stale environment: $WS_NAME (PR state: $PR_STATE)"
              terraform workspace select "$WS_NAME"
              terraform destroy -auto-approve
              terraform workspace select default
              terraform workspace delete "$WS_NAME"
            fi
          done
Resource TTL Tags
# Add TTL tags to all ephemeral resources.
# Note: timestamp() is re-evaluated on every plan, so this tag shows a
# diff on each apply -- acceptable churn for short-lived PR environments.
locals {
  common_tags = merge(
    {
      Team        = "platform"
      Environment = local.env_name
      ManagedBy   = "terraform"
    },
    local.is_ephemeral ? {
      EphemeralTTL = timeadd(timestamp(), "48h")
      PRNumber     = var.pr_number
    } : {}
  )
}
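The EphemeralTTL tag is only useful if something enforces it. A minimal sketch of the check a cleanup job would run per resource, assuming GNU date and the RFC 3339 timestamps that timeadd() emits (the function name is illustrative):

```shell
# Decide whether a resource's EphemeralTTL tag value is in the past (GNU date).
is_expired() {
  local ttl="$1"
  [ "$(date -u +%s)" -ge "$(date -u -d "$ttl" +%s)" ]
}

if is_expired "2020-01-01T00:00:00Z"; then
  echo "expired -- destroy this resource"
fi
```

A real cleanup job would feed this from a tag query (for example, `aws resourcegroupstaggingapi get-resources`) and destroy whatever comes back expired.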
Database Seeding for Ephemeral Environments
Ephemeral environments need data to be useful. Options from fastest to most realistic:
| Approach | Speed | Realism | Best For |
|---|---|---|---|
| Empty schema only | Seconds | Low | API testing with test data creation |
| Fixture data (SQL scripts) | Seconds | Medium | Predictable test scenarios |
| Anonymized production snapshot | Minutes | High | Realistic testing, demo environments |
| Production read replica | Fast (no copy) | Highest | Read-only testing against real data |
# Seed ephemeral database from fixture files
psql "$DATABASE_URL" < fixtures/schema.sql
psql "$DATABASE_URL" < fixtures/test-data.sql
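For the anonymized-snapshot option, scrubbing can start as a simple pass over the dump before loading it. A minimal sketch; the regex and placeholder are illustrative, and real anonymization usually also covers names, phone numbers, and API tokens:

```shell
# Replace every email address in a SQL dump with a fixed placeholder.
printf "INSERT INTO users VALUES ('alice@corp.com');\n" > dump.sql
sed -E 's/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+/user@example.invalid/g' \
  dump.sql > dump.anon.sql
cat dump.anon.sql
# prints: INSERT INTO users VALUES ('user@example.invalid');
```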
Ephemeral environments transform testing confidence because they eliminate the "works in staging" problem. When every PR gets its own infrastructure, you know exactly what changed and whether it works.