Ephemeral Environments
What Ephemeral Environments Are
Ephemeral environments are disposable copies of your entire infrastructure stack, spun up for each pull request and torn down after merge. Instead of sharing a single staging environment across all developers (with its inevitable "who broke staging?" moments), each PR gets its own isolated world.
The developer pushes a PR. CI creates a complete environment -- VPC, database, queues, app containers, API gateway. Tests run against this real infrastructure at a unique preview URL. The PR comment shows test results and a link to the preview. When the PR merges or closes, everything is destroyed.
Architecture
PR #142 opened
|
+-- Terraform workspace: pr-142
| +-- VPC (isolated)
| +-- RDS (small instance, seeded test data)
| +-- ECS service (app containers)
| +-- S3 buckets (prefixed pr-142-)
| +-- API Gateway (pr-142.preview.example.com)
|
+-- Run full test suite against pr-142.preview.example.com
| +-- Unit tests (already passed in CI)
| +-- Integration tests (real database, real queues)
| +-- E2E browser tests (Playwright against preview URL)
| +-- Security scan (DAST against live endpoints)
|
+-- PR comment with test results + preview URL
|
PR #142 merged
|
+-- terraform workspace select pr-142 && terraform destroy -auto-approve
Terraform Workspaces for Ephemeral Environments
Terraform workspaces provide lightweight isolation. Each workspace maintains its own state file, so resources created in one workspace do not affect another.
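The isolation is visible in where state lands. With the local backend, non-default workspaces keep state under `terraform.tfstate.d/`; remote backends such as S3 namespace the key with `env:/<workspace>/` instead. A minimal sketch of the path a PR workspace would use:

```shell
# Where Terraform keeps state for a non-default workspace (local backend).
# An S3 backend would use the key "env:/pr-142/<configured key>" instead.
WORKSPACE="pr-142"
STATE_PATH="terraform.tfstate.d/${WORKSPACE}/terraform.tfstate"
echo "$STATE_PATH"
```

Because each workspace reads and writes only its own state file, `terraform destroy` in `pr-142` cannot touch resources tracked by any other workspace.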
Workspace-Aware Configuration
# main.tf -- workspace-aware naming
locals {
  env_name = terraform.workspace == "default" ? "production" : terraform.workspace
  prefix   = "myapp-${local.env_name}"

  # Use smaller instances for PR environments
  is_ephemeral = terraform.workspace != "default"
}

resource "aws_s3_bucket" "data" {
  bucket = "${local.prefix}-data"
  # Each PR gets its own bucket: myapp-pr-142-data
}

resource "aws_db_instance" "main" {
  identifier        = "${local.prefix}-db"
  instance_class    = local.is_ephemeral ? "db.t4g.micro" : "db.r6g.xlarge"
  allocated_storage = local.is_ephemeral ? 20 : 500

  # PR environments use minimal instance sizes to control cost;
  # skip the final snapshot and backups entirely for ephemeral environments
  skip_final_snapshot     = local.is_ephemeral
  backup_retention_period = local.is_ephemeral ? 0 : 7
}

resource "aws_ecs_service" "app" {
  name          = "${local.prefix}-app"
  desired_count = local.is_ephemeral ? 1 : 3
  # PR environments need only one replica
}

# DNS entry for the preview URL
resource "aws_route53_record" "preview" {
  count   = local.is_ephemeral ? 1 : 0
  zone_id = data.aws_route53_zone.preview.zone_id
  name    = "${local.env_name}.preview.example.com"
  type    = "CNAME"
  records = [aws_lb.app.dns_name]
  ttl     = 60
}

output "preview_url" {
  value = local.is_ephemeral ? "https://${local.env_name}.preview.example.com" : ""
}
CI Pipeline for Ephemeral Environments
#!/bin/bash
# scripts/ephemeral-env.sh
set -euo pipefail

ACTION="$1"  # "create" or "destroy"
# GITHUB_REF for a PR looks like "refs/pull/142/merge"; anchor the match
# on "pull/" so stray digits elsewhere in the ref cannot match.
PR_NUM=$(echo "$GITHUB_REF" | grep -oP 'pull/\K\d+')
WORKSPACE="pr-${PR_NUM}"

case "$ACTION" in
  create)
    cd terraform/
    terraform workspace new "$WORKSPACE" 2>/dev/null || \
      terraform workspace select "$WORKSPACE"
    terraform apply -auto-approve \
      -var="pr_number=${PR_NUM}" \
      -var="git_sha=${GITHUB_SHA:0:8}"
    PREVIEW_URL=$(terraform output -raw preview_url)
    echo "Preview environment ready: $PREVIEW_URL"

    # Wait for health check
    for i in $(seq 1 30); do
      if curl -sf "$PREVIEW_URL/healthz" > /dev/null; then
        echo "Environment is healthy"
        break
      fi
      echo "Waiting for environment to be ready... ($i/30)"
      sleep 10
    done

    # Run tests; the "|| TEST_RESULT=$?" form keeps "set -e" from
    # aborting the script before we can report the failure to the PR
    cd ..
    TEST_RESULT=0
    npx playwright test --base-url "$PREVIEW_URL" || TEST_RESULT=$?

    # Post results to PR
    if [ "$TEST_RESULT" -eq 0 ]; then
      gh pr comment "$PR_NUM" --body \
        "Preview: $PREVIEW_URL | Tests: PASSED"
    else
      gh pr comment "$PR_NUM" --body \
        "Preview: $PREVIEW_URL | Tests: FAILED (see CI logs)"
    fi
    ;;
  destroy)
    cd terraform/
    terraform workspace select "$WORKSPACE"
    terraform destroy -auto-approve
    terraform workspace select default
    terraform workspace delete "$WORKSPACE"
    echo "Environment pr-${PR_NUM} destroyed"
    ;;
  *)
    echo "Usage: $0 {create|destroy}" >&2
    exit 1
    ;;
esac
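The script is typically wired to PR lifecycle events so creation and teardown happen automatically. A minimal sketch of the triggering workflow, assuming the script lives at `scripts/ephemeral-env.sh` as above (job and workflow names are illustrative):

```yaml
# .github/workflows/ephemeral.yml -- illustrative wiring, not a complete workflow
name: Ephemeral Environment
on:
  pull_request:
    types: [opened, synchronize, reopened, closed]
jobs:
  ephemeral:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Create or destroy
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}  # for the gh pr comment calls
        run: |
          if [ "${{ github.event.action }}" = "closed" ]; then
            ./scripts/ephemeral-env.sh destroy
          else
            ./scripts/ephemeral-env.sh create
          fi
```

The `closed` event fires on both merge and close, so teardown covers abandoned PRs as well as merged ones.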
Pulumi Stacks for Ephemeral Environments
Pulumi stacks are the equivalent of Terraform workspaces but with programmatic control:
// index.ts -- Pulumi stack per PR
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

const stack = pulumi.getStack(); // e.g., "pr-142"
const isEphemeral = stack.startsWith("pr-");

const db = new aws.rds.Instance("main", {
  instanceClass: isEphemeral ? "db.t4g.micro" : "db.r6g.xlarge",
  allocatedStorage: isEphemeral ? 20 : 500,
  skipFinalSnapshot: isEphemeral, // No snapshot needed for PR envs
  identifier: `myapp-${stack}-db`,
  engine: "postgres",
  engineVersion: "16",
});

const service = new aws.ecs.Service("app", {
  desiredCount: isEphemeral ? 1 : 3,
  // ... other configuration
});

export const dbEndpoint = db.endpoint;
export const previewUrl = isEphemeral
  ? pulumi.interpolate`https://${stack}.preview.example.com`
  : undefined;
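The per-PR lifecycle mirrors the Terraform workspace flow. A sketch of the CLI side, shown for shape rather than as a runnable script (the stack name is illustrative):

```shell
# Illustrative Pulumi stack lifecycle for a PR (requires the pulumi CLI).
pulumi stack init "pr-142"      # create the stack; each stack has its own state
pulumi up --yes                 # provision the PR environment
# ... run tests against the preview URL ...
pulumi destroy --yes            # tear everything down on merge/close
pulumi stack rm --yes "pr-142"  # delete the stack and its state
```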
Cost Control Strategies
Ephemeral environments can be expensive if not managed carefully. A single PR environment with an RDS instance, ECS service, and load balancer costs roughly $5-15/day. With 20 active PRs, that is $100-300/day.
| Strategy | Implementation | Savings |
|---|---|---|
| Auto-destroy after N hours | GitHub Action cron job + terraform destroy | 60-80% |
| Minimal instance sizes | Conditional sizing based on workspace name | 70-90% |
| Shared read-only resources | Reference production VPC, DNS zone via data sources | 20-30% |
| Spot instances for compute | capacity_type = "SPOT" for EKS nodes | 60-70% |
| Scheduled scale-to-zero | Lambda that scales down PR envs outside business hours | 50-60% |
| TTL tags on all resources | Automated cleanup of resources older than 48 hours | Prevents zombie resources |
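The back-of-envelope math above is worth scripting against your own PR counts; the per-environment figures here are the rough estimates from the text:

```shell
# Rough daily-cost envelope: per-environment cost range times active PR count.
PER_ENV_LOW=5    # USD/day, low estimate for one PR environment
PER_ENV_HIGH=15  # USD/day, high estimate
ACTIVE_PRS=20
echo "Daily cost: \$$((PER_ENV_LOW * ACTIVE_PRS)) - \$$((PER_ENV_HIGH * ACTIVE_PRS))"
# prints: Daily cost: $100 - $300
```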
Auto-Destroy Cron Job
# .github/workflows/cleanup-ephemeral.yml
name: Cleanup Stale Ephemeral Environments
on:
  schedule:
    - cron: '0 2 * * *'  # Daily at 2 AM UTC
  workflow_dispatch:

jobs:
  cleanup:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Find stale environments
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}  # gh CLI needs a token
        run: |
          cd terraform/
          terraform init
          terraform workspace list | grep "pr-" | while read ws; do
            # Strip the leading whitespace and the "*" current-workspace marker
            WS_NAME=$(echo "$ws" | tr -d ' *')
            PR_NUM=$(echo "$WS_NAME" | grep -oP '\d+')
            # Check if the PR is still open
            PR_STATE=$(gh pr view "$PR_NUM" --json state -q '.state' 2>/dev/null || echo "UNKNOWN")
            if [ "$PR_STATE" != "OPEN" ]; then
              echo "Destroying stale environment: $WS_NAME (PR state: $PR_STATE)"
              terraform workspace select "$WS_NAME"
              terraform destroy -auto-approve
              terraform workspace select default
              terraform workspace delete "$WS_NAME"
            fi
          done
Resource TTL Tags
# Add TTL tags to all ephemeral resources.
# Note: timestamp() is re-evaluated on every plan, so this tag shows a
# diff on each apply -- acceptable churn for short-lived PR environments.
locals {
  common_tags = merge(
    {
      Team        = "platform"
      Environment = local.env_name
      ManagedBy   = "terraform"
    },
    local.is_ephemeral ? {
      EphemeralTTL = timeadd(timestamp(), "48h")
      PRNumber     = var.pr_number
    } : {}
  )
}
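The EphemeralTTL tag is only useful if something enforces it. A minimal sketch of the check a cleanup job would run per resource, assuming GNU date and the RFC 3339 timestamps that timeadd() emits (the function name is illustrative):

```shell
# Decide whether a resource's EphemeralTTL tag value is in the past (GNU date).
is_expired() {
  local ttl="$1"
  [ "$(date -u +%s)" -ge "$(date -u -d "$ttl" +%s)" ]
}

if is_expired "2020-01-01T00:00:00Z"; then
  echo "expired -- destroy this resource"
fi
```

A real cleanup job would feed this from a tag query (for example, `aws resourcegroupstaggingapi get-resources`) and destroy whatever comes back expired.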
Database Seeding for Ephemeral Environments
Ephemeral environments need data to be useful. Options from fastest to most realistic:
| Approach | Speed | Realism | Best For |
|---|---|---|---|
| Empty schema only | Seconds | Low | API testing with test data creation |
| Fixture data (SQL scripts) | Seconds | Medium | Predictable test scenarios |
| Anonymized production snapshot | Minutes | High | Realistic testing, demo environments |
| Production read replica | Fast (no copy) | Highest | Read-only testing against real data |
# Seed ephemeral database from fixture files
psql "$DATABASE_URL" < fixtures/schema.sql
psql "$DATABASE_URL" < fixtures/test-data.sql
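For the anonymized-snapshot option, scrubbing can start as a simple pass over the dump before loading it. A minimal sketch; the regex and placeholder are illustrative, and real anonymization usually also covers names, phone numbers, and API tokens:

```shell
# Replace every email address in a SQL dump with a fixed placeholder.
printf "INSERT INTO users VALUES ('alice@corp.com');\n" > dump.sql
sed -E 's/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+/user@example.invalid/g' \
  dump.sql > dump.anon.sql
cat dump.anon.sql
# prints: INSERT INTO users VALUES ('user@example.invalid');
```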
Ephemeral environments transform testing confidence because they eliminate the "works in staging" problem. When every PR gets its own infrastructure, you know exactly what changed and whether it works.