# Migration Testing
Schema migrations are one of the riskiest deployment operations. A bad migration can cause data loss, corruption, or extended downtime. Testing migrations before they reach production is essential — and often overlooked.
## What Makes Migrations Risky

| Risk | Example | Impact |
|---|---|---|
| Data loss | Dropping a column that still has data | Irreversible data destruction |
| Data corruption | Type conversion truncates values | Silent data modification |
| Downtime | Adding an index on a large table without CONCURRENTLY | Table locked for minutes |
| Failed rollback | Rollback script does not restore original schema | Cannot undo a bad migration |
| Default value bug | New NOT NULL column without default — existing rows fail | Application errors |
| Foreign key break | Adding FK to column with orphaned data | Migration fails |
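The last risk in the table is easy to check for up front: before a migration adds a foreign key constraint, count the child rows whose key has no matching parent. A minimal sketch, using an in-memory SQLite database and hypothetical table names to stand in for the real schema:

```python
import sqlite3

def find_orphans(conn, child_table, fk_column, parent_table, parent_key):
    """Count child rows whose FK value has no matching parent row.

    Run a check like this before a migration that adds a foreign key
    constraint: orphaned rows will make the ALTER TABLE fail.
    """
    query = (
        f"SELECT COUNT(*) FROM {child_table} c "
        f"LEFT JOIN {parent_table} p ON c.{fk_column} = p.{parent_key} "
        f"WHERE c.{fk_column} IS NOT NULL AND p.{parent_key} IS NULL"
    )
    return conn.execute(query).fetchone()[0]

# Demo schema: one order references a user that does not exist
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users (id) VALUES (1), (2);
    INSERT INTO orders (id, user_id) VALUES (1, 1), (2, 2), (3, 99);
""")
orphans = find_orphans(conn, "orders", "user_id", "users", "id")
print(orphans)  # 1 — order 3 references nonexistent user 99
```

If the count is nonzero, clean up or quarantine the orphans in a separate step before the constraint migration runs.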
## Migration Testing Checklist
| Test | How |
|---|---|
| Migration applies cleanly | Run on a copy of production schema |
| Rollback works | Apply, roll back — schema returns to original state |
| Data is preserved | Insert known data before migration, verify after |
| Default values | New NOT NULL column must have a default — verify existing rows |
| Performance | Adding index on large table must not lock for minutes |
| Application compatibility | New schema works with both old and new application code |
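Several checklist items can also be caught statically, before anything touches a database. The sketch below is an illustrative lint pass over migration SQL; the regexes are deliberately simplified (not a real SQL parser) and the rule list is an assumption, not a standard tool:

```python
import re

# Simplified patterns for risky statements; a real linter would parse the SQL
RISKY_PATTERNS = [
    # ADD COLUMN ... NOT NULL without DEFAULT breaks tables with existing rows
    (re.compile(r"ADD\s+COLUMN\s+\w+\s+\w+[^;]*NOT\s+NULL(?![^;]*DEFAULT)", re.I),
     "NOT NULL column without DEFAULT"),
    # Plain CREATE INDEX locks the table; CONCURRENTLY avoids that (PostgreSQL)
    (re.compile(r"CREATE\s+INDEX\s+(?!CONCURRENTLY)", re.I),
     "CREATE INDEX without CONCURRENTLY"),
    # DROP COLUMN destroys data irreversibly
    (re.compile(r"DROP\s+COLUMN", re.I), "DROP COLUMN (data loss)"),
]

def lint_migration(sql):
    """Return a list of warnings for risky statements in a migration file."""
    return [msg for pattern, msg in RISKY_PATTERNS if pattern.search(sql)]

warnings = lint_migration(
    "ALTER TABLE users ADD COLUMN status varchar(20) NOT NULL;"
)
print(warnings)  # ['NOT NULL column without DEFAULT']
```

A check like this runs in milliseconds, so it fits naturally as a pre-commit hook or an early CI step, before the heavier database-backed tests below.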
## Testing Migrations with Scripts

```bash
#!/bin/bash
set -euo pipefail

echo "=== Creating test database from production schema ==="
pg_dump --schema-only production_db > schema.sql
createdb migration_test && psql migration_test < schema.sql

echo "=== Loading seed data ==="
psql migration_test < test_seed_data.sql

echo "=== Recording pre-migration state ==="
psql migration_test -c "SELECT COUNT(*) FROM users" > pre_counts.txt
psql migration_test -c "\d users" > pre_schema.txt

echo "=== Applying migration ==="
psql migration_test < migrations/0042_add_phone_column.sql

echo "=== Verifying migration ==="
psql migration_test -c "\d users"                          # Verify new column exists
psql migration_test -c "SELECT COUNT(*) FROM users" | diff pre_counts.txt -  # Verify no rows lost
psql migration_test -c "SELECT phone FROM users LIMIT 5"   # Verify new column is accessible

echo "=== Testing rollback ==="
psql migration_test < migrations/0042_rollback.sql

echo "=== Verifying rollback ==="
psql migration_test -c "\d users" | diff pre_schema.txt -  # Schema matches pre-migration state
psql migration_test -c "SELECT COUNT(*) FROM users" | diff pre_counts.txt -  # Verify no rows lost

echo "=== Cleanup ==="
dropdb migration_test
echo "=== Migration test PASSED ==="
```

Because `set -e` is in effect, any `diff` against the recorded pre-migration state that finds a difference aborts the script with a failure.
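The same record, apply, roll back, compare loop can be sketched in pure Python. This is an illustrative stand-in, not the script above: SQLite replaces PostgreSQL, the migration and rollback are inline `ALTER TABLE` statements, and it assumes an SQLite build with `DROP COLUMN` support (3.35+):

```python
import sqlite3

def schema_snapshot(conn):
    """Map each table to its ordered (column name, type) pairs."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    return {
        table: [(col[1], col[2]) for col in conn.execute(f"PRAGMA table_info({table})")]
        for table in tables
    }

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
before = schema_snapshot(conn)

# Apply a hypothetical migration, then its rollback
conn.execute("ALTER TABLE users ADD COLUMN phone TEXT")
assert schema_snapshot(conn) != before, "migration should change the schema"
conn.execute("ALTER TABLE users DROP COLUMN phone")

after = schema_snapshot(conn)
assert after == before, "rollback should restore the original schema"
print("rollback verified")
```

Comparing structured snapshots (column names and types) rather than raw `\d` output makes the equality check robust to formatting differences.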
## Testing Migrations in Python

```python
import subprocess
import time

import psycopg2
import pytest


@pytest.fixture
def migration_db():
    """Create a disposable database for migration testing."""
    db_name = "migration_test_" + str(int(time.time()))
    # Create database from production schema
    subprocess.run(["createdb", db_name], check=True)
    subprocess.run(
        f"pg_dump --schema-only production_db | psql {db_name}",
        shell=True, check=True
    )
    conn = psycopg2.connect(dbname=db_name)
    yield conn, db_name
    conn.close()
    subprocess.run(["dropdb", db_name], check=True)


def test_migration_applies_cleanly(migration_db):
    conn, db_name = migration_db
    # Apply migration
    subprocess.run(
        f"psql {db_name} < migrations/0042_add_phone_column.sql",
        shell=True, check=True
    )
    # Verify new column exists
    cursor = conn.cursor()
    cursor.execute("""
        SELECT column_name, data_type, is_nullable
        FROM information_schema.columns
        WHERE table_name = 'users' AND column_name = 'phone'
    """)
    row = cursor.fetchone()
    assert row is not None, "Phone column was not created"
    assert row[1] == "character varying"  # Expected data type


def test_migration_preserves_data(migration_db):
    conn, db_name = migration_db
    cursor = conn.cursor()
    # Insert known data before migration
    cursor.execute("""
        INSERT INTO users (name, email) VALUES
            ('Alice', 'alice@test.com'),
            ('Bob', 'bob@test.com')
    """)
    conn.commit()
    # Apply migration
    subprocess.run(
        f"psql {db_name} < migrations/0042_add_phone_column.sql",
        shell=True, check=True
    )
    # Verify data is preserved
    conn = psycopg2.connect(dbname=db_name)  # Reconnect for a fresh snapshot
    cursor = conn.cursor()
    cursor.execute("SELECT name, email FROM users ORDER BY name")
    rows = cursor.fetchall()
    assert len(rows) == 2
    assert rows[0][0] == "Alice"
    assert rows[1][0] == "Bob"


def test_migration_rollback(migration_db):
    conn, db_name = migration_db
    # Apply migration
    subprocess.run(
        f"psql {db_name} < migrations/0042_add_phone_column.sql",
        shell=True, check=True
    )
    # Roll back
    subprocess.run(
        f"psql {db_name} < migrations/0042_rollback.sql",
        shell=True, check=True
    )
    # Verify column is removed
    conn = psycopg2.connect(dbname=db_name)
    cursor = conn.cursor()
    cursor.execute("""
        SELECT column_name FROM information_schema.columns
        WHERE table_name = 'users' AND column_name = 'phone'
    """)
    assert cursor.fetchone() is None, "Phone column should not exist after rollback"


def test_not_null_column_has_default(migration_db):
    conn, db_name = migration_db
    cursor = conn.cursor()
    # Insert data before migration
    cursor.execute("INSERT INTO users (name, email) VALUES ('Test', 'test@test.com')")
    conn.commit()
    # Apply migration that adds a NOT NULL column
    subprocess.run(
        f"psql {db_name} < migrations/0043_add_status_column.sql",
        shell=True, check=True
    )
    # Verify existing rows received the default value
    conn = psycopg2.connect(dbname=db_name)
    cursor = conn.cursor()
    cursor.execute("SELECT status FROM users WHERE email = 'test@test.com'")
    status = cursor.fetchone()[0]
    assert status is not None, "Existing rows should have default status value"
    assert status == "active", "Default status should be 'active'"
```
## Performance Considerations

### Index Creation

Adding an index to a table with millions of rows can lock the table for minutes:

```sql
-- BAD: blocks writes to the table for the duration of the index build
CREATE INDEX idx_users_email ON users (email);

-- GOOD: builds the index without blocking writes (PostgreSQL).
-- Note: CONCURRENTLY cannot run inside a transaction block, and a
-- failed build leaves behind an INVALID index that must be dropped.
CREATE INDEX CONCURRENTLY idx_users_email ON users (email);
```
### Testing Migration Performance

```python
import subprocess
import time


def test_migration_completes_in_time(migration_db):
    """Migration should complete within acceptable time."""
    conn, db_name = migration_db
    # Load realistic data volume
    cursor = conn.cursor()
    cursor.execute("""
        INSERT INTO users (name, email)
        SELECT 'User ' || i, 'user' || i || '@test.com'
        FROM generate_series(1, 100000) AS i
    """)
    conn.commit()
    start = time.time()
    subprocess.run(
        f"psql {db_name} < migrations/0042_add_phone_column.sql",
        shell=True, check=True
    )
    duration = time.time() - start
    assert duration < 30, f"Migration took {duration:.1f}s (limit: 30s)"
```
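The same load-then-time pattern runs anywhere Python does. A self-contained sketch with SQLite standing in for the real database (the table, row count, and threshold are illustrative; tune them to your actual data volume and hardware):

```python
import sqlite3
import time

# Load a realistic row count, then time the schema change
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    ((f"user{i}@test.com",) for i in range(100_000)),
)
conn.commit()

start = time.time()
conn.execute("CREATE INDEX idx_users_email ON users (email)")
duration = time.time() - start

# Generous threshold for a sketch; the point is that the limit is asserted
assert duration < 30, f"Index creation took {duration:.1f}s (limit: 30s)"
print(f"Indexed 100,000 rows in {duration:.3f}s")
```

A timing assertion like this turns a slow migration from a production surprise into a failing test.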
## Migration Testing in CI

```yaml
# GitHub Actions migration testing
migration-test:
  runs-on: ubuntu-latest
  services:
    postgres:
      image: postgres:15
      env:
        POSTGRES_PASSWORD: testpass
      ports:
        - 5432:5432
      # Wait until the server accepts connections before running steps
      options: >-
        --health-cmd pg_isready
        --health-interval 10s
        --health-timeout 5s
        --health-retries 5
  env:
    PGPASSWORD: testpass  # Lets psql authenticate non-interactively
  steps:
    - uses: actions/checkout@v4
    - name: Load production schema
      run: psql -h localhost -U postgres -f schema_dump.sql
    - name: Run migration
      run: psql -h localhost -U postgres -f migrations/latest.sql
    - name: Verify migration
      run: pytest tests/migrations/ -v
    - name: Test rollback
      run: psql -h localhost -U postgres -f migrations/latest_rollback.sql
```
## Practical Exercise
- Take a migration file (or write one that adds a new column)
- Write a script that tests: applies cleanly, preserves data, rollback works
- Test what happens when the migration adds a NOT NULL column without a default to a table with existing data
- Measure migration time with 100,000 rows and set a performance threshold
- Add migration testing to a CI pipeline
## Key Takeaways
- Migrations are high-risk operations — always test on a production schema copy
- Test the full lifecycle: apply, verify, rollback, verify again
- Verify data preservation: known data before migration must exist after
- NOT NULL columns need defaults — test with existing data
- Performance testing prevents long table locks in production
- Automate migration testing in CI to catch issues before deployment