Data Manipulation

Setting up test data directly in the database is faster than going through the UI or API. INSERT, UPDATE, and DELETE let you create precise test states, manipulate data for edge-case testing, and clean up after test runs. But with power comes responsibility — unsafe data manipulation in test environments can corrupt shared resources.

INSERT: Creating Test Data

-- Create a test user with specific attributes
INSERT INTO users (id, name, email, role, created_at)
VALUES ('test-uuid-001', 'Test User', 'testuser@automation.com', 'admin',
        NOW() - INTERVAL '90 days');

-- Create multiple orders for the test user
INSERT INTO orders (id, user_id, status, total, created_at) VALUES
    ('order-001', 'test-uuid-001', 'completed', 99.99, NOW() - INTERVAL '30 days'),
    ('order-002', 'test-uuid-001', 'pending',  149.50, NOW() - INTERVAL '1 day'),
    ('order-003', 'test-uuid-001', 'cancelled', 75.00, NOW() - INTERVAL '15 days');

-- Create line items for an order
INSERT INTO line_items (order_id, product_id, quantity, unit_price) VALUES
    ('order-001', 'prod-a', 2, 25.00),
    ('order-001', 'prod-b', 1, 49.99);

Why Direct DB Setup Is Faster

Method	Time to Create User + 3 Orders	Dependencies
UI clicks	Minutes	Browser, page loads, forms
API calls	Seconds	Auth flow, business logic
Direct SQL	Milliseconds	Database connection only

For complex test scenarios (user with 100 orders, specific date distributions, edge-case data), SQL is orders of magnitude faster.

UPDATE: Modifying Test State

-- Set a user's password to expired (test expired password flow)
UPDATE users
SET password_expires_at = NOW() - INTERVAL '1 day'
WHERE email = 'testuser@automation.com';

-- Set an order to a specific status (test status-dependent logic)
UPDATE orders
SET status = 'shipped', shipped_at = NOW() - INTERVAL '2 hours'
WHERE id = 'order-001';

-- Simulate a user who has been inactive for 6 months
UPDATE users
SET last_login_at = NOW() - INTERVAL '6 months'
WHERE email = 'testuser@automation.com';

DELETE: Cleaning Up

-- Cleanup in reverse dependency order (child records first)
DELETE FROM line_items WHERE order_id IN (
    SELECT id FROM orders WHERE user_id = 'test-uuid-001'
);
DELETE FROM orders WHERE user_id = 'test-uuid-001';
DELETE FROM users WHERE id = 'test-uuid-001';

Why Dependency Order Matters

Foreign key constraints prevent deleting a parent record while child records exist. If you try to delete a user who has orders, the database will reject the deletion. Always delete in reverse dependency order: line_items first, then orders, then users.

Transactions: The Safety Net

Transactions ensure that test data changes are atomic (all or nothing) and can be rolled back.

Transaction for Test Setup

-- Everything succeeds or nothing does
BEGIN;

INSERT INTO users (id, name, email, role)
VALUES ('test-uuid-002', 'Transaction User', 'txn@test.com', 'viewer');

INSERT INTO orders (id, user_id, status, total)
VALUES ('order-txn', 'test-uuid-002', 'pending', 50.00);

-- If either INSERT fails, ROLLBACK undoes everything
COMMIT;

Transaction-Based Test Isolation

The most powerful pattern for test data management: wrap each test in a transaction and roll it back afterward. The database returns to its original state after every test.

@pytest.fixture
def db_transaction(db):
    """Wrap each test in a transaction that rolls back."""
    db.autocommit = False
    yield db
    db.rollback()  # All changes are undone after each test

def test_user_creation(db_transaction, api_client):
    api_client.post("/users", json={"name": "Alice", "email": "alice@test.com"})

    cursor = db_transaction.cursor()
    cursor.execute("SELECT * FROM users WHERE email = 'alice@test.com'")
    assert cursor.fetchone() is not None
    # After the test, db.rollback() removes the user automatically

def test_another_test(db_transaction):
    # This test starts with a clean database — no Alice from the previous test
    cursor = db_transaction.cursor()
    cursor.execute("SELECT * FROM users WHERE email = 'alice@test.com'")
    assert cursor.fetchone() is None  # Alice does not exist

Transaction Limitations

Scenario	Transaction Rollback Works?
Test creates data via direct SQL	Yes
Test creates data via API call to same DB	Yes (if API uses same connection)
Test creates data via API call to separate service	No (different connection/transaction)
Test modifies external systems (S3, email)	No (cannot rollback external side effects)

For API tests where rollback is not possible, use cleanup fixtures instead:

@pytest.fixture
def create_user(api_client):
    created_ids = []

    def _create(**kwargs):
        r = api_client.post("/users", json=kwargs)
        user = r.json()
        created_ids.append(user["id"])
        return user

    yield _create

    for uid in created_ids:
        api_client.delete(f"/users/{uid}")

Safe Practices

Practice	Why
Use transactions with rollback	`BEGIN; ... ROLLBACK;` — test data never persists
Use identifiable prefixes	`test-`, `automation-` — easy to find and clean up
Never run destructive SQL against production	Use read-only credentials for prod verification
Clean up in reverse dependency order	Avoid foreign key constraint violations
Use unique identifiers	Prevent collisions between parallel test runs

Identifiable Test Data

-- Prefix all test data for easy identification and cleanup
INSERT INTO users (id, name, email) VALUES
    ('test-auto-001', 'AUTO Test User 1', 'auto-test-1@automation.test'),
    ('test-auto-002', 'AUTO Test User 2', 'auto-test-2@automation.test');

-- Emergency cleanup: find and remove all automation test data
DELETE FROM orders WHERE user_id LIKE 'test-auto-%';
DELETE FROM users WHERE id LIKE 'test-auto-%';
-- OR
DELETE FROM users WHERE email LIKE '%@automation.test';

Read-Only Production Access

@pytest.fixture(scope="session")
def prod_db():
    """Read-only connection to production for verification queries only."""
    conn = psycopg2.connect(
        host=os.environ["PROD_DB_HOST"],
        dbname="production",
        user="readonly_user",      # Read-only database user
        password=os.environ["PROD_DB_READONLY_PASS"],
        options="-c default_transaction_read_only=on"  # Extra safety
    )
    yield conn
    conn.close()

Bulk Data Generation

For performance testing or large data set scenarios:

-- Generate 1000 test users
INSERT INTO users (id, name, email, role, created_at)
SELECT
    'perf-test-' || generate_series AS id,
    'Perf User ' || generate_series AS name,
    'perftest' || generate_series || '@test.com' AS email,
    CASE WHEN generate_series % 3 = 0 THEN 'admin'
         WHEN generate_series % 3 = 1 THEN 'editor'
         ELSE 'viewer' END AS role,
    NOW() - (generate_series || ' days')::INTERVAL AS created_at
FROM generate_series(1, 1000);

Practical Exercise

Write INSERT statements to create a user with 3 orders and 2 line items per order
Write a cleanup script that deletes all test data in the correct dependency order
Create a pytest fixture that wraps tests in transactions with rollback
Write a test that creates data via API and verifies it in the database, then verify the rollback cleaned it up
Write a bulk data generation script that creates 500 users with random roles

Key Takeaways

Direct SQL is the fastest way to create test data
Always clean up in reverse dependency order (children before parents)
Transaction rollback is the cleanest test isolation pattern
Use identifiable prefixes for test data (test-*, auto-*)
Never run write operations against production — use read-only credentials
For API-created data that cannot be rolled back, use cleanup fixtures