Data Manipulation
Setting up test data directly in the database is faster than going through the UI or API. INSERT, UPDATE, and DELETE let you create precise test states, manipulate data for edge-case testing, and clean up after test runs. But with power comes responsibility — unsafe data manipulation in test environments can corrupt shared resources.
INSERT: Creating Test Data
-- Create a test user with specific attributes
INSERT INTO users (id, name, email, role, created_at)
VALUES ('test-uuid-001', 'Test User', 'testuser@automation.com', 'admin',
NOW() - INTERVAL '90 days');
-- Create multiple orders for the test user
INSERT INTO orders (id, user_id, status, total, created_at) VALUES
('order-001', 'test-uuid-001', 'completed', 99.99, NOW() - INTERVAL '30 days'),
('order-002', 'test-uuid-001', 'pending', 149.50, NOW() - INTERVAL '1 day'),
('order-003', 'test-uuid-001', 'cancelled', 75.00, NOW() - INTERVAL '15 days');
-- Create line items for an order
INSERT INTO line_items (order_id, product_id, quantity, unit_price) VALUES
('order-001', 'prod-a', 2, 25.00),
('order-001', 'prod-b', 1, 49.99);
Why Direct DB Setup Is Faster
| Method | Time to Create User + 3 Orders | Dependencies |
|---|---|---|
| UI clicks | Minutes | Browser, page loads, forms |
| API calls | Seconds | Auth flow, business logic |
| Direct SQL | Milliseconds | Database connection only |
For complex test scenarios (user with 100 orders, specific date distributions, edge-case data), SQL is orders of magnitude faster.
UPDATE: Modifying Test State
-- Set a user's password to expired (test expired password flow)
UPDATE users
SET password_expires_at = NOW() - INTERVAL '1 day'
WHERE email = 'testuser@automation.com';
-- Set an order to a specific status (test status-dependent logic)
UPDATE orders
SET status = 'shipped', shipped_at = NOW() - INTERVAL '2 hours'
WHERE id = 'order-001';
-- Simulate a user who has been inactive for 6 months
UPDATE users
SET last_login_at = NOW() - INTERVAL '6 months'
WHERE email = 'testuser@automation.com';
DELETE: Cleaning Up
-- Cleanup in reverse dependency order (child records first)
DELETE FROM line_items WHERE order_id IN (
SELECT id FROM orders WHERE user_id = 'test-uuid-001'
);
DELETE FROM orders WHERE user_id = 'test-uuid-001';
DELETE FROM users WHERE id = 'test-uuid-001';
Why Dependency Order Matters
Foreign key constraints prevent deleting a parent record while child records exist. If you try to delete a user who has orders, the database will reject the deletion. Always delete in reverse dependency order: line_items first, then orders, then users.
Transactions: The Safety Net
Transactions ensure that test data changes are atomic (all or nothing) and can be rolled back.
Transaction for Test Setup
-- Everything succeeds or nothing does
BEGIN;
INSERT INTO users (id, name, email, role)
VALUES ('test-uuid-002', 'Transaction User', 'txn@test.com', 'viewer');
INSERT INTO orders (id, user_id, status, total)
VALUES ('order-txn', 'test-uuid-002', 'pending', 50.00);
-- If either INSERT fails, ROLLBACK undoes everything
COMMIT;
Transaction-Based Test Isolation
The most powerful pattern for test data management: wrap each test in a transaction and roll it back afterward. The database returns to its original state after every test.
@pytest.fixture
def db_transaction(db):
"""Wrap each test in a transaction that rolls back."""
db.autocommit = False
yield db
db.rollback() # All changes are undone after each test
def test_user_creation(db_transaction, api_client):
api_client.post("/users", json={"name": "Alice", "email": "alice@test.com"})
cursor = db_transaction.cursor()
cursor.execute("SELECT * FROM users WHERE email = 'alice@test.com'")
assert cursor.fetchone() is not None
# After the test, db.rollback() removes the user automatically
def test_another_test(db_transaction):
# This test starts with a clean database — no Alice from the previous test
cursor = db_transaction.cursor()
cursor.execute("SELECT * FROM users WHERE email = 'alice@test.com'")
assert cursor.fetchone() is None # Alice does not exist
Transaction Limitations
| Scenario | Transaction Rollback Works? |
|---|---|
| Test creates data via direct SQL | Yes |
| Test creates data via API call to same DB | Yes (if API uses same connection) |
| Test creates data via API call to separate service | No (different connection/transaction) |
| Test modifies external systems (S3, email) | No (cannot rollback external side effects) |
For API tests where rollback is not possible, use cleanup fixtures instead:
@pytest.fixture
def create_user(api_client):
created_ids = []
def _create(**kwargs):
r = api_client.post("/users", json=kwargs)
user = r.json()
created_ids.append(user["id"])
return user
yield _create
for uid in created_ids:
api_client.delete(f"/users/{uid}")
Safe Practices
| Practice | Why |
|---|---|
| Use transactions with rollback | BEGIN; ... ROLLBACK; — test data never persists |
| Use identifiable prefixes | test-*, automation-* — easy to find and clean up |
| Never run destructive SQL against production | Use read-only credentials for prod verification |
| Clean up in reverse dependency order | Avoid foreign key constraint violations |
| Use unique identifiers | Prevent collisions between parallel test runs |
Identifiable Test Data
-- Prefix all test data for easy identification and cleanup
INSERT INTO users (id, name, email) VALUES
('test-auto-001', 'AUTO Test User 1', 'auto-test-1@automation.test'),
('test-auto-002', 'AUTO Test User 2', 'auto-test-2@automation.test');
-- Emergency cleanup: find and remove all automation test data
DELETE FROM orders WHERE user_id LIKE 'test-auto-%';
DELETE FROM users WHERE id LIKE 'test-auto-%';
-- OR
DELETE FROM users WHERE email LIKE '%@automation.test';
Read-Only Production Access
@pytest.fixture(scope="session")
def prod_db():
"""Read-only connection to production for verification queries only."""
conn = psycopg2.connect(
host=os.environ["PROD_DB_HOST"],
dbname="production",
user="readonly_user", # Read-only database user
password=os.environ["PROD_DB_READONLY_PASS"],
options="-c default_transaction_read_only=on" # Extra safety
)
yield conn
conn.close()
Bulk Data Generation
For performance testing or large data set scenarios:
-- Generate 1000 test users
INSERT INTO users (id, name, email, role, created_at)
SELECT
'perf-test-' || generate_series AS id,
'Perf User ' || generate_series AS name,
'perftest' || generate_series || '@test.com' AS email,
CASE WHEN generate_series % 3 = 0 THEN 'admin'
WHEN generate_series % 3 = 1 THEN 'editor'
ELSE 'viewer' END AS role,
NOW() - (generate_series || ' days')::INTERVAL AS created_at
FROM generate_series(1, 1000);
Practical Exercise
- Write INSERT statements to create a user with 3 orders and 2 line items per order
- Write a cleanup script that deletes all test data in the correct dependency order
- Create a pytest fixture that wraps tests in transactions with rollback
- Write a test that creates data via API and verifies it in the database, then verify the rollback cleaned it up
- Write a bulk data generation script that creates 500 users with random roles
Key Takeaways
- Direct SQL is the fastest way to create test data
- Always clean up in reverse dependency order (children before parents)
- Transaction rollback is the cleanest test isolation pattern
- Use identifiable prefixes for test data (
test-*,auto-*) - Never run write operations against production — use read-only credentials
- For API-created data that cannot be rolled back, use cleanup fixtures