JOINs and Aggregation
When you find a bug, SQL helps you investigate scope, patterns, and root cause. JOINs combine data across tables. GROUP BY and HAVING aggregate data to reveal patterns. Together, they are your most powerful investigation tools.
JOIN Types
| JOIN Type | Returns | Use Case |
|---|---|---|
INNER JOIN |
Only matching rows from both tables | Orders with their items |
LEFT JOIN |
All rows from left, matching from right (NULL if none) | Users who may or may not have orders |
RIGHT JOIN |
All rows from right, matching from left (NULL if none) | Less common; same as LEFT JOIN with tables swapped |
FULL OUTER JOIN |
All rows from both tables | Orphaned records on both sides |
CROSS JOIN |
Every combination of rows | Generating test data combinations |
Visual Understanding
INNER JOIN: Only where both tables have matching data
LEFT JOIN: Everything from left table + matching from right
RIGHT JOIN: Everything from right table + matching from left
FULL OUTER: Everything from both, NULLs where no match
Common JOIN Patterns for QA
Finding Orphaned Records
-- Users who registered but never placed an order
SELECT u.id, u.email, u.created_at
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.id IS NULL;
-- Line items without a parent order (data integrity issue)
SELECT li.id, li.order_id, li.product_name
FROM line_items li
LEFT JOIN orders o ON o.id = li.order_id
WHERE o.id IS NULL;
-- Orders without line items (should never happen)
SELECT o.id, o.total, o.created_at
FROM orders o
LEFT JOIN line_items li ON li.order_id = o.id
WHERE li.id IS NULL;
Cross-Table Verification
-- Verify user's order count matches what the profile API shows
SELECT u.id, u.email, COUNT(o.id) AS actual_order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.email = 'alice@test.com'
GROUP BY u.id, u.email;
-- Compare API response "total_spent" with database reality
SELECT u.id, u.email, COALESCE(SUM(o.total), 0) AS actual_total_spent
FROM users u
LEFT JOIN orders o ON o.user_id = u.id AND o.status = 'completed'
WHERE u.email = 'alice@test.com'
GROUP BY u.id, u.email;
Multi-Table Investigation
-- Full order details: user, order, items, payment
SELECT
u.email AS customer,
o.id AS order_id,
o.status AS order_status,
li.product_name,
li.quantity,
li.unit_price,
p.method AS payment_method,
p.status AS payment_status
FROM orders o
JOIN users u ON u.id = o.user_id
JOIN line_items li ON li.order_id = o.id
LEFT JOIN payments p ON p.order_id = o.id
WHERE o.id = 'order-42';
GROUP BY and HAVING
GROUP BY collapses rows into groups. HAVING filters groups (like WHERE, but for aggregated data).
Detecting Duplicates
-- Duplicate emails (data integrity issue)
SELECT email, COUNT(*) AS count
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
-- Duplicate orders for the same user within 1 minute (possible double-click bug)
SELECT user_id, COUNT(*) AS order_count, MIN(created_at), MAX(created_at)
FROM orders
WHERE created_at > NOW() - INTERVAL '1 hour'
GROUP BY user_id
HAVING COUNT(*) > 1
AND MAX(created_at) - MIN(created_at) < INTERVAL '1 minute';
Error Analysis
-- Error distribution in last 24 hours
SELECT error_code, COUNT(*) AS occurrences,
MIN(created_at) AS first_seen,
MAX(created_at) AS last_seen
FROM error_logs
WHERE created_at > NOW() - INTERVAL '24 hours'
GROUP BY error_code
ORDER BY occurrences DESC;
-- Errors by hour (find peak error times)
SELECT DATE_TRUNC('hour', created_at) AS hour,
COUNT(*) AS error_count
FROM error_logs
WHERE created_at > NOW() - INTERVAL '7 days'
GROUP BY DATE_TRUNC('hour', created_at)
ORDER BY hour;
-- Error rate by endpoint
SELECT endpoint, COUNT(*) AS total_requests,
SUM(CASE WHEN status_code >= 500 THEN 1 ELSE 0 END) AS errors,
ROUND(100.0 * SUM(CASE WHEN status_code >= 500 THEN 1 ELSE 0 END) / COUNT(*), 2) AS error_rate
FROM request_logs
WHERE created_at > NOW() - INTERVAL '24 hours'
GROUP BY endpoint
HAVING SUM(CASE WHEN status_code >= 500 THEN 1 ELSE 0 END) > 0
ORDER BY error_rate DESC;
Business Metrics
-- Orders per status (verify expected distribution)
SELECT status, COUNT(*) AS count
FROM orders
GROUP BY status
ORDER BY count DESC;
-- Average order value by month
SELECT DATE_TRUNC('month', created_at) AS month,
COUNT(*) AS order_count,
ROUND(AVG(total), 2) AS avg_order_value,
ROUND(SUM(total), 2) AS total_revenue
FROM orders
WHERE status = 'completed'
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month;
-- Top 10 customers by order count
SELECT u.email, COUNT(o.id) AS order_count, SUM(o.total) AS total_spent
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE o.status = 'completed'
GROUP BY u.email
ORDER BY order_count DESC
LIMIT 10;
Subqueries
-- Users who have placed orders above the average order value
SELECT u.email, o.total
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE o.total > (SELECT AVG(total) FROM orders WHERE status = 'completed');
-- Products that have never been ordered
SELECT p.name
FROM products p
WHERE p.id NOT IN (
SELECT DISTINCT product_id FROM line_items
);
-- Most recent order for each user
SELECT u.email, o.id, o.total, o.created_at
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE o.created_at = (
SELECT MAX(o2.created_at)
FROM orders o2
WHERE o2.user_id = u.id
);
Using JOINs in Test Automation
def test_order_total_matches_line_items(db, api_client):
"""After creating an order, verify total equals sum of line items."""
# Create order via API
order = api_client.post("/orders", json={
"items": [
{"product_id": 1, "quantity": 2},
{"product_id": 2, "quantity": 1}
]
}).json()
# Verify in database
cursor = db.cursor()
cursor.execute("""
SELECT o.total, SUM(li.quantity * li.unit_price) AS calculated
FROM orders o
JOIN line_items li ON li.order_id = o.id
WHERE o.id = %s
GROUP BY o.total
""", (order["id"],))
row = cursor.fetchone()
assert row is not None
assert row[0] == row[1], f"Order total {row[0]} != calculated {row[1]}"
def test_no_orphaned_line_items(db):
"""No line items should exist without a parent order."""
cursor = db.cursor()
cursor.execute("""
SELECT COUNT(*)
FROM line_items li
LEFT JOIN orders o ON o.id = li.order_id
WHERE o.id IS NULL
""")
orphan_count = cursor.fetchone()[0]
assert orphan_count == 0, f"Found {orphan_count} orphaned line items"
Practical Exercise
- Write a query that finds users who registered but never placed an order
- Write a query that finds duplicate email addresses in the users table
- Write a query that shows error distribution by hour for the last 7 days
- Write a query that finds orders where the total does not match the sum of line items
- Integrate one of these queries into a pytest test
Key Takeaways
- LEFT JOIN finds orphaned records and missing relationships
- GROUP BY + HAVING reveals duplicates, patterns, and distributions
- Subqueries answer "compared to what?" questions (above average, never ordered)
- Use JOINs in test automation to cross-validate API data against the database
- Error analysis queries are invaluable for post-incident investigation