Testing Pyramid in CI
Structuring Pipeline Stages Around the Test Pyramid
Not all tests belong at the same stage. The testing pyramid dictates not just what kinds of tests you write, but when they run in your pipeline. Structure your pipeline to provide fast feedback first and expensive validation later.
```
          /\
         /  \
        /    \
       /  E2E \          Minutes | On merge to main
      /--------\
     /          \
    / Integration\       Minutes | On pull request
   /--------------\
  /   Unit Tests   \     Seconds | On every push
 /------------------\
```
Stage 1: On Every Push (Seconds to Minutes)
This is the fastest feedback loop. Developers get results before they context-switch away from the code they just wrote.
What runs here:
- Linting and static analysis (ESLint, Pylint, Ruff)
- Type checking (TypeScript's tsc --noEmit, mypy)
- Unit tests with coverage reporting
- Formatting checks (Prettier, Black)
Why these and not others:
- They are fast (typically under 2 minutes)
- They require no external services (databases, APIs)
- They catch the most common mistakes (syntax errors, type mismatches, broken logic)
```yaml
# Example: Fast feedback job
lint-and-unit:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-node@v4
      with:
        node-version: 20
        cache: 'npm'
    - run: npm ci
    - run: npm run lint
    - run: npm run typecheck
    - run: npm run test:unit -- --coverage
    - uses: actions/upload-artifact@v4
      if: always()
      with:
        name: coverage
        path: coverage/lcov.info
```
Target time: Under 3 minutes. If linting and unit tests take longer than this, developers will stop waiting for them.
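It also helps to cancel superseded runs, so a rapid series of pushes to the same branch does not queue up stale feedback. A minimal sketch using a GitHub Actions concurrency group (the group name is illustrative):

```yaml
# Cancel in-flight fast-feedback runs when a newer push to the same ref arrives
concurrency:
  group: fast-feedback-${{ github.ref }}
  cancel-in-progress: true
```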
Stage 2: On Pull Request / Merge to Develop (Minutes)
This stage runs when code is proposed for integration. It catches issues that require real infrastructure -- databases, message queues, external service contracts.
What runs here:
- Integration tests against real databases and services
- API contract tests (Pact, Schemathesis)
- Component tests (individual services or modules tested with real dependencies)
- Security scanning (SAST tools like Snyk, Semgrep)
```yaml
# Example: Integration test job with database service
integration-tests:
  needs: lint-and-unit
  runs-on: ubuntu-latest
  services:
    postgres:
      image: postgres:16
      env:
        POSTGRES_DB: test_db
        POSTGRES_PASSWORD: testpass
      ports:
        - 5432:5432
      options: >-
        --health-cmd pg_isready
        --health-interval 10s
        --health-timeout 5s
        --health-retries 5
    redis:
      image: redis:7
      ports:
        - 6379:6379
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-node@v4
      with:
        node-version: 20
        cache: 'npm'
    - run: npm ci
    - run: npm run db:migrate
      env:
        DATABASE_URL: postgres://postgres:testpass@localhost:5432/test_db
    - run: npm run test:integration
      env:
        DATABASE_URL: postgres://postgres:testpass@localhost:5432/test_db
        REDIS_URL: redis://localhost:6379
```
Target time: Under 10 minutes. This is the stage most prone to slowness -- monitor it closely.
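Because this stage is the one that tends to creep past its budget, a hard timeout turns slow growth into a visible failure instead of a silently longer pipeline. A sketch, using the 10-minute target above as the limit:

```yaml
# Fail the job loudly if the integration stage exceeds its time budget
integration-tests:
  needs: lint-and-unit
  runs-on: ubuntu-latest
  timeout-minutes: 10
```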
Stage 3: On Merge to Main / Pre-Deploy (Minutes to Tens of Minutes)
This is the final validation before code reaches users. Run the expensive, thorough tests here.
What runs here:
- Full browser test suite across multiple browsers
- Visual regression tests (Playwright visual comparisons, Percy, Chromatic)
- Performance smoke tests (Lighthouse, k6 with basic thresholds)
- Accessibility audits (axe-core, Pa11y)
```yaml
# Example: Cross-browser E2E tests with sharding
browser-tests:
  needs: integration-tests
  runs-on: ubuntu-latest
  strategy:
    fail-fast: false
    matrix:
      browser: [chromium, firefox, webkit]
      shard: [1, 2, 3, 4]
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-node@v4
      with:
        node-version: 20
        cache: 'npm'
    - run: npm ci
    - uses: actions/cache@v4
      with:
        path: ~/.cache/ms-playwright
        key: playwright-${{ hashFiles('package-lock.json') }}
    - run: npx playwright install --with-deps ${{ matrix.browser }}
    - run: npx playwright test --project=${{ matrix.browser }} --shard=${{ matrix.shard }}/4
    - uses: actions/upload-artifact@v4
      if: failure()
      with:
        name: traces-${{ matrix.browser }}-${{ matrix.shard }}
        path: test-results/
```
Target time: Under 20 minutes with parallelization. Without sharding, browser tests easily take 45+ minutes.
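One side effect of sharding is that each shard produces its own partial report. If you want a single combined report, one option is Playwright's blob reporter plus a downstream merge job; a sketch (artifact names are illustrative):

```yaml
# In each shard: emit a blob report and upload it
- run: npx playwright test --project=${{ matrix.browser }} --shard=${{ matrix.shard }}/4 --reporter=blob
- uses: actions/upload-artifact@v4
  with:
    name: blob-report-${{ matrix.browser }}-${{ matrix.shard }}
    path: blob-report/

# In a follow-up job that needs all shards: download and merge
- uses: actions/download-artifact@v4
  with:
    pattern: blob-report-*
    merge-multiple: true
    path: all-blob-reports
- run: npx playwright merge-reports --reporter html ./all-blob-reports
```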
Stage 4: Post-Deploy (Continuous)
After deployment, testing does not stop. Post-deploy checks verify that the application works in the actual production environment.
What runs here:
- Smoke tests against the deployed environment (critical user flows)
- Synthetic monitoring (scheduled tests that simulate user behavior)
- Canary analysis (comparing error rates between old and new versions)
- Health check endpoints
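The health-check portion can be as simple as polling an endpoint until the new version responds, before the heavier smoke suite runs. A minimal sketch (the /healthz path is an assumption; use whatever your service actually exposes):

```yaml
- name: Wait for deploy to become healthy
  run: |
    # Poll the health endpoint for up to 5 minutes (30 tries, 10s apart)
    for i in $(seq 1 30); do
      if curl -fsS "$BASE_URL/healthz" > /dev/null; then
        echo "healthy"; exit 0
      fi
      sleep 10
    done
    echo "service never became healthy"; exit 1
  env:
    BASE_URL: https://production.example.com
```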
```yaml
# Example: Post-deploy smoke tests
smoke-tests:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-node@v4
      with:
        node-version: 20
        cache: 'npm'
    - run: npm ci
    - run: npx playwright install --with-deps chromium
    - run: npx playwright test --project=smoke
      env:
        BASE_URL: https://production.example.com
    - name: Notify on failure
      if: failure()
      run: |
        curl -X POST ${{ secrets.SLACK_WEBHOOK }} \
          -H 'Content-Type: application/json' \
          -d '{"text":"Production smoke tests failed after deploy!"}'
```
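Synthetic monitoring is often just this same smoke suite run on a schedule rather than only after deploys. A sketch of the trigger (the 15-minute cadence is illustrative):

```yaml
# Run the smoke suite every 15 minutes, independent of deploys
on:
  schedule:
    - cron: '*/15 * * * *'
  workflow_dispatch: {}  # also allow manual runs
```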
Common Mistakes in Pipeline Test Structure
Running Everything on Every Push
If browser tests run on every push, developers wait 20 minutes for feedback on a one-line fix. Run only fast tests on push; save expensive tests for PR and merge events.
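One way to enforce this split is at the workflow trigger level, so the expensive suites are simply never wired to push events on feature branches. A sketch across three workflow files (branch names are assumptions):

```yaml
# fast-feedback.yml -- lint, types, unit tests
on:
  push:

# integration.yml -- integration and contract tests
on:
  pull_request:
    branches: [main, develop]

# browser-tests.yml -- full E2E suite
on:
  push:
    branches: [main]
```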
Skipping the Integration Layer
Teams often jump from unit tests to browser tests, missing the integration layer. This means bugs in database queries, API contracts, or service interactions are only caught by slow, brittle E2E tests.
Not Failing Fast Enough
If unit tests are broken, do not waste compute running integration and browser tests. Use job dependencies (needs) to create a pipeline where each stage gates the next.
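Within a single workflow, this gating is just a chain of needs; if an earlier job fails, GitHub Actions skips the downstream jobs automatically:

```yaml
jobs:
  lint-and-unit:
    runs-on: ubuntu-latest
    # ...fast checks...
  integration-tests:
    needs: lint-and-unit      # skipped automatically if lint-and-unit fails
    runs-on: ubuntu-latest
    # ...integration tests...
  browser-tests:
    needs: integration-tests  # gated on both prior stages passing
    runs-on: ubuntu-latest
    # ...browser tests...
```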
Identical Tests at Multiple Stages
If your integration tests already verify the login API, your browser tests do not need to test login through the API again. Browser tests should focus on UI behavior that cannot be tested at lower levels.
Mapping Test Types to Pipeline Stages
| Test Type | Stage | Trigger | Typical Duration | Purpose |
|---|---|---|---|---|
| Lint / Type check | 1 | Every push | 30s - 1m | Catch syntax and type errors |
| Unit tests | 1 | Every push | 30s - 3m | Verify logic in isolation |
| Integration tests | 2 | PR / merge to develop | 3 - 10m | Verify component interactions |
| Contract tests | 2 | PR / merge to develop | 1 - 5m | Verify API schemas match |
| Browser tests | 3 | Merge to main | 5 - 20m | Verify end-to-end user flows |
| Visual regression | 3 | Merge to main | 3 - 10m | Catch unintended UI changes |
| Performance tests | 3 | Merge to main | 5 - 15m | Verify response time budgets |
| Smoke tests | 4 | Post-deploy | 1 - 3m | Verify critical paths in production |
| Synthetic monitoring | 4 | Scheduled | 1 - 5m | Ongoing production health |
Hands-On Exercise
- Map your current test suite to the four pipeline stages. Which stage is each test type in? Are any stages missing?
- Measure the duration of each stage. Is any stage longer than its target time?
- Identify tests that are running at the wrong stage (e.g., slow tests on every push)
- Create a pipeline that has at least three stages with job dependencies between them
- Verify that a failure in stage 1 prevents stages 2 and 3 from running