Testing Pyramid in CI
Structuring Pipeline Stages Around the Test Pyramid
Not all tests belong at the same stage. The testing pyramid dictates not just what kinds of tests you write, but when they run in your pipeline. Structure your pipeline to provide fast feedback first and expensive validation later.
```
          /\
         /  \
        /    \
       /  E2E \          Minutes | On merge to main
      /--------\
     /          \
    / Integration\       Minutes | On pull request
   /--------------\
  /   Unit Tests   \     Seconds | On every push
 /------------------\
```
Stage 1: On Every Push (Seconds to Minutes)
This is the fastest feedback loop. Developers get results before they context-switch away from the code they just wrote.
What runs here:
- Linting and static analysis (ESLint, Pylint, Ruff)
- Type checking (TypeScript's tsc --noEmit, mypy)
- Unit tests with coverage reporting
- Formatting checks (Prettier, Black)
Why these and not others:
- They are fast (typically under 2 minutes)
- They require no external services (databases, APIs)
- They catch the most common mistakes (syntax errors, type mismatches, broken logic)
```yaml
# Example: Fast feedback job
lint-and-unit:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-node@v4
      with:
        node-version: 20
        cache: 'npm'
    - run: npm ci
    - run: npm run lint
    - run: npm run typecheck
    - run: npm run test:unit -- --coverage
    - uses: actions/upload-artifact@v4
      if: always()
      with:
        name: coverage
        path: coverage/lcov.info
```
Target time: Under 3 minutes. If linting and unit tests take longer than this, developers will stop waiting for them.
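It also helps to cancel superseded runs, so a rapid series of pushes to the same branch does not queue up stale feedback. A minimal sketch using a GitHub Actions concurrency group (the group name is illustrative):

```yaml
# Cancel in-flight fast-feedback runs when a newer push to the same ref arrives
concurrency:
  group: fast-feedback-${{ github.ref }}
  cancel-in-progress: true
```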
Stage 2: On Pull Request / Merge to Develop (Minutes)
This stage runs when code is proposed for integration. It catches issues that require real infrastructure -- databases, message queues, external service contracts.
What runs here:
- Integration tests against real databases and services
- API contract tests (Pact, Schemathesis)
- Component tests (individual services or modules tested with real dependencies)
- Security scanning (SAST tools like Snyk, Semgrep)
```yaml
# Example: Integration test job with database service
integration-tests:
  needs: lint-and-unit
  runs-on: ubuntu-latest
  services:
    postgres:
      image: postgres:16
      env:
        POSTGRES_DB: test_db
        POSTGRES_PASSWORD: testpass
      ports:
        - 5432:5432
      options: >-
        --health-cmd pg_isready
        --health-interval 10s
        --health-timeout 5s
        --health-retries 5
    redis:
      image: redis:7
      ports:
        - 6379:6379
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-node@v4
      with:
        node-version: 20
        cache: 'npm'
    - run: npm ci
    - run: npm run db:migrate
      env:
        DATABASE_URL: postgres://postgres:testpass@localhost:5432/test_db
    - run: npm run test:integration
      env:
        DATABASE_URL: postgres://postgres:testpass@localhost:5432/test_db
        REDIS_URL: redis://localhost:6379
```
Target time: Under 10 minutes. This is the stage most prone to slowness -- monitor it closely.
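Because this stage is the one that tends to creep past its budget, a hard timeout turns slow growth into a visible failure instead of a silently longer pipeline. A sketch, using the 10-minute target above as the limit:

```yaml
# Fail the job loudly if the integration stage exceeds its time budget
integration-tests:
  needs: lint-and-unit
  runs-on: ubuntu-latest
  timeout-minutes: 10
```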
Stage 3: On Merge to Main / Pre-Deploy (Minutes to Tens of Minutes)
This is the final validation before code reaches users. Run the expensive, thorough tests here.
What runs here:
- Full browser test suite across multiple browsers
- Visual regression tests (Playwright visual comparisons, Percy, Chromatic)
- Performance smoke tests (Lighthouse, k6 with basic thresholds)
- Accessibility audits (axe-core, Pa11y)
```yaml
# Example: Cross-browser E2E tests with sharding
browser-tests:
  needs: integration-tests
  runs-on: ubuntu-latest
  strategy:
    fail-fast: false
    matrix:
      browser: [chromium, firefox, webkit]
      shard: [1, 2, 3, 4]
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-node@v4
      with:
        node-version: 20
        cache: 'npm'
    - run: npm ci
    - uses: actions/cache@v4
      with:
        path: ~/.cache/ms-playwright
        key: playwright-${{ hashFiles('package-lock.json') }}
    - run: npx playwright install --with-deps ${{ matrix.browser }}
    - run: npx playwright test --project=${{ matrix.browser }} --shard=${{ matrix.shard }}/4
    - uses: actions/upload-artifact@v4
      if: failure()
      with:
        name: traces-${{ matrix.browser }}-${{ matrix.shard }}
        path: test-results/
```
Target time: Under 20 minutes with parallelization. Without sharding, browser tests easily take 45+ minutes.
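One side effect of sharding is that each shard produces its own partial report. If you want a single combined report, one option is Playwright's blob reporter plus a downstream merge job; a sketch (artifact names are illustrative):

```yaml
# In each shard: emit a blob report and upload it
- run: npx playwright test --project=${{ matrix.browser }} --shard=${{ matrix.shard }}/4 --reporter=blob
- uses: actions/upload-artifact@v4
  with:
    name: blob-report-${{ matrix.browser }}-${{ matrix.shard }}
    path: blob-report/

# In a follow-up job that needs all shards: download and merge
- uses: actions/download-artifact@v4
  with:
    pattern: blob-report-*
    merge-multiple: true
    path: all-blob-reports
- run: npx playwright merge-reports --reporter html ./all-blob-reports
```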
Stage 4: Post-Deploy (Continuous)
After deployment, testing does not stop. Post-deploy checks verify that the application works in the actual production environment.
What runs here:
- Smoke tests against the deployed environment (critical user flows)
- Synthetic monitoring (scheduled tests that simulate user behavior)
- Canary analysis (comparing error rates between old and new versions)
- Health check endpoints
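The health-check portion can be as simple as polling an endpoint until the new version responds, before the heavier smoke suite runs. A minimal sketch (the /healthz path is an assumption; use whatever your service actually exposes):

```yaml
- name: Wait for deploy to become healthy
  run: |
    # Poll the health endpoint for up to 5 minutes (30 tries, 10s apart)
    for i in $(seq 1 30); do
      if curl -fsS "$BASE_URL/healthz" > /dev/null; then
        echo "healthy"; exit 0
      fi
      sleep 10
    done
    echo "service never became healthy"; exit 1
  env:
    BASE_URL: https://production.example.com
```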
```yaml
# Example: Post-deploy smoke tests
smoke-tests:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-node@v4
      with:
        node-version: 20
        cache: 'npm'
    - run: npm ci
    - run: npx playwright install --with-deps chromium
    - run: npx playwright test --project=smoke
      env:
        BASE_URL: https://production.example.com
    - name: Notify on failure
      if: failure()
      run: |
        curl -X POST ${{ secrets.SLACK_WEBHOOK }} \
          -H 'Content-Type: application/json' \
          -d '{"text":"Production smoke tests failed after deploy!"}'
```
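Synthetic monitoring is often just this same smoke suite run on a schedule rather than only after deploys. A sketch of the trigger (the 15-minute cadence is illustrative):

```yaml
# Run the smoke suite every 15 minutes, independent of deploys
on:
  schedule:
    - cron: '*/15 * * * *'
  workflow_dispatch: {}  # also allow manual runs
```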
Common Mistakes in Pipeline Test Structure
Running Everything on Every Push
If browser tests run on every push, developers wait 20 minutes for feedback on a one-line fix. Run only fast tests on push; save expensive tests for PR and merge events.
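One way to enforce this split is at the workflow trigger level, so the expensive suites are simply never wired to push events on feature branches. A sketch across three workflow files (branch names are assumptions):

```yaml
# fast-feedback.yml -- lint, types, unit tests
on:
  push:

# integration.yml -- integration and contract tests
on:
  pull_request:
    branches: [main, develop]

# browser-tests.yml -- full E2E suite
on:
  push:
    branches: [main]
```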
Skipping the Integration Layer
Teams often jump from unit tests to browser tests, missing the integration layer. This means bugs in database queries, API contracts, or service interactions are only caught by slow, brittle E2E tests.
Not Failing Fast Enough
If unit tests are broken, do not waste compute running integration and browser tests. Use job dependencies (needs) to create a pipeline where each stage gates the next.
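Within a single workflow, this gating is just a chain of needs; if an earlier job fails, GitHub Actions skips the downstream jobs automatically:

```yaml
jobs:
  lint-and-unit:
    runs-on: ubuntu-latest
    # ...fast checks...
  integration-tests:
    needs: lint-and-unit      # skipped automatically if lint-and-unit fails
    runs-on: ubuntu-latest
    # ...integration tests...
  browser-tests:
    needs: integration-tests  # gated on both prior stages passing
    runs-on: ubuntu-latest
    # ...browser tests...
```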
Identical Tests at Multiple Stages
If your integration tests already verify the login API, your browser tests do not need to test login through the API again. Browser tests should focus on UI behavior that cannot be tested at lower levels.
Mapping Test Types to Pipeline Stages
| Test Type | Stage | Trigger | Typical Duration | Purpose |
|---|---|---|---|---|
| Lint / Type check | 1 | Every push | 30s - 1m | Catch syntax and type errors |
| Unit tests | 1 | Every push | 30s - 3m | Verify logic in isolation |
| Integration tests | 2 | PR / merge to develop | 3 - 10m | Verify component interactions |
| Contract tests | 2 | PR / merge to develop | 1 - 5m | Verify API schemas match |
| Browser tests | 3 | Merge to main | 5 - 20m | Verify end-to-end user flows |
| Visual regression | 3 | Merge to main | 3 - 10m | Catch unintended UI changes |
| Performance tests | 3 | Merge to main | 5 - 15m | Verify response time budgets |
| Smoke tests | 4 | Post-deploy | 1 - 3m | Verify critical paths in production |
| Synthetic monitoring | 4 | Scheduled | 1 - 5m | Ongoing production health |
Hands-On Exercise
- Map your current test suite to the four pipeline stages. Which stage is each test type in? Are any stages missing?
- Measure the duration of each stage. Is any stage longer than its target time?
- Identify tests that are running at the wrong stage (e.g., slow tests on every push)
- Create a pipeline that has at least three stages with job dependencies between them
- Verify that a failure in stage 1 prevents stages 2 and 3 from running