# Artifact Management

## Why Artifacts Matter

Every test run should produce artifacts that help you diagnose failures without re-running the pipeline. When a browser test fails at 2 AM on a nightly run, you should be able to download the trace, screenshot, and logs the next morning and understand exactly what happened, without triggering another 20-minute pipeline run.
## Essential Artifact Types

### Test Reports

Test reports are the primary artifacts. They tell you what passed, what failed, and provide details for each failure.
| Format | Best For | Consumed By |
|---|---|---|
| JUnit XML | Universal standard; every CI platform can parse it | GitHub Actions annotations, GitLab MR widgets, Jenkins test results |
| HTML reports | Human-readable, shareable with non-technical stakeholders | Browsers, Slack links |
| Allure reports | Rich interactive reports with history and trends | Allure server, static hosting |
| JSON results | Programmatic analysis, custom dashboards | Scripts, Grafana, custom tools |
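For the JSON route, the analysis side is just a short script. A minimal sketch of counting failures from a simplified results shape (real Playwright JSON output is richer; this shape is an assumption for illustration):

```typescript
// Sketch: count failed tests in a simplified JSON results shape.
// The real Playwright JSON reporter output has more nesting; this
// flat shape is an assumption for illustration only.
interface TestResult {
  title: string;
  status: 'passed' | 'failed' | 'skipped';
}

function countFailures(results: TestResult[]): number {
  return results.filter((r) => r.status === 'failed').length;
}
```

The same pattern extends to feeding pass/fail counts into a custom dashboard.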
```yaml
# Generate JUnit XML for CI platform integration
- run: npx playwright test --reporter=junit
  env:
    PLAYWRIGHT_JUNIT_OUTPUT_NAME: results.xml

# Generate HTML for human consumption
- run: npx playwright test --reporter=html

# Generate both
- run: npx playwright test --reporter=junit,html
```

**Pro tip:** Configure your test runner to produce JUnit XML by default (for the CI platform to parse) and HTML reports on demand (for manual investigation).
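That default can be baked into the config rather than passed as CLI flags. A sketch of a `playwright.config.ts` reporter setup (the output file name is an arbitrary choice, not a default):

```typescript
// playwright.config.ts -- JUnit XML always, HTML written to disk
// for on-demand viewing. 'results.xml' is an arbitrary path choice.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  reporter: [
    ['junit', { outputFile: 'results.xml' }],
    ['html', { open: 'never' }], // don't auto-open the report locally
  ],
});
```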
### Screenshots and Videos

Browser test failures are almost impossible to diagnose without visual evidence. Configure your test framework to capture screenshots on failure and optionally record video.
```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
    trace: 'retain-on-failure',
  },
});
```
Playwright traces are especially valuable. A trace captures a timeline of every action, network request, DOM snapshot, and console log. You can open them in the Trace Viewer and step through the test execution like a debugger.
```yaml
- uses: actions/upload-artifact@v4
  if: failure()
  with:
    name: playwright-traces
    path: test-results/
    retention-days: 14
```
### Coverage Reports
Coverage reports track which lines of code were exercised during testing. They are useful for:
- Identifying untested code paths
- Tracking coverage trends over time
- Verifying that new code has tests
```yaml
- run: npm run test:unit -- --coverage --coverageReporters=lcov --coverageReporters=text

- uses: actions/upload-artifact@v4
  if: always()
  with:
    name: coverage-report
    path: coverage/
```
**Important:** Track coverage trends, not absolute numbers. A coverage threshold of 80% is meaningless if the uncovered 20% contains the most critical business logic. Better: require that new code is covered, not that the entire codebase meets a threshold.
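Trend tracking starts with one number per run. A minimal sketch of extracting the line-coverage percentage from an lcov report (only the `DA:<line>,<hits>` records matter for this):

```typescript
// Sketch: compute line coverage from lcov-format text.
// lcov records executed lines as "DA:<line>,<hits>"; everything
// else (SF:, end_of_record, ...) is ignored here.
function lcovLineCoverage(lcov: string): number {
  let hit = 0;
  let total = 0;
  for (const line of lcov.split('\n')) {
    if (line.startsWith('DA:')) {
      total += 1;
      const hits = parseInt(line.slice(3).split(',')[1], 10);
      if (hits > 0) hit += 1;
    }
  }
  return total === 0 ? 0 : (hit / total) * 100;
}
```

Storing this number per run (tagged with branch and commit) is enough to plot a trend line.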
### Logs

Application logs from test containers, browser console logs, and network HAR files provide context that test reports alone cannot capture.
```yaml
# Capture Docker container logs
- name: Capture service logs
  if: failure()
  run: |
    docker logs test-db > postgres.log 2>&1
    docker logs test-app > application.log 2>&1

- uses: actions/upload-artifact@v4
  if: failure()
  with:
    name: service-logs
    path: |
      postgres.log
      application.log
```
### HAR Files (Network Traffic)
HAR (HTTP Archive) files capture every network request and response during a test. They are invaluable for debugging API-related test failures.
```typescript
// Capture a HAR file in Playwright
const context = await browser.newContext({
  recordHar: { path: 'test-results/network.har' },
});

// ... run tests ...

await context.close(); // HAR file is written on close
```
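A HAR file is plain JSON, so triage can be scripted. A sketch that lists failed requests, assuming the standard HAR 1.2 `log.entries` shape:

```typescript
// Sketch: pull failed requests out of a HAR file for quick triage.
// Assumes the standard HAR 1.2 shape: log.entries[].request/response.
interface HarEntry {
  request: { method: string; url: string };
  response: { status: number };
}

interface Har {
  log: { entries: HarEntry[] };
}

function failedRequests(har: Har): string[] {
  return har.log.entries
    .filter((e) => e.response.status >= 400)
    .map((e) => `${e.response.status} ${e.request.method} ${e.request.url}`);
}
```

Running this over a downloaded HAR narrows an API-related failure to the offending requests in seconds.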
## Retention Policies
Artifacts consume storage. Set retention policies that balance debugging needs with costs.
| Artifact Type | Passing Runs | Failing Runs | Rationale |
|---|---|---|---|
| Test reports (XML/HTML) | 7 days | 30 days | Failing reports are needed for investigation |
| Screenshots/traces | Do not upload | 14 days | Only useful for debugging failures |
| Coverage reports | 30 days | 30 days | Needed for trend analysis |
| Logs | 3 days | 14 days | Useful for root cause analysis |
| HAR files | Do not upload | 7 days | Large files; only needed for API debugging |
```yaml
# GitHub Actions retention configuration
- uses: actions/upload-artifact@v4
  with:
    name: test-report
    path: test-results/
    retention-days: 14 # Override the repository default (90 days)
```
## Organizing Artifacts
When a matrix strategy produces multiple artifacts, name them clearly so you can find the right one.
```yaml
# Bad: generic names
name: test-results # Which browser? Which shard?

# Good: descriptive names with matrix values
name: traces-${{ matrix.browser }}-${{ matrix.shard }}
# Produces: traces-chromium-1, traces-chromium-2, traces-firefox-1, etc.
```
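If you script artifact handling (download, cleanup, reporting), the same convention is easy to encode in a helper. The function name and shape here are illustrative, not part of any CI API:

```typescript
// Sketch: build a descriptive artifact name from matrix values.
// Purely illustrative -- not part of GitHub Actions or any CI API.
function artifactName(
  job: string,
  matrix: Record<string, string | number>,
): string {
  // Insertion order of the matrix keys determines the name segments.
  const parts = Object.values(matrix).map(String);
  return [job, ...parts].join('-');
}
```

For example, `artifactName('traces', { browser: 'chromium', shard: 1 })` yields `traces-chromium-1`, matching the naming scheme above.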
For multi-job pipelines, prefix artifacts with the job name:
```yaml
# Job: unit-tests
name: unit-coverage

# Job: integration-tests
name: integration-report

# Job: browser-tests
name: browser-traces-${{ matrix.browser }}
```
## Consuming Artifacts

### In Pull Requests

Most CI platforms can parse JUnit XML and display test results directly in PRs:

- **GitHub Actions**: Use the `dorny/test-reporter` action to add test results as PR checks
- **GitLab CI**: Use `artifacts:reports:junit` to show results in the merge request widget
- **Jenkins**: The JUnit plugin parses the XML and shows results on the build page
```yaml
# GitHub Actions: show test results in the PR
- uses: dorny/test-reporter@v1
  if: always()
  with:
    name: Playwright Tests
    path: results.xml
    reporter: java-junit
```
### Downloading for Local Debugging

```shell
# GitHub CLI: download artifacts from a specific run
gh run download 12345678 -n playwright-traces

# Open Playwright traces locally
npx playwright show-trace test-results/trace.zip
```
### Aggregating Across Runs
For trend analysis, push test results to an external system:
- **Allure TestOps**: Aggregates results across runs and shows trends
- **Grafana + InfluxDB**: Custom dashboards for test metrics
- **TestRail / Zephyr**: Upload results via API for traceability
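For the Grafana + InfluxDB route, each run boils down to a few numbers pushed per pipeline. A sketch that encodes them as InfluxDB line protocol (the `test_runs` measurement and `branch` tag names are made up for illustration):

```typescript
// Sketch: encode one test run's counts as InfluxDB line protocol.
// Measurement name (test_runs) and tag key (branch) are made-up
// examples; the "i" suffix marks integer fields in line protocol.
interface RunStats {
  branch: string;
  passed: number;
  failed: number;
  durationMs: number;
}

function toLineProtocol(run: RunStats): string {
  return (
    `test_runs,branch=${run.branch} ` +
    `passed=${run.passed}i,failed=${run.failed}i,duration_ms=${run.durationMs}i`
  );
}
```

A CI step can POST these lines to InfluxDB's write endpoint after each run, and Grafana plots the trend.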
## Anti-Patterns in Artifact Management
| Anti-Pattern | Problem | Fix |
|---|---|---|
| No artifacts collected | Failures require re-running the pipeline to debug | Always upload reports, screenshots, and logs |
| Artifacts only on success | Debugging artifacts are missing exactly when tests fail | Use `if: failure()` or `if: always()` |
| No retention policy | Storage costs grow unbounded | Set retention days per artifact type |
| Generic artifact names | Cannot tell which browser or shard failed | Include matrix variables in artifact names |
| Uploading everything always | Wastes storage on passing runs | Upload traces/screenshots only on failure |
## Hands-On Exercise

- Configure your test framework to produce JUnit XML, HTML reports, and screenshots on failure
- Add artifact upload steps to your pipeline with appropriate `if:` conditions
- Intentionally break a test and download the artifacts to debug it locally
- Set up retention policies that differ for passing and failing runs
- Configure PR annotations using a test reporter action so failures are visible without clicking into the pipeline