# Artifact Management

## Why Artifacts Matter

Every test run should produce artifacts that help you diagnose failures without re-running the pipeline. When a browser test fails at 2 AM on a nightly run, you should be able to download the trace, screenshot, and logs the next morning and understand exactly what happened, without triggering another 20-minute pipeline run.
## Essential Artifact Types

### Test Reports

Test reports are the primary artifacts. They tell you what passed, what failed, and provide details for each failure.
| Format | Best For | Consumed By |
|---|---|---|
| JUnit XML | Universal standard; every CI platform can parse it | GitHub Actions annotations, GitLab MR widgets, Jenkins test results |
| HTML reports | Human-readable, shareable with non-technical stakeholders | Browsers, Slack links |
| Allure reports | Rich interactive reports with history and trends | Allure server, static hosting |
| JSON results | Programmatic analysis, custom dashboards | Scripts, Grafana, custom tools |
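For the JSON route, the analysis side is just a short script. A minimal sketch of counting failures from a simplified results shape (real Playwright JSON output is richer; this shape is an assumption for illustration):

```typescript
// Sketch: count failed tests in a simplified JSON results shape.
// The real Playwright JSON reporter output has more nesting; this
// flat shape is an assumption for illustration only.
interface TestResult {
  title: string;
  status: 'passed' | 'failed' | 'skipped';
}

function countFailures(results: TestResult[]): number {
  return results.filter((r) => r.status === 'failed').length;
}
```

The same pattern extends to feeding pass/fail counts into a custom dashboard.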
```yaml
# Generate JUnit XML for CI platform integration
- run: npx playwright test --reporter=junit
  env:
    PLAYWRIGHT_JUNIT_OUTPUT_NAME: results.xml

# Generate HTML for human consumption
- run: npx playwright test --reporter=html

# Generate both
- run: npx playwright test --reporter=junit,html
```

**Pro tip:** Configure your test runner to produce JUnit XML by default (for the CI platform to parse) and HTML reports on demand (for manual investigation).
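That default can be baked into the config rather than passed as CLI flags. A sketch of a `playwright.config.ts` reporter setup (the output file name is an arbitrary choice, not a default):

```typescript
// playwright.config.ts -- JUnit XML always, HTML written to disk
// for on-demand viewing. 'results.xml' is an arbitrary path choice.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  reporter: [
    ['junit', { outputFile: 'results.xml' }],
    ['html', { open: 'never' }], // don't auto-open the report locally
  ],
});
```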
### Screenshots and Videos

Browser test failures are almost impossible to diagnose without visual evidence. Configure your test framework to capture screenshots on failure and optionally record video.
```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
    trace: 'retain-on-failure',
  },
});
```
Playwright traces are especially valuable. A trace captures a timeline of every action, network request, DOM snapshot, and console log. You can open them in the Trace Viewer and step through the test execution like a debugger.
```yaml
- uses: actions/upload-artifact@v4
  if: failure()
  with:
    name: playwright-traces
    path: test-results/
    retention-days: 14
```
### Coverage Reports
Coverage reports track which lines of code were exercised during testing. They are useful for:
- Identifying untested code paths
- Tracking coverage trends over time
- Verifying that new code has tests
```yaml
- run: npm run test:unit -- --coverage --coverageReporters=lcov --coverageReporters=text

- uses: actions/upload-artifact@v4
  if: always()
  with:
    name: coverage-report
    path: coverage/
```
**Important:** Track coverage trends, not absolute numbers. A coverage threshold of 80% is meaningless if the uncovered 20% contains the most critical business logic. Better: require that new code is covered, not that the entire codebase meets a threshold.
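Trend tracking starts with one number per run. A minimal sketch of extracting the line-coverage percentage from an lcov report (only the `DA:<line>,<hits>` records matter for this):

```typescript
// Sketch: compute line coverage from lcov-format text.
// lcov records executed lines as "DA:<line>,<hits>"; everything
// else (SF:, end_of_record, ...) is ignored here.
function lcovLineCoverage(lcov: string): number {
  let hit = 0;
  let total = 0;
  for (const line of lcov.split('\n')) {
    if (line.startsWith('DA:')) {
      total += 1;
      const hits = parseInt(line.slice(3).split(',')[1], 10);
      if (hits > 0) hit += 1;
    }
  }
  return total === 0 ? 0 : (hit / total) * 100;
}
```

Storing this number per run (tagged with branch and commit) is enough to plot a trend line.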
### Logs

Application logs from test containers, browser console logs, and network HAR files provide context that test reports alone cannot capture.
```yaml
# Capture Docker container logs
- name: Capture service logs
  if: failure()
  run: |
    docker logs test-db > postgres.log 2>&1
    docker logs test-app > application.log 2>&1

- uses: actions/upload-artifact@v4
  if: failure()
  with:
    name: service-logs
    path: |
      postgres.log
      application.log
```
### HAR Files (Network Traffic)
HAR (HTTP Archive) files capture every network request and response during a test. They are invaluable for debugging API-related test failures.
```typescript
// Capture a HAR file in Playwright
const context = await browser.newContext({
  recordHar: { path: 'test-results/network.har' },
});

// ... run tests ...

await context.close(); // HAR file is written on close
```
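A HAR file is plain JSON, so triage can be scripted. A sketch that lists failed requests, assuming the standard HAR 1.2 `log.entries` shape:

```typescript
// Sketch: pull failed requests out of a HAR file for quick triage.
// Assumes the standard HAR 1.2 shape: log.entries[].request/response.
interface HarEntry {
  request: { method: string; url: string };
  response: { status: number };
}

interface Har {
  log: { entries: HarEntry[] };
}

function failedRequests(har: Har): string[] {
  return har.log.entries
    .filter((e) => e.response.status >= 400)
    .map((e) => `${e.response.status} ${e.request.method} ${e.request.url}`);
}
```

Running this over a downloaded HAR narrows an API-related failure to the offending requests in seconds.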
## Retention Policies
Artifacts consume storage. Set retention policies that balance debugging needs with costs.
| Artifact Type | Passing Runs | Failing Runs | Rationale |
|---|---|---|---|
| Test reports (XML/HTML) | 7 days | 30 days | Failing reports are needed for investigation |
| Screenshots/traces | Do not upload | 14 days | Only useful for debugging failures |
| Coverage reports | 30 days | 30 days | Needed for trend analysis |
| Logs | 3 days | 14 days | Useful for root cause analysis |
| HAR files | Do not upload | 7 days | Large files; only needed for API debugging |
```yaml
# GitHub Actions retention configuration
- uses: actions/upload-artifact@v4
  with:
    name: test-report
    path: test-results/
    retention-days: 14 # Override the repository default (90 days)
```
## Organizing Artifacts
When a matrix strategy produces multiple artifacts, name them clearly so you can find the right one.
```yaml
# Bad: generic names
name: test-results # Which browser? Which shard?

# Good: descriptive names with matrix values
name: traces-${{ matrix.browser }}-${{ matrix.shard }}
# Produces: traces-chromium-1, traces-chromium-2, traces-firefox-1, etc.
```
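If you script artifact handling (download, cleanup, reporting), the same convention is easy to encode in a helper. The function name and shape here are illustrative, not part of any CI API:

```typescript
// Sketch: build a descriptive artifact name from matrix values.
// Purely illustrative -- not part of GitHub Actions or any CI API.
function artifactName(
  job: string,
  matrix: Record<string, string | number>,
): string {
  // Insertion order of the matrix keys determines the name segments.
  const parts = Object.values(matrix).map(String);
  return [job, ...parts].join('-');
}
```

For example, `artifactName('traces', { browser: 'chromium', shard: 1 })` yields `traces-chromium-1`, matching the naming scheme above.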
For multi-job pipelines, prefix artifacts with the job name:
```yaml
# Job: unit-tests
name: unit-coverage

# Job: integration-tests
name: integration-report

# Job: browser-tests
name: browser-traces-${{ matrix.browser }}
```
## Consuming Artifacts

### In Pull Requests

Most CI platforms can parse JUnit XML and display test results directly in PRs:

- **GitHub Actions**: Use the `dorny/test-reporter` action to add test results as PR checks
- **GitLab CI**: Use `artifacts:reports:junit` to show results in the merge request widget
- **Jenkins**: The JUnit plugin parses the XML and shows results on the build page
```yaml
# GitHub Actions: show test results in the PR
- uses: dorny/test-reporter@v1
  if: always()
  with:
    name: Playwright Tests
    path: results.xml
    reporter: java-junit
```
### Downloading for Local Debugging

```shell
# GitHub CLI: download artifacts from a specific run
gh run download 12345678 -n playwright-traces

# Open Playwright traces locally
npx playwright show-trace test-results/trace.zip
```
### Aggregating Across Runs
For trend analysis, push test results to an external system:
- **Allure TestOps**: Aggregates results across runs and shows trends
- **Grafana + InfluxDB**: Custom dashboards for test metrics
- **TestRail / Zephyr**: Upload results via API for traceability
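For the Grafana + InfluxDB route, each run boils down to a few numbers pushed per pipeline. A sketch that encodes them as InfluxDB line protocol (the `test_runs` measurement and `branch` tag names are made up for illustration):

```typescript
// Sketch: encode one test run's counts as InfluxDB line protocol.
// Measurement name (test_runs) and tag key (branch) are made-up
// examples; the "i" suffix marks integer fields in line protocol.
interface RunStats {
  branch: string;
  passed: number;
  failed: number;
  durationMs: number;
}

function toLineProtocol(run: RunStats): string {
  return (
    `test_runs,branch=${run.branch} ` +
    `passed=${run.passed}i,failed=${run.failed}i,duration_ms=${run.durationMs}i`
  );
}
```

A CI step can POST these lines to InfluxDB's write endpoint after each run, and Grafana plots the trend.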
## Anti-Patterns in Artifact Management
| Anti-Pattern | Problem | Fix |
|---|---|---|
| No artifacts collected | Failures require re-running the pipeline to debug | Always upload reports, screenshots, and logs |
| Artifacts only on success | Debugging artifacts are missing exactly when tests fail | Use `if: failure()` or `if: always()` |
| No retention policy | Storage costs grow unbounded | Set retention days per artifact type |
| Generic artifact names | Cannot tell which browser or shard failed | Include matrix variables in artifact names |
| Uploading everything always | Wastes storage on passing runs | Upload traces/screenshots only on failure |
## Hands-On Exercise

- Configure your test framework to produce JUnit XML, HTML reports, and screenshots on failure
- Add artifact upload steps to your pipeline with appropriate `if:` conditions
- Intentionally break a test and download the artifacts to debug it locally
- Set up retention policies that differ for passing and failing runs
- Configure PR annotations using a test reporter action so failures are visible without clicking into the pipeline