Living API Documentation
The Documentation Staleness Problem
API docs go stale within weeks of writing. Engineers change endpoints, add fields, and adjust error codes -- but forget to update the docs. This creates a dangerous gap: consumers build against the documentation, but the implementation has diverged.
AI agents can solve this by continuously comparing the implementation to the documentation and creating pull requests to fix discrepancies.
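Before looking at the full agent, the core idea fits in a few lines: drift detection is a set difference between the endpoints the code implements and the endpoints the spec documents. This sketch uses made-up routes purely for illustration:

```python
# Minimal sketch of the core idea: diff the set of (method, path) pairs
# found in the implementation against the pairs listed in the spec.

def find_drift(implemented: set, documented: set) -> dict:
    """Return endpoints that exist on only one side of the comparison."""
    return {
        "undocumented": sorted(implemented - documented),
        "removed": sorted(documented - implemented),
    }

implemented = {("GET", "/users"), ("POST", "/users"), ("GET", "/users/{id}")}
documented = {("GET", "/users"), ("POST", "/users"), ("DELETE", "/users/{id}")}

drift = find_drift(implemented, documented)
# drift["undocumented"] -> the endpoint in code but not in the spec
# drift["removed"]      -> the endpoint in the spec but not in code
```

Everything else in the agent below is machinery around this diff: extracting the "implemented" side from source code, parsing the "documented" side from the spec, and turning the differences into fixes.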
Agent Architecture for Living Docs
```python
import subprocess
from datetime import datetime

import yaml


class APIDocumentationAgent:
    """Agent that keeps API documentation synchronized with implementation."""

    def __init__(self, llm, repo_path: str, spec_path: str):
        self.llm = llm
        self.repo_path = repo_path
        self.spec_path = spec_path

    def daily_sync(self):
        """Run daily to detect and fix documentation drift."""
        # Step 1: Extract actual API behavior from code
        routes = self.extract_routes_from_code()

        # Step 2: Parse current documentation
        documented = self.parse_openapi_spec()

        # Step 3: Compare and find drift
        drifts = self.compare(routes, documented)

        # Step 4: For each drift, generate a fix
        fixes = []
        for drift in drifts:
            if drift.type == "UNDOCUMENTED_ENDPOINT":
                fixes.append(self.generate_endpoint_docs(drift.endpoint))
            elif drift.type == "MISSING_FIELD":
                fixes.append(self.generate_field_docs(drift.endpoint, drift.field))
            elif drift.type == "STALE_EXAMPLE":
                fixes.append(self.regenerate_example(drift.endpoint))

        # Step 5: Create a PR with the fixes
        if fixes:
            self.create_documentation_pr(fixes)

    def extract_routes_from_code(self) -> list:
        """Use AI to parse route definitions from the source code."""
        prompt = f"""
        Analyze the following source files and extract all API routes.
        For each route, identify:
        - HTTP method and path
        - Request body schema (from validation decorators or type hints)
        - Response schema (from return statements or serializer usage)
        - Authentication requirements
        - Status codes returned

        Source files:
        {self.read_route_files()}
        """
        return self.llm.generate_structured(prompt, schema=RouteList)

    def compare(self, actual_routes, documented_routes) -> list:
        """Compare actual routes against documentation."""
        drifts = []
        actual_paths = {(r.method, r.path) for r in actual_routes}
        documented_paths = {(r.method, r.path) for r in documented_routes}

        # Undocumented endpoints
        for method, path in actual_paths - documented_paths:
            drifts.append(Drift(
                type="UNDOCUMENTED_ENDPOINT",
                endpoint=f"{method} {path}",
                message="Endpoint exists in code but not in documentation",
            ))

        # Documented but removed endpoints
        for method, path in documented_paths - actual_paths:
            drifts.append(Drift(
                type="REMOVED_ENDPOINT",
                endpoint=f"{method} {path}",
                message="Endpoint in documentation but not found in code",
            ))

        # Field-level comparison for shared endpoints
        for method, path in actual_paths & documented_paths:
            actual = next(r for r in actual_routes
                          if r.method == method and r.path == path)
            documented = next(r for r in documented_routes
                              if r.method == method and r.path == path)
            drifts.extend(self.compare_fields(actual, documented))
        return drifts

    def create_documentation_pr(self, fixes: list):
        """Create a git branch, apply fixes, and open a PR."""
        branch_name = f"docs/api-sync-{datetime.now().strftime('%Y%m%d')}"
        commit_title = f"docs: sync API documentation ({len(fixes)} fixes)"

        # Create branch
        subprocess.run(["git", "checkout", "-b", branch_name],
                       cwd=self.repo_path, check=True)

        # Apply each fix to the OpenAPI spec
        with open(self.spec_path) as f:
            spec = yaml.safe_load(f)
        for fix in fixes:
            self.apply_fix(spec, fix)
        with open(self.spec_path, "w") as f:
            yaml.dump(spec, f, default_flow_style=False)

        # Commit and push
        subprocess.run(["git", "add", self.spec_path],
                       cwd=self.repo_path, check=True)
        subprocess.run(["git", "commit", "-m", commit_title],
                       cwd=self.repo_path, check=True)
        subprocess.run(["git", "push", "origin", branch_name],
                       cwd=self.repo_path, check=True)

        # Create PR via GitHub CLI
        pr_body = self.generate_pr_body(fixes)
        subprocess.run([
            "gh", "pr", "create",
            "--title", commit_title,
            "--body", pr_body,
            "--base", "main",
        ], cwd=self.repo_path, check=True)
```
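The agent references a few supporting pieces that are not shown: the `Drift` record, the route model behind `RouteList`, and the `compare_fields` helper. A minimal sketch of what they might look like (the field names and type strings here are assumptions, not a fixed schema):

```python
from dataclasses import dataclass


@dataclass
class Drift:
    type: str        # e.g. "UNDOCUMENTED_ENDPOINT", "MISSING_FIELD"
    endpoint: str    # "METHOD /path"
    message: str = ""
    field: str = ""  # populated for field-level drifts


@dataclass
class Route:
    method: str
    path: str
    fields: dict  # field name -> type string


def compare_fields(actual: Route, documented: Route) -> list:
    """Field-level diff for an endpoint that exists on both sides."""
    drifts = []
    for name in actual.fields.keys() - documented.fields.keys():
        drifts.append(Drift(
            type="MISSING_FIELD",
            endpoint=f"{actual.method} {actual.path}",
            field=name,
            message=f"Field '{name}' exists in code but is undocumented",
        ))
    return drifts


# Example: the implementation added a 'nickname' field the spec lacks
actual = Route("GET", "/users", fields={"id": "int", "nickname": "str"})
documented = Route("GET", "/users", fields={"id": "int"})
drifts = compare_fields(actual, documented)
```

In practice `RouteList` would be a structured-output schema (e.g. a Pydantic model wrapping `list[Route]`) passed to the LLM, and `compare_fields` would also flag type mismatches and fields that were removed from the code but remain documented.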
The Layered API Test Strategy
Putting everything together, an AI-powered API test strategy has six layers:
Layer 1: CONTRACT TESTS (Pact)
-- Consumer expectations published to broker
-- Provider verification runs on every PR
-- AI generates and updates contracts from client code
Layer 2: SCHEMA VALIDATION (OpenAPI)
-- AI generates test suite from schema
-- Drift detection runs daily
-- Response shape validation on every endpoint
Layer 3: FUNCTIONAL TESTS (pytest + httpx)
-- AI generates from schema + business rules
-- Human curates for domain correctness
-- Runs on every PR
Layer 4: SEMANTIC FUZZING
-- AI generates contextually meaningful payloads
-- Runs nightly on staging
-- Anomalies triaged by AI, confirmed by human
Layer 5: EVENT-DRIVEN TESTS (Kafka/SQS/EventBridge)
-- Event publication and consumption verification
-- Idempotency and ordering tests
-- Dead letter queue monitoring
Layer 6: LIVING DOCUMENTATION
-- AI agent compares code to docs daily
-- Generates PRs for drift fixes
-- Human reviews and merges
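As one concrete instance of Layer 2, response-shape validation can be as simple as checking each response body against the types declared in the spec. Real suites would lean on a library such as `jsonschema` or Schemathesis; this stdlib-only sketch (with a made-up schema and response) just shows the shape of the check:

```python
# Map simplified OpenAPI type names to Python types.
TYPE_MAP = {"string": str, "integer": int, "boolean": bool, "number": (int, float)}


def validate_shape(response: dict, schema: dict) -> list:
    """Return a list of human-readable shape violations (empty = valid)."""
    errors = []
    for name, spec in schema["properties"].items():
        if name in schema.get("required", []) and name not in response:
            errors.append(f"missing required field '{name}'")
        elif name in response:
            expected = TYPE_MAP[spec["type"]]
            if not isinstance(response[name], expected):
                errors.append(f"field '{name}' should be {spec['type']}")
    return errors


schema = {
    "required": ["id", "email"],
    "properties": {
        "id": {"type": "integer"},
        "email": {"type": "string"},
        "active": {"type": "boolean"},
    },
}

# The API returned "42" as a string where the spec promises an integer
errors = validate_shape({"id": "42", "email": "a@b.c"}, schema)
```

AI leverage at this layer is high because tests like this are generated mechanically from the spec: one check per field constraint, enum value, and status code, with no human judgment required.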
Cost-Benefit by Layer
| Layer | Setup Cost | Maintenance Cost | Bug Detection Value | AI Leverage |
|---|---|---|---|---|
| Contract tests | Medium | Low (auto-updated) | High (integration) | High |
| Schema validation | Low | Very low | Medium (shape) | Very high |
| Functional tests | Medium | Medium | High (logic) | High |
| Semantic fuzzing | Low | Very low | High (security) | Very high |
| Event-driven tests | High | Medium | High (async) | Medium |
| Living docs | Low | Very low | Medium (accuracy) | Very high |
Running the Full Strategy in CI
```yaml
# .github/workflows/api-testing.yml
name: API Test Strategy

on:
  pull_request:
    paths: ['app/**', 'docs/openapi.yaml']
  schedule:
    - cron: '0 2 * * *'   # nightly run for the fuzzing and drift jobs

jobs:
  contract-tests:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm run pact:test
      - run: npm run pact:publish

  schema-validation:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python -m pytest tests/api/ -v

  functional-tests:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker-compose up -d
      - run: python -m pytest tests/functional/ -v --cov=app

  # Nightly only
  semantic-fuzzing:
    if: github.event_name == 'schedule'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker-compose up -d
      - run: python -m api_fuzzer --output fuzz-report.json

  # Nightly only
  drift-detection:
    if: github.event_name == 'schedule'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker-compose up -d
      - run: python -m schema_drift_detector --output drift-report.json
```

Note the `schedule` trigger: without it, the `github.event_name == 'schedule'` guards on the nightly jobs would never fire, since a workflow triggered only by `pull_request` never produces a schedule event.
Interview Talking Point
"I think about API testing in layers, and AI plays a different role at each layer. At the contract layer, AI analyzes our client code and generates Pact consumer tests automatically -- it detects every HTTP call pattern and builds the contract expectations. At the schema layer, AI generates comprehensive tests from our OpenAPI spec, covering every field constraint, enum value, and auth scenario. At the fuzzing layer, AI generates semantically meaningful payloads -- not random bytes, but SQL injection strings in name fields, boundary values for prices, Unicode edge cases -- which catches real vulnerabilities that random fuzzing misses. For event-driven architectures, I focus on idempotency, ordering, and dead-letter testing because those are where async systems silently fail. The most underrated capability is schema drift detection: an AI agent compares our OpenAPI spec to the actual API behavior daily and creates a PR whenever the documentation is stale. This eliminates the entire class of 'the docs say X but the API does Y' bugs that plague every microservice team I have worked with."
Key Takeaway
Living documentation is the final layer in AI-powered API testing. An AI agent that runs daily, compares your OpenAPI spec against actual API behavior, and creates PRs for discrepancies eliminates the entire class of documentation staleness bugs. Combined with the other five layers (contracts, schema validation, functional tests, fuzzing, event-driven tests), this creates a comprehensive API quality system where AI handles the systematic work and humans focus on domain judgment and review.