Living API Documentation
The Documentation Staleness Problem
API docs go stale within weeks of writing. Engineers change endpoints, add fields, and adjust error codes -- but forget to update the docs. This creates a dangerous gap: consumers build against the documentation, but the implementation has diverged.
AI agents can solve this by continuously comparing the implementation to the documentation and creating pull requests to fix discrepancies.
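Before looking at the full agent, the core idea fits in a few lines: drift detection is a set difference between the endpoints the code implements and the endpoints the spec documents. This sketch uses made-up routes purely for illustration:

```python
# Minimal sketch of the core idea: diff the set of (method, path) pairs
# found in the implementation against the pairs listed in the spec.

def find_drift(implemented: set, documented: set) -> dict:
    """Return endpoints that exist on only one side of the comparison."""
    return {
        "undocumented": sorted(implemented - documented),
        "removed": sorted(documented - implemented),
    }

implemented = {("GET", "/users"), ("POST", "/users"), ("GET", "/users/{id}")}
documented = {("GET", "/users"), ("POST", "/users"), ("DELETE", "/users/{id}")}

drift = find_drift(implemented, documented)
# drift["undocumented"] -> the endpoint in code but not in the spec
# drift["removed"]      -> the endpoint in the spec but not in code
```

Everything else in the agent below is machinery around this diff: extracting the "implemented" side from source code, parsing the "documented" side from the spec, and turning the differences into fixes.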
Agent Architecture for Living Docs
```python
import subprocess
from datetime import datetime

import yaml


class APIDocumentationAgent:
    """Agent that keeps API documentation synchronized with implementation."""

    def __init__(self, llm, repo_path: str, spec_path: str):
        self.llm = llm
        self.repo_path = repo_path
        self.spec_path = spec_path

    def daily_sync(self):
        """Run daily to detect and fix documentation drift."""
        # Step 1: Extract actual API behavior from code
        routes = self.extract_routes_from_code()

        # Step 2: Parse current documentation
        documented = self.parse_openapi_spec()

        # Step 3: Compare and find drift
        drifts = self.compare(routes, documented)

        # Step 4: For each drift, generate a fix
        fixes = []
        for drift in drifts:
            if drift.type == "UNDOCUMENTED_ENDPOINT":
                fixes.append(self.generate_endpoint_docs(drift.endpoint))
            elif drift.type == "MISSING_FIELD":
                fixes.append(self.generate_field_docs(drift.endpoint, drift.field))
            elif drift.type == "STALE_EXAMPLE":
                fixes.append(self.regenerate_example(drift.endpoint))

        # Step 5: Create a PR with the fixes
        if fixes:
            self.create_documentation_pr(fixes)

    def extract_routes_from_code(self) -> list:
        """Use AI to parse route definitions from the source code."""
        prompt = f"""
        Analyze the following source files and extract all API routes.
        For each route, identify:
        - HTTP method and path
        - Request body schema (from validation decorators or type hints)
        - Response schema (from return statements or serializer usage)
        - Authentication requirements
        - Status codes returned

        Source files:
        {self.read_route_files()}
        """
        return self.llm.generate_structured(prompt, schema=RouteList)

    def compare(self, actual_routes, documented_routes) -> list:
        """Compare actual routes against documentation."""
        drifts = []
        actual_paths = {(r.method, r.path) for r in actual_routes}
        documented_paths = {(r.method, r.path) for r in documented_routes}

        # Undocumented endpoints
        for method, path in actual_paths - documented_paths:
            drifts.append(Drift(
                type="UNDOCUMENTED_ENDPOINT",
                endpoint=f"{method} {path}",
                message="Endpoint exists in code but not in documentation",
            ))

        # Documented but removed endpoints
        for method, path in documented_paths - actual_paths:
            drifts.append(Drift(
                type="REMOVED_ENDPOINT",
                endpoint=f"{method} {path}",
                message="Endpoint in documentation but not found in code",
            ))

        # Field-level comparison for shared endpoints
        for method, path in actual_paths & documented_paths:
            actual = next(r for r in actual_routes
                          if r.method == method and r.path == path)
            documented = next(r for r in documented_routes
                              if r.method == method and r.path == path)
            drifts.extend(self.compare_fields(actual, documented))
        return drifts

    def create_documentation_pr(self, fixes: list):
        """Create a git branch, apply fixes, and open a PR."""
        branch_name = f"docs/api-sync-{datetime.now().strftime('%Y%m%d')}"
        commit_title = f"docs: sync API documentation ({len(fixes)} fixes)"

        # Create branch
        subprocess.run(["git", "checkout", "-b", branch_name],
                       cwd=self.repo_path, check=True)

        # Apply each fix to the OpenAPI spec
        with open(self.spec_path) as f:
            spec = yaml.safe_load(f)
        for fix in fixes:
            self.apply_fix(spec, fix)
        with open(self.spec_path, "w") as f:
            yaml.dump(spec, f, default_flow_style=False)

        # Commit and push
        subprocess.run(["git", "add", self.spec_path],
                       cwd=self.repo_path, check=True)
        subprocess.run(["git", "commit", "-m", commit_title],
                       cwd=self.repo_path, check=True)
        subprocess.run(["git", "push", "origin", branch_name],
                       cwd=self.repo_path, check=True)

        # Create PR via GitHub CLI
        pr_body = self.generate_pr_body(fixes)
        subprocess.run([
            "gh", "pr", "create",
            "--title", commit_title,
            "--body", pr_body,
            "--base", "main",
        ], cwd=self.repo_path, check=True)
```
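The agent references a few supporting pieces that are not shown: the `Drift` record, the route model behind `RouteList`, and the `compare_fields` helper. A minimal sketch of what they might look like (the field names and type strings here are assumptions, not a fixed schema):

```python
from dataclasses import dataclass


@dataclass
class Drift:
    type: str        # e.g. "UNDOCUMENTED_ENDPOINT", "MISSING_FIELD"
    endpoint: str    # "METHOD /path"
    message: str = ""
    field: str = ""  # populated for field-level drifts


@dataclass
class Route:
    method: str
    path: str
    fields: dict  # field name -> type string


def compare_fields(actual: Route, documented: Route) -> list:
    """Field-level diff for an endpoint that exists on both sides."""
    drifts = []
    for name in actual.fields.keys() - documented.fields.keys():
        drifts.append(Drift(
            type="MISSING_FIELD",
            endpoint=f"{actual.method} {actual.path}",
            field=name,
            message=f"Field '{name}' exists in code but is undocumented",
        ))
    return drifts


# Example: the implementation added a 'nickname' field the spec lacks
actual = Route("GET", "/users", fields={"id": "int", "nickname": "str"})
documented = Route("GET", "/users", fields={"id": "int"})
drifts = compare_fields(actual, documented)
```

In practice `RouteList` would be a structured-output schema (e.g. a Pydantic model wrapping `list[Route]`) passed to the LLM, and `compare_fields` would also flag type mismatches and fields that were removed from the code but remain documented.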
The Layered API Test Strategy
Putting everything together, an AI-powered API test strategy has six layers:
Layer 1: CONTRACT TESTS (Pact)
-- Consumer expectations published to broker
-- Provider verification runs on every PR
-- AI generates and updates contracts from client code
Layer 2: SCHEMA VALIDATION (OpenAPI)
-- AI generates test suite from schema
-- Drift detection runs daily
-- Response shape validation on every endpoint
Layer 3: FUNCTIONAL TESTS (pytest + httpx)
-- AI generates from schema + business rules
-- Human curates for domain correctness
-- Runs on every PR
Layer 4: SEMANTIC FUZZING
-- AI generates contextually meaningful payloads
-- Runs nightly on staging
-- Anomalies triaged by AI, confirmed by human
Layer 5: EVENT-DRIVEN TESTS (Kafka/SQS/EventBridge)
-- Event publication and consumption verification
-- Idempotency and ordering tests
-- Dead letter queue monitoring
Layer 6: LIVING DOCUMENTATION
-- AI agent compares code to docs daily
-- Generates PRs for drift fixes
-- Human reviews and merges
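As one concrete instance of Layer 2, response-shape validation can be as simple as checking each response body against the types declared in the spec. Real suites would lean on a library such as `jsonschema` or Schemathesis; this stdlib-only sketch (with a made-up schema and response) just shows the shape of the check:

```python
# Map simplified OpenAPI type names to Python types.
TYPE_MAP = {"string": str, "integer": int, "boolean": bool, "number": (int, float)}


def validate_shape(response: dict, schema: dict) -> list:
    """Return a list of human-readable shape violations (empty = valid)."""
    errors = []
    for name, spec in schema["properties"].items():
        if name in schema.get("required", []) and name not in response:
            errors.append(f"missing required field '{name}'")
        elif name in response:
            expected = TYPE_MAP[spec["type"]]
            if not isinstance(response[name], expected):
                errors.append(f"field '{name}' should be {spec['type']}")
    return errors


schema = {
    "required": ["id", "email"],
    "properties": {
        "id": {"type": "integer"},
        "email": {"type": "string"},
        "active": {"type": "boolean"},
    },
}

# The API returned "42" as a string where the spec promises an integer
errors = validate_shape({"id": "42", "email": "a@b.c"}, schema)
```

AI leverage at this layer is high because tests like this are generated mechanically from the spec: one check per field constraint, enum value, and status code, with no human judgment required.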
Cost-Benefit by Layer
| Layer | Setup Cost | Maintenance Cost | Bug Detection Value | AI Leverage |
|---|---|---|---|---|
| Contract tests | Medium | Low (auto-updated) | High (integration) | High |
| Schema validation | Low | Very low | Medium (shape) | Very high |
| Functional tests | Medium | Medium | High (logic) | High |
| Semantic fuzzing | Low | Very low | High (security) | Very high |
| Event-driven tests | High | Medium | High (async) | Medium |
| Living docs | Low | Very low | Medium (accuracy) | Very high |
Running the Full Strategy in CI
```yaml
# .github/workflows/api-testing.yml
name: API Test Strategy

on:
  pull_request:
    paths: ['app/**', 'docs/openapi.yaml']
  schedule:
    - cron: '0 2 * * *'   # nightly run for the fuzzing and drift jobs

jobs:
  contract-tests:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm run pact:test
      - run: npm run pact:publish

  schema-validation:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python -m pytest tests/api/ -v

  functional-tests:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker-compose up -d
      - run: python -m pytest tests/functional/ -v --cov=app

  # Nightly only
  semantic-fuzzing:
    if: github.event_name == 'schedule'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker-compose up -d
      - run: python -m api_fuzzer --output fuzz-report.json

  # Nightly only
  drift-detection:
    if: github.event_name == 'schedule'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker-compose up -d
      - run: python -m schema_drift_detector --output drift-report.json
```

Note the `schedule` trigger: without it, the `github.event_name == 'schedule'` guards on the nightly jobs would never fire, since a workflow triggered only by `pull_request` never produces a schedule event.
Interview Talking Point
"I think about API testing in layers, and AI plays a different role at each layer. At the contract layer, AI analyzes our client code and generates Pact consumer tests automatically -- it detects every HTTP call pattern and builds the contract expectations. At the schema layer, AI generates comprehensive tests from our OpenAPI spec, covering every field constraint, enum value, and auth scenario. At the fuzzing layer, AI generates semantically meaningful payloads -- not random bytes, but SQL injection strings in name fields, boundary values for prices, Unicode edge cases -- which catches real vulnerabilities that random fuzzing misses. For event-driven architectures, I focus on idempotency, ordering, and dead-letter testing because those are where async systems silently fail. The most underrated capability is schema drift detection: an AI agent compares our OpenAPI spec to the actual API behavior daily and creates a PR whenever the documentation is stale. This eliminates the entire class of 'the docs say X but the API does Y' bugs that plague every microservice team I have worked with."
Key Takeaway
Living documentation is the final layer in AI-powered API testing. An AI agent that runs daily, compares your OpenAPI spec against actual API behavior, and creates PRs for discrepancies eliminates the entire class of documentation staleness bugs. Combined with the other five layers (contracts, schema validation, functional tests, fuzzing, event-driven tests), this creates a comprehensive API quality system where AI handles the systematic work and humans focus on domain judgment and review.