Context Feeding Strategies
The Core Problem: LLMs Are Only as Good as Their Input
An LLM generating tests without context is like a QA engineer writing tests without reading the requirements. The output might look syntactically correct, but it will miss domain constraints, use the wrong method names, and hallucinate APIs that do not exist.
The art of context feeding is deciding what to include, how much to include, and in what order -- given that every token of context costs money and competes for space in the model's attention window.
The Context Hierarchy
Not all context is equal. When your token budget is limited, prioritize ruthlessly:
Priority 1: The actual specification (OpenAPI schema, AC, Figma annotations)
Priority 2: Existing test patterns in the codebase (so AI matches style)
Priority 3: Domain constraints (business rules not in the spec)
Priority 4: Technical stack details (frameworks, helpers, fixtures)
Priority 5: Examples of good vs bad tests from prior reviews
Why this order? Priority 1 prevents hallucination (the AI tests what actually exists). Priority 2 prevents style drift (the AI writes tests your team recognizes). Priority 3 catches business logic that specs often omit. Priority 4 ensures the code compiles. Priority 5 is a bonus that improves quality over time.
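The priority ordering above can be sketched as a budget-constrained assembly step. This is a minimal illustration, not a real API: `estimate_tokens` and `build_prompt` are hypothetical helpers, and the ~4 characters/token heuristic is only a rough rule of thumb for English text.

```python
# Sketch: include context pieces in priority order until the token budget is spent.
# estimate_tokens and build_prompt are illustrative names, not a real library API.

def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token (English-text heuristic)."""
    return max(1, len(text) // 4)

def build_prompt(sources: list[tuple[int, str]], budget: int) -> str:
    """Walk sources from Priority 1 upward, skipping pieces that no longer fit."""
    included, remaining = [], budget
    for _priority, text in sorted(sources):
        cost = estimate_tokens(text)
        if cost <= remaining:
            included.append(text)
            remaining -= cost
    return "\n\n".join(included)

sources = [
    (1, "OpenAPI schema for POST /api/v2/orders: ..."),
    (2, "Two representative existing tests: ..."),
    (3, "Domain rule: orders over $10k require manual review."),
    (4, "Stack: pytest + httpx, fixtures in conftest.py."),
    (5, "Reviewer notes on good vs bad tests: ..."),
]
prompt = build_prompt(sources, budget=8000)
```

Because the sources are sorted by priority, the specification always survives a tight budget; the nice-to-have material is what gets dropped first.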
What to Feed and How
| Source | How to Feed | Why It Matters |
|---|---|---|
| OpenAPI/Swagger spec | Paste the relevant endpoint JSON/YAML directly | Exact field names, types, constraints -- eliminates guesswork |
| User story + AC | Copy from Jira/Linear verbatim | Preserves original intent, edge cases mentioned in comments |
| Existing test file | Paste 2-3 representative tests as "style guide" | AI matches naming, structure, assertion style, fixtures |
| Database schema | Paste CREATE TABLE statements | Reveals constraints AI can test (NOT NULL, UNIQUE, FK, CHECK) |
| Error code documentation | Paste the error catalogue | AI generates tests triggering each documented error |
| UI mockup/Figma | Describe the layout or use screenshot + vision model | Generates accessibility and layout tests |
| CI configuration | Paste relevant test commands | AI understands how tests will be run (parallel, coverage flags) |
Feeding an OpenAPI Schema
Here is the OpenAPI schema for the endpoint under test:
```yaml
paths:
  /api/v2/orders:
    post:
      summary: Create a new order
      security:
        - BearerAuth: [customer]
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CreateOrder'
      responses:
        '201':
          description: Order created
        '400':
          description: Validation error
        '401':
          description: Unauthorized
        '409':
          description: Duplicate order (idempotency key conflict)
components:
  schemas:
    CreateOrder:
      type: object
      required: [items, shipping_address, idempotency_key]
      properties:
        items:
          type: array
          minItems: 1
          maxItems: 50
          items:
            type: object
            required: [product_id, quantity]
            properties:
              product_id:
                type: string
                format: uuid
              quantity:
                type: integer
                minimum: 1
                maximum: 100
        shipping_address:
          $ref: '#/components/schemas/Address'
        idempotency_key:
          type: string
          format: uuid
        coupon_code:
          type: string
          pattern: "^[A-Z0-9]{8}$"
```

Notice that every constraint in the schema (minItems, maxItems, minimum, maximum, pattern, format) is a test case waiting to be generated. The LLM sees these constraints and produces boundary value tests automatically.
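For instance, the `quantity` constraints (minimum: 1, maximum: 100) mechanically imply four boundary cases. In this sketch, `validate_quantity` is a stand-in for the real endpoint's validation logic, used only to show the derivation:

```python
# Sketch: boundary cases derived from the schema's minimum/maximum on quantity.
# validate_quantity is a hypothetical stand-in for the endpoint's real validation.

def validate_quantity(quantity: int) -> int:
    """Return the HTTP status the schema implies for a given quantity."""
    return 201 if 1 <= quantity <= 100 else 400

# Every min/max pair yields four boundary cases: below-min, min, max, above-max.
BOUNDARY_CASES = [(0, 400), (1, 201), (100, 201), (101, 400)]

for quantity, expected in BOUNDARY_CASES:
    assert validate_quantity(quantity) == expected
```

The same pattern applies to minItems/maxItems on the items array (0, 1, 50, 51) and to the coupon_code regex (7, 8, and 9 characters; lowercase letters).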
Feeding Existing Tests as a Style Guide
Here are two existing tests from our codebase. Match their style exactly:
```python
class TestOrderCreation:
    """Tests for POST /api/v2/orders endpoint."""

    def test_should_create_order_when_valid_payload(
        self, api_client, auth_headers, product_factory
    ):
        # Arrange
        product = product_factory.create()
        payload = {
            "items": [{"product_id": str(product.id), "quantity": 2}],
            "shipping_address": VALID_ADDRESS,
            "idempotency_key": str(uuid4()),
        }

        # Act
        response = api_client.post(
            "/api/v2/orders", json=payload, headers=auth_headers
        )

        # Assert
        assert response.status_code == 201
        order = response.json()
        assert order["status"] == "pending"
        assert len(order["items"]) == 1
        assert order["items"][0]["quantity"] == 2

    def test_should_reject_order_when_empty_items_list(
        self, api_client, auth_headers
    ):
        # Arrange
        payload = {
            "items": [],
            "shipping_address": VALID_ADDRESS,
            "idempotency_key": str(uuid4()),
        }

        # Act
        response = api_client.post(
            "/api/v2/orders", json=payload, headers=auth_headers
        )

        # Assert
        assert response.status_code == 400
        assert "items" in response.json()["detail"].lower()
```
By showing two tests -- one happy path, one validation error -- the AI learns:
- Class structure with docstrings
- Fixture-based dependency injection (`api_client`, `auth_headers`, `product_factory`)
- Naming convention: `test_should_X_when_Y`
- Comment markers for Arrange/Act/Assert
- Assertion style (status code plus specific field checks)
- Use of the `VALID_ADDRESS` constant and `uuid4()`
Anti-Pattern: The Context Dump
Do not paste your entire codebase into the prompt. LLMs degrade with irrelevant context. A focused 200-line excerpt produces better tests than a 5000-line dump.
This is the needle-in-a-haystack problem -- the more hay, the harder the model must work to find the needle. Long-context research (the "lost in the middle" effect) consistently shows that models attend best to content at the beginning and end of the prompt, and that performance degrades as irrelevant material accumulates in the middle.
Symptoms of Context Overload
- Generated tests reference functions from the wrong file
- Tests mix styles from different parts of the codebase
- The LLM "forgets" constraints mentioned early in the prompt
- Output is shorter and less detailed than expected (the bloated input left little room in the context window for the response)
The Fix: Context Windowing
Instead of dumping everything, use a context window approach:
Step 1: Feed the spec (Priority 1) -- generate initial tests
Step 2: Review output -- identify style mismatches
Step 3: Feed 2-3 existing tests as style examples (Priority 2) -- regenerate
Step 4: Review output -- identify missing domain rules
Step 5: Add domain constraints (Priority 3) -- regenerate specific tests
This iterative approach keeps each prompt focused and produces better results than a single massive prompt.
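The windowing steps above can be sketched as a loop: each iteration adds exactly one context layer and regenerates, with a human review between iterations. Here `generate_tests` is a placeholder for your actual LLM call, not a real function:

```python
# Sketch of context windowing: add one context layer per step, regenerate each time.
# generate_tests is a hypothetical placeholder for the real LLM call.

def generate_tests(context: list[str]) -> str:
    """Placeholder: send the accumulated context to the model, return its output."""
    return f"<tests generated from {len(context)} context piece(s)>"

context = ["<OpenAPI spec for the endpoint>"]           # Step 1: Priority 1 only
draft = generate_tests(context)

# Step 2: human review happens here, then:
context.append("<2-3 existing tests as style guide>")   # Step 3: add Priority 2
draft = generate_tests(context)

# Step 4: review again, then:
context.append("<domain constraints not in the spec>")  # Step 5: add Priority 3
draft = generate_tests(context)
```

The key property is that each prompt stays small and focused; the model never has to sift through context that was only relevant to an earlier step's mistakes.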
Advanced Strategy: Context Compression
When you must include a lot of context, compress it. Instead of pasting a 500-line source file, summarize it:
The UserService class has these public methods:
- create_user(dto: CreateUserDTO) -> User — validates email uniqueness, hashes password
- get_user(id: UUID) -> User — raises NotFoundError if missing
- update_user(id: UUID, dto: UpdateUserDTO) -> User — partial update, re-validates email if changed
- delete_user(id: UUID) -> None — soft delete (sets deleted_at timestamp)
Key constraints:
- Email must be unique (case-insensitive)
- Password minimum 8 chars, must include a number
- Soft-deleted users cannot log in but their data is retained for 30 days
This 10-line summary carries the same test-relevant information as the full 500-line source file.
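This kind of compression can even be automated. As one sketch of the idea, Python's standard `ast` module can extract public method signatures and first docstring lines instead of pasting the whole file (the class and method names below are the hypothetical `UserService` from the summary above):

```python
# Sketch: compress a source file into one summary line per public method
# using Python's standard-library ast module.
import ast

def summarize_methods(source: str) -> list[str]:
    """One line per public method: name(args) -- first docstring line."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and not node.name.startswith("_"):
            args = ", ".join(a.arg for a in node.args.args if a.arg != "self")
            doc = (ast.get_docstring(node) or "").splitlines()
            entry = f"- {node.name}({args})"
            if doc:
                entry += f" -- {doc[0]}"
            lines.append(entry)
    return lines

source = '''
class UserService:
    def create_user(self, dto):
        """Validates email uniqueness, hashes password."""
    def _hash_password(self, raw):
        """Internal helper, excluded from the summary."""
'''
print("\n".join(summarize_methods(source)))
```

Private helpers are skipped automatically, which is usually what you want: the model should test the public contract, not internal details.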
Strategy: Multi-Turn Context Building
For complex features, build context across multiple prompts in a conversation:
Turn 1: "Here is the OpenAPI schema for the payments endpoint. Summarize the
test scenarios you would create."
Turn 2: "Good. Here are our existing payment tests for reference style.
Now generate the first 10 tests matching this style."
Turn 3: "These look good. Now add tests for the edge cases: expired cards,
insufficient funds, and currency conversion rounding."
Turn 4: "Review all generated tests. Which ones are testing the mock
instead of the real behavior? Flag any tautology tests."
Each turn adds context incrementally, and the conversation history provides implicit context from prior turns. This is more effective than a single massive prompt because:
- The LLM's attention is focused on one concern at a time
- You can course-correct between turns
- You build a review-as-you-go workflow
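The four turns above can also be driven programmatically. In this sketch, `call_llm` is a placeholder for whatever chat-completion client you use; the growing message list is what carries context from turn to turn:

```python
# Sketch of multi-turn context building. call_llm is a hypothetical placeholder
# for a real chat-completion client; the message history accumulates context.

def call_llm(messages: list[dict]) -> str:
    """Placeholder: send the message history to your chat API, return the reply."""
    turn = sum(1 for m in messages if m["role"] == "user")
    return f"<reply to turn {turn}>"

messages = [{"role": "system", "content": "You are a senior test engineer."}]

TURNS = [
    "Here is the OpenAPI schema: ... Summarize the test scenarios you would create.",
    "Here are our existing payment tests: ... Generate the first 10 tests in this style.",
    "Add tests for expired cards, insufficient funds, and currency rounding.",
    "Review all generated tests and flag any that assert on the mock, not real behavior.",
]
for turn in TURNS:
    messages.append({"role": "user", "content": turn})
    messages.append({"role": "assistant", "content": call_llm(messages)})
```

In practice you would pause between turns to review the reply before sending the next one; that review step is the whole point of the multi-turn structure.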
Context Feeding Checklist
Before sending a test generation prompt, verify:
[ ] Specification artifact is included (schema, AC, or story)
[ ] Only relevant portions are included (not the entire file)
[ ] Existing test examples are provided for style matching
[ ] Business rules not in the spec are stated explicitly
[ ] Framework and language are specified
[ ] Auth mechanism and test helpers are named exactly
[ ] Output format expectations are clear
[ ] Token budget is reasonable (< 8K input for focused generation)
Key Takeaway
Context feeding is the highest-leverage skill in AI-augmented test design. The right 200 lines of context produce better tests than 5000 lines of context dump. Prioritize specifications over code, feed incrementally rather than all at once, and always include style examples from your existing test suite.