Context Feeding Strategies
The Core Problem: LLMs Are Only as Good as Their Input
An LLM generating tests without context is like a QA engineer writing tests without reading the requirements. The output might look syntactically correct, but it will miss domain constraints, use the wrong method names, and hallucinate APIs that do not exist.
The art of context feeding is deciding what to include, how much to include, and in what order -- given that every token of context costs money and competes for space in the model's attention window.
The Context Hierarchy
Not all context is equal. When your token budget is limited, prioritize ruthlessly:
Priority 1: The actual specification (OpenAPI schema, AC, Figma annotations)
Priority 2: Existing test patterns in the codebase (so AI matches style)
Priority 3: Domain constraints (business rules not in the spec)
Priority 4: Technical stack details (frameworks, helpers, fixtures)
Priority 5: Examples of good vs bad tests from prior reviews
Why this order? Priority 1 prevents hallucination (the AI tests what actually exists). Priority 2 prevents style drift (the AI writes tests your team recognizes). Priority 3 catches business logic that specs often omit. Priority 4 ensures the code compiles. Priority 5 is a bonus that improves quality over time.
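The priority ordering above can be sketched as a budget-constrained assembly step. This is a minimal illustration, not a real API: `estimate_tokens` and `build_prompt` are hypothetical helpers, and the ~4 characters/token heuristic is only a rough rule of thumb for English text.

```python
# Sketch: include context pieces in priority order until the token budget is spent.
# estimate_tokens and build_prompt are illustrative names, not a real library API.

def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token (English-text heuristic)."""
    return max(1, len(text) // 4)

def build_prompt(sources: list[tuple[int, str]], budget: int) -> str:
    """Walk sources from Priority 1 upward, skipping pieces that no longer fit."""
    included, remaining = [], budget
    for _priority, text in sorted(sources):
        cost = estimate_tokens(text)
        if cost <= remaining:
            included.append(text)
            remaining -= cost
    return "\n\n".join(included)

sources = [
    (1, "OpenAPI schema for POST /api/v2/orders: ..."),
    (2, "Two representative existing tests: ..."),
    (3, "Domain rule: orders over $10k require manual review."),
    (4, "Stack: pytest + httpx, fixtures in conftest.py."),
    (5, "Reviewer notes on good vs bad tests: ..."),
]
prompt = build_prompt(sources, budget=8000)
```

Because the sources are sorted by priority, the specification always survives a tight budget; the nice-to-have material is what gets dropped first.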
What to Feed and How
| Source | How to Feed | Why It Matters |
|---|---|---|
| OpenAPI/Swagger spec | Paste the relevant endpoint JSON/YAML directly | Exact field names, types, constraints -- eliminates guesswork |
| User story + AC | Copy from Jira/Linear verbatim | Preserves original intent, edge cases mentioned in comments |
| Existing test file | Paste 2-3 representative tests as "style guide" | AI matches naming, structure, assertion style, fixtures |
| Database schema | Paste CREATE TABLE statements | Reveals constraints AI can test (NOT NULL, UNIQUE, FK, CHECK) |
| Error code documentation | Paste the error catalogue | AI generates tests triggering each documented error |
| UI mockup/Figma | Describe the layout or use screenshot + vision model | Generates accessibility and layout tests |
| CI configuration | Paste relevant test commands | AI understands how tests will be run (parallel, coverage flags) |
Feeding an OpenAPI Schema
Here is the OpenAPI schema for the endpoint under test:
```yaml
paths:
  /api/v2/orders:
    post:
      summary: Create a new order
      security:
        - BearerAuth: [customer]
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CreateOrder'
      responses:
        '201':
          description: Order created
        '400':
          description: Validation error
        '401':
          description: Unauthorized
        '409':
          description: Duplicate order (idempotency key conflict)
components:
  schemas:
    CreateOrder:
      type: object
      required: [items, shipping_address, idempotency_key]
      properties:
        items:
          type: array
          minItems: 1
          maxItems: 50
          items:
            type: object
            required: [product_id, quantity]
            properties:
              product_id:
                type: string
                format: uuid
              quantity:
                type: integer
                minimum: 1
                maximum: 100
        shipping_address:
          $ref: '#/components/schemas/Address'
        idempotency_key:
          type: string
          format: uuid
        coupon_code:
          type: string
          pattern: "^[A-Z0-9]{8}$"
```

Notice that every constraint in the schema (minItems, maxItems, minimum, maximum, pattern, format) is a test case waiting to be generated. The LLM sees these constraints and produces boundary value tests automatically.
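For instance, the `quantity` constraints (minimum: 1, maximum: 100) mechanically imply four boundary cases. In this sketch, `validate_quantity` is a stand-in for the real endpoint's validation logic, used only to show the derivation:

```python
# Sketch: boundary cases derived from the schema's minimum/maximum on quantity.
# validate_quantity is a hypothetical stand-in for the endpoint's real validation.

def validate_quantity(quantity: int) -> int:
    """Return the HTTP status the schema implies for a given quantity."""
    return 201 if 1 <= quantity <= 100 else 400

# Every min/max pair yields four boundary cases: below-min, min, max, above-max.
BOUNDARY_CASES = [(0, 400), (1, 201), (100, 201), (101, 400)]

for quantity, expected in BOUNDARY_CASES:
    assert validate_quantity(quantity) == expected
```

The same pattern applies to minItems/maxItems on the items array (0, 1, 50, 51) and to the coupon_code regex (7, 8, and 9 characters; lowercase letters).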
Feeding Existing Tests as a Style Guide
Here are two existing tests from our codebase. Match their style exactly:
```python
class TestOrderCreation:
    """Tests for POST /api/v2/orders endpoint."""

    def test_should_create_order_when_valid_payload(
        self, api_client, auth_headers, product_factory
    ):
        # Arrange
        product = product_factory.create()
        payload = {
            "items": [{"product_id": str(product.id), "quantity": 2}],
            "shipping_address": VALID_ADDRESS,
            "idempotency_key": str(uuid4()),
        }

        # Act
        response = api_client.post(
            "/api/v2/orders", json=payload, headers=auth_headers
        )

        # Assert
        assert response.status_code == 201
        order = response.json()
        assert order["status"] == "pending"
        assert len(order["items"]) == 1
        assert order["items"][0]["quantity"] == 2

    def test_should_reject_order_when_empty_items_list(
        self, api_client, auth_headers
    ):
        # Arrange
        payload = {
            "items": [],
            "shipping_address": VALID_ADDRESS,
            "idempotency_key": str(uuid4()),
        }

        # Act
        response = api_client.post(
            "/api/v2/orders", json=payload, headers=auth_headers
        )

        # Assert
        assert response.status_code == 400
        assert "items" in response.json()["detail"].lower()
```
By showing two tests -- one happy path, one validation error -- the AI learns:
- Class structure with docstrings
- Fixture-based dependency injection (`api_client`, `auth_headers`, `product_factory`)
- Naming convention: `test_should_X_when_Y`
- Comment markers for Arrange/Act/Assert
- Assertion style (status code plus specific field checks)
- Use of the `VALID_ADDRESS` constant and `uuid4()`
Anti-Pattern: The Context Dump
Do not paste your entire codebase into the prompt. LLMs degrade with irrelevant context. A focused 200-line excerpt produces better tests than a 5000-line dump.
This is the needle-in-a-haystack problem -- the more hay, the harder the model must work to find the needle. Long-context research (the "lost in the middle" effect) consistently shows that models attend best to content at the beginning and end of the prompt, and that performance degrades as irrelevant material accumulates in the middle.
Symptoms of Context Overload
- Generated tests reference functions from the wrong file
- Tests mix styles from different parts of the codebase
- The LLM "forgets" constraints mentioned early in the prompt
- Output is shorter and less detailed than expected (the bloated input left little room in the context window for the response)
The Fix: Context Windowing
Instead of dumping everything, use a context window approach:
Step 1: Feed the spec (Priority 1) -- generate initial tests
Step 2: Review output -- identify style mismatches
Step 3: Feed 2-3 existing tests as style examples (Priority 2) -- regenerate
Step 4: Review output -- identify missing domain rules
Step 5: Add domain constraints (Priority 3) -- regenerate specific tests
This iterative approach keeps each prompt focused and produces better results than a single massive prompt.
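The windowing steps above can be sketched as a loop: each iteration adds exactly one context layer and regenerates, with a human review between iterations. Here `generate_tests` is a placeholder for your actual LLM call, not a real function:

```python
# Sketch of context windowing: add one context layer per step, regenerate each time.
# generate_tests is a hypothetical placeholder for the real LLM call.

def generate_tests(context: list[str]) -> str:
    """Placeholder: send the accumulated context to the model, return its output."""
    return f"<tests generated from {len(context)} context piece(s)>"

context = ["<OpenAPI spec for the endpoint>"]           # Step 1: Priority 1 only
draft = generate_tests(context)

# Step 2: human review happens here, then:
context.append("<2-3 existing tests as style guide>")   # Step 3: add Priority 2
draft = generate_tests(context)

# Step 4: review again, then:
context.append("<domain constraints not in the spec>")  # Step 5: add Priority 3
draft = generate_tests(context)
```

The key property is that each prompt stays small and focused; the model never has to sift through context that was only relevant to an earlier step's mistakes.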
Advanced Strategy: Context Compression
When you must include a lot of context, compress it. Instead of pasting a 500-line source file, summarize it:
The UserService class has these public methods:
- create_user(dto: CreateUserDTO) -> User — validates email uniqueness, hashes password
- get_user(id: UUID) -> User — raises NotFoundError if missing
- update_user(id: UUID, dto: UpdateUserDTO) -> User — partial update, re-validates email if changed
- delete_user(id: UUID) -> None — soft delete (sets deleted_at timestamp)
Key constraints:
- Email must be unique (case-insensitive)
- Password minimum 8 chars, must include a number
- Soft-deleted users cannot log in but their data is retained for 30 days
This 10-line summary carries the same test-relevant information as the full 500-line source file.
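This kind of compression can even be automated. As one sketch of the idea, Python's standard `ast` module can extract public method signatures and first docstring lines instead of pasting the whole file (the class and method names below are the hypothetical `UserService` from the summary above):

```python
# Sketch: compress a source file into one summary line per public method
# using Python's standard-library ast module.
import ast

def summarize_methods(source: str) -> list[str]:
    """One line per public method: name(args) -- first docstring line."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and not node.name.startswith("_"):
            args = ", ".join(a.arg for a in node.args.args if a.arg != "self")
            doc = (ast.get_docstring(node) or "").splitlines()
            entry = f"- {node.name}({args})"
            if doc:
                entry += f" -- {doc[0]}"
            lines.append(entry)
    return lines

source = '''
class UserService:
    def create_user(self, dto):
        """Validates email uniqueness, hashes password."""
    def _hash_password(self, raw):
        """Internal helper, excluded from the summary."""
'''
print("\n".join(summarize_methods(source)))
```

Private helpers are skipped automatically, which is usually what you want: the model should test the public contract, not internal details.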
Strategy: Multi-Turn Context Building
For complex features, build context across multiple prompts in a conversation:
Turn 1: "Here is the OpenAPI schema for the payments endpoint. Summarize the
test scenarios you would create."
Turn 2: "Good. Here are our existing payment tests for reference style.
Now generate the first 10 tests matching this style."
Turn 3: "These look good. Now add tests for the edge cases: expired cards,
insufficient funds, and currency conversion rounding."
Turn 4: "Review all generated tests. Which ones are testing the mock
instead of the real behavior? Flag any tautology tests."
Each turn adds context incrementally, and the conversation history provides implicit context from prior turns. This is more effective than a single massive prompt because:
- The LLM's attention is focused on one concern at a time
- You can course-correct between turns
- You build a review-as-you-go workflow
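The four turns above can also be driven programmatically. In this sketch, `call_llm` is a placeholder for whatever chat-completion client you use; the growing message list is what carries context from turn to turn:

```python
# Sketch of multi-turn context building. call_llm is a hypothetical placeholder
# for a real chat-completion client; the message history accumulates context.

def call_llm(messages: list[dict]) -> str:
    """Placeholder: send the message history to your chat API, return the reply."""
    turn = sum(1 for m in messages if m["role"] == "user")
    return f"<reply to turn {turn}>"

messages = [{"role": "system", "content": "You are a senior test engineer."}]

TURNS = [
    "Here is the OpenAPI schema: ... Summarize the test scenarios you would create.",
    "Here are our existing payment tests: ... Generate the first 10 tests in this style.",
    "Add tests for expired cards, insufficient funds, and currency rounding.",
    "Review all generated tests and flag any that assert on the mock, not real behavior.",
]
for turn in TURNS:
    messages.append({"role": "user", "content": turn})
    messages.append({"role": "assistant", "content": call_llm(messages)})
```

In practice you would pause between turns to review the reply before sending the next one; that review step is the whole point of the multi-turn structure.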
Context Feeding Checklist
Before sending a test generation prompt, verify:
[ ] Specification artifact is included (schema, AC, or story)
[ ] Only relevant portions are included (not the entire file)
[ ] Existing test examples are provided for style matching
[ ] Business rules not in the spec are stated explicitly
[ ] Framework and language are specified
[ ] Auth mechanism and test helpers are named exactly
[ ] Output format expectations are clear
[ ] Token budget is reasonable (< 8K input for focused generation)
Key Takeaway
Context feeding is the highest-leverage skill in AI-augmented test design. The right 200 lines of context produce better tests than 5000 lines of context dump. Prioritize specifications over code, feed incrementally rather than all at once, and always include style examples from your existing test suite.