# The Anatomy of a High-Quality Test Prompt

## Why Prompt Structure Matters
A poor prompt produces poor tests. The difference between "write tests for login" and a structured prompt is the difference between a junior engineer guessing and a senior engineer analyzing requirements. LLMs are statistical pattern matchers -- the quality of their output is directly proportional to the specificity and structure of your input.
In test generation, this means you need to think of your prompt as a test specification document, not a casual request. Every piece of context you include (or omit) shapes the coverage, quality, and usefulness of the generated tests.
## The Five Elements of an Effective Test-Generation Prompt
Every high-quality test prompt contains five elements. Missing any one of them degrades output quality significantly.
| Element | Purpose | Example |
|---|---|---|
| Context | What system/feature is under test | "An e-commerce checkout API built in Node.js/Express" |
| Artifact | The spec, schema, or story to test against | "Here is the OpenAPI schema: ..." |
| Constraints | Framework, language, patterns to follow | "Use Jest + Supertest, follow AAA pattern" |
| Coverage goals | What categories of tests to generate | "Include happy path, auth failures, validation errors, edge cases" |
| Output format | How to structure the response | "One test per it() block, group by endpoint" |
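Assembling these five elements is mechanical enough to template. A minimal sketch in Python (the function name and example values are illustrative, not a required API):

```python
def build_test_prompt(context, artifact, constraints, coverage_goals, output_format):
    """Assemble the five elements into a single structured prompt."""
    sections = [
        ("Context", context),
        ("Artifact", artifact),
        ("Constraints", constraints),
        ("Coverage goals", coverage_goals),
        ("Output format", output_format),
    ]
    return "\n\n".join(f"**{name}:**\n{body}" for name, body in sections)

prompt = build_test_prompt(
    context="An e-commerce checkout API built in Node.js/Express.",
    artifact="OpenAPI schema: ...",
    constraints="Use Jest + Supertest, follow the AAA pattern.",
    coverage_goals="Happy path, auth failures, validation errors, edge cases.",
    output_format="One test per it() block, grouped by endpoint.",
)
print(prompt.splitlines()[0])  # → **Context:**
```

Keeping the assembly in code makes it easy to version the template alongside the test suite and fill in only the parts that change per feature.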
### Element 1: Context
Context tells the LLM what kind of system it is dealing with. Without it, the LLM makes generic assumptions. With it, the LLM tailors its output to the technology stack and domain.
**Weak context:**

```
Write tests for the checkout flow.
```

**Strong context:**

```
We have an e-commerce checkout API built in Node.js 20 with Express 4.
The checkout flow involves:
1. Cart validation (check stock availability)
2. Payment processing via the Stripe API (test mode)
3. Order creation in PostgreSQL
4. Email confirmation via SendGrid
The API is behind JWT authentication. All prices are in cents (integers).
```
The strong context tells the LLM about the tech stack, the business flow, the external dependencies, the data format, and the auth mechanism. This eliminates entire classes of hallucination.
### Element 2: Artifact
The artifact is the primary source of truth for test generation. It can be an OpenAPI schema, user story, acceptance criteria, database schema, or even a screenshot described in text.
Best practice: Paste the artifact directly into the prompt rather than describing it. "Here is the OpenAPI schema" followed by the actual YAML is far more effective than "the endpoint accepts a name field and a price field."
**Artifact — OpenAPI excerpt:**
```yaml
paths:
  /api/v2/checkout:
    post:
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [cart_id, payment_method_id]
              properties:
                cart_id:
                  type: string
                  format: uuid
                payment_method_id:
                  type: string
                coupon_code:
                  type: string
                  pattern: "^[A-Z0-9]{8}$"
```
### Element 3: Constraints
Constraints prevent the LLM from generating tests in the wrong framework, language, or style. They also enforce team conventions.
**Constraints:**
- Language: TypeScript 5.x
- Framework: Vitest with supertest
- Pattern: Arrange/Act/Assert with explicit comments
- Naming: "should [behavior] when [condition]"
- No external HTTP calls -- mock all Stripe/SendGrid calls
- Use factory functions from ./tests/factories.ts
- Each test file must `import { describe, it, expect } from 'vitest'`
### Element 4: Coverage Goals
Without explicit coverage goals, the LLM defaults to happy-path tests. You must explicitly request negative cases, edge cases, and boundary values.
**Coverage requirements:**
- Every acceptance criterion must have at least one test
- Include at least 2 negative/error cases per endpoint
- Include 1 boundary value test for every numeric field
- Test auth: valid token, expired token, missing token, wrong role
- Test idempotency: calling the endpoint twice with the same cart_id
### Element 5: Output Format
Controlling the output format makes the generated code immediately usable rather than requiring reformatting.
**Output format:**
- One describe block per endpoint
- One it block per test case
- Group related tests with nested describe blocks
- Include JSDoc comment above each test explaining the scenario
- Output as a single TypeScript file that can be saved directly to tests/
## Putting It All Together: A Complete Prompt
```
You are a senior QA engineer. Given the following user story, acceptance criteria,
and technical context, generate a comprehensive test suite.

**User Story:**
As a customer, I want to apply a coupon code during checkout so that I get
a discount on my order.

**Acceptance Criteria:**
1. Valid coupon codes reduce the order total by the specified percentage
2. Expired coupons return a clear error message
3. Coupons can only be used once per customer
4. Invalid coupon format is rejected before hitting the database

**Technical Context:**
- Backend: Node.js 20, Express 4, TypeScript
- Database: PostgreSQL 15 via Prisma ORM
- Test framework: Vitest + Supertest
- Auth: JWT tokens via get_test_token("customer") helper
- Coupon format: 8 uppercase alphanumeric characters (regex: ^[A-Z0-9]{8}$)

**Generate:**
1. Unit tests for coupon validation logic (no DB, no HTTP)
2. Integration tests for the POST /api/v2/checkout endpoint with coupon
3. Edge case tests (empty string coupon, SQL injection in coupon field, etc.)

**For each test, include:**
- Test name: "should [expected behavior] when [condition]"
- Arrange/Act/Assert structure with comments
- Explicit assertions (not just "expect something")

**Coverage requirements:**
- All 4 acceptance criteria must have at least one test
- At least 2 negative/error cases per AC
- Boundary test for coupon format (7 chars, 8 chars, 9 chars)
```
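The boundary requirement is concrete enough to check directly against the stated coupon regex. A quick sketch in plain Python, no framework needed:

```python
import re

COUPON_RE = re.compile(r"^[A-Z0-9]{8}$")  # format from the technical context

# Boundary values around the 8-character requirement
assert COUPON_RE.match("ABCD123") is None       # 7 chars: too short
assert COUPON_RE.match("ABCD1234") is not None  # 8 chars: valid
assert COUPON_RE.match("ABCD12345") is None     # 9 chars: too long
assert COUPON_RE.match("abcd1234") is None      # lowercase: rejected
```

Running a check like this yourself first gives you a ground truth to evaluate the generated boundary tests against.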
## Common Prompt Mistakes and How to Fix Them

### Mistake 1: The Vague Request
```
# BAD
Write tests for the user service.

# GOOD
Generate pytest tests for the UserService.create_user() method defined
in app/services/user_service.py. The method accepts a CreateUserDTO
and returns a User model. Test validation, duplicate email handling,
and the happy path.
```
### Mistake 2: Missing Framework Specification
Without specifying the framework, the LLM might generate unittest-style Python when you use pytest, or Mocha-style JavaScript when you use Jest.
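The stylistic gap is real: without a framework constraint, both of the following are plausible outputs for the same check, and only one fits a pytest codebase (the `is_valid_coupon` helper is illustrative):

```python
import re
import unittest

def is_valid_coupon(code: str) -> bool:
    """Illustrative validator: 8 uppercase alphanumeric characters."""
    return re.fullmatch(r"[A-Z0-9]{8}", code) is not None

# unittest style -- what you may get when no framework is specified
class TestCoupon(unittest.TestCase):
    def test_rejects_short_code(self):
        self.assertFalse(is_valid_coupon("ABC"))

# pytest style -- plain functions with bare asserts
def test_rejects_short_code():
    assert not is_valid_coupon("ABC")
```

Both pass, but mixing the two styles in one suite creates friction at review time, so name the framework explicitly.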
### Mistake 3: No Style Example
If your team has specific conventions, paste 2-3 existing tests as a "style guide." The LLM will match naming patterns, assertion styles, and fixture usage.
**Style reference (existing test from our codebase):**
```python
class TestUserCreation:
    def test_should_create_user_when_valid_email(self, db_session, user_factory):
        # Arrange
        dto = user_factory.build(email="new@example.com")

        # Act
        user = UserService(db_session).create_user(dto)

        # Assert
        assert user.id is not None
        assert user.email == "new@example.com"
        assert user.created_at is not None
```
Follow this exact pattern for all generated tests.
### Mistake 4: Asking for Too Much at Once
If you ask for 100 tests in a single prompt, quality degrades in the later tests. Instead, generate in batches of 10-20, review, then generate the next batch.
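Batching can be mechanical: split the scenario list into chunks and issue one prompt per chunk. A minimal sketch, where `generate_tests` is a hypothetical stand-in for your LLM call:

```python
def chunk(items, size):
    """Yield successive batches of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

scenarios = [f"scenario {n}" for n in range(1, 36)]  # 35 test scenarios

for batch in chunk(scenarios, 15):
    prompt = "Generate tests for:\n" + "\n".join(f"- {s}" for s in batch)
    # generate_tests(prompt)  # hypothetical LLM call -- review each batch before the next
```

Reviewing between batches also lets you fold corrections into the prompt before the next chunk, rather than repeating the same mistake 100 times.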
---
## The Prompt Iteration Loop
Prompt engineering is not one-shot. Use this loop:
1. DRAFT the prompt with all five elements
2. GENERATE a small batch (5-10 tests)
3. EVALUATE: Are assertions correct? Are APIs real? Is the style right?
4. REFINE the prompt based on what was wrong
5. REGENERATE with the improved prompt
6. REPEAT until output quality is consistently high
7. SAVE the refined prompt as a template for future use
This loop typically converges in 2-3 iterations. The investment in prompt refinement pays off every time you reuse the template for a similar feature.
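The EVALUATE step can be partially automated: execute the generated file and inspect the exit code before refining the prompt. A minimal sketch using only the standard library (it runs the file with the current interpreter; swap in your real runner, e.g. pytest or vitest, in practice):

```python
import subprocess
import sys

def run_generated_tests(path: str) -> bool:
    """Execute a generated test file and report whether it passed.

    Runs the file directly with the current Python interpreter for a
    dependency-free sketch; substitute your real test runner as needed.
    """
    result = subprocess.run(
        [sys.executable, path],
        capture_output=True, text=True, timeout=60,
    )
    if result.returncode != 0:
        print(result.stderr)  # feed failures back into the REFINE step
    return result.returncode == 0
```

Wiring this into the loop turns "is the output right?" from a manual read-through into a fast mechanical check, leaving human review for assertion quality and style.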
---
## Key Takeaway
The five elements (Context, Artifact, Constraints, Coverage Goals, Output Format) form a **test generation specification**. Treat your prompt with the same rigor you would treat a requirements document. The LLM is only as good as the specification you give it.