# The Anatomy of a High-Quality Test Prompt

## Why Prompt Structure Matters
A poor prompt produces poor tests. The difference between "write tests for login" and a structured prompt is the difference between a junior engineer guessing and a senior engineer analyzing requirements. LLMs are statistical pattern matchers -- the quality of their output is directly proportional to the specificity and structure of your input.
In test generation, this means you need to think of your prompt as a test specification document, not a casual request. Every piece of context you include (or omit) shapes the coverage, quality, and usefulness of the generated tests.
## The Five Elements of an Effective Test-Generation Prompt
Every high-quality test prompt contains five elements. Missing any one of them degrades output quality significantly.
| Element | Purpose | Example |
|---|---|---|
| Context | What system/feature is under test | "An e-commerce checkout API built in Node.js/Express" |
| Artifact | The spec, schema, or story to test against | "Here is the OpenAPI schema: ..." |
| Constraints | Framework, language, patterns to follow | "Use Jest + Supertest, follow AAA pattern" |
| Coverage goals | What categories of tests to generate | "Include happy path, auth failures, validation errors, edge cases" |
| Output format | How to structure the response | "One test per it() block, group by endpoint" |
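Assembling these five elements is mechanical enough to template. A minimal sketch in Python (the function name and example values are illustrative, not a required API):

```python
def build_test_prompt(context, artifact, constraints, coverage_goals, output_format):
    """Assemble the five elements into a single structured prompt."""
    sections = [
        ("Context", context),
        ("Artifact", artifact),
        ("Constraints", constraints),
        ("Coverage goals", coverage_goals),
        ("Output format", output_format),
    ]
    return "\n\n".join(f"**{name}:**\n{body}" for name, body in sections)

prompt = build_test_prompt(
    context="An e-commerce checkout API built in Node.js/Express.",
    artifact="OpenAPI schema: ...",
    constraints="Use Jest + Supertest, follow the AAA pattern.",
    coverage_goals="Happy path, auth failures, validation errors, edge cases.",
    output_format="One test per it() block, grouped by endpoint.",
)
print(prompt.splitlines()[0])  # → **Context:**
```

Keeping the assembly in code makes it easy to version the template alongside the test suite and fill in only the parts that change per feature.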
### Element 1: Context
Context tells the LLM what kind of system it is dealing with. Without it, the LLM makes generic assumptions. With it, the LLM tailors its output to the technology stack and domain.
**Weak context:**

```
Write tests for the checkout flow.
```

**Strong context:**

```
We have an e-commerce checkout API built in Node.js 20 with Express 4.
The checkout flow involves:
1. Cart validation (check stock availability)
2. Payment processing via the Stripe API (test mode)
3. Order creation in PostgreSQL
4. Email confirmation via SendGrid
The API is behind JWT authentication. All prices are in cents (integers).
```
The strong context tells the LLM about the tech stack, the business flow, the external dependencies, the data format, and the auth mechanism. This eliminates entire classes of hallucination.
### Element 2: Artifact
The artifact is the primary source of truth for test generation. It can be an OpenAPI schema, user story, acceptance criteria, database schema, or even a screenshot described in text.
Best practice: Paste the artifact directly into the prompt rather than describing it. "Here is the OpenAPI schema" followed by the actual YAML is far more effective than "the endpoint accepts a name field and a price field."
**Artifact — OpenAPI excerpt:**
```yaml
paths:
  /api/v2/checkout:
    post:
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [cart_id, payment_method_id]
              properties:
                cart_id:
                  type: string
                  format: uuid
                payment_method_id:
                  type: string
                coupon_code:
                  type: string
                  pattern: "^[A-Z0-9]{8}$"
```
### Element 3: Constraints
Constraints prevent the LLM from generating tests in the wrong framework, language, or style. They also enforce team conventions.
**Constraints:**
- Language: TypeScript 5.x
- Framework: Vitest with supertest
- Pattern: Arrange/Act/Assert with explicit comments
- Naming: "should [behavior] when [condition]"
- No external HTTP calls -- mock all Stripe/SendGrid calls
- Use factory functions from ./tests/factories.ts
- Each test file must `import { describe, it, expect } from 'vitest'`
### Element 4: Coverage Goals
Without explicit coverage goals, the LLM defaults to happy-path tests. You must explicitly request negative cases, edge cases, and boundary values.
**Coverage requirements:**
- Every acceptance criterion must have at least one test
- Include at least 2 negative/error cases per endpoint
- Include 1 boundary value test for every numeric field
- Test auth: valid token, expired token, missing token, wrong role
- Test idempotency: calling the endpoint twice with the same cart_id
### Element 5: Output Format
Controlling the output format makes the generated code immediately usable rather than requiring reformatting.
**Output format:**
- One describe block per endpoint
- One it block per test case
- Group related tests with nested describe blocks
- Include JSDoc comment above each test explaining the scenario
- Output as a single TypeScript file that can be saved directly to tests/
## Putting It All Together: A Complete Prompt
```
You are a senior QA engineer. Given the following user story, acceptance criteria,
and technical context, generate a comprehensive test suite.

**User Story:**
As a customer, I want to apply a coupon code during checkout so that I get
a discount on my order.

**Acceptance Criteria:**
1. Valid coupon codes reduce the order total by the specified percentage
2. Expired coupons return a clear error message
3. Coupons can only be used once per customer
4. Invalid coupon format is rejected before hitting the database

**Technical Context:**
- Backend: Node.js 20, Express 4, TypeScript
- Database: PostgreSQL 15 via Prisma ORM
- Test framework: Vitest + Supertest
- Auth: JWT tokens via get_test_token("customer") helper
- Coupon format: 8 uppercase alphanumeric characters (regex: ^[A-Z0-9]{8}$)

**Generate:**
1. Unit tests for coupon validation logic (no DB, no HTTP)
2. Integration tests for the POST /api/v2/checkout endpoint with coupon
3. Edge case tests (empty string coupon, SQL injection in coupon field, etc.)

**For each test, include:**
- Test name: "should [expected behavior] when [condition]"
- Arrange/Act/Assert structure with comments
- Explicit assertions (not just "expect something")

**Coverage requirements:**
- All 4 acceptance criteria must have at least one test
- At least 2 negative/error cases per AC
- Boundary test for coupon format (7 chars, 8 chars, 9 chars)
```
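The boundary requirement is concrete enough to check directly against the stated coupon regex. A quick sketch in plain Python, no framework needed:

```python
import re

COUPON_RE = re.compile(r"^[A-Z0-9]{8}$")  # format from the technical context

# Boundary values around the 8-character requirement
assert COUPON_RE.match("ABCD123") is None       # 7 chars: too short
assert COUPON_RE.match("ABCD1234") is not None  # 8 chars: valid
assert COUPON_RE.match("ABCD12345") is None     # 9 chars: too long
assert COUPON_RE.match("abcd1234") is None      # lowercase: rejected
```

Running a check like this yourself first gives you a ground truth to evaluate the generated boundary tests against.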
## Common Prompt Mistakes and How to Fix Them

### Mistake 1: The Vague Request
```
# BAD
Write tests for the user service.

# GOOD
Generate pytest tests for the UserService.create_user() method defined
in app/services/user_service.py. The method accepts a CreateUserDTO
and returns a User model. Test validation, duplicate email handling,
and the happy path.
```
### Mistake 2: Missing Framework Specification
Without specifying the framework, the LLM might generate unittest-style Python when you use pytest, or Mocha-style JavaScript when you use Jest.
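The stylistic gap is real: without a framework constraint, both of the following are plausible outputs for the same check, and only one fits a pytest codebase (the `is_valid_coupon` helper is illustrative):

```python
import re
import unittest

def is_valid_coupon(code: str) -> bool:
    """Illustrative validator: 8 uppercase alphanumeric characters."""
    return re.fullmatch(r"[A-Z0-9]{8}", code) is not None

# unittest style -- what you may get when no framework is specified
class TestCoupon(unittest.TestCase):
    def test_rejects_short_code(self):
        self.assertFalse(is_valid_coupon("ABC"))

# pytest style -- plain functions with bare asserts
def test_rejects_short_code():
    assert not is_valid_coupon("ABC")
```

Both pass, but mixing the two styles in one suite creates friction at review time, so name the framework explicitly.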
### Mistake 3: No Style Example
If your team has specific conventions, paste 2-3 existing tests as a "style guide." The LLM will match naming patterns, assertion styles, and fixture usage.
**Style reference (existing test from our codebase):**
```python
class TestUserCreation:
    def test_should_create_user_when_valid_email(self, db_session, user_factory):
        # Arrange
        dto = user_factory.build(email="new@example.com")

        # Act
        user = UserService(db_session).create_user(dto)

        # Assert
        assert user.id is not None
        assert user.email == "new@example.com"
        assert user.created_at is not None
```
Follow this exact pattern for all generated tests.
### Mistake 4: Asking for Too Much at Once
If you ask for 100 tests in a single prompt, quality degrades in the later tests. Instead, generate in batches of 10-20, review, then generate the next batch.
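Batching can be mechanical: split the scenario list into chunks and issue one prompt per chunk. A minimal sketch, where `generate_tests` is a hypothetical stand-in for your LLM call:

```python
def chunk(items, size):
    """Yield successive batches of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

scenarios = [f"scenario {n}" for n in range(1, 36)]  # 35 test scenarios

for batch in chunk(scenarios, 15):
    prompt = "Generate tests for:\n" + "\n".join(f"- {s}" for s in batch)
    # generate_tests(prompt)  # hypothetical LLM call -- review each batch before the next
```

Reviewing between batches also lets you fold corrections into the prompt before the next chunk, rather than repeating the same mistake 100 times.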
---
## The Prompt Iteration Loop
Prompt engineering is not one-shot. Use this loop:
1. DRAFT the prompt with all five elements
2. GENERATE a small batch (5-10 tests)
3. EVALUATE: Are assertions correct? Are APIs real? Is the style right?
4. REFINE the prompt based on what was wrong
5. REGENERATE with the improved prompt
6. REPEAT until output quality is consistently high
7. SAVE the refined prompt as a template for future use
This loop typically converges in 2-3 iterations. The investment in prompt refinement pays off every time you reuse the template for a similar feature.
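The EVALUATE step can be partially automated: execute the generated file and inspect the exit code before refining the prompt. A minimal sketch using only the standard library (it runs the file with the current interpreter; swap in your real runner, e.g. pytest or vitest, in practice):

```python
import subprocess
import sys

def run_generated_tests(path: str) -> bool:
    """Execute a generated test file and report whether it passed.

    Runs the file directly with the current Python interpreter for a
    dependency-free sketch; substitute your real test runner as needed.
    """
    result = subprocess.run(
        [sys.executable, path],
        capture_output=True, text=True, timeout=60,
    )
    if result.returncode != 0:
        print(result.stderr)  # feed failures back into the REFINE step
    return result.returncode == 0
```

Wiring this into the loop turns "is the output right?" from a manual read-through into a fast mechanical check, leaving human review for assertion quality and style.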
---
## Key Takeaway
The five elements (Context, Artifact, Constraints, Coverage Goals, Output Format) form a **test generation specification**. Treat your prompt with the same rigor you would treat a requirements document. The LLM is only as good as the specification you give it.