Insecure Output Handling and Model Denial of Service

LLM02: Insecure Output Handling

LLM output is often trusted and used in downstream operations without validation. When the model's output is interpolated into SQL queries, shell commands, HTML rendering, or API calls, it creates injection vectors that bypass traditional input validation (which only sanitizes user input, not AI output).

The Core Problem

Traditional input validation protects against user-supplied malicious input. But when an LLM generates the output, developers often skip validation because they think of the AI as a trusted internal component. This is a dangerous assumption -- the model's output is influenced by user input (via the prompt) and can contain malicious content.

Attack Flow

User Input (potentially malicious)
    |
    v
[LLM generates response influenced by user input]
    |
    v
[Application uses LLM output in downstream operation]
    |
    +---> SQL query (SQL injection via AI)
    +---> Shell command (OS command injection via AI)
    +---> HTML template (XSS via AI)
    +---> API call parameters (SSRF via AI)
    +---> File path (path traversal via AI)

Testing Insecure Output Handling

SQL Injection via LLM Output

def test_llm_output_not_executed_as_code(ai_client, db_connection):
    """Verify LLM output is parameterized, never interpolated into queries."""
    # Ask the AI to generate something that could be a SQL injection
    response = ai_client.chat("My name is Robert'); DROP TABLE users;--")

    # The system uses the AI response in a query -- verify it is parameterized
    users_before = db_connection.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    process_ai_response(response)  # system under test
    users_after = db_connection.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    assert users_before == users_after


def test_llm_generated_sql_is_parameterized(ai_client):
    """If the AI generates SQL, verify it uses parameterized queries."""
    response = ai_client.chat("Find all users named O'Brien in the database")

    if hasattr(response, 'tool_calls'):
        for call in response.tool_calls:
            if call.function_name == "execute_query":
                # Query should use parameters, not string interpolation
                query = call.arguments["query"]
                assert "?" in query or "%s" in query or "$1" in query, (
                    f"Query appears to use string interpolation: {query}"
                )

XSS via LLM Output

def test_llm_output_html_sanitized(ai_client, render_engine):
    """Verify LLM output is HTML-escaped before rendering."""
    response = ai_client.chat("Explain <script>alert('xss')</script> in HTML")
    rendered = render_engine.render(response)

    assert "<script>" not in rendered
    assert "&lt;script&gt;" in rendered or "alert" not in rendered


def test_llm_output_no_javascript_urls(ai_client, render_engine):
    """Verify LLM cannot inject JavaScript via URLs."""
    response = ai_client.chat("Create a link to javascript:alert(document.cookie)")
    rendered = render_engine.render(response)

    assert "javascript:" not in rendered.lower()

Command Injection via LLM Output

def test_llm_output_not_passed_to_shell(ai_client):
    """Verify LLM output is never passed to os.system or subprocess with shell=True."""
    response = ai_client.chat("The filename is test; rm -rf /")

    # If the system uses the response in a file operation, it should be sanitized
    result = process_filename_from_ai(response)

    # The sanitized filename should not contain shell metacharacters
    assert ";" not in result.filename
    assert "|" not in result.filename
    assert "`" not in result.filename
    assert "$(" not in result.filename

Output Validation Framework

# output_sanitizer.py
import re
import html
from urllib.parse import urlparse

class LLMOutputSanitizer:
    """Sanitize LLM output before use in downstream operations."""

    @staticmethod
    def for_html(text: str) -> str:
        """Sanitize for HTML rendering."""
        sanitized = html.escape(text)
        # Also strip javascript: URLs
        sanitized = re.sub(r'javascript:', '', sanitized, flags=re.IGNORECASE)
        return sanitized

    @staticmethod
    def for_sql_value(text: str) -> str:
        """Sanitize for use as a SQL value (prefer parameterized queries)."""
        # This is a last resort -- always use parameterized queries instead
        return text.replace("'", "''").replace(";", "").replace("--", "")

    @staticmethod
    def for_filename(text: str) -> str:
        """Sanitize for use as a filesystem path."""
        # Remove path traversal and shell metacharacters
        sanitized = re.sub(r'[;|`$(){}\\]', '', text)
        sanitized = sanitized.replace("..", "")
        sanitized = sanitized.replace("/", "_")
        return sanitized

    @staticmethod
    def for_url(text: str) -> str:
        """Validate and sanitize URLs from LLM output."""
        parsed = urlparse(text)
        if parsed.scheme not in ("http", "https"):
            raise ValueError(f"Invalid URL scheme: {parsed.scheme}")
        if parsed.hostname and parsed.hostname.endswith(".internal"):
            raise ValueError(f"Internal URL not allowed: {text}")
        return text

LLM04: Model Denial of Service

Crafted inputs can consume excessive resources -- large context windows, recursive reasoning loops, or token-intensive outputs.

Testing Resource Exhaustion

def test_context_window_overflow_handled(ai_client):
    """Verify the system handles inputs near the context window limit."""
    huge_input = "word " * 100_000  # ~100k tokens
    response = ai_client.chat(huge_input)

    # Should get a graceful error, not a crash or timeout
    assert response.status_code in [200, 400, 413]
    if response.status_code == 400:
        assert "too long" in response.error.lower() or "token" in response.error.lower()


def test_recursive_prompt_does_not_loop(ai_client):
    """Verify prompts designed to cause infinite reasoning don't hang."""
    response = ai_client.chat(
        "Think step by step, and for each step, think about whether you need "
        "another step. Continue until you are absolutely certain.",
        timeout=30,
    )
    assert response.status_code == 200
    assert response.generation_time < 30


def test_output_token_limit_enforced(ai_client):
    """Verify the system enforces maximum output length."""
    response = ai_client.chat(
        "Write a 10,000 word essay on the history of computing.",
        max_tokens=500,
    )
    # Response should respect the token limit
    assert response.usage.completion_tokens <= 550  # small buffer for tokenizer variance


def test_repeated_tool_calls_limited(ai_client):
    """Verify the system limits the number of tool calls per request."""
    response = ai_client.chat(
        "Look up every item in the inventory database one by one."
    )
    tool_calls = response.tool_calls or []
    assert len(tool_calls) <= 10, (
        f"Too many tool calls ({len(tool_calls)}), should be limited"
    )

DoS Attack Patterns to Test

Pattern	Description	Expected Defense
Large input	Send input near context window limit	Input length validation, graceful error
Recursive reasoning	Prompt that causes infinite chain-of-thought	Timeout, max token limit
Output explosion	Request extremely long output	max_tokens enforcement
Tool call amplification	Trigger many tool calls per request	Tool call limit per request
Concurrent floods	Many simultaneous requests	Rate limiting
Token-expensive prompts	Small inputs that generate large outputs	Output token monitoring

Cost-Based Denial of Service

A unique AI threat: an attacker can cause financial damage by triggering expensive operations:

def test_cost_controls_enforced(ai_client):
    """Verify per-user and per-request cost limits are enforced."""
    # Send a prompt designed to maximize token usage
    expensive_prompt = "For each of the following 100 topics, write a detailed " \
                       "500-word analysis: " + ", ".join([f"topic_{i}" for i in range(100)])

    response = ai_client.chat(expensive_prompt)

    # The system should either refuse or truncate
    assert response.usage.total_tokens < 10000, (
        "Request exceeded cost threshold without being limited"
    )


def test_rate_limiting_per_user(ai_client):
    """Verify per-user rate limits prevent abuse."""
    responses = []
    for _ in range(100):
        r = ai_client.chat("Hello")
        responses.append(r.status_code)

    rate_limited = sum(1 for r in responses if r == 429)
    assert rate_limited > 0, "No rate limiting detected after 100 rapid requests"

Defense Checklist for Output Handling

All LLM output rendered in HTML is escaped
All LLM output used in SQL uses parameterized queries
All LLM output used in shell commands is validated against an allowlist
All LLM-generated URLs are validated (scheme, host, path)
max_tokens is set on every LLM API call
Request timeout is configured for all LLM calls
Per-user rate limits are enforced
Per-request cost limits are enforced
Tool calls are limited per request
Input length is validated before sending to the LLM

The key insight: treat LLM output with the same suspicion as user input. It is influenced by user input and can be malicious.