
Serverless Performance Testing

The Unique Challenges of Serverless

Serverless functions (AWS Lambda, Google Cloud Functions, Azure Functions) introduce performance characteristics absent from traditional applications. The execution model -- ephemeral compute instances that scale to zero and back -- creates testing challenges that standard load testing approaches do not address.


Key Challenges and Testing Strategies

| Challenge | Why It Matters | Testing Strategy |
|---|---|---|
| Cold starts | First invocation after idle can take 1-10s | Periodic cold start tests with idle gaps |
| Concurrency limits | Provider-enforced limits (e.g., Lambda's default of 1,000 concurrent executions) | Ramp-to-limit tests to find the ceiling |
| Memory-CPU coupling | Lambda ties CPU to memory allocation | Test across memory configurations |
| Execution duration limits | Lambda max 15 min, Cloud Functions max 60 min | Long-running operation stress tests |
| Statelessness | No local state between invocations | Verify external state store performance |
| Deployment package size | Affects cold start duration | Measure cold start vs. package size |
| VPC cold starts | Functions in a VPC have longer cold starts (ENI attachment) | Test with and without VPC |

Cold Start Deep Dive

Cold starts are the most impactful serverless performance concern. A cold start occurs when the cloud provider needs to:

  1. Provision a new execution environment
  2. Download your deployment package
  3. Initialize the runtime (Node.js, Python, Java, etc.)
  4. Execute your initialization code (imports, connections, model loading)
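
Step 4 is the only one you control directly, so it is where most cold start tuning happens. A minimal sketch (plain Python with a stand-in for real config loading; no AWS SDK involved) of keeping expensive setup at module scope, where it runs once per cold start rather than on every invocation:

```python
# cold_start_init.py
# Sketch: code at module scope runs once per execution environment (cold start),
# so expensive setup (clients, config, models) belongs there, not in the handler.
# CONFIG below is a stand-in for real initialization work.
import json
import time

# --- runs once per cold start ---
_init_started = time.perf_counter()
CONFIG = json.loads('{"table": "orders", "region": "us-east-1"}')
INIT_MS = (time.perf_counter() - _init_started) * 1000

def handler(event, context=None):
    # --- runs on every invocation; should stay cheap ---
    return {
        "statusCode": 200,
        "init_ms": round(INIT_MS, 2),  # paid once per cold start, amortized across warm calls
        "table": CONFIG["table"],
    }
```

Warm invocations reuse `CONFIG` for free; only the first call in a fresh environment pays `INIT_MS`.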

Cold Start Benchmarks by Runtime

| Runtime | Typical Cold Start | With VPC | With Large Bundle |
|---|---|---|---|
| Node.js | 100-300ms | +300-500ms | +50-200ms |
| Python | 200-500ms | +300-500ms | +100-300ms |
| Go | 50-100ms | +300-500ms | +20-50ms |
| Java | 1-5s | +300-500ms | +500ms-2s |
| .NET | 500ms-2s | +300-500ms | +200-500ms |

Measuring Cold Starts with k6

// k6-serverless-cold-start.js
import http from 'k6/http';
import { Trend, Counter } from 'k6/metrics';
import { sleep } from 'k6';

const coldStartLatency = new Trend('cold_start_ms', true);
const warmLatency = new Trend('warm_latency_ms', true);
const coldStartCount = new Counter('cold_start_detected');

export const options = {
  scenarios: {
    cold_start_measurement: {
      executor: 'per-vu-iterations',
      vus: 1,
      iterations: 8,
      maxDuration: '2h',  // idle gaps total ~84 minutes, so 60m would cut the test short
    },
  },
  thresholds: {
    cold_start_ms: ['p(95)<5000'],     // cold starts under 5s
    warm_latency_ms: ['p(95)<500'],     // warm requests under 500ms
  },
};

export default function () {
  // Wait for scale-to-zero (adjust based on provider settings)
  const idleMinutes = [0, 1, 3, 5, 10, 15, 20, 30];
  const iteration = __ITER;
  const idleTime = idleMinutes[iteration] || 30;

  console.log(`Waiting ${idleTime} minutes for idle...`);
  sleep(idleTime * 60);

  // First request after idle = likely cold start
  const coldRes = http.get('https://api-gw.example.com/function', {
    timeout: '60s',
  });
  const firstLatency = coldRes.timings.duration;

  // Heuristic: flag the first request as a cold start if it is far slower
  // than typical warm latency (here, an absolute 1s threshold)
  const isColdStart = firstLatency > 1000;
  if (isColdStart) {
    coldStartLatency.add(firstLatency);
    coldStartCount.add(1);
    console.log(`Cold start after ${idleTime}min idle: ${firstLatency}ms`);
  }

  // Warm requests for comparison
  for (let i = 0; i < 5; i++) {
    const warmRes = http.get('https://api-gw.example.com/function');
    warmLatency.add(warmRes.timings.duration);
    sleep(0.5);
  }
}

Concurrency Limit Testing

Every serverless provider enforces concurrency limits. Hitting the limit results in throttling (429 errors) that can cascade through your application:

// k6-concurrency-limit.js
import http from 'k6/http';
import { Counter, Rate } from 'k6/metrics';

const throttled = new Counter('throttled_requests');
const throttleRate = new Rate('throttle_rate');

export const options = {
  scenarios: {
    ramp_to_limit: {
      executor: 'ramping-arrival-rate',
      startRate: 10,
      timeUnit: '1s',
      preAllocatedVUs: 100,
      maxVUs: 2000,
      stages: [
        { duration: '1m', target: 50 },
        { duration: '1m', target: 100 },
        { duration: '1m', target: 200 },
        { duration: '1m', target: 500 },
        { duration: '1m', target: 1000 },  // likely exceeds limit
        { duration: '2m', target: 100 },   // cool down
      ],
    },
  },
};

export default function () {
  const res = http.get('https://api-gw.example.com/function');

  if (res.status === 429) {
    throttled.add(1);
    throttleRate.add(true);
  } else {
    throttleRate.add(false);
  }
}

What the Results Tell You

  • At what request rate does throttling begin? This is your effective concurrency ceiling.
  • How does the provider behave when throttled? Some providers queue requests; others reject immediately.
  • What is the recovery time after a burst? How long until throttle rate returns to zero?
  • Is reserved concurrency sufficient? If you have configured reserved concurrency, does it hold under load?
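
Whatever the provider's throttling behavior turns out to be, callers should back off rather than retry immediately, or the 429s compound the burst. A client-side sketch, assuming a hypothetical callable `fn` that returns an HTTP status code and body, of exponential backoff with jitter on throttled requests:

```python
# backoff_on_throttle.py
# Sketch: retry a throttled (429) call with exponential backoff plus jitter,
# so a fleet of clients does not retry in lockstep. `fn` is a hypothetical
# callable returning (status_code, body).
import random
import time

def call_with_backoff(fn, max_attempts=5, base_delay=0.2):
    for attempt in range(max_attempts):
        status, body = fn()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # delay doubles per attempt, randomized to 50-150% to spread retries
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.5)
            time.sleep(delay)
    return status, body  # still throttled after all attempts
```

The jitter term matters: without it, every client that was throttled at the same instant retries at the same instant, reproducing the original spike.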

Memory Configuration Testing

For AWS Lambda, CPU is proportional to memory. More memory means more CPU, which can reduce execution time enough to offset the higher per-ms cost:

# lambda_memory_optimizer.py
"""
Test a Lambda function across memory configurations to find the cost-optimal setting.
More memory = faster execution but higher per-ms cost.
The sweet spot minimizes (execution_time_ms * memory_mb * cost_per_gb_ms).
"""
import boto3
import time
import json

lambda_client = boto3.client('lambda')

def benchmark_memory_config(function_name: str, payload: dict, memory_sizes: list[int]) -> list:
    results = []

    for memory_mb in memory_sizes:
        # Update function memory
        lambda_client.update_function_configuration(
            FunctionName=function_name,
            MemorySize=memory_mb,
        )
        # Wait for the configuration update to finish propagating
        lambda_client.get_waiter('function_updated_v2').wait(FunctionName=function_name)

        # Run 10 invocations and collect timings
        durations = []
        for _ in range(10):
            start = time.perf_counter()
            response = lambda_client.invoke(
                FunctionName=function_name,
                Payload=json.dumps(payload),
            )
            wall_time = (time.perf_counter() - start) * 1000
            response['Payload'].read()  # drain the response stream
            durations.append(wall_time)  # includes network overhead, not just billed time

        avg_duration = sum(durations) / len(durations)
        # AWS pricing: $0.0000166667 per GB-second
        cost_per_invocation = (memory_mb / 1024) * (avg_duration / 1000) * 0.0000166667

        results.append({
            "memory_mb": memory_mb,
            "avg_duration_ms": round(avg_duration, 1),
            "p90_duration_ms": round(sorted(durations)[8], 1),  # 9th of 10 samples ≈ p90
            "cost_per_invocation": f"${cost_per_invocation:.8f}",
        })

    return results

# Example usage
results = benchmark_memory_config(
    "my-function",
    {"key": "test-payload"},
    [128, 256, 512, 1024, 2048, 3072],
)
for r in results:
    print(f"{r['memory_mb']}MB: {r['avg_duration_ms']}ms avg, {r['cost_per_invocation']}/invocation")
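
Given the benchmark output, selecting the cost-optimal setting is a one-line reduction over the results. A sketch with illustrative numbers (not real measurements; the cost strings follow the format the benchmark above emits):

```python
# pick_cost_optimal.py
# Sketch: choose the memory configuration with the lowest estimated cost per
# invocation from benchmark rows shaped like the optimizer's output.
def pick_cost_optimal(results):
    return min(results, key=lambda r: float(r["cost_per_invocation"].lstrip("$")))

# Illustrative data: 512MB runs 5x faster than 128MB, more than offsetting
# its 4x higher per-ms price; 1024MB is faster still but no longer cheaper.
sample = [
    {"memory_mb": 128,  "avg_duration_ms": 900.0, "cost_per_invocation": "$0.00000188"},
    {"memory_mb": 512,  "avg_duration_ms": 180.0, "cost_per_invocation": "$0.00000150"},
    {"memory_mb": 1024, "avg_duration_ms": 120.0, "cost_per_invocation": "$0.00000200"},
]
best = pick_cost_optimal(sample)
print(f"Cost-optimal: {best['memory_mb']}MB at {best['cost_per_invocation']}/invocation")
```

If latency matters more than cost, filter to rows meeting your SLO first, then take the cheapest of those.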

Serverless Performance Optimization Checklist

| Optimization | Impact | Effort |
|---|---|---|
| Minimize deployment package size | Reduces cold start by 50-200ms | Low |
| Use provisioned concurrency for critical functions | Eliminates cold starts | Medium (cost) |
| Move VPC-dependent functions outside the VPC where possible | Reduces cold start by 300-500ms | Medium |
| Use connection pooling for database connections | Prevents connection exhaustion | Medium |
| Implement keep-alive pings during low-traffic hours | Prevents scale-to-zero | Low |
| Choose Go or Node.js over Java for latency-sensitive functions | 5-10x cold start reduction | High (rewrite) |
| Use Lambda layers for shared dependencies | Reduces package size, improves caching | Low |
| Enable ARM64 (Graviton2) for Lambda | ~20% cost reduction, similar performance | Low |
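
The keep-alive item above can be as simple as a scheduled lightweight GET. A sketch of the pinger itself (the scheduler and URL are placeholders for your own cron or EventBridge rule; note one ping keeps only one execution environment warm, so concurrent traffic can still hit cold starts):

```python
# warm_ping.py
# Sketch: a lightweight keep-alive GET intended to run on a schedule
# (cron, EventBridge, etc.) to stop a function from scaling to zero.
# The target URL is a placeholder, not a real endpoint.
import urllib.request

def ping(url, timeout=10):
    """Return the HTTP status of a lightweight GET, or None on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except OSError:
        return None  # swallow network errors; a missed ping just risks one cold start
```

Weigh this against provisioned concurrency: pings are nearly free but best-effort, while provisioned concurrency is guaranteed but billed continuously.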

When Serverless Is NOT the Right Choice

Performance testing may reveal that serverless is wrong for your use case:

| Signal | Implication |
|---|---|
| Cold starts exceed your latency SLO | Consider containers (ECS, Cloud Run with min-instances) |
| Constant high concurrency (>500 concurrent) | Containers are more cost-effective at sustained load |
| Execution time regularly hits the 15-minute limit | Move to containers or batch processing |
| Memory requirements above 10GB | Lambda's max is 10GB; use ECS/Fargate |
| GPU required (ML inference) | Use dedicated GPU instances or SageMaker |

The goal of serverless performance testing is not to prove serverless works -- it is to find the boundaries where it stops working and plan accordingly.