Serverless Performance Testing
The Unique Challenges of Serverless
Serverless functions (AWS Lambda, Google Cloud Functions, Azure Functions) introduce performance characteristics absent from traditional applications. The execution model -- ephemeral compute instances that scale to zero and back -- creates testing challenges that standard load testing approaches do not address.
Key Challenges and Testing Strategies
| Challenge | Why It Matters | Testing Strategy |
|---|---|---|
| Cold starts | First invocation after idle can take 1-10s | Periodic cold start tests with idle gaps |
| Concurrency limits | Provider-enforced limits (e.g., 1,000 concurrent Lambda) | Ramp-to-limit tests to find the ceiling |
| Memory-CPU coupling | Lambda ties CPU to memory allocation | Test across memory configurations |
| Execution duration limits | Lambda max 15 min; Cloud Functions (2nd gen) max 60 min | Long-running operation stress tests |
| Statelessness | No local state between invocations | Verify external state store performance |
| Deployment package size | Affects cold start duration | Measure cold start vs. package size |
| VPC cold starts | Functions in a VPC have longer cold starts (ENI attachment) | Test with and without VPC |
Cold Start Deep Dive
Cold starts are the most impactful serverless performance concern. A cold start occurs when the cloud provider needs to:
- Provision a new execution environment
- Download your deployment package
- Initialize the runtime (Node.js, Python, Java, etc.)
- Execute your initialization code (imports, connections, model loading)
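On AWS Lambda, the initialization phase is directly observable: the `REPORT` log line in CloudWatch includes an `Init Duration` field only when the invocation was a cold start. A minimal sketch for pulling it out of log lines:

```python
import re
from typing import Optional

def parse_init_duration(report_line: str) -> Optional[float]:
    """Extract Init Duration (ms) from a Lambda REPORT log line.
    Returns None for warm invocations, which omit the field."""
    match = re.search(r"Init Duration:\s*([\d.]+)\s*ms", report_line)
    return float(match.group(1)) if match else None

cold = ("REPORT RequestId: abc Duration: 84.21 ms Billed Duration: 85 ms "
        "Memory Size: 128 MB Max Memory Used: 50 MB Init Duration: 912.34 ms")
warm = ("REPORT RequestId: def Duration: 12.50 ms Billed Duration: 13 ms "
        "Memory Size: 128 MB Max Memory Used: 50 MB")

print(parse_init_duration(cold))  # 912.34 -> cold start
print(parse_init_duration(warm))  # None -> warm invocation
```

This gives you ground truth for cold starts, which is more reliable than inferring them from client-side latency alone.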
Cold Start Benchmarks by Runtime
| Runtime | Typical Cold Start | With VPC | With Large Bundle |
|---|---|---|---|
| Node.js | 100-300ms | +300-500ms | +50-200ms |
| Python | 200-500ms | +300-500ms | +100-300ms |
| Go | 50-100ms | +300-500ms | +20-50ms |
| Java | 1-5s | +300-500ms | +500ms-2s |
| .NET | 500ms-2s | +300-500ms | +200-500ms |
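The penalties in the table stack, so a rough worst-case estimate for a given configuration is the runtime's base plus each applicable adder. A back-of-envelope sketch using mid-range values from the table above (illustrative figures, not provider guarantees):

```python
# Mid-range values from the benchmark table above (illustrative, not guarantees)
BASE_COLD_START_MS = {"nodejs": 200, "python": 350, "go": 75, "java": 3000, "dotnet": 1250}
VPC_PENALTY_MS = 400  # midpoint of the +300-500ms range
LARGE_BUNDLE_MS = {"nodejs": 125, "python": 200, "go": 35, "java": 1250, "dotnet": 350}

def estimate_cold_start(runtime: str, in_vpc: bool = False, large_bundle: bool = False) -> int:
    """Stack the base cold start with VPC and bundle-size penalties."""
    estimate = BASE_COLD_START_MS[runtime]
    if in_vpc:
        estimate += VPC_PENALTY_MS
    if large_bundle:
        estimate += LARGE_BUNDLE_MS[runtime]
    return estimate

print(estimate_cold_start("java", in_vpc=True, large_bundle=True))  # 4650
print(estimate_cold_start("go"))                                    # 75
```

A Java function in a VPC with a large bundle can approach 5 seconds, while a lean Go function stays under 100ms: a 60x spread from configuration choices alone.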
Measuring Cold Starts with k6
```javascript
// k6-serverless-cold-start.js
import http from 'k6/http';
import { Trend, Counter } from 'k6/metrics';
import { sleep } from 'k6';

const coldStartLatency = new Trend('cold_start_ms', true);
const warmLatency = new Trend('warm_latency_ms', true);
const coldStartCount = new Counter('cold_start_detected');

export const options = {
  scenarios: {
    cold_start_measurement: {
      executor: 'per-vu-iterations',
      vus: 1,
      iterations: 8,
      maxDuration: '2h', // the idle gaps alone total ~84 minutes
    },
  },
  thresholds: {
    cold_start_ms: ['p(95)<5000'], // cold starts under 5s
    warm_latency_ms: ['p(95)<500'], // warm requests under 500ms
  },
};

export default function () {
  // Wait for scale-to-zero (adjust based on provider settings)
  const idleMinutes = [0, 1, 3, 5, 10, 15, 20, 30];
  const idleTime = idleMinutes[__ITER] || 30;
  console.log(`Waiting ${idleTime} minutes for idle...`);
  sleep(idleTime * 60);

  // First request after idle = likely cold start
  const coldRes = http.get('https://api-gw.example.com/function', {
    timeout: '60s',
  });
  const firstLatency = coldRes.timings.duration;

  // Treat anything over 1s (roughly 3x the expected warm latency) as a cold start
  const isColdStart = firstLatency > 1000;
  if (isColdStart) {
    coldStartLatency.add(firstLatency);
    coldStartCount.add(1);
    console.log(`Cold start after ${idleTime}min idle: ${firstLatency}ms`);
  }

  // Warm requests for comparison
  for (let i = 0; i < 5; i++) {
    const warmRes = http.get('https://api-gw.example.com/function');
    warmLatency.add(warmRes.timings.duration);
    sleep(0.5);
  }
}
```
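The fixed 1s threshold in the script is crude. Offline, a more robust heuristic classifies each sample against a multiple of the warm median. A sketch of that approach (the 3x multiplier is an assumption to tune per function):

```python
import statistics

def split_cold_warm(latencies_ms: list[float], multiplier: float = 3.0):
    """Classify samples as cold starts if they exceed multiplier x median.
    The median is a robust stand-in for warm latency even when the
    sample contains a few cold-start outliers."""
    median = statistics.median(latencies_ms)
    cold = [x for x in latencies_ms if x > multiplier * median]
    warm = [x for x in latencies_ms if x <= multiplier * median]
    return cold, warm

samples = [110, 95, 120, 105, 2400, 98, 1900, 102]
cold, warm = split_cold_warm(samples)
print(cold)  # [2400, 1900]
```

Because the median is insensitive to outliers, this works even when a quarter of the samples are cold starts, where a mean-based threshold would drift upward.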
Concurrency Limit Testing
Every serverless provider enforces concurrency limits. Hitting the limit results in throttling (429 errors) that can cascade through your application:
```javascript
// k6-concurrency-limit.js
import http from 'k6/http';
import { Counter, Rate } from 'k6/metrics';

const throttled = new Counter('throttled_requests');
const throttleRate = new Rate('throttle_rate');

export const options = {
  scenarios: {
    ramp_to_limit: {
      executor: 'ramping-arrival-rate',
      startRate: 10,
      timeUnit: '1s',
      preAllocatedVUs: 100,
      maxVUs: 2000,
      stages: [
        { duration: '1m', target: 50 },
        { duration: '1m', target: 100 },
        { duration: '1m', target: 200 },
        { duration: '1m', target: 500 },
        { duration: '1m', target: 1000 }, // likely exceeds limit
        { duration: '2m', target: 100 }, // cool down
      ],
    },
  },
};

export default function () {
  const res = http.get('https://api-gw.example.com/function');
  if (res.status === 429) {
    throttled.add(1);
    throttleRate.add(true);
  } else {
    throttleRate.add(false);
  }
}
```
What the Results Tell You
- At what request rate does throttling begin? This is your effective concurrency ceiling.
- How does the provider behave when throttled? Some providers queue requests; others reject immediately.
- What is the recovery time after a burst? How long until throttle rate returns to zero?
- Is reserved concurrency sufficient? If you have configured reserved concurrency, does it hold under load?
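Note that the ramp test drives an arrival rate, while the provider enforces a limit on concurrent executions. Little's law connects the two: concurrency is approximately arrival rate times average duration. A quick sketch:

```python
def estimated_concurrency(arrival_rate_per_s: float, avg_duration_ms: float) -> float:
    """Little's law: L = lambda * W.
    Concurrent executions ~= arrival rate x mean execution duration."""
    return arrival_rate_per_s * (avg_duration_ms / 1000)

# 1,000 req/s at 200ms average duration needs only ~200 concurrent executions;
# at 1.5s average duration it needs ~1,500 and will throttle at a 1,000 limit.
print(estimated_concurrency(1000, 200))   # 200.0
print(estimated_concurrency(1000, 1500))  # 1500.0
```

This is why a slow function throttles at a far lower request rate than a fast one: the concurrency ceiling is consumed by duration, not just traffic.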
Memory Configuration Testing
For AWS Lambda, CPU is proportional to memory. More memory means more CPU, which can reduce execution time enough to offset the higher per-ms cost:
```python
# lambda_memory_optimizer.py
"""
Test a Lambda function across memory configurations to find the cost-optimal setting.
More memory = faster execution but a higher per-GB-second cost.
The sweet spot minimizes (execution_time_s * memory_gb * cost_per_gb_s).
"""
import json
import time

import boto3

lambda_client = boto3.client('lambda')

def benchmark_memory_config(function_name: str, payload: dict, memory_sizes: list[int]) -> list:
    results = []
    for memory_mb in memory_sizes:
        # Update function memory and wait for the update to complete
        lambda_client.update_function_configuration(
            FunctionName=function_name,
            MemorySize=memory_mb,
        )
        lambda_client.get_waiter('function_updated').wait(FunctionName=function_name)

        # Run 10 invocations and collect client-side wall times
        durations = []
        for _ in range(10):
            start = time.perf_counter()
            response = lambda_client.invoke(
                FunctionName=function_name,
                Payload=json.dumps(payload),
            )
            wall_time = (time.perf_counter() - start) * 1000
            response['Payload'].read()  # drain the response stream
            durations.append(wall_time)

        avg_duration = sum(durations) / len(durations)
        # AWS pricing: $0.0000166667 per GB-second (x86)
        cost_per_invocation = (memory_mb / 1024) * (avg_duration / 1000) * 0.0000166667
        results.append({
            "memory_mb": memory_mb,
            "avg_duration_ms": round(avg_duration, 1),
            "p90_duration_ms": round(sorted(durations)[8], 1),  # 9th of 10 samples
            "cost_per_invocation": f"${cost_per_invocation:.8f}",
        })
    return results

# Example usage
results = benchmark_memory_config(
    "my-function",
    {"key": "test-payload"},
    [128, 256, 512, 1024, 2048, 3072],
)
for r in results:
    print(f"{r['memory_mb']}MB: {r['avg_duration_ms']}ms avg, {r['cost_per_invocation']}/invocation")
```
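Once results come back, the sweet spot is simply the configuration minimizing cost per invocation, optionally among those meeting a latency cap. A sketch with illustrative numbers (cost kept numeric for comparison, computed from the x86 GB-second rate above):

```python
# Illustrative results shaped like the benchmark output, cost kept numeric
results = [
    {"memory_mb": 128,  "avg_duration_ms": 2200.0, "cost": 0.00000459},
    {"memory_mb": 512,  "avg_duration_ms": 480.0,  "cost": 0.00000400},
    {"memory_mb": 1024, "avg_duration_ms": 260.0,  "cost": 0.00000434},
    {"memory_mb": 2048, "avg_duration_ms": 240.0,  "cost": 0.00000801},
]

def pick_optimal(results, latency_cap_ms=None):
    """Cheapest configuration, optionally among those meeting a latency cap."""
    candidates = [r for r in results
                  if latency_cap_ms is None or r["avg_duration_ms"] <= latency_cap_ms]
    return min(candidates, key=lambda r: r["cost"])

print(pick_optimal(results)["memory_mb"])                      # 512
print(pick_optimal(results, latency_cap_ms=300)["memory_mb"])  # 1024
```

Note the shape of the data: 512MB is cheapest overall because the speedup from 128MB outweighs the higher rate, but beyond 1024MB the function stops getting faster and cost climbs linearly.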
Serverless Performance Optimization Checklist
| Optimization | Impact | Effort |
|---|---|---|
| Minimize deployment package size | Reduces cold start by 50-200ms | Low |
| Use provisioned concurrency for critical functions | Eliminates cold starts | Medium (cost) |
| Move VPC-dependent functions outside VPC where possible | Reduces cold start by 300-500ms | Medium |
| Use connection pooling for database connections | Prevents connection exhaustion | Medium |
| Implement keep-alive pings during low-traffic hours | Prevents scale-to-zero | Low |
| Choose Go or Node.js over Java for latency-sensitive functions | 5-10x cold start reduction | High (rewrite) |
| Use Lambda layers for shared dependencies | Reduces package size, improves caching | Low |
| Enable ARM64 (Graviton2) for Lambda | 20% cost reduction, similar performance | Low |
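Provisioned concurrency eliminates cold starts but bills for the reserved capacity around the clock, so it is worth pricing out before enabling it broadly. A rough monthly-cost sketch (the rate shown is the published us-east-1 x86 price at the time of writing; verify for your region):

```python
def provisioned_concurrency_monthly_cost(memory_mb: int, instances: int,
                                         rate_per_gb_s: float = 0.0000041667,
                                         hours: float = 730) -> float:
    """Cost of keeping `instances` execution environments provisioned all month.
    Excludes the separate per-invocation compute and request charges."""
    gb_seconds = (memory_mb / 1024) * instances * hours * 3600
    return gb_seconds * rate_per_gb_s

# 10 instances at 1GB: roughly $109/month before any invocation charges
print(round(provisioned_concurrency_monthly_cost(1024, 10), 2))
```

For a handful of latency-critical endpoints this is usually cheap insurance; applied across dozens of functions it can erase the cost advantage of serverless entirely.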
When Serverless Is NOT the Right Choice
Performance testing may reveal that serverless is wrong for your use case:
| Signal | Implication |
|---|---|
| Cold starts exceed your latency SLO | Consider containers (ECS, Cloud Run with min-instances) |
| Constant high concurrency (>500 concurrent) | Containers are more cost-effective at sustained load |
| Execution time regularly hits limits (15 min) | Move to containers or batch processing |
| Memory requirements > 10GB | Lambda max is 10GB; use ECS/Fargate |
| GPU required (ML inference) | Use dedicated GPU instances or SageMaker |
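A quick break-even check makes the sustained-load signal concrete: compare Lambda's per-invocation cost against a flat container bill. A sketch (the Lambda rates are the published x86 prices; the container figure is an illustrative assumption to replace with your own):

```python
LAMBDA_GB_S_RATE = 0.0000166667   # published x86 rate per GB-second
LAMBDA_REQUEST_RATE = 0.0000002   # $0.20 per million requests
CONTAINER_MONTHLY = 60.0          # illustrative flat cost for a small always-on service

def lambda_monthly_cost(invocations: int, avg_duration_ms: float, memory_mb: int) -> float:
    """Monthly Lambda bill: compute (GB-seconds) plus request charges."""
    compute = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024) * LAMBDA_GB_S_RATE
    return compute + invocations * LAMBDA_REQUEST_RATE

# 50M invocations/month at 200ms and 512MB: Lambda already exceeds the container bill
cost = lambda_monthly_cost(50_000_000, 200, 512)
print(round(cost, 2), cost > CONTAINER_MONTHLY)
```

Below the crossover, serverless wins on cost and operations; above it, sustained load is cheaper on a container, and the load test tells you which side of the line your traffic sits on.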
The goal of serverless performance testing is not to prove serverless works -- it is to find the boundaries where it stops working and plan accordingly.