
Performance overhead claims from security vendors follow a consistent pattern: the headline number is always small, the methodology is always unstated, and the comparison is always against an ideal benchmark that does not resemble production workloads. "Under 3ms overhead" could mean anything from "3ms at p50 on a static HTML server" to "3ms at p99 under 5,000 concurrent requests against a Java application with 200 database queries per request."
These are not the same measurement. This post describes exactly how we benchmark Raven.io's overhead, what workloads we test against, and what the numbers actually mean for production deployments.
Why Overhead Measurement Is Harder Than It Looks
Instrumentation overhead is not a single number. It varies by programming language, by the operations the application performs per request, by JVM/runtime warm-up state, by CPU availability, and by the number of concurrent requests. A benchmark that measures one of these dimensions and reports it as "the overhead" is misleading.
The categories of overhead that matter are: per-request overhead (the latency added to each HTTP request by agent processing), JVM startup overhead (additional startup time due to bytecode instrumentation), and memory overhead (additional heap usage by the agent). We report all three, measured separately.
Per-request overhead also has a distribution — not just a mean. Security-sensitive workloads care about p99 and p999 latency because tail latency causes timeout cascades in microservice architectures. A mean overhead of 1ms is irrelevant if the p999 overhead is 50ms and it causes downstream service timeouts during traffic spikes.
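The point is easy to demonstrate with arithmetic. The sketch below uses nearest-rank percentiles as an illustrative choice — it is not Raven.io's reporting implementation, and the sample distribution is invented for the example:

```javascript
// Sketch of why a mean hides tail behavior. Nearest-rank percentiles
// are an illustrative choice, not Raven.io's reporting implementation.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest rank: smallest value with at least p% of samples at or below it.
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}

// 1,000 requests: 995 take 1ms, 5 stall at 50ms.
const latencies = Array(995).fill(1).concat(Array(5).fill(50));
const mean = latencies.reduce((a, b) => a + b, 0) / latencies.length;

console.log(mean);                        // 1.245 — looks harmless
console.log(percentile(latencies, 99));   // 1
console.log(percentile(latencies, 99.9)); // 50 — the timeouts live here
```

A dashboard showing the 1.245ms mean would pass any overhead budget; the five 50ms stalls only become visible at p999.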
Test Environments and Workloads
We run benchmarks against four application archetypes that represent the majority of enterprise Java and Node.js workloads:
Archetype 1: API Gateway (high request volume, minimal database I/O). 2,000 requests/second, each request reads from a Redis cache and returns a JSON response. Average baseline response time 8ms. This workload tests the per-request check overhead with minimal database instrumentation activity.
Archetype 2: CRUD API (moderate volume, heavy database I/O). 500 requests/second, each request executes 5-15 SQL queries against PostgreSQL. Average baseline response time 45ms. This workload tests database driver instrumentation overhead under realistic database-heavy conditions.
Archetype 3: Batch Processor (low request volume, CPU-intensive). 50 requests/second, each request processes a document (parsing, transformation, storage). Average baseline response time 200ms. This tests overhead as a percentage of total request time for compute-intensive operations.
Archetype 4: Report Generator (very low volume, complex queries). 10 requests/second, each request executes 50-200 SQL queries to generate aggregate reports. Average baseline response time 2,000ms. This tests the absolute ceiling of database instrumentation overhead.
Measurement Methodology
For each archetype, we run the same test twice with identical hardware, software, and traffic generation: once without the Raven.io agent, once with it in blocking mode (not observe mode, since blocking mode adds the overhead of the blocking decision). We use k6 for traffic generation, run a 30-minute warmup to reach steady state, then record 15 minutes of measurements.
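The windowing step can be sketched in a few lines. Field names (`t` in seconds since test start, `ms` for latency) are illustrative, not the actual k6 output schema:

```javascript
// Sketch of the steady-state windowing described above: discard the
// 30-minute warmup, keep the next 15 minutes of samples.
const WARMUP_S = 30 * 60;
const RECORD_S = 15 * 60;

function steadyStateWindow(samples) {
  return samples.filter((s) => s.t >= WARMUP_S && s.t < WARMUP_S + RECORD_S);
}

// Three samples: one during warmup, one in the window, one after it.
const samples = [
  { t: 100, ms: 12.0 },  // warmup — discarded
  { t: 2000, ms: 8.4 },  // steady state — kept
  { t: 2800, ms: 8.1 },  // after the recording window — discarded
];
console.log(steadyStateWindow(samples)); // [ { t: 2000, ms: 8.4 } ]
```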
Hardware: AWS c5.2xlarge (8 vCPU, 16GB RAM) for the application server, c5.xlarge for the traffic generator, RDS db.t3.medium for the PostgreSQL database (Archetypes 2 and 4), Redis cache.t3.micro for Archetype 1.
We record: p50, p95, p99, p999 request latency; requests per second (to measure throughput impact); CPU utilization; heap memory usage. All measurements are averages of three independent runs.
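Deriving the reported overhead from the paired runs is a simple subtraction per percentile plus a throughput delta. In the sketch below, the 8ms baseline p50 and the p50/p99/p999 deltas match the Archetype 1 results reported below; the rest of the baseline distribution and the p95 figure are illustrative assumptions:

```javascript
// Sketch: per-percentile overhead and throughput impact from a paired
// baseline/instrumented run. Only the p50 baseline and the p50, p99, and
// p999 deltas come from this post; the other inputs are illustrative.
function overheadReport(baseline, instrumented) {
  const report = {};
  for (const p of ['p50', 'p95', 'p99', 'p999']) {
    report[p] = +(instrumented[p] - baseline[p]).toFixed(2); // ms added
  }
  report.throughputPct =
    +(((instrumented.rps - baseline.rps) / baseline.rps) * 100).toFixed(1);
  return report;
}

const baseline  = { p50: 8.0, p95: 10.5, p99: 12.0, p999: 15.0, rps: 2000 };
const withAgent = { p50: 8.4, p95: 11.0, p99: 12.9, p999: 17.1, rps: 1994 };
console.log(overheadReport(baseline, withAgent));
// { p50: 0.4, p95: 0.5, p99: 0.9, p999: 2.1, throughputPct: -0.3 }
```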
Results: Java Agent
For Archetype 1 (API Gateway), the p50 overhead is 0.4ms (baseline: 8ms, instrumented: 8.4ms). The p99 overhead is 0.9ms. The p999 overhead is 2.1ms. Throughput impact: negligible (within measurement noise at -0.3%).
For Archetype 2 (CRUD API), the p50 overhead is 1.8ms (baseline: 45ms, instrumented: 46.8ms, 4% increase). The p99 overhead is 2.7ms. The p999 overhead is 4.2ms. Each of the 5-15 SQL queries per request adds approximately 0.15-0.25ms of check overhead.
For Archetype 3 (Batch Processor), the p50 overhead is 3.1ms (baseline: 200ms, instrumented: 203.1ms, 1.5% increase). Overhead as a percentage decreases as baseline latency increases — the instrumentation cost is flat, not proportional.
For Archetype 4 (Report Generator), the p50 overhead is 18ms (baseline: 2000ms, instrumented: 2018ms, 0.9% increase). The 50-200 queries per request each add a small fixed overhead that accumulates. This is the workload most affected in absolute terms, but the percentage increase is still under 1%.
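Putting the four p50 results together makes the flat-cost pattern explicit: a roughly fixed absolute cost shrinks as a share of latency as the baseline grows. The figures below are taken from the results above:

```javascript
// The flat-cost observation as arithmetic, using the p50 figures
// reported in the Java agent results above.
const results = [
  { name: 'API Gateway',      baselineMs: 8,    overheadMs: 0.4 },
  { name: 'CRUD API',         baselineMs: 45,   overheadMs: 1.8 },
  { name: 'Batch Processor',  baselineMs: 200,  overheadMs: 3.1 },
  { name: 'Report Generator', baselineMs: 2000, overheadMs: 18  },
];
for (const r of results) {
  const pct = ((r.overheadMs / r.baselineMs) * 100).toFixed(2);
  console.log(`${r.name}: +${r.overheadMs}ms p50 (+${pct}%)`);
}
// API Gateway: +0.4ms p50 (+5.00%)
// CRUD API: +1.8ms p50 (+4.00%)
// Batch Processor: +3.1ms p50 (+1.55%)
// Report Generator: +18ms p50 (+0.90%)
```

The absolute overhead grows with per-request work (more queries, more checks), but far more slowly than the baseline latency does, so the percentage falls.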
Results: Node.js Agent
Node.js instrumentation behaves differently from Java: the runtime processes all requests on a single-threaded event loop, so synchronous instrumentation work delays every concurrent request sharing that thread. For Archetype 2 at 500 requests/second on a Node.js Express application:
p50 overhead: 1.2ms. p99 overhead: 2.8ms. p999 overhead: 5.4ms. Throughput impact: -2.1% (slightly more significant than Java due to event loop contention under load). Memory overhead: approximately 28MB additional V8 heap usage.
JVM Startup Overhead
The Java agent adds startup overhead because of bytecode instrumentation at class load time. For a typical Spring Boot application with 800-1200 classes, startup time increases by 3-6 seconds (measured as time from JVM start to first successful HTTP response). For applications with faster startup requirements — serverless deployments, frequently recycled containers — this is a relevant consideration. The startup overhead is a one-time cost per JVM instance, not per request.
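Concretely, the agent attaches through the standard JVM `-javaagent` launch flag; the jar path below is hypothetical, so substitute the agent jar from your Raven.io distribution:

```shell
# Hypothetical jar path — use the agent jar from your Raven.io distribution.
# The instrumentation cost is paid once here, at class load, not per request.
java -javaagent:/opt/raven/raven-agent.jar -jar application.jar
```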
For container deployments, Kubernetes liveness probes should account for the additional startup time by adjusting initialDelaySeconds. We document the typical additional delay in the deployment guide for each supported framework.
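As a sketch of what that adjustment might look like — the port, health path, and the assumed 10-second baseline delay are illustrative, not values from the deployment guide:

```yaml
# Illustrative only: a service that was ready in ~10s may need the
# worst-case 3-6s agent startup cost added to its probe delay.
livenessProbe:
  httpGet:
    path: /health         # assumed health endpoint
    port: 8080
  initialDelaySeconds: 16 # was 10 before adding the agent
  periodSeconds: 10
  failureThreshold: 3
```

On recent Kubernetes versions, a dedicated startupProbe is often a cleaner way to absorb slow startup than inflating initialDelaySeconds on the liveness probe.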
What These Numbers Mean for Production Decisions
For applications where baseline p99 latency is above 50ms (most CRUD APIs, most report generators), Raven.io's overhead is below 6% of baseline latency at p99. For applications where baseline latency is below 20ms (cache-heavy APIs, static content), overhead as a percentage is higher but the absolute addition is under 1ms at p50.
The workloads where RASP overhead is most noticeable are high-frequency, low-latency APIs where every millisecond is tracked — trading systems, real-time bidding, latency-sensitive messaging. For these workloads, we recommend a pilot with representative traffic to measure overhead in your specific environment before production deployment. For the majority of enterprise web applications, the overhead is small enough that normal load-testing variation obscures it.
The full benchmark data — raw numbers, test scripts, and environment configuration — is available to trial customers who request it. We do not publish it publicly because benchmark results without the full methodology context are easy to misinterpret. Run it yourself against your own application for the number that actually matters.
Measure Overhead Against Your Own Application
The Raven.io trial includes our benchmarking scripts pre-configured for your stack. Run the overhead measurement yourself in staging before committing to production deployment.
Start a Trial