Memory Collector Benchmark Results

This page presents the latest benchmark results for the Memory Collector, demonstrating its performance characteristics and overhead when running in typical deployment scenarios.

Overview

The benchmark simulates a realistic production environment by gradually ramping up a load generator over 200 seconds, followed by a steady-state period to evaluate collector behavior under sustained load. This approach allows us to measure the collector's resource consumption patterns under various load conditions.

All results shown here are automatically published from the most recent successful benchmark run. The benchmark configuration may evolve over time, so specific parameters might differ slightly from those described in this documentation.

Workload Performance

This graph shows the overall performance of the test workload during the benchmark run, displaying:

- Request rate (requests per second)
- Median latency
- P95 latency (95th percentile)
- P99 latency (99th percentile)

Collector Resource Consumption

CPU Utilization

This graph displays two key metrics:

1. Collector userspace CPU consumption in millicores as the experiment progresses
2. Total CPU utilization of everything except the collector

From these measurements, we can calculate the collector's CPU overhead as a percentage of the CPU consumed by the rest of the workload. The benchmark uses the pidstat utility to gather these measurements.
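The overhead calculation can be sketched as follows. This is a minimal illustration, assuming overhead is defined as collector CPU divided by the CPU of everything except the collector; the function name and the sample values are hypothetical and are not taken from a benchmark run.

```python
def collector_overhead_percent(collector_millicores, other_millicores):
    """Return collector CPU as a percentage of all non-collector CPU.

    Assumption: both inputs are in millicores and cover the same time window.
    """
    if other_millicores <= 0:
        raise ValueError("non-collector CPU must be positive")
    return 100.0 * collector_millicores / other_millicores


# Hypothetical samples: (collector millicores, everything-else millicores).
# These numbers are illustrative only, not measured results.
samples = [(12.0, 4000.0), (15.0, 6000.0), (18.0, 8000.0)]
overheads = [collector_overhead_percent(c, o) for c, o in samples]

for (collector, other), pct in zip(samples, overheads):
    print(f"collector={collector} mc, other={other} mc, overhead={pct:.3f}%")
```

Computing the percentage per sample, rather than once over the whole run, makes it possible to see whether overhead changes as load ramps up.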

Memory Utilization

This graph shows the memory consumption of the collector over time. The collector processes data in "row groups," compressing measurements into an efficient representation before periodically flushing them to storage. During extended runs, memory usage typically stabilizes between 300 and 350 MB as collection and flushing reach equilibrium.

eBPF Overhead

The benchmark captures eBPF overhead through sampling profiles, which are converted to flame graphs. To analyze the percentage of samples attributed to eBPF processing, look for the bpf_prog symbol in the flame graph:

eBPF Flame Graph
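To put a number on the eBPF share, the same sampling data can be summarized programmatically. The sketch below assumes the profile is available in the "folded stacks" text format commonly used as flame graph input (one line per stack, frames separated by `;`, followed by a sample count). Whether the benchmark exposes this format is an assumption, and the stack contents in the example are hypothetical.

```python
def bpf_sample_share(folded_lines, symbol_prefix="bpf_prog"):
    """Return the percentage of samples whose stack contains an eBPF frame.

    Assumption: each line looks like "frame1;frame2;frame3 <count>".
    """
    total = 0
    matched = 0
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue
        # The sample count is the last whitespace-separated field.
        stack, _, count_text = line.rpartition(" ")
        count = int(count_text)
        total += count
        # Count the stack once if any frame starts with the eBPF prefix.
        if any(frame.startswith(symbol_prefix) for frame in stack.split(";")):
            matched += count
    return 100.0 * matched / total if total else 0.0


# Hypothetical folded stacks, for illustration only.
example = [
    "workload;do_syscall_64;schedule;bpf_prog_abc123_on_switch 5",
    "workload;main;handle_request 90",
    "collector;main;flush 5",
]
share = bpf_sample_share(example)
print(f"eBPF share of samples: {share:.1f}%")
```

Matching on a prefix rather than an exact name is deliberate: a stack is counted if any of its frames begins with `bpf_prog`, regardless of what follows the prefix.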

Collected Data Examples

The collector stores measurements in Parquet format with a well-defined schema. Here are links to sample data and schema information:

Simple Schema - Basic schema structure
Sample Records (100) - 100 representative records
First Records (Head) - First 100 records from the dataset
Detailed Schema - Complete schema with column descriptions (note that the descriptions are repeated for each row group in the Parquet file)

Data Measured by the Collector

LLC Misses Over Time

LLC Misses by Process

This graph illustrates Last Level Cache (LLC) misses over a 500ms period, broken down by process. LLC misses are a key indicator of memory subsystem interference, helping identify potential noisy neighbors in the system.

Performance Slowdown Analysis

The benchmark analyzes the relationship between LLC misses and application efficiency by comparing Cycles Per Instruction (CPI) during periods of high cache contention versus normal operation:

LLC Misses Distribution

This distribution shows CPI values across different levels of LLC miss activity for each process.

LLC Misses Slowdown

The slowdown graph quantifies the performance impact of high LLC miss rates by showing the ratio of the average CPI during high cache miss periods to the average CPI during median cache miss periods. Higher values indicate greater performance degradation when cache contention occurs.
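The slowdown ratio described above can be sketched as follows. This is an illustration under stated assumptions, not the benchmark's actual analysis code: "high" periods are taken to be the top 10% of intervals by LLC misses and "median" periods the 40th to 60th percentile band. Both thresholds, the function names, and the sample data are hypothetical.

```python
from statistics import mean


def cpi(cycles, instructions):
    """Cycles Per Instruction for one measurement interval."""
    return cycles / instructions


def slowdown_ratio(intervals, high_quantile=0.9, mid_band=(0.4, 0.6)):
    """Return mean CPI in high-miss intervals divided by mean CPI in median-miss intervals.

    intervals: list of (llc_misses, cycles, instructions) tuples for one process.
    The quantile thresholds are assumptions chosen for illustration.
    """
    ranked = sorted(intervals, key=lambda iv: iv[0])  # order by LLC misses
    n = len(ranked)
    high = ranked[int(n * high_quantile):]
    mid = ranked[int(n * mid_band[0]):int(n * mid_band[1])]
    if not high or not mid:
        raise ValueError("not enough intervals to form both groups")
    high_cpi = mean(cpi(c, i) for _, c, i in high)
    mid_cpi = mean(cpi(c, i) for _, c, i in mid)
    return high_cpi / mid_cpi


# Hypothetical data: ten intervals with rising LLC misses; only the
# highest-miss interval needs twice as many cycles per instruction.
intervals = [(misses, 1000, 1000) for misses in range(1, 10)]
intervals.append((10, 2000, 1000))
ratio = slowdown_ratio(intervals)
print(f"slowdown ratio: {ratio:.2f}")
```

A ratio near 1.0 would suggest that a process is largely insensitive to cache contention, while larger values indicate that it slows down when LLC misses spike.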