Memory Collector Telemetry Strategy
Overview
Our telemetry strategy prioritizes high-resolution, low-level data collection to build a foundation for understanding memory subsystem interference. By focusing on simplicity and data quality in the initial collector, we can enable rapid iteration and validation of detection algorithms.
The key aspects of our approach are:
- Collect per-process, per-core metrics at 1 millisecond granularity to capture interference at a meaningful timescale
- Collect per-process cache occupancy metrics at 1 millisecond granularity
- Generate synchronized datasets for joint analysis
- Implement in stages to manage complexity
This "firehose" telemetry will enable us to build a dataset for offline analysis, allowing us to identify patterns and develop algorithms for real-time interference detection.
Telemetry Collection
The collector will monitor and record the following metrics for each process at 1 millisecond granularity:
- Process ID
- Core ID
- Core frequency during the measured interval
- Cycles
- Instructions
- Last level cache misses
Modern cloud environments routinely run dozens or even hundreds of applications on a single server, each with its own dynamic memory usage patterns. In an extreme case, with 100 applications changing phase every second on average, there would be a phase change every 10 milliseconds in aggregate.
The 1 millisecond telemetry granularity enables us to detect this behavior and characterize interference at a meaningful timescale.
In addition to these per-process metrics, we will also collect cache occupancy measurements using Intel RDT's Cache Monitoring Technology (CMT) or an equivalent mechanism. This data will be collected per process at the same 1 millisecond granularity.
Monitoring cache usage per process is necessary because caches maintain state across context switches and are shared by all threads of a process.
Data Format
For the initial version, telemetry will be written to CSV files to simplify data collection and analysis. Each row will represent a single measurement interval for a specific process.
We will generate two datasets:
- Per-process, per-core measurements (process ID, core ID, frequency, cycles, instructions, LLC misses)
- Per-process cache occupancy measurements
While these datasets will be separate, they will be synchronized and aligned by timestamp to enable joint analysis.
Implementation Stages
To manage complexity, we will implement telemetry collection in two stages:
- Collect per-process, per-core measurements (process ID, core ID, frequency, cycles, instructions, LLC misses)
- Add per-process cache occupancy measurements using Intel RDT or an equivalent mechanism
This staged approach allows us to validate the core telemetry pipeline before adding the complexity of cache monitoring.
For the cache monitoring stage, we will need to assign each process a unique identifier (e.g., CLOS for Intel RDT) to track its cache usage. This will require additional system-level coordination and metadata management.
Analysis and Algorithm Development
By collecting high-resolution telemetry from multiple clusters, both real-world deployments and benchmark environments, we aim to build a representative dataset capturing a wide range of interference scenarios.
Analyzing this data offline using big data techniques will help us identify common interference patterns, resource usage signatures, and relevant metrics for detecting contention.
These insights will inform the development of algorithms for real-time interference detection in future collector versions. Starting with a thorough understanding of low-level behavior is key to building effective higher-level detection and mitigation strategies.