RMID Tracking Design

Overview

The RMID (Resource Monitoring ID) tracking system maintains a userspace record of RMID allocations and deallocations, integrated with the time-slot-based aggregation system. The system consists of three main components:

  1. Kernel Module:
     - Manages RMID allocation and deallocation
     - Sends time-based and sched_switch-based tracepoints to trigger perf measurement in eBPF
     - Reads Intel RDT memory bandwidth and cache footprint metrics by RMID and sends them to eBPF
  2. eBPF Program:
     - Relays RMID lifecycle and RDT measurements to userspace
     - Reads perf measurements (cycles, instructions, etc.) and sends them to userspace
  3. Userspace Collector: Processes events and maintains RMID state

Kernel Module Semantics

The kernel module provides the following guarantees for RMID management:

  1. RMID Allocation:
     - RMIDs are allocated to thread group leaders (processes)
     - All threads within a process share the same RMID
     - RMID 0 is reserved and considered invalid
     - Each RMID uniquely identifies a single process during any measurement window
     - Allocation includes process metadata (comm, tgid)

  2. RMID Lifetime:
     - RMIDs remain valid until explicitly freed
     - RMIDs are freed when a process terminates
     - After being freed, an RMID cannot be reused for at least 2ms (limbo period)
     - This limbo period ensures measurement intervals (1ms) remain unambiguous
     - Prevents the ABA problem where measurements from different processes could be mixed

  3. RMID State:
     - The kernel maintains the mapping between processes and RMIDs
     - RMIDs are process-specific and persist across thread creation
     - RMIDs are automatically freed when a process exits
     - On systems with hardware RDT support, RMIDs are programmed into MSRs
     - On systems without RDT support, RMIDs are emulated for consistent behavior

  4. Resource Management:
     - RMIDs are a limited resource (typically 512 maximum)
     - Freed RMIDs are added to a FIFO queue for reuse
     - The FIFO reuse policy allows cache footprints to decay before reuse
     - The limbo period is kept minimal (2ms) to maintain high RMID availability

  5. State Dumps:
     - A procfs interface allows dumping current RMID assignments
     - Enables collectors to receive state of processes that existed before collector startup
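
The FIFO reuse policy and limbo period described above can be modeled in a few lines. The following is only an illustrative userspace sketch in Python, not the kernel module's code; the class name `RmidAllocator`, its method names, and the choice to return 0 on allocation failure are assumptions made for this example.

```python
from collections import deque

LIMBO_NS = 2_000_000  # 2ms limbo period, as stated in the design


class RmidAllocator:
    """Illustrative model of FIFO reuse with a limbo period (hypothetical API)."""

    def __init__(self, max_rmid):
        # RMID 0 is reserved, so usable IDs start at 1.
        # Each queue entry is (rmid, freed_at_ns); never-used IDs are treated
        # as freed long ago so they are immediately available.
        self.free_queue = deque((rmid, -LIMBO_NS) for rmid in range(1, max_rmid + 1))
        self.owner = {}  # rmid -> (tgid, comm)

    def alloc(self, tgid, comm, now_ns):
        """Return an RMID for the process, or 0 if none has left limbo."""
        if not self.free_queue:
            return 0
        rmid, freed_at = self.free_queue[0]
        # FIFO order means the head was freed earliest; if it is still in
        # limbo, every other entry is too.
        if now_ns - freed_at < LIMBO_NS:
            return 0
        self.free_queue.popleft()
        self.owner[rmid] = (tgid, comm)
        return rmid

    def free(self, rmid, now_ns):
        """Release an RMID and append it to the tail of the FIFO queue."""
        del self.owner[rmid]
        self.free_queue.append((rmid, now_ns))
```

Because freed RMIDs always join the tail, checking only the head of the queue is enough to decide whether any RMID is eligible for reuse.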

Message Protocol

The eBPF program communicates three types of events to userspace through a perf event array:

  1. Performance Measurement: cycles, instructions, LLC misses, and time delta, attributed to an RMID
  2. RMID Allocation
  3. RMID Free
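
To make the three event types concrete, here is a minimal decoding sketch. The document does not specify the wire format, so everything about the layout below is hypothetical: the numeric type values, the 16-byte little-endian header, and the names `MsgType`, `HEADER`, and `parse_header` are assumptions for illustration only.

```python
import struct
from enum import IntEnum


class MsgType(IntEnum):
    # Hypothetical numeric values; the real protocol may differ.
    PERF = 1
    RMID_ALLOC = 2
    RMID_FREE = 3


# Hypothetical little-endian header: u32 type, u32 rmid, u64 timestamp_ns.
HEADER = struct.Struct("<IIQ")


def parse_header(buf):
    """Return (MsgType, rmid, timestamp_ns), or None if the message is unusable."""
    if len(buf) < HEADER.size:
        return None  # malformed: shorter than the assumed header
    raw_type, rmid, timestamp_ns = HEADER.unpack_from(buf)
    try:
        return MsgType(raw_type), rmid, timestamp_ns
    except ValueError:
        return None  # unknown message type
```

A caller would log and skip any message for which `parse_header` returns `None`, then dispatch on the message type to decode the type-specific payload.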

Message Flow

  1. eBPF code sends all messages via a single perf event array
  2. Messages are enqueued in arrival time order in the per-cpu ring buffers
  3. Userspace processes messages from all per-cpu ring buffers in global timestamp order
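
Step 3 amounts to a k-way merge of the per-CPU streams. A minimal sketch using Python's standard `heapq.merge` follows; it assumes each message is a `(timestamp_ns, payload)` tuple and that arrival order within one CPU's buffer is also timestamp order. The function name `merge_by_timestamp` is hypothetical.

```python
import heapq
from operator import itemgetter


def merge_by_timestamp(per_cpu_buffers):
    """Yield messages from all CPUs in global timestamp order.

    Each buffer is an iterable of (timestamp_ns, payload) tuples and must
    already be sorted by timestamp.
    """
    return heapq.merge(*per_cpu_buffers, key=itemgetter(0))
```

For example, with `cpu0 = [(10, "alloc"), (30, "perf")]` and `cpu1 = [(20, "perf"), (40, "free")]`, iterating `merge_by_timestamp([cpu0, cpu1])` visits the messages at timestamps 10, 20, 30, 40.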

Userspace RMID Package

Components

  1. Metadata Structure:
     - Maintains metadata for each RMID
  2. Message Structure:
     - Holds previously received messages that have not been processed into the current state
  3. Tracker Structure:
     - Maintains current RMID state
     - Queues updates for ordered processing
     - Preserves metadata after RMID free

Key Operations

  1. Alloc(rmid, comm, tgid, timestamp):
     - Enqueues RMID allocation with metadata
     - Maintains timestamp order

  2. Free(rmid, timestamp):
     - Enqueues RMID free event
     - Maintains timestamp order

  3. Advance(timestamp):
     - Processes queued events up to timestamp
     - Updates current state snapshot
     - Maintains FIFO ordering of events
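
The three operations can be sketched as a small queue-backed tracker. This is a minimal illustration rather than the package's actual implementation: the method names are lowercased for Python, the `valid` flag on `Metadata` is an assumption about how preserved metadata might be marked after a free, and treating "up to timestamp" as inclusive is also an assumption.

```python
import heapq
from dataclasses import dataclass
from itertools import count


@dataclass
class Metadata:
    comm: str
    tgid: int
    valid: bool = True  # hypothetical flag: False once the RMID has been freed


class Tracker:
    def __init__(self):
        self.state = {}      # rmid -> Metadata (current snapshot)
        self._queue = []     # min-heap of (timestamp, seq, kind, rmid, comm, tgid)
        self._seq = count()  # tie-breaker: equal timestamps keep insertion (FIFO) order

    def alloc(self, rmid, comm, tgid, timestamp):
        heapq.heappush(self._queue, (timestamp, next(self._seq), "alloc", rmid, comm, tgid))

    def free(self, rmid, timestamp):
        heapq.heappush(self._queue, (timestamp, next(self._seq), "free", rmid, "", 0))

    def advance(self, timestamp):
        """Apply every queued event whose timestamp is at or before `timestamp`."""
        while self._queue and self._queue[0][0] <= timestamp:
            _, _, kind, rmid, comm, tgid = heapq.heappop(self._queue)
            if kind == "alloc":
                self.state[rmid] = Metadata(comm, tgid)
            elif rmid in self.state:
                # Keep the metadata so late measurements can still be attributed.
                self.state[rmid].valid = False
```

The sequence counter keeps events with identical timestamps in the order they were enqueued, which is one way to satisfy the FIFO requirement on `Advance`.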

Integration with Time Slot System

  1. Time Slot Structure:
     - 1ms duration
     - Maintains window of several slots
     - Retires oldest slot when window advances

  2. RMID State Management:
     - RMID tracker advances with each time slot retirement
     - Metadata preserved after free for correct attribution
     - Kernel's 2ms limbo period ensures measurement integrity
     - Each RMID uniquely identifies a single process during any 1ms measurement window

  3. Event Processing:
     - All events (perf, alloc, free) processed in timestamp order
     - RMID state advanced before writing each time slot
     - Measurements attributed using RMID state from appropriate time slot
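
The sliding window of 1ms slots can be illustrated with a short sketch. It assumes nanosecond timestamps and accumulates only a cycle count per RMID to stay small; the names `SlotWindow`, `add`, and the `on_retire` callback are hypothetical and stand in for wherever a collector would advance its RMID state and write out the retired slot.

```python
SLOT_NS = 1_000_000  # 1ms slot duration


class SlotWindow:
    def __init__(self, num_slots, on_retire):
        self.num_slots = num_slots
        self.on_retire = on_retire  # called as on_retire(slot_end_ns, totals)
        self.slots = {}             # slot index -> {rmid: accumulated cycles}
        self.oldest = None          # index of the oldest slot still open

    def add(self, timestamp_ns, rmid, cycles):
        """Accumulate a measurement; return False if its slot was already retired."""
        idx = timestamp_ns // SLOT_NS
        if self.oldest is None:
            self.oldest = idx
        if idx < self.oldest:
            return False
        # Retire oldest slots until the new index fits inside the window.
        while idx >= self.oldest + self.num_slots:
            self._retire_oldest()
        totals = self.slots.setdefault(idx, {})
        totals[rmid] = totals.get(rmid, 0) + cycles
        return True

    def _retire_oldest(self):
        totals = self.slots.pop(self.oldest, {})
        self.on_retire((self.oldest + 1) * SLOT_NS, totals)
        self.oldest += 1
```

Passing the slot's end time to the callback gives it a natural timestamp up to which RMID state can be advanced before the slot's totals are attributed and written.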

Metadata Preservation

The system preserves RMID metadata after an RMID is freed to ensure correct attribution of measurements within the same time slot. This is necessary because:

  1. An RMID may be freed during a time slot
  2. Measurements from that RMID may still arrive for the same time slot
  3. The metadata is needed to properly attribute these measurements
  4. The kernel's 2ms limbo period prevents incorrect attribution by ensuring no RMID reuse within measurement windows

Error Handling

  1. Message Parsing:
     - Invalid message types logged and skipped
     - Malformed messages logged and skipped
     - Lost messages tracked and reported

  2. Time Ordering:
     - Messages processed strictly in timestamp order
     - Safe timestamp arithmetic for overflow handling
     - Efficient queue management

  3. Resource Management:
     - Proper cleanup on shutdown
     - Memory usage bounded by window size
     - Efficient state tracking
     - RMID allocation failures logged when no RMID has been free long enough to be reused
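
One common way to make timestamp arithmetic safe against counter wraparound is to compute differences modulo the counter width and interpret the result as signed. The sketch below assumes unsigned 64-bit timestamps, which the document does not state; the helper names `ts_delta` and `ts_before` are hypothetical.

```python
MASK64 = (1 << 64) - 1
HALF_RANGE = 1 << 63


def ts_delta(later, earlier):
    """Signed difference later - earlier for u64 timestamps that may wrap."""
    diff = (later - earlier) & MASK64
    # Differences in the upper half of the range represent negative values.
    return diff - (1 << 64) if diff >= HALF_RANGE else diff


def ts_before(a, b):
    """True if timestamp a precedes timestamp b, tolerating wraparound."""
    return ts_delta(a, b) < 0
```

This comparison stays correct as long as the two timestamps being compared are less than half the counter range apart.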