Better than the Two: Exceeding Private and Shared Caches via Two-Dimensional Page Coloring

Lei Jin and Sangyeun Cho.

Proceedings of the Workshop on Chip Multiprocessor Memory Systems and Interconnects (CMP-MSI), held in conjunction with the IEEE Int'l Symposium on High-Performance Computer Architecture (HPCA), Phoenix, AZ, February 2007.

Abstract:

Private caching and shared caching are the two conventional approaches to managing distributed L2 caches in current multicore processors. Unfortunately, neither shared caching nor private caching guarantees optimal performance across different workloads, especially when many processor cores and cache slices are connected by a switched network. This paper takes a very different approach from existing hardware-based schemes, allowing data to be flexibly mapped to cache slices at memory page granularity. Using a profile-guided, execution-driven simulation method, we perform a limit study on the performance-optimal two-dimensional page mappings for a given multicore memory hierarchy and on-chip network configuration. Our study shows that a judicious data mapping that improves both the on-chip miss rate and the cache access latency yields a significant performance improvement (up to 108%), outperforming both existing methods. Our results strongly suggest that a well-conceived dynamic data mapping mechanism will achieve similarly high performance on an OS-managed distributed L2 cache structure.
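To make the idea of mapping pages to cache slices concrete, the following is a minimal sketch of a page placement decision that weighs access latency against capacity pressure. It is not the paper's algorithm: the mesh layout, the cost function, and every name and parameter (Mesh, place_page, hop_cost, pressure_cost, slice_capacity) are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Mesh:
    """Hypothetical 2D mesh of tiles, each holding one core and one L2 slice."""
    width: int
    height: int

    def coords(self, tile: int) -> tuple[int, int]:
        # Tiles are numbered row by row, starting at 0.
        return tile % self.width, tile // self.width

    def hops(self, a: int, b: int) -> int:
        # Manhattan distance stands in for network latency (an assumption).
        ax, ay = self.coords(a)
        bx, by = self.coords(b)
        return abs(ax - bx) + abs(ay - by)


def place_page(mesh: Mesh, requester: int, pages_per_slice: list[int],
               slice_capacity: int, hop_cost: float = 1.0,
               pressure_cost: float = 4.0) -> int:
    """Pick the slice with the lowest estimated cost for a new page.

    The cost is an assumed linear blend of two terms:
      - latency: hop count from the requesting core to the slice
      - pressure: fraction of the slice already occupied, a crude proxy
        for the extra misses caused by crowding one slice
    Ties go to the lowest-numbered slice.
    """
    best_slice, best_cost = 0, float("inf")
    for s in range(mesh.width * mesh.height):
        latency = hop_cost * mesh.hops(requester, s)
        pressure = pressure_cost * pages_per_slice[s] / slice_capacity
        cost = latency + pressure
        if cost < best_cost:
            best_slice, best_cost = s, cost
    return best_slice


mesh = Mesh(width=4, height=4)
load = [0] * 16
local_choice = place_page(mesh, requester=5, pages_per_slice=load,
                          slice_capacity=64)

load[5] = 64  # the local slice is now full
spilled_choice = place_page(mesh, requester=5, pages_per_slice=load,
                            slice_capacity=64)
```

With an empty cache the page stays in the requester's own slice, as in private caching; once that slice fills up, the page spills to a neighboring slice, which resembles the capacity sharing of a shared cache. Setting pressure_cost to zero in this sketch degenerates to always-local placement.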