Accurately Approximating Superscalar Processor Performance from Traces

Kiyeon Lee, Shayne Evans, and Sangyeun Cho.

Proceedings of the IEEE Int'l Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 238~248, Boston, Massachusetts, April 2009.

Abstract:

Trace-driven simulation of superscalar processors is particularly complicated. The dynamic nature of superscalar processors combined with the static nature of traces can lead to large inaccuracies in the results, especially when traces contain only a subset of executed instructions for trace reduction. The main problem in the filtered trace simulation is that the trace does not contain enough information with which one can predict the actual penalty of a cache miss. In this paper, we discuss and evaluate three strategies to quantify the impact of a long latency memory access in a superscalar processor when traces have only L1 cache misses. The strategies are based on models about how a cache miss is treated with respect to other cache misses: (1) isolated cache miss model, (2) independent cache miss model, and (3) pairwise dependent cache miss model. Our experimental results demonstrate that trace-driven simulation using the pairwise dependent cache miss model produces reasonably accurate results (4.8% RMS error) under perfect branch prediction. Our work forms a basis for fast, accurate, and configurable multicore processor simulation using a pre-determined processor core design.