ProfDP: A Novel Differential, Data-centric Profiler that Guides Data Placement in Heterogeneous Memory Systems (supported by NSF 1618620)
New memory technologies, such as non-volatile memory and stacked memory, have reshaped the memory hierarchies of modern and emerging computer architectures. It has become common to see memories of different types integrated into the same system, an arrangement known as heterogeneous memory. Typically, a heterogeneous memory system consists of a small fast component and a large slow component. This encourages new styles of data processing and confronts developers with a new problem: given two memory types, how should applications be redesigned to benefit from this arrangement, and where should each piece of data be placed? Existing methods perform detailed memory access pattern analysis to guide data placement. However, these methods are heavyweight and ignore the interactions between software and hardware.
To address these issues, we develop ProfDP, a lightweight profiler that employs differential data-centric analysis to provide intuitive guidance for data placement in heterogeneous memory. Evaluating ProfDP with a number of parallel benchmarks running on a state-of-the-art emulator and on a real machine with heterogeneous memory, we show that it can guide near-optimal data placement that maximizes performance with minimal programming effort.
Unique Features in ProfDP
Lightweight: ProfDP requires no heavyweight instrumentation of memory operations.
Accurate: ProfDP directly uses measurement instead of pattern analysis to guide data placement.
Informative: ProfDP performs data-centric analysis, which guides the placement of data objects at a high level.
ProfDP uses two techniques: (1) the performance monitoring units (PMUs) available in modern CPUs and (2) differential analysis across multiple runs. ProfDP uses PMU address sampling to compute a new metric: the average memory latency (in CPU cycles) per sampled memory load operation. On a system with heterogeneous memory, ProfDP first runs the program with all data in the fast memory and then runs it again with all data in the slow memory. By comparing the two profiles (the differential analysis), ProfDP can identify which code is more sensitive to the type of memory it accesses. Furthermore, ProfDP associates this metric with data objects (static or heap data) via data-centric profiling, so it can also tell which data objects are more sensitive to being placed in fast or slow memory. ProfDP then recommends placing only the sensitive data objects in the fast memory.
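The differential step described above can be illustrated with a minimal Python sketch. It is not ProfDP's implementation: the input format (pairs of data object name and sampled load latency), the function names, and the use of a fixed cycle threshold to decide which objects count as "sensitive" are all assumptions made for illustration.

```python
from collections import defaultdict


def average_latency(samples):
    """Average sampled load latency (CPU cycles) per data object.

    `samples` is an iterable of (object_name, latency_cycles) pairs.
    This is a hypothetical stand-in for PMU address samples that have
    already been attributed to data objects.
    """
    totals = defaultdict(lambda: [0, 0])  # object -> [latency sum, sample count]
    for obj, latency in samples:
        totals[obj][0] += latency
        totals[obj][1] += 1
    return {obj: total / count for obj, (total, count) in totals.items()}


def sensitive_objects(fast_samples, slow_samples, threshold_cycles=50):
    """Rank data objects by how much their average latency grows in slow memory.

    The sensitivity measure (slow average minus fast average) and the
    threshold are assumptions of this sketch, not values from ProfDP.
    """
    fast = average_latency(fast_samples)
    slow = average_latency(slow_samples)
    # Only objects sampled in both runs can be compared.
    deltas = {obj: slow[obj] - fast[obj] for obj in fast.keys() & slow.keys()}
    ranked = sorted(deltas.items(), key=lambda item: item[1], reverse=True)
    return [(obj, delta) for obj, delta in ranked if delta >= threshold_cycles]


if __name__ == "__main__":
    # Made-up samples: run 1 with all data in fast memory, run 2 in slow memory.
    fast_run = [("A", 100), ("A", 120), ("B", 200), ("B", 200)]
    slow_run = [("A", 400), ("A", 420), ("B", 210), ("B", 230)]
    for obj, delta in sensitive_objects(fast_run, slow_run):
        print(f"place {obj} in fast memory (+{delta:.0f} cycles in slow memory)")
```

In this made-up input, object A slows down by 300 cycles per sampled load when moved to slow memory while object B slows down by only 20, so only A would be recommended for the fast memory.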
[Source Code] https://github.com/HPCToolkit/hpctoolkit/tree/hpctoolkit-datacentric
ProfDP is built atop HPCToolkit. It currently lives in a branch of HPCToolkit and will be integrated into the trunk soon.
"ProfDP: A Lightweight Profiler to Guide Data Placement in Heterogeneous Memory Systems", Shasha Wen, Lucy Cherkasova, Felix Xiaozhu Lin, Xu Liu, The 32nd ACM International Conference on Supercomputing, June 12-15, 2018, Beijing, China. Acceptance ratio: 18.7% (36/193).