Affinity-aware thread mapping is a method for effectively exploiting cache resources in multicore processors. We propose an affinity- and architecture-aware thread mapping technique that maximizes data reuse and minimizes the remote communication and cache coherency costs of multi-threaded applications. It consists of three main components: the Data Sharing Estimator, the Affine Mapping Finder, and the Maximum Speedup Predictor. The Data Sharing Estimator creates application-specific data dependency signatures, which the Affine Mapping Finder uses to determine an appropriate thread mapping of the application for a given architecture. To prevent excessive thread migration, the Maximum Speedup Predictor estimates the speedup of the obtained mapping and discards the mapping if it yields no significant performance improvement. The proposed framework is evaluated using the Phoenix benchmark suite on two different multicore architectures. The proposed thread mapping approach gives a 25% performance improvement compared to the default Linux scheduler. We also show that affinity-based thread mapping approaches that consider only the number of shared blocks are not sufficient to accurately estimate the data dependency between threads and to determine the proper thread mapping.
- Thread mapping
- Cache hierarchy
- Data sharing
- Data reuse
- Inter- and intra-thread communication cost
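To make the three-stage flow described above concrete, the following is a minimal illustrative sketch and not the implementation evaluated in this work. All names (`estimate_sharing`, `find_affine_mapping`, `should_remap`), the access-count weighting, the greedy pairing heuristic, and the 1.05 threshold are assumptions introduced only for illustration. The sketch further assumes that there are at least as many cores as threads and that cores are grouped by the cache they share.

```python
from itertools import combinations


def estimate_sharing(accesses):
    """Hypothetical stand-in for a data sharing estimator.

    accesses maps thread id -> {block id: access count}.
    The pair weight sums the smaller access count over shared blocks,
    so heavily reused blocks count more than rarely touched ones
    (an assumed weighting, not the signature used in the paper).
    """
    weights = {}
    for a, b in combinations(sorted(accesses), 2):
        shared = accesses[a].keys() & accesses[b].keys()
        weights[(a, b)] = sum(min(accesses[a][blk], accesses[b][blk])
                              for blk in shared)
    return weights


def find_affine_mapping(threads, weights, cache_groups):
    """Hypothetical greedy mapping finder.

    cache_groups is a list of core-id lists; cores in one list share a cache.
    Thread pairs are visited from strongest to weakest sharing and placed
    together in the first group that still has two free cores.
    """
    free = [list(group) for group in cache_groups]
    mapping = {}
    for (a, b), _ in sorted(weights.items(), key=lambda kv: -kv[1]):
        if a in mapping or b in mapping:
            continue
        group = next((g for g in free if len(g) >= 2), None)
        if group is None:
            break
        mapping[a] = group.pop(0)
        mapping[b] = group.pop(0)
    # Any thread not yet placed takes whichever core is still free.
    leftover = [core for g in free for core in g]
    for t in threads:
        if t not in mapping:
            mapping[t] = leftover.pop(0)
    return mapping


def should_remap(predicted_speedup, threshold=1.05):
    """Hypothetical gate: migrate only if the predicted gain is significant."""
    return predicted_speedup >= threshold


# Toy input: threads 0 and 2 reuse block "A" heavily, threads 1 and 3 reuse "C".
accesses = {
    0: {"A": 10, "B": 1},
    1: {"B": 1, "C": 8},
    2: {"A": 9},
    3: {"C": 7},
}
weights = estimate_sharing(accesses)
mapping = find_affine_mapping(sorted(accesses), weights, [[0, 1], [2, 3]])
```

In this toy input, every sharing pair has exactly one block in common, so a count of shared blocks alone could not distinguish the weak pair (0, 1) from the strong pairs (0, 2) and (1, 3). On Linux, a mapping accepted by the gate could then be applied with a call such as `os.sched_setaffinity`, which is left out here to keep the sketch platform-independent.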