NUMA-Aware Task Scheduling Strategy Aiming to Reduce Cache Conflicts

Thiago de Campos Ribeiro Nolasco; Pedro Henrique Penna; Henrique Cota de Freitas

doi:10.5753/sscad.2025.16702

Thiago de Campos Ribeiro Nolasco, PUC Minas
Pedro Henrique Penna, Microsoft Research
Henrique Cota de Freitas, PUC Minas


Abstract

This paper presents a NUMA-aware scheduling strategy that reduces cache conflicts by analyzing recent cache index access histories through cumulative distribution functions (CDFs). The approach aims to minimize last-level cache (LLC) interference while maintaining load balance across CPUs. We developed a Rust-based simulator to evaluate the strategy under Zipf-distributed workloads and compared it against the Distributed Intensity Online (DIO) strategy. Relative to DIO, the proposed method improves cache hit rates by up to 8.2%, reduces load imbalance by up to 18%, and decreases tail latency by 14%. These improvements highlight the potential of fine-grained cache-oblivious scheduling strategies for real-world operating systems.
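
The abstract does not spell out how the CDFs are built or compared, so the following Rust fragment is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each task keeps a short history of the cache set indices it touched, that two CDFs are compared with a Kolmogorov-Smirnov style distance, and that a new task goes to the node whose resident tasks overlap with it the least. The names `index_cdf`, `overlap`, and `pick_node` are hypothetical.

```rust
/// Empirical CDF over cache set indices for one task's recent accesses.
/// cdf[i] is the fraction of accesses whose set index is <= i.
fn index_cdf(history: &[usize], num_sets: usize) -> Vec<f64> {
    let mut counts = vec![0usize; num_sets];
    for &idx in history {
        counts[idx % num_sets] += 1;
    }
    // Avoid dividing by zero for an empty history.
    let total = history.len().max(1) as f64;
    let mut running = 0usize;
    counts
        .iter()
        .map(|&c| {
            running += c;
            running as f64 / total
        })
        .collect()
}

/// Assumed similarity score: 1 minus the largest gap between the two CDFs.
/// 1.0 means identical index distributions, 0.0 means fully separated ones.
fn overlap(a: &[f64], b: &[f64]) -> f64 {
    let gap = a
        .iter()
        .zip(b)
        .map(|(x, y)| (x - y).abs())
        .fold(0.0_f64, f64::max);
    1.0 - gap
}

/// Assumed placement rule: choose the node with the smallest summed overlap.
/// Because the sum grows with the number of resident tasks, heavily loaded
/// nodes are penalized as well. Ties go to the lowest node index.
fn pick_node(task: &[f64], nodes: &[Vec<Vec<f64>>]) -> usize {
    let mut best = 0;
    let mut best_cost = f64::INFINITY;
    for (n, resident) in nodes.iter().enumerate() {
        let cost: f64 = resident.iter().map(|r| overlap(task, r)).sum();
        if cost < best_cost {
            best = n;
            best_cost = cost;
        }
    }
    best
}
```

With four cache sets, a task whose history is concentrated on sets 0 and 1 would be steered away from a node already running a task with the same footprint and toward a node whose resident task mostly touches sets 2 and 3.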
