Intrusiveness and Scalability of OMPT-Based Tracing Tools for Task-based OpenMP Applications

  • Rayan Raddatz de Matos (UFRGS)
  • Lucas Mello Schnorr (UFRGS)

Abstract

Task-based parallel programming has become popular for handling irregular parallelism in modern HPC applications. This paradigm requires tailored performance analysis tools, and the OpenMP Tools (OMPT) API is the state of the art for tracing task-based execution events. However, since large-scale applications can generate enormous numbers of tasks, it is crucial to understand the intrusion caused by OMPT callbacks and by existing tracing tools. This article proposes a methodology to investigate and compare the intrusiveness and scalability of OMPT-based tracers, evaluating Score-P, Extrae, TiKKi, and custom tracers under various configurations that stress the number of tasks and of registered events. We demonstrate that the intrusiveness of OMPT-based tracers varies significantly across tools: some achieve low intrusion and good scalability, while others exhibit substantial performance degradation as parallelism increases.
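As an illustration only, one common way to quantify a tracer's intrusiveness is to compare the makespan of traced runs against untraced baseline runs, and to divide the extra time by the number of recorded events. The abstract does not state the paper's exact metric, so the formulas below are an assumption; the function names `relative_overhead_pct` and `cost_per_event_us` and all numbers are hypothetical, not results from the paper.

```python
from statistics import mean


def relative_overhead_pct(baseline_s, traced_s):
    """Relative slowdown (%) of traced runs versus untraced baseline runs.

    Assumed definition: 100 * (mean(traced) - mean(baseline)) / mean(baseline).
    """
    b = mean(baseline_s)
    t = mean(traced_s)
    return 100.0 * (t - b) / b


def cost_per_event_us(baseline_s, traced_s, n_events):
    """Average extra time (microseconds) attributed to each recorded event."""
    return (mean(traced_s) - mean(baseline_s)) / n_events * 1e6


# Hypothetical makespans in seconds, for illustration only.
baseline = [2.0, 2.1, 1.9]
traced = [2.5, 2.6, 2.4]

print(f"overhead: {relative_overhead_pct(baseline, traced):.1f}%")
print(f"cost: {cost_per_event_us(baseline, traced, 1_000_000):.2f} us/event")
```

To study scalability under this assumed metric, the same computation would be repeated for each thread count and task count, checking whether the overhead stays flat or grows as parallelism increases.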

Published
28/10/2025
MATOS, Rayan Raddatz de; SCHNORR, Lucas Mello. Intrusiveness and Scalability of OMPT-Based Tracing Tools for Task-based OpenMP Applications. In: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (SSCAD), 26., 2025, Bonito/MS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 446-457. DOI: https://doi.org/10.5753/sscad.2025.16751.