Fortran DO CONCURRENT Evaluation in Multi-core for NAS-PB Conjugate Gradient and a Porous Media Application
Abstract
High-performance computing exploits the available hardware resources to accelerate application execution, but exploiting those resources requires parallel software. Hence, several parallel programming interfaces (PPIs) allow sequential programs to use threads and parallel routines. PPIs can be explicit (e.g., Pthreads and TBB) or implicit (e.g., OpenMP and OpenACC). Another approach is to use a programming language with native parallel constructs, such as Fortran, which has provided the DO CONCURRENT construct since the Fortran 2008 specification. However, evaluations of DO CONCURRENT are still limited. In this paper, we explore and compare the native parallelism of Fortran with the directives provided by the OpenMP and OpenACC PPIs in the NAS-PB CG benchmark and a porous media application. The results show that DO CONCURRENT yields parallel CPU code with numerical compatibility for scientific applications. Moreover, on multi-core processors, DO CONCURRENT achieves performance comparable to, and even slightly better than, other PPIs such as OpenMP. Our work also contributes a method for using DO CONCURRENT.
Published
23/10/2024
How to Cite
TREMARIN, Gabriel Dineck; MARCIANO, Anna Victória Gonçalves; SCHEPKE, Claudio; VOGEL, Adriano. Fortran DO CONCURRENT Evaluation in Multi-core for NAS-PB Conjugate Gradient and a Porous Media Application. In: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (SSCAD), 25., 2024, São Carlos/SP. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 133-143. DOI: https://doi.org/10.5753/sscad.2024.244796.