Uma API em linguagem C++ para programas com laços paralelos e suporte a multi-CPUs e multi-GPUs
Abstract
This article presents a high-level C++ API for implementing parallel programs with loops and reductions. It aims to fill the gap left by the lack of APIs that support developing applications that run simultaneously on multiple CPUs and multiple GPUs. Our hypothesis is that scientific applications can exploit heterogeneous processing on multi-CPUs and multi-GPUs to achieve better performance than using an accelerator alone. Results from experiments with scientific mini-applications developed with the new API suggest that combining CPU and GPU processing can lead to performance gains.
References
Adcock, A. B., Sullivan, B. D., Hernandez, O. R., and Mahoney, M. W. (2013). Evaluating OpenMP Tasking at Scale for the Computation of Graph Hyperbolicity. In Proceedings of the OpenMP in the Era of Low Power Devices and Accelerators, 9th International Workshop on OpenMP, IWOMP 2013, pages 71–83, Canberra, ACT, Australia. Springer Berlin Heidelberg.
Augonnet, C., Thibault, S., Namyst, R., and Wacrenier, P.-A. (2011). StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurrency and Computation: Practice and Experience, 23(2):187–198.
Broquedis, F., Gautier, T., and Danjean, V. (2012). libKOMP, an Efficient OpenMP Runtime System for Both Fork-Join and Data Flow Paradigms. In Proc. of the OpenMP in a Heterogeneous World - 8th IWOMP, pages 102–115, Rome, Italy.
Bueno, J., Martorell, X., Badia, R. M., Ayguadé, E., and Labarta, J. (2013). Implementing OmpSs Support for Regions of Data in Architectures with Multiple Address Spaces. In Proceedings of the 27th International Conference on Supercomputing, ICS '13, pages 359–368, Eugene, Oregon, USA. ACM.
Che, S., Boyer, M., Meng, J., Tarjan, D., Sheaffer, J. W., Lee, S. H., and Skadron, K. (2009). Rodinia: A benchmark suite for heterogeneous computing. In Workload Characterization, 2009. IISWC 2009. IEEE International Symposium on, pages 44–54.
Duran, A., Ayguadé, E., Badia, R. M., Labarta, J., Martinell, L., Martorell, X., and Planas, J. (2011). Ompss: a Proposal for Programming Heterogeneous Multi-Core Architectures. Parallel Processing Letters, 21(2):173–193.
Duran, A., Teruel, X., Ferrer, R., Martorell, X., and Ayguadé, E. (2009). Barcelona OpenMP Tasks Suite: A Set of Benchmarks Targeting the Exploitation of Task Parallelism in OpenMP. In International Conference on Parallel Processing, 2009. ICPP '09, pages 124–131.
Edwards, H. C., Sunderland, D., Porter, V., Amsler, C., and Mish, S. (2012). Manycore performance-portability: Kokkos multidimensional array library. Scientific Programming, 20(2):89–114.
Edwards, H. C., Trott, C. R., and Sunderland, D. (2014). Kokkos: Enabling manycore performance portability through polymorphic memory access patterns. Journal of Parallel and Distributed Computing, 74(12):3202–3216. Domain-Specific Languages and High-Level Frameworks for High-Performance Computing.
Garland, M., Kudlur, M., and Zheng, Y. (2012). Designing a Unified Programming Model for Heterogeneous Machines. In SC '12: Proc. Conference on High Performance Computing Networking, Storage and Analysis.
Gautier, T., Lima, J. V. F., Maillard, N., and Raffin, B. (2013). XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures. In Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, IPDPS '13, pages 1299–1308, Washington, DC, USA. IEEE Computer Society.
Gregory, K. and Miller, A. (2012). C++ AMP: Accelerated Massive Parallelism with Microsoft® Visual C++®. Developer Reference. Microsoft Press.
Heller, T., Kaiser, H., and Iglberger, K. (2013). Application of the ParalleX Execution Model to Stencil-based Problems. Comput. Sci., 28(2-3):253–261.
Hugo, A.-E., Guermouche, A., Wacrenier, P.-A., and Namyst, R. (2013). Composing Multiple StarPU Applications over Heterogeneous Machines: A Supervised Approach. In Proceedings of the 27th IEEE International Parallel and Distributed Processing Symposium Workshops PhD Forum (IPDPSW), pages 1050–1059.
OpenMP (2016). OpenMP Application Program Interface Version 4.5. http://www.openmp.org/mp-documents/openmp-4.5.pdf. Accessed: 19 Jul 2016.
Stroustrup, B. (2013). The C++ Programming Language. Addison-Wesley Professional, 4th edition.
Thrust (2016). http://thrust.github.io/ Accessed: 21 May 2016.
Virouleau, P., Broquedis, F., Gautier, T., and Rastello, F. (2016). Using data dependencies to improve task-based scheduling strategies on NUMA architectures. In Euro-Par 2016, Grenoble, France.
Published
2016-10-05
How to Cite
DI DOMENICO, Daniel; LIMA, João. Uma API em linguagem C++ para programas com laços paralelos e suporte a multi-CPUs e multi-GPUs. In: SYMPOSIUM ON HIGH PERFORMANCE COMPUTING SYSTEMS (SSCAD), 17., 2016, Aracajú. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2016. p. 85-96. DOI: https://doi.org/10.5753/wscad.2016.14250.