Caminhamento Paralelo Barnes-Hut com Vetorização AVX2

  • Wagner Zola Federal University of Parana
  • Armando Delgado Federal University of Parana
  • Rodrigo Morante Blanco Federal University of Parana

Abstract


The Barnes-Hut algorithm is an approximate method used for gravitational simulation of N-Bodies. The irregular nature of this code presents challenges for its computing in parallel systems. Additional obstacles in this computation pattern arise to effectively use the computational capability of multicore architectures with SIMD instructions. Our focus on this work is to implement and analyze the efficiency of the Barnes-Hut parallel traversals with implicit octrees and the use of AVX2 vector instructions. The experiments validate the effectiveness of our method, which has high GFLOP/s rate and takes considerably less energy in simulations.

References

Arora, N., Shringarpure, A., and Vuduc, R. W. (2009). Direct n-body kernels for multicore platforms. In ICPP 2009, International Conference on Parallel Processing, Vienna, Austria, 22-25 September 2009, pages 379–387.

Asanovic, K., Bodik, R., Catanzaro, B. C., Gebis, J. J., Husbands, P., Keutzer, K., Patterson, D. A., Plishker, W. L., Shalf, J., Williams, S. W., and Yelick, K. A. (2006). The landscape of parallel computing research: A view from berkeley. Technical Report UCB/EECS-2006-183, EECS Department, University of California, Berkeley.

Barnes, J. and Hut, P. (1986). A hierarchical O(N log N) force-calculation algorithm. Nature, 324:446–449.

Bédorf, J., Gaburov, E., and Portegies Zwart, S. (2012). A sparse octree gravitational n-body code that runs entirely on the GPU processor. J. Comput. Phys., 231(7):2825–2839.

Burtscher, M. and Pingali, K. (2011). Chapter 6 - an efficient CUDA implementation of the tree-based barnes hut n-body algorithm. In Hwu, W.-m. W., editor, GPU Computing Gems Emerald Edition, pages 75 – 92. Morgan Kaufmann, Boston.

Hennessy, J. L. and Patterson, D. A. (2011). Computer Architecture, Fifth Edition: A Quantitative Approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 5th edition.

Lange, B. and Fortin, P. (2014). Parallel dual tree traversal on multi-core and many-core architectures for astrophysical n-body simulations. In Euro-Par 2014: Proceedings of the 20th International European Conference on Parallel and Distributed Computing, pages 716–727.

Long Wang, R. S. et al. (2015). NBODY6++GPU: ready for the gravitational million-body problem. Monthly Notices of the Royal Astronomical Society, 450(4):4070–4080.

Nunan Zola, W. M., Bona, L. C., and Silva, F. (2014). Fast GPU parallel N-Body tree traversal with Simulated Wide-Warp. In Parallel and Distributed Systems (ICPADS), 2014 20th IEEE International Conference on, pages 718–725.

Plummer, H. C. (1911). On the problem of distribution in globular star clusters. Monthly Notices of the Royal Astronomical Society, 71:460–470.

Treibig, J., Hager, G., and Wellein, G. (2010). Likwid: A lightweight performance-oriented tool suite for x86 multicore environments. In Proceedings of PSTI2010, the First International Workshop on Parallel Software Tools and Tool Infrastructures, San Diego CA.

Yokota, R. (2012). An FMM based on dual tree traversal for many-core architectures. CoRR, abs/1209.3516.

Zecena, I., Burtscher, M., Jin, T., and Zong, Z. (2013). Evaluating the performance and energy efficiency of n-body codes on multi-core cpus and gpus. In 32nd IEEE International Performance Computing and Communications Conference, IPCCC’13.
Published
2019-11-08
ZOLA, Wagner; DELGADO, Armando; BLANCO, Rodrigo Morante. Caminhamento Paralelo Barnes-Hut com Vetorização AVX2. In: SYMPOSIUM ON HIGH PERFORMANCE COMPUTING SYSTEMS (SSCAD), 20. , 2019, Campo Grande. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019 . p. 454-461. DOI: https://doi.org/10.5753/wscad.2019.8691.