Using low-power embedded accelerators to implement HPC systems

  • Emilio Hoffmann de O. UNIJUI
  • Jorge Silva Jr. UFRGS
  • Edson Padoin UNIJUI / UFRGS
  • Phillipe Navaux UFRGS

Abstract


This work analyzes the performance and energy efficiency of low-power embedded accelerators for building HPC systems, motivated by current power-consumption limits. Tests using the three levels of the SHOC benchmark were run on five conventional NVIDIA GPU accelerators and on a low-power accelerator embedded in the Jetson MPSoC. Conventional accelerators such as the NVIDIA K80 achieved up to 3750 GFLOPS and an energy efficiency of 25 GFLOPS/W, whereas the low-power TK1 accelerator reached only 301 GFLOPS but a higher energy efficiency of 26.2 GFLOPS/W.
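
The relationship between the two metrics quoted in the abstract can be made concrete with a small calculation. The sketch below is illustrative only: it assumes that each performance figure and its efficiency figure come from the same measurement, so that dividing one by the other gives an implied power draw. Those implied wattages are not reported in the abstract, and the function name `implied_power_watts` is hypothetical.

```python
# Figures taken from the abstract: (peak GFLOPS, GFLOPS per watt).
ACCELERATORS = {
    "K80": (3750.0, 25.0),
    "TK1": (301.0, 26.2),
}


def implied_power_watts(gflops: float, gflops_per_watt: float) -> float:
    """Power implied by a performance figure and an efficiency figure.

    Assumption: both numbers describe the same run, so
    power = performance / efficiency.
    """
    return gflops / gflops_per_watt


if __name__ == "__main__":
    for name, (perf, eff) in ACCELERATORS.items():
        watts = implied_power_watts(perf, eff)
        print(f"{name}: {perf:.0f} GFLOPS at {eff} GFLOPS/W -> about {watts:.1f} W")

    k80_perf, k80_eff = ACCELERATORS["K80"]
    tk1_perf, tk1_eff = ACCELERATORS["TK1"]
    # Ratios that summarize the trade-off stated in the abstract.
    print(f"K80/TK1 performance ratio: {k80_perf / tk1_perf:.1f}x")
    print(f"TK1 efficiency advantage: {(tk1_eff / k80_eff - 1) * 100:.1f}%")
```

Under that assumption, the TK1's efficiency advantage is small (under 5%), while the performance gap is more than an order of magnitude.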

Published
2015-10-18
HOFFMANN DE O., Emilio; SILVA JR., Jorge; PADOIN, Edson; NAVAUX, Phillipe. Utilização de aceleradores embarcados de baixo consumo na implementação de sistemas de HPC. In: BRAZILIAN SYMPOSIUM ON HIGH PERFORMANCE COMPUTING SYSTEMS (SSCAD), 16., 2015, Florianópolis. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2015. p. 252-263. DOI: https://doi.org/10.5753/wscad.2015.14288.