Towards an Autonomous Framework for HPC Optimization: Using Machine Learning for Energy and Performance Modeling

  • Vinícius Klôh Laboratório Nacional de Computação Científica
  • Matheus Gritz Laboratório Nacional de Computação Científica
  • Bruno Schulze LNCC
  • Mariza Ferro LNCC

Abstract


Performance and energy efficiency are now critical concerns in high-performance scientific computing. The requirements of the scientific problem are expected to guide the orchestration of different energy-saving techniques, in order to improve the balance between energy consumption and application performance. To enable this balance, we propose an autonomous framework to perform this orchestration and present the ongoing research toward its development, focusing specifically on the characterization of scientific applications and on performance modeling using Machine Learning.

Published
2019-11-08
KLÔH, Vinícius; GRITZ, Matheus; SCHULZE, Bruno; FERRO, Mariza. Towards an Autonomous Framework for HPC Optimization: Using Machine Learning for Energy and Performance Modeling. In: SYMPOSIUM ON HIGH PERFORMANCE COMPUTING SYSTEMS (SSCAD), 20., 2019, Campo Grande. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. p. 438-445. DOI: https://doi.org/10.5753/wscad.2019.8689.