Portability and Efficiency of the Fletcher Method for Seismic Applications on Multicore and GPU Architectures

Matheus Serpa; Pablo José Pavan; Jairo Panetta; Antônio Azambuja; Alexandre Carissimi; Philippe Olivier Navaux

doi:10.5753/wscad.2019.8666

Matheus Serpa, Universidade Federal do Rio Grande do Sul
Pablo José Pavan, Universidade Federal do Rio Grande do Sul
Jairo Panetta, Instituto Tecnológico de Aeronáutica
Antônio Azambuja, Petrobras
Alexandre Carissimi, Universidade Federal do Rio Grande do Sul
Philippe Olivier Navaux, Universidade Federal do Rio Grande do Sul


Abstract

The simulation of acoustic wave propagation is the basis of the seismic imaging tools used by the oil and gas industry. These simulations run on high-performance computing (HPC) architectures, which deliver faster and more accurate results with each processor generation. Achieving high performance on these architectures, however, requires addressing several challenges during application development. In this paper, Fletcher modeling was optimized for multicore and GPU architectures, and the performance, energy consumption, and energy efficiency of eight versions of the code were evaluated. The results show that the CUDA version has the best performance and energy efficiency; the OpenACC version, which has the advantage of portability, shows a degradation of only 10% in performance and 8% in energy efficiency compared with CUDA.
