Implementation and Evaluation of Warp Scheduling Policies on Vortex, an Open-Source GPGPU

Samuel A. O. Magalhães; Poliana A. C. Oliveira; Renan A. Marks

doi:10.5753/sscad.2025.16709

Samuel A. O. Magalhães CEFET-MG
Poliana A. C. Oliveira CEFET-MG
Renan A. Marks UFMS

Abstract

Despite the popularity of general-purpose GPUs, proprietary architectures limit academic experimentation, motivating the use of open-source projects such as Vortex, a GPGPU based on the RISC-V instruction set. This work extends Vortex with two additional warp scheduling strategies: PTA, proposed in this work, which prioritizes warps with the largest number of active threads; and GTO, a policy already known in the literature, which greedily keeps favoring an active warp until it is blocked. Simulation results with workloads from the Rodinia benchmark suite show that GTO and PTA can reduce cycle counts by up to 26% and 21.5%, respectively, compared with Round Robin, the native Vortex policy.
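
As a rough illustration of how the three policies differ when choosing the next warp to issue, the following minimal sketch models each selection rule in Python. It is not the Vortex implementation: the `Warp` record and the functions `select_rr`, `select_gto`, and `select_pta` are hypothetical names introduced here. The sketch also assumes that warps are stored in a list indexed by warp id, that a lower id means an older warp, that GTO falls back to the oldest ready warp once the current one blocks, and that PTA breaks ties in favor of the lower id.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Warp:
    wid: int             # warp id; assumed equal to its index in the list
    active_threads: int  # number of threads enabled in the thread mask
    ready: bool = True   # False while the warp is blocked (e.g. on memory)


def select_rr(warps: List[Warp], last: Optional[int]) -> Optional[int]:
    """Round Robin: first ready warp after the one issued last."""
    n = len(warps)
    start = 0 if last is None else last + 1
    for i in range(n):
        w = warps[(start + i) % n]
        if w.ready:
            return w.wid
    return None  # every warp is blocked


def select_gto(warps: List[Warp], last: Optional[int]) -> Optional[int]:
    """GTO: stay on the last warp while it is ready, else pick the oldest."""
    if last is not None and warps[last].ready:
        return last
    ready = [w for w in warps if w.ready]
    # Assumption: the lowest warp id stands in for "oldest".
    return min(ready, key=lambda w: w.wid).wid if ready else None


def select_pta(warps: List[Warp], last: Optional[int]) -> Optional[int]:
    """PTA: ready warp with the most active threads (ties: lower id)."""
    ready = [w for w in warps if w.ready]
    if not ready:
        return None
    return max(ready, key=lambda w: (w.active_threads, -w.wid)).wid


if __name__ == "__main__":
    warps = [Warp(0, 2), Warp(1, 4), Warp(2, 4, ready=False), Warp(3, 1)]
    print("RR :", select_rr(warps, last=1))
    print("GTO:", select_gto(warps, last=1))
    print("PTA:", select_pta(warps, last=None))
```

With warp 1 issued last and warp 2 blocked, Round Robin skips ahead to warp 3, GTO stays on warp 1, and PTA picks warp 1 because it has the most active threads among the ready warps.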
