Evaluating Memory Constraints of RISC-V Matrix Accelerators using gem5
Abstract
Matrix multiplication is a core operation in artificial intelligence workloads, and its performance on modern accelerators is often limited by memory bandwidth. This study explores the architectural integration of a prototype RISC-V matrix extension in the gem5 simulator, modeling memory hierarchy configurations that range from private caches to direct DRAM connections. The results demonstrate that strategic placement within the memory hierarchy significantly improves computational throughput and efficiency. Our matrix implementation achieves 1.35x the performance of OpenBLAS using the same architectural state and 87% of the theoretical maximum.
References
Alibaba Cloud (2023). XuanTie Matrix Multiply Extension Instructions. [link].
Alvarenga, L., Ferrari, V., Souza, R., Pereira, M., and Araujo, G. (2024). ConvBench: A comprehensive benchmark for 2D convolution primitive evaluation.
Binkert, N., Beckmann, B., Black, G., Reinhardt, S. K., Saidi, A., Basu, A., Hestness, J., Hower, D. R., Krishna, T., Sardashti, S., Sen, R., Sewell, K., Shoaib, M., Vaish, N., Hill, M. D., and Wood, D. A. (2011). The gem5 simulator. SIGARCH Comput. Archit. News, 39(2):1–7.
C. Aquino, I., Wanner, L., and Rigo, S. (2024). Architectural Simulation with gem5, chapter 4, pages 92–118. Sociedade Brasileira de Computação, São Carlos, SP.
Goto, K. and Geijn, R. A. v. d. (2008). Anatomy of high-performance matrix multiplication. ACM Trans. Math. Softw., 34(3).
Kim, H., Ye, G., Wang, N., Yazdanbakhsh, A., and Kim, N. S. (2024). Exploiting Intel Advanced Matrix Extensions (AMX) for Large Language Model Inference. IEEE Computer Architecture Letters, 23(1):117–120.
Lowe-Power, J., Akram, A., Amin, R., Hill, M. D., Wood, D. A., Chen, D. H., Hsu, L., Krishna, T., Agarwal, N., Wright, A. R., et al. (2020). The gem5 Simulator: Version 20.0+. arXiv preprint arXiv:2007.03152.
McCalpin, J. (1995). Memory bandwidth and machine balance in high performance computers. IEEE Technical Committee on Computer Architecture Newsletter, pages 19–25.
RISC-V International (2025). RISC-V International. [link].
Rogers, S., Slycord, J., Baharani, M., and Tabkhi, H. (2020). gem5-SALAM: A system architecture for LLVM-based accelerator modeling. In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 471–482.
SiFive (2024). SiFive proposal for RISC-V AME extension. [link].
Vieira, J., Roma, N., Falcao, G., and Tomás, P. (2024). gem5-accel: A pre-RTL simulation toolchain for accelerator architecture validation. IEEE Computer Architecture Letters, 23(1):1–4.
Volokitin, V., Kozinov, E., Kustikova, V., Liniov, A., and Meyerov, I. (2023). Case Study for Running Memory-Bound Kernels on RISC-V CPUs. In Malyshkin, V., editor, Parallel Computing Technologies, pages 51–65, Cham. Springer Nature Switzerland.
Wang, C., Song, P., Zhao, H., Zhang, F., Wang, J., and Zhang, L. (2024). High-Utilization GPGPU Design for Accelerating GEMM Workloads: An Incremental Approach. In 2024 IEEE International Symposium on Circuits and Systems (ISCAS), pages 1–5.
Waterman, A., Lee, Y., Patterson, D. A., and Asanovic, K. (2014). The RISC-V instruction set manual, volume I: User-level ISA, version 2.0. EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2014-54, page 4.
Weidmann, M. (2021). Introducing the Scalable Matrix Extension for the Armv9-A architecture. [link].
Yessin, G., Badawy, A. H. A., Narayana, V., Mayhew, D., and Ghazawi, T. E. (2014). “CERE”: A CachE Recommendation Engine: Efficient Evolutionary Cache Hierarchy Design Space Exploration. In 2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS), pages 566–573.
Published
28/10/2025
How to Cite
AQUINO, Iago C.; KREBS, Casio P.; WANNER, Lucas; RIGO, Sandro. Evaluating Memory Constraints of RISC-V Matrix Accelerators using gem5. In: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (SSCAD), 26., 2025, Bonito/MS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 193-204. DOI: https://doi.org/10.5753/sscad.2025.16632.
