MSE: A Matrix Sparsity Extension for RISC-V
Abstract
The increase in size and computing demands of modern Convolutional Neural Networks (CNNs) requires techniques that reduce model size and accelerate inference. A popular method is pruning, in which some model weights are zeroed in a controlled way while maintaining model accuracy. Pruning can create structured sparsity patterns in the weight matrix, allowing storage in compressed formats that reduce the model's memory footprint while preserving sequential access to the values. In this paper, we propose the Matrix Sparsity Extension (MSE), a RISC-V instruction set extension that takes advantage of the sparsity formats generated by pruning, reducing the number of memory operations needed for matrix multiplication. MSE provides speedups close to 1.24x over the baseline uncompressed operation, reaching 1.75x when the sparse case offers better alignment in cache.
References
Advanced Micro Devices, Inc. (2024). “AMD Instinct MI300” Instruction Set Architecture: Reference Guide. Advanced Micro Devices, Inc., Santa Clara, CA, USA. Version published July 15, 2024.
Aquino, I. C., Wanner, L., and Rigo, S. (2024). Architectural Simulation with gem5, chapter 4, pages 92–118. Sociedade Brasileira de Computação, São Carlos, SP.
Cao, S., Zhang, C., Yao, Z., Xiao, W., Nie, L., Zhan, D., Liu, Y., Wu, M., and Zhang, L. (2019). Efficient and effective sparse LSTM on FPGA with bank-balanced sparsity. In Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA ’19, pages 63–72, New York, NY, USA. Association for Computing Machinery.
He, Y. and Xiao, L. (2024). Structured pruning for deep convolutional neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(5):2900–2919.
Intel Corporation (2024). Intel® 64 and IA-32 Architectures Optimization Reference Manual: Documentation Changes. Document Number: 355308-003US, Chapter 20: Intel® Advanced Matrix Extensions (Intel® AMX).
Jeong, G., Damani, S., Bambhaniya, A. R., Qin, E., Hughes, C. J., Subramoney, S., Kim, H., and Krishna, T. (2023). VEGETA: Vertically-integrated extensions for sparse/dense GEMM tile acceleration on CPUs.
Lin, B., Zheng, N., Wang, L., Cao, S., Ma, L., Zhang, Q., Zhu, Y., Cao, T., Xue, J., Yang, Y., and Yang, F. (2023). Efficient GPU kernels for N:M-sparse weights in deep learning. In Sixth Conference on Machine Learning and Systems (MLSys’23).
Lowe-Power, J., Akram, A., Amin, R., Hill, M. D., Wood, D. A., Chen, D. H., Hsu, L., Krishna, T., Agarwal, N., Wright, A. R., et al. (2020). The gem5 simulator: Version 20.0+. arXiv preprint arXiv:2007.03152.
Maturana, D. and Scherer, S. (2015). VoxNet: A 3D convolutional neural network for real-time object recognition. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 922–928.
Mienye, I. D., Swart, T. G., Obaido, G., Jordan, M., and Ilono, P. (2025). Deep convolutional neural networks in medical image analysis: A review. Information, 16(3).
Mishra, A. K., Latorre, J. A., Pool, J., Stosic, D., Stosic, D., Venkatesh, G., Yu, C., and Micikevicius, P. (2021). Accelerating sparse deep neural networks. CoRR, abs/2104.08378.
NVIDIA Corporation (2020). NVIDIA A100 Tensor Core GPU architecture. Technical Report V1.0, NVIDIA Corporation. Whitepaper.
Ta, T., Randall, J., and Batten, C. (2025). SparseZipper: Enhancing matrix extensions to accelerate SpGEMM on CPUs.
Tang, A., Quan, P., Niu, L., and Shi, Y. (2022). A survey for sparse regularization based compression methods. Annals of Data Science, 9(4):695–722.
Tang, Y., Zhao, C., Wang, J., Zhang, C., Sun, Q., Zheng, W. X., Du, W., Qian, F., and Kurths, J. (2023). Perception and navigation in autonomous systems in the era of learning: A survey. IEEE Transactions on Neural Networks and Learning Systems, 34(12):9604–9624.
Titopoulos, V., Alexandridis, K., Peltekis, C., Nicopoulos, C., and Dimitrakopoulos, G. (2023). IndexMAC: A custom RISC-V vector instruction to accelerate structured-sparse matrix multiplications.
Titopoulos, V., Alexandridis, K., Peltekis, C., Nicopoulos, C., and Dimitrakopoulos, G. (2025). Optimizing structured-sparse matrix multiplication in RISC-V vector processors.
Waterman, A. and Asanović, K. (2021). The RISC-V Instruction Set Manual, Volume I: Unprivileged ISA. Technical report, RISC-V Foundation. Document Version 20191213.
Published
28/10/2025
How to Cite
RIBAS, Luc; AQUINO, Iago; FELZMANN, Isaías; WANNER, Lucas; ARAÚJO, Guido. MSE: A Matrix Sparsity Extension for RISC-V. In: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (SSCAD), 26., 2025, Bonito/MS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 254-265. DOI: https://doi.org/10.5753/sscad.2025.16697.
