Efficient MFCC Extraction on FPGA: An Open and Flexible Implementation

Julio N. Avelar; Vinicius P. M. Miguel; Tiago S. Zaparoli; Enzo P. Bertoloti; Gabriel Oliveira; Rodolfo Azevedo (UNICAMP)

doi:10.5753/sscad.2025.16748

DOI: https://doi.org/10.5753/sscad.2025.16748

Abstract

This paper presents a set of open-source, parameterizable IP cores for extracting Mel-frequency cepstral coefficients (MFCCs) on FPGAs. The cores are optimized for edge and real-time applications. An implementation on a Kintex-7 FPGA processes one audio frame in just 56 µs, 1.62 times faster than a high-performance CPU (AMD Ryzen 7 7700). The architecture is resource-efficient, using about 5,900 LUTs and 31 DSPs, and maintains high accuracy (mean absolute error of 8.18%). These results support it as a high-performance, low-power solution for real-time audio analysis.
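The abstract does not list the pipeline stages the IP cores implement. For reference, below is a minimal floating-point sketch of a textbook single-frame MFCC computation in pure Python: pre-emphasis, Hamming window, power spectrum, triangular mel filterbank, logarithm, and DCT-II. All function names and parameters are illustrative assumptions. These include the 16 kHz sample rate, 256-sample frame, 26 mel filters, 13 coefficients, and 0.97 pre-emphasis factor. None of them describe the configuration or interface of the IPs presented here. A software model of this kind could serve as a floating-point baseline when checking the output of a fixed-point hardware implementation.

```python
import math

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel formulas exist)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (O(N^2), kept dependency-free)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = im = 0.0
        for t, x in enumerate(frame):
            ang = 2.0 * math.pi * k * t / n
            re += x * math.cos(ang)
            im -= x * math.sin(ang)
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters equally spaced on the mel scale from 0 Hz to Nyquist."""
    m_hi = hz_to_mel(sample_rate / 2.0)
    mel_pts = [m_hi * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_pts]
    fbank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):          # rising edge
            weights[k] = (k - left) / (center - left)
        for k in range(center, right):         # falling edge
            weights[k] = (right - k) / (right - center)
        fbank.append(weights)
    return fbank

def dct_ii(x, n_coeffs):
    """Orthonormal DCT-II, keeping the first n_coeffs outputs."""
    n = len(x)
    out = []
    for k in range(n_coeffs):
        s = sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i, v in enumerate(x))
        out.append((math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)) * s)
    return out

def mfcc_frame(frame, sample_rate=16000, n_filters=26, n_coeffs=13, pre_emph=0.97):
    n = len(frame)
    # 1. Pre-emphasis boosts high frequencies
    emph = [frame[0]] + [frame[i] - pre_emph * frame[i - 1] for i in range(1, n)]
    # 2. Hamming window reduces spectral leakage
    win = [emph[i] * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i in range(n)]
    # 3. Power spectrum, 4. mel filterbank energies
    spec = power_spectrum(win)
    energies = [sum(w * p for w, p in zip(filt, spec))
                for filt in mel_filterbank(n_filters, n, sample_rate)]
    # 5. Log compression (floored to avoid log(0)), 6. DCT-II decorrelation
    log_e = [math.log(max(e, 1e-10)) for e in energies]
    return dct_ii(log_e, n_coeffs)

if __name__ == "__main__":
    sr, n = 16000, 256
    tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(n)]
    print([round(c, 2) for c in mfcc_frame(tone, sr)])
```

The naive DFT only keeps the sketch self-contained. A practical software or hardware implementation would compute the spectrum with an FFT, and a fixed-point design would also choose word widths for each stage.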
