Research on Intelligent Speech Interaction System Based on Residual Neural Network and Baidu Speech Platform
DOI: https://doi.org/10.62517/jike.202404213
Author(s)
Qiongfei Wu, Junhao Wu, Yi Chen, Zhiqiang Zhang
Affiliation(s)
School of Intelligent Engineering, Wuhan Institute of Design and Science, Wuhan, Hubei, China
Abstract
To address the dual needs of convenience and security in human-computer interaction in the context of the Internet of Things (IoT) and Artificial Intelligence (AI), a system was designed based on the Raspberry Pi 4B that integrates voice recognition, speech synthesis, and speaker verification. Voice recognition and speech synthesis use Baidu's speech platform, while speaker verification uses a Residual Neural Network (ResNet34) model built on the PyTorch framework. To improve the user experience, the system uses the snowboy offline wake-up engine to start voice interaction and Python's Tkinter library to implement a customized graphical user interface (GUI). After rigorous testing and verification, the system handles a variety of IoT voice interaction scenarios efficiently and in a user-friendly way, and it uses voiceprint recognition to secure access to the application. The work also offers research value for innovation on open-source hardware platforms in the field of artificial intelligence.
Keywords
Voice Recognition; Speech Synthesis; Speaker Verification; Baidu Voice; Neural Network
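The speaker verification step named in the abstract can be illustrated with a minimal sketch. It assumes that the ResNet34 model has already turned each utterance into a fixed-length embedding vector, and that two embeddings are compared by cosine similarity against a threshold. The abstract does not state the scoring method or the threshold, so the function names and the value 0.7 below are hypothetical and shown only for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine similarity of two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def is_same_speaker(
    enrolled: Sequence[float],
    probe: Sequence[float],
    threshold: float = 0.7,  # hypothetical value; tune on held-out data
) -> bool:
    """Accept the probe if it is close enough to the enrolled voiceprint."""
    return cosine_similarity(enrolled, probe) >= threshold
```

In a deployed system the threshold would be chosen from verification trials, trading false accepts against false rejects.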