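Cite this article as: GUO Dong-Yue, LIN Yi, YANG Bo. Automatic speech segmentation for air-ground communication based on multi-input CGRU neural network [J]. Journal of Sichuan University (Natural Science Edition), 2020, 57: 887-893.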
 
Automatic speech segmentation for air-ground communication based on multi-input CGRU neural network
Received: 2019-06-25  Revised: 2019-12-19
Keywords   speech segmentation; voice activity detection (VAD); air-ground communication; convolutional neural network (CNN); recurrent neural network (RNN)
Funding   Joint project of the National Natural Science Foundation of China and the Civil Aviation Administration of China (U1833115)
Authors (name, affiliation, e-mail)
GUO Dong-Yue, National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, 18865518502@163.com
LIN Yi, National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University
YANG Bo, 1. National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu 610065, China; 2. College of Computer Science, Sichuan University, Chengdu 610065, China, boyang@scu.edu.cn
Abstract
    Automatic speech segmentation is an important preprocessing step in speech applications such as speech recognition, speaker recognition, and speech noise reduction, and the quality of the segmentation algorithm directly affects the accuracy of the system output. In air traffic control (ATC) air-ground communication, transmission channel noise, weather, and the speaker's working state all affect the speech signal and, in turn, degrade segmentation performance to some extent. Based on an analysis of the speech characteristics of ATC air-ground communication, this paper proposes an automatic speech segmentation method that feeds multiple input features to a CGRU network. Taking the characteristics of air-ground communication into account, the method uses deep learning to further extract nonlinear time-domain and frequency-domain features of the speech signal and classifies each signal frame into one of three classes: speech frame, end frame, or other frame. The experiments compare how different speech features used as input affect segmentation, and evaluate segmentation algorithms including GMM, CNN, CLDNN, and CGRU on a test set of real air-ground communication. A simple algorithm for smoothing the prediction results is also proposed. The results show that the proposed method has a clear advantage for air-ground communication, with the classification model reaching an AUC of 0.98.
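
The abstract mentions a simple smoothing step applied to the frame-level predictions but does not describe it. As an illustration only, the sketch below assumes a sliding-window majority vote, which is one common way to remove isolated misclassified frames, and then groups consecutive speech frames into segments. The label encoding (`SPEECH`, `END`, `OTHER`), the function names, and the window size are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# Assumed label encoding for the three frame classes named in the abstract.
SPEECH, END, OTHER = 0, 1, 2


def smooth_labels(labels, window=5):
    """Majority-vote smoothing over a centered sliding window.

    This is an assumed stand-in for the paper's smoothing algorithm,
    which the abstract does not specify.
    """
    half = window // 2
    smoothed = []
    for i, current in enumerate(labels):
        lo = max(0, i - half)
        hi = min(len(labels), i + half + 1)
        counts = Counter(labels[lo:hi])
        best = max(counts.values())
        if counts[current] == best:
            # Keep the original label when it ties for the majority.
            smoothed.append(current)
        else:
            smoothed.append(max(counts, key=counts.get))
    return smoothed


def extract_segments(labels, target=SPEECH):
    """Return (start, end) frame index pairs, end exclusive, for runs of target."""
    segments = []
    start = None
    for i, label in enumerate(labels):
        if label == target and start is None:
            start = i
        elif label != target and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(labels)))
    return segments


if __name__ == "__main__":
    raw = [0, 0, 2, 0, 0, 0, 2, 2, 0, 2, 2, 2, 1, 2]
    clean = smooth_labels(raw, window=3)
    print(clean)
    print(extract_segments(clean))
```

Note that a plain majority vote also erases short runs of end frames, as it does for the single `END` label in the sample input. A real system would likely treat that class separately, for example by smoothing only the speech/other decision.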
