















Engineering - Software Engineering























Speech enhancement; full-scale connection; feature fusion; subband analysis; deep complex network




















