[1] 杨明浩, 高廷丽, 陶建华, 等. 对话意图及语音识别错误对交互体验的影响[J]. 软件学报, 2016, 27(S2): 69-75.
Yang MH, Gao TL, Tao JH, et al. Error analysis of intention classification and speech recognition in human-computer dialog [J]. Journal of Software, 2016, 27(S2): 69-75. (in Chinese)
[2] Rodriguez E, Ruiz B, Garcia-Crespo A, et al. Speech/speaker recognition using a HMM/GMM hybrid model [A]. International Conference on Audio- and Video-Based Biometric Person Authentication [C]. Berlin, Heidelberg: Springer, 1997. 227-234.
[3] Mohamed AR, Sainath TN, Dahl G, et al. Deep belief networks using discriminative features for phone recognition [A]. 2011 IEEE International Conference on Acoustics, Speech and Signal Processing [C]. Prague, Czech Republic: IEEE, 2011. 5060-5063.
[4] Yu D, Deng L. Deep learning and its applications to signal and information processing [J]. IEEE Signal Processing Magazine, 2011, 28(1): 145-154.
[5] Graves A, Mohamed AR, Hinton G. Speech recognition with deep recurrent neural networks [A]. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) [C]. Vancouver, Canada: IEEE, 2013. 6645-6649.
[6] Sak H, Senior A, Beaufays F. Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition [A]. The 15th Annual Conference of the International Speech Communication Association [C]. Singapore: ISCA, 2014. 338-342.
[7] Abdel-Hamid O, Mohamed AR, Jiang H, et al. Convolutional neural networks for speech recognition [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 22(10): 1533-1545.
[8] Graves A, Fernández S, Gomez F, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks [A]. International Conference on Machine Learning, ICML 2006 [C]. Pittsburgh, PA: ACM, 2006. 369-376.
[9] Zhang Y, Pezeshki M, Brakel P, et al. Towards end-to-end speech recognition with deep convolutional neural networks [A]. The 17th Annual Conference of the International Speech Communication Association [C]. San Francisco, CA: ISCA, 2016. 410-414.
[10] Yang XD, Wang WZ, Yang HW, et al. Simple data augmented transformer end-to-end Tibetan speech recognition [A]. IEEE 3rd International Conference on Information Communication and Signal Processing [C]. NY: IEEE, 2020. 148-152.
[11] Chang HJ, Liu AH, Lee HY, et al. End-to-end whispered speech recognition with frequency-weighted approaches and pseudo whisper pre-training [A]. IEEE Spoken Language Technology Workshop [C]. NY: IEEE, 2021. 186-193.
[12] Fan CH, Yi JY, Tao JH, et al. Gated recurrent fusion with joint training framework for robust end-to-end speech recognition [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 29: 198-209.
[13] Graves A, Schmidhuber J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures [J]. Neural Networks, 2005, 18(5-6): 602-610.
[14] Sainath TN, Vinyals O, Senior A, et al. Convolutional, long short-term memory, fully connected deep neural networks [A]. 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) [C]. NY: IEEE, 2015. 4580-4584.
[15] Amodei D, Ananthanarayanan S, Anubhai R, et al. Deep speech 2: end-to-end speech recognition in English and Mandarin [A]. International Conference on Machine Learning 2016 [C]. NY: ACM, 2016. 173-182.
[16] 王海坤, 潘嘉, 刘聪. 语音识别技术的研究进展与展望[J]. 电信科学, 2018, 2: 1-11.
Wang HK, Pan J, Liu C. Research development and forecast of automatic speech recognition technologies [J]. Telecommunications Science, 2018, 2: 1-11. (in Chinese)
[17] Kannan A, Wu YH, Nguyen P, et al. An analysis of incorporating an external language model into a sequence-to-sequence model [A]. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing [C]. Calgary, Canada: IEEE, 2018. 5824-5828.
[18] Gulcehre C, Firat O, Xu K, et al. On using monolingual corpora in neural machine translation [OL]. http://arxiv.org/abs/1503.03535, 2015.
[19] Sriram A, Jun H, Satheesh S, et al. Cold fusion: training seq2seq models together with language models [A]. The 19th Annual Conference of the International Speech Communication Association [C]. Hyderabad, India: ISCA, 2018. 387-391.
[20] Toshniwal S, Kannan A, Chiu CC, et al. A comparison of techniques for language model integration in encoder-decoder speech recognition [A]. IEEE Workshop on Spoken Language Technology [C]. Athens, Greece: IEEE, 2018. 369-375.
[21] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need [A]. Advances in Neural Information Processing Systems [C]. Long Beach, CA: Curran Associates, 2017. 5998-6008.
[22] Bu H, Du J, Na X, et al. AISHELL-1: an open-source Mandarin speech corpus and a speech recognition baseline [A]. The 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA) [C]. Seoul, South Korea: IEEE, 2017. 58-62.
[23] Wang CH, Zhang M, Ma SP, et al. Automatic online news issue construction in Web environment [A]. The 17th International World Wide Web Conference [C]. Beijing, China: ACM, 2008. 457-466.
[24] Kingma DP, Ba J. Adam: a method for stochastic optimization [OL]. http://arxiv.org/abs/1412.6980, 2014.
[25] Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift [A]. International Conference on Machine Learning 2015 [C]. Lille, France: JMLR, 2015. 448-456.
[26] Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting [J]. Journal of Machine Learning Research, 2014, 15(1): 1929-1958.
[27] Graves A, Jaitly N. Towards end-to-end speech recognition with recurrent neural networks [A]. International Conference on Machine Learning [C]. Beijing, China: JMLR, 2014, 32(2): 1764-1772.
[28] 胡章芳, 徐轩, 付亚芹, 等. 基于ResNet-BLSTM的端到端语音识别[J]. 计算机工程与应用, 2020, 56(18): 124-130.
Hu ZF, Xu X, Fu YQ, et al. End to end speech recognition based on ResNet-BLSTM [J]. Computer Engineering and Applications, 2020, 56(18): 124-130. (in Chinese)