[1] Ioffe S,Szegedy C.Batch normalization:accelerating deep network training by reducing internal covariate shift[A].International Conference on Machine Learning[C].Lille,France:International Machine Learning Society,2015.448-456.
[2] Wang L,Xiong Y,Wang Z,et al.Temporal segment networks:towards good practices for deep action recognition[A].European Conference on Computer Vision[C].Amsterdam,Netherlands:Springer International Publishing,2016.20-36.
[3] Tran D,Bourdev L,Fergus R,et al.Learning spatiotemporal features with 3D convolutional networks[A].International Conference on Computer Vision[C].Santiago,Chile:IEEE,2015.4489-4497.
[4] Hochreiter S,Schmidhuber J.Long short-term memory[J].Neural Computation,1997,9(8):1735-1780.
[5] Donahue J,Hendricks L A,Guadarrama S,et al.Long-term recurrent convolutional networks for visual recognition and description[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2017,39(4):677-691.
[6] Brox T,Bruhn A,Papenberg N,et al.High accuracy optical flow estimation based on a theory for warping[J].Computer Vision,2004,3024(10):25-36.
[7] Xu K,Ba J,Kiros R,et al.Show,attend and tell:neural image caption generation with visual attention[A].International Conference on Machine Learning[C].Lille,France:International Machine Learning Society,2015.2048-2057.
[8] Sharma S,Kiros R,Salakhutdinov R.Action Recognition Using Visual Attention[DB/OL].https://arxiv.org/abs/1511.04119,2015-11-12.
[9] Yan S,Smith J S,Lu W,et al.Hierarchical multi-scale attention networks for action recognition[J].Signal Processing:Image Communication,2018,61:73-84.
[10] Yu T,Guo C,Wang L,et al.Joint spatial-temporal attention for action recognition[J].Computer Science,2018,112(2018):226-233.
[11] Bahdanau D,Cho K,Bengio Y.Neural Machine Translation by Jointly Learning to Align and Translate[DB/OL].https://arxiv.org/abs/1409.0473,2014-09-01.
[12] Schuster M,Paliwal K K.Bidirectional recurrent neural networks[J].IEEE Transactions on Signal Processing,1997,45(11):2673-2681.
[13] Soomro K,Zamir A R,Shah M.UCF101:A dataset of 101 human actions classes from videos in the wild[DB/OL].https://arxiv.org/abs/1212.0402,2012-12-03.
[14] Kuehne H,Jhuang H,Garrote E,et al.HMDB:a large video database for human motion recognition[A].International Conference on Computer Vision[C].Barcelona,Spain:IEEE,2011.2556-2563.
[15] Deng J,Dong W,Socher R,et al.ImageNet:A large-scale hierarchical image database[A].Computer Vision and Pattern Recognition[C].Miami,FL,USA:IEEE,2009.248-255.
[16] Long X,Gan C,de Melo G,et al.Multimodal keyless attention fusion for video classification[A].32nd AAAI Conference on Artificial Intelligence[C].New Orleans,Louisiana,USA:AAAI,2018.7202-7209.
[17] Yuan Y,Wang D,Wang Q.Memory-Augmented Temporal Dynamic Learning for Action Recognition[DB/OL].https://arxiv.org/abs/1904.13080,2019-04-30.
[18] Fan L,Huang W,Gan C,et al.End-to-End Learning of Motion Representation for Video Understanding[A].Computer Vision and Pattern Recognition[C].Salt Lake City,UT,USA:IEEE,2018.6016-6025.
[19] Sengupta B,Qian Y.Pillar Networks++:Distributed Non-parametric Deep and Wide Networks[DB/OL].https://arxiv.org/abs/1708.06250,2017-08-18.
[20] Peng X,Wang L,Wang X,et al.Bag of visual words and fusion methods for action recognition:Comprehensive study and good practice[J].Computer Vision and Image Understanding,2016,150(2016):109-125.
[21] Li Z,Gavrilyuk K,Gavves E,et al.VideoLSTM convolves,attends and flows for action recognition[J].Computer Vision and Image Understanding,2018,166(2018):41-50.
[22] Wang L,Qiao Y,Tang X.Action recognition with trajectory-pooled deep-convolutional descriptors[A].Computer Vision and Pattern Recognition[C].Boston,MA,USA:IEEE,2015.4305-4314.
[23] Carreira J,Zisserman A.Quo Vadis,Action Recognition? A New Model and the Kinetics Dataset[A].Computer Vision and Pattern Recognition[C].Honolulu,HI,USA:IEEE,2017.6299-6308.
[24] Feichtenhofer C,Pinz A,Wildes R P.Spatiotemporal residual networks for video action recognition[A].Advances in Neural Information Processing Systems[C].Barcelona,Spain:NIPS,2016.3468-3476.
[25] Cai Z,Wang L,Peng X,et al.Multi-view super vector for action recognition[A].Computer Vision and Pattern Recognition[C].Columbus,OH,USA:IEEE,2014.596-603.