[1] WANG H,SCHMID C.Action recognition with improved trajectories[A].International Conference on Computer Vision[C].Australia:IEEE,2013.3551-3558.
[2] WANG L,QIAO Y,TANG X.MoFAP:A multi-level representation for action recognition[J].International Journal of Computer Vision,2016,119(3):254-271.
[3] SUN S,KUANG Z,SHENG L,et al.Optical flow guided feature:a fast and robust motion representation for video action recognition[A].Computer Vision and Pattern Recognition[C].USA:IEEE,2018.1390-1399.
[4] LEE M,LEE S,SON S,et al.Motion feature network:Fixed motion filter for action recognition[A].European Conference on Computer Vision(ECCV)[C].Germany:Springer,2018.387-403.
[5] LIN J,GAN C,HAN S.Temporal shift module for efficient video understanding[DB/OL].arXiv:1811.08383,2018.
[6] ZOLFAGHARI M,SINGH K,BROX T.ECO:Efficient convolutional network for online video understanding[A].European Conference on Computer Vision[C].Germany:Springer,2018.713-730.
[7] DIBA A,SHARMA V,VAN GOOL L.Deep temporal linear encoding networks[A].Computer Vision and Pattern Recognition(CVPR)[C].USA:IEEE,2017.1541-1550.
[8] KARPATHY A,TODERICI G,SHETTY S,et al.Large-scale video classification with convolutional neural networks[A].Computer Vision and Pattern Recognition[C].USA:IEEE,2014.1725-1732.
[9] DONAHUE J,HENDRICKS L A,GUADARRAMA S,et al.Long-term recurrent convolutional networks for visual recognition and description[A].Computer Vision and Pattern Recognition[C].USA:IEEE,2015.2625-2634.
[10] LEV G,SADEH G,KLEIN B,et al.RNN Fisher vectors for action recognition and image annotation[A].European Conference on Computer Vision[C].Netherlands:Springer,2016.833-850.
[11] ZHU J,ZHU Z,ZOU W.End-to-end video-level representation learning for action recognition[A].International Conference on Pattern Recognition(ICPR)[C].China:IEEE,2018.645-650.
[12] WANG L,XIONG Y,WANG Z,et al.Temporal segment networks:Towards good practices for deep action recognition[A].European Conference on Computer Vision[C].Netherlands:Springer,2016.20-36.
[13] SIMONYAN K,ZISSERMAN A.Two-stream convolutional networks for action recognition in videos[A].Neural Information Processing Systems[C].Canada:NIPS,2014.568-576.
[14] FEICHTENHOFER C,PINZ A,ZISSERMAN A.Convolutional two-stream network fusion for video action recognition[A].Computer Vision and Pattern Recognition[C].USA:IEEE,2016.1933-1941.
[15] TRAN D,BOURDEV L D,FERGUS R,et al.C3D:generic features for video analysis[DB/OL].arXiv:1412.0767,2014.
[16] HARA K,KATAOKA H,SATOH Y.Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet?[A].Computer Vision and Pattern Recognition[C].USA:IEEE,2018.6546-6555.
[17] CARREIRA J,ZISSERMAN A.Quo vadis,action recognition? A new model and the Kinetics dataset[A].Computer Vision and Pattern Recognition(CVPR)[C].USA:IEEE,2017.4724-4733.
[18] TRAN D,WANG H,TORRESANI L,et al.A closer look at spatiotemporal convolutions for action recognition[A].Computer Vision and Pattern Recognition[C].USA:IEEE,2018.6450-6459.
[19] QIU Z,YAO T,MEI T.Learning spatio-temporal representation with pseudo-3D residual networks[A].International Conference on Computer Vision(ICCV)[C].Italy:IEEE,2017.5534-5542.
[20] XIE S,SUN C,HUANG J,et al.Rethinking spatiotemporal feature learning for video understanding[DB/OL].arXiv:1712.04851,2017.
[21] DIBA A,FAYYAZ M,SHARMA V,et al.Temporal 3D ConvNets:new architecture and transfer learning for video classification[DB/OL].arXiv:1711.08200,2017.
[22] WANG X,GIRSHICK R,GUPTA A,et al.Non-local neural networks[A].Computer Vision and Pattern Recognition[C].USA:IEEE,2018.7794-7803.
[23] DIBA A,FAYYAZ M,SHARMA V,et al.Spatio-temporal channel correlation networks for action classification[A].European Conference on Computer Vision[C].Germany:Springer,2018.284-299.
[24] SOOMRO K,ZAMIR A R,SHAH M.UCF101:A dataset of 101 human actions classes from videos in the wild[DB/OL].arXiv:1212.0402,2012.
[25] KUEHNE H,JHUANG H,GARROTE E,et al.HMDB:a large video database for human motion recognition[A].International Conference on Computer Vision(ICCV)[C].Spain:IEEE,2011.2556-2563.
[26] FEICHTENHOFER C,PINZ A,WILDES R.Spatiotemporal residual networks for video action recognition[A].Neural Information Processing Systems[C].Spain:NIPS,2016.3468-3476.
[27] LI Z,GAVRILYUK K,GAVVES E,et al.VideoLSTM convolves,attends and flows for action recognition[J].Computer Vision and Image Understanding,2018,166(3):41-50.
[28] HU J F,ZHENG W S,PAN J,et al.Deep bilinear learning for RGB-D action recognition[A].European Conference on Computer Vision(ECCV)[C].Germany:Springer,2018.335-351.
[29] RENDLE S.Factorization machines[A].International Conference on Data Mining[C].Australia:IEEE,2010.995-1000.