[1] 李志欣, 魏海洋, 张灿龙, 等. 图像描述生成研究进展[J]. 计算机研究与发展, 2021, 58(9): 1951-1974.
LI Zhi-xin, WEI Hai-yang, ZHANG Can-long, et al. Research progress on image captioning[J]. Journal of Computer Research and Development, 2021, 58(9): 1951-1974. (in Chinese)
[2] DAI J, LI Y, HE K, et al. R-FCN: Object detection via region-based fully convolutional networks[C]//Advances in Neural Information Processing Systems. Cambridge, USA: MIT Press, 2016: 379-387.
[3] LI Zhi-xin, LIN Lan, ZHANG Can-long, et al. A semi-supervised learning approach based on adaptive weighted fusion for automatic image annotation[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2021, 17(1): Article 37.
[4] VINYALS O, TOSHEV A, BENGIO S, et al. Show and tell: A neural image caption generator[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, USA: IEEE Computer Society, 2015: 3156-3164.
[5] KARPATHY A, LI F F. Deep visual-semantic alignments for generating image descriptions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, USA: IEEE Computer Society, 2015: 3128-3137.
[6] MAO J, XU W, YANG Y, et al. Deep captioning with multimodal recurrent neural networks (m-RNN)[EB/OL]. [2021-09-22].
[7] XU K, BA J, KIROS R, et al. Show, attend and tell: Neural image caption generation with visual attention[C]//Proceedings of International Conference on Machine Learning. Cambridge, USA: MIT Press, 2015: 2048-2057.
[8] CHEN L, ZHANG H, XIAO J, et al. SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, USA: IEEE Computer Society, 2017: 6298-6306.
[9] LU J, XIONG C, PARIKH D, et al. Knowing when to look: Adaptive attention via a visual sentinel for image captioning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, USA: IEEE Computer Society, 2017: 3242-3250.
[10] YOU Q, JIN H, WANG Z, et al. Image captioning with semantic attention[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, USA: IEEE Computer Society, 2016: 4651-4659.
[11] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems. Cambridge, USA: MIT Press, 2017: 5998-6008.
[12] YU A W, DOHAN D, LUONG M T, et al. QANet: Combining local convolution with global self-attention for reading comprehension[EB/OL]. [2021-09-22].
[13] RANZATO M A, CHOPRA S, AULI M, et al. Sequence level training with recurrent neural networks[EB/OL]. [2021-09-22].
[14] RENNIE S J, MARCHERET E, MROUEH Y, et al. Self-critical sequence training for image captioning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, USA: IEEE Computer Society, 2017: 1179-1195.
[15] PAPINENI K, ROUKOS S, WARD T, et al. BLEU: A method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Stroudsburg, USA: ACL, 2002: 311-318.
[16] BANERJEE S, LAVIE A. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments[C]//Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. Stroudsburg, USA: ACL, 2005: 65-72.
[17] LIN C Y. ROUGE: A package for automatic evaluation of summaries[C]//Proceedings of the ACL Workshop on Text Summarization Branches Out. Stroudsburg, USA: ACL, 2004: 74-81.
[18] VEDANTAM R, ZITNICK C L, PARIKH D. CIDEr: Consensus-based image description evaluation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, USA: IEEE Computer Society, 2015: 4566-4575.
[19] WILLIAMS R J. Simple statistical gradient-following algorithms for connectionist reinforcement learning[J]. Machine Learning, 1992, 8(3-4): 229-256.
[20] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, USA: IEEE Computer Society, 2016: 770-778.
[21] KINGMA D P, BA J. Adam: A method for stochastic optimization[EB/OL]. [2021-09-22].
[22] JIA X, GAVVES E, FERNANDO B, et al. Guiding the long-short term memory model for image caption generation[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway, USA: IEEE, 2015: 2407-2415.
[23] WANG C, YANG H, BARTZ C, et al. Image captioning with deep bidirectional LSTMs[C]//Proceedings of the 24th ACM International Conference on Multimedia. New York, USA: ACM, 2016: 988-997.
[24] FU K, JIN J, CUI R, et al. Aligning where to see and what to tell: Image caption with region-based attention and scene-specific contexts[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2321-2334.
[25] CHEN X, MA L, JIANG W, et al. Regularizing RNNs for caption generation by reconstructing the past with the present[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, USA: IEEE Computer Society, 2018: 7995-8003.
[26] LIU S, ZHU Z, YE N, et al. Improved image captioning via policy gradient optimization of SPIDEr[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway, USA: IEEE, 2017: 873-881.
[27] ANDERSON P, HE X, BUEHLER C, et al. Bottom-up and top-down attention for image captioning and visual question answering[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, USA: IEEE Computer Society, 2018: 6077-6086.
[28] HUANG Fei-cheng, LI Zhi-xin, WEI Hai-yang, et al. Boost image captioning with knowledge reasoning[J]. Machine Learning, 2020, 109(12): 2313-2332.