Cross-Media Image-Text Retrieval with Two Level Similarity

LI Zhi-xin; LING Feng; ZHANG Can-long; MA Hui-fang

doi:10.12263/DZXB.20191037


Updated: 2025-12-08

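- Acta Electronica Sinica Vol. 49, Issue 2, Pages: 268-274 (2021)
- Author affiliations:
  
  1. Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, Guangxi 541004
  2. College of Computer Science and Engineering, Northwest Normal University, Lanzhou, Gansu 730070
  3. Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, Guangxi 541004
  4. College of Computer Science and Engineering, Northwest Normal University, Lanzhou, Gansu 730070
- DOI: 10.12263/DZXB.20191037
  CLC: TP391
- Published Online: 25 February 2021
  
  Published: 2021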
LI Zhi-xin, LING Feng, ZHANG Can-long, et al. Cross-Media Image-Text Retrieval with Two Level Similarity[J]. Acta Electronica Sinica, 2021, 49(2): 268-274. DOI: 10.12263/DZXB.20191037.

Abstract

To better reveal the latent semantic correlation between images and text, this paper proposes a cross-media retrieval method that fuses two levels of similarity. Two subnetworks are constructed to process global features and local features respectively, so as to obtain better semantic matching between images and text. Each image is represented both as a whole image and as a set of image regions, and each text is represented both as a whole sentence and as a set of words. A two-level alignment method is designed to match the global and the local representations of images and text separately, and the two similarities are fused to learn a complete cross-media representation. Experimental results on the MSCOCO and Flickr30K datasets show that the proposed method matches images and text more accurately and outperforms many state-of-the-art cross-media retrieval methods.
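The idea of fusing a global and a local similarity can be illustrated with a minimal sketch. The abstract does not state which similarity functions or fusion rule the paper uses, so everything below is an assumption made for illustration: cosine similarity at both levels, a local score that aligns each word with its best-matching region and averages over words, and a weighted sum with a hypothetical weight `alpha`. The function names are likewise hypothetical, and the inputs are plain feature vectors rather than outputs of the paper's two subnetworks.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def global_similarity(image_vec, sentence_vec):
    """Global level: whole image against whole sentence."""
    return cosine(image_vec, sentence_vec)


def local_similarity(region_vecs, word_vecs):
    """Local level: image regions against words.

    Assumption (not from the paper): each word is matched to its most
    similar region, and the per-word scores are averaged.
    """
    if not region_vecs or not word_vecs:
        return 0.0
    best = [max(cosine(w, r) for r in region_vecs) for w in word_vecs]
    return sum(best) / len(best)


def fused_similarity(image_vec, sentence_vec, region_vecs, word_vecs, alpha=0.5):
    """Weighted sum of the two levels; alpha is a hypothetical fusion weight."""
    g = global_similarity(image_vec, sentence_vec)
    l = local_similarity(region_vecs, word_vecs)
    return alpha * g + (1.0 - alpha) * l


def rank_sentences(image_vec, region_vecs, sentences, alpha=0.5):
    """Rank candidate sentences for one image, best match first.

    `sentences` is a list of (sentence_vec, word_vecs) pairs; the result
    is a list of indices into that list.
    """
    scores = [
        fused_similarity(image_vec, s_vec, region_vecs, w_vecs, alpha)
        for s_vec, w_vecs in sentences
    ]
    return sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
```

In this sketch, image-to-text retrieval amounts to calling `rank_sentences` with one image's global vector and region vectors plus the candidate sentences; text-to-image retrieval would swap the roles. Setting `alpha` to 1.0 or 0.0 reduces the score to the global or the local similarity alone, which makes it easy to compare each level against the fused score.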
