Improved Transformer Instance Segmentation Under Dynamic Occlusion Based VSLAM Algorithm

CHEN Meng-yuan; HAN Peng-peng; LIU Jin-hui; ZHANG Yu-kun; JIANG Hao-wei; DING Ling-mei

doi:10.12263/DZXB.20220310

Research article | Updated: 2025-12-08

- Original title: 动态遮挡场景下基于改进Transformer实例分割的VSLAM算法
- English title: Improved Transformer Instance Segmentation Under Dynamic Occlusion Based VSLAM Algorithm
- Acta Electronica Sinica, 2023, Vol. 51, No. 7, pp. 1812-1825
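- Author affiliations:
  1. School of Electrical Engineering, Anhui Polytechnic University, Wuhu, Anhui 241000
  2. Key Laboratory of Advanced Perception and Intelligent Control of High-end Equipment, Ministry of Education, Wuhu, Anhui 241000
  3. Anhui Polytechnic University Industrial Innovation Technology Co., Ltd., Wuhu, Anhui 241000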
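- About the authors:
  CHEN Meng-yuan, male, born in January 1984 in Wuhu, Anhui. He is a professor and master's supervisor at the School of Electrical Engineering, Anhui Polytechnic University, and a recipient of the first prize of the Anhui Provincial Science and Technology Award. His main research interests are mobile robot SLAM, target tracking, and path planning. E-mail: mychen@ahpu.edu.cn
  HAN Peng-peng, male, born in May 1992 in Fuyang, Anhui. He is a master's student at the School of Electrical Engineering, Anhui Polytechnic University. His research interest is visual SLAM algorithms for mobile robots. E-mail: 1421384659@qq.com
  LIU Jin-hui, male, born in May 1997 in Bozhou, Anhui. He is a master's student at the School of Electrical Engineering, Anhui Polytechnic University. His research interest is visual SLAM algorithms for mobile robots. E-mail: m17855336477@163.com
  ZHANG Yu-kun, male, born in May 1995 in Bozhou, Anhui. He is a master's student at the School of Electrical Engineering, Anhui Polytechnic University. His research interest is visual SLAM algorithms for mobile robots. E-mail: 1728443478@qq.com
  JIANG Hao-wei, male, born in July 1997 in Anqing, Anhui. He is a master's student at the School of Electrical Engineering, Anhui Polytechnic University. His research interest is visual SLAM algorithms for mobile robots. E-mail: 576992617@qq.com
  DING Ling-mei, female, born in December 1994 in Taizhou, Jiangsu. She is a master's student at the School of Electrical Engineering, Anhui Polytechnic University. Her research interest is visual SLAM algorithms for mobile robots. E-mail: 1425288603@qq.com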
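- Funding:
  National Natural Science Foundation of China (61903002); Anhui University Collaborative Innovation Project (GXXT-2021-050); Young and Middle-aged Top Talent Program of Anhui Polytechnic University; Scientific Research Start-up Fund for Introduced Talents of Anhui Polytechnic University
- DOI: 10.12263/DZXB.20220310
  CLC number: TP242.6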
- Received: 2021-03-25; revised: 2022-10-11; published in print: 2023-07-25
Citation: CHEN Meng-yuan, HAN Peng-peng, LIU Jin-hui, et al. Improved Transformer Instance Segmentation Under Dynamic Occlusion Based VSLAM Algorithm[J]. Acta Electronica Sinica, 2023, 51(07): 1812-1825. DOI: 10.12263/DZXB.20220310.

Abstract

Traditional SLAM (Simultaneous Localization And Mapping) algorithms struggle in dynamic scenes with occlusion: they have difficulty marking occluded objects, cannot accurately judge the motion state of potentially dynamic objects, and are left with few feature points once dynamic objects are culled. To address these problems, this paper proposes ITD-SLAM (Improved Transformer instance segmentation under Dynamic occlusion VSLAM algorithm), a VSLAM algorithm based on improved Transformer instance segmentation for dynamic occlusion scenes. A multi-attention module guides the model to attend to occluded regions, and an improved relative position encoding refines the boundary semantics of occluded objects, so that potentially dynamic objects are marked accurately. To reduce the influence of dynamic objects on the localization accuracy of the SLAM system, the motion state of each potentially dynamic object is estimated in three steps (camera pose estimation, object motion estimation, and object motion judgment), and the objects found to be dynamic are removed. The static background of the culled regions is then completed according to a grid flow motion model, and the feature points in the repaired regions are screened using information entropy and cross entropy, which supplements high-quality feature points for camera pose estimation. Validation on the public TUM dataset and in real scenes shows that the root mean square error of the proposed algorithm is 22.94% lower than that of DynaSLAM and that the algorithm exhibits good mapping capability.
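The abstract does not give the formula used to screen feature points in the repaired regions, so the following is only an illustrative sketch of one plausible reading of the information-entropy step: compute the Shannon entropy of the gray levels in a small patch around each candidate point and keep the points whose patch carries enough texture. The function names `patch_entropy` and `screen_keypoints`, the `radius` and `min_entropy` parameters, and the toy image are all assumptions made for illustration, and the cross-entropy step mentioned in the abstract is not modelled here.

```python
import math
from collections import Counter


def patch_entropy(image, row, col, radius=1):
    """Shannon entropy (in bits) of the gray levels in a square patch.

    `image` is a list of rows of integer gray levels. The patch is centred
    on (row, col) and clipped at the image border. Hypothetical helper,
    not taken from the paper.
    """
    height, width = len(image), len(image[0])
    values = [
        image[r][c]
        for r in range(max(0, row - radius), min(height, row + radius + 1))
        for c in range(max(0, col - radius), min(width, col + radius + 1))
    ]
    total = len(values)
    counts = Counter(values)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def screen_keypoints(image, keypoints, min_entropy, radius=1):
    """Keep only the keypoints whose surrounding patch entropy reaches
    `min_entropy` (an assumed, tunable threshold)."""
    return [
        (r, c)
        for (r, c) in keypoints
        if patch_entropy(image, r, c, radius) >= min_entropy
    ]


# Toy 3x6 image: the left half is flat, the right half is textured.
image = [
    [10, 10, 10, 0, 40, 80],
    [10, 10, 10, 120, 160, 200],
    [10, 10, 10, 240, 20, 60],
]

kept = screen_keypoints(image, [(1, 1), (1, 4)], min_entropy=1.0)
print(kept)  # kept == [(1, 4)]: the point in the flat region is discarded
```

In this toy case the patch around (1, 1) contains a single gray level and has zero entropy, while the patch around (1, 4) contains nine distinct gray levels and reaches log2(9) bits, so only the second point survives. A real pipeline would apply the same idea to patches of the inpainted image and would choose the threshold empirically.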

Keywords
