School of Mechanical Engineering, Anhui University of Technology, Ma'anshan, Anhui 243032, China
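ZHANG Meng-quan (张梦权), male, was born in Suzhou, Anhui Province, in April 2001. He is currently a master's student at the School of Mechanical Engineering, Anhui University of Technology. His main research interests are robotics and machine vision. E-mail: 2992466836@qq.com
XU Si-xiang (许四祥), male, was born in Hanchuan, Hubei Province, in June 1974. He is currently a professor and master's supervisor at the School of Mechanical Engineering, Anhui University of Technology. His main research interests are robotics and machine vision. E-mail: xsxhust@ahut.edu.cn
YANG Yu (杨玉), male, was born in Anqing, Anhui Province, in November 2001. He is currently a master's student at the School of Mechanical Engineering, Anhui University of Technology. His main research interests are robotics and machine vision. E-mail: 1308889562@qq.com
WU Duan-zheng (吴端正), male, was born in Bengbu, Anhui Province, in October 2000. He is currently a master's student at the School of Mechanical Engineering, Anhui University of Technology. His main research interests are robotics and machine vision. E-mail: 3251387273@qq.com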
Received: 2025-05-26
Accepted: 2025-12-01
Published in print: 2025-12-25
ZHANG Meng-quan, XU Si-xiang, YANG Yu, et al. Binocular Vision Localization and Measurement Based on Star-RTMPose[J]. Acta Electronica Sinica, 2025, 53(12): 4317-4329. DOI:10.12263/DZXB.20250422
Traditional feature point detection algorithms for binocular vision suffer from low efficiency, insufficient matching accuracy, sensitivity to illumination changes, and complex parameter tuning, all of which limit the accuracy of binocular localization and measurement. To address these problems, this paper proposes a binocular vision localization and measurement method based on Star-RTMPose (Star-enhanced Real-Time Multi-person Pose estimation). Taking continuous casting billets in the iron and steel metallurgy industry as the research object, the work focuses on the precise localization and dimension measurement required for burr removal after flame cutting, and gives a corresponding technical implementation path. First, images of continuous casting billets are captured with a calibrated binocular camera, target regions and keypoints are annotated with the LabelMe tool, and the annotations are converted to the MSCOCO (Microsoft Common Objects in Context) format for model training. Next, a two-stage framework of object detection followed by keypoint extraction is adopted: the RTMDet (Real-Time Models for object Detection) algorithm first quickly locates the main body of the billet, and Star-RTMPose, an improved model based on RTMPose (Real-Time Multi-person Pose estimation), then extracts the keypoint coordinates. Three improvements are made. (1) A StarTriBlock (Star Triple Block) module is introduced into the RTMPose backbone; its multi-branch dynamic fusion mechanism enhances the network's representation of high-level semantic features of the target and makes full use of the largest receptive field and the global spatial correlation information available at that stage. (2) The 7×7 large-kernel convolution in the network head is replaced with a MaxDSC2 (Maximum Depthwise Separable Convolution 2) module built on depthwise separable convolution, with the number of intermediate channels set to 0.45 times the number of input channels, which improves sensitivity to semantic information while reducing the parameter count. (3) The traditional channel attention module is replaced with the parameter-free SimAM (Simple parameter-free Attention Module), which generates joint channel-spatial three-dimensional weights from the closed-form solution of an energy function, strengthening the capture of spatial features while avoiding parameter redundancy. Finally, the binocular calibration parameters are combined with the triangulation principle to reconstruct the keypoints in three dimensions and measure the billet dimensions. Experimental results show that, in the keypoint detection task, Star-RTMPose needs only 9.86 ms of inference time per image; compared with the baseline RTMPose-T, its AP (Average Precision) increases by 1.09 percentage points, its PCK (Percentage of Correct Keypoints) increases by 0.40 percentage points, and its NME (Normalized Mean Error) decreases by 42.86%. With fewer parameters, the improved model also significantly outperforms mainstream models such as HRNet-W32 and SwinTransformer-T in overall performance. In terms of three-dimensional measurement accuracy, the relative error of the proposed method in measuring the long-side dimension of the Type 1 continuous casting billet is 1.715 and 0.365 percentage points lower than that of the traditional ORB (Oriented FAST and Rotated BRIEF) algorithm and an improved FAST (Features from Accelerated Segment Test) algorithm, respectively. The proposed method effectively overcomes the poor robustness of traditional algorithms, improves both detection accuracy and measurement accuracy, and meets the demand for high-precision detection in industrial scenarios.
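The last stage of the pipeline, turning matched keypoints from the two views into 3D points and then into a dimension, can be illustrated with a minimal sketch. It assumes an ideal rectified pinhole stereo rig, for which triangulation reduces to the disparity relation Z = f·B/d; the abstract does not state the calibration model actually used. The function names, the intrinsics, and the pixel coordinates below are hypothetical values chosen only for illustration.

```python
import math


def triangulate_rectified(pt_left, pt_right, f, cx, cy, baseline):
    """Recover (X, Y, Z) in the left-camera frame from one matched keypoint.

    Assumption: an ideal rectified stereo pair, i.e. both cameras share the
    focal length f (pixels) and principal point (cx, cy), and are separated
    by a purely horizontal baseline (same unit as the returned coordinates).
    """
    (ul, vl), (ur, _) = pt_left, pt_right
    disparity = ul - ur
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    z = f * baseline / disparity      # depth from disparity
    x = (ul - cx) * z / f             # back-project the left pixel
    y = (vl - cy) * z / f
    return (x, y, z)


def edge_length(p, q):
    """Euclidean distance between two reconstructed 3D keypoints."""
    return math.dist(p, q)


def relative_error_percent(measured, reference):
    """Relative measurement error as a percentage of the reference value."""
    return abs(measured - reference) / reference * 100.0


# Hypothetical intrinsics (pixels) and baseline (mm); not from the paper.
F, CX, CY, B = 1000.0, 640.0, 360.0, 100.0

# Hypothetical matched keypoints for two corners of one billet edge.
corner_a = triangulate_rectified((540.0, 360.0), (440.0, 360.0), F, CX, CY, B)
corner_b = triangulate_rectified((740.0, 360.0), (640.0, 360.0), F, CX, CY, B)
long_side = edge_length(corner_a, corner_b)
```

With these made-up inputs both corners lie at the same depth, so the edge length is simply the difference of their X coordinates. For a rig that is not rectified, the same idea is usually implemented by triangulating with the full projection matrices obtained from calibration.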
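The closed-form attention weights can be illustrated with the SimAM formulation as published by its authors: within each channel, every activation gets an inverse energy that grows with its squared deviation from the channel mean, and a sigmoid of that value scales the activation. The sketch below uses plain Python on nested lists for readability, whereas a real model would use a tensor library. The function name `simam` and the regularization value `lam = 1e-4` are assumptions for illustration, and nothing here reflects how the module is wired into Star-RTMPose.

```python
import math


def simam(feature_map, lam=1e-4):
    """Apply SimAM-style parameter-free attention to a C x H x W nested list.

    For each channel the weight of an activation t is
        sigmoid((t - mu)^2 / (4 * (var + lam)) + 0.5)
    where mu is the channel mean and var the channel variance with
    denominator H*W - 1. No learnable parameters are involved.
    """
    out = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        if len(values) < 2:
            raise ValueError("each channel needs at least two activations")
        n = len(values) - 1
        mu = sum(values) / len(values)
        var = sum((v - mu) ** 2 for v in values) / n
        weighted = []
        for row in channel:
            new_row = []
            for v in row:
                inv_energy = (v - mu) ** 2 / (4.0 * (var + lam)) + 0.5
                weight = 1.0 / (1.0 + math.exp(-inv_energy))
                new_row.append(v * weight)
            weighted.append(new_row)
        out.append(weighted)
    return out


# One channel with a single distinctive activation, one constant channel.
example = [
    [[0.0, 0.0], [0.0, 4.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
attended = simam(example)
```

In the first channel the activation that stands out from its neighbours receives the largest weight. In the constant channel every activation gets the same weight, sigmoid(0.5), because no position deviates from the mean.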