Self-Supervised Hand Pose Estimation with Regional Depth Correspondence

WANG Jing-yu; HUANG Wei-ting; LIU Cong; QI Qi; SUN Hai-feng; LIAO Jian-xin

doi:10.12263/DZXB.20210648

PAPERS | Updated: 2025-12-08

- ACTA ELECTRONICA SINICA, Vol. 51, Issue 6, Pages 1644-1653 (2023)
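- Affiliations:
  
  1. State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876
  2. China Mobile Research Institute, Beijing 100053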
- Funding:
  
  National Key Research and Development Program of China (2020YFB1807800); National Natural Science Foundation of China (62071067; 62001054; 61771068); Ministry of Education-China Mobile Research Foundation (MCM20200202; MCM20180101); Postdoctoral Innovative Talent Support Program (BX20200067); China Postdoctoral Science Foundation (2021M690469)
- DOI: 10.12263/DZXB.20210648
- CLC: TP391.4
- Received: 20 May 2021; Revised: 15 August 2021; Published: 25 June 2023

WANG Jing-yu,HUANG Wei-ting,LIU Cong,et al.Self-Supervised Hand Pose Estimation with Regional Depth Correspondence[J].ACTA ELECTRONICA SINICA,2023,51(06):1644-1653.

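Abstract (Chinese version)

Depth-based 3D hand pose estimation usually requires large amounts of manually annotated data to achieve high accuracy and robustness, yet annotating joints is tedious and introduces a certain amount of error. Existing work removes the dependence on annotated data with self-supervised methods: the network is pretrained on a synthetic dataset and then fitted to an unannotated real dataset through model fitting to obtain 3D pose estimates. The key to such methods is designing the energy function used for model fitting so that the accuracy drop on the real dataset stays small. To make model fitting easier, this paper proposes a regional depth correspondence loss. Based on the initial pose estimate, it extracts regional representations of the input and output depth maps, explicitly decoupling each depth map into regions centred on the joints. Optimizing each joint's region locally and in a targeted way reduces the influence of the inherent domain gap between synthetic and real depth maps on network learning and makes training more stable. On the NYU dataset, the proposed method improves the mean joint error over the baseline method by 21.9%.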
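
The idea of a joint-centred regional loss can be illustrated with a short Python sketch. It is not the authors' formulation: the square window of `half_size` pixels around each projected joint, the L1 depth difference, the rule that only pixels that are foreground in both maps are compared, the skipping of windows with no shared foreground, and the helper names `region_bounds` and `regional_depth_loss` are all hypothetical choices made for illustration. Here `depth_in` stands for the observed real depth map, `depth_out` for a depth map rendered from the predicted pose, and `joints_2d` for joint pixel positions taken from the initial pose estimate.

```python
from typing import List, Sequence, Tuple

# Row-major depth map; 0.0 marks background pixels (assumption).
DepthMap = List[List[float]]


def region_bounds(
    center: Tuple[int, int], half_size: int, height: int, width: int
) -> Tuple[int, int, int, int]:
    """Square window around a joint pixel (u, v) = (column, row), clipped to the image."""
    u, v = center
    top, bottom = max(v - half_size, 0), min(v + half_size + 1, height)
    left, right = max(u - half_size, 0), min(u + half_size + 1, width)
    return top, bottom, left, right


def regional_depth_loss(
    depth_in: DepthMap,
    depth_out: DepthMap,
    joints_2d: Sequence[Tuple[int, int]],
    half_size: int = 2,
) -> float:
    """Hypothetical regional depth loss: one L1 term per joint-centred window.

    Pixels outside every window never enter the loss, so a depth mismatch
    far from all joints does not affect it.
    """
    height, width = len(depth_in), len(depth_in[0])
    per_joint = []
    for center in joints_2d:
        top, bottom, left, right = region_bounds(center, half_size, height, width)
        diffs = [
            abs(depth_in[r][c] - depth_out[r][c])
            for r in range(top, bottom)
            for c in range(left, right)
            # Assumption: compare only pixels that are foreground in both maps.
            if depth_in[r][c] > 0 and depth_out[r][c] > 0
        ]
        if diffs:  # Assumption: windows with no shared foreground are skipped.
            per_joint.append(sum(diffs) / len(diffs))
    return sum(per_joint) / len(per_joint) if per_joint else 0.0


# Toy 5x5 example: a single 2-unit mismatch inside the first joint's window.
depth_in = [[10.0] * 5 for _ in range(5)]
depth_out = [row[:] for row in depth_in]
depth_out[0][0] = 12.0
print(regional_depth_loss(depth_in, depth_out, [(0, 0), (4, 4)], half_size=1))  # prints 0.25
```

Because each joint contributes its own averaged term, a mismatch placed at pixel (2, 2), which lies outside both 3x3 windows in the toy example, leaves the loss at 0.0; this is the locality that the abstract contrasts with a global depth comparison.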

Abstract

Depth-based 3D hand pose estimation requires manually labelled data to achieve high accuracy and robustness. However, the labelling process is laborious and inevitably introduces biases. Researchers address this problem with self-supervised methods: they pretrain a model on a synthetic dataset and then fine-tune it on an unlabelled real dataset through model fitting. The main challenge is designing the model-fitting term of the fine-tuning stage so that accuracy does not drop severely. We propose a regional depth correspondence loss, which uses the initial pose estimates to extract regional representations of the input and output depth maps and explicitly divides them into regions around the joints. This allows the network to fine-tune the region around each joint without being affected by the overall domain gap between synthetic and real depth images. On the NYU hand pose dataset, the proposed method outperforms the baseline method by 21.9%.
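
To make the reported comparison concrete, the sketch below computes a mean joint error as the average Euclidean distance between predicted and ground-truth 3D joints, and a relative improvement as the fractional error reduction against a baseline. Reading the 21.9% figure as such a relative reduction is an assumption; the function names `mean_joint_error` and `relative_improvement` and the toy coordinates are illustrative only and are not data from the paper.

```python
import math
from typing import Sequence, Tuple

Joint3D = Tuple[float, float, float]  # (x, y, z), e.g. in millimetres


def mean_joint_error(
    predictions: Sequence[Sequence[Joint3D]],
    ground_truth: Sequence[Sequence[Joint3D]],
) -> float:
    """Average Euclidean distance over every joint of every frame."""
    if len(predictions) != len(ground_truth):
        raise ValueError("prediction and ground-truth frame counts differ")
    total, count = 0.0, 0
    for pred_frame, gt_frame in zip(predictions, ground_truth):
        if len(pred_frame) != len(gt_frame):
            raise ValueError("joint counts differ within a frame")
        for pred, gt in zip(pred_frame, gt_frame):
            total += math.dist(pred, gt)  # Euclidean distance between two 3D points
            count += 1
    if count == 0:
        raise ValueError("no joints to evaluate")
    return total / count


def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Fractional error reduction with respect to the baseline (0.25 means 25%)."""
    return (baseline_error - new_error) / baseline_error


# Toy example: one frame, two joints, per-joint errors of 0 and 5 units.
pred = [[(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]]
gt = [[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]
print(mean_joint_error(pred, gt))        # prints 2.5
print(relative_improvement(20.0, 15.0))  # prints 0.25
```

Averaging over all joints of all frames weights every joint equally; other conventions (for example, averaging per frame first) give the same value only when every frame has the same number of joints.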
