Pedestrian Trajectory Prediction Method Using Dynamic Scene Information Based Transformer Generative Adversarial Network

PEI Zhao; QIU Wen-tao; WANG Miao; MA Miao; ZHANG Yan-ning

doi:10.12263/DZXB.20210762


Research article | Updated: 2025-12-08

- Pedestrian Trajectory Prediction Method Using Dynamic Scene Information Based Transformer Generative Adversarial Network
- Acta Electronica Sinica, 2022, Vol. 50, No. 7, pp. 1537-1547
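- Author affiliations:
  
  1. Key Laboratory of Modern Teaching Technology, Ministry of Education, Shaanxi Normal University, Xi'an, Shaanxi 710119
  2. School of Computer Science, Shaanxi Normal University, Xi'an, Shaanxi 710119
  3. School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai 200240
  4. National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, Xi'an, Shaanxi 710129
  5. School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi 710129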
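- About the authors:
  
  PEI Zhao, male, born in February 1983, a native of Xi'an, Shaanxi. Ph.D., professor, and doctoral supervisor. His research covers computer vision and artificial intelligence, image processing and pattern recognition, and machine learning. E-mail: zpei@snnu.edu.cn
  QIU Wen-tao, male, born in August 1996, a native of Zaozhuang, Shandong. Graduate student at the School of Computer Science, Shaanxi Normal University. His main research interests are computer vision and pedestrian trajectory prediction. E-mail: qiuwentao@snnu.edu.cn
  WANG Miao, male, born in July 1981, a native of Yima, Henan. Ph.D., assistant researcher at the School of Aeronautics and Astronautics, Shanghai Jiao Tong University. His main research interests are intelligent information processing, data mining, and computer vision. E-mail: miaowang@sjtu.edu.cn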
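- Funding:
  
  National Natural Science Foundation of China (61971273; 61877038); Key Research and Development Program of Shaanxi Province (2021GY-032); Fundamental Research Funds for the Central Universities (GK202003077); Natural Science Foundation of Shanghai (20ZR1427800)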
- DOI: 10.12263/DZXB.20210762
  CLC number: TP391
- Received: 2021-06-16; revised: 2021-11-07; published in print: 2022-07-25

PEI Zhao, QIU Wen-tao, WANG Miao, et al. Pedestrian Trajectory Prediction Method Using Dynamic Scene Information Based Transformer Generative Adversarial Network [J]. Acta Electronica Sinica, 2022, 50(07): 1537-1547. DOI: 10.12263/DZXB.20210762.

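Abstract (translated from the Chinese)

Pedestrian trajectory prediction is an important component of video surveillance. Because existing methods do not make full use of scene feature information, their predicted trajectories can violate common sense, which lowers prediction accuracy and produces trajectories that deviate markedly from the true ones. To address these shortcomings, this paper proposes a pedestrian trajectory prediction method based on a Transformer generative adversarial network (GAN) with dynamic scene information. The method uses the convolutional neural network (CNN) model of a dynamic scene feature extraction module to extract features from the dynamic scene information of the target pedestrian, while the encoder in the generator network uses a Transformer to model the pedestrians' social interaction features and trajectory features. Experimental results on the ETH and UCY datasets show that, compared with the Social GAN model, the proposed method improves the average displacement error (ADE) by 25.61% and the final displacement error (FDE) by 38.44% across multiple scenes.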

Abstract

Pedestrian trajectory prediction is an important part of video surveillance. Current methods are not accurate enough and sometimes violate common sense because scene information is not fully used. To address these shortcomings, this paper proposes a Transformer generative adversarial network (GAN) algorithm that combines dynamic scene information with pedestrian social interaction information. The convolutional neural network model of the dynamic scene extraction module extracts features from the dynamic scene information of the target pedestrian, and the encoder in the generator network uses a Transformer to model the social interaction and trajectory features of pedestrians. Experimental results on the ETH and UCY datasets show that, compared with the Social GAN model, the method improves the average displacement error by 25.61% and the average final displacement error by 38.44% in multiple scenarios.
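
The two metrics named in the abstract can be illustrated with a minimal sketch. It uses the commonly used definitions (average displacement error as the mean Euclidean distance over all predicted time steps, final displacement error as the distance at the last step). The function names, the toy coordinates, and the reading of the reported percentages as a relative error reduction are assumptions for illustration; the paper's exact evaluation protocol is not given in this abstract.

```python
import math


def displacement(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def _check(pred, truth):
    # Both trajectories must cover the same, non-empty set of time steps.
    if not pred or len(pred) != len(truth):
        raise ValueError("trajectories must be non-empty and of equal length")


def ade(pred, truth):
    """Average displacement error: mean distance over all predicted steps."""
    _check(pred, truth)
    return sum(displacement(p, q) for p, q in zip(pred, truth)) / len(pred)


def fde(pred, truth):
    """Final displacement error: distance at the last predicted step."""
    _check(pred, truth)
    return displacement(pred[-1], truth[-1])


def relative_improvement(baseline_error, new_error):
    """Percentage reduction of an error relative to a baseline (assumed reading)."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Toy trajectories, made up for illustration (not data from the paper).
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]

print(ade(pred, truth))                 # per-step distances are 0, 1, 2
print(fde(pred, truth))                 # distance at the final step
print(relative_improvement(2.0, 1.5))   # hypothetical baseline vs. new error
```

Under this reading, lower ADE and FDE are better, so an "improvement" of 25.61% would correspond to an error that is 25.61% smaller than the baseline's.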
