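College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao, Shandong 266061
WANG Fang-fang, female, was born in Weifang, Shandong Province, in November 1997. She is currently a master's student at the College of Information Science and Technology, Qingdao University of Science and Technology. Her main research interests include computer vision and trajectory prediction. E-mail: 2024111009@mails.qust.edu.cn
LIU Ming-hua, male, was born in Liaocheng, Shandong Province, in February 1980. He is currently a professor at the College of Information Science and Technology, Qingdao University of Science and Technology. His main research interests include computer vision, object recognition and tracking, and video and image understanding and segmentation. E-mail: qustlmh@qust.edu.cn
QU Lian-en, male, was born in Liaocheng, Shandong Province, in December 1980. He is currently a professor at the College of Information Science and Technology, Qingdao University of Science and Technology. His main research interests include computer vision, intelligent transportation, and weather forecasting. E-mail: lianen.qu@qust.edu.cn
WANG He, male, was born in Heze, Shandong Province, in October 1999. He is currently a master's student at the College of Information Science and Technology, Qingdao University of Science and Technology. His main research interests include computer vision and speech recognition. E-mail: 19854299311@163.com
LI Dan-ning, female, was born in Tai'an, Shandong Province, in September 2000. She is currently a master's student at the College of Information Science and Technology, Qingdao University of Science and Technology. Her main research interests include computer vision and time series. E-mail: 15666254825@163.com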
Received: 2025-09-27;
Accepted: 2025-12-15;
Published in print: 2025-12-25
WANG Fang-fang, LIU Ming-hua, QU Lian-en, et al. Pedestrian Trajectory Prediction Based on Graph Convolution and Adaptive Transformer[J]. Acta Electronica Sinica, 2025, 53(12): 4507-4517. DOI:10.12263/DZXB.20250855
Pedestrian trajectory prediction is one of the core challenges in fields such as autonomous driving and robotic navigation. Its key difficulty lies in effectively modeling the complex interactions among pedestrians and extracting multi-scale spatiotemporal features. This paper proposes a pedestrian trajectory prediction method based on Graph Convolution and Adaptive Transformer (GCAT), which achieves high-precision trajectory prediction through hierarchical feature extraction and adaptive interaction modeling. The model takes the positions and velocities of all pedestrians within a historical observation window as input. First, linear projection and sinusoidal positional encoding map the raw observations into a high-dimensional feature space while explicitly preserving temporal order. Next, a relational graph convolutional network captures the local topological structure and spatial interaction strength among pedestrians; an adaptive adjacency matrix, built in real time from the cosine similarity of pedestrian features, lets the interaction graph adjust to the characteristics of each scene. In addition, an enhanced multi-layer convolutional structure uses learnable residual weights to adaptively balance the contributions of features from different layers, which alleviates the vanishing-gradient problem in deep networks and strengthens the representation of local interaction features. Furthermore, a spatially adaptive Transformer models global spatiotemporal dependencies by sampling the feature maps continuously at learnable spatial offsets. Specifically, linear layers generate spatial offsets and attention weights from the input features; the offsets are added to the reference point coordinates and normalized to obtain the actual sampling locations; bilinear interpolation extracts the feature values at these locations; and the attention weights aggregate the sampled values. The resulting representation captures both local geometric variations and global temporal dependencies, and the continuous sampling strategy lets the model focus on the spatial regions most relevant to trajectory prediction and adapt to the geometric layouts of different scenes. Meanwhile, the model integrates multi-granularity temporal features, progressively extracting multi-level spatiotemporal representations that range from local interactions to global dependencies. This design addresses key limitations of existing methods in long-range dependency modeling, environmental adaptability, and multi-scale feature fusion. The proposed method is systematically evaluated on two widely used public pedestrian trajectory prediction datasets, ETH and UCY. Compared with existing baseline models, it improves the average displacement error (ADE) and the final displacement error (FDE) by 5.1% and 13.2%, respectively, demonstrating its effectiveness and superiority in complex interaction modeling and multi-scale spatiotemporal feature extraction.
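The input embedding step (a linear projection of each pedestrian's position and velocity plus a sine-cosine encoding of the time step) can be sketched in plain Python as follows. This is a minimal illustration, assuming the standard Transformer formulation PE(t, 2i) = sin(t / 10000^(2i/d)) and PE(t, 2i+1) = cos(t / 10000^(2i/d)), an even model width `d_model`, and additive combination with the projected features; the function names and the toy identity weights are hypothetical and are not taken from the paper.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sinusoidal_encoding(t: int, d_model: int) -> Vector:
    """Sine-cosine encoding of time index t; d_model is assumed to be even."""
    pe: Vector = []
    for i in range(0, d_model, 2):
        # Frequency for the dimension pair (i, i + 1): 1 / 10000^(i / d_model)
        freq = 1.0 / (10000 ** (i / d_model))
        pe.append(math.sin(t * freq))
        pe.append(math.cos(t * freq))
    return pe


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Affine map y = W x + b, with W stored as d_out rows of length d_in."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def embed_observation(obs: Matrix, weight: Matrix, bias: Vector) -> Matrix:
    """Project each observed step (x, y, vx, vy) and add the encoding of its time index."""
    d_model = len(bias)
    embedded: Matrix = []
    for t, step in enumerate(obs):
        h = linear(step, weight, bias)
        pe = sinusoidal_encoding(t, d_model)
        embedded.append([a + b for a, b in zip(h, pe)])
    return embedded


# Toy example: two observed steps of one pedestrian, identity projection to d_model = 4
identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
obs = [[0.0, 0.0, 1.0, 0.0], [1.0, 0.0, 1.0, 0.0]]
print(embed_observation(obs, identity, [0.0] * 4)[0])  # prints [0.0, 1.0, 1.0, 1.0]
```

Because each time index receives a distinct encoding, two observations with identical positions and velocities at different steps still map to different embeddings, which is how the temporal order survives the projection.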
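The adaptive adjacency matrix can be illustrated with a short sketch. The abstract states only that edge weights are derived from the cosine similarity of pedestrian features; clamping negative similarities to zero, normalizing each row to sum to 1, and the single aggregation step A·H below are assumptions made for illustration. This is a generic similarity-weighted graph aggregation, not a full relational GCN, and all names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def cosine_similarity(a: List[float], b: List[float], eps: float = 1e-8) -> float:
    """Cosine of the angle between two feature vectors (eps avoids division by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def adaptive_adjacency(features: Matrix) -> Matrix:
    """Build a row-normalized adjacency matrix from pairwise feature similarity.

    Assumption: negative similarities are clamped to 0, and each row is divided
    by its sum so that every pedestrian's incoming weights add up to 1.
    """
    n = len(features)
    adj: Matrix = []
    for i in range(n):
        row = [max(0.0, cosine_similarity(features[i], features[j])) for j in range(n)]
        total = sum(row)
        adj.append([v / total if total > 0 else 0.0 for v in row])
    return adj


def aggregate(adj: Matrix, features: Matrix) -> Matrix:
    """One message-passing step: each pedestrian takes the weighted mean A @ H of its neighbors."""
    dim = len(features[0])
    return [
        [sum(adj[i][j] * features[j][k] for j in range(len(features))) for k in range(dim)]
        for i in range(len(adj))
    ]


# Pedestrians 0 and 1 share a feature direction; pedestrian 2 is orthogonal to both.
feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
A = adaptive_adjacency(feats)
print(A[0], A[2])  # prints [0.5, 0.5, 0.0] [0.0, 0.0, 1.0]
```

Since the matrix is recomputed from the current features, the graph changes whenever the features change, which is the sense in which the interaction structure adapts to the scene rather than being fixed by, for example, a distance threshold.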
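The role of the learnable residual weights can be sketched as a stack of graph convolution layers in which each layer adds a scaled copy of its input to its output. The update rule H(l+1) = ReLU(A H(l) W(l)) + alpha(l) H(l), the use of one scalar alpha(l) per layer, and square weight matrices are assumptions for illustration; in a trained model alpha(l) would be learned by gradient descent, whereas here it is simply passed in.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense product of an (n x k) matrix and a (k x m) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def relu(m: Matrix) -> Matrix:
    """Element-wise max(0, x)."""
    return [[max(0.0, v) for v in row] for row in m]


def residual_gcn(adj: Matrix, h: Matrix, weights: List[Matrix], alphas: List[float]) -> Matrix:
    """Apply H <- ReLU(A H W_l) + alpha_l * H for each layer l.

    Each W_l is assumed square (d x d) so the residual term has a matching shape;
    alpha_l plays the role of the (hypothetical) learnable residual weight of layer l.
    """
    for w, alpha in zip(weights, alphas):
        conv = relu(matmul(matmul(adj, h), w))
        h = [[c + alpha * x for c, x in zip(c_row, h_row)] for c_row, h_row in zip(conv, h)]
    return h


eye = [[1.0, 0.0], [0.0, 1.0]]
h0 = [[1.0, -1.0], [2.0, 0.0]]
print(residual_gcn(eye, h0, [eye], [0.5]))  # prints [[1.5, -0.5], [3.0, 0.0]]
```

With alpha(l) = 0 each layer reduces to a plain graph convolution; a nonzero alpha(l) keeps an identity path through the stack, which is the usual explanation for why residual connections ease gradient flow in deep networks. Note that the negative input value survives in the output only through the residual path, since ReLU removes it from the convolution branch.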
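The continuous sampling inside the spatially adaptive Transformer follows the same pattern as deformable attention (as in Deformable DETR): offsets and attention logits are produced by linear layers from a query feature, added to a normalized reference point, read out by bilinear interpolation, and combined with softmax weights. The sketch below is a single-head, single-scale illustration under stated assumptions: offsets are expressed in pixels and divided by the map size, normalized coordinates are converted to pixels with the pixel-center convention of `align_corners=False`, locations outside the map read as zero, and the names `deformable_sample`, `w_offset`, and `w_attn` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

FeatureMap = List[List[List[float]]]  # indexed [row y][column x][channel]


def bilinear(fmap: FeatureMap, x: float, y: float) -> List[float]:
    """Bilinearly interpolate fmap at continuous pixel coordinates (x, y); zero outside the map."""
    height, width, channels = len(fmap), len(fmap[0]), len(fmap[0][0])
    x0, y0 = math.floor(x), math.floor(y)
    out = [0.0] * channels
    for yy, wy in ((y0, 1.0 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1.0 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < height and 0 <= xx < width and wx * wy > 0.0:
                for k in range(channels):
                    out[k] += wx * wy * fmap[yy][xx][k]
    return out


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x: Sequence[float], weight: List[List[float]]) -> List[float]:
    """Bias-free linear layer y = W x (stands in for a learned projection)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def deformable_sample(query: Sequence[float], fmap: FeatureMap, ref: Tuple[float, float],
                      w_offset: List[List[float]], w_attn: List[List[float]]) -> List[float]:
    """Aggregate K bilinear samples taken at learned offsets around a reference point.

    ref:      reference point (x, y), normalized to [0, 1]
    w_offset: 2K x d matrix producing K (dx, dy) offsets, in pixels
    w_attn:   K x d matrix producing K attention logits
    """
    height, width = len(fmap), len(fmap[0])
    raw = linear(query, w_offset)
    offsets = [(raw[2 * k], raw[2 * k + 1]) for k in range(len(raw) // 2)]
    weights = softmax(linear(query, w_attn))
    out = [0.0] * len(fmap[0][0])
    for (dx, dy), a in zip(offsets, weights):
        # Offset normalized by the map size and added to the reference point.
        u = ref[0] + dx / width
        v = ref[1] + dy / height
        # Normalized -> pixel coordinates (pixel centers, as with align_corners=False).
        sample = bilinear(fmap, u * width - 0.5, v * height - 0.5)
        out = [o + a * s for o, s in zip(out, sample)]
    return out


# 2 x 2 single-channel map whose value at pixel (x, y) is x + 2y
fmap = [[[0.0], [1.0]], [[2.0], [3.0]]]
# One sampling point; its offset (0.5, 0.5) moves it from the map centre to pixel (1, 1).
print(deformable_sample([1.0], fmap, (0.5, 0.5), [[0.5], [0.5]], [[0.0]]))  # prints [3.0]
```

Because the offsets are real-valued and the interpolation is differentiable in the sampling location, gradients can flow back into the offset projection, which is what allows the sampling positions themselves to be learned.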
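ADE averages the Euclidean distance between predicted and ground-truth positions over every pedestrian and every predicted time step, while FDE averages it only over the final step. The sketch below computes both for a single deterministic prediction per pedestrian; many ETH/UCY studies instead report the best of 20 sampled trajectories, and the abstract does not state which protocol is used. The helper `relative_improvement` assumes that the reported 5.1% and 13.2% are relative error reductions with respect to the baseline; the inputs in the example are toy values, not results from the paper.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def ade_fde(pred: List[List[Point]], gt: List[List[Point]]) -> Tuple[float, float]:
    """Average and final displacement error over all pedestrians.

    pred, gt: one predicted / ground-truth trajectory per pedestrian,
    each a list of (x, y) positions over the prediction horizon.
    """
    total, final, steps = 0.0, 0.0, 0
    for p_traj, g_traj in zip(pred, gt):
        dists = [math.dist(p, g) for p, g in zip(p_traj, g_traj)]
        total += sum(dists)   # every predicted step contributes to ADE
        steps += len(dists)
        final += dists[-1]    # only the last step contributes to FDE
    return total / steps, final / len(pred)


def relative_improvement(baseline: float, ours: float) -> float:
    """Percentage reduction of an error metric relative to a baseline."""
    return (baseline - ours) / baseline * 100.0


pred = [[(0.0, 0.0), (3.0, 4.0)]]
gt = [[(0.0, 0.0), (0.0, 0.0)]]
print(ade_fde(pred, gt))  # prints (2.5, 5.0)
```

The toy case shows why FDE can move more than ADE: an error concentrated at the end of the horizon is counted once in FDE but diluted across all steps in ADE.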