A Reliability-Aware Mechanism for Hierarchical Federated Learning

LIU Xiaoyan; YU Zhen; LIANG Jingyu; LIN Botao; HUANG Jiwei

doi:10.12263/DZXB.20250852

PAPERS | Updated: 2026-06-04

- ACTA ELECTRONICA SINICA Vol. 54, Issue 1, Pages: 262-275(2026)
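- Author affiliations:
  
  1. Beijing Key Laboratory of Petroleum Data Mining, China University of Petroleum (Beijing), Beijing 102249, China
  2. PetroChina (Beijing) Digital Intelligence Research Institute Co., Ltd., Beijing 102206, China
- Funding:
  
  National Natural Science Foundation of China (62572484); Frontier Interdisciplinary Exploration Research Program of China University of Petroleum, Beijing (2462024XKQY003)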
- DOI: 10.12263/DZXB.20250852
  CLC: TP301; TN92
- Received: 27 September 2025; Accepted: 21 January 2026; Published: 25 January 2026

LIU Xiaoyan, YU Zhen, LIANG Jingyu, et al. A Reliability-Aware Mechanism for Hierarchical Federated Learning[J]. Acta Electronica Sinica, 2026, 54(01): 262-275. DOI: 10.12263/DZXB.20250852

Abstract

Hierarchical federated learning (HFL) operates in a client-edge-cloud architecture, where intra-group aggregation is carried out at the edge and global aggregation is performed in the cloud, enabling efficient distributed collaborative training. However, client data is typically non-independent and identically distributed (Non-IID), which may yield inconsistent local updates, leading to gradient drift and convergence instability and degrading global model performance. Meanwhile, edge servers are subject to resource limitations, workload fluctuations, and unstable links, which can cause performance degradation or even failures. Such events may interrupt intra-group aggregation, undermining system stability and task completion efficiency. To address these challenges, this paper proposes a reliability-aware hierarchical federated learning framework (R-HFL) that decomposes the training procedure into a reliability-aware grouping stage and a global aggregation stage. In the grouping stage, we jointly cluster clients by integrating model semantic similarity and geographic proximity, improving intra-group statistical consistency and mitigating gradient drift induced by Non-IID data. In addition, an edge reliability metric is incorporated as a selection criterion, prioritizing highly reliable edge servers as group-level aggregators to reduce the risk of aggregation interruption. Furthermore, to account for the time-varying reliability of edge servers and the long-term horizon of federated training, we design a failure-triggered task migration mechanism: when a group-level aggregator fails, the aggregation task is dynamically migrated to an available edge server to maintain training continuity. To enable adaptive migration decisions, we formulate the multi-client migration process as a Markov decision process (MDP) and adopt multi-agent proximal policy optimization (MAPPO) under centralized training with decentralized execution (CTDE) to learn migration policies. A unified reward function with constraints is further designed to dynamically balance migration cost, post-migration communication overhead, and semantic distribution similarity, facilitating adaptive selection of migration targets, fast post-migration adaptation, and sustained convergence stability. Finally, extensive experiments are conducted on two real-world datasets under different Non-IID scenarios. The results demonstrate that R-HFL consistently outperforms baseline methods in terms of global accuracy and convergence rate, while substantially reducing the risk of training disruption and migration overhead under edge server failures, thereby improving overall system robustness and fault tolerance.
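
The abstract does not give the formulas behind the grouping stage or the migration reward, so the following Python sketch is only an illustration under stated assumptions, not the authors' method. It assumes a joint client distance that linearly blends cosine dissimilarity of model features with normalized Euclidean distance, an aggregator choice that filters edge servers by a reliability threshold, and a linear reward over the three quantities the abstract names. All function names, weights (`alpha`, `w_cost`, `w_comm`, `w_sim`), and thresholds are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def joint_distance(feat_a, feat_b, pos_a, pos_b, alpha=0.5, max_range=1.0):
    """Blend semantic dissimilarity with geographic distance.

    alpha and max_range are assumed knobs: alpha weights the semantic term,
    max_range normalizes the Euclidean distance into [0, 1].
    """
    semantic = 1.0 - cosine_similarity(feat_a, feat_b)
    geographic = min(math.dist(pos_a, pos_b) / max_range, 1.0)
    return alpha * semantic + (1.0 - alpha) * geographic


def select_aggregator(servers, group_center, min_reliability=0.9):
    """Pick the closest edge server that meets the reliability threshold.

    Falls back to the most reliable server if none meets the threshold
    (an assumed policy; the abstract only says reliable servers are preferred).
    """
    eligible = [s for s in servers if s["reliability"] >= min_reliability]
    if not eligible:
        return max(servers, key=lambda s: s["reliability"])
    return min(eligible, key=lambda s: math.dist(s["pos"], group_center))


def migration_reward(cost, comm_overhead, similarity,
                     w_cost=1.0, w_comm=1.0, w_sim=1.0):
    """Assumed linear reward: favor similarity, penalize cost and overhead."""
    return w_sim * similarity - w_cost * cost - w_comm * comm_overhead


servers = [
    {"name": "edge-1", "pos": (0.0, 0.0), "reliability": 0.50},
    {"name": "edge-2", "pos": (5.0, 5.0), "reliability": 0.95},
]
chosen = select_aggregator(servers, group_center=(0.0, 0.0))
print(chosen["name"])
```

In this toy setup the nearer server is skipped because its reliability falls below the threshold. In a MAPPO setting, a reward like `migration_reward` would be evaluated per agent after each migration decision; the actual state, action, and constraint definitions are in the full paper, not in this abstract.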
