A Foundation Model for Traffic Classification Based on Mixture-of-Experts

ZHOU Jiajun; SUN Changhui; HE Meijing; YU Shanqing

doi:10.12263/DZXB.20251026

Frontier Technologies for Encrypted-State Network Security | Updated: 2026-06-17

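- A Foundation Model for Traffic Classification Based on Mixture-of-Experts
- Acta Electronica Sinica, 2026, Vol. 54, No. 4, pp. 1460-1480
- Author affiliations:
  1. Institute of Cyberspace Security, Zhejiang University of Technology, Hangzhou, Zhejiang 310023
  2. Binjiang Institute of Artificial Intelligence, Zhejiang University of Technology, Hangzhou, Zhejiang 310056
  3. Hangzhou Shuyu Zhihui Technology Development Co., Ltd. (杭州数宇智汇科技发展有限责任公司), Hangzhou, Zhejiang 310056
- Author biographies:
  ZHOU Jiajun, male, born in December 1995 in Hangzhou, Zhejiang Province. He is currently a specially appointed associate researcher at the Institute of Cyberspace Security, Zhejiang University of Technology. His main research interests are Internet security, large-model security, and graph machine learning. E-mail: jjzhou@zjut.edu.cn
  SUN Changhui, male, born in April 2002 in Ganzhou, Jiangxi Province. He is currently a master's student at the Institute of Cyberspace Security, Zhejiang University of Technology. His main research interests are network traffic analysis and malicious traffic detection. E-mail: sunchanghui@zjut.edu.cn
  HE Meijing, female, born in November 2004 in Xi'an, Shaanxi Province. She is currently a master's student at the Institute of Cyberspace Security, Zhejiang University of Technology. Her main research interests are network traffic analysis and malicious traffic detection. E-mail: mjhe@zjut.edu.cn
  YU Shanqing, female, born in February 1984 in Hangzhou, Zhejiang Province. She is currently an associate professor at the Institute of Cyberspace Security, Zhejiang University of Technology. Her main research interests are Internet security and recommender system security. E-mail: yushanqing@zjut.edu.cn
- Funding:
  National Natural Science Foundation of China (62503423); National Key Research and Development Program of China (2025YFA1510900); Joint Key Program of the Zhejiang Provincial Natural Science Foundation (LBMHZ25F020002)
- DOI: 10.12263/DZXB.20251026
  CLC number: TP393.08
- Received: 2026-01-13; Accepted: 2026-02-05; Published in print: 2026-04-25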
ZHOU Jiajun, SUN Changhui, HE Meijing, et al. A Foundation Model for Traffic Classification Based on Mixture-of-Experts[J]. Acta Electronica Sinica, 2026, 54(04): 1460-1480. DOI: 10.12263/DZXB.20251026

Abstract

With the rapid evolution of network communication technologies, adversaries extensively leverage traffic encryption to conceal malicious behavior. Consequently, the performance of traditional traffic analysis methods based on port matching and deep packet inspection (DPI) has declined significantly, and the boundaries of network security defense have become increasingly blurred. Although traffic classification methods based on deep learning and pre-training have made significant progress in extracting deep features of network traffic, existing studies predominantly adopt dense Transformer architectures. These models activate all parameters during inference, which tightly couples computational cost to model scale and incurs high inference latency and GPU memory overhead. Under the high-throughput, real-time detection demands of modern network environments, this computational efficiency bottleneck can easily open defense gaps, and it severely constrains the deployment and application of large-scale deep learning models in practical network defense scenarios. To resolve the tension between model capacity expansion and inference efficiency, this paper proposes Traffic-MoE, a sparse foundation model designed specifically for heterogeneous traffic classification. The model follows the "pre-training and fine-tuning" paradigm to address the scarcity of labeled data in the security domain, and it introduces a sparse mixture-of-experts (MoE) architecture to decouple the modeling of general protocol features from that of domain-specific behaviors. Specifically, we first design Traffic2Token, a heterogeneous traffic representation method. To handle the cross-protocol and cross-device complexity of raw traffic, it fuses key packet features with payloads and applies bigram tokenization to construct fine-grained token sequences, suppressing low-level noise while preserving byte-level temporal dependencies. On this basis, we embed sparse MoE modules into the Transformer architecture in place of the traditional dense feed-forward network (FFN). A learnable gating network implements a dynamic routing strategy: for each input traffic token, the model activates only the top-k most relevant specialized experts, while a shared expert is retained to capture general protocol syntax. This substantially expands total model capacity while significantly reducing inference overhead. Using a self-constructed unlabeled pre-training corpus of 2 million session flows, the model learns the state transition logic of network protocols through an autoregressive "next-token prediction" task, after which only lightweight supervised fine-tuning is needed to adapt it to downstream tasks. To evaluate the model comprehensively, we construct six typical downstream classification tasks on four authoritative public datasets, covering scenarios such as Internet of Things (IoT) attack detection, encrypted service identification, and anonymous traffic analysis. Experimental results show that, compared with existing state-of-the-art baselines such as ET-BERT and NetGPT, Traffic-MoE exhibits superior generalization ability and robustness, with an average improvement of 8.44% in overall detection performance. More importantly, owing to the computational advantages of sparse activation, Traffic-MoE achieves a 37.45% increase in throughput, a 27.25% reduction in inference latency, and a 27.04% decrease in peak GPU memory consumption compared with traditional dense architectures of equivalent parameter scale, establishing a new paradigm for efficient network traffic analysis.
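The routing idea summarized in the abstract (a gating network scores the experts, only the top-k are run for each token, and a shared expert is always applied) can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the layer sizes, the ReLU experts, the softmax renormalization over the selected experts, and all names (`moe_layer`, `make_expert`, `gate_W`, `TOP_K`) are assumptions chosen for illustration only.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def make_expert(dim, hidden, rng):
    """Build a tiny two-layer feed-forward expert with ReLU (hypothetical sizes)."""
    W1 = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(hidden)]
    W2 = [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(dim)]

    def expert(x):
        h = [max(0.0, v) for v in matvec(W1, x)]
        return matvec(W2, h)

    return expert


def moe_layer(x, gate_W, experts, shared_expert, k):
    """Route one token through its top-k experts plus an always-on shared expert."""
    logits = matvec(gate_W, x)  # one gating score per routed expert
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Assumption: weights are renormalized over the selected experts only.
    weights = softmax([logits[i] for i in top])
    out = shared_expert(x)  # the shared expert processes every token
    for w, i in zip(weights, top):
        y = experts[i](x)  # only the k selected experts are evaluated
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, top


rng = random.Random(0)
DIM, HIDDEN, NUM_EXPERTS, TOP_K = 8, 16, 4, 2  # illustrative sizes only
experts = [make_expert(DIM, HIDDEN, rng) for _ in range(NUM_EXPERTS)]
shared = make_expert(DIM, HIDDEN, rng)
gate_W = [[rng.gauss(0, 0.1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
token = [rng.gauss(0, 1) for _ in range(DIM)]

output, chosen = moe_layer(token, gate_W, experts, shared, TOP_K)
print(len(output), chosen)
```

In this sketch the per-token cost depends on `TOP_K` and the shared expert rather than on `NUM_EXPERTS`, which is the general reason sparse activation lets total capacity grow without a matching growth in inference cost.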
