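1. School of Cyberspace Science and Technology, Beijing Institute of Technology, Beijing 100081, China
2. Basic Capabilities Department, Tianyi Security Technology Co., Ltd., Beijing 100020, China
3. National Supercomputer Center in Jinan, Jinan, Shandong 250014, China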
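SHEN Meng, male, was born in Dezhou, Shandong Province, in January 1988. He is a professor and doctoral supervisor at the School of Cyberspace Science and Technology, Beijing Institute of Technology. His main research interests include encrypted network traffic analysis, data privacy and security, and blockchain applications. Chinese Institute of Electronics member No.: E190019899M. E-mail: shenmeng@bit.edu.cn
JIA Ji-zhe, male, was born in Shenyang, Liaoning Province, in February 1997. He is a Ph.D. candidate at the School of Cyberspace Science and Technology, Beijing Institute of Technology. His main research interests include encrypted network traffic analysis and malicious traffic detection. E-mail: jiajizhe@bit.edu.cn
ZHAO Bu-fan, male, was born in Dezhou, Shandong Province, in November 2002. He is a Ph.D. candidate at the School of Cyberspace Science and Technology, Beijing Institute of Technology. His main research interests include encrypted network traffic analysis and malicious traffic detection. E-mail: zhaobufan@bit.edu.cn
CHANG Li-yuan, male, was born in Jilin City, Jilin Province, in February 1984. He is a chief expert of China Telecom Group and deputy chief engineer of Tianyi Security Technology Co., Ltd. His main research interest is network security technology. Chinese Institute of Electronics member No.: E190087084M. E-mail: changly@chinatelecom.cn
YANG Ming, male, was born in Dongying, Shandong Province, in March 1981. He is a researcher and doctoral supervisor at Qilu University of Technology (Shandong Academy of Sciences) and the Shandong Computer Science Center (National Supercomputer Center in Jinan). His main research interests include data security and artificial intelligence security. E-mail: yangm@sdas.org
REN Chen-chen, female, was born in Baoding, Hebei Province, in April 2003. She is a master's student at the School of Cyberspace Science and Technology, Beijing Institute of Technology. Her main research interests include encrypted network traffic analysis, malicious traffic detection, and website fingerprinting attacks. E-mail: chenchenren@bit.edu.cn
SONG Yue, male, was born in Beijing in July 1987. He is the leader of the Basic Research Group in the Basic Capabilities Department, Tianyi Security Technology Co., Ltd. His main research interest is network security technology. E-mail: songy7@chinatelecom.cn
ZHU Lie-huang, male, was born in Quzhou, Zhejiang Province, in September 1976. He is a professor and doctoral supervisor at the School of Cyberspace Science and Technology, Beijing Institute of Technology. His main research interests include cryptography and network and information security. Chinese Institute of Electronics member No.: E190010255M. E-mail: liehuangz@bit.edu.cn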
Received: 2025-08-21,
Accepted: 2025-12-17,
Published in print: 2025-12-25
SHEN Meng, JIA Ji-zhe, ZHAO Bu-fan, et al. Unknown Encrypted Malicious Traffic Detection via Burst Feature Token Self-Learning[J]. Acta Electronica Sinica, 2025, 53(12): 4231-4249. DOI:10.12263/DZXB.20250731
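Internet traffic today is widely encrypted to protect confidentiality and privacy. However, attackers often abuse traffic encryption to hide malicious network behavior. Because encrypted malicious traffic shares similar characteristics with encrypted benign traffic, it can easily evade traditional detection methods based on feature signatures and deep packet inspection (DPI). Existing research on encrypted malicious traffic detection concentrates on the supervised learning paradigm; although it performs well on known attack types, its effectiveness depends heavily on large, continuously updated sets of labeled malicious traffic samples. Faced with rapid malware iteration, frequent variants, and the wide use of encrypted tunneling, supervised models struggle with unknown attack types that never appeared in the training data and therefore generalize poorly. In addition, feature representations in existing methods mostly rely on hand-crafted statistical features, which struggle to capture the deep semantic information and complex temporal dynamics of malicious behavior in the underlying packets of encrypted traffic; feature discriminability is therefore limited and new attack patterns cannot be handled effectively. To this end, this paper proposes MalGuard, a reliable method for detecting unknown encrypted malicious traffic based on self-learning of burst-feature tokens. By analyzing the underlying mechanisms of network transmission and observing how key features are distributed differently in benign and malicious traffic, we propose a new traffic tokenization representation based on traffic burst features, which jointly represents the semantic information and temporal dynamics of packets and gives the subsequent pre-trained model a high-information-density input. On top of this tokenization, we propose two traffic-specific self-supervised pre-training tasks, a span masked language model and a span boundary objective. By masking and reconstructing spans of packet content, these tasks strengthen the model's overall perception of the contextual relations among packets within a span and enable the extraction of general traffic features that generalize. Based on these features, we further build a lightweight unsupervised learning algorithm adapted to the traffic feature distribution, which locates outliers in the high-dimensional representation space and reliably detects encrypted malicious traffic without any malicious labels. To validate MalGuard, we evaluate it on three public datasets. The results show that MalGuard outperforms the best existing methods in detecting unknown encrypted malicious traffic. Specifically, with the imbalance ratio <math id="M1"><mi>β</mi></math> defined as the ratio of the number of benign samples to the number of malicious samples, at <math id="M2"><mi>β</mi></math>=4∶1 and <math id="M3"><mi>β</mi></math>=16∶1 MalGuard achieves average F1 scores of 91.76% and 84.56%, improvements of 6.01 and 28.23 percentage points over the best existing method.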
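The span-masking idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give MalGuard's masking parameters or token format, so everything below is an assumption for illustration: the function name `sample_span_mask`, the 15% masking budget, and the geometric span-length distribution (borrowed from the general span-masking recipe used in text pre-training) are hypothetical, and the tokens are plain strings rather than real burst-feature tokens.

```python
import random

MASK = "[MASK]"  # placeholder mask symbol (assumed, not taken from the paper)

def sample_span_mask(tokens, mask_ratio=0.15, p=0.2, max_span=10, seed=0):
    """Mask contiguous spans until up to mask_ratio of the tokens are covered.

    Returns the corrupted sequence, the reconstruction targets
    (position -> original token), and the masked spans as
    inclusive (start, end) index pairs.
    """
    n = len(tokens)
    if n == 0:
        return [], {}, []
    rng = random.Random(seed)
    budget = max(1, round(n * mask_ratio))
    masked, spans, attempts = set(), [], 0
    while len(masked) < budget and attempts < 10 * n:
        attempts += 1
        # Geometric span length: keep extending with probability (1 - p).
        length = 1
        while length < max_span and rng.random() > p:
            length += 1
        length = min(length, budget - len(masked))
        start = rng.randrange(0, n - length + 1)
        span = range(start, start + length)
        if any(i in masked for i in span):
            continue  # skip spans that overlap an earlier one
        masked.update(span)
        spans.append((start, start + length - 1))
    corrupted = [MASK if i in masked else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in sorted(masked)}
    return corrupted, targets, sorted(spans)
```

A model trained on such input would be asked to reconstruct `targets` from `corrupted`; a span boundary objective would additionally predict each masked token from the representations of the unmasked tokens just outside its `(start, end)` span.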
The widespread adoption of encrypted internet traffic ensures confidentiality and privacy, yet attackers increasingly leverage encryption techniques to conceal malicious network activities. As encrypted malicious traffic exhibits characteristics similar to benign encrypted traffic, it can easily evade traditional detection methods based on feature signatures and deep packet inspection (DPI). Current research on encrypted malicious traffic detection primarily focuses on supervised learning paradigms. While these methods are effective against known attack types, their efficacy heavily relies on large, continuously updated sets of labeled malicious traffic samples. Confronted with rapidly evolving malware variants and the widespread use of encryption tunneling techniques, supervised learning models struggle to generalize to unseen attack types, exhibiting significant limitations in adaptability. Furthermore, the feature representations in existing methods often depend on manually engineered statistical features, which fail to capture the deep semantic information and complex temporal dynamics of malicious behaviors within the underlying data packets of encrypted flows, resulting in limited feature discriminability and ineffectiveness against novel attack patterns. To address these challenges, we propose MalGuard, a reliable method for detecting unknown encrypted malicious traffic via self-supervised learning of burst-feature tokens. By analyzing the underlying mechanisms of network transmission and observing how key feature distributions differ between benign and malicious traffic, we propose a burst-aware traffic tokenization method that jointly represents the semantic information and temporal dynamics of data packets and provides a high-information-density input for subsequent model pre-training. Building on this token representation, we design two traffic-specific self-supervised pre-training tasks: Span-Masked Language Modeling and a Span Boundary Objective. These tasks mask and reconstruct spans of packet content to enhance the model's holistic perception of contextual dependencies within each span, enabling the extraction of generalized traffic features. Leveraging these features, we further construct a lightweight unsupervised learning algorithm adapted to the intrinsic distribution of traffic characteristics. By identifying outliers in the high-dimensional representation space, it reliably detects encrypted malicious traffic without requiring labeled malicious data. To validate the effectiveness of MalGuard, we conducted experimental evaluations on three public datasets. Experimental results demonstrate that MalGuard outperforms state-of-the-art (SOTA) methods in detecting unknown encrypted malicious traffic. Specifically, we define the imbalance ratio <math id="M4"><mi>β</mi></math> as the ratio of benign to malicious samples; at <math id="M5"><mi>β</mi></math>=4∶1 and <math id="M6"><mi>β</mi></math>=16∶1, MalGuard achieves average F1 scores of 91.76% and 84.56%, surpassing the best existing baseline by 6.01 percentage points and 28.33 percentage points, respectively.
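Detecting malicious traffic as outliers in a representation space, with no malicious labels, can be sketched in a few lines. The abstract does not specify MalGuard's unsupervised algorithm, so the example below swaps in a generic k-nearest-neighbour distance score as a stand-in. The function names, the choice of `k`, the quantile threshold, and the toy 2-D points (real embeddings would be high-dimensional) are all assumptions.

```python
import math

def knn_outlier_scores(points, k=3):
    """Score each point by its mean distance to its k nearest neighbours.

    Stand-in technique (not the paper's algorithm): points that lie far
    from every dense region receive large scores.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest) if nearest else 0.0)
    return scores

def flag_outliers(points, k=3, quantile=0.9):
    """Flag points whose score exceeds the given quantile of all scores."""
    scores = knn_outlier_scores(points, k)
    threshold = sorted(scores)[int(quantile * (len(scores) - 1))]
    return [s > threshold for s in scores]

# Toy embeddings: five tightly clustered "benign" points and one distant point.
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (5.0, 5.0)]
flags = flag_outliers(embeddings)
```

In this toy setting only the distant point is flagged. In practice the threshold would be calibrated on benign traffic only, which is what makes the approach independent of malicious labels.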