A Malware Detection Method Based on Hybrid Learning

doi:10.12263/DZXB.20180711

Acta Electronica Sinica, 2021, Vol. 49, Issue 2: 286-291.

Research Article

LIANG Guang-hui^1,3, BAI Liang^2, PANG Jian-min^1,3, SHAN Zheng^1,3, YUE Feng^1,3, ZHANG Lei^4


Abstract

In recent years, automated sandboxes have been widely deployed for malware analysis and detection. However, as the volume of malware surges and its anti-analysis capabilities grow, handling massive malware analysis workloads and improving the efficiency of sandbox systems has become an important research direction for strengthening network security defenses. Drawing on the characteristics of different learning paradigms and of the static and dynamic features of malware, this paper proposes a malware detection method based on a hybrid learning model. The method extracts static fuzzy-hash features and dynamic behavior features from malware, then combines unsupervised clustering with supervised classification for detection. Experiments show that, using only 0.02% of the original system's analysis time, the method improves the detection speed of the entire system by 25.6% without affecting detection accuracy.
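The abstract does not spell out the pipeline, so the following Python sketch is only one plausible reading of how unsupervised clustering on static features could reduce the number of sandbox runs needed by a supervised classifier on dynamic features. Everything in it is an assumption made for illustration: `difflib.SequenceMatcher` stands in for a real fuzzy-hash comparison, the greedy threshold clustering and the 1-nearest-neighbour classifier are simple substitutes for whatever the paper uses, and the names `similarity`, `cluster`, `classify`, `detect`, and `run_sandbox` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: bytes, b: bytes) -> float:
    """Stand-in for a fuzzy-hash comparison; returns a score in [0, 1]."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster(samples: dict, threshold: float = 0.8) -> list:
    """Greedy single-pass clustering (unsupervised step, assumed).

    A sample joins the first cluster whose representative (its first
    member) is similar enough; otherwise it starts a new cluster.
    """
    clusters = []
    for name, data in samples.items():
        for members in clusters:
            if similarity(samples[members[0]], data) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two sets of behavior tokens."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def classify(behavior: set, training: list) -> str:
    """1-nearest-neighbour over labelled behavior sets (supervised step, assumed)."""
    return max(training, key=lambda item: jaccard(behavior, item[0]))[1]


def detect(samples: dict, run_sandbox, training: list, threshold: float = 0.8):
    """Run the sandbox once per cluster and propagate the label to all members."""
    labels = {}
    sandbox_runs = 0
    for members in cluster(samples, threshold):
        behavior = run_sandbox(members[0])  # expensive dynamic analysis
        sandbox_runs += 1
        label = classify(behavior, training)
        for name in members:
            labels[name] = label
    return labels, sandbox_runs


# Toy data, invented purely for this illustration.
samples = {
    "a1": b"MZ" + b"\x90" * 40 + b"connect;encrypt;delete",
    "a2": b"MZ" + b"\x90" * 40 + b"connect;encrypt;remove",
    "b1": b"PK" + b"hello world, benign installer payload",
}
behaviors = {
    "a1": {"net_connect", "file_encrypt", "file_delete"},
    "b1": {"file_write", "registry_read"},
}
training = [
    ({"net_connect", "file_encrypt"}, "malicious"),
    ({"file_write"}, "benign"),
]

labels, runs = detect(samples, behaviors.__getitem__, training)
print(labels, runs)
```

In this toy run the two near-identical samples fall into one cluster, so the stubbed sandbox is invoked for two representatives rather than three samples. The idea of saving dynamic-analysis time this way is an interpretation of the abstract, not a description of the authors' implementation.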

Cite this article

LIANG Guang-hui, BAI Liang, PANG Jian-min, SHAN Zheng, YUE Feng, ZHANG Lei. A Malware Detection Method Based on Hybrid Learning[J]. Acta Electronica Sinica, 2021, 49(2): 286-291. https://doi.org/10.12263/DZXB.20180711

CLC number: TP305
