Received: 2022-09-20
Revised: 2022-11-28
Published: 2023-02-27
ZHANG Dan-dan, SONG Ya-fei, LIU Shu
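Abstract:
Fast and accurate identification of unknown malware and its variants is the prerequisite for effective defense against malicious attacks. As the number of malware variants grows sharply, however, manually updating sample databases becomes increasingly inefficient, and traditional identification methods that rely only on this delayed database information struggle to capture the features of samples that have been processed with obfuscation techniques. To address these problems, this paper designs MalMKNet (Multi-scale Kernel Network for Malware), a deep learning model based on grayscale image processing, and builds a convolutional neural network (CNN) architecture that mixes convolution kernels of multiple scales to improve malware identification. The model uses a Mixed Kernels (MK) module, which combines depthwise large-kernel convolution with a shortcut structure and standard small-kernel convolution, to improve accuracy. On this basis, Multi-scale Kernel Fusion (MKF) reduces the number of model parameters. A feature shuffle operation then improves communication between features, raising classification accuracy without increasing the number of parameters. Experimental results show that MalMKNet outperforms other deep-learning-based classification methods in malware family classification, reaching an accuracy of 99.35%.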
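The feature shuffle operation named in the abstract is not specified on this page. As a minimal sketch, assuming it resembles the channel shuffle used in ShuffleNet (one of the baselines compared later), the interleaving of channels across groups can be written as follows. The function name `channel_shuffle` and the list-of-channels representation are illustrative choices, not the authors' code.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups (ShuffleNet-style shuffle).

    `channels` is a list with one entry per channel (each entry could be a
    feature map). Assumption: this mirrors the paper's "feature shuffle";
    the source does not give the exact definition.
    """
    n = len(channels)
    if n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    # Equivalent to reshaping to (groups, per_group), transposing to
    # (per_group, groups), and flattening again.
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


# Six channels in two groups: [0, 1, 2] and [3, 4, 5]
print(channel_shuffle(list(range(6)), groups=2))
```

The operation only permutes channels, so it adds no trainable weights, which is consistent with the abstract's claim that accuracy improves without extra parameters.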
Dan-dan ZHANG, Ya-fei SONG, Shu LIU. MalMKNet: A Multi-scale Convolutional Neural Network Used for Malware Classification[J]. Acta Electronica Sinica, DOI: 10.12263/DZXB.20221069.
File size | Image width |
---|---|
<10 kB | 32 |
10–30 kB | 64 |
30–60 kB | 128 |
60–100 kB | 256 |
100–200 kB | 384 |
200–500 kB | 512 |
500–1000 kB | 768 |
>1000 kB | 1024 |
Table 1 Image width for different file sizes
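The width lookup in Table 1 can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' preprocessing code: 1 kB is taken as 1024 bytes, each range is treated as half-open (a file of exactly 10 kB falls into the 10–30 kB row), and the final partial row is zero-padded. None of these details is given on this page, and the names `width_for_size` and `bytes_to_grayscale` are hypothetical.

```python
import math

KB = 1024  # assumption: the table's "kB" means 1024 bytes

# (exclusive upper bound in kB, image width), following Table 1
_WIDTH_TABLE = [
    (10, 32), (30, 64), (60, 128), (100, 256),
    (200, 384), (500, 512), (1000, 768),
]


def width_for_size(n_bytes):
    """Return the image width for a file of n_bytes bytes."""
    size_kb = n_bytes / KB
    for upper_kb, width in _WIDTH_TABLE:
        if size_kb < upper_kb:
            return width
    return 1024  # files of 1000 kB and above (boundary is an assumption)


def bytes_to_grayscale(data):
    """Arrange raw bytes as rows of 8-bit grayscale pixels (0 to 255)."""
    width = width_for_size(len(data))
    height = max(1, math.ceil(len(data) / width))
    # Assumption: zero-pad the last row instead of truncating it.
    padded = bytes(data) + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


image = bytes_to_grayscale(bytes(range(256)) * 2)
print(len(image), len(image[0]))  # rows, columns
```

Each byte of the file maps directly to one pixel intensity, so the image height follows from the file size once the width is fixed.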
Input size | Parameters/M | Accuracy/% | Prediction time/ms |
---|---|---|---|
32×32 | 0.208357 | 98.96 | 17.92 |
64×64 | 0.294425 | 99.35 | 18.36 |
128×128 | 0.359145 | 99.03 | 21.07 |
256×256 | 0.533643 | 98.28 | 25.11 |
Table 2 Comparison of experimental results for different image input sizes
Optimizer | Accuracy/% | Precision/% | Recall/% | F1-score/% | Prediction time/ms |
---|---|---|---|---|---|
Adagrad | 97.20 | 96.26 | 97.20 | 96.68 | 18.36 |
Adamax | 97.67 | 96.71 | 97.67 | 97.13 | 18.36 |
Adam | 98.32 | 98.38 | 98.32 | 98.30 | 18.36 |
NAdam | 99.00 | 99.08 | 99.00 | 99.01 | 18.36 |
diffGrad | 99.35 | 99.37 | 99.35 | 99.35 | 18.36 |
Table 3 Comparison of experimental results for different optimizers
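This page does not say how the per-family precision, recall, and F1-score were averaged. In every row of the results tables the recall equals the accuracy, which is what support-weighted averaging produces, so the sketch below assumes that scheme. The function name `weighted_metrics` and the 2×2 confusion matrix are made up for illustration and are not data from the paper.

```python
def weighted_metrics(cm):
    """Accuracy and support-weighted precision, recall, F1.

    cm[i][j] counts samples of true class i predicted as class j.
    Assumption: support-weighted averaging, inferred from recall == accuracy
    in the reported tables; the source does not state the averaging scheme.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    precision = recall = f1 = 0.0
    for c in range(n):
        tp = cm[c][c]
        support = sum(cm[c])                      # true samples of class c
        predicted = sum(cm[r][c] for r in range(n))  # samples predicted as c
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return accuracy, precision, recall, f1


acc, prec, rec, f1 = weighted_metrics([[8, 2], [1, 9]])
print(acc, prec, rec, f1)
```

With support weights, the recall sum collapses to (total true positives) / (total samples), which is exactly the accuracy; precision and F1 can still differ from it, as they do in the tables.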
Multi-scale kernel fusion | Shortcut | Feature shuffle | Accuracy/% | Precision/% | Recall/% | F1-score/% | Parameters/M |
---|---|---|---|---|---|---|---|
- | √ | √ | 99.03 | 99.06 | 99.03 | 99.03 | 5.413992 |
√ | - | √ | 97.20 | 96.27 | 97.20 | 96.67 | 0.226484 |
√ | √ | - | 98.24 | 97.27 | 98.24 | 97.72 | 0.294425 |
√ | √ | √ | 99.35 | 99.37 | 99.35 | 99.35 | 0.294425 |
Table 4 Ablation experiment results
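The parameter column in Table 4 depends on layer configurations that are not listed on this page. As a rough illustration of why a large kernel can stay cheap when applied depthwise, the sketch below counts convolution weights for a standard and a depthwise layer. The channel count (64) and kernel sizes (3 and 7) are hypothetical and are not MalMKNet's actual settings; reading the abstract's large-kernel convolution as depthwise is also an assumption.

```python
def conv_params(kernel, c_in, c_out, depthwise=False, bias=False):
    """Number of trainable parameters in a 2-D convolution layer.

    Standard convolution: kernel * kernel * c_in * c_out weights.
    Depthwise convolution (one filter per channel): kernel * kernel * c_in.
    """
    if depthwise:
        if c_in != c_out:
            raise ValueError("depthwise convolution needs c_in == c_out here")
        weights = kernel * kernel * c_in
    else:
        weights = kernel * kernel * c_in * c_out
    return weights + (c_out if bias else 0)


# Hypothetical 64-channel layer
print(conv_params(3, 64, 64))                  # standard 3x3
print(conv_params(7, 64, 64))                  # standard 7x7
print(conv_params(7, 64, 64, depthwise=True))  # depthwise 7x7
```

Under these made-up sizes, the depthwise 7×7 layer needs fewer weights than even the standard 3×3 layer, because its cost no longer scales with the product of input and output channels.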
Model | Accuracy/% | Precision/% | Recall/% | F1-score/% | Parameters/M | Prediction time/ms |
---|---|---|---|---|---|---|
DenseNet | 99.07 | 99.08 | 99.07 | 99.06 | 6.973209 | 26.32 |
MobileNetV2 | 97.63 | 96.72 | 97.63 | 97.10 | 2.255321 | 19.10 |
ResNet | 97.92 | 96.95 | 97.92 | 97.40 | 21.291225 | 21.39 |
ShuffleNet | 98.35 | 98.41 | 98.35 | 98.33 | 0.366985 | 18.96 |
MalMKNet | 99.35 | 99.37 | 99.35 | 99.35 | 0.294425 | 18.36 |
Table 5 Comparison of experimental results between MalMKNet and other models
Author | Year | Dataset | Model | Accuracy/% | Precision/% | Recall/% | F1-score/% | Prediction time/ms |
---|---|---|---|---|---|---|---|---|
Nataraj | 2011 | Malimg | KNN | 97.18 | - | - | - | - |
Yue | 2017 | Malimg | Vgg-verydeep-19 | 97.32 | - | - | - | - |
Zhihua | 2018 | Malimg | GIST+KNN | 91.9 | 92.1 | 91.7 | - | 60 |
Zhihua | 2018 | Malimg | GIST+SVM | 92.2 | 92.5 | 91.4 | - | 64 |
Zhihua | 2018 | Malimg | GLCM+KNN | 92.5 | 92.7 | 92.3 | - | 45 |
Zhihua | 2018 | Malimg | GLCM+SVM | 93.2 | 93.44 | 93 | - | 48 |
Zhihua | 2018 | Malimg | IDA+DRBA | 94.5 | 94.6 | 94.5 | - | 20 |
Dai | 2018 | Malimg | GIST-Descriptor, SVM & KNN | 97 | - | - | - | - |
Kumar | 2018 | Malimg | CNN | 98 | - | - | - | - |
Kalash | 2018 | Malimg | M-CNN | 98.52 | - | - | - | - |
Chen | 2018 | Malimg | Inception-V1 | 99.25 | - | - | - | - |
Cui | 2019 | Malimg | NSGA-II | 97.6 | 97.6 | 88.4 | - | - |
Singh | 2019 | Malimg | Deep CNN | 96.08 | - | - | - | - |
Gibert | 2019 | Malimg | CNN | 98.48 | - | - | - | - |
Venkatraman | 2019 | Malimg | CNN UniGRU | 96 | 91.8 | 91.2 | 91.4 | - |
Venkatraman | 2019 | Malimg | CNN BiGRU | 96.3 | 91.8 | 91.5 | 91.6 | - |
Lo | 2019 | Malimg | Xception | 99.03 | - | - | - | - |
Cayir | 2020 | Malimg | CapsNet | 98.63 | - | - | 96.58 | - |
Cayir | 2020 | Malimg | RCNF | 98.72 | - | - | 96.61 | - |
Naeem | 2020 | Malimg | DCNN | 98.47 | 98.47 | 98.47 | - | - |
Naeem | 2020 | Malimg | DCNN | 98.79 | 98.79 | 98.79 | - | - |
Vasan | 2020 | Malimg | IMCFN | 98.82 | 98.85 | 98.81 | 98.75 | 810 |
Proposed work | - | Malimg | MalMKNet | 99.35 | 99.37 | 99.35 | 99.35 | 18.36 |
Table 6 Comparison of experimental results between MalMKNet and other research methods
1 | SU J W, VASCONCELLOS D V, PRASAD S, et al. Lightweight classification of IoT malware based on image recognition[C]//HIRONORI K. 2018 IEEE 42nd Annual Computer Software and Applications Conference(COMPSAC). Piscataway: IEEE, 2018: 664-669. |
2 | CNCERT/CC. 2020 China Internet Network Security Report[R/OL]. (2021-07-21)[2022-12-29]. |
3 | YADAV B, TOKEKAR S. Recent innovations and comparison of deep learning techniques in malware classification: A review[J]. International Journal of Information Security Science, 2021, 9(4): 230-247. |
4 | GREENGARD S. Cybersecurity gets smart[J]. Communications of the ACM, 2016, 59(5): 29-31. |
5 | VENKATRAMAN S, ALAZAB M. Use of data visualisation for zero-day malware detection[J]. Security and Communication Networks, 2018, 2018: 1-13. |
6 | NATARAJ L, KARTHIKEYAN S, JACOB G, et al. Malware images: Visualization and automatic classification[C]//GREGORY J. Proceedings of the 8th International Symposium on Visualization for Cyber Security. New York: ACM, 2011: 1-7. |
7 | MAKANDAR A, PATROT A. Malware class recognition using image processing techniques[C]//GÜNTER F. 2017 International Conference on Data Management, Analytics and Innovation(ICDMAI). Piscataway: IEEE, 2017: 76-80. |
8 | XIANG Q, WANG X D, SONG Y F, et al. One-dimensional convolutional neural networks for high-resolution range profile recognition via adaptively feature recalibrating and automatically channel pruning[J]. International Journal of Intelligent Systems, 2021, 36(1): 332-361. |
9 | XIANG Q, WANG X D, LAI J, et al. Multi-scale group-fusion convolutional neural network for high-resolution range profile target recognition[J]. IET Radar, Sonar & Navigation, 2022, 16(12): 1997-2016. |
10 | CUI Z H, XUE F, CAI X J, et al. Detection of malicious code variants based on deep learning[J]. IEEE Transactions on Industrial Informatics, 2018, 14(7): 3187-3196. |
11 | HAMAD N, CHENG X C, FARHAN U, et al. A deep convolutional neural network stacked ensemble for malware threat classification in Internet of Things[J]. Journal of Circuits, Systems and Computers, 2022, 31(17): 1-13. |
12 | KALASH M, ROCHAN M, MOHAMMED N, et al. Malware classification with deep convolutional neural networks[C]//GUY P. 2018 9th IFIP International Conference on New Technologies, Mobility and Security(NTMS). Piscataway: IEEE, 2018: 1-5. |
13 | VENKATRAMAN S, ALAZAB M, VINAYAKUMAR R. A hybrid deep learning image-based analysis for effective malware detection[J]. Journal of Information Security and Applications, 2019, 47: 377-389. |
14 | GO J H, JAN T, MOHANTY M, et al. Visualization approach for malware classification with ResNeXt[C]//FERRANTE N. 2020 IEEE Congress on Evolutionary Computation(CEC). Piscataway: IEEE, 2020: 1-7. |
15 | LIU S, CHEN T, CHEN X, et al. More convnets in the 2020s: Scaling up kernels beyond 51x51 using sparsity[EB/OL]. (2022-07-07)[2022-12-29]. |
16 | IOFFE S, SZEGEDY C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[C]//FRANCIS B. Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37. New York: ACM, 2015: 448-456. |
17 | MISRA D. Mish: A self regularized non-monotonic neural activation function[EB/OL]. (2019-08-23)[2022-12-29]. |
18 | CHOLLET F. Xception: deep learning with depthwise separable convolutions[C]//CARMEN S. 2017 IEEE Conference on Computer Vision and Pattern Recognition(CVPR). Piscataway: IEEE, 2017: 1800-1807. |
19 | XIE S N, GIRSHICK R, DOLLÁR P, et al. Aggregated residual transformations for deep neural networks[C]//CARMEN S. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2017: 5987-5995. |
20 | NAMANYA A P, AWAN I U, DISSO J P, et al. Similarity hash based scoring of portable executable files for efficient malware detection in IoT[J]. Future Generation Computer Systems, 2020, 110: 824-832. |
21 | YUE S. Imbalanced malware images classification: a CNN based approach[EB/OL]. (2017-08-27)[2022-12-29]. |
22 | DAI Y, LI H, QIAN Y, et al. A malware classification method based on memory dump grayscale image[J]. Digital Investigation, 2018, 27: 30-37. |
23 | KUMAR R, ZHANG X S, KHAN R U, et al. Malicious code detection based on image processing using deep learning[C]//EDWIN W. Proceedings of the 2018 International Conference on Computing and Artificial Intelligence. New York: ACM, 2018: 81-85. |
24 | CHEN L. Deep transfer learning for static malware classification[EB/OL]. (2018-12-18)[2022-12-29]. |
25 | CUI Z, DU L, WANG P, et al. Malicious code detection based on CNNs and multi-objective algorithm[J]. Journal of Parallel and Distributed Computing, 2019, 129: 50-58. |
26 | SINGH A, HANDA A, KUMAR N, et al. Malware classification using image representation[M]//SHLOMI D. Lecture Notes in Computer Science. Cham: Springer International Publishing, 2019: 75-92. |
27 | GIBERT D, MATEU C, PLANES J, et al. Using convolutional neural networks for classification of malware represented as images[J]. Journal of Computer Virology and Hacking Techniques, 2019, 15(1): 15-28. |
28 | LO W W, YANG X, WANG Y P. An xception convolutional neural network for malware classification with transfer learning[C]//JUAN M C. 2019 10th IFIP International Conference on New Technologies, Mobility and Security(NTMS). Piscataway: IEEE, 2019: 1-5. |
29 | ÇAYIR A, ÜNAL U, DAĞ H. Random CapsNet forest model for imbalanced malware type classification task[J]. Computers & Security, 2021, 102: 102133. |
30 | NAEEM H, ULLAH F, NAEEM M R, et al. Malware detection in industrial Internet of Things based on hybrid image visualization and deep learning model[J]. Ad Hoc Networks, 2020, 105: 102154. |
31 | VASAN D, ALAZAB M, WASSAN S, et al. IMCFN: Image-based malware classification using fine-tuned convolutional neural network architecture[J]. Computer Networks, 2020, 171: 107138. |
32 | DING X H, GUO Y C, DING G G, et al. ACNet: strengthening the kernel skeletons for powerful CNN via asymmetric convolution blocks[C]//DAVID F. 2019 IEEE/CVF International Conference on Computer Vision(ICCV). Piscataway: IEEE, 2020: 1911-1920. |