XUAN Bo-na, LI Jin
Received: 2022-07-14
Revised: 2022-11-02
Online: 2022-11-25
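Abstract:
The growing number of malware variants poses a serious threat to network security. It also exposes the weak generalization and insufficient accuracy of existing malware classification methods based on CNNs (Convolutional Neural Networks). To address these problems, this paper proposes a malware classification method that combines RGB (Red Green Blue) visualization of malware with an improved CNN and resists variant and obfuscated malware.

- First, an RGB-image feature representation is proposed that emphasizes the semantic relationships among a sample's binary, assembly, and API information. It produces images with richer texture information and can reveal deeper dependencies between original malicious code and its variants.
- Second, to counter encryption and obfuscation, a coordinate attention module (CAAM) captures spatial information over a wider range to strengthen the features.
- Finally, atrous spatial pyramid pooling (ASPP) is incorporated into the CNN to reduce the information loss and redundancy caused by normalizing image sizes.

Experimental results show that the method outperforms recent state-of-the-art approaches. It reaches an accuracy of 99.48% on the Kaggle dataset and 97.78% on the DataCon dataset, which is 0.22 and 0.80 percentage points higher, respectively, than the compared methods. The method classifies malware and malware family variants effectively and shows good generalization and resistance to obfuscation.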
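The abstract describes merging binary, assembly, and API information into one RGB image. This section does not give the exact channel layout, so the following is only a minimal sketch under stated assumptions:

- Three byte streams are assigned to the R, G, and B channels. This assignment is hypothetical.
- Each stream is laid out row by row and truncated or zero-padded to a fixed width × height.
- `to_plane` and `build_rgb_image` are illustrative names, not functions from the paper.

```python
def to_plane(stream: bytes, width: int, height: int) -> list:
    """Lay a byte stream out row by row as a height x width grid of 0-255 values.

    Assumption: the stream is truncated or zero-padded to exactly width * height bytes.
    """
    size = width * height
    data = stream[:size].ljust(size, b"\x00")
    return [list(data[r * width:(r + 1) * width]) for r in range(height)]


def build_rgb_image(binary: bytes, opcodes: bytes, apis: bytes, width: int, height: int) -> list:
    """Stack three byte streams into a height x width grid of (R, G, B) tuples.

    Hypothetical channel assignment: R = raw binary bytes, G = opcode codes,
    B = API-name bytes. The paper's actual layout may differ.
    """
    red, green, blue = (to_plane(s, width, height) for s in (binary, opcodes, apis))
    return [[(red[r][c], green[r][c], blue[r][c]) for c in range(width)]
            for r in range(height)]


img = build_rgb_image(b"\x01\x02\x03", bytes([4, 235]), b"Vi", width=2, height=2)
print(img)  # [[(1, 4, 86), (2, 235, 105)], [(3, 0, 0), (0, 0, 0)]]
```

Fixing the size by truncation or padding is one simple form of size normalization. It discards or adds data, which illustrates the loss and redundancy that the abstract attributes to size normalization.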
XUAN Bo-na, LI Jin. Malware Classification Method Based on Improved CNN[J]. Acta Electronica Sinica, DOI: 10.12263/DZXB.20220818.
Virtual address | Opcode/operand | Code |
---|---|---|
10001000 | Mov | 4 |
10001001 | Eax | 235 |
10001002 | Esp | 0 |
10001003 | Null | 0 |
10001004 | Mov | 4 |
10001005 | Eix | 0 |
10001006 | Eax | 235 |
10001007 | Mov | 4 |
10001008 | Edx | 239 |
10001009 | Eax | 235 |
1000100A | Cmp | 8 |
1000100B | Eax | 235 |
1000100C | Jnz | 11 |
1000100D | Edx | 239 |
1000100E | Mov | 4 |
1000100F | Edx | 239 |
Table 1 Example of opcode encoding
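Table 1 maps each disassembled token to an integer code. The sketch below shows that lookup under two assumptions:

- The codebook contains only the values visible in Table 1. The paper's full vocabulary is not shown in this section.
- Every token missing from the codebook (here Esp, Null, and Eix) maps to 0. This is consistent with the table but is not a rule stated here.

`CODEBOOK` and `encode_tokens` are hypothetical names.

```python
# Hypothetical codebook copied from Table 1 (token -> code).
CODEBOOK = {"Mov": 4, "Cmp": 8, "Jnz": 11, "Eax": 235, "Edx": 239}


def encode_tokens(tokens, codebook=CODEBOOK, unknown=0):
    """Map opcode/operand tokens to one-byte codes.

    Assumption: tokens missing from the codebook map to `unknown` (0), which
    matches the zeros shown for Esp, Null and Eix in Table 1.
    """
    codes = [codebook.get(tok, unknown) for tok in tokens]
    return bytes(codes)  # raises ValueError if any code falls outside 0..255


tokens = ["Mov", "Eax", "Esp", "Null", "Mov", "Eix", "Eax", "Mov",
          "Edx", "Eax", "Cmp", "Eax", "Jnz", "Edx", "Mov", "Edx"]
print(list(encode_tokens(tokens)))
# [4, 235, 0, 0, 4, 0, 235, 4, 239, 235, 8, 235, 11, 239, 4, 239]
```

The function returns `bytes`, so every code is confined to one byte. That matches the 0 to 255 range a pixel intensity needs if the codes later feed an image channel.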
Virtual address | API | Hex encoding |
---|---|---|
00402000 | GlobalFindAtom | 47 6C 6F 62 61 6C 46 69 6E 64 41 74 6F 6D 00 00 |
00402004 | IsDBCSLeadByte | 49 73 44 42 43 53 4C 65 61 64 42 79 74 65 00 00 |
00402008 | GetConsoleCP | 47 65 74 43 6F 6E 73 6F 6C 65 43 50 00 00 00 00 |
0040200C | VirtualAlloc | 56 69 72 74 75 61 6C 41 6C 6C 6F 63 00 00 00 00 |
00402010 | CreateThread | 43 72 65 61 74 65 54 68 72 65 61 64 00 00 00 00 |
Table 2 Illustration of API encoding
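In Table 2, each API name appears as its ASCII bytes, zero-padded to 16 bytes. The sketch below reproduces that encoding. Two points are assumptions, as is the naming of `encode_api` and `to_hex`:

- The 16-byte width is read off the table, not stated as a rule.
- Truncating names longer than 16 bytes is an assumption, since the table shows only shorter names.

```python
def encode_api(name: str, width: int = 16) -> bytes:
    """Encode an API name as ASCII, zero-padded to a fixed `width` bytes."""
    raw = name.encode("ascii")
    # Assumption: names longer than `width` are truncated; Table 2 only shows shorter ones.
    return raw[:width].ljust(width, b"\x00")


def to_hex(data: bytes) -> str:
    """Render bytes as space-separated upper-case hex pairs, as in Table 2."""
    return " ".join(f"{b:02X}" for b in data)


print(to_hex(encode_api("VirtualAlloc")))
# 56 69 72 74 75 61 6C 41 6C 6C 6F 63 00 00 00 00
```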
Stage | Operator | Stride | Layers |
---|---|---|---|
0 | Coordinate-Attention | 1 | 1 |
1 | Conv3×3 | 1 | 1 |
2 | Fused-MBConv1, k3×3 | 1 | 2 |
3 | Fused-MBConv4, k3×3 | 2 | 4 |
4 | MBConv4, k3×3, SE0.25 | 2 | 6 |
5 | MBConv4, k3×3, SE0.25 | 1 | 9 |
6 | Conv1×1 & ASPP & FC | - | 1 |
Table 3 Structure of the improved CNN
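Stage 0 in Table 3 is coordinate attention. Coordinate attention pools the feature map separately along height and width. Each resulting gate therefore keeps positional information along one axis, and the input is reweighted by both gates.

The sketch below works on a single channel. It makes one loud simplification: the learned shared 1×1 convolution, batch normalization, non-linearity, and the two per-direction 1×1 convolutions are replaced by the identity. It shows the data flow only, not the trained module used in the paper.

```python
import math


def directional_pools(x):
    """Average a single-channel H x W map along each axis.

    z_h[i] averages row i over the width; z_w[j] averages column j over the height.
    """
    height, width = len(x), len(x[0])
    z_h = [sum(row) / width for row in x]
    z_w = [sum(x[i][j] for i in range(height)) / height for j in range(width)]
    return z_h, z_w


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def coordinate_attention(x):
    """Simplified coordinate attention for one channel.

    Simplification (assumption): all learned 1x1 convolutions, batch norm and the
    non-linearity are replaced by the identity, so each gate is sigmoid(pooled value).
    """
    z_h, z_w = directional_pools(x)
    g_h = [sigmoid(v) for v in z_h]  # one gate per row (height direction)
    g_w = [sigmoid(v) for v in z_w]  # one gate per column (width direction)
    return [[x[i][j] * g_h[i] * g_w[j] for j in range(len(x[0]))]
            for i in range(len(x))]


x = [[1.0, 3.0], [5.0, 7.0]]
print(directional_pools(x))  # ([2.0, 6.0], [3.0, 5.0])
```

Because one pooled value is kept per row and per column, the output at (i, j) depends on both its row and its column context. This is the "wider range of spatial information" the abstract refers to.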
Family | Training samples | Type |
---|---|---|
Ramnit | 1541 | Worm |
Lollipop | 2478 | Adware |
Kelihos_ver3 | 2942 | Backdoor |
Vundo | 475 | Trojan |
Simda | 42 | Backdoor |
Tracur | 751 | TrojanDownloader |
Kelihos_ver1 | 398 | Backdoor |
Obfuscator.ACY | 1228 | Any kind of obfuscated malware |
Gatak | 1013 | Backdoor |
Table 4 Number of samples per family in the sample set
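Table 4 describes a strongly imbalanced training set. The short calculation below uses only the counts in the table to make that concrete. The nine families and the total of 10,868 samples are consistent with the training set of the Microsoft Malware Classification Challenge [26] on Kaggle.

```python
# Training-sample counts per family, copied from Table 4.
COUNTS = {
    "Ramnit": 1541, "Lollipop": 2478, "Kelihos_ver3": 2942, "Vundo": 475,
    "Simda": 42, "Tracur": 751, "Kelihos_ver1": 398, "Obfuscator.ACY": 1228,
    "Gatak": 1013,
}


def summarize(counts):
    """Return the total, each family's share, and the largest/smallest class ratio."""
    total = sum(counts.values())
    shares = {name: n / total for name, n in counts.items()}
    imbalance = max(counts.values()) / min(counts.values())
    return total, shares, imbalance


total, shares, imbalance = summarize(COUNTS)
print(total)                     # 10868
print(round(imbalance, 1))       # 70.0
print(f"{shares['Simda']:.2%}")  # 0.39%
```

With Simda at under 0.4% of the data, overall accuracy can stay high even if that family is classified poorly. This is one reason per-class precision and recall are worth reporting alongside accuracy.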
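Name | Method | Reference | Description |
---|---|---|---|
M1 | Grayscale image | Ref. [ | Normalizes images by operations such as rotation and scaling so that all input samples have the same size |
M2 | Low-dimensional gray-level co-occurrence matrix | Ref. [ | Extracts texture features with a low-dimensional gray-level co-occurrence matrix and color features with a color matrix, combining global and local features |
M3 | Word2Vec multi-channel method | Ref. [ | Multi-channel method based on the sample's binary grayscale image that combines assembly-instruction-level and byte-level Word2Vec features |
Table 5 Details of visualization methods in other papers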
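Method M2 in Table 5 derives texture features from a low-dimensional gray-level co-occurrence matrix (GLCM). The sketch below illustrates the general technique, not the cited paper's exact settings. It makes two assumptions:

- Byte values are first quantized to a small number of gray levels. Reading "low-dimensional" this way is an assumption.
- Co-occurrences are counted for a single horizontal offset, which is also an assumption.

The standard contrast feature is then computed from the normalized matrix.

```python
def quantize(img, levels):
    """Map byte values 0..255 onto `levels` gray levels (0..levels-1)."""
    return [[v * levels // 256 for v in row] for row in img]


def glcm(img, levels, dx=1, dy=0):
    """Count how often gray level a is followed by gray level b at offset (dy, dx)."""
    m = [[0] * levels for _ in range(levels)]
    height, width = len(img), len(img[0])
    for i in range(height):
        for j in range(width):
            ni, nj = i + dy, j + dx
            if 0 <= ni < height and 0 <= nj < width:
                m[img[i][j]][img[ni][nj]] += 1
    return m


def contrast(m):
    """Contrast texture feature: sum of (a - b)^2 weighted by normalized co-occurrence."""
    total = sum(map(sum, m))
    n = len(m)
    return sum((a - b) ** 2 * m[a][b] for a in range(n) for b in range(n)) / total


img = [[0, 0, 1], [1, 2, 2]]
m = glcm(img, levels=3)
print(m)            # [[1, 1, 0], [0, 0, 1], [0, 0, 1]]
print(contrast(m))  # 0.5
```

Using few gray levels keeps the matrix small (levels × levels), which is presumably what makes the representation "low-dimensional".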
Method | Approach | Accuracy/% | Precision/% | Recall/% | F1-score/% |
---|---|---|---|---|---|
Ref. [ | Ensemble learning | 96.99 | 94.05 | - | 92.19 |
Ref. [ | Gray+CNN | 96.80 | 96.42 | 96.26 | 97.38 |
Proposed method | RGB | 97.78 | 97.80 | 97.76 | 97.78 |
Table 6 Comparison of the proposed method with other methods on the DataCon dataset
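Table 6 reports accuracy, precision, recall, and F1-score. This section does not say how per-class scores are averaged over the malware families. The sketch below assumes macro averaging, i.e. an unweighted mean over classes, and computes all four metrics from a confusion matrix. `macro_metrics` is a hypothetical helper, not code from the paper.

```python
def macro_metrics(cm):
    """Accuracy and macro-averaged precision, recall, F1 from a confusion matrix.

    cm[i][j] = number of samples whose true class is i and predicted class is j.
    Assumption: scores are macro-averaged (unweighted mean over classes).
    """
    n = len(cm)
    total = sum(map(sum, cm))
    accuracy = sum(cm[k][k] for k in range(n)) / total
    p_sum = r_sum = f_sum = 0.0
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0     # harmonic mean of p and r
        p_sum += p
        r_sum += r
        f_sum += f1
    return accuracy, p_sum / n, r_sum / n, f_sum / n


acc, prec, rec, f1 = macro_metrics([[8, 2], [1, 9]])
print(round(acc, 2), round(rec, 2))  # 0.85 0.85
```

Each per-class F1 is a harmonic mean, so it never exceeds the arithmetic mean of that class's precision and recall. As a result, a macro-averaged F1 cannot exceed the mean of macro precision and macro recall, which is a quick sanity check when reading such tables.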
Method | Approach | Accuracy/% | Precision/% | Recall/% | F1-score/% |
---|---|---|---|---|---|
Ref. [ | CNN+Gray | 97.49 | - | - | 94.38 |
Ref. [ | CNN+LSTM+Gray | 98.20 | - | - | 95.77 |
Ref. [ | Byte+Opcode | 99.24 | - | - | 98.72 |
Ref. [ | GDMC+Gray | 99.26 | - | - | - |
Ref. [ | LeNet5+RGB+Word2Vec | 98.76 | - | - | - |
Proposed method | RGB | 99.48 | 99.39 | 99.48 | 99.48 |
Table 7 Comparison of the proposed method with other methods on the Kaggle dataset
1 | MORGAN S. Top 5 cybersecurity facts, figures and statistics for 2018[R/OL]. [2018-05-05]. |
2 | Symantec Enterprise. Internet Security Threat Report 2018[R/OL]. [2019-06-15]. |
3 | KHOSHBARFOROUSHHA A, RANJAN R, GAIRE R, et al. Distribution based workload modelling of continuous queries in clouds[J]. IEEE Transactions on Emerging Topics in Computing, 2016, 5(1): 120-133. |
4 | TSOCHEV G, TRIFONOV R, NAKOV O, et al. Cyber security: Threats and challenges[C]//2020 International Conference Automatics and Informatics(ICAI). Varna: IEEE, 2020: 1-6. |
5 | NATARAJ L, YEGNESWARAN V, PORRAS P, et al. A comparative assessment of malware classification using binary texture analysis and dynamic analysis[C]//Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence. Chicago: ACM, 2011: 21-30. |
6 | NATARAJ L, KARTHIKEYAN S, JACOB G, et al. Malware images: visualization and automatic classification[C]//Proceedings of the 8th International Symposium on Visualization for Cyber Security. New York: ACM, 2011: 1-7. |
7 | SHAID S Z M, MAAROF M A. Malware behavior image for malware variant identification[C]//2014 International Symposium on Biometrics and Security Technologies(ISBAST). Kuala Lumpur, Malaysia: IEEE, 2014: 238-243. |
8 | HAN K S, LIM J H, KANG B, et al. Malware analysis using visualized images and entropy graphs[J]. International Journal of Information Security, 2015, 14(1): 1-14. |
9 | CUI Z, XUE F, CAI X, et al. Detection of malicious code variants based on deep learning[J]. IEEE Transactions on Industrial Informatics, 2018, 14(7): 3187-3196. |
10 | FU J, XUE J, WANG Y, et al. Malware visualization for fine-grained classification[J]. IEEE Access, 2018, 6: 14510-14523. |
11 | LE Q, BOYDELL O, NAMEE B MAC, et al. Deep learning at the shallow end: Malware classification for non-domain experts[J]. Digital Investigation, 2018, 26: S118-S126. |
12 | VU D L, NGUYEN T K, NGUYEN T V, et al. A convolutional transformation network for malware classification[C]//2019 6th NAFOSTED Conference on Information and Computer Science(NICS). Hanoi, Vietnam: IEEE, 2019: 234-239. |
13 | GIBERT D, MATEU C, PLANES J, et al. Using convolutional neural networks for classification of malware represented as images[J]. Journal of Computer Virology and Hacking Techniques, 2019, 15(1): 15-28. |
14 | GIBERT D, MATEU C, PLANES J. Orthrus: A bimodal learning architecture for malware classification[C]//2020 International Joint Conference on Neural Networks(IJCNN). Glasgow, UK: IEEE, 2020: 1-8. |
15 | YUAN B, WANG J, LIU D, et al. Byte-level malware classification based on Markov images and deep learning[J]. Computers & Security, 2020, 92: 101740. |
16 | QIAN Y, JIANG Q, JIANG Z, et al. A multi-channel visualization method for malware classification based on deep learning[C]//2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering, TrustCom/BigDataSE. Rotorua, New Zealand: IEEE, 2019: 757-762. |
17 | LI Q, MI J, LI W, et al. CNN-based malware variants detection method for internet of things[J]. IEEE Internet of Things Journal, 2021, 8(23): 16946-16962. |
18 | AMER E, ZELINKA I. A dynamic Windows malware detection and prediction method based on contextual understanding of API call sequence[J]. Computers & Security, 2020, 92: 101760. |
19 | FUCHS F, WORRALL D, FISCHER V, et al. SE(3)-transformers: 3D roto-translation equivariant attention networks[J]. Advances in Neural Information Processing Systems, 2020, 33: 1970-198 |
20 | WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision(ECCV). Cham: Springer, 2018: 3-19. |
21 | CHENG S, WANG L, DU A. Asymmetric coordinate attention spectral-spatial feature fusion network for hyperspectral image classification[J]. Scientific Reports, 2021, 11(1): 1-17. |
22 | KIM J H, ON K W, LIM W, et al. Hadamard product for low-rank bilinear pooling[C]//The 5th International Conference on Learning Representations(ICLR). New York: ACM, 2018: 1-7. |
23 | TAN M, LE Q. EfficientNetV2: Smaller models and faster training[C]//2021 International Conference on Machine Learning(ICML). Cham: Springer, 2021: 10096-10106. |
24 | HE K, ZHANG X, REN S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916. |
25 | CHEN L C, ZHU Y, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]//Proceedings of the European Conference on Computer Vision(ECCV). Cham: Springer, 2018: 801-818. |
26 | RONEN R, RADU M, FEUERSTEIN C, et al. Microsoft Malware Classification Challenge 2018[EB/OL]. [2019-05-29]. |
27 | Qian Xin Technology Research Institute. DataCon: Multi-domain large-scale competition open data for security research[EB/OL]. [2020-08-25]. |
28 | HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 770-778. |
29 | HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 4700-4708. |
30 | CHOLLET F. Xception: Deep learning with depthwise separable convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 1251-1258. |
31 | YANG W, GAO M Z, JIANG T. A static detection framework of malware based on multi feature ensemble learning[J]. Journal of Computer Research and Development, 2021, 58(05): 1021-1034. (in Chinese) |