Acta Electronica Sinica


Malware Classification Method Based on Improved CNN

XUAN Bo-na (轩勃娜), LI Jin (李进)

  1. Air and Missile Defense College, Air Force Engineering University, Xi'an, Shaanxi 710051, China
  • Received: 2022-07-14 Revised: 2022-11-02 Online: 2022-11-25
    • About the authors:
    • XUAN Bo-na, female, was born in Xingping, Shaanxi Province, in February 1991. She is currently a master's student at the Air and Missile Defense College, Air Force Engineering University. Her main research interest is malicious code classification. E-mail: afeunbx219318@163.com
      LI Jin, male, was born in Xi'an, Shaanxi Province, in September 1971. He graduated from the Department of Electronics, Air Force Engineering University, in 1988. He is currently an associate professor at Air Force Engineering University, working on network security for surface-to-air missile command and control systems. E-mail: ljlxls@163.com
    • Supported by:
    • National Natural Science Foundation of China (61806219)

Abstract:

The growing number of malware variants poses a serious threat to network security and exposes the weak generalization and insufficient accuracy of existing malware classification methods based on convolutional neural networks (CNN). To address these problems, this paper proposes a classification method based on an improved CNN and RGB (Red Green Blue) visualization of malware, which can resist variant and obfuscated malware. First, a feature representation method based on RGB images is proposed. It pays more attention to the semantic relationships among the binary, assembly, and API information of malware, and generates images with richer texture information that can uncover deeper dependencies between original malware and its variants. Second, to address malware encryption and obfuscation, the coordinate attention attention module (CAAM) is used to capture spatial information over a larger range and strengthen the malware features. Finally, atrous spatial pyramid pooling (ASPP) is incorporated to improve the CNN model and to address the information loss and redundancy caused by image size normalization. Experimental results show that the proposed method stands out among recent advanced methods, reaching accuracies of 99.48% and 97.78% on the Kaggle and DataCon datasets, respectively. Compared with other methods, it improves accuracy by 0.22% on the Kaggle dataset and by 0.80% on the DataCon dataset. The method can effectively classify malware and variants of malware families, and has good generalization and anti-obfuscation ability.

Key words: network security, malware classification, RGB image, assembly information, semantic relationship, coordinate attention attention module, atrous spatial pyramid pooling
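
As an illustration of the atrous spatial pyramid pooling (ASPP) idea named in the abstract, the following is a minimal pure-Python sketch and not the authors' implementation. It runs several 3x3 atrous (dilated) convolutions with different dilation rates in parallel, adds an image-level global-average branch, and stacks the results as channels, so that each output position sees context at several scales. The function names, the single shared kernel, the dilation rates (1, 2, 4), and the toy input are all assumptions made for illustration. A trained ASPP layer would instead learn a separate kernel per branch and fuse the branches with a further convolution.

```python
from typing import List, Sequence

Image = List[List[float]]  # single-channel feature map, row-major


def atrous_conv3x3(x: Image, kernel: Image, rate: int) -> Image:
    """3x3 atrous (dilated) convolution with zero padding; output keeps the input size."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(3):
                for b in range(3):
                    # Kernel taps are spaced `rate` pixels apart.
                    y, z = i + (a - 1) * rate, j + (b - 1) * rate
                    if 0 <= y < h and 0 <= z < w:  # taps outside the map count as zero
                        acc += kernel[a][b] * x[y][z]
            out[i][j] = acc
    return out


def global_average_branch(x: Image) -> Image:
    """Image-level branch: global average pooling broadcast back to the input size."""
    h, w = len(x), len(x[0])
    mean = sum(sum(row) for row in x) / (h * w)
    return [[mean] * w for _ in range(h)]


def aspp(x: Image, kernel: Image, rates: Sequence[int] = (1, 2, 4)) -> List[Image]:
    """Run parallel atrous branches plus the pooling branch and stack them as channels."""
    branches = [atrous_conv3x3(x, kernel, r) for r in rates]
    branches.append(global_average_branch(x))
    return branches


if __name__ == "__main__":
    # Toy 6x6 single-channel "malware image" and an averaging kernel (both made up).
    image = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
    box = [[1.0 / 9.0] * 3 for _ in range(3)]
    features = aspp(image, box)
    # Number of stacked branches, then height and width of each branch.
    print(len(features), len(features[0]), len(features[0][0]))
```

Because every branch preserves the spatial size of its input, the branches can be stacked directly; larger dilation rates widen the receptive field without adding kernel weights.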

CLC number: