Acta Electronica Sinica


MalMKNet: A Multi-scale Convolutional Neural Network for Malware Classification

ZHANG Dan-dan, SONG Ya-fei, LIU Shu

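  1. Institute of Air Defense and Anti-missile, Air Force Engineering University, Xi'an, Shaanxi 710051, China
  • Received: 2022-09-20  Revised: 2022-11-28  Published: 2023-02-27
    • Corresponding author:
    • SONG Ya-fei
    • About the authors:
    • ZHANG Dan-dan, female, was born in Shanghai in December 1998. She is currently a master's student at Air Force Engineering University. Her main research interest is malware detection. E-mail: afeu_ddz@163.com
      SONG Ya-fei (corresponding author), male, was born in 1988 in Ruzhou, Henan. He is currently an associate professor at the Institute of Air Defense and Anti-missile, Air Force Engineering University. His main research interests are machine learning and its applications in target recognition, intrusion detection, and related fields. E-mail: yafei_song@163.com
      LIU Shu, male, was born in 1971 in Yiyang, Hunan. He is currently an associate professor at the Institute of Air Defense and Anti-missile, Air Force Engineering University. His main research interests are cyberspace information defense and computer and software engineering. E-mail: liushu@163.com
    • Funding:
    • National Natural Science Foundation of China (61806219); Science Foundation of Shaanxi Province (2021JM-226); Young Talent Support Program of the University Association for Science and Technology in Shaanxi (20190108); Innovation Capability Support Program of Shaanxi Province (2020KJXX-065)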

MalMKNet: A Multi-scale Convolutional Neural Network Used for Malware Classification

ZHANG Dan-dan, SONG Ya-fei, LIU Shu

  1. Institute of Air Defense and Anti-missile, Air Force Engineering University, Xi'an, Shaanxi 710051, China
  • Received: 2022-09-20  Revised: 2022-11-28  Online: 2023-02-27
    • Corresponding author:
    • SONG Ya-fei
    • Supported by:
    • National Natural Science Foundation of China (61806219); Science Foundation of Shaanxi Province (2021JM-226); Young Talent Fund of University Association for Science and Technology in Shaanxi, China (20190108); Innovation Capability Support Program of Shaanxi Province (2020KJXX-065)

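Abstract:

Rapid and accurate identification of unknown malware and its variants is the prerequisite and foundation for effectively defending against malicious attacks. However, the number of malware variants is rising sharply, and manually updating sample databases is becoming less and less efficient. Traditional identification methods rely only on delayed database information, so they struggle to capture the features of samples that have been processed with obfuscation techniques. To address these problems, this paper designs MalMKNet (Multi-scale Kernel Network for Malware), a deep learning model based on grayscale image processing. It builds a convolutional neural network (CNN) architecture that mixes convolution kernels of multiple scales to improve malware identification. The model uses a mixed kernels (MK) module, which combines depthwise large-kernel convolution with a shortcut structure and standard small-kernel convolution, to improve accuracy. On this basis, multi-scale kernel fusion (MKF) reduces the number of model parameters. A feature shuffle operation then improves communication between features, raising classification accuracy without increasing the parameter count. Experimental results show that MalMKNet outperforms other deep-learning-based classification methods in malware family classification accuracy, reaching 99.35%.

Keywords: malware identification, convolutional neural network, deep learning, image processing, large convolution kernels, lightweight model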
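The abstract says that MalMKNet works on grayscale images of malware, but it does not describe how those images are produced. One common approach maps each byte of the binary to one 8-bit pixel and fills rows of a fixed width. This is an assumption here, not a confirmed description of the authors' pipeline. The sketch is pure Python. The function names `bytes_to_grayscale` and `resize_nearest` and the default width of 256 are hypothetical.

```python
from __future__ import annotations

import math


def bytes_to_grayscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Map each byte of a binary to one 8-bit grayscale pixel (0 = black, 255 = white).

    Bytes are laid out row by row with a fixed row width, and the last row is
    zero-padded. The default width of 256 is an illustrative assumption, not a
    value taken from the MalMKNet paper.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = max(1, math.ceil(len(data) / width))
    padded = data + bytes(height * width - len(data))  # pad the tail with 0x00
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def resize_nearest(img: list[list[int]], size: int) -> list[list[int]]:
    """Nearest-neighbour resize to a size x size square so every sample has the
    same CNN input shape (hypothetical step; the paper's resizing is not stated)."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)] for r in range(size)]


if __name__ == "__main__":
    img = bytes_to_grayscale(bytes(range(10)), width=4)
    print(img)                     # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 0, 0]]
    print(resize_nearest(img, 2))  # [[0, 2], [4, 6]]
```

The example uses nearest-neighbour resizing only so that it needs no dependencies. A real pipeline would more likely use an image library's interpolation.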

Abstract:

Rapid and accurate identification of unknown malware and its variants is the prerequisite for effectively preventing malicious attacks. However, as the number of malware variants grows rapidly, manually updating sample databases becomes increasingly inefficient. Traditional identification methods rely only on delayed database information, so they struggle to capture the features of samples processed by obfuscation techniques. To address these problems, this paper proposes MalMKNet (Multi-scale Kernel Network for Malware), a deep learning model based on grayscale image processing. Its convolutional neural network (CNN) architecture mixes convolution kernels of multiple scales to improve malware detection. A mixed kernels (MK) module, which combines depthwise large-kernel convolution with a shortcut structure and standard small-kernel convolution, is proposed to improve model accuracy. Multi-scale kernel fusion (MKF) is then introduced to reduce the number of parameters. Finally, a feature shuffle (FS) operation improves feature communication and raises classification accuracy without increasing the number of parameters. Experimental results show that MalMKNet outperforms other deep-learning-based classification methods in malware family classification accuracy, reaching 99.35%.

Key words: malware detection, convolutional neural network, deep learning, image processing, large kernels, lightweight model
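The abstract makes two claims about cost. The MK module gains accuracy cheaply, and feature shuffle adds no parameters. The arithmetic behind both claims can be sketched under two assumptions that the abstract does not confirm: the MK module's large kernel is a depthwise convolution, and FS works like ShuffleNet's channel shuffle. The helper names `conv_params` and `channel_shuffle`, the 64-channel width, and the 13x13 kernel size are hypothetical illustrations. They are not MalMKNet's actual configuration.

```python
from __future__ import annotations


def conv_params(c_in: int, c_out: int, k: int, groups: int = 1, bias: bool = False) -> int:
    """Number of weights in a k x k 2-D convolution.

    Setting groups == c_in == c_out gives a depthwise convolution, where each
    channel is filtered by its own single k x k kernel.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return c_out * (c_in // groups) * k * k + (c_out if bias else 0)


def channel_shuffle(channels: list, groups: int) -> list:
    """ShuffleNet-style channel shuffle over a list of per-channel feature maps.

    This is equivalent to reshaping to (groups, n), transposing, and flattening,
    so the next grouped layer sees channels from every group. It only permutes
    channels, so it introduces no learnable parameters.
    """
    n, rem = divmod(len(channels), groups)
    if rem:
        raise ValueError("number of channels must be divisible by groups")
    return [channels[g * n + i] for i in range(n) for g in range(groups)]


if __name__ == "__main__":
    C, K = 64, 13  # illustrative sizes only
    print(conv_params(C, C, K, groups=C))  # 10816   depthwise 13x13
    print(conv_params(C, C, 3))            # 36864   standard 3x3
    print(conv_params(C, C, K))            # 692224  standard 13x13
    print(channel_shuffle(list(range(6)), groups=2))  # [0, 3, 1, 4, 2, 5]
```

With these illustrative sizes, a depthwise 13x13 kernel needs fewer weights than a standard 3x3 convolution. This is why large kernels are usually applied depthwise. The shuffle is a pure permutation, so it leaves the parameter count unchanged.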

CLC number: