CN111552964A - Malicious software classification method based on static analysis - Google Patents

Publication number
CN111552964A
Authority
CN
China
Prior art keywords
malicious software
gray
layer
neural network
software
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010264024.3A
Other languages
Chinese (zh)
Inventor
李静梅
白丹
彭弘
薛迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University filed Critical Harbin Engineering University
Priority to CN202010264024.3A priority Critical patent/CN111552964A/en
Publication of CN111552964A publication Critical patent/CN111552964A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Virology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of computer security, and particularly relates to a malicious software classification method based on static analysis. The method converts malicious software into binary files, generates gray level images from them, trains a convolutional neural network model with a spatial pyramid pooling layer on the gray level images to obtain a static classifier, and uses the static classifier to determine the family to which each malicious software sample belongs. By taking the gray level image as the feature, the invention effectively reduces the information loss caused in the image preprocessing stage. Because the malicious software is classified by analyzing its outline characteristics, the method helps professionals reduce the cost of identifying malicious software.

Description

Malicious software classification method based on static analysis
Technical Field
The invention belongs to the technical field of computer security, and particularly relates to a malicious software classification method based on static analysis.
Background
With the rapid development of the internet industry, people depend more and more on all kinds of software, which makes it much easier for malicious software to attack and spread. Because automated tools keep emerging, malware is discovered far more slowly than new variants appear on the internet. For example, Kaspersky Lab detected 15,714,700 malicious objects in 2017. In the first quarter of 2018, McAfee Labs detected 7.9 million malicious files per day, an increase of 450 million over the fourth quarter of 2017. Although new malware appears faster and faster, the vast majority of it has evolved from known malware through polymorphism and metamorphism. Therefore, discovering the homologous relationships among samples plays a very important role in tracing attack organizations, restoring the operating environment, and preventing attacks.
As malware technology keeps improving and application software is used by more and more people, the spread of malware keeps widening as users carry out certain operations, yet most malware has evolved from known malware. Although researchers have done considerable work, malware continues to proliferate. The dynamic analysis method has high accuracy but poor efficiency, and incurs excessive classification cost during analysis. Compared with the dynamic analysis method, the static analysis method has higher classification accuracy and better efficiency. Therefore, research on a malware classification method based on static analysis is a very important topic. Research on malware classification technology with a wide application range and strong practicability, aimed at improving the security of computer systems, has very important theoretical value and practical significance.
Disclosure of Invention
The invention aims to provide a malware classification method based on static analysis, which classifies malware by analyzing outline characteristics of the malware and helps professionals to reduce the cost of identifying the malware.
The purpose of the invention is realized by the following technical scheme, which comprises the following steps:
step 1: inputting a software data set to be classified, and dividing the software data set to be classified into a training set and a test set;
step 2: converting each software sample in the training set into a binary file, wherein the conversion method is specifically as follows: the Windows executable (.exe) file to be analyzed is converted into a binary stream file in bytes format;
step 3: partitioning the binary file into 8-bit bytes and converting each byte into a gray value, the conversion scheme mapping byte values from 0 to 255, where 0 represents black and 255 represents white; then arranging the gray values sequentially into a two-dimensional gray matrix, and determining the width and the height of the two-dimensional gray matrix according to the size of the file, so that the two-dimensional gray matrix can be visualized as a gray level image;
step 4: training a convolutional neural network model with the generated gray level images to generate a static classifier; the convolutional neural network model comprises an input layer, convolutional layers, max pooling layers, a spatial pyramid pooling layer and an output layer; the gray level image is processed with small-window convolution filters; all convolutional layers use a 3 × 3 convolution kernel with the stride set to 1; 1-pixel edge padding is applied to the input feature map in each convolutional layer; max pooling uses a 2 × 2 sliding window with the stride set to 2; the last pooling layer adopts 3-level spatial pyramid pooling, which takes features of any dimension as input and produces output of a uniform dimension; the convolutional neural network uses dropout with a probability of 0.5 after each pooling layer to prevent overfitting; the network uses the Leaky ReLU activation function, weight initialization from a uniform distribution, and batch normalization;
step 5: inputting the test set of the software data set to be classified into the static classifier, judging the family to which each malicious software sample belongs according to the classification result of the classifier, and thereby completing the classification of the malicious software.
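Step 1 can be illustrated with a minimal sketch of a shuffled split. The source does not state the split ratio or how samples are labelled, so the 80/20 ratio, the fixed seed, and the `(file name, family)` tuples below are assumptions made only for illustration.

```python
import random


def split_dataset(samples, test_fraction=0.2, seed=0):
    """Shuffle labelled samples and split them into training and test sets."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data set: ten samples spread over three made-up families.
samples = [(f"sample_{i}.exe", f"family_{i % 3}") for i in range(10)]
train_set, test_set = split_dataset(samples)
```

In practice a stratified split (keeping family proportions equal in both sets) may be preferable when some families have few samples; the source does not say which variant is used.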
The invention has the beneficial effects that:
The invention takes the gray level image as the feature and uses a convolutional neural network with a spatial pyramid pooling layer for classification. Because the spatial pyramid pooling layer accepts input of any dimension, the gray level images need not be cropped or scaled to a fixed size, which effectively reduces the information loss caused in the image preprocessing stage. The method has lower time cost, high detection speed and high detection efficiency in the field of malicious software detection.
Drawings
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a flowchart of the method for generating a malware grayscale image according to the present invention.
FIG. 3 is a diagram of a convolutional neural network structure in the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The invention provides a malicious software classification system based on static analysis, and belongs to the field of computer security. The system converts malicious software into binary files, generates gray level images from them, trains a convolutional neural network model with a spatial pyramid pooling layer on the gray level images to obtain a static classifier, and uses the static classifier to determine the family to which each malicious software sample belongs. By taking the gray level image as the feature, the invention effectively reduces the information loss caused in the image preprocessing stage. The invention aims to classify malicious software by analyzing its outline characteristics and to help professionals reduce the cost of identifying malicious software.
(1) The classification system converts the malicious software sample into a binary file for processing in a static analysis mode.
(2) The classification system divides the binary file into blocks of 8 bits (one byte) each, converts each block into a gray value in the range 0-255 and stores the values in a one-dimensional gray value array, converts the one-dimensional array into a two-dimensional gray value matrix, and then generates a gray level image.
(3) The classification system trains a convolutional neural network model to generate a static classifier using the generated grayscale image. The convolutional neural network model comprises an input layer, a convolutional layer, a maximum pooling layer, a spatial pyramid pooling layer and an output layer.
(4) And the classification system inputs the malware samples in the test set into a static classifier, and judges the family to which the malware belongs according to the classification result of the classifier.
Converting the malware into a binary file in the classification system further comprises:
preprocessing the malware sample data, and converting the Windows executable file to be analyzed into a binary stream file in bytes format.
The generating of the gray scale image in the classification system further comprises:
the classification system treats the malware as a binary file, partitions it into 8-bit bytes, and converts each byte into a gray value, with the conversion scheme mapping byte values from 0 (black) to 255 (white). The gray values are then arranged sequentially into a two-dimensional gray matrix, whose width and height are determined according to the size of the malicious code file, so that the matrix can be visualized as a gray level image.
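The byte-to-image conversion described above can be sketched as follows. The source only says that width and height are determined by the file size, so the roughly square default width and the zero padding of the last row are assumptions of this sketch, and `bytes_to_gray_matrix` is a hypothetical helper name.

```python
import math


def bytes_to_gray_matrix(data, width=None):
    """Arrange the bytes of a file row by row into a 2-D matrix of gray values.

    Each byte (0-255) is used directly as a gray value: 0 is black, 255 is white.
    """
    if not data:
        raise ValueError("empty input")
    if width is None:
        # Assumption: aim for a roughly square image.
        width = max(1, math.isqrt(len(data)))
    height = math.ceil(len(data) / width)
    # Assumption: pad the last row with zeros (black) so all rows are equal length.
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# Five bytes arranged into rows of width 2; the last row is padded with one zero.
matrix = bytes_to_gray_matrix(bytes([0, 255, 16, 32, 64]), width=2)
```

For a real sample, `data` would be the raw contents of the executable, for example read with `open(path, "rb").read()`; the resulting matrix can then be saved as an 8-bit gray level image with any imaging library.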
The convolutional neural network involved in the classification system further comprises:
the classification system processes the gray level image through a convolutional neural network that uses small-window convolution filters. All convolutional layers use a 3 × 3 convolution kernel with the stride set to 1, and the input feature map is padded by 1 pixel at the edges in each convolutional layer. Max pooling uses a 2 × 2 sliding window with the stride set to 2. The last pooling layer adopts 3-level spatial pyramid pooling, which takes features of any dimension as input and produces output of a uniform dimension.
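To illustrate how spatial pyramid pooling turns a feature map of any size into a fixed-length output, here is a minimal pure-Python sketch for a single channel. The source states only that the pyramid has 3 levels; the 1 × 1, 2 × 2 and 4 × 4 bin grids and the use of max pooling inside each bin are assumptions.

```python
import math


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool one 2-D feature map into a fixed-length vector.

    For each level n the map is divided into an n x n grid of bins and the
    maximum of every bin is kept, so the output length is sum(n * n).
    """
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor/ceil boundaries: bins cover the whole map and are never empty.
            r0, r1 = (i * h) // n, math.ceil((i + 1) * h / n)
            for j in range(n):
                c0, c1 = (j * w) // n, math.ceil((j + 1) * w / n)
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled


# Two feature maps of different sizes yield vectors of the same length.
small = [[r * 7 + c for c in range(7)] for r in range(5)]   # 5 x 7
large = [[r * 4 + c for c in range(4)] for r in range(9)]   # 9 x 4
vec_small = spatial_pyramid_pool(small)
vec_large = spatial_pyramid_pool(large)
```

With these assumed levels each channel contributes 1 + 4 + 16 = 21 values regardless of the input size, which is why the layers after it can have a fixed input dimension.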
The convolutional neural network optimization further comprises:
the convolutional neural network uses dropout with a probability of 0.5 after each pooling layer to prevent overfitting. It also uses the Leaky ReLU activation function, weight initialization from a uniform distribution, and batch normalization.
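The two element-wise operations named above can be sketched in a few lines. The dropout probability of 0.5 comes from the text; the negative slope of 0.01 for Leaky ReLU and the "inverted" rescaling of surviving activations are assumptions, since the source does not specify them.

```python
import random


def leaky_relu(x, negative_slope=0.01):
    """Pass positive inputs through and scale negative inputs by a small slope."""
    return x if x >= 0 else negative_slope * x


def dropout(values, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each value with probability p during training.

    Surviving values are divided by (1 - p) so the expected sum is unchanged;
    at inference time the values pass through untouched.
    """
    if not training:
        return list(values)
    rng = rng or random.Random()
    keep = 1.0 - p
    return [v / keep if rng.random() >= p else 0.0 for v in values]


activations = [leaky_relu(x) for x in (-2.0, 0.0, 3.0)]
dropped = dropout([1.0] * 8, p=0.5, rng=random.Random(0))
```

With p = 0.5 each surviving activation is doubled, so no extra scaling is needed when dropout is switched off for classification.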
The static classifier classification involved in the classification system further comprises:
each malware sample is classified by the convolutional neural network model, which yields the classification result of the static classifier.
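How a classification result maps to a family can be sketched as below. The source does not describe the output layer in detail, so the softmax over per-family scores and the family names are assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Convert raw output scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_family(scores, families):
    """Return the family with the highest probability and that probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return families[best], probs[best]


# Hypothetical family names and output scores for one sample.
FAMILIES = ["family_a", "family_b", "family_c"]
family, confidence = predict_family([0.1, 2.0, -1.0], FAMILIES)
```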
The invention provides a malware classification system based on static analysis, which can classify malware by taking a gray image as a feature. The invention aims to classify the malicious software by analyzing the outline characteristics of the malicious software and help professionals to reduce the cost of identifying the malicious software.
Compared with the prior art, the invention has the advantages that:
1. The invention provides a malicious software classification system based on static analysis, which takes a gray level image as the feature and uses a convolutional neural network with a spatial pyramid pooling layer for classification, thereby effectively reducing the information loss caused in the image preprocessing stage.
2. The invention provides a malicious software classification system based on static analysis, which has lower time cost in the field of malicious software detection.
3. The invention provides a malicious software classification system based on static analysis, which has the advantages of high detection speed and high detection efficiency.
Fig. 1 is a flowchart of a malware classification system based on static analysis according to the present invention. The present invention includes the following four aspects.
(1) The classification system converts the malicious software sample into a binary file for processing in a static analysis mode.
The malware sample data is preprocessed, and the Windows executable file to be analyzed is converted into a binary stream file in bytes format.
(2) The classification system divides the binary file into blocks of 8 bits (one byte) each, converts each block into a gray value in the range 0-255 and stores the values in a one-dimensional gray value array, converts the one-dimensional array into a two-dimensional gray value matrix, and then generates a gray level image.
With reference to the gray level image generation flowchart of FIG. 2, the classification system treats the malware as a binary file, partitions it into 8-bit bytes, and converts each byte into a gray value, with the conversion scheme mapping byte values from 0 (black) to 255 (white). The gray values are then arranged sequentially into a two-dimensional gray matrix, whose width and height are determined according to the size of the malicious code file, so that the matrix can be visualized as a gray level image.
(3) The classification system trains a convolutional neural network model to generate a static classifier using the generated grayscale image. The convolutional neural network model comprises an input layer, a convolutional layer, a maximum pooling layer, a spatial pyramid pooling layer and an output layer.
With reference to FIG. 3, the classification system processes the gray level image through a convolutional neural network that uses small-window convolution filters. All convolutional layers use a 3 × 3 convolution kernel with the stride set to 1, and the input feature map is padded by 1 pixel at the edges in each convolutional layer. Max pooling uses a 2 × 2 sliding window with the stride set to 2. The last pooling layer adopts 3-level spatial pyramid pooling, which takes features of any dimension as input and produces output of a uniform dimension.
(4) And the classification system inputs the malware samples in the test set into a static classifier, and judges the family to which the malware belongs according to the classification result of the classifier.
Each malware sample is classified by the convolutional neural network model, which yields the classification result of the static classifier.
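The layer settings given for aspect (3) determine how the spatial size of a feature map changes through the network. The sketch below applies the standard output-size formula, floor((size + 2 × padding − kernel) / stride) + 1; the number of convolution and pooling stages and the 64 × 48 input are assumptions, since the source does not state them.

```python
def conv_output_size(size, kernel=3, stride=1, padding=1):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output_size(size, window=2, stride=2):
    """Output length of max pooling without padding along one dimension."""
    return (size - window) // stride + 1


def trace_shape(height, width, num_blocks):
    """Follow the spatial size through num_blocks conv + max-pool stages."""
    shapes = [(height, width)]
    for _ in range(num_blocks):
        height, width = conv_output_size(height), conv_output_size(width)
        height, width = pool_output_size(height), pool_output_size(width)
        shapes.append((height, width))
    return shapes


# Hypothetical 64 x 48 gray level image passed through three stages.
shapes = trace_shape(64, 48, num_blocks=3)
```

A 3 × 3 kernel with stride 1 and 1-pixel padding leaves the size unchanged, so only the 2 × 2 pooling with stride 2 halves each dimension; images of different sizes therefore reach the spatial pyramid pooling layer with different feature map sizes.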
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (1)

1. A malware classification method based on static analysis is characterized by comprising the following steps:
step 1: inputting a software data set to be classified, and dividing the software data set to be classified into a training set and a testing set;
step 2: converting each software sample in the training set into a binary file, wherein the conversion method is specifically as follows: the Windows executable (.exe) file to be analyzed is converted into a binary stream file in bytes format;
step 3: partitioning the binary file into 8-bit bytes and converting each byte into a gray value, the conversion scheme mapping byte values from 0 to 255, where 0 represents black and 255 represents white; then arranging the gray values sequentially into a two-dimensional gray matrix, and determining the width and the height of the two-dimensional gray matrix according to the size of the file, so that the two-dimensional gray matrix can be visualized as a gray level image;
step 4: training a convolutional neural network model with the generated gray level images to generate a static classifier; the convolutional neural network model comprises an input layer, convolutional layers, max pooling layers, a spatial pyramid pooling layer and an output layer; the gray level image is processed with small-window convolution filters; all convolutional layers use a 3 × 3 convolution kernel with the stride set to 1; 1-pixel edge padding is applied to the input feature map in each convolutional layer; max pooling uses a 2 × 2 sliding window with the stride set to 2; the last pooling layer adopts 3-level spatial pyramid pooling, which takes features of any dimension as input and produces output of a uniform dimension; the convolutional neural network uses dropout with a probability of 0.5 after each pooling layer to prevent overfitting; the network uses the Leaky ReLU activation function, weight initialization from a uniform distribution, and batch normalization;
step 5: inputting the test set of the software data set to be classified into the static classifier, judging the family to which each malicious software sample belongs according to the classification result of the classifier, and thereby completing the classification of the malicious software.
CN202010264024.3A 2020-04-07 2020-04-07 Malicious software classification method based on static analysis Pending CN111552964A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010264024.3A CN111552964A (en) 2020-04-07 2020-04-07 Malicious software classification method based on static analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010264024.3A CN111552964A (en) 2020-04-07 2020-04-07 Malicious software classification method based on static analysis

Publications (1)

Publication Number Publication Date
CN111552964A true CN111552964A (en) 2020-08-18

Family

ID=72005656

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010264024.3A Pending CN111552964A (en) 2020-04-07 2020-04-07 Malicious software classification method based on static analysis

Country Status (1)

Country Link
CN (1) CN111552964A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112347478A (en) * 2020-10-13 2021-02-09 北京天融信网络安全技术有限公司 Malicious software detection method and device
CN112884061A (en) * 2021-03-10 2021-06-01 河北师范大学 Malicious software family classification method based on parameter optimization meta-learning
CN113282925A (en) * 2021-03-30 2021-08-20 深圳融安网络科技有限公司 Malicious file detection method and device, terminal equipment and storage medium
CN113538288A (en) * 2021-07-29 2021-10-22 中移(杭州)信息技术有限公司 Network anomaly detection method and device and computer readable storage medium
CN114611102A (en) * 2022-02-23 2022-06-10 西安电子科技大学 Visual malicious software detection and classification method and system, storage medium and terminal
CN114741697A (en) * 2022-04-22 2022-07-12 中国电信股份有限公司 Malicious code classification method and device, electronic equipment and medium
CN114913384A (en) * 2022-06-24 2022-08-16 河北科技大学 Target application classification method and device and electronic equipment
CN114926680A (en) * 2022-05-13 2022-08-19 山东省计算中心(国家超级计算济南中心) Malicious software classification method and system based on AlexNet network model
CN116861431A (en) * 2023-09-05 2023-10-10 国网山东省电力公司信息通信公司 Malicious software classification method and system based on multichannel image and neural network

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10089467B1 (en) * 2017-05-23 2018-10-02 Malwarebytes Inc. Static anomaly-based detection of malware files
CN109829306A (en) * 2019-02-20 2019-05-31 哈尔滨工程大学 A kind of Malware classification method optimizing feature extraction
CN110096878A (en) * 2019-04-26 2019-08-06 武汉智美互联科技有限公司 A kind of detection method of Malware
CN110572393A (en) * 2019-09-09 2019-12-13 河南戎磐网络科技有限公司 Malicious software traffic classification method based on convolutional neural network
CN110765458A (en) * 2019-09-19 2020-02-07 浙江工业大学 Malicious software detection method and device based on deep learning
CN110826060A (en) * 2019-09-19 2020-02-21 中国科学院信息工程研究所 Visual classification method and device for malicious software of Internet of things and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Di Xue et al.: "Malware Classification Using Probability Scoring and Machine Learning", published online: HTTPS://IEEEXPLORE.IEEE.ORG/STAMP/STAMP.JSP?TP=&ARNUMBER=8758215 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112347478A (en) * 2020-10-13 2021-02-09 北京天融信网络安全技术有限公司 Malicious software detection method and device
CN112884061A (en) * 2021-03-10 2021-06-01 河北师范大学 Malicious software family classification method based on parameter optimization meta-learning
CN113282925B (en) * 2021-03-30 2023-09-05 深圳融安网络科技有限公司 Malicious file detection method, malicious file detection device, terminal equipment and storage medium
CN113282925A (en) * 2021-03-30 2021-08-20 深圳融安网络科技有限公司 Malicious file detection method and device, terminal equipment and storage medium
CN113538288A (en) * 2021-07-29 2021-10-22 中移(杭州)信息技术有限公司 Network anomaly detection method and device and computer readable storage medium
CN114611102A (en) * 2022-02-23 2022-06-10 西安电子科技大学 Visual malicious software detection and classification method and system, storage medium and terminal
CN114741697A (en) * 2022-04-22 2022-07-12 中国电信股份有限公司 Malicious code classification method and device, electronic equipment and medium
CN114741697B (en) * 2022-04-22 2023-10-13 中国电信股份有限公司 Malicious code classification method and device, electronic equipment and medium
CN114926680A (en) * 2022-05-13 2022-08-19 山东省计算中心(国家超级计算济南中心) Malicious software classification method and system based on AlexNet network model
CN114926680B (en) * 2022-05-13 2022-11-11 山东省计算中心(国家超级计算济南中心) Malicious software classification method and system based on AlexNet network model
CN114913384A (en) * 2022-06-24 2022-08-16 河北科技大学 Target application classification method and device and electronic equipment
CN116861431A (en) * 2023-09-05 2023-10-10 国网山东省电力公司信息通信公司 Malicious software classification method and system based on multichannel image and neural network
CN116861431B (en) * 2023-09-05 2023-11-21 国网山东省电力公司信息通信公司 Malicious software classification method and system based on multichannel image and neural network

Similar Documents

Publication Publication Date Title
CN111552964A (en) Malicious software classification method based on static analysis
CN110765458B (en) Malicious software image format detection method and device based on deep learning
CN113806746B (en) Malicious code detection method based on improved CNN (CNN) network
CN111914612B (en) Construction graphic primitive self-adaptive identification method based on improved convolutional neural network
CN110633570A (en) Black box attack defense method for malicious software assembly format detection model
CN111259397B (en) Malware classification method based on Markov graph and deep learning
CN110826060A (en) Visual classification method and device for malicious software of Internet of things and electronic equipment
CN110647745A (en) Detection method of malicious software assembly format based on deep learning
CN116910752B (en) Malicious code detection method based on big data
CN113221115B (en) Visual malicious software detection method based on collaborative learning
CN112465057B (en) Target detection and identification method based on deep convolutional neural network
CN112884061A (en) Malicious software family classification method based on parameter optimization meta-learning
CN111382438A (en) Malicious software detection method based on multi-scale convolutional neural network
CN114510721B (en) Static malicious code classification method based on feature fusion
CN111552965A (en) Malicious software classification method based on PE (provider edge) header visualization
CN115964710A (en) Malicious code detection method and system based on internal memory forensics and deep learning
Yoo et al. The image game: exploit kit detection based on recursive convolutional neural networks
CN114972886A (en) Image steganography analysis method
CN116010950A (en) Malicious software detection method and system based on ViT twin neural network
CN114896594A (en) Malicious code detection device and method based on image feature multi-attention learning
CN110458239A (en) Malware classification method and system based on binary channels convolutional neural networks
CN113420295A (en) Malicious software detection method and device
CN116595525A (en) Threshold mechanism malicious software detection method and system based on software map
CN115294392B (en) Visible light remote sensing image cloud removal method and system based on network model generation
CN108446558B (en) Space filling curve-based malicious code visual analysis method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200818