CN111382438B

CN111382438B - Malware detection method based on multi-scale convolutional neural network

Info

Publication number: CN111382438B
Application number: CN202010231067.1A
Authority: CN
Inventors: 白金荣; 熊倩; 秦汝霞
Original assignee: Yuxi Normal University
Current assignee: Yuxi Normal University
Priority date: 2020-03-27
Filing date: 2020-03-27
Publication date: 2024-04-23
Anticipated expiration: 2040-03-27
Also published as: CN111382438A

Abstract

The invention discloses a malware detection method based on a multi-scale convolutional neural network. First, the binary executable file of each training sample is converted into a fixed-length hexadecimal character sequence. This sequence is then converted into low-dimensional vectors through word embedding, and the resulting vectors are fed into the multi-scale convolutional neural network to train a detection model. Finally, the binary executable file of the software to be detected is converted into low-dimensional vectors by the same steps and input into the trained detection model, which classifies the software and outputs the detection result. The multi-scale convolutional neural network uses a plurality of parallel feature extraction channels, each consisting of a one-dimensional convolutional layer, a pooling layer and a first Dropout layer connected in sequence. The method addresses the low accuracy of existing malware detection methods.

Description

Malware detection method based on multi-scale convolutional neural network

Technical Field

The invention belongs to the technical field of malware protection, and relates to a malware detection method based on a multi-scale convolutional neural network.

Background

The Internet is profoundly changing how people work and live. While the world benefits greatly from the development of networks, it is also deeply affected by network attacks, and cyberspace security has become a serious challenge worldwide. According to the 2017 Internet Security Threat Report published by Symantec, the company captured 401 million new pieces of malware in 2016, and 1.09 million new pieces of malware were released onto the Internet every day. This enormous volume of malware has become the biggest security threat on the Internet and seriously affects the information security of countries around the world.

Malware refers to any software that harms the interests of its user. It is a generic term for hostile or intrusive software, including viruses, worms, Trojan horses, rootkits, backdoors, botnets, spyware and the like. Malware may affect not only the infected computer or device but also other devices that communicate with it. Accurate identification and detection of malware is therefore critical to network information security.

Signature-based methods are widely used in current malware detection systems: a signature is obtained by extracting a particular byte sequence from a binary program, which allows known malware to be detected effectively. However, these traditional methods cannot detect unknown malware families, nor new malware produced simply by packing or obfuscating known malware. Moreover, malware that uses polymorphic and metamorphic techniques keeps randomly changing the content of its binary file as it propagates, so it has no fixed characteristics and cannot be detected by signature-based methods. In addition, analysts can no longer extract virus signatures manually as fast as new malware appears. All of this poses serious challenges to malware protection. Researchers have therefore proposed a variety of malware detection methods based on data mining and machine learning, which represent executable files as features at different levels of abstraction and use these features to train classifiers that detect unknown malware automatically. Depending on how the detection process is executed, these methods are generally divided into static and dynamic methods. Static methods analyze the byte code sequences, system call functions, control flow graphs, opcode sequences, etc. of samples without running the executables. They provide a safer detection environment and faster detection, but they are vulnerable to packing and obfuscation and generally require unpacking, decryption and normalization before analysis, which lowers detection accuracy and efficiency.

Dynamic methods run a malware sample in a controlled environment (a virtual machine, an emulator, a sandbox and the like), analyze its interaction with the system, and record its system call sequence, system call parameters, executed instruction sequence, information flow and so on, in order to identify its malicious behavior. Dynamic methods can accurately identify the nature of malicious behavior and remain effective for packed, metamorphic, polymorphic and obfuscated samples. However, they generally consume more time and system resources, are strongly affected by the runtime environment, and cannot traverse all executable paths of the software, so the reliability of their detection results is low.

Disclosure of Invention

The embodiment of the invention provides a malware detection method based on a multi-scale convolutional neural network. It aims to solve two problems. First, existing static malware detection methods based on data mining and machine learning are easily affected by packing and obfuscation and require unpacking, decryption and normalization before analysis, which lowers detection accuracy and efficiency. Second, existing dynamic malware detection methods based on data mining and machine learning consume more time and system resources and cannot traverse all executable paths of the malware, which makes their detection results unreliable.

The technical solution adopted by the embodiment of the invention is a malware detection method based on a multi-scale convolutional neural network, carried out according to the following steps:

S1, converting the binary executable file of each training sample into a fixed-length hexadecimal character sequence;

S2, converting the fixed-length hexadecimal character sequence into low-dimensional vectors through word embedding;

S3, inputting the resulting low-dimensional vectors into a multi-scale convolutional neural network and training a detection model based on the multi-scale convolutional neural network;

S4, converting the binary executable file of the software to be detected into low-dimensional vectors according to steps S1-S2, inputting them into the detection model trained in step S3, classifying the software to be detected, and outputting a detection result, which is either malware or benign software;

The multi-scale convolutional neural network adopts a plurality of parallel feature extraction channels, and each feature extraction channel consists of a one-dimensional convolutional layer, a pooling layer and a first Dropout layer which are sequentially connected.

Further, step S1, converting the binary executable file of each training sample into a fixed-length hexadecimal character sequence, is implemented as follows:

First, a byte threshold is set and the binary executable file of each training sample is brought to that fixed length: for files longer than the threshold, the bytes beyond it are discarded; for files shorter than the threshold, spaces are appended at the end. As a result, the byte length of every training sample equals the byte threshold.

Then, the character of each byte of the fixed-length file is encoded by converting it into an integer index from 1 to 257, which yields the fixed-length hexadecimal character sequence.

Further, in step S2 the word embedding is performed with a word2vec model, which converts the fixed-length hexadecimal character sequence into low-dimensional vectors.

Further, the byte threshold is set to 3000.

Further, a splicing layer is provided at the outputs of the parallel feature extraction channels to concatenate the effective features extracted by each channel.

In step S4, the detection model obtained in step S3 classifies the software to be detected and outputs the detection result as follows:

First, the low-dimensional vectors are input into the detection model. The parallel one-dimensional convolutional layers simultaneously slide over them to perform convolution; the features extracted by each convolutional layer then pass through the pooling layer and the first Dropout layer of their channel, and the splicing layer concatenates them, giving the effective features of the binary executable file of the software to be detected.

Then, a fully connected layer nonlinearly combines these effective features to produce the detection result.

Further, a second Dropout layer is arranged between the splicing layer and the fully connected layer.

Further, the multi-scale convolutional neural network uses 3 parallel feature extraction channels for effective feature extraction.

Further, the one-dimensional convolutional layer of each of the 3 parallel feature extraction channels uses 56 convolution kernels; the kernel window sizes are 9, 11 and 13 respectively, and the stride is 1.

Further, the strides of the pooling layers of the 3 parallel feature extraction channels are 9, 11 and 13 respectively.

Further, the fully connected layer has 16 neurons.

Further, all pooling layers use max pooling.
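The layer sizes listed above fix the tensor shapes inside the network. The sketch below is a hypothetical shape calculation, not the authors' code. It assumes "valid" (unpadded) convolution and pooling, a pooling window equal to the pooling stride (9, 11 and 13), and that each channel's output is flattened before the splicing layer; the text states none of these three points.

```python
SEQ_LEN = 3000               # byte threshold (sequence length) from the text
FILTERS = 56                 # convolution kernels per channel
KERNEL_SIZES = (9, 11, 13)   # window sizes of the three parallel channels

def conv1d_len(n, kernel, stride=1):
    """Output length of a 1-D convolution without padding ('valid')."""
    return (n - kernel) // stride + 1

def maxpool1d_len(n, pool, stride):
    """Output length of 1-D max pooling without padding ('valid')."""
    return (n - pool) // stride + 1

def channel_shapes(seq_len=SEQ_LEN, kernel_sizes=KERNEL_SIZES, filters=FILTERS):
    """Return (kernel, conv_len, pooled_len, flattened_size) for each channel.

    Assumption: pool window == pool stride == kernel window (9, 11, 13)."""
    shapes = []
    for k in kernel_sizes:
        conv_len = conv1d_len(seq_len, k, stride=1)
        pooled_len = maxpool1d_len(conv_len, pool=k, stride=k)
        shapes.append((k, conv_len, pooled_len, pooled_len * filters))
    return shapes

if __name__ == "__main__":
    for k, conv_len, pooled_len, flat in channel_shapes():
        print(f"kernel {k}: conv {conv_len}x{FILTERS} -> pool {pooled_len}x{FILTERS} -> {flat}")
    print("spliced feature length:", sum(s[3] for s in channel_shapes()))
```

Under these assumptions the three channels give 332, 271 and 229 pooled positions with 56 features each, so the splicing layer would pass 18,592 + 15,176 + 12,824 = 46,592 values to the 16-neuron fully connected layer.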

The beneficial effects of the embodiment of the invention are as follows. The embodiment provides a malware detection method based on a multi-scale CNN. Because a convolutional neural network cannot process raw binary files and accepts only numerical features as input, the embodiment first converts the byte sequence of a binary file into a hexadecimal character sequence and then uses word embedding to map each byte character in that sequence to a fixed-length, low-dimensional vector. This turns the binary file into numerical raw features that the multi-scale convolutional neural network can process. The approach avoids the weaknesses of static detection methods based on data mining and machine learning, which are easily affected by packing and obfuscation and require unpacking, decryption and normalization before analysis, lowering accuracy and efficiency; it therefore improves the adaptability and accuracy of detection.

The multi-scale convolutional neural network learns effective feature representations directly from binary files and has strong pattern expression capability. It removes the dependence on feature engineering: it learns malware feature representations automatically, which helps uncover potential security threats, requires no domain expertise in malware detection, and avoids the laborious feature engineering of traditional machine learning methods. It also avoids the weaknesses of dynamic detection methods based on data mining and machine learning, which consume more time and system resources and cannot traverse all executable paths of the malware, making their results unreliable; this raises the malware detection rate and lowers the false alarm rate. Convolution kernels of different scales extract features at different granularities: several parallel convolutional layers with different window sizes perform convolution simultaneously and their features are combined, so that richer and more complete information at different scales in the data is learned. This improves detection accuracy, which reaches 98.18%, with a log loss of 0.1503 and an AUC of 0.997.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic diagram of the structure of a convolutional neural network.

Fig. 2 is a schematic illustration of the convolution operation of a one-dimensional convolutional neural network model.

Fig. 3 is a schematic diagram of the multi-scale convolutional neural network architecture.

Fig. 4 is a graph of the accuracy of the multi-scale convolutional neural network model as training proceeds.

Fig. 5 is a graph of the log loss of the multi-scale convolutional neural network model as training proceeds.

Fig. 6 is a schematic diagram of the confusion matrix of the multi-scale convolutional neural network model.

Fig. 7 is a schematic diagram of the normalized confusion matrix of the multi-scale convolutional neural network model.

Fig. 8 is the ROC curve of the multi-scale convolutional neural network model.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

A convolutional neural network (CNN) is a representative deep learning algorithm. It is a feed-forward neural network whose pattern of connections between neurons is inspired by the organization of the animal visual cortex; it performs representation learning through convolution and, owing to its hierarchical structure, can classify input information in a shift-invariant manner. The structure of a typical convolutional neural network is shown in Fig. 1; it mainly comprises convolutional layers, pooling layers, fully connected layers, Dropout layers and the like. The convolutional layer extracts features from the input data. It contains several convolution kernels, each with its own weights and bias. Each neuron in a convolutional layer is connected to a small region of neighboring neurons in the previous layer (its receptive field), whose size is determined by the kernel size. During operation, the kernel sweeps across the input features at regular steps, computes the dot product with the input features inside its receptive field, and adds the bias to produce the output feature map. Pooling is a form of nonlinear downsampling: the pooling layer applies a predefined pooling function (e.g., minimum, maximum or average) that replaces the value at a point of the feature map with a statistic of its neighboring region. Pooling progressively reduces the spatial size of the features, so the number of parameters and the amount of computation also decrease, which to some extent also controls overfitting.

The fully connected layer is located at the end of the convolutional neural network and nonlinearly combines the extracted features to produce the output; that is, the fully connected layer is not expected to extract features itself but to accomplish the learning objective with the high-level features already extracted. The Dropout layer randomly drops neurons (together with their connections) from the network during training, which prevents neurons from co-adapting excessively; in effect it averages the predictions of many different thinned networks. The result is as if many different neural networks were trained and their outputs averaged. Because these networks tend to overfit in different ways, the Dropout layer reduces overfitting.
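Max pooling and Dropout are simple enough to write out directly. The plain-Python sketch below is illustrative only, and the function names are hypothetical. It implements "inverted" dropout, the variant used by frameworks such as Keras and PyTorch, which rescales the kept activations during training so that nothing changes at inference time.

```python
import random

def max_pool1d(xs, pool, stride):
    """Max pooling over a 1-D sequence; windows that would run past the end are dropped."""
    return [max(xs[i:i + pool]) for i in range(0, len(xs) - pool + 1, stride)]

def dropout(xs, rate, training, rng=random):
    """Inverted dropout.

    Training: each unit is zeroed with probability `rate` and survivors are
    scaled by 1 / (1 - rate), which keeps the expected activation unchanged.
    Inference: the input is returned unchanged."""
    if not training or rate == 0.0:
        return list(xs)
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in xs]

pooled = max_pool1d([1, 3, 2, 5, 4, 0], pool=2, stride=2)   # [3, 5, 4]
dropped = dropout(pooled, rate=0.1, training=True, rng=random.Random(0))
```

Each pooling window keeps only its largest value, so a sequence of six values with window and stride 2 shrinks to three; Dropout, by contrast, never changes the length of its input.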

A one-dimensional convolutional neural network uses one-dimensional convolution kernels that scan the input sequence from beginning to end, performing convolution to find the effective features of the sequence. As shown in Fig. 2, the input sequence is a one-dimensional 7 × 1 input vector (1, 2, -1, -2, 1), which is convolved with a one-dimensional 3 × 1 kernel weight vector (1, 0, -1). The kernel has a window size of 3 and a stride of 1; it moves over the 7 × 1 input sequence performing the convolution and, after passing through the whole sequence, produces a 5 × 1 output vector (-1, 2, 1, 1, 0), since (7 - 3) / 1 + 1 = 5 window positions fit in the input.
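The text lists only five values for the 7 × 1 input, so the input in the sketch below is hypothetical: it was chosen so that sliding the kernel (1, 0, -1) without flipping it (the cross-correlation convention that CNN libraries use) reproduces the stated output (-1, 2, 1, 1, 0). It is not the actual input of Fig. 2, and `conv1d_valid` is a hypothetical helper.

```python
def conv1d_valid(x, w, stride=1):
    """Single-channel 1-D convolution as computed in CNN layers:
    cross-correlation (kernel not flipped), no padding, no bias."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k))
            for i in range(0, len(x) - k + 1, stride)]

x = [1, 3, 2, 1, 1, 0, 1]   # hypothetical 7 x 1 input
w = [1, 0, -1]              # 3 x 1 kernel from the text
print(conv1d_valid(x, w))   # [-1, 2, 1, 1, 0]
```

Each output element here is simply x[i] - x[i + 2], i.e. the kernel responds to the change across a window of three positions.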

The multi-scale CNN-based malware detection method builds on the single-scale CNN. Its framework, shown in Fig. 3, consists mainly of data preprocessing, word embedding and a multi-scale one-dimensional convolutional neural network. Data preprocessing converts the input binary executable file into numerical raw features that the convolutional neural network can process. Word embedding converts these raw numerical features into representations that carry richer semantic information and effectively reduces their dimensionality. The multi-scale one-dimensional convolutional neural network extracts high-level abstract features at different scales so that they complement and reinforce one another, yielding a detection model with strong pattern expression capability that can finally detect unknown malware. The method comprises the following steps:

S1, converting the input binary executable file into a hexadecimal character sequence through data preprocessing, specifically as follows:

Binary executable files differ in size, whereas a CNN requires fixed-size input. Therefore a byte threshold is first set and each input binary executable file is processed to that fixed length: for files longer than the threshold, the bytes beyond it are discarded; for files shorter than the threshold, spaces are appended at the end, so that every file is exactly the byte threshold long. In this embodiment the first 3000 bytes of each binary executable file are used: files longer than 3000 bytes are truncated and files shorter than 3000 bytes are padded with spaces, so that each file is 3000 bytes long.

Then, each byte of the fixed-length binary executable file is encoded over 257 possible values (the 256 byte values plus the padding space character), i.e., each byte character is converted into an integer index from 1 to 257, so that each binary executable sample becomes a fixed-length hexadecimal character sequence of length 3000. The length of 3000 was chosen through repeated experiments because it effectively preserves detection accuracy: a longer sequence can improve accuracy to some extent but lowers processing efficiency, whereas a shorter sequence is processed faster but reduces detection accuracy.
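The truncation, padding and indexing of step S1 can be sketched as follows. The text fixes the length (3000) and the index range (1 to 257) but not the exact mapping, so this sketch assumes that byte value b maps to index b + 1 and that the padding character maps to 257. `bytes_to_indices`, `file_to_indices` and `PAD_INDEX` are hypothetical names.

```python
BYTE_THRESHOLD = 3000  # byte threshold used in this embodiment
PAD_INDEX = 257        # assumption: the padding "space" gets its own index, 257

def bytes_to_indices(data, threshold=BYTE_THRESHOLD):
    """Truncate/pad raw file bytes to `threshold` and map them to indices 1..257.

    Assumed mapping: byte value b (0..255) -> b + 1; padding -> PAD_INDEX."""
    indices = [b + 1 for b in data[:threshold]]               # discard bytes past the threshold
    indices.extend([PAD_INDEX] * (threshold - len(indices)))  # pad short files at the end
    return indices

def file_to_indices(path, threshold=BYTE_THRESHOLD):
    """Read at most `threshold` bytes of an executable in binary mode and encode them."""
    with open(path, "rb") as f:
        return bytes_to_indices(f.read(threshold), threshold)
```

For example, a file that starts with the PE header bytes `b"MZ\x90\x00"` and has nothing else becomes `[78, 91, 145, 1, 257, 257, ...]`, always 3000 entries long.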

S2, the integer index sequence, i.e., the hexadecimal character sequence, carries no inherent meaning, and convolutional neural networks do not handle such discrete data well. Therefore the character of each byte in the hexadecimal character sequence is mapped to a low-dimensional vector through word embedding, forming a vector space that the convolutional neural network can process easily; in this space each byte character acquires a certain semantics, and the relationships among the original samples in the semantic space are preserved. Word embedding is a feature learning technique from natural language processing that converts a word into a fixed-length vector representation, which facilitates mathematical processing.
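Whether the vectors come from a pre-trained word2vec model (as one passage of the text states) or from an embedding layer trained together with the network (which the "Embedding output dim" entry of Table 2 may suggest), the lookup step is the same: each integer index selects one row of a table. The sketch below fills the table with small random values as stand-ins for learned vectors; the names and the unused row 0 are assumptions.

```python
import random

VOCAB_SIZE = 258  # indices 0..257; 0 is unused here (assumption), 1..257 come from S1
EMBED_DIM = 40    # "Embedding output dim" selected in Table 2

def make_embedding_table(vocab_size=VOCAB_SIZE, dim=EMBED_DIM, seed=0):
    """Lookup table with one dim-sized row per index, initialised with small
    random values; in a trained model these rows would be learned."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(vocab_size)]

def embed(indices, table):
    """Map a length-L index sequence to an L x dim list of vectors."""
    return [table[i] for i in indices]

table = make_embedding_table()
vectors = embed([78, 91, 145, 1, 257], table)   # 5 vectors of 40 floats each
```

Applied to a full 3000-index sequence, the lookup yields a 3000 × 40 matrix, which is the input over which the one-dimensional convolution kernels slide.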

S3, inputting the resulting low-dimensional vectors into the multi-scale convolutional neural network, which learns effective features of the binary executable files of the training samples, and training the detection model based on the multi-scale convolutional neural network. The multi-scale CNN uses a plurality of parallel feature extraction channels, each consisting of a one-dimensional convolutional layer, a pooling layer and a first Dropout layer connected in sequence, and a splicing layer is arranged between the parallel feature extraction channels and the fully connected layer.

The multi-scale CNN detection model architecture of this embodiment differs from a single-scale CNN detection model: instead of stacking layers one after another, each taking the output of the previous layer as its input, it uses several parallel feature extraction channels, here 3. The three one-dimensional convolutional layers operate simultaneously; each uses 56 convolution kernels, and they differ only in kernel window size, which is 9, 11 and 13 respectively. The kernels of the three layers therefore slide over the low-dimensional vectors produced by word embedding with window sizes of 9, 11 and 13 and a stride of 1 to perform convolution. To ensure detection accuracy, a pooling layer and a first Dropout layer follow each of the three parallel convolutional layers, forming a feature extraction channel. Pooling reduces the dimensionality of the effective features; as shown in Fig. 3, the pooling layers use max pooling. The first Dropout layer prevents overfitting by randomly and temporarily disconnecting some neurons in the network at each iteration. Finally, a splicing layer concatenates the features of the three channels.

S4, converting the binary executable file of the software to be detected into low-dimensional vectors by the method above, inputting them into the trained detection model based on the multi-scale convolutional neural network, classifying the software to be detected and outputting the detection result, specifically as follows:

First, the low-dimensional vectors are input into the detection model. The parallel one-dimensional convolutional layers simultaneously slide over them to perform convolution, and the features extracted by each convolutional layer pass in turn through its pooling layer and first Dropout layer before the splicing layer concatenates them, yielding the effective features of the binary executable file of the software to be detected;

Then, a fully connected layer nonlinearly combines the extracted effective features to obtain the detection result, which is either malware or benign software. A second Dropout layer is arranged between the splicing layer and the fully connected layer to prevent overfitting.
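The text specifies a 16-neuron fully connected layer but not its activation function or the form of the output layer. The sketch below assumes a ReLU hidden layer followed by a single sigmoid output unit (for two classes this is equivalent to a two-unit softmax, since softmax over two logits equals the sigmoid of their difference). All names and the toy weights are hypothetical; the Dropout layers are omitted because they are inactive at inference time.

```python
import math

def dense(x, weights, bias, activation):
    """Fully connected layer: activation(W x + b), one weight row per neuron."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def classify(features, hidden_w, hidden_b, out_w, out_b, threshold=0.5):
    """Hypothetical head: spliced features -> 16-neuron ReLU layer -> one sigmoid unit."""
    hidden = dense(features, hidden_w, hidden_b, relu)
    p_malware = dense(hidden, [out_w], [out_b], sigmoid)[0]
    return ("malware" if p_malware >= threshold else "benign"), p_malware

# Toy weights only, to show the data flow (2 input features, 16 hidden neurons).
label, p = classify([1.0, -1.0],
                    hidden_w=[[0.5, -0.5]] * 16, hidden_b=[0.0] * 16,
                    out_w=[0.1] * 16, out_b=-1.0)
```

With these toy weights every hidden neuron outputs 1.0, the output logit is 0.6, and the sample is labelled malware with probability of about 0.646. In the real model the input would be the spliced feature vector and the weights would be learned.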

Experimental results and analysis

(1) Software samples:

The experimental evaluation used malware and benign software samples from different periods: 7871 benign samples and 8269 malware samples. Of the malware samples, 4103 were discovered before 2011 and 4166 were discovered in recent years. Of the benign samples, 3918 were collected from a freshly installed Windows XP SP3 system and 3953 from a freshly installed 32-bit Windows 7 Professional system. All malware samples were collected from the VXHeavens website, and all samples are in Windows PE format. The composition of the dataset is shown in Table 1.

Table 1 Software sample statistics

Category	Malware samples	Benign software samples
Early samples	4103	3918
Recent samples	4166	3953
Total	8269	7871

(2) Evaluation metrics and method:

Classification performance is evaluated mainly with two metrics: accuracy and log loss. Accuracy is the proportion of correctly predicted samples among all predictions; on its own it is often insufficient to evaluate the robustness of the predictions, so log loss is used as well. Log loss (logarithmic loss), also known as cross-entropy loss, is defined on probability estimates and measures the gap between the predicted and true classes. Minimizing log loss is broadly equivalent to maximizing classifier accuracy, and a perfect classifier has a log loss of 0. Log loss is calculated as follows:

$$
L(Y, P(Y \mid X)) = -\log P(Y \mid X) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}\log p_{ij}
$$

where Y is the output variable, i.e., the detection result for the software to be detected; X is the input variable, i.e., the binary executable file of the software to be detected; L is the loss function; N is the number of test samples (binary executable files of the software to be detected); y_ij is a binary indicator equal to 1 if the i-th test sample belongs to class j (benign software or malware) and 0 otherwise; p_ij is the predicted probability that the i-th test sample belongs to class j; and M is the total number of classes, with M = 2 in this embodiment.
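Because y_ij is one-hot, only the true-class term of each row contributes to the double sum. The sketch below computes the formula under that observation; the function name is hypothetical, and clipping probabilities away from 0 and 1 is a common safeguard against log(0), not something the text specifies.

```python
import math

def log_loss(y_true, probs, eps=1e-15):
    """Mean cross-entropy: -(1/N) * sum_i sum_j y_ij * log(p_ij).

    y_true: class index per sample (e.g. 0 = benign, 1 = malware);
    probs:  one row of M predicted class probabilities per sample."""
    total = 0.0
    for label, row in zip(y_true, probs):
        p = min(max(row[label], eps), 1.0 - eps)  # clip so log(0) cannot occur
        total += math.log(p)
    return -total / len(y_true)

# Two samples: malware predicted at 0.9, benign predicted at 0.8.
print(round(log_loss([1, 0], [[0.1, 0.9], [0.8, 0.2]]), 4))   # 0.1643
```

The example gives -(ln 0.9 + ln 0.8) / 2 ≈ 0.1643; confident wrong predictions are penalized heavily, which is why log loss complements accuracy.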

Classifier performance can also be evaluated with the ROC (Receiver Operating Characteristic) curve, whose vertical axis is the detection rate (true positive rate) and whose horizontal axis is the false positive rate; it shows how the detection rate and the false positive rate trade off as the detection threshold changes. The area under the ROC curve (AUC) is a comprehensive summary metric for comparing classifiers. AUC typically lies between 0.5 and 1.0, and a larger AUC generally indicates better classifier performance.
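ROC points and AUC can be computed directly from true labels and predicted scores. The sketch below uses the equivalence between AUC and the probability that a randomly chosen positive sample is scored above a randomly chosen negative one (the Mann-Whitney statistic). The function names and toy scores are hypothetical.

```python
def tpr_fpr(labels, scores, threshold):
    """Detection rate and false positive rate when score >= threshold means 'malware'."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    return tp / pos, fp / neg

def roc_auc(labels, scores):
    """AUC as the probability that a random malware sample (label 1) scores higher
    than a random benign sample (label 0); ties count one half. O(P*N), for illustration."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(tpr_fpr(labels, scores, 0.5))   # (0.5, 0.0)
print(roc_auc(labels, scores))        # 0.75
```

Sweeping `threshold` over all distinct scores and plotting each (false positive rate, detection rate) pair traces the ROC curve; the AUC value summarizes that whole curve in one number.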

(3) Hyperparameter tuning:

In a machine learning model, the parameters that must be chosen manually are called hyperparameters. CNN performance is strongly affected by hyperparameters, and poorly chosen hyperparameters can cause underfitting or overfitting. GridSearchCV is a common tool in the scikit-learn (sklearn) library for searching for a model's optimal parameters; its name combines grid search and cross-validation. GridSearchCV tries the parameter values within the specified ranges in turn, trains the learner with each combination using cross-validation, and selects the combination with the highest accuracy on the validation sets. In this embodiment, GridSearchCV is used to search for and tune the hyperparameters of the convolutional neural network; the results are shown in Table 2.
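GridSearchCV itself works with estimators that follow the scikit-learn interface (its `param_grid` and `cv` arguments set the grid and the number of folds), so a Keras CNN needs a scikit-learn-compatible wrapper, which is not shown here. The plain-Python sketch below only illustrates the procedure: exhaustive search over a grid, scoring each combination by its mean score over k contiguous folds. For classifiers, GridSearchCV actually defaults to stratified folds. `toy_evaluate` is a hypothetical stand-in for "train the CNN and measure validation accuracy".

```python
from itertools import product

def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) for k contiguous folds over range(n);
    the first n % k folds get one extra sample."""
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

def grid_search_cv(param_grid, n_samples, k, evaluate):
    """Score every combination in `param_grid` by its mean validation score over
    k folds and return (best_params, best_score).

    `evaluate(params, train_idx, val_idx)` is a caller-supplied function that
    trains a model with `params` on train_idx and returns a score on val_idx."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [evaluate(params, tr, va) for tr, va in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

def toy_evaluate(params, train_idx, val_idx):
    """Hypothetical score that peaks at batch_size=40, dropout=0.1."""
    return -abs(params["batch_size"] - 40) - abs(params["dropout"] - 0.1)

grid = {"batch_size": [10, 20, 40, 60], "dropout": [0.1, 0.3, 0.5]}
print(grid_search_cv(grid, n_samples=100, k=10, evaluate=toy_evaluate))
# ({'batch_size': 40, 'dropout': 0.1}, 0.0)
```

The cost grows with the product of all option counts times the number of folds, because every combination is trained k times.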

Table 2 Multi-scale CNN hyperparameter tuning results

Hyperparameter	Candidate values	Selected value
Embedding output dim	{16, 24, 32, 40, 48, 56, 64}	40
kernel_size of convolutional layer 1	{7, 9, 11, 13, 15, 17}	9
kernel_size of convolutional layer 2	{7, 9, 11, 13, 15, 17}	11
kernel_size of convolutional layer 3	{7, 9, 11, 13, 15, 17}	13
Number of filters in each of the 3 convolutional layers	{8, 16, 24, 36, 48, 56}	56
Number of neurons in the fully connected layer	{16, 32, 64, 96, 128, 160, 192, 224, 256, 288}	16
Dropout rate	{0.1, 0.2, 0.3, 0.4, 0.5}	0.1
Optimizer	{SGD, RMSprop, Adagrad, Adam}	RMSprop
batch_size	{10, 20, 40, 60, 80, 100}	40
epochs	{10, 15, 20, 25, 30}	20

(4) Experimental results and analysis:

Training of the multi-scale CNN model is based on gradient descent: the direction in which the loss decreases fastest is found, and the weights are updated iteratively along that direction to reach a local optimum quickly. One epoch is one pass over all samples in the training set, and the epoch value is the total number of such passes. The epoch value therefore determines how many times the weights of the convolutional neural network are updated.
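To make the link between epochs and weight updates concrete, the arithmetic below counts mini-batch updates for an 80% training split of the Table 1 samples, using the batch size (40) and epoch value (20) selected in Table 2. It assumes one weight update per mini-batch, a smaller final batch that is kept, and rounding the split down; these are assumptions about the training setup, not statements from the text.

```python
import math

def updates_per_epoch(n_train, batch_size):
    """Mini-batch gradient steps in one epoch; a final partial batch still counts."""
    return math.ceil(n_train / batch_size)

n_total = 8269 + 7871               # malware + benign samples (Table 1)
n_train = n_total * 8 // 10         # 80% training split, rounded down (assumption)
steps = updates_per_epoch(n_train, batch_size=40)
print(n_total, n_train, steps, steps * 20)   # 16140 12912 323 6460
```

Under these assumptions each epoch performs 323 weight updates, so 20 epochs amount to 6460 updates; doubling the epoch value would double the number of updates.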

The experiment used 80% of the samples for training and 20% for validation, and trained for 40 epochs to find a suitable epoch value. The accuracy curve of the multi-scale CNN model as the number of epochs increases is shown in Fig. 4, and its log loss curve in Fig. 5. As Figs. 4 and 5 show, as the epoch value increases from 0 to 5, the training and validation accuracy of the model rise rapidly and the training and validation log loss fall rapidly. From 5 to 40 epochs, the training and validation accuracy and the training log loss remain essentially unchanged, while the validation log loss keeps fluctuating with an upward trend. Considering the accuracy and log loss curves of Figs. 4 and 5 together, the optimal epoch value was chosen to be 20.

After the number of training epochs was fixed at 20, a ten-fold cross-validation experiment was performed. In this experiment, the multi-scale CNN method proposed in this embodiment achieved a 10-fold cross-validation accuracy of 98.18% and a log loss of 0.1503; the confusion matrix is shown in fig. 6 and the normalized confusion matrix in fig. 7. As fig. 6 and fig. 7 show, the malware detection method provided by the embodiment of the invention achieves good results with high classification accuracy.
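The evaluation quantities reported above (ten-fold cross-validation, accuracy, log loss, confusion matrix and normalized confusion matrix) can be computed as in the sketch below. It is a minimal illustration, not the embodiment's evaluation code: it assumes label 0 = benign and 1 = malware, a 0.5 decision threshold, and row normalization by true class for the normalized matrix; the toy labels and probabilities are hypothetical.

```python
import math
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k folds of near-equal size after one shuffle."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)   # first `extra` folds get one more sample
        test = order[start:start + size]
        train = order[:start] + order[start + size:]
        yield train, test
        start += size

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def log_loss(y_true, p_malware, eps=1e-15):
    """Mean binary cross-entropy; p_malware[i] is the predicted probability of label 1."""
    total = 0.0
    for t, p in zip(y_true, p_malware):
        p = min(max(p, eps), 1 - eps)              # clip so log() never sees 0
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)

def confusion_matrix(y_true, y_pred):
    """2x2 counts: rows = true label (0 benign, 1 malware), columns = predicted label."""
    m = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

def normalize_rows(m):
    """Divide each row by its total, giving per-class rates."""
    return [[v / sum(row) if sum(row) else 0.0 for v in row] for row in m]

y_true = [0, 0, 1, 1, 1]
p_malware = [0.1, 0.6, 0.8, 0.9, 0.3]
y_pred = [1 if p >= 0.5 else 0 for p in p_malware]   # 0.5 threshold (assumed)
print(accuracy(y_true, y_pred))          # 0.6
print(confusion_matrix(y_true, y_pred))  # [[1, 1], [1, 2]]
```

In a ten-fold run, each `(train, test)` pair trains a fresh model on `train` and scores it on `test`; whether the reported figures are averaged per fold or pooled over all folds is not stated in this excerpt.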

The ROC curve of the multi-scale CNN-based malware detection model is shown in fig. 8; it reflects how the detection rate (true positive rate) and the false alarm rate (false positive rate) change as the detection threshold varies. The point (0, 1), with a false alarm rate of 0 and a detection rate of 1, represents a perfect classifier that classifies all samples correctly. The closer the ROC curve is to the upper left corner, the better the classifier performs. As fig. 8 shows, the ROC curve of the model lies very close to the upper left corner, indicating good performance. The AUC of the multi-scale CNN-based malware detection model is 0.997, very close to the maximum possible AUC of 1.
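The ROC curve and AUC described above can be computed from predicted scores as in the sketch below. It is a generic illustration rather than the embodiment's code: the threshold is swept from the highest score to the lowest, each distinct score adds one (false alarm rate, detection rate) point, tied scores are handled as one step, and the area is integrated with the trapezoidal rule. Both classes must be present in `y_true`.

```python
def roc_curve(y_true, scores):
    """Return ROC points (FPR, TPR), sweeping the threshold from high to low scores."""
    pairs = sorted(zip(scores, y_true), key=lambda s: -s[0])
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # consume all samples tied at this score so ties form a single step
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# A perfect ranking passes through (0, 1) and reaches AUC = 1.
perfect = roc_curve([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
print(perfect)        # [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0), (0.5, 1.0), (1.0, 1.0)]
print(auc(perfect))   # 1.0
```

One mis-ranked pair, for example scores `[0.1, 0.4, 0.35, 0.8]` for labels `[0, 0, 1, 1]`, lowers the AUC to 0.75.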

(5) Comparison of experimental results:

To evaluate the proposed method comprehensively, it was compared with classical detection methods; the results are shown in table 3. As table 3 shows, the multi-scale CNN-based malware detection method provided by the embodiment of the invention achieves higher accuracy than the PE format structure, DLL and API, and single-scale CNN methods, and an AUC at least as high as every compared method; it is slightly weaker than the byte sequence 3-gram method in accuracy, and its log loss is higher than that of every compared method except DLL and API. The byte sequence 3-gram method must traverse the entire executable file to extract features, is strongly affected by the window size, takes a great deal of time to select high-accuracy feature values, and needs feature selection or dimensionality reduction to shrink the feature vector. Moreover, because table 3 compares 10-fold cross-validation results, it is difficult for the byte sequence 3-gram method to restrict feature extraction and feature selection to the training data of each fold during feature engineering; its features were extracted and selected on the entire data set, so its reported results are somewhat better than a truly held-out evaluation would give. The multi-scale CNN-based method provided by the embodiment of the invention is an end-to-end detection method; compared with the three feature-based methods (PE format structure, DLL and API, and byte sequence 3-grams), it avoids a complicated feature engineering process. Compared with the single-scale CNN-based method, the multi-scale CNN-based method improves accuracy (98.18% vs. 97.19%) and AUC (0.997 vs. 0.996), although its log loss is higher (0.1503 vs. 0.1165).
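The feature-engineering burden of the byte sequence 3-gram baseline, and the leakage issue raised above, can be made concrete with the sketch below. It is an illustration under assumptions: the selection criterion (document frequency over training files) and the binary presence vector are hypothetical choices, since the compared method's exact criterion is not given here. The key point is that `select_top_ngrams` sees only training files, so the held-out fold does not influence which features are kept.

```python
from collections import Counter

def byte_ngrams(data: bytes, n=3):
    """Count every overlapping n-byte window; this needs a pass over the whole file."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))

def select_top_ngrams(train_files, k, n=3):
    """Keep the k n-grams that occur in the most TRAINING files (document frequency).

    Fitting the selection on training files only keeps the held-out fold unseen;
    selecting on the full data set would leak test information into the features.
    """
    doc_freq = Counter()
    for data in train_files:
        doc_freq.update(set(byte_ngrams(data, n)))
    return [gram for gram, _ in doc_freq.most_common(k)]

def to_feature_vector(data: bytes, vocabulary, n=3):
    """Binary presence vector over the selected n-grams."""
    present = set(byte_ngrams(data, n))
    return [1 if gram in present else 0 for gram in vocabulary]

# Hypothetical toy "files"; b"MZ\x90" appears in two of the three training files.
train_files = [b"MZ\x90\x00", b"MZ\x90\x01", b"ABCD"]
vocab = select_top_ngrams(train_files, k=1)
print(vocab)                                  # [b'MZ\x90']
print(to_feature_vector(b"xMZ\x90", vocab))   # [1]
```

In a 10-fold comparison, `select_top_ngrams` would be re-run inside every fold on that fold's training files, which is the step the text describes as costly.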

Table 3 Comparison of experimental results for different detection methods

Detection method	Accuracy (%)	Log loss	AUC
PE format structure	96.84	0.1049	0.994
DLL and API	96.08	0.1638	0.991
Byte sequence 3-grams	98.8	0.0701	0.997
Single-scale CNN	97.19	0.1165	0.996
Multi-scale CNN	98.18	0.1503	0.997

A single-scale convolution kernel extracts features at only one scale and ignores features at other granularities, so the resulting feature representation is incomplete. This embodiment provides a multi-scale CNN-based malware detection method that learns effective feature representations directly from binary executable files through a multi-scale convolutional neural network: convolution kernels of different sizes extract features at different granularities by convolving the same data with different window sizes simultaneously, and the resulting features are then combined so that richer and more complete multi-scale feature information in the data is learned, which improves the accuracy of malware detection. The proposed method is conceptually sound and performs well experimentally: its accuracy is 98.18%, its log loss is 0.1503 and its AUC is 0.997, and its accuracy and AUC exceed or match those of most of the classical detection methods compared, indicating that it is a malware detection method with good detection performance.
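The multi-scale idea (the same input convolved by branches with different window sizes, each branch pooled, then the branch outputs concatenated) can be shown on a toy scalar sequence. This pure-Python sketch simplifies heavily and its choices are assumptions: one filter per branch instead of 56, a single input channel instead of word2vec embeddings, no activation function (none is specified in this excerpt), unpadded convolution, and a pooling window equal to the pooling stride, which the claims set equal to the kernel width.

```python
def conv1d_valid(seq, kernel):
    """1-D convolution as used in CNN layers (cross-correlation), stride 1, no padding."""
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k)) for i in range(len(seq) - k + 1)]

def max_pool1d(seq, size):
    """Non-overlapping max pooling: window = stride = size; an incomplete tail is dropped."""
    return [max(seq[i:i + size]) for i in range(0, len(seq) - size + 1, size)]

def multi_scale_features(seq, kernels):
    """Apply parallel branches with different kernel widths to the SAME input,
    max-pool each branch with a window equal to its kernel width, and concatenate."""
    features = []
    for kernel in kernels:
        branch = conv1d_valid(seq, kernel)
        features.extend(max_pool1d(branch, len(kernel)))
    return features

seq = list(range(10))                       # toy input 0, 1, ..., 9
kernels = [[1, 1, 1], [1, 1, 1, 1, 1]]      # widths 3 and 5 (toy stand-ins for 9, 11, 13)
print(multi_scale_features(seq, kernels))   # [9, 18, 30]
```

The width-3 branch contributes two pooled values and the width-5 branch one; concatenating them gives a single feature vector that mixes both receptive-field sizes, which a single-scale network cannot do.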

The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A malware detection method based on a multi-scale convolutional neural network, characterized by comprising the following steps:

Then, encoding each byte character of the fixed-length binary executable file of the training sample by converting it into an integer index from 1 to 257, thereby obtaining a fixed-length hexadecimal character sequence;

a word2vec model is used for the word embedding;

the byte threshold is set to 3000;

Step S4: converting the binary executable file of the software to be detected into a low-dimensional vector according to steps S1 to S2, inputting the low-dimensional vector into the multi-scale convolutional neural network detection model trained in step S3, classifying the software to be detected, and outputting a detection result, which specifically comprises the following steps:

first, inputting the converted low-dimensional vector into the multi-scale convolutional neural network detection model, in which 3 parallel one-dimensional convolutional layers slide simultaneously over the low-dimensional vector to perform convolution; the features extracted by the 3 parallel convolutional layers pass in turn through a pooling layer and a first Dropout layer and are then concatenated by a splicing layer, yielding the effective features of the binary executable file of the software to be detected; each one-dimensional convolutional layer uses 56 convolution kernels, the convolution kernel windows of the three layers are 9, 11 and 13 respectively, and the stride is 1; the pooling layers use max pooling with strides of 9, 11 and 13 respectively;

then, using a fully connected layer to nonlinearly combine the extracted effective features of the binary executable file of the software to be detected to obtain the detection result; a second Dropout layer is arranged between the splicing layer and the fully connected layer; the fully connected layer has 16 neurons.
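The input encoding and the tensor shapes implied by the claim can be checked with simple arithmetic. This sketch rests on several assumptions not stated in the claim: the 257th token is taken to be "??" (the claim only gives the index range 1 to 257), index 0 is used for padding, convolution and pooling are unpadded ("valid"), the pooling window equals its stride, and each branch is flattened before concatenation. The word2vec embedding dimension is not given in this excerpt; it adds a channel axis but does not change the sequence lengths computed here. Dropout layers do not change shapes.

```python
# 256 two-digit hex tokens "00".."FF" plus one extra token. The claim only says the
# indices run from 1 to 257; treating "??" as the 257th token is an assumption.
HEX_TOKENS = [f"{b:02X}" for b in range(256)] + ["??"]
TOKEN_TO_INDEX = {tok: i + 1 for i, tok in enumerate(HEX_TOKENS)}  # 1..257; 0 = padding (assumed)
BYTE_THRESHOLD = 3000                                              # byte threshold from the claim

def encode_tokens(tokens, length=BYTE_THRESHOLD):
    """Map hex tokens to integer indices, truncating or zero-padding to `length`."""
    indices = [TOKEN_TO_INDEX[tok] for tok in tokens[:length]]
    return indices + [0] * (length - len(indices))

def valid_len(n, window, stride):
    """Output length of an unpadded 1-D convolution or pooling window."""
    return (n - window) // stride + 1

def branch_shape(n, kernel, filters=56):
    """(length, channels) after conv (stride 1) and max pooling (pool = stride = kernel)."""
    conv = valid_len(n, kernel, 1)
    pooled = valid_len(conv, kernel, kernel)
    return pooled, filters

shapes = [branch_shape(BYTE_THRESHOLD, k) for k in (9, 11, 13)]
flat = sum(length * channels for length, channels in shapes)  # flatten + concatenate (assumed)
print(shapes)  # [(332, 56), (271, 56), (229, 56)]
print(flat)    # 46592
```

Under these assumptions, the 16-neuron fully connected layer would receive 46,592 concatenated features; a different padding or splicing choice would change that number but not the structure of the computation.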