CN111382438A

CN111382438A - Malicious software detection method based on multi-scale convolutional neural network

Info

Publication number: CN111382438A
Application number: CN202010231067.1A
Authority: CN
Inventors: 白金荣; 熊倩; 秦汝霞
Original assignee: Yuxi Normal University
Current assignee: Yuxi Normal University
Priority date: 2020-03-27
Filing date: 2020-03-27
Publication date: 2020-07-07
Anticipated expiration: 2040-03-27
Also published as: CN111382438B

Abstract

The invention discloses a malicious software detection method based on a multi-scale convolutional neural network, which comprises the steps of converting a binary executable file of a training sample into a hexadecimal character sequence with fixed length, converting the hexadecimal character sequence with the fixed length into a low-dimensional vector through word embedding, inputting the low-dimensional vector generated by conversion into the multi-scale convolutional neural network, training a detection model based on the multi-scale convolutional neural network, converting the binary executable file of software to be detected into the low-dimensional vector according to the steps, inputting the low-dimensional vector into the detection model based on the multi-scale convolutional neural network obtained by training, classifying the software to be detected and outputting a detection result; the multi-scale convolutional neural network adopts a plurality of parallel feature extraction channels, and each feature extraction channel consists of a one-dimensional convolutional layer, a pooling layer and a first Dropout layer which are connected in sequence. The problem of low accuracy of the existing malicious software detection method is solved.

Description

Malicious software detection method based on multi-scale convolutional neural network

Technical Field

The invention belongs to the technical field of malicious software protection, and relates to a malicious software detection method based on a multi-scale convolutional neural network.

Background

The internet is changing how people work and live, but while the world benefits from network development it is also under constant network attack, and cyberspace security has become a serious challenge. According to the 2017 Internet Security Threat Report issued by Symantec, Symantec captured a total of 401 million new malware samples in 2016, an average of about 1.09 million new malware samples released to the internet each day. Malware on this scale has become the greatest security threat to the internet and seriously affects the information security of countries around the world.

Malware refers to any software that can damage the interests of users. It is a generic term for various hostile or intrusive software, including viruses, worms, trojans, rootkits, backdoors, botnets, spyware, and the like. Malware may affect not only infected computers or devices but also other devices that communicate with them. Accurate identification and detection of malware is therefore critical to network information security.

Signature-based methods are widely used in current malware detection systems. A signature is obtained by extracting a characteristic byte sequence from a binary program, which allows known malware to be detected effectively. However, this conventional approach cannot identify unknown malware types or new malware produced by simply packing or obfuscating known malware. Malware that uses polymorphic and metamorphic techniques also changes the content of its binary file continuously and randomly as it spreads, so it has no fixed signature and cannot be detected by a signature-based method. In addition, analysts cannot manually extract virus signatures fast enough to keep pace with the growth of malware, which poses a serious challenge to malware protection. Researchers have therefore proposed many malware detection methods based on data mining and machine learning, which represent executable files as features at different levels of abstraction and use these features to train classifiers for intelligent detection of unknown malware. Depending on how the detection process is executed, these methods are generally classified as static or dynamic. A static method does not need to run the executable sample; it directly analyzes the sample's byte code sequences, system call functions, control flow graphs, opcode sequences and the like. Static methods provide a safer detection environment and faster detection, but they are easily affected by packing and obfuscation, and samples generally have to be unpacked, decrypted and normalized before analysis, so detection accuracy and efficiency are low.

A dynamic method runs a malware sample in a controlled environment (a virtual machine, an emulator, a sandbox or the like), analyzes its interaction with the system, and records its system call sequence, system call parameters, executed instruction sequence, information flow and the like in order to identify its malicious behavior. Dynamic methods can accurately identify the nature of malicious behavior and remain effective for packed, metamorphic, polymorphic and obfuscated samples. However, dynamic detection usually consumes more time and system resources, is strongly influenced by the execution environment, and cannot traverse all executable paths of the software, so the reliability of its results is low.

Disclosure of Invention

The embodiment of the invention aims to provide a malicious software detection method based on a multi-scale convolutional neural network, in order to solve two problems. First, existing static malware detection methods based on data mining and machine learning are easily affected by packing and obfuscation techniques, and their detection accuracy and efficiency are low because unpacking, decryption and normalization are required before analysis. Second, existing dynamic malware detection methods based on data mining and machine learning consume more time and system resources and cannot traverse all executable paths of the malware, so their detection results have low reliability.

The technical scheme adopted by the embodiment of the invention is that the malicious software detection method based on the multi-scale convolutional neural network is carried out according to the following steps:

step S1, converting the binary executable file of the training sample into a hexadecimal character sequence with fixed length;

step S2, converting the fixed length hexadecimal character sequence into a low-dimensional vector through word embedding;

step S3, inputting the low-dimensional vector generated by conversion into a multi-scale convolution neural network, and training a detection model based on the multi-scale convolution neural network;

step S4, converting the binary executable file of the software to be detected into a low-dimensional vector according to the steps S1-S2, inputting the low-dimensional vector into the multi-scale convolutional neural network-based detection model obtained by training in the step S3, classifying the software to be detected and outputting a detection result, wherein the detection result is malicious software or benign software;

the multi-scale convolutional neural network adopts a plurality of parallel feature extraction channels, and each feature extraction channel consists of a one-dimensional convolutional layer, a pooling layer and a first Dropout layer which are connected in sequence.

Further, the specific implementation process of converting the binary executable file of the training sample into the hexadecimal character sequence with fixed length in step S1 is as follows:

firstly, setting a byte threshold and processing the binary executable file of each training sample into a fixed-length file according to that threshold: for a file whose byte length is larger than the byte threshold, the bytes beyond the threshold are discarded, and for a file whose byte length is smaller than the byte threshold, padding characters (blanks) are appended until the byte length reaches the threshold, so that the byte length of the binary executable file of every training sample equals the byte threshold;

then, encoding each byte of the fixed-length binary executable file of the training sample by converting it into an integer index from 1 to 257, to obtain a hexadecimal character sequence with fixed length.

Further, in step S2, a word2vec model is adopted to convert the hexadecimal character sequence with fixed length into a low-dimensional vector through word embedding;

the byte threshold is set to 3000.

Furthermore, the output ends of the parallel feature extraction channels are provided with a splicing layer for splicing the effective features extracted by the parallel feature extraction channels;

the step S4 adopts the multi-scale convolutional neural network-based detection model obtained in the step S3, and the specific implementation process of classifying the software to be detected and outputting the detection result is as follows:

firstly, inputting a low-dimensional vector generated by conversion into a detection model based on a multi-scale convolutional neural network, simultaneously sliding a plurality of parallel one-dimensional convolutional layers on the low-dimensional vector for convolution operation, finally performing feature splicing on the features extracted by the plurality of parallel convolutional layers after sequentially passing through a pooling layer, a first Dropout layer and a splicing layer, and extracting to obtain effective features of a binary executable file of software to be detected;

and then, carrying out nonlinear combination on the extracted effective characteristics of the binary executable file of the software to be detected by using the full connection layer, and obtaining a detection result.

Further, a second Dropout layer is arranged between the full connection layer and the splicing layer.

Furthermore, the multi-scale convolution neural network adopts 3 parallel feature extraction channels to extract effective features.

Furthermore, the one-dimensional convolutional layers of the 3 parallel feature extraction channels each use 56 convolution kernels, the window sizes of the convolution kernels are 9, 11 and 13 respectively, and the stride is 1.

Further, the strides of the pooling layers of the 3 parallel feature extraction channels are 9, 11 and 13 respectively.

Further, the full connection layer is provided with 16 neurons.

Furthermore, the pooling layers all use max pooling.

The embodiment of the invention has the following beneficial effects. A convolutional neural network cannot process a raw binary file directly and accepts only numerical features as input. The embodiment therefore first converts the byte sequence of the binary file into a hexadecimal character sequence and then uses word embedding to convert each byte character of that sequence into a low-dimensional vector of fixed length, turning the binary file into numerical raw features that the multi-scale convolutional neural network can process. These fixed-length low-dimensional vectors are input into the multi-scale convolutional neural network, which learns effective feature representations directly from the binary executable file without unpacking, decryption or normalization. This solves the problems that static malware detection methods based on data mining and machine learning are easily affected by packing and obfuscation techniques and have low detection accuracy and efficiency because unpacking, decryption and normalization are required before analysis, and it effectively improves the adaptability and accuracy of the detection method.
The multi-scale convolutional neural network learns effective feature representations directly from the binary file and has strong pattern expression capability. It removes the dependence on feature engineering, learns feature representations of malware automatically, helps to uncover potential security threats, requires no expert knowledge of malware detection, and avoids the tedious feature engineering of traditional machine learning methods. It also solves the problems that existing dynamic detection methods based on data mining and machine learning consume more time and system resources and cannot traverse all executable paths of the malware, which makes their detection results unreliable; it improves the malware detection rate and reduces the false alarm rate. Convolution kernels of different scales extract features at different levels of precision. By performing convolutions with different window sizes in several parallel convolutional layers and then combining the resulting features, the network learns richer and more complete feature information at different scales in the data, which improves the accuracy of malware detection: the detection accuracy reaches 98.18%, the log loss is 0.1503, and the AUC value is 0.997.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a schematic diagram of a convolutional neural network structure.

FIG. 2 is a schematic diagram of the convolution operation of a one-dimensional convolutional neural network model.

FIG. 3 is a schematic diagram of a multi-scale convolutional neural network architecture.

FIG. 4 is a graph of accuracy change for a multi-scale convolutional neural network model.

FIG. 5 is a graph of log loss variation for a multi-scale convolutional neural network model.

FIG. 6 is a schematic diagram of a confusion matrix for a multi-scale convolutional neural network model.

FIG. 7 is a diagram of a normalized confusion matrix for a multi-scale convolutional neural network model.

FIG. 8 is a ROC plot for a multi-scale convolutional neural network model.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The Convolutional Neural Network (CNN) is a representative deep learning algorithm and a type of feed-forward neural network. The connection pattern between its neurons is inspired by the organization of the animal visual cortex; it performs representation learning through convolution operations and can classify input information in a shift-invariant manner according to its hierarchical structure. The structure of a conventional convolutional neural network is shown in FIG. 1 and mainly comprises convolutional layers, pooling layers, a fully-connected layer, Dropout layers and the like. The convolutional layer extracts features from the input data. It contains several convolution kernels, each with its own weight coefficients and a bias. Each neuron in the convolutional layer is connected to several neurons in a nearby region of the previous layer, and the size of that region depends on the size of the convolution kernel. In operation, the convolution kernel sweeps across the input features at regular intervals, computes the dot product with the input features inside its receptive field, and adds the bias to produce the output matrix. Pooling is a form of non-linear down-sampling. The pooling layer contains a preset pooling function (for example minimum, maximum or mean) that replaces the value at a single point of the feature map with a statistic of its neighboring region. The pooling layer progressively reduces the spatial dimension of the features, so the number of parameters and the amount of computation also decrease, which helps to control overfitting to some extent.
The fully-connected layer is located at the end of the convolutional neural network. Its function is to combine the extracted features non-linearly to produce the output; that is, the fully-connected layer is not itself expected to extract features but tries to accomplish the learning objective using the existing high-order features. The Dropout layer randomly discards neurons (and their connections) from the neural network during training to prevent the neurons from co-adapting too much, which approximates averaging the predictions of many different networks. The result of the Dropout layer is as if many different neural networks had been trained and their effects then averaged. Since these networks may overfit in different ways, the net effect of the Dropout layer is to reduce overfitting.
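By way of illustration only, the pooling and Dropout operations described above can be sketched in a few lines of Python. The function names, the toy feature map, and the use of "inverted" dropout (rescaling the surviving values, as many frameworks do) are assumptions made for this sketch and are not specified by the embodiment.

```python
import random


def max_pool1d(values, pool_size, stride):
    """Replace each window of `pool_size` values with its maximum."""
    return [
        max(values[start:start + pool_size])
        for start in range(0, len(values) - pool_size + 1, stride)
    ]


def dropout(values, rate, rng):
    """Inverted dropout (an assumption): zero each value with probability
    `rate` and rescale the survivors by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


feature_map = [1, 3, 2, 5, 4, 0, 7, 6]                    # toy feature map
pooled = max_pool1d(feature_map, pool_size=2, stride=2)   # [3, 5, 4, 7]
dropped = dropout(pooled, rate=0.5, rng=random.Random(0))
```

Dropout is applied only during training; at inference time the pooled features would be passed through unchanged.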

As shown in FIG. 2, the input sequence is a one-dimensional 7 × 1 input vector (1, 1, 2, -1, 1, -2, 1), and the one-dimensional convolution kernel is a 3 × 1 weight vector (1, 0, -1). The window size of the convolution kernel is 3 and the stride is 1. The convolution kernel slides over the 7 × 1 input sequence, and at each position the inputs inside the window are multiplied element-wise by the weights and summed, which produces a 5 × 1 output vector (-1, 2, 1, 1, 0). For example, the first output value is 1×1 + 0×1 + (-1)×2 = -1.
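The worked example of FIG. 2 can be reproduced with a short Python sketch. The function name `conv1d_valid` is hypothetical; the sketch assumes no padding and no bias, and applies the kernel without flipping it (cross-correlation), which is the convention that yields the output vector stated above.

```python
def conv1d_valid(sequence, kernel, stride=1):
    """Slide `kernel` over `sequence` without padding and return the
    dot product at each window position."""
    window = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, sequence[start:start + window]))
        for start in range(0, len(sequence) - window + 1, stride)
    ]


inputs = [1, 1, 2, -1, 1, -2, 1]   # 7 x 1 input vector from FIG. 2
weights = [1, 0, -1]               # 3 x 1 convolution kernel from FIG. 2
outputs = conv1d_valid(inputs, weights)
print(outputs)  # [-1, 2, 1, 1, 0]
```

An input of length 7 and a window of size 3 with stride 1 give 7 - 3 + 1 = 5 output positions, which matches the 5 × 1 output vector.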

The malware detection method based on the multi-scale CNN builds on the single-scale CNN. Its architecture is shown in FIG. 3 and mainly includes data preprocessing, word embedding and a multi-scale one-dimensional convolutional neural network. Data preprocessing converts the input binary executable file into numerical raw features that the convolutional neural network can process. Word embedding converts the numerical raw features into a feature representation carrying semantic information and effectively reduces the feature dimension. The multi-scale one-dimensional convolutional neural network extracts high-level abstract features at different scales so that features of different scales complement and reinforce each other; a detection model with strong pattern expression capability is trained, and unknown malware is finally detected. The method specifically comprises the following steps:

step S1, converting an input binary executable file into a hexadecimal character sequence through data preprocessing, and specifically performing the following steps:

Binary executable files vary in size, whereas the CNN requires a fixed-size input. Therefore, a byte threshold is first set, and each input binary executable file is processed into a fixed-length file according to that threshold: for a file longer than the byte threshold, the bytes beyond the threshold are discarded, and for a file shorter than the byte threshold, padding characters (blanks) are appended until its length reaches the threshold, so that every binary executable file has a byte length equal to the byte threshold. This embodiment uses the first 3000 bytes of each binary executable file: files longer than 3000 bytes are truncated, and files shorter than 3000 bytes are padded to 3000 bytes.

Then, each byte of the fixed-length binary executable file is encoded. Each position can take one of 257 possible symbols (the 256 byte values plus the padding character), and each symbol is converted into an integer index from 1 to 257, so that every binary executable sample is encoded as a hexadecimal character sequence of fixed length 3000. The length of 3000 was chosen through repeated experiments and effectively ensures detection accuracy: a longer sequence may improve detection accuracy somewhat but lowers processing efficiency, whereas a shorter sequence is processed more efficiently but reduces detection accuracy.
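A minimal sketch of the preprocessing of step S1 is given below for illustration. The description fixes the sequence length (3000) and the index range (1 to 257) but not the exact assignment of indices, so mapping byte value b to index b + 1 and reserving 257 for padding is an assumption of this sketch, as are the function names.

```python
SEQUENCE_LENGTH = 3000   # byte threshold used in the embodiment
PAD_INDEX = 257          # assumed index of the padding symbol


def encode_bytes(data, length=SEQUENCE_LENGTH):
    """Truncate or pad `data` to `length` symbols and map each symbol to an
    integer index in 1..257 (byte value + 1; padding -> 257, both assumed)."""
    indices = [b + 1 for b in data[:length]]
    indices.extend([PAD_INDEX] * (length - len(indices)))
    return indices


def encode_file(path, length=SEQUENCE_LENGTH):
    """Read at most `length` leading bytes of a file and encode them."""
    with open(path, "rb") as handle:
        return encode_bytes(handle.read(length), length)


short_sample = encode_bytes(b"MZ")        # 2 real bytes, 2998 padding symbols
long_sample = encode_bytes(bytes(4000))   # 4000 zero bytes, truncated to 3000
```

Every sample, whatever its original size, is thus represented by exactly 3000 integer indices.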

Step S2, the integer index sequence, namely the hexadecimal character sequence, has no special meaning in itself, and convolutional neural networks do not handle discrete data well. Each byte character of the hexadecimal character sequence is therefore mapped to a low-dimensional vector through word embedding, forming a vector space that the convolutional neural network can process easily, in which each byte character acquires a certain semantics and the relationships between the original samples in semantic space are preserved. Word embedding is a feature learning technique from natural language processing that converts a word into a fixed-length vector representation convenient for mathematical processing. In this embodiment, word embedding is performed with a word2vec model.
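The effect of the embedding step on the data shape can be sketched as a table lookup. In this sketch the table is filled with random numbers purely as a stand-in for vectors learned by a word2vec model; the dimension of 40 is the preferred value listed in Table 2, and the function names are hypothetical.

```python
import random

VOCAB_SIZE = 257      # 256 byte values plus the padding symbol
EMBEDDING_DIM = 40    # preferred output_dim of the Embedding in Table 2


def build_embedding_table(vocab_size, dim, seed=0):
    """Return a lookup table with one `dim`-dimensional row per index.
    Random values stand in for trained word2vec vectors (assumption).
    Row 0 is unused because the indices start at 1."""
    rng = random.Random(seed)
    return [
        [rng.uniform(-0.05, 0.05) for _ in range(dim)]
        for _ in range(vocab_size + 1)
    ]


def embed(indices, table):
    """Map a sequence of integer indices to a sequence of vectors."""
    return [table[index] for index in indices]


table = build_embedding_table(VOCAB_SIZE, EMBEDDING_DIM)
vectors = embed([78, 91, 257], table)   # 3 symbols -> 3 vectors of length 40
```

A full sample of 3000 indices would therefore become a 3000 × 40 matrix, which is the input over which the one-dimensional convolution kernels slide.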

Step S3, inputting the low-dimensional vector generated by conversion into the multi-scale convolutional neural network, learning effective features of the binary executable files of the training samples with the multi-scale CNN, and training a detection model based on the multi-scale convolutional neural network. The multi-scale CNN adopts a plurality of parallel feature extraction channels, each consisting of a one-dimensional convolutional layer, a pooling layer and a first Dropout layer connected in sequence, and a splicing layer is arranged between the parallel feature extraction channels and the full connection layer.

The multi-scale CNN detection model architecture of this embodiment differs from the single-scale CNN detection model architecture: instead of stacking layers one after another so that each layer takes the output of the previous layer as its input, it uses a plurality of parallel feature extraction channels, here 3. The one-dimensional convolutional layers of the 3 scales operate simultaneously, and each uses 56 convolution kernels; they differ only in the window sizes of their convolution kernels, which are 9, 11 and 13 respectively. The convolution kernels of the 3 one-dimensional convolutional layers thus slide over the low-dimensional vectors generated by word embedding with window sizes of 9, 11 and 13 and a stride of 1. To ensure detection accuracy, a pooling layer and a first Dropout layer are connected after each of the three parallel one-dimensional convolutional layers to form a feature extraction channel. Pooling reduces the dimensionality of the effective features; as shown in FIG. 3, all pooling layers use max pooling. The first Dropout layer is added to prevent overfitting: in each iteration, some of the neurons in the network are randomly and temporarily disconnected. Finally, a splicing layer is arranged to splice the features of the three channels.
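To make the three-channel structure concrete, the sketch below computes the feature-map length produced by each channel for a 3000-symbol input. The window sizes, the stride of 1, the 56 kernels and the pooling strides of 9, 11 and 13 come from the description; the absence of padding, a pooling window equal to the pooling stride, and flattening each channel before splicing are assumptions of this sketch, since the description does not state them.

```python
SEQUENCE_LENGTH = 3000
FILTERS = 56
WINDOW_SIZES = (9, 11, 13)


def conv_output_length(length, kernel_size, stride=1):
    """Output length of an unpadded 1-D convolution."""
    return (length - kernel_size) // stride + 1


def pool_output_length(length, pool_size, stride):
    """Output length of an unpadded 1-D pooling layer."""
    return (length - pool_size) // stride + 1


channel_shapes = []
for window in WINDOW_SIZES:
    conv_len = conv_output_length(SEQUENCE_LENGTH, window)
    # Assumption: the pooling window equals the stated pooling stride.
    pooled_len = pool_output_length(conv_len, pool_size=window, stride=window)
    channel_shapes.append((pooled_len, FILTERS))

# Assumption: each channel is flattened before the splicing layer.
spliced_features = sum(length * filters for length, filters in channel_shapes)
print(channel_shapes)     # [(332, 56), (271, 56), (229, 56)]
print(spliced_features)   # 46592
```

Under these assumptions the splicing layer would hand a 46592-dimensional feature vector to the second Dropout layer and the 16-neuron full connection layer; a different pooling window or a global pooling step would change these numbers.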

Step S4, converting the binary executable file of the software to be detected into a low-dimensional vector according to the above method, inputting the low-dimensional vector into the trained detection model based on the multi-scale convolutional neural network, classifying the software to be detected and outputting a detection result, specifically as follows:

firstly, inputting a low-dimensional vector generated by conversion into a detection model based on a multi-scale convolutional neural network, simultaneously sliding a plurality of parallel one-dimensional convolutional layers on the low-dimensional vector for convolution operation, finally performing feature splicing on the features extracted by the plurality of parallel convolutional layers after sequentially passing through a pooling layer, a first Dropout layer and a splicing layer, and extracting effective features of a binary executable file of software to be detected;

and then, carrying out nonlinear combination on the extracted effective characteristics of the binary executable file of the software to be detected by using the full connection layer to obtain a detection result, wherein the detection result is malicious software or benign software, and a second Dropout layer is arranged between the full connection layer and the splicing layer to prevent overfitting.

Results and analysis of the experiments

(1) Selecting a software sample:

The experimental evaluation used malware and benign software samples from different periods: 7871 benign software samples and 8269 malware samples. Of the malware samples, 4103 were discovered before 2011 and 4166 were newly discovered in recent years; of the benign software samples, 3918 were collected from a freshly installed Windows XP SP3 system and 3953 from a freshly installed 32-bit Windows 7 Professional system. All malware samples were collected from the VXHeavens website, and all samples are in Windows PE format. The composition of the data set is shown in Table 1.

Table 1 Software sample statistics

Category	Malware samples	Benign software samples
Early samples	4103	3918
Recent samples	4166	3953
Total	8269	7871

(2) Evaluation index and method:

Classification performance is mainly evaluated by two indicators: accuracy and log loss. Accuracy measures the proportion of correctly predicted samples among all predicted samples. Accuracy alone is usually not sufficient to evaluate the robustness of the predictions, so log loss is also used. Logarithmic loss, also known as cross-entropy loss, is defined on probability estimates and measures the gap between the predicted class probabilities and the true class. Minimizing the log loss is essentially equivalent to maximizing the accuracy of the classifier, and a perfect classifier has a log loss of 0. The log loss function is calculated as follows:

$$L(Y, P(Y \mid X)) = -\log P(Y \mid X) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}\log(p_{ij})$$

wherein Y is the output variable, namely the detection result output for the software to be detected; X is the input variable, namely the binary executable file of the software to be detected; L is the loss function; N is the number of test samples (binary executable files of software to be detected); y_ij is a binary indicator of whether the i-th input test sample belongs to category j, where category j is benign software or malicious software; p_ij is the predicted probability that the i-th input test sample belongs to category j; and M is the total number of categories, with M = 2 in this embodiment.
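For illustration, the log loss can be computed from the quantities defined above (N samples, M categories, indicators y_ij and predicted probabilities p_ij) as the negative mean over samples of the sum over categories of y_ij · log(p_ij). The function name and the clipping constant `eps`, which only guards against log(0), are assumptions of this sketch.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Multi-class log loss: -(1/N) * sum_i sum_j y_ij * log(p_ij).

    y_true: N rows of M binary indicators (one-hot labels).
    p_pred: N rows of M predicted probabilities.
    """
    total = 0.0
    for y_row, p_row in zip(y_true, p_pred):
        for y_ij, p_ij in zip(y_row, p_row):
            p_clipped = min(max(p_ij, eps), 1.0 - eps)   # avoid log(0)
            total += y_ij * math.log(p_clipped)
    return -total / len(y_true)


# Two test samples, M = 2 categories (benign, malicious).
labels = [[1, 0], [0, 1]]
probabilities = [[0.9, 0.1], [0.2, 0.8]]
loss = log_loss(labels, probabilities)   # -(ln 0.9 + ln 0.8) / 2
```

Only the probability assigned to the true category contributes to the sum, so confident wrong predictions are penalized heavily while a classifier that assigns probability 1 to every true category has a loss of essentially 0.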

The performance of the classifier can also be evaluated with the ROC (Receiver Operating Characteristic) curve, whose vertical axis is the detection rate (True Positive Rate) and whose horizontal axis is the false alarm rate (False Positive Rate); it reflects the relationship between the detection rate and the false alarm rate as the detection threshold changes. The area under the ROC curve (AUC) is a comprehensive index for evaluating a classifier. The AUC value is usually between 0.5 and 1.0, and a larger AUC value generally indicates better classifier performance.
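The two quantities plotted on the ROC curve, and the AUC, can be sketched as follows. The AUC is computed here through its pairwise-ranking interpretation (the probability that a randomly chosen malicious sample receives a higher score than a randomly chosen benign one), which equals the area under the ROC curve; the function names and toy scores are assumptions of this sketch.

```python
def rates_at_threshold(labels, scores, threshold):
    """Return (detection rate, false alarm rate) when every sample whose
    score is >= threshold is flagged as malicious (label 1)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / positives, fp / negatives


def roc_auc(labels, scores):
    """AUC as the fraction of (malicious, benign) pairs ranked correctly,
    counting ties as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(rates_at_threshold(toy_labels, toy_scores, 0.5))   # (0.5, 0.0)
print(roc_auc(toy_labels, toy_scores))                   # 0.75
```

Sweeping the threshold from high to low traces the ROC curve from (0, 0) to (1, 1); a classifier that separates the two categories perfectly has an AUC of 1.0.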

(3) Debugging the hyper-parameters:

In machine learning models, the parameters that must be selected manually are called hyper-parameters. The performance of a CNN is strongly affected by its hyper-parameters, and an improper choice may cause under-fitting or over-fitting. GridSearchCV is a commonly used tool in the sklearn library for searching for the optimal parameters of a model; its name combines grid search (GridSearch) and cross-validation (CV). GridSearchCV tries the parameter values in the specified ranges in turn, trains the learner with each parameter setting under cross-validation, and selects the setting with the highest accuracy on the validation set. In this embodiment, GridSearchCV is used to search and tune the hyper-parameters of the convolutional neural network, and the tuning results are shown in Table 2.

Table 2 Multi-scale CNN hyper-parameter tuning results

Hyper-parameter	Candidate values	Preferred value
output_dim of Embedding	{16,24,32,40,48,56,64}	40
kernel_size of convolutional layer 1	{7,9,11,13,15,17}	9
kernel_size of convolutional layer 2	{7,9,11,13,15,17}	11
kernel_size of convolutional layer 3	{7,9,11,13,15,17}	13
Number of filters of the 3 convolutional layers	{8,16,24,36,48,56}	56
Number of neurons in the full connection layer	{16,32,64,96,128,160,192,224,256,288}	16
Dropout	{0.1,0.2,0.3,0.4,0.5}	0.1
optimizer	{SGD,RMSprop,Adagrad,Adam}	RMSprop
batch_size	{10,20,40,60,80,100}	40
epochs	{10,15,20,25,30}	20
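The principle of the grid search over Table 2 can be sketched in plain Python as below. This is not the sklearn GridSearchCV implementation: the dictionary keys are hypothetical names for the rows of Table 2, and `toy_score` is a made-up stand-in for the mean cross-validated accuracy that a real search would obtain by training the network for each setting.

```python
import itertools
import math

# Candidate values from Table 2 (key names are hypothetical).
TABLE_2_GRID = {
    "embedding_output_dim": [16, 24, 32, 40, 48, 56, 64],
    "kernel_size_1": [7, 9, 11, 13, 15, 17],
    "kernel_size_2": [7, 9, 11, 13, 15, 17],
    "kernel_size_3": [7, 9, 11, 13, 15, 17],
    "filters": [8, 16, 24, 36, 48, 56],
    "dense_units": [16, 32, 64, 96, 128, 160, 192, 224, 256, 288],
    "dropout": [0.1, 0.2, 0.3, 0.4, 0.5],
    "optimizer": ["SGD", "RMSprop", "Adagrad", "Adam"],
    "batch_size": [10, 20, 40, 60, 80, 100],
    "epochs": [10, 15, 20, 25, 30],
}


def grid_search(param_grid, score_fn):
    """Evaluate every combination in `param_grid` and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, -math.inf
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Made-up score that peaks at filters=56, dropout=0.1 (illustration only)."""
    return -abs(params["dropout"] - 0.1) - abs(params["filters"] - 56) / 100


full_grid_size = math.prod(len(values) for values in TABLE_2_GRID.values())
small_grid = {name: TABLE_2_GRID[name] for name in ("filters", "dropout")}
best_params, _ = grid_search(small_grid, toy_score)
print(full_grid_size)   # 54432000
print(best_params)      # {'filters': 56, 'dropout': 0.1}
```

The full Cartesian product of the candidate values in Table 2 contains 54,432,000 combinations, which suggests why hyper-parameters are in practice often tuned one group at a time over smaller sub-grids, as in the two-parameter example here.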

(4) Experimental results and analysis:

Training of the multi-scale CNN model is based on gradient descent: the direction in which the function value decreases fastest is found, and the parameters are updated iteratively along that direction, so that a local optimum is reached quickly. One pass of training over all samples in the training set is one epoch, and the epoch value is the total number of times the entire training set is used. Changing the epoch value changes the number of times the weights of the convolutional neural network are updated.

The experiment uses 80% of the samples for training and 20% for validation, and trains for 40 epochs in order to find a good epoch value. The accuracy of the multi-scale CNN model as the number of epochs increases is plotted in FIG. 4, and the log loss of the model is plotted in FIG. 5. As can be seen from FIGS. 4 and 5, when the epoch value increases from 0 to 5, the training accuracy and validation accuracy of the multi-scale CNN model increase rapidly, and its training log loss and validation log loss decrease rapidly; from epoch 5 to 40, the training accuracy, validation accuracy and training log loss remain essentially unchanged, while the validation log loss continues to fluctuate and shows an upward trend. Based on a combined analysis of the accuracy and log loss curves in FIGS. 4 and 5, a preferred epoch value of 20 is selected.
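The relationship between the epoch value and the number of weight updates can be illustrated with the figures given in this document. The sketch assumes ordinary mini-batch training, in which the weights are updated once per batch and a final partial batch is still used; the description does not state these details.

```python
import math

total_samples = 8269 + 7871                # malware + benign samples (Table 1)
train_samples = total_samples * 80 // 100  # 80% training split
batch_size = 40                            # preferred value in Table 2
epochs = 20                                # preferred value in Table 2

# Assumption: one weight update per mini-batch, partial last batch included.
updates_per_epoch = math.ceil(train_samples / batch_size)
total_updates = updates_per_epoch * epochs

print(train_samples)       # 12912
print(updates_per_epoch)   # 323
print(total_updates)       # 6460
```

Doubling the epoch value to 40, as in the exploratory run described above, would double the number of updates under the same assumptions.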

After the number of training epochs was fixed at 20, a 10-fold cross-validation experiment was performed. In this experiment, the multi-scale CNN method proposed in this embodiment achieves a 10-fold cross-validation accuracy of 98.18% and a log loss of 0.1503; the confusion matrix is shown in FIG. 6, and the normalized confusion matrix is shown in FIG. 7. As can be seen from FIGS. 6 and 7, the malware detection method provided by the embodiment of the present invention achieves a high classification accuracy.
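The three quantities reported above (accuracy, log loss, and the confusion matrix with its normalized form) can be computed from predicted probabilities as sketched below. This is an illustrative sketch, not the evaluation code used in the experiment: the label convention (1 = malware, 0 = benign), the 0.5 decision threshold, and the function names are assumptions.

```python
import math


def evaluate(y_true, p_malware, threshold=0.5, eps=1e-15):
    """Return (accuracy, log_loss, confusion) for binary labels (1 = malware).

    confusion is [[TN, FP], [FN, TP]]: rows are true classes,
    columns are predicted classes.
    """
    confusion = [[0, 0], [0, 0]]
    total_loss = 0.0
    for y, p in zip(y_true, p_malware):
        pred = 1 if p >= threshold else 0
        confusion[y][pred] += 1
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total_loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
    n = len(y_true)
    accuracy = (confusion[0][0] + confusion[1][1]) / n
    return accuracy, total_loss / n, confusion


def normalize_rows(confusion):
    """Divide each row by its sum so entries become per-class rates."""
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in confusion]


# Toy predictions, invented for illustration only.
acc, loss, cm = evaluate([1, 1, 0, 0], [0.9, 0.4, 0.2, 0.1])
print(acc, cm, normalize_rows(cm))
```

In a 10-fold setting, one common choice is to pool the held-out predictions of all ten folds and evaluate them once, which yields a single confusion matrix for the whole data set.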

The ROC curve of the multi-scale CNN-based malware detection model, which reflects the relationship between the detection rate and the false alarm rate as the detection threshold changes, is shown in FIG. 8. The point (0,1) represents a perfect classifier that correctly classifies all samples, so the closer the ROC curve is to the upper left corner, the better the performance of the classifier. As can be seen from FIG. 8, the ROC curve of the model is very close to the upper left corner. The AUC value of the multi-scale CNN-based malware detection model is 0.997, which is very close to the optimal AUC value of 1.
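How the ROC curve arises from sweeping the detection threshold, and how the AUC value follows from it, can be sketched as follows. This is a minimal illustration under stated assumptions: labels are 1 for malware and 0 for benign, both classes are present, and the area is computed with the trapezoidal rule. The function names are hypothetical.

```python
def roc_points(y_true, scores):
    """Return ROC points (false alarm rate, detection rate), lowering the threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos  # both pos and neg must be non-zero
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy scores, invented for illustration only.
pts = roc_points([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.3, 0.1])
print(pts, auc(pts))
```

A classifier that ranks every malware sample above every benign sample passes through the point (0,1) and has an AUC of 1.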

(5) Comparison of experimental results:

In order to comprehensively evaluate the performance of the proposed method, it is compared with classical detection methods, and the results are shown in Table 3. As can be seen from Table 3, most of the indexes of the multi-scale CNN-based malware detection method provided in the embodiment of the present invention are better than those of the classical detection methods, and only slightly weaker than those of the byte sequence 3-grams method. The byte sequence 3-grams method needs to traverse the whole executable file to extract features, is strongly influenced by the window size, takes a large amount of time to select features that yield high accuracy, and needs feature selection or reduction to shrink the feature vectors. Because Table 3 compares 10-fold cross-validation results, and it is difficult for the byte sequence 3-grams method to extract and select features from the training data only during feature engineering, both feature extraction and feature selection for that method were performed on the whole data set; its reported result is therefore slightly better than its true result. The multi-scale CNN-based malware detection method provided by the embodiment of the invention is an end-to-end detection method, and avoids the complicated feature engineering required by the three detection methods based on the PE format structure, on DLLs and APIs, and on byte sequence 3-grams. Compared with the detection method based on a single-scale CNN, the multi-scale CNN-based method improves accuracy (98.18% versus 97.19%) and AUC (0.997 versus 0.996).

Table 3 Comparison of the results of the different detection methods

Detection method	Accuracy (%)	Logarithmic loss	AUC
PE format structure	96.84	0.1049	0.994
DLL and API	96.08	0.1638	0.991
Byte sequence 3-grams	98.8	0.0701	0.997
Single-scale CNN	97.19	0.1165	0.996
Multi-scale CNN	98.18	0.1503	0.997
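The remark that byte sequence 3-gram features were extracted and selected on the whole data set concerns information leaking from test folds into feature engineering. The sketch below shows what fold-local selection would look like, so that features are chosen from the training files of each fold only. It is an illustration under assumptions, not the procedure used in the experiment: ranking 3-grams by document frequency is a stand-in for whatever selection criterion was actually applied, and all function names are hypothetical.

```python
from collections import Counter


def byte_3grams(data: bytes) -> set:
    """Set of distinct 3-byte sequences occurring in one file."""
    return {data[i:i + 3] for i in range(len(data) - 2)}


def select_features(train_files, top_k):
    """Pick top_k 3-grams by document frequency, using training files only."""
    doc_freq = Counter()
    for data in train_files:
        doc_freq.update(byte_3grams(data))
    # Sort by descending frequency, then by gram bytes so ties are deterministic.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def vectorize(data, features):
    """Binary presence vector of the selected 3-grams for one file."""
    grams = byte_3grams(data)
    return [1 if f in grams else 0 for f in features]


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) for k contiguous folds over n samples."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


# Toy byte strings standing in for executable files.
files = [b"abcd", b"abce", b"xbcd", b"zzzz"]
for train_idx, test_idx in k_fold_indices(len(files), 2):
    feats = select_features([files[i] for i in train_idx], top_k=2)
    x_test = [vectorize(files[i], feats) for i in test_idx]
    print(feats, x_test)
```

Repeating extraction and selection inside every fold multiplies the feature-engineering cost by the number of folds, which is the practical difficulty the text refers to.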

A single-scale convolution kernel can only extract features at one scale and ignores features at other granularities, so the information expressed by the extracted features is incomplete. This embodiment provides a multi-scale CNN-based malware detection method that learns an effective feature representation directly from the binary executable file through a multi-scale convolutional neural network. Convolution kernels of different scales extract features at different granularities: convolution operations with different window sizes are applied to the same data at the same time, and the resulting features are then combined, so that richer and more complete feature information at different scales is learned from the data and the accuracy of malware detection is improved. The proposed method is conceptually sound and also gives good results. The provided multi-scale CNN malware detection method achieves an accuracy of 98.18%, a log loss of 0.1503, and an AUC value of 0.997, and its performance indexes are superior to those of most of the classical detection methods, so it is a malware detection method with good robustness and overall performance.
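The idea of applying several window sizes to the same data and combining the results can be sketched in plain Python. This is a deliberately reduced illustration, not the patented network: it uses a scalar input sequence and one kernel per channel, whereas the described model operates on embedded vectors with 56 kernels per channel. The ReLU activation, the "valid" (no padding) convolution, and a pooling window equal to the pooling stride are assumptions.

```python
def conv1d(seq, kernel):
    """Valid 1-D convolution (cross-correlation) with stride 1."""
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k))
            for i in range(len(seq) - k + 1)]


def max_pool(seq, size):
    """Non-overlapping max pooling: window and stride both equal `size`."""
    return [max(seq[i:i + size]) for i in range(0, len(seq) - size + 1, size)]


def multi_scale_features(seq, kernels):
    """Run one channel per kernel over the same input and concatenate the results."""
    features = []
    for kernel in kernels:
        activated = [max(0.0, v) for v in conv1d(seq, kernel)]  # ReLU (assumed)
        # Pooling stride tied to the kernel window size, as with strides 9, 11, 13.
        features.extend(max_pool(activated, len(kernel)))
    return features


# Three channels with window sizes 9, 11 and 13; averaging weights are placeholders
# for learned kernel weights.
signal = [float(i % 7) for i in range(60)]
kernels = [[1.0 / k] * k for k in (9, 11, 13)]
print(len(multi_scale_features(signal, kernels)))
```

Each channel sees the same input but summarizes it over a different window, so the concatenated vector carries features at several scales for the layers that follow.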

The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. The malicious software detection method based on the multi-scale convolutional neural network is characterized by comprising the following steps of:

step S4, converting the binary executable file of the software to be detected into a low-dimensional vector according to the steps S1-S2, inputting the low-dimensional vector into the multi-scale convolutional neural network-based detection model obtained by training in the step S3, classifying the software to be detected and outputting a detection result;

2. The method for detecting malware based on multi-scale convolutional neural network of claim 1, wherein the step S1 is implemented by converting the binary executable file of the training sample into a hexadecimal character sequence with fixed length as follows:

3. The multi-scale convolutional neural network-based malware detection method as claimed in claim 2, wherein in step S2, a word2vec model is adopted to convert a fixed length hexadecimal character sequence into a low-dimensional vector;

the byte threshold is set to 3000.

4. The multi-scale convolutional neural network-based malware detection method as claimed in any one of claims 1 to 3, wherein a splicing layer is arranged at the output end of the plurality of parallel feature extraction channels to splice the effective features extracted by the plurality of parallel feature extraction channels;

5. The multi-scale convolutional neural network-based malware detection method of claim 4, wherein a second Dropout layer is arranged between the fully-connected layer and the splicing layer.

6. The multi-scale convolutional neural network-based malware detection method of claim 4, wherein the multi-scale convolutional neural network adopts 3 parallel feature extraction channels for effective feature extraction.

7. The multi-scale convolutional neural network-based malware detection method of claim 6, wherein the one-dimensional convolutional layers of the 3 parallel feature extraction channels use 56 convolutional kernels, the window sizes of the convolutional kernels are 9, 11 and 13, and the step size is 1.

8. The multi-scale convolutional neural network-based malware detection method of claim 6, wherein the step sizes of the pooling layers of the 3 parallel feature extraction channels are 9, 11 and 13 respectively.

9. The multi-scale convolutional neural network-based malware detection method as claimed in any one of claims 6 to 8, wherein the fully connected layer is provided with 16 neurons.

10. The multi-scale convolutional neural network-based malware detection method as claimed in any one of claims 1-3, 5 and 7-8, wherein the pooling layers are maximum pooling.