CN117496246A - Malicious software classification method based on convolutional neural network - Google Patents
- Publication number
- CN117496246A (application CN202311489175.9A)
- Authority
- CN
- China
- Prior art keywords
- image
- malware
- malicious software
- classification method
- gray
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/765—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/096—Transfer learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/40—Image enhancement or restoration by the use of histogram techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The invention discloses a malware classification method based on a convolutional neural network. The method comprises: collecting a data set of malware samples; converting each malware sample into a grayscale image; increasing the local contrast of the grayscale images, and the contrast between images, while suppressing noise amplification; and inputting the enhanced grayscale image into an EfficientNet-B0 model to obtain a more refined feature vector, which is then regularized and passed to a Softmax function that determines the malware family to which the image belongs. Because the method converts malware samples into grayscale images and uses the EfficientNet-B0 model, which achieves high accuracy with few parameters, it identifies the family of a malware sample efficiently, provides some ability to discover unknown malware attacks, and can be extended to malware identification scenarios on other platforms.
Description
Technical Field
The invention belongs to the fields of computer system security, network security and artificial intelligence security application, and particularly relates to a malicious software classification method based on a convolutional neural network.
Background
Malware is software that is installed and run on a user's device without the user's permission and that infringes the user's legitimate rights and interests. Because authors create variants by reusing core code, malware is easy to write, and automated platforms for generating malware and its variants are common. As a result, malware is numerous and widespread; it poses a serious threat to enterprises, governments, financial institutions and others, and can damage users' software and hardware and cause heavy economic losses. Since most malware is generated automatically or derived from existing malware, the most important defensive step is to classify a sample efficiently and attribute it to a known malware family, after which the detection and protection methods already available for that family can be applied. To address this problem, the invention provides a malware classification technique and method based on a convolutional neural network.
Current malware detection methods are mainly divided into static analysis and dynamic analysis. Static analysis examines an application's code files without executing it. It offers the most comprehensive code coverage at higher speed and lower overhead, but obfuscation and encryption degrade its performance and effectiveness. Dynamic analysis monitors an application running in a sandbox and collects its behavioral information; it copes better with obfuscation and encryption, but requires longer analysis time and more memory. Furthermore, running an application in a sandbox does not cover all possible execution paths and scenarios, and some malware detects the sandbox environment, so malicious behavior may not appear during dynamic analysis.
Image-based malware detection does not require feature extraction from the original sample, generates images quickly, and performs better on malware that uses obfuscation and encryption. Malware code consists of many bytes, and image-based detection maps each byte to one pixel, so the instructions are converted into a sequence of pixel values. Locating similar instruction sequences in different malware samples is then equivalent to identifying regions with similar pixel values in the corresponding images. However, similar instruction sequences in samples of the same family may lie at different positions in their files, which reduces the accuracy of the classification model. In addition, the large number of parameters in traditional convolutional neural networks leads to low classification efficiency and poor generalization.
In summary, methods that convert malware samples into grayscale images and classify them with a convolutional neural network still need improvement in accuracy, generalization and efficiency.
Disclosure of Invention
The invention aims to overcome the shortcomings of existing classification schemes and provides a malware classification method based on a convolutional neural network. The method first converts each malware sample into a grayscale image, which avoids the large time cost of feature extraction. It then applies contrast-limited adaptive histogram equalization (CLAHE) to the grayscale image; compared with ordinary histogram equalization, CLAHE enhances local contrast without amplifying image noise and also improves the contrast between images. Finally, the enhanced image is input into a classifier built from the EfficientNet-B0 model: the last fully connected layer is removed, all layers before it are retained, and a global average pooling layer and a Softmax layer are added in its place. The global average pooling layer has fewer parameters than a fully connected layer and acts as a regularizer, and EfficientNet-B0 reaches high accuracy with few parameters and little computation. The method therefore achieves higher accuracy with less detection time.
The technical scheme adopted by the invention is as follows:
A malware classification method based on a convolutional neural network comprises the following steps:
S1) labeling the software samples in a malware data set; converting each byte of a sample, in byte order, into a decimal number in [0,255]; converting the decimal numbers into a first grayscale image; and dividing the first grayscale images into a training set and a test set with a cross-validation function, such that the proportion of each malware category in the training set and the test set is consistent with that of the original data set;
S2) image enhancement: processing the first grayscale image with contrast-limited adaptive histogram equalization to obtain a second grayscale image with enhanced local contrast;
S3) feature extraction: inputting the second grayscale image into an EfficientNet-B0 model, which extracts features and outputs a more refined, more expressive feature representation, referred to as the third grayscale image;
S4) image classification: inputting the third grayscale image into the global average pooling layer, which outputs a one-dimensional vector; inputting that vector into the Softmax layer, which converts it into a probability distribution in which each element lies between 0 and 1 and represents the probability that the sample belongs to a given malware family, and all elements sum to 1; and selecting the category with the highest probability as the prediction result.
In some examples of malware classification methods, the image enhancement specifically includes:
a) Dividing the original image into a plurality of regions;
b) Calculating the cumulative distribution function (CDF) of the pixel values in each image region;
c) Determining whether the frequency of any pixel value in the image region is higher than a preset frequency threshold and, if so, performing a clipping operation with an image thresholding function, randomly reassigning the pixels above the threshold to values in the range [0,255] so that no pixel value has a frequency above the threshold;
d) Transforming each region by interpolation, so that pixel values are related to one another, noise amplification is limited, and the contrast of the image is enhanced.
In some examples of the malware classification method, the EfficientNet-B0 model is built from a series of MBConv modules optimized with the squeeze-and-excitation method.
In some examples of the malware classification method, the image thresholding function is the thresholding function of Python's cv2 (OpenCV) library.
In some examples of the malware classification method, the third-party open-source Python library cv2 is used to convert the decimal numbers into the first grayscale image.
In some examples of the malware classification method, during the clipping operation, the portion of each pixel value's count that exceeds the frequency threshold is divided equally among the 256 bins for values 0-255; any remainder that cannot be divided equally is inserted into the bins one at a time at equal spacing, until all of the excess has been allocated.
In some examples of the malware classification method, the length of a region is determined by the instruction length of its sample, and the width of the region is determined by the average height of all malware samples of the same family.
In some examples of the malware classification method, the malware data set is the Malimg data set, which contains multiple malware types.
In some examples of the malware classification method, malware is labeled with the open-source antivirus software ClamAV.
In some examples of the malware classification method, the cross-validation function is StratifiedKFold.
In some examples of the malware classification method, the preset frequency threshold is set with reference to known publications.
The beneficial effects of the invention are as follows:
Compared with existing classification techniques, which suffer from low accuracy, poor generalization and low efficiency, the invention has the following advantages:
(1) High classification efficiency: the method converts malware samples into images, which is fast and saves the time required for static and dynamic feature extraction, and the model has few parameters and a low computational cost.
(2) High accuracy: the local contrast of the original image is enhanced and a model with high classification accuracy is used, which improves classification accuracy.
(3) Good generalization: the fully connected layer of the EfficientNet-B0 model, whose many parameters would reduce the model's generalization ability, is removed, and a global average pooling layer is used to simplify the feature vector; this has a regularizing effect and strengthens generalization.
(4) Visualization: converting malware samples into images makes the differences between malware families easy to observe directly.
Drawings
FIG. 1 is a flow chart of a convolutional neural network-based malware classification method of the present invention.
FIG. 2 is a flow chart of an image enhancement process of the convolutional neural network-based malware classification method of the present invention.
FIG. 3 is a flow chart of an image classification process of the convolutional neural network-based malware classification method of the present invention.
Detailed Description
A malware classification method based on a convolutional neural network comprises the following steps:
S1) labeling the software samples in a malware data set; converting each byte of a sample, in byte order, into a decimal number in [0,255]; converting the decimal numbers into a first grayscale image; and dividing the first grayscale images into a training set and a test set with a cross-validation function, such that the proportion of each malware category in the training set and the test set is consistent with that of the original data set;
S2) image enhancement: processing the first grayscale image with contrast-limited adaptive histogram equalization to obtain a second grayscale image with enhanced local contrast;
S3) feature extraction: inputting the second grayscale image into an EfficientNet-B0 model, which extracts features and outputs a more refined, more expressive feature representation, referred to as the third grayscale image;
S4) image classification: inputting the third grayscale image into the global average pooling layer, which outputs a one-dimensional vector; inputting that vector into the Softmax layer, which converts it into a probability distribution in which each element lies between 0 and 1 and represents the probability that the sample belongs to a given malware family, and all elements sum to 1; and selecting the category with the highest probability as the prediction result.
There is no special requirement on the source of the malware data set, provided that it covers a complete range of sample types and is easy to obtain. In some examples of the malware classification method, the malware data set is the Malimg data set, which contains multiple malware types.
Various labeling tools may be used to label the malware. In some examples of the malware classification method, malware is labeled with the open-source antivirus software ClamAV, chosen because it is readily available.
The decimal numbers may be converted into the first grayscale image with any of various well-known algorithms. In some examples of the malware classification method, the third-party open-source Python library cv2 (OpenCV) is used for this conversion because it is readily available.
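The byte-to-pixel conversion can be sketched in plain Python as follows. This is a minimal illustration, not the patent's implementation: the fixed row width of 256 and the zero padding of the last row are assumptions, since the text does not state how the image width is chosen, and the function name `bytes_to_gray_rows` is hypothetical.

```python
def bytes_to_gray_rows(data, width=256):
    """Map each byte to one pixel in [0, 255] and wrap the pixels into rows of `width`."""
    if width <= 0:
        raise ValueError("width must be positive")
    pixels = list(data)  # iterating over bytes yields ints in [0, 255]
    remainder = len(pixels) % width
    if remainder:
        # Assumption: pad the final row with zeros so every row has the same length.
        pixels.extend([0] * (width - remainder))
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0xFF])  # toy byte sequence, not a real sample
image = bytes_to_gray_rows(sample, width=4)
print(image)
```

The resulting list of rows could then be handed to an image library such as cv2 to be stored as a grayscale image file.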
There is no special requirement on the cross-validation function; in some examples of the malware classification method it is StratifiedKFold.
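The idea behind a stratified split, keeping each family's share the same in every fold, can be illustrated with the standard library alone. The sketch below is a simplified stand-in and not scikit-learn's `StratifiedKFold` itself; the function name `stratified_folds`, the family labels and the fold count are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=5, seed=0):
    """Assign every sample index to one of n_splits folds, one class at a time."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the indices round-robin so each fold gets a near-equal share of this class.
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    return folds


labels = ["family_a"] * 10 + ["family_b"] * 5  # hypothetical family labels
folds = stratified_folds(labels, n_splits=5)
test_idx = folds[0]
train_idx = [i for fold in folds[1:] for i in fold]
```

With these toy labels, every fold holds two `family_a` samples and one `family_b` sample, so the 2:1 ratio of the full data set is preserved in both the training and the test portion.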
In some examples of malware classification methods, the image enhancement specifically includes:
a) Dividing the original image into a plurality of regions;
b) Calculating the cumulative distribution function (CDF) of the pixel values in each image region;
c) Determining whether the frequency of any pixel value in the image region is higher than a preset frequency threshold and, if so, performing a clipping operation with an image thresholding function, randomly reassigning the pixels above the threshold to values in the range [0,255] so that no pixel value has a frequency above the threshold;
d) Transforming each region by interpolation, so that pixel values are related to one another, noise amplification is limited, and the contrast of the image is enhanced.
The cumulative distribution function CDF can be calculated as follows:
$$\mathrm{cdf}(i) = \sum_{j=0}^{i} n_j, \qquad 0 \le i \le L-1$$
where L is the total number of gray levels, 256, and $n_j$ is the probability that pixel value j occurs in the image region.
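The cumulative distribution function of one image region can be computed as in the sketch below. The helper names are hypothetical, and the lookup table `round((L - 1) * cdf(i))` is the textbook histogram-equalization mapping, included here as an assumption about how the CDF is used rather than something the text states.

```python
def tile_cdf(pixels, levels=256):
    """Return cdf[i] = sum of n_j for j <= i, with n_j the relative frequency of value j."""
    if not pixels:
        raise ValueError("region must contain at least one pixel")
    counts = [0] * levels
    for value in pixels:
        counts[value] += 1
    total = len(pixels)
    cdf, running = [], 0
    for count in counts:
        running += count
        cdf.append(running / total)
    return cdf


def equalization_map(cdf, levels=256):
    """Assumed textbook mapping: gray level i is sent to round((L - 1) * cdf(i))."""
    return [round((levels - 1) * value) for value in cdf]


tile = [0, 0, 1, 1, 1, 3, 3, 255]  # toy 8-pixel region
cdf = tile_cdf(tile)
lut = equalization_map(cdf)
```

Because the CDF never decreases, the lookup table preserves the ordering of gray levels while stretching the frequently used ones over a wider output range.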
In some examples of the malware classification method, the EfficientNet-B0 model is built from a series of MBConv modules optimized with the squeeze-and-excitation method. Specifically, the EfficientNet-B0 model uses the mobile inverted bottleneck convolution (MBConv) module of MobileNetV2 as its main building block, and the baseline network EfficientNet-B0 is then determined by multi-objective neural architecture search. The MBConv module in EfficientNet-B0 is based on depthwise separable convolution and is optimized with the squeeze-and-excitation method of SENet. The EfficientNet-B0 model can be regarded as an efficient feature extractor: after a series of operations such as convolution, pooling and activation, the contrast-enhanced image yields a more refined and more expressive feature vector.
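The compression and excitation (squeeze-and-excitation) step that the MBConv module borrows from SENet can be sketched on a tiny feature map. This is an illustration under stated assumptions: the weights `w1` and `w2` are made-up values rather than trained parameters, biases are omitted, and ReLU is used in the bottleneck as in the original SENet design; the actual EfficientNet-B0 layers may differ in these details.

```python
import math


def squeeze_excite(feature_map, w1, w2):
    """feature_map: list of channels, each a list of rows. Returns the rescaled channels."""
    # Squeeze: reduce every channel to its spatial average.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation: a bottleneck layer with ReLU, then an expansion layer with sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale: multiply each channel by its gate in (0, 1).
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, feature_map)]


feature_map = [
    [[1.0, 1.0], [1.0, 1.0]],  # channel 0
    [[2.0, 2.0], [2.0, 2.0]],  # channel 1
]
w1 = [[0.5, 0.5]]    # hypothetical 2 -> 1 bottleneck weights
w2 = [[0.0], [1.0]]  # hypothetical 1 -> 2 expansion weights
out = squeeze_excite(feature_map, w1, w2)
```

The gates let the network emphasize informative channels and damp the others at the cost of only two small weight matrices per block.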
The image thresholding function may be any of various known functions. In some examples of the malware classification method, the thresholding function of Python's cv2 (OpenCV) library is used because it is readily available.
In some examples of the malware classification method, during the clipping operation, the portion of each pixel value's count that exceeds the frequency threshold is divided equally among the 256 bins for values 0-255; any remainder that cannot be divided equally is inserted into the bins one at a time at equal spacing, until all of the excess has been allocated.
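The clipping and redistribution just described can be sketched on a 256-bin histogram. The function name `clip_histogram` and the threshold value are hypothetical, and the sketch works on raw counts rather than on an image. Note that after redistribution a bin can end up slightly above the limit again; simple implementations commonly accept this.

```python
def clip_histogram(hist, clip_limit):
    """Clip every bin at clip_limit and spread the clipped excess over all bins."""
    bins = len(hist)
    clipped = [min(count, clip_limit) for count in hist]
    excess = sum(hist) - sum(clipped)
    # Equal share for every bin, plus a leftover that cannot be divided evenly.
    share, leftover = divmod(excess, bins)
    clipped = [count + share for count in clipped]
    # Hand out the leftover one count at a time to equally spaced bins.
    for k in range(leftover):
        clipped[(k * bins) // leftover] += 1
    return clipped


hist = [0] * 256
hist[10] = 600  # one heavily over-represented gray level
hist[20] = 100
result = clip_histogram(hist, clip_limit=88)  # 88 is an arbitrary example threshold
```

In this toy case the excess of 524 counts gives every bin 2 extra counts, and the remaining 12 counts go to bins spaced about 21 apart, so the total number of pixels is unchanged.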
In some examples of the malware classification method, the length of a region is determined by the instruction length of its sample, and the width of the region is determined by the average height of all malware samples of the same family.
In some examples of the malware classification method, the preset frequency threshold is set with reference to known publications.
In some examples of the malware classification method, during image classification the image is input into the EfficientNet-B0 model and, after a series of operations such as convolution, pooling and activation, a more refined and more expressive feature vector is output. The global average pooling layer then acts as a regularizer: it reduces the three-dimensional w×h×d input to a 1×1×d output, that is, a one-dimensional vector of length d, which is passed to the Softmax function. The Softmax function takes a vector z of K real numbers and converts it into a probability distribution over K outcomes, each probability proportional to the exponential of the corresponding input:
$$\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad i = 1, \dots, K$$
The Softmax function first exponentiates each element of the input vector z, giving $e^{z_i}$ (where $z_i$ is the i-th element of z), and sums these values to obtain the sum of exponentials $\sum_{j=1}^{K} e^{z_j}$. It then normalizes each element by dividing its exponential by this sum, yielding the output Softmax($z_i$). Each element of the output vector is the probability of one malware family.
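Global average pooling followed by Softmax can be sketched on a toy feature map. Two assumptions are made for illustration: the channel count d equals the number of families, so the pooled vector is fed straight into Softmax as the text describes, and the maximum is subtracted before exponentiation for numerical stability, which does not change the result. A trained model would normally produce the K family scores with learned weights; that step is not shown.

```python
import math


def global_average_pool(feature_map):
    """Reduce a w x h x d map (d channels of h rows by w values) to d channel averages."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]


def softmax(z):
    """Softmax(z_i) = exp(z_i) / sum_j exp(z_j), shifted by max(z) for stability."""
    shift = max(z)
    exps = [math.exp(value - shift) for value in z]
    total = sum(exps)
    return [e / total for e in exps]


feature_map = [
    [[1.0, 3.0], [5.0, 7.0]],  # channel 0, average 4.0
    [[0.0, 0.0], [0.0, 0.0]],  # channel 1, average 0.0
    [[2.0, 2.0], [2.0, 2.0]],  # channel 2, average 2.0
]
pooled = global_average_pool(feature_map)
probs = softmax(pooled)
predicted = max(range(len(probs)), key=probs.__getitem__)  # index of the most likely family
```

Pooling adds no trainable parameters, which is why the text credits it with a regularizing effect compared with a fully connected layer.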
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other, and the present invention will be further described in detail with reference to the drawings and the specific embodiments.
The invention provides a malware classification method based on a convolutional neural network. The method converts each malware sample into a grayscale image, which avoids the large time cost of feature extraction; it then enhances the local contrast of the grayscale image with contrast-limited adaptive histogram equalization; finally, it inputs the resulting image into an EfficientNet-B0 model whose last fully connected layer has been removed and to which a global average pooling layer and a Softmax layer have been added, in order to determine which malware family the sample belongs to. As shown in Figs. 1 to 3, the method comprises the following steps:
Step one: obtain the public malware data set Malimg, label the data set with ClamAV, and convert each byte, in byte order, into a decimal number in [0,255]. A third-party open-source Python library is used to convert the numbers into a two-dimensional grayscale image. The grayscale images are divided into a training set and a test set with the cross-validation function StratifiedKFold, ensuring that the proportion of each malware category in the training set and the test set is consistent with that of the original data set.
Step two: image enhancement.
The first step is to divide the grayscale image into regions;
the second step is to calculate the cumulative distribution function (CDF) of each image region, as follows:
$$\mathrm{cdf}(i) = \sum_{j=0}^{i} n_j, \qquad 0 \le i \le L-1$$
where L is the total number of gray levels, 256, and $n_j$ is the probability that pixel value j appears in the image region;
the third step is to determine whether the frequency value cdf(i) of a pixel is higher than a preset frequency threshold and, if so, to perform a clipping operation with the image thresholding function of the cv2 library, randomly reassigning some of those pixels to values in the range [0,255] so that no pixel's frequency value exceeds the clipping threshold;
the fourth step is to transform each region by interpolation, in order to correlate the pixel values within each region, limit the amplification of noise and enhance the contrast of the image.
Step three: the EfficientNet-B0 model uses the mobile inverted bottleneck convolution (MBConv) module of MobileNetV2 as its main building block, and the baseline network EfficientNet-B0 is then determined by multi-objective neural architecture search. The MBConv module in EfficientNet-B0 is based on depthwise separable convolution and is optimized with the squeeze-and-excitation method of SENet. The EfficientNet-B0 model can be regarded as an efficient feature extractor: after a series of operations such as convolution, pooling and activation, the contrast-enhanced image yields a more refined and more expressive feature vector.
Step four: the image is input into the EfficientNet-B0 model and, after a series of operations such as convolution, pooling and activation, a more refined and more expressive feature vector is output. The global average pooling layer then acts as a regularizer: it reduces the three-dimensional w×h×d input to a 1×1×d output, that is, a one-dimensional vector of length d, which is passed to the Softmax function. The Softmax function takes a vector z of K real numbers and converts it into a probability distribution over K outcomes, each probability proportional to the exponential of the corresponding input:
$$\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad i = 1, \dots, K$$
The Softmax function first exponentiates each element of the input vector z, giving $e^{z_i}$ (where $z_i$ is the i-th element of z), and sums these values to obtain the sum of exponentials $\sum_{j=1}^{K} e^{z_j}$. It then normalizes each element by dividing its exponential by this sum, yielding the output Softmax($z_i$). Each element of the output vector is the probability of one malware family.
Comparison of classification efficiency for different algorithms:
On the ImageNet data set, the EfficientNet-B0 model achieves higher accuracy than ResNet50 and DenseNet169 with the fewest parameters and the fewest FLOPs (floating-point operations). EfficientNets were also evaluated on 8 common transfer learning data sets; the results show that they reached the best accuracy reported at the time on 5 of them with far fewer parameters, indicating that EfficientNets have good accuracy, efficiency and transferability.
The above is a further detailed description of the present invention and should not be taken as limiting its practice. Simple deductions or substitutions made by those skilled in the art without departing from the concept of the present invention fall within its scope.
Claims (10)
1. A malicious software classification method based on a convolutional neural network, comprising the following steps:
S1) labeling the software samples in a malware dataset, converting each byte into a decimal number in the range [0, 255] following the byte order of the software sample, converting the decimal numbers into a first grayscale image, and dividing the first grayscale images into a training set and a test set by means of a cross-validation function, such that the proportion of each malware category in the training set and the test set is consistent with that in the original dataset;
S2) image enhancement: processing the first grayscale image with contrast-limited adaptive histogram equalization to obtain a second grayscale image with enhanced local contrast;
S3) feature extraction: inputting the second grayscale image into an EfficientNet-B0 model and extracting features, which outputs more refined feature vectors with stronger expressive power and yields a third grayscale image;
S4) image classification: inputting the third grayscale image into a global average pooling layer, which outputs a one-dimensional vector; inputting the one-dimensional vector into a Softmax layer, which converts it into a probability distribution in which each element lies between 0 and 1 and represents the probability that the sample belongs to a given malware family, and all elements sum to 1; and selecting the category with the highest probability as the prediction result.
2. The malware classification method according to claim 1, wherein the image enhancement specifically comprises:
a) Dividing an original image into a plurality of regions;
b) Calculating a cumulative distribution function CDF of pixel values in the image area;
c) Judging whether the frequency of a given pixel value in the image region is higher than a preset frequency threshold and, if so, performing a clipping operation using an image thresholding function, and redistributing the pixels in excess of the preset frequency threshold over values in the range [0, 255] so that the frequency of that pixel value no longer exceeds the threshold;
d) Transforming each region using interpolation so that pixel values are related to each other, which limits noise amplification and enhances the contrast of the image.
3. The malware classification method of claim 1, wherein the EfficientNet-B0 model is built from a series of MBConv modules and optimized by the squeeze-and-excitation (compression and excitation) method.
4. The malware classification method of claim 2, wherein the image thresholding function is the image thresholding function of the Python cv2 (OpenCV) library; and/or
the decimal numbers are converted into the first grayscale image using cv2, a third-party open-source library for Python.
5. The malware classification method according to claim 2 or 4, wherein, in the clipping operation, the portion of the pixel-value frequencies that exceeds the frequency threshold is divided equally among the 256 bins covering the values 0 to 255, and, if a remainder cannot be divided equally, it is inserted into the bins at equal intervals, in order, until all of the excess has been assigned to a bin.
6. The method of claim 1, wherein the length of the region is determined by the instruction length of the sample, and the width of the region is determined by the average height of all malware samples in the same series.
7. The malware classification method according to claim 1, wherein the malware data set is a data set Malimg having a plurality of malware types.
8. The malware classification method according to claim 1, wherein the malware is labeled using the open-source antivirus software ClamAV.
9. The malware classification method of claim 1, wherein the cross-validation function is StratifiedKFold.
10. The malware classification method according to claim 1, wherein the preset frequency threshold is set with reference to known publications.
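By way of illustration only, the byte-to-image conversion of claim 1 and the clipping operation of claims 2 and 5 can be sketched in plain Python as follows. The sketch rests on several assumptions that are not part of the claims: the function names `bytes_to_gray_rows` and `clip_and_redistribute` are hypothetical, the image width is passed in as a free parameter, an incomplete final row is padded with zeros, and the remainder is spread using an integer step of 256 divided by the remainder. It is not the cv2-based implementation referred to in claim 4.

```python
def bytes_to_gray_rows(data, width):
    """Map each byte to a gray level in [0, 255] and arrange them in rows.

    Assumption: an incomplete last row is padded with zeros.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    padded = data + bytes(-len(data) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def clip_and_redistribute(hist, clip_limit):
    """Clip a 256-bin histogram and spread the excess over all bins.

    The excess above clip_limit is divided equally among the 256 bins;
    any remainder is added one count at a time at equal intervals.
    """
    if len(hist) != 256:
        raise ValueError("expected a 256-bin histogram")
    clipped = [min(count, clip_limit) for count in hist]
    excess = sum(hist) - sum(clipped)
    share, remainder = divmod(excess, 256)
    result = [count + share for count in clipped]
    if remainder:
        step = max(256 // remainder, 1)  # assumption: integer spacing
        index = 0
        while remainder > 0 and index < 256:
            result[index] += 1
            index += step
            remainder -= 1
    return result


rows = bytes_to_gray_rows(bytes([0, 255, 16, 32, 7]), width=2)

histogram = [0] * 256
histogram[10] = 1000  # one heavily over-represented gray level
balanced = clip_and_redistribute(histogram, clip_limit=100)
```

The total pixel count is preserved by the redistribution, so the cumulative distribution function of step b) still ends at the same value. A single pass can push a bin slightly above the limit again, which this sketch does not correct.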
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311489175.9A CN117496246A (en) | 2023-11-09 | 2023-11-09 | Malicious software classification method based on convolutional neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117496246A (en) | 2024-02-02 |
Family
ID=89674045
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311489175.9A Pending CN117496246A (en) | 2023-11-09 | 2023-11-09 | Malicious software classification method based on convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117496246A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113989583A (en) * | 2021-09-03 | 2022-01-28 | 中电积至(海南)信息技术有限公司 | Method and system for detecting malicious traffic of internet |
CN114926680A (en) * | 2022-05-13 | 2022-08-19 | 山东省计算中心(国家超级计算济南中心) | Malicious software classification method and system based on AlexNet network model |
WO2023193629A1 (en) * | 2022-04-08 | 2023-10-12 | 华为技术有限公司 | Coding method and apparatus for region enhancement layer, and decoding method and apparatus for area enhancement layer |
-
2023
- 2023-11-09 CN CN202311489175.9A patent/CN117496246A/en active Pending
Non-Patent Citations (1)
Title |
---|
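Yang Chunyu (杨春雨): "Malware classification based on texture feature fusion and deep learning", China Excellent Master's Theses Full-text Database, no. 2021, 15 September 2021 (2021-09-15), pages 17 - 40 * |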
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||