CN116644422A - Malicious code detection method based on malicious block labeling and image processing - Google Patents
Malicious code detection method based on malicious block labeling and image processing
- Publication number
- CN116644422A (application CN202310606050.3A)
- Authority
- CN
- China
- Prior art keywords
- malicious
- block
- basic
- code
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/561—Virus type analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a malicious code detection method based on malicious block labeling and image processing, belonging to the field of malicious code detection and comprising the following steps: (S1) dividing the binary file of the malicious code to be detected into a plurality of basic blocks, detecting whether each basic block is a malicious block, and marking the positions of the malicious blocks in the binary file, a malicious block being a basic block related to a malicious function; (S2) converting the binary file into a grayscale image and increasing the local contrast of the partial images corresponding to the malicious blocks to obtain a target grayscale image; (S3) inputting the target grayscale image into a trained malicious code classification model to predict the probability that the malicious code belongs to each family class, and determining the family class with the highest probability as the family class to which the malicious code belongs. The invention enhances the influence of content related to malicious functions on the classification result, thereby improving the accuracy of malicious code classification.
Description
Technical Field
The invention belongs to the field of malicious code detection, and particularly relates to a malicious code detection method based on malicious block labeling and image processing.
Background
The network security industry strives constantly to prevent and remediate attacks by malicious code. An attacker can use malicious code to infect a victim's device in order to destroy the confidentiality and integrity of user and enterprise data. Accurately detecting malicious code and taking corresponding countermeasures is therefore of great significance for guaranteeing network security.
Traditionally, malware detection and classification have been performed with signature-based or heuristic methods. Signature-based methods deploy signatures for different malware families and variants as prototypes, allowing a newly discovered malware file to be classified into the corresponding family so that countermeasures suited to that family's characteristics can be taken. Later, Nataraj et al. introduced a static malware analysis technique called malware visualization, which represents the contents of a malware binary file as an image: the raw bytes of the binary are read as 8-bit unsigned integers and stored in a vector, the vector is reshaped into a matrix, and the matrix is visualized as a grayscale image.
The malware-visualization analysis method effectively addresses the malware classification problem. However, the malicious functions of a piece of malware are nested among other, non-malicious functions; that is, a considerable part of the malware's content is unrelated to its malicious function. When the grayscale image converted from the entire malware binary file is classified directly, this unrelated content influences the overall classification result, and the final classification accuracy cannot be guaranteed.
Disclosure of Invention
Aiming at the above defects and improvement needs of the prior art, the invention provides a malicious code detection method based on malicious block labeling and image processing, which aims to enhance the influence of content related to malicious functions in malicious code on the classification result, thereby improving the accuracy of malicious code classification and facilitating the correct understanding and analysis of unknown malicious code.
To achieve the above object, according to one aspect of the present invention, there is provided a malicious code detection method based on malicious block labeling and image processing, comprising the following steps:
the method comprises the steps of (S1) dividing a binary file of malicious codes to be detected into a plurality of basic blocks, detecting whether each basic block is a malicious block, and marking the position of the malicious block in the binary file; malicious blocks are basic blocks related to malicious functions;
(S2) converting the binary file into a grayscale image, and increasing the local contrast of the partial images corresponding to the malicious blocks in the grayscale image to obtain a target grayscale image;
(S3) inputting the target grayscale image into a trained malicious code classification model to predict the probability that the malicious code belongs to each family class, and determining the family class with the highest probability as the family class to which the malicious code belongs;
the malicious code classification model is a neural network model used for predicting the probability that the malicious code corresponding to an input grayscale image belongs to each family class.
Further, in step (S1), for any basic block, it is detected whether it is a malicious block, in a manner including:
extracting the code features of the basic block and converting them into a feature vector; the code features include structural features, arithmetic instruction features, transfer instruction features, and API call features;
inputting the feature vector into a trained malicious block detection model, which performs feature extraction and reconstruction on the feature vector to obtain a reconstructed feature;
if the difference between the reconstructed feature output by the malicious block detection model and the input feature vector is greater than a preset threshold, the basic block is judged to be a malicious block; otherwise, it is judged not to be a malicious block;
the malicious block detection model is a neural network model and is used for extracting and reconstructing characteristics of an input basic block, and the training mode comprises the following steps:
collecting a binary file irrelevant to malicious functions, dividing the binary file into basic blocks, and extracting code features of the basic blocks as benign samples to obtain a benign sample set;
initializing a malicious block detection model, training the malicious block detection model by using a benign sample set with the aim of minimizing reconstruction loss, and obtaining a trained malicious block detection model after training is finished.
Further, the structural features include the number of child nodes and intermediate values of the basic block; the arithmetic instruction features include the numbers of basic mathematical instructions, shift instructions, and logical operations contained in the basic block; the transfer instruction features include the numbers of stack operations, register operations, and port operations within the basic block; and the API call features include the numbers of calls within the basic block to APIs related to DLLs, processes, services, and system information.
Further, the malicious block detection model is an autoencoder model.
Further, in step (S2), the local contrast of the partial images corresponding to the malicious blocks in the grayscale image is improved with the Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm.
Further, the malicious code classification model is a Vision Transformer model.
Further, in step (S2), converting the binary file into a grayscale image includes:
reading the binary file in 8-bit units in code order, converting each unit into an unsigned integer, and storing the values in an unsigned integer vector;
converting the unsigned integer vector into a matrix, taking each element of the matrix as a pixel and the element's value as that pixel's gray value, to obtain the grayscale image.
According to still another aspect of the present invention, there is provided a computer-readable storage medium comprising a stored computer program; when the computer program is executed by a processor, the device on which the computer-readable storage medium resides is controlled to execute the malicious code detection method based on malicious block labeling and image processing described above.
In general, through the above technical solutions conceived by the present invention, the following beneficial effects can be obtained:
(1) The invention divides the binary file of the malicious code to be detected into basic blocks, detects which of them are malicious blocks, i.e. basic blocks related to malicious functions, and marks their positions within the binary file, thereby locating the malicious blocks. During subsequent visualization-based classification, the partial images corresponding to the malicious blocks in the grayscale image converted from the binary file are processed, based on the position-labeling results, to increase their local contrast. This processing effectively raises the weight of malicious-block content in the classification result, weakens the influence of content unrelated to malicious functions, and thus improves the accuracy of malicious code detection.
(2) In the preferred scheme of the invention, a neural network model is used as the malicious block detection model to extract and reconstruct the features of an input basic block, which realizes anomaly detection. Because the model is trained only on benign samples unrelated to malicious functions, its input-output difference is small for benign basic blocks and large for malicious blocks, so malicious blocks in binary code can be accurately identified and located.
(3) In a preferred embodiment of the present invention, the code features extracted from a basic block for malicious block detection comprise structural features, arithmetic instruction features, transfer instruction features, and API call features. The structural features include the number of child nodes and intermediate values of the basic block; the arithmetic instruction features include the numbers of basic mathematical instructions, shift instructions, and logical operations contained in the basic block; the transfer instruction features include the numbers of stack operations, register operations, and port operations within the basic block; and the API call features include the numbers of calls within the basic block to APIs related to DLLs, processes, services, and system information. Together, these features fully and accurately reflect the function implemented by the basic block.
(4) In the preferred scheme of the invention, the Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm is used to increase the local contrast of the partial images corresponding to the malicious blocks in the grayscale image, which suppresses noise amplification while improving local contrast.
(5) In the preferred scheme of the invention, a Vision Transformer model implements the malicious code classification model. The model divides the input image into sub-blocks, forms them into a sequence of linear embeddings, and feeds that sequence to a Transformer in the same way token sequences are fed in NLP. This model classifies the grayscale images converted from malicious code binary files with good accuracy.
Drawings
FIG. 1 is a flowchart of a malicious code detection method based on malicious block labeling and image processing provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of image generation, image processing, model training, and model verification according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a malicious block detection model according to an embodiment of the present invention;
FIG. 4 is a diagram of applying Contrast Limited Adaptive Histogram Equalization to grayscale images according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a bilinear interpolation method according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
In the present invention, the terms "first," "second," and the like in the description and in the drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
In order to solve the technical problems that the classification results of existing malicious code detection methods are disturbed by content unrelated to malicious functions and that classification accuracy is therefore low, the invention provides a malicious code detection method based on malicious block labeling and image processing. The overall idea is: locate and mark the malicious blocks in the binary file of the malicious code to be detected, then increase the local contrast of the partial images corresponding to those blocks in the grayscale image converted from the binary file, so that content related to malicious functions carries more weight in the classification result, the influence of unrelated content is weakened, and classification accuracy is improved.
The following are examples.
Example 1:
a malicious code detection method based on malicious block annotation and image processing is shown in fig. 1 and 2, and comprises the following steps:
the method comprises the steps of (S1) dividing a binary file of malicious codes to be detected into a plurality of basic blocks, detecting whether each basic block is a malicious block, and marking the position of the malicious block in the binary file; malicious blocks are basic blocks related to malicious functions.
A basic block is a sequence of instructions executed sequentially, with a single entry and a single exit. By dividing the binary file into basic blocks and judging whether each basic block is related to a malicious function, this embodiment effectively locates and labels the malicious blocks.
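As an illustrative aside (not the exact procedure of the invention), leader-based basic-block partitioning over a toy instruction list can be sketched as follows; the `(opcode, jump_target)` tuple encoding and the `jmp`/`jcc` opcodes are assumptions of the sketch:

```python
def split_basic_blocks(instrs):
    """Leader-based basic-block splitting: a new block starts at the
    program entry, at every jump target, and immediately after every
    jump or branch instruction."""
    leaders = {0}
    for i, (op, target) in enumerate(instrs):
        if op in ("jmp", "jcc"):        # toy opcodes standing in for real branches
            leaders.add(target)
            leaders.add(i + 1)
    leaders = sorted(l for l in leaders if l < len(instrs))
    bounds = leaders + [len(instrs)]
    return [instrs[a:b] for a, b in zip(bounds, bounds[1:])]
```

Each returned slice is one single-entry, single-exit instruction sequence.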
Optionally, in step (S1) of the present embodiment, for any basic block, whether it is a malicious block is detected by a method including:
extracting the code features of the basic block and converting them into a feature vector; the code features include structural features, arithmetic instruction features, transfer instruction features, and API call features;
inputting the feature vector into a trained malicious block detection model, which performs feature extraction and reconstruction on the feature vector to obtain a reconstructed feature;
if the difference between the reconstructed feature output by the malicious block detection model and the input feature vector is greater than a preset threshold, the basic block is judged to be a malicious block; otherwise, it is judged not to be a malicious block;
the malicious block detection model is a neural network model and is used for extracting and reconstructing characteristics of an input basic block, and the training mode comprises the following steps:
collecting a binary file irrelevant to malicious functions, dividing the binary file into basic blocks, and extracting code features of the basic blocks as benign samples to obtain a benign sample set;
initializing a malicious block detection model, training the malicious block detection model by using a benign sample set with the aim of minimizing reconstruction loss, and obtaining a trained malicious block detection model after training is finished.
Because the malicious block detection model extracts and reconstructs the features of an input basic block, it can serve as an anomaly detector. It is trained only on benign samples unrelated to malicious functions, so its input-output difference is small for benign basic blocks and large for malicious blocks, and malicious blocks in binary code can be accurately identified and located on this basis. Optionally, this embodiment implements the malicious block detection model with a U-Net model, which is an autoencoder model; an autoencoder is an artificial neural network used in semi-supervised and unsupervised learning that learns features of its input by taking the input itself as the learning target.
the structure of the U-Net model is shown in FIG. 3, the model contains an encoder g and a decoder f, and when we input an x, we can get an output x' after going through the entire neural network, namely:
f(g(x))=x′
the automatic encoder uses the reconstruction loss x '-x as the loss, and continuously learns to gradually reduce the difference between x and x', so that after learning by using a large number of benign samples, the difference between x and x 'is smaller for benign basic blocks, and the difference between x and x' is larger for malicious basic blocks, so that possible malicious basic blocks can be effectively detected according to the difference.
It is easy to understand that, to guarantee the training effect of the model, this embodiment collects a large number of binary files unrelated to malicious functions to build the benign samples for training the malicious block detection model. After training, malicious samples built from basic blocks of malicious code that implement malicious functions are used to test the trained model, ensuring that its detection accuracy meets the requirement, as shown in FIG. 2. In other embodiments of the invention, the malicious block detection model may also be implemented with other models capable of feature extraction and reconstruction.
In order to accurately identify whether the function implemented by a basic block is malicious, the code features extracted from a basic block in this embodiment comprise four categories. The structural features include the number of child nodes and intermediate values of the basic block; the arithmetic instruction features include the numbers of basic mathematical instructions, shift instructions, and logical operations contained in the basic block; the transfer instruction features include the numbers of stack operations, register operations, and port operations within the basic block; and the API call features include the numbers of calls within the basic block to APIs related to DLLs, processes, services, and system information. These four categories fully and accurately reflect the function implemented by a basic block, and the embodiment takes them as the input of the malicious block detection model to identify whether that function is malicious, thereby accurately completing the detection of malicious blocks. In practical applications, the code features of the basic blocks can be extracted directly with the BinaryNinja tool.
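To make the feature counting concrete, a toy counter over a symbolic instruction list might look as follows; the opcode groupings and tuple encoding are assumptions of this sketch, and real extraction would rely on a disassembler such as BinaryNinja rather than this mapping:

```python
from collections import Counter

# Illustrative opcode groupings (assumptions of the sketch, not the
# patent's exact feature definitions).
ARITHMETIC = {"add", "sub", "mul", "div", "shl", "shr", "and", "or", "xor"}
TRANSFER = {"push", "pop", "mov", "in", "out"}

def block_feature_vector(instructions):
    """Count arithmetic instructions, transfer instructions, and API
    calls in one basic block; instructions are (opcode, *operands)."""
    ops = Counter(ins[0] for ins in instructions)
    return [
        sum(n for op, n in ops.items() if op in ARITHMETIC),
        sum(n for op, n in ops.items() if op in TRANSFER),
        ops.get("call", 0),
    ]
```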
Through step (S1), the present embodiment can accurately complete positioning and labeling of malicious blocks in the binary file, and on this basis, the present embodiment further includes the steps of:
and (S2) converting the binary file into a gray level image, and improving the local contrast of a part of images corresponding to the malicious blocks in the gray level image to obtain a target gray level image.
In this embodiment, the binary file is converted into a grayscale image as follows:
reading the binary file in 8-bit units in code order, converting each unit into an unsigned integer, and storing the values in an unsigned integer vector;
converting the unsigned integer vector into a matrix, taking each element of the matrix as a pixel and the element's value as that pixel's gray value, to obtain the grayscale image.
It is easy to understand that during the conversion each pixel corresponds to an 8-bit unsigned integer in the range 0-255, which is the pixel's gray value, with 0 corresponding to black and 255 to white.
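The byte-to-grayscale conversion described above can be sketched in a few lines; the 16-column default width is an arbitrary illustrative choice:

```python
import numpy as np

def binary_to_grayscale(data: bytes, width: int = 16) -> np.ndarray:
    """Read each byte of the binary as an 8-bit unsigned integer
    (0 = black, 255 = white) and reshape the vector into a matrix
    whose elements are the gray values of the image's pixels."""
    pixels = np.frombuffer(data, dtype=np.uint8)
    rows = len(pixels) // width          # drop any trailing partial row
    return pixels[: rows * width].reshape(rows, width)
```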
According to the labeling result of step (S1), the partial images corresponding to the malicious blocks can be located in the converted grayscale image, and their local contrast can be increased by image processing. As a preferred implementation, this embodiment uses the Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm, which suppresses noise amplification while improving local contrast. The CLAHE-based procedure, shown in FIG. 4, comprises the following steps:
(S21) according to the position of the malicious block in the image, extracting the local rectangular image and dividing it into non-overlapping sub-blocks of equal size;
(S22) calculating a sub-block histogram from the image;
(S23) calculating clipLimit from the sub-block histograms obtained in step (S22);
(S24) clipping the pixels that exceed clipLimit in each sub-block's gray histogram and redistributing the clipped amount uniformly over all gray levels;
(S25) reconstructing gray values of the pixel points by using a bilinear interpolation method, and finally realizing histogram equalization.
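Steps (S22)-(S24) for a single sub-block can be sketched as a histogram clip-and-redistribute followed by equalization. This simplified single-tile version is an illustration only (it omits the interpolation of step (S25), and the function name is an assumption); it assumes the tile contains at least two distinct gray levels:

```python
import numpy as np

def clipped_equalization(tile: np.ndarray, clip_limit: int, levels: int = 256):
    """Build the tile's histogram, clip bins above clip_limit, spread
    the clipped excess uniformly over all gray levels, then equalize
    the tile with the clipped cumulative distribution."""
    hist, _ = np.histogram(tile, bins=levels, range=(0, levels))
    excess = np.maximum(hist - clip_limit, 0).sum()
    hist = np.minimum(hist, clip_limit) + excess // levels
    cdf = hist.cumsum()
    cdf_min, cdf_max = cdf[cdf > 0].min(), cdf[-1]
    lut = np.round((cdf - cdf_min) / (cdf_max - cdf_min) * (levels - 1))
    return lut.astype(np.uint8)[tile]      # apply the mapping per pixel
```

In a production pipeline the equivalent operation is available as OpenCV's `cv2.createCLAHE`, which also performs the per-tile interpolation.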
As shown in FIG. 5, for each pixel the abscissa represents the current pixel value and the ordinate the transformed pixel value. After the image is partitioned as described above, each sub-block uses a different gray transformation function during equalization. The procedure is as follows:
1) First, according to its position in the image, the whole image region is divided into three types of regions A, B, and C, which are the corner regions, the edge regions, and the central regions, respectively.
2) Each pixel in the image is examined to determine which type of region it belongs to; pixels in different regions are processed differently.
3) If the pixel belongs to a class A region, no interpolation is performed and the gray transformation function is applied directly:

f(x) = round( (cdf(x) - cdf_min) / (cdf_max - cdf_min) * (L - 1) )

where cdf(x) denotes the cumulative distribution value of pixel value x in the sub-block, cdf_min and cdf_max denote the minimum and maximum values of the sub-block's cumulative distribution, and L denotes the total number of gray levels, typically 256.
4) If the pixel belongs to a class B region, denote the transformation functions of the two adjacent class A regions as f_1 and f_2, and take two points M and N in those two class A regions such that M, N, and the pixel lie on the same horizontal line. Denoting the positions of M and N as x_1 and x_2, a linear interpolation transform is applied to the pixel at position x with gray value v:

f(x) = ((x_2 - x) / (x_2 - x_1)) * f_1(v) + ((x - x_1) / (x_2 - x_1)) * f_2(v)
5) If the pixel point belongs to a class-C region, referring to point P in fig. 4, a bilinear interpolation transform is applied to point P: with f_1, f_2, f_3 and f_4 denoting the transformation functions of the four adjacent class-A regions, and u and t denoting the normalized vertical and horizontal distances of P from the upper-left region center,

f(v) = (1 − u)(1 − t) · f_1(v) + (1 − u)t · f_2(v) + u(1 − t) · f_3(v) + u·t · f_4(v)
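The interpolation of steps 4) and 5) can be sketched as follows. The per-region transforms f1..f4 and the normalized distances are illustrative stand-ins for the quantities defined above; the toy transforms below (identity and doubling) are not from the patent.

```python
# Sketch of steps 4)-5): blend the outputs of neighboring class-A region
# transforms, with normalized distances as weights.

def linear_blend(f1, f2, t, v):
    """Class-B pixel: t in [0,1] is the normalized distance from f1's center."""
    return (1 - t) * f1(v) + t * f2(v)

def bilinear_blend(f1, f2, f3, f4, u, t, v):
    """Class-C pixel: u, t are normalized vertical/horizontal distances."""
    top = (1 - t) * f1(v) + t * f2(v)
    bottom = (1 - t) * f3(v) + t * f4(v)
    return (1 - u) * top + u * bottom

# Toy transforms: identity and doubling
ident, double = (lambda v: v), (lambda v: 2 * v)
print(linear_blend(ident, double, 0.5, 10))                      # 15.0
print(bilinear_blend(ident, double, ident, double, 0.5, 0.5, 10))  # 15.0
```

Because the weights sum to 1, the blended value always lies between the outputs of the surrounding transforms, which is what removes the visible block boundaries between sub-regions.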
After CLAHE processing, the images are unified in size, for example by equal-interval scaling sampling.
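One plausible reading of "equal-interval scaling sampling" is nearest-neighbor resampling at equal strides, sketched below on a toy list-of-lists image; the patent does not fix the exact resampling rule, so this is an assumption.

```python
# Sketch of the size-unification step: nearest-neighbor resampling at
# equal intervals, mapping each output pixel back to a source pixel.

def resize_nearest(img, out_h, out_w):
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)]
            for r in range(out_h)]

img = [[1, 2, 3, 4],
       [5, 6, 7, 8]]
print(resize_nearest(img, 1, 2))  # [[1, 3]]
```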
Through the above step (S2), the binary file of the malicious code is converted into a gray-scale image in which the local contrast of the image portion corresponding to the malicious blocks is effectively improved. On this basis, the embodiment further includes:
(S3) inputting the target gray-scale image into a trained malicious code classification model to predict the probability that the malicious code belongs to each family class, and determining the family class with the highest probability as the family class to which the malicious code belongs; the malicious code classification model is a neural network model used to predict the probability that the malicious code corresponding to an input gray-scale image belongs to each family class;
As a preferred implementation, in this embodiment the malicious code classification model is a Vision Transformer model.
The Transformer is an end-to-end NLP model proposed by the Google team in 2017; it abandons the traditional sequential RNN structure in favor of a self-attention mechanism, which enables parallel training and allows the model to capture global information. The Vision Transformer can be regarded as the image-domain counterpart of the Transformer: the standard Transformer model is migrated to the image field with minimal modification. The Vision Transformer divides an input image into a number of sub-blocks (patches) and arranges them into a linearly embedded sequence, which is then fed to the Transformer as input, analogous to the word sequences of the NLP field. In the application scenario of this embodiment, this model achieves a good classification effect on the gray-scale images converted from the binary files of malicious codes.
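The patch-sequence front end described above can be sketched as follows. This shows only the split-and-flatten step; the learned linear projection, position embeddings, and the Transformer encoder itself are omitted.

```python
# Sketch of the Vision Transformer front end: split a gray-scale image into
# fixed-size patches and flatten each patch into a vector, yielding the
# patch sequence that is fed to the Transformer.

def to_patch_sequence(img, patch):
    h, w = len(img), len(img[0])
    seq = []
    for r in range(0, h, patch):          # patch rows, top to bottom
        for c in range(0, w, patch):      # patch columns, left to right
            flat = [img[r + i][c + j]
                    for i in range(patch) for j in range(patch)]
            seq.append(flat)
    return seq

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
print(to_patch_sequence(img, 2))  # four 2x2 patches in row-major order
```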
It should be noted that other image classification models may be used where classification accuracy meets requirements.
In summary, this embodiment locates the malicious blocks in the malicious code binary file and then improves the local contrast of the corresponding portions of the image, which effectively improves the classification accuracy.
Example 2:
a computer-readable storage medium, comprising: a stored computer program; when the computer program is executed by the processor, the device where the computer readable storage medium is located is controlled to execute the malicious code detection method based on malicious block labeling and image processing provided in the above embodiment 1.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.
Claims (8)
1. A malicious code detection method based on malicious block labeling and image processing is characterized by comprising the following steps:
the method comprises the steps of (S1) dividing a binary file of malicious codes to be detected into a plurality of basic blocks, detecting whether each basic block is a malicious block, and marking the position of the malicious block in the binary file; the malicious blocks are basic blocks related to malicious functions;
(S2) converting the binary file into a gray level image, and improving the local contrast of a part of images corresponding to malicious blocks in the gray level image to obtain a target gray level image;
(S3) inputting the target gray level graph into a trained malicious code classification model to predict the probability that the malicious code belongs to each family class, and determining the family class with the highest probability as the family class to which the malicious code belongs;
the malicious code classification model is a neural network model and is used for predicting the probability that malicious codes corresponding to an input gray level graph belong to each family class.
2. The malicious code detection method based on malicious block labeling and image processing according to claim 1, wherein in the step (S1), for any one basic block, whether it is a malicious block is detected by:
extracting code features of the basic block and converting the code features into a feature vector; the code features include structural features, arithmetic instruction features, transfer instruction features, and API call features;
inputting the feature vector into a trained malicious block detection model, which performs feature extraction and reconstruction on the feature vector to obtain a reconstructed feature;
if the difference between the reconstructed feature output by the malicious block detection model and the feature vector is greater than a preset threshold, judging that the basic block is a malicious block; otherwise, judging that the basic block is not a malicious block;
the malicious block detection model is a neural network model and is used for extracting and reconstructing characteristics of an input basic block, and the training mode comprises the following steps:
collecting a binary file irrelevant to malicious functions, dividing the binary file into basic blocks, and extracting code features of the basic blocks as benign samples to obtain a benign sample set;
initializing a malicious block detection model, training the malicious block detection model by using the benign sample set with the aim of minimizing reconstruction loss, and obtaining a trained malicious block detection model after training is finished.
3. The malicious code detection method based on malicious block annotation and image processing according to claim 2, wherein the structural features include the number of child nodes and the betweenness value of the basic block; the arithmetic instruction features include the numbers of basic mathematical, shift, and logical operation instructions contained in the basic block; the transfer instruction features include the numbers of stack operations, register operations, and port operations within the basic block; and the API call features include the numbers of calls within the basic block to APIs related to dll, process, service, and system information.
4. The malicious code detection method based on malicious block annotation and image processing according to claim 3, wherein said malicious block detection model is an autoencoder model.
5. The malicious code detection method based on malicious block labeling and image processing according to any one of claims 1 to 4, wherein in the step (S2), the local contrast of the portion of the gray-scale map corresponding to the malicious blocks is improved by a contrast-limited adaptive histogram equalization (CLAHE) algorithm.
6. The malicious code detection method based on malicious block annotation and image processing according to any one of claims 1-4, wherein the malicious code classification model is a Vision Transformer model.
7. The malicious code detection method based on malicious block annotation and image processing according to any one of claims 1 to 4, wherein in the step (S2), converting the binary file into a grayscale image comprises:
converting the code sequence into unsigned integers in units of 8 bits each, and storing these values in an unsigned integer vector;
and converting the unsigned integer vector into a matrix, taking each element in the matrix as a pixel, and taking the numerical value of the element as the gray value of the corresponding pixel to obtain the gray image.
8. A computer-readable storage medium, comprising: a stored computer program; when the computer program is executed by a processor, the device where the computer readable storage medium is located is controlled to execute the malicious code detection method based on malicious block marking and image processing according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310606050.3A CN116644422A (en) | 2023-05-23 | 2023-05-23 | Malicious code detection method based on malicious block labeling and image processing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116644422A true CN116644422A (en) | 2023-08-25 |
Family
ID=87614821
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310606050.3A Pending CN116644422A (en) | 2023-05-23 | 2023-05-23 | Malicious code detection method based on malicious block labeling and image processing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116644422A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117235728A (en) * | 2023-11-16 | 2023-12-15 | 中国电子科技集团公司第十五研究所 | Malicious code gene detection method and device based on fine granularity labeling model |
CN117235728B (en) * | 2023-11-16 | 2024-02-06 | 中国电子科技集团公司第十五研究所 | Malicious code gene detection method and device based on fine granularity labeling model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||