Knowledge-driven regression detection method and system for shell-added code
Technical Field
The invention relates to the field of computer network security, and in particular to a knowledge-driven regression detection method and system for shell-added code.
Background
With the development and spread of computer technology, computer networks have grown rapidly, and the amount of malicious code has increased exponentially. Early malicious code used few self-protection mechanisms and carried fixed signatures, so antivirus software could easily detect virus programs hidden in a system by matching virus signatures. As the technology developed, however, malicious code adopted self-protection techniques to resist detection by antivirus engines. For example, malicious code is shell-added (that is, packed, typically by compression or encryption), which greatly reduces the accuracy of traditional detection.
At present, shell-added code is mostly detected with a single detection mode. For example: normalized features are extracted directly from the encrypted ciphertext for detection, but the detection result is inaccurate; code packed with a known compression algorithm is decompressed and then detected, but the detection result is incomplete; or key instructions are selected for detection by executing the code in a dynamic virtual machine, but the detection efficiency is low.
Disclosure of Invention
In view of these problems, the invention provides a knowledge-driven regression detection method and system for shell-added code. A balance of accuracy, comprehensiveness and efficiency is achieved through cooperative detection with a ciphertext knowledge base and a plaintext knowledge base, and the detection knowledge base is continuously updated in a regression manner to adapt to ongoing changes in the detected code and in detection modes.
First, the invention provides a knowledge-driven regression detection method for shell-added code, which comprises the following steps:
establishing a characteristic comprehensive database, and acquiring a to-be-detected shell-added sample;
placing the decompression code of the to-be-detected shell-added sample into a sample interpreter for short feature matching; if the matching succeeds, decrypting the to-be-detected shell-added sample with the decryption algorithm that corresponds to the short features in an inference engine, extracting the short features from the decrypted data, and entering them into a plaintext knowledge base;
otherwise, extracting features of the to-be-detected shell-added sample directly through the inference engine, and entering the features into a ciphertext knowledge base.
In the method, the characteristic comprehensive database consists of a plaintext knowledge base and a ciphertext knowledge base.
In the method, the short features comprise feature content and algorithm key position information.
In the method, decrypting the to-be-detected shell-added sample with the decryption algorithm that corresponds to the short features in the inference engine specifically means: matching the corresponding decryption algorithm according to the feature content and the algorithm key position information in the short features, and decrypting the to-be-detected shell-added sample with that algorithm.
In the method, directly extracting features of the to-be-detected shell-added sample through the inference engine specifically comprises the following steps:
acquiring corresponding data from the to-be-detected shell-added sample according to known dynamic features and static features, and setting a score for each feature;
calculating the total score of the dynamic features and the total score of the static features separately according to the entropy weight method;
comprehensively evaluating the total scores of the dynamic features and the static features by a complex analysis method based on the Poincaré metric, and selecting either the dynamic features or the static features accordingly;
and taking the selected dynamic features or static features as the features of the to-be-detected shell-added sample, and recording them in the ciphertext knowledge base.
In the method, after the to-be-detected shell-added sample is acquired, the method further comprises: extracting code segment features of the to-be-detected shell-added sample, matching the hash value of the features against the existing features in the characteristic comprehensive database, and directly outputting a judgment result if the matching succeeds.
The invention also provides a knowledge-driven regression detection system for shell-added code, which comprises:
the database module is used for establishing a characteristic comprehensive database;
the acquisition module is used for acquiring a to-be-detected shell-added sample;
the sample interpreter module is used for placing the decompression code of the to-be-detected shell-added sample into the sample interpreter for short feature matching, and for handing over to the inference engine module if the matching succeeds;
the inference engine module is used for decrypting the to-be-detected shell-added sample with the decryption algorithm that corresponds to the short features in the inference engine, extracting the short features from the decrypted data, and entering them into a plaintext knowledge base; if the short feature matching fails, the inference engine directly extracts features of the to-be-detected shell-added sample and enters them into a ciphertext knowledge base.
In the system, the characteristic comprehensive database consists of a plaintext knowledge base and a ciphertext knowledge base.
In the system, the short features include feature content and algorithm key location information.
In the system, decrypting the to-be-detected shell-added sample with the decryption algorithm that corresponds to the short features in the inference engine specifically means: matching the corresponding decryption algorithm according to the feature content and the algorithm key position information in the short features, and decrypting the to-be-detected shell-added sample with that algorithm.
In the system, directly extracting features of the to-be-detected shell-added sample through the inference engine specifically comprises:
acquiring corresponding data from the to-be-detected shell-added sample according to known dynamic features and static features, and setting a score for each feature;
calculating the total score of the dynamic features and the total score of the static features separately according to the entropy weight method;
comprehensively evaluating the total scores of the dynamic features and the static features by a complex analysis method based on the Poincaré metric, and selecting either the dynamic features or the static features accordingly;
and taking the selected dynamic features or static features as the features of the to-be-detected shell-added sample, and recording them in the ciphertext knowledge base.
In the system, after the to-be-detected shell-added sample is acquired, the system further performs the following: extracting code segment features of the to-be-detected shell-added sample, matching the hash value of the features against the existing features in the characteristic comprehensive database, and directly outputting a judgment result if the matching succeeds.
Accordingly, the present invention proposes a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the knowledge-driven regression detection method for shell-added code according to any one of claims 1 to 6.
Accordingly, the present invention provides an electronic device, comprising: a housing, a processor, a memory, a circuit board and a power supply, wherein the circuit board is arranged inside the space enclosed by the housing, and the processor and the memory are arranged on the circuit board; the power supply supplies power to each circuit or component of the electronic device; the memory is used for storing executable program code; and the processor runs the program corresponding to the executable program code by reading the executable program code stored in the memory, so as to execute the method described above.
The technical scheme of the invention has the advantage that multiple detection modes are integrated: the plaintext knowledge base and the ciphertext knowledge base form a characteristic comprehensive database, and accurate, comprehensive and efficient detection is achieved through cooperative detection on ciphertext and plaintext. Finally, the knowledge base is updated in a regression manner, and features are extracted to a certain degree even from samples packed with a currently unknown compression algorithm, so as to adapt to ongoing changes in the detected code and in detection modes.
Drawings
In order to illustrate the technical solutions of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of an embodiment of a knowledge-driven regression detection method for shell-added code according to the present invention;
FIG. 2 is a schematic diagram of a knowledge-driven regression detection system for shell-added code according to the present invention;
FIG. 3 is a schematic structural diagram of an embodiment of an electronic device according to the present invention.
Detailed Description
In order to make the technical solutions in the embodiments of the present invention better understood and make the above objects, features and advantages of the present invention more comprehensible, the technical solutions of the present invention are described in further detail below with reference to the accompanying drawings.
First, the present invention provides a knowledge-driven regression detection method for shell-added code, as shown in FIG. 1, including:
S101: establishing a characteristic comprehensive database, and acquiring a to-be-detected shell-added sample;
S102: placing the decompression code of the to-be-detected shell-added sample into a sample interpreter for short feature matching; if the matching succeeds, executing S103, otherwise executing S104; the decompression code of a shell-added sample is typically located in the second half of the second section of the sample;
S103: decrypting the to-be-detected shell-added sample with the decryption algorithm that corresponds to the short features in the inference engine, extracting the short features from the decrypted data, and entering them into a plaintext knowledge base;
S104: directly extracting features of the to-be-detected shell-added sample through the inference engine, and entering the features into a ciphertext knowledge base.
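As a non-limiting illustration, the branching of steps S102 to S104 may be sketched as follows. The names `FeatureDatabase`, `detect`, `match_short_feature`, `extract_plain` and `extract_cipher` are assumptions introduced for this sketch only; the embodiment does not prescribe any particular interface, and the matching and extraction routines are passed in as placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical type: a decryption routine selected by short feature matching.
Decryptor = Callable[[bytes], bytes]


@dataclass
class FeatureDatabase:
    """Characteristic comprehensive database made of two knowledge bases (S101)."""
    plaintext_kb: List[bytes] = field(default_factory=list)
    ciphertext_kb: List[bytes] = field(default_factory=list)


def detect(
    sample: bytes,
    decompression_code: bytes,
    db: FeatureDatabase,
    match_short_feature: Callable[[bytes], Optional[Decryptor]],
    extract_plain: Callable[[bytes], bytes],
    extract_cipher: Callable[[bytes], bytes],
) -> str:
    """Route one sample through S102 to S104 and report the branch taken."""
    # S102: short feature matching on the decompression code.
    decryptor = match_short_feature(decompression_code)
    if decryptor is not None:
        # S103: decrypt, extract features, store in the plaintext knowledge base.
        decrypted = decryptor(sample)
        db.plaintext_kb.append(extract_plain(decrypted))
        return "plaintext"
    # S104: no match, so extract features directly from the packed sample.
    db.ciphertext_kb.append(extract_cipher(sample))
    return "ciphertext"
```

Passing the matcher and the extractors as parameters keeps the sketch independent of how the sample interpreter and the inference engine are actually realized.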
In the method, the characteristic comprehensive database consists of a plaintext knowledge base and a ciphertext knowledge base.
In the method, the short features comprise feature content and algorithm key position information.
The short feature contains algorithm key position information for the following reason: during encryption, when the same code is encrypted repeatedly, the encryption can be carried out using the position at which that code first occurs together with an offset. During decryption, the offset range can therefore be used as a feature, which preserves accuracy while keeping the feature relatively short. And because the short features contain the algorithm key position information, a mapping relation can be established between the short features and the encryption algorithms.
In the method, decrypting the to-be-detected shell-added sample with the decryption algorithm that corresponds to the short features in the inference engine specifically means: matching the corresponding decryption algorithm according to the feature content and the algorithm key position information in the short features, and decrypting the to-be-detected shell-added sample with that algorithm.
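A minimal sketch of such a mapping is given below, under the assumption that the algorithm key position information can be modelled as an inclusive range of start offsets within the decompression code. The class `ShortFeature`, the registry `DECRYPTORS`, the algorithm name `"xor-0x5a"` and the single-byte XOR routine are all hypothetical stand-ins chosen for illustration; real shells use their own algorithms.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class ShortFeature:
    content: bytes      # feature content expected in the decompression code
    min_offset: int     # algorithm key position information, modelled here
    max_offset: int     # as an inclusive range of allowed start offsets
    algorithm: str      # name of the decryption algorithm mapped to the feature


def xor_decrypt(data: bytes, key: int = 0x5A) -> bytes:
    """Toy single-byte XOR, standing in for a real unpacking routine."""
    return bytes(b ^ key for b in data)


# Hypothetical registry mapping algorithm names to decryption routines.
DECRYPTORS: Dict[str, Callable[[bytes], bytes]] = {"xor-0x5a": xor_decrypt}


def match_short_feature(
    code: bytes, features: List[ShortFeature]
) -> Optional[ShortFeature]:
    """Return the first feature whose content starts inside its offset range."""
    for feat in features:
        # bytes.find only reports matches fully inside code[start:end].
        end = feat.max_offset + len(feat.content)
        if code.find(feat.content, feat.min_offset, end) != -1:
            return feat
    return None


def decrypt_sample(
    sample: bytes, code: bytes, features: List[ShortFeature]
) -> Optional[bytes]:
    """Decrypt with the algorithm mapped to the matched feature, if any."""
    feat = match_short_feature(code, features)
    if feat is None:
        return None
    return DECRYPTORS[feat.algorithm](sample)
```

Restricting the search to the offset range is what allows the feature content itself to stay short: a short byte string that would be ambiguous over the whole file becomes distinctive once its position is constrained.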
In the method, directly extracting features of the to-be-detected shell-added sample through the inference engine specifically comprises the following steps:
acquiring corresponding data from the to-be-detected shell-added sample according to known dynamic features and static features, and setting a score for each feature; because the threat level differs from feature to feature, a score needs to be set for each feature;
calculating the total score of the dynamic features and the total score of the static features separately according to the entropy weight method; for example, if only device threats (disk sector modification), system threats (registry modification) and file IO threats (file reading and writing) are considered as dynamic features, the three differ in threat degree and therefore necessarily receive different sub-scores; the static features include, for example, the feature of the first 4K of the code section and the file icon feature. The mapping from features to scores according to threat level is based on extensive analysis experience.
The dynamic features and the static features are linearly independent, so they can be regarded as two mutually perpendicular vectors in a plane rectangular coordinate system, and the subsequent analysis can be carried out with a complex function;
comprehensively evaluating the total scores of the dynamic features and the static features by a complex analysis method based on the Poincaré metric, and selecting either the dynamic features or the static features accordingly;
and taking the selected dynamic features or static features as the features of the to-be-detected shell-added sample, and recording them in the ciphertext knowledge base.
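The total-score step may be illustrated with the textbook form of the entropy weight method. The sketch below assumes that a reference matrix of per-feature scores from previously analysed samples is available (rows are samples, columns are features) and that all scores are non-negative; the function names `entropy_weights` and `total_score` are hypothetical, and the embodiment does not state how its weights are actually derived.

```python
import math
from typing import List


def entropy_weights(matrix: List[List[float]]) -> List[float]:
    """Entropy weight method: columns with more variation get more weight."""
    m, n = len(matrix), len(matrix[0])
    if m < 2:
        raise ValueError("at least two reference samples are required")
    k = 1.0 / math.log(m)  # normalizes each column entropy into [0, 1]
    divergences = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total == 0:
            divergences.append(0.0)  # an all-zero column carries no information
            continue
        entropy = 0.0
        for x in column:
            if x > 0:  # the term 0 * ln(0) is taken as 0
                p = x / total
                entropy -= k * p * math.log(p)
        divergences.append(max(0.0, 1.0 - entropy))
    s = sum(divergences)
    if s < 1e-12:  # every column is uniform: fall back to equal weights
        return [1.0 / n] * n
    return [d / s for d in divergences]


def total_score(scores: List[float], weights: List[float]) -> float:
    """Weighted total of one sample's per-feature scores."""
    return sum(w * x for w, x in zip(weights, scores))
```

In this reading, the dynamic features and the static features would each have their own reference matrix and weight vector, yielding one total score per group.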
This approach is adopted because the ciphertext features to be extracted do not yet exist in the knowledge base and the compression algorithm of the sample is unknown. To still achieve a certain detection effect, feature extraction and detection have to be carried out in an approximate manner, and combining dynamic features and static features makes it possible to find the corresponding data in the sample.
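The description does not define the comprehensive evaluation based on the Poincaré metric in detail. Purely as one possible reading, and not as the method of the embodiment, the sketch below places the static total score on the real axis and the dynamic total score on the imaginary axis, scales both into the open unit disk, and selects the feature group whose point lies farther from the origin in the Poincaré disk metric. The scaling by `max_total + 1.0`, the tie-breaking rule and the function names are assumptions.

```python
import math


def poincare_distance(z1: complex, z2: complex) -> float:
    """Hyperbolic distance between two points of the open unit disk."""
    if abs(z1) >= 1 or abs(z2) >= 1:
        raise ValueError("both points must lie strictly inside the unit disk")
    numerator = 2 * abs(z1 - z2) ** 2
    denominator = (1 - abs(z1) ** 2) * (1 - abs(z2) ** 2)
    return math.acosh(1 + numerator / denominator)


def select_feature_kind(
    static_total: float, dynamic_total: float, max_total: float
) -> str:
    """Pick 'dynamic' or 'static' by hyperbolic distance from the origin."""
    if max_total <= 0:
        raise ValueError("max_total must be positive")
    if not (0 <= static_total <= max_total and 0 <= dynamic_total <= max_total):
        raise ValueError("scores must lie in [0, max_total]")
    scale = max_total + 1.0  # hypothetical scaling that keeps points inside the disk
    static_point = complex(static_total / scale, 0.0)    # real axis
    dynamic_point = complex(0.0, dynamic_total / scale)  # imaginary axis
    d_static = poincare_distance(0j, static_point)
    d_dynamic = poincare_distance(0j, dynamic_point)
    # Ties fall back to the static features in this sketch.
    return "dynamic" if d_dynamic > d_static else "static"
```

Because the hyperbolic distance grows without bound near the edge of the disk, differences between high total scores are stretched more than differences between low ones; whether this matches the intended evaluation is not stated in the description.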
In the method, after the to-be-detected shell-added sample is acquired, the method further comprises: extracting code segment features of the to-be-detected shell-added sample, matching the hash value of the features against the existing features in the characteristic comprehensive database, and directly outputting a judgment result if the matching succeeds.
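This pre-check may be sketched as a simple hash lookup. The description does not name a hash function; SHA-256 is used below only as an assumption, and the dictionary `known_hashes`, which maps a feature hash to a stored judgment, is a hypothetical stand-in for the characteristic comprehensive database.

```python
import hashlib
from typing import Dict, Optional


def feature_hash(code_segment_feature: bytes) -> str:
    """Hash of the code segment feature (SHA-256 chosen as an assumption)."""
    return hashlib.sha256(code_segment_feature).hexdigest()


def precheck(
    code_segment_feature: bytes, known_hashes: Dict[str, str]
) -> Optional[str]:
    """Return the stored judgment on a hash hit, or None to continue detection."""
    return known_hashes.get(feature_hash(code_segment_feature))
```

A hit lets the judgment be output directly, so that short feature matching and inference are only carried out for samples whose code segment features have not been seen before.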
The invention also provides a knowledge-driven regression detection system for shell-added code, which, as shown in FIG. 2, comprises:
a database module 201 for establishing a characteristic comprehensive database;
an acquisition module 202 for acquiring a to-be-detected shell-added sample;
a sample interpreter module 203 for placing the decompression code of the to-be-detected shell-added sample into the sample interpreter for short feature matching, and for handing over to the inference engine module if the matching succeeds;
an inference engine module 204 for decrypting the to-be-detected shell-added sample with the decryption algorithm that corresponds to the short features in the inference engine, extracting the short features from the decrypted data, and entering them into a plaintext knowledge base; if the short feature matching fails, the inference engine directly extracts features of the to-be-detected shell-added sample and enters them into a ciphertext knowledge base.
In the system, the characteristic comprehensive database consists of a plaintext knowledge base and a ciphertext knowledge base.
In the system, the short features include feature content and algorithm key location information.
In the system, decrypting the to-be-detected shell-added sample with the decryption algorithm that corresponds to the short features in the inference engine specifically means: matching the corresponding decryption algorithm according to the feature content and the algorithm key position information in the short features, and decrypting the to-be-detected shell-added sample with that algorithm.
In the system, directly extracting features of the to-be-detected shell-added sample through the inference engine specifically comprises:
acquiring corresponding data from the to-be-detected shell-added sample according to known dynamic features and static features, and setting a score for each feature;
calculating the total score of the dynamic features and the total score of the static features separately according to the entropy weight method;
comprehensively evaluating the total scores of the dynamic features and the static features by a complex analysis method based on the Poincaré metric, and selecting either the dynamic features or the static features accordingly;
and taking the selected dynamic features or static features as the features of the to-be-detected shell-added sample, and recording them in the ciphertext knowledge base.
In the system, after the to-be-detected shell-added sample is acquired, the system further performs the following: extracting code segment features of the to-be-detected shell-added sample, matching the hash value of the features against the existing features in the characteristic comprehensive database, and directly outputting a judgment result if the matching succeeds.
Accordingly, the present invention proposes a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the knowledge-driven regression detection method for shell-added code according to any one of claims 1 to 6.
Accordingly, the present invention provides an electronic device. As shown in FIG. 3, the electronic device includes: a housing 301, a processor 302, a memory 303, a circuit board 304 and a power supply 305, wherein the circuit board is arranged inside the space enclosed by the housing, and the processor and the memory are arranged on the circuit board; the power supply supplies power to each circuit or component of the electronic device; the memory is used for storing executable program code; and the processor runs the program corresponding to the executable program code by reading the executable program code stored in the memory, so as to execute the method described above.
The technical scheme of the invention has the advantage that multiple detection modes are integrated: the plaintext knowledge base and the ciphertext knowledge base form a characteristic comprehensive database, and accurate, comprehensive and efficient detection is achieved through cooperative detection on ciphertext and plaintext. Finally, the knowledge base is updated in a regression manner, and features are extracted to a certain degree even from samples packed with a currently unknown compression algorithm, so as to adapt to ongoing changes in the detected code and in detection modes.
While the present invention has been described with respect to the embodiments, those skilled in the art will appreciate that there are numerous variations and permutations of the present invention without departing from the spirit of the invention, and it is intended that the appended claims cover such variations and modifications as fall within the true spirit of the invention.