CN108363922B

CN108363922B - Automatic malicious code simulation detection method and system

Info

Publication number: CN108363922B
Application number: CN201710974003.9A
Authority: CN
Inventors: 黄云宇; 康学斌; 刘广柱; 王小丰; 肖新光
Original assignee: Beijing Ahtech Network Safe Technology Ltd
Current assignee: Beijing Ahtech Network Safe Technology Ltd
Priority date: 2017-10-19
Filing date: 2017-10-19
Publication date: 2020-02-07
Anticipated expiration: 2037-10-19
Also published as: CN108363922A

Abstract

The invention provides an automatic malicious code simulation detection method and system. The method comprises: performing family classification on a known malicious code sample set and extracting information interaction data to establish a special protocol database, a special protocol knowledge base, a general protocol database and a general protocol knowledge base; acquiring a network request of a malicious sample, and identifying the family and the special communication protocol of the malicious sample according to the special protocol knowledge base; identifying the communication request type and the general communication protocol of the malicious sample according to the general protocol knowledge base; and automatically simulating the general protocol and the special protocol of the malicious code to trigger its malicious behaviors. With this technical scheme, the malicious behavior execution mechanism of the malicious code can be triggered, the automatic malicious code analysis capability of the sandbox is improved, and the investment in manual analysis is greatly reduced.

Description

Automatic malicious code simulation detection method and system

Technical Field

The invention relates to the field of computer network security, in particular to an automatic malicious code simulation detection method and system.

Background

Existing sandbox-based simulation detection methods analyze the structure of malicious code or monitor its behavior, determine its behavioral characteristics, and give corresponding feedback. For a malicious code sample whose C2 (Command and Control) end is no longer active, the sandbox detection system cannot establish normal communication with the C2 and can therefore detect only the normal behavior or the initial-stage malicious behavior of the malicious code. The core malicious functions appear only when the malicious code receives the relevant instructions from the C2.

Disclosure of Invention

To solve the above problem, the invention provides an automatic malicious code simulation detection method and system that simulate normal C2 communication by simulating the response feedback of each communication protocol, thereby stimulating the behavior of the malicious code so that it can be detected in depth.

The invention firstly provides an automatic malicious code simulation detection method, which comprises the following steps:

performing family classification on a known malicious code sample set, extracting information interaction data, and establishing a special protocol database, a special protocol knowledge base, a general protocol database and a general protocol knowledge base;

acquiring a network request of a malicious sample, and identifying the family and the special communication protocol of the malicious sample according to the special protocol knowledge base;

identifying the communication request type and the general communication protocol of the malicious sample according to the general protocol knowledge base;

automatically simulating the general protocol of the malicious code: feeding back response information to the malicious sample according to the identified communication request type and general communication protocol, and triggering malicious behaviors of the malicious code;

automatically simulating the special protocol of the malicious code: feeding back response information to the malicious sample according to the identified malicious sample family and special communication protocol, and further triggering malicious behaviors of the malicious code.

The method further comprises the following steps of automatically simulating the malicious code application layer protocol: the network access request is customized according to known malicious codes, a customized network access request knowledge base is established, an application layer protocol of a malicious sample is identified, the network access request is customized according to the identified application layer protocol, and response information is fed back to the malicious codes.

In the method, the family classification of the known malicious code sample set and the extraction of the information interaction data specifically include:

screening out, in each malicious code family, samples that can still communicate normally with the control end, interacting with them, and acquiring communication interaction data and network behavior response data; the communication interaction data is used to establish the special protocol database and the special protocol knowledge base, and the network behavior response data is used to establish the general protocol database and the general protocol knowledge base.

In the method, feeding back response information to the malicious sample according to the identified malicious sample family and special communication protocol comprises automatic CRC32 special protocol check simulation, which specifically comprises:

judging whether the malicious code family to which the malicious sample belongs performs a CRC32 special protocol check, and if so, simulating a response to the CRC32 special protocol check request to complete the check.

In the method, feeding back response information to the malicious sample according to the identified malicious sample family and special communication protocol further comprises automatic first packet special protocol check simulation, which specifically comprises:

judging the malicious code family to which the malicious sample belongs, and determining the corresponding first packet request family according to the protocol format of the first packet special protocol of that malicious code family and the special protocol knowledge base.

In the method, after the automatic first packet special protocol check simulation, the method further comprises automatic confirmation packet special protocol check simulation, which specifically comprises:

after the corresponding first packet request family is determined, sending the corresponding confirmation packet special protocol data to the malicious sample.

In the method, after the automatic confirmation packet special protocol check simulation is completed, the method further comprises automatic heartbeat request check simulation, which specifically comprises:

identifying the special protocol in the heartbeat request sent by the malicious sample, making a corresponding response once the special protocol has been identified, and sending to the malicious sample various remote control request instructions that conform to the special protocol format.

The invention also provides an automatic malicious code simulation detection system, which comprises:

the database module, which establishes a special protocol database, a special protocol knowledge base, a general protocol database and a general protocol knowledge base by performing family classification on a known malicious code sample set and extracting information interaction data;

the special communication protocol identification module, which acquires a network request of a malicious sample and identifies the family and the special communication protocol of the malicious sample according to the special protocol knowledge base;

the general communication protocol identification module, which identifies the communication request type and the general communication protocol of the malicious sample according to the general protocol knowledge base;

the general protocol simulation module, which automatically simulates the general protocol of the malicious code: feeding back response information to the malicious sample according to the identified communication request type and general communication protocol, and triggering malicious behaviors of the malicious code;

the special protocol simulation module, which automatically simulates the special protocol of the malicious code: feeding back response information to the malicious sample according to the identified malicious sample family and special communication protocol, and further triggering malicious behaviors of the malicious code.

The system also comprises an application protocol simulation module which automatically simulates the malicious code application layer protocol: the network access request is customized according to known malicious codes, a customized network access request knowledge base is established, an application layer protocol of a malicious sample is identified, the network access request is customized according to the identified application layer protocol, and response information is fed back to the malicious codes.

In the system, the family classification of the known malicious code sample set and the extraction of the information interaction data, the automatic CRC32 special protocol check simulation, the automatic first packet special protocol check simulation, the automatic confirmation packet special protocol check simulation and the automatic heartbeat request check simulation are carried out in the same manner as described above for the method.

The advantages of the invention are as follows. The malicious behavior execution mechanism of the malicious code is triggered, the malicious code is induced to execute its malicious operations, and its core functions are triggered comprehensively. The automatic malicious code analysis capability of the sandbox is improved, including the capability, accuracy and comprehensiveness of its automatic sample analysis. The automatic simulation system can interact with the malicious code automatically and trigger its core behaviors, so that comprehensive automatic analysis of malicious code is achieved and the investment in manual analysis can be greatly reduced.

The technical scheme provided by the invention mainly aims to improve the in-depth cultivation and dynamic analysis of malicious code, and to further mine the potential information value obtained from in-depth cultivation. By establishing a simulated botnet Client end (the Command and Control end, hereinafter referred to as C2) that simulates normal communication interaction with the Server end, more execution mechanisms of the malicious code are triggered and more malicious behaviors are executed, so that the sandbox can monitor and obtain more malicious code behavior analysis results, which further improves the detection rate and accuracy of the sandbox for malicious code. For a malicious code sample whose C2 is no longer active, the sandbox detection system cannot establish normal communication with the C2 and can detect only the normal or malicious behaviors of the initial stage; the core malicious functions appear only if the malicious code receives the relevant instructions from the C2.

The invention aims to simulate the communication interaction between the C2 and the malicious code and thereby activate the malicious code. It sends responses to the protocol instruction requests of the malicious code, satisfies its communication types and its general and special protocol requests, triggers its malicious behavior execution mechanism, and induces it to execute malicious operations. This improves the automatic malicious code analysis capability of the sandbox and greatly reduces the investment in manual analysis.

Drawings

To illustrate the technical solutions of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of an embodiment of an automated malicious code detection method according to the present invention;

FIG. 2 is a schematic structural diagram of an embodiment of an automated malicious code simulation detection system according to the present invention;

FIG. 3 is a schematic structural diagram of an embodiment of a computer device according to the present invention.

Detailed Description

In order to make the technical solutions in the embodiments of the present invention better understood and make the above objects, features and advantages of the present invention more comprehensible, the technical solutions of the present invention are described in further detail below with reference to the accompanying drawings.

The invention provides an automatic malicious code simulation detection method and system that simulate normal C2 communication by simulating the response feedback of each communication protocol, thereby stimulating the behavior of the malicious code so that it can be detected in depth.

The invention firstly provides an automatic malicious code simulation detection method, as shown in fig. 1, comprising:

S101: performing family classification on a known malicious code sample set, extracting information interaction data, and establishing a special protocol database, a special protocol knowledge base, a general protocol database and a general protocol knowledge base;

The malicious code samples are classified by family because family classification is generally consistent with classification by network protocol. To improve efficiency and avoid wasting resources on repeated analysis of malicious code from the same family, a large number of malicious samples need to be classified by family before their network protocols are reverse-analyzed.
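The split into databases and knowledge bases described in S101 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Capture` and `ProtocolStore` types, their field names and the family labels are all hypothetical. Raw exchanges go into a database, and request/response pairs are indexed by family (special protocols) or by protocol name (general protocols) to form a knowledge base.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Capture:
    """One recorded exchange from a sample that still reached its C2 (hypothetical type)."""
    family: str     # family label assigned by classification
    protocol: str   # "special" for C2 dialogue, else "dns", "http", "ftp", ...
    request: bytes
    response: bytes


@dataclass
class ProtocolStore:
    database: list = field(default_factory=list)                        # raw captures
    knowledge: dict = field(default_factory=lambda: defaultdict(list))  # indexed pairs


def build_stores(captures):
    """Return (special_store, general_store) built from classified captures."""
    special, general = ProtocolStore(), ProtocolStore()
    for cap in captures:
        if cap.protocol == "special":
            store, key = special, cap.family      # index special traffic by family
        else:
            store, key = general, cap.protocol    # index general traffic by protocol
        store.database.append(cap)
        store.knowledge[key].append((cap.request, cap.response))
    return special, general


# Placeholder captures; the byte strings carry no real protocol data.
captures = [
    Capture("family_a", "special", b"\xaa\x01ping", b"\xaa\x02pong"),
    Capture("family_a", "dns", b"c2.example.test", b"10.0.0.5"),
    Capture("family_b", "special", b"HELO", b"SURE"),
]
special, general = build_stores(captures)
print(sorted(special.knowledge))  # families seen in special-protocol traffic
```

Indexing the special store by family reflects the observation above that family and network protocol usually coincide, so one analysis per family is enough.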

S102: acquiring a network request of a malicious sample, and identifying a family and a special communication protocol of the malicious sample according to a special protocol knowledge base;

This step provides the knowledge basis for the later special protocol simulation stage, in which the family name is identified automatically and accurate family-specific protocol response data is fed back.
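One simple way to picture the family identification in S102 is a signature lookup over the first bytes of the sample's request. The sketch below assumes a knowledge base that maps each family to a magic prefix and a minimum length; the `SPECIAL_KB` table, the family names and the byte values are hypothetical and do not describe any real family.

```python
from typing import Optional

# Hypothetical knowledge-base entries: family -> (magic prefix, minimum length).
SPECIAL_KB = {
    "family_a": (b"\xaa\x01", 8),
    "family_b": (b"HELO", 12),
}


def identify_family(payload: bytes, kb=SPECIAL_KB) -> Optional[str]:
    """Return the first family whose signature matches the request, else None."""
    for family, (magic, min_len) in kb.items():
        if len(payload) >= min_len and payload.startswith(magic):
            return family
    return None


print(identify_family(b"HELO" + bytes(8)))
print(identify_family(b"??"))
```

A production knowledge base would likely hold richer descriptions than a prefix, but the lookup shape (request in, family label out) stays the same.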

S103: identifying the communication request type and the general communication protocol of the malicious sample according to the general protocol knowledge base;

Common communication requests, such as DNS (Domain Name System) requests, HTTP (Hypertext Transfer Protocol) requests, FTP (File Transfer Protocol) requests and GET requests, need to be identified in detail by type. This provides the knowledge basis for the later general protocol environment simulation stage, in which the request type and protocol are identified automatically, a preset IP address or preset file data is returned accurately, the execution mechanism of the malicious code is triggered, and the malicious code is induced to interact with the simulated system.
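The request-type identification in S103 can be illustrated with a small classifier. It is a sketch under assumptions: the function name, the returned labels and the short method and command lists are introduced here for illustration, and a real knowledge base would cover many more cases.

```python
HTTP_METHODS = {b"GET", b"POST", b"HEAD", b"PUT"}
FTP_COMMANDS = {b"USER", b"PASS", b"RETR", b"STOR", b"LIST", b"QUIT"}


def classify_request(transport: str, dst_port: int, payload: bytes) -> str:
    """Label a captured request as dns, http-<method>, ftp or unknown."""
    if transport == "udp" and dst_port == 53:
        return "dns"
    words = payload.split(None, 1)               # first whitespace-delimited token
    first = words[0].upper() if words else b""
    if first in HTTP_METHODS and b"HTTP/" in payload:
        return "http-" + first.decode("ascii").lower()
    if first in FTP_COMMANDS:
        return "ftp"
    return "unknown"


print(classify_request("tcp", 80, b"GET /index.html HTTP/1.1\r\n\r\n"))
print(classify_request("tcp", 21, b"USER anonymous\r\n"))
```

Classifying by payload rather than by port alone matters here because malicious code often uses non-standard ports for standard protocols.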

S104: automatic simulation of a general protocol of malicious codes: according to the identified communication request type and the general communication protocol, response information is fed back to the malicious sample, and malicious behaviors of malicious codes are triggered;

For the common network access requests of malicious code, the simulation subsystem needs to respond automatically, promptly and accurately. On the basis of the preceding automatic identification of the general protocol and the general protocol knowledge base, it automatically identifies the request type of the malicious code and feeds back response information that conforms to both network logic and the functional logic of the malicious code, so that it can take over the role of the malicious code's C2.
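As an illustration of feeding back a preset IP for a general request, the sketch below answers a DNS query with a single A record. It follows the standard DNS message layout (12-byte header, question, answer with a compression pointer to the queried name) and assumes a query with exactly one uncompressed question; the function names, the example host name and the sandbox address are hypothetical.

```python
import socket
import struct


def build_query(name: str, txid: int = 0x1234) -> bytes:
    """Build a minimal A-record query, used here only to exercise the responder."""
    qname = b"".join(bytes([len(p)]) + p.encode("ascii") for p in name.split(".")) + b"\x00"
    return struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0) + qname + struct.pack("!HH", 1, 1)


def fake_dns_answer(query: bytes, preset_ip: str, ttl: int = 60) -> bytes:
    """Answer the query with one A record pointing at preset_ip."""
    txid = struct.unpack("!H", query[:2])[0]
    end = 12
    while query[end] != 0:            # walk the length-prefixed labels of QNAME
        end += query[end] + 1
    end += 1 + 4                      # terminating zero label, QTYPE, QCLASS
    question = query[12:end]
    # Flags 0x8180: response, recursion desired, recursion available, no error.
    header = struct.pack("!HHHHHH", txid, 0x8180, 1, 1, 0, 0)
    # 0xC00C points back to the name at offset 12; TYPE=A, CLASS=IN, RDLENGTH=4.
    answer = struct.pack("!HHHIH", 0xC00C, 1, 1, ttl, 4) + socket.inet_aton(preset_ip)
    return header + question + answer


query = build_query("c2.example.test")
reply = fake_dns_answer(query, "10.0.0.5")
print(len(reply), socket.inet_ntoa(reply[-4:]))
```

Pointing every lookup at an address inside the sandbox lets the follow-up connection of the malicious code reach the simulated C2 instead of failing.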

S105: automatically simulating a malicious code application layer protocol: customizing a network access request according to a known malicious code, establishing a customized network access request knowledge base, identifying an application layer protocol of a malicious sample, customizing the network access request according to the identified application layer protocol, and feeding back response information to the malicious code;

Within the general protocols there may be customized requests that differ from ordinary network access. In that case, feedback from ordinary general protocol simulation cannot satisfy the execution mechanism that triggers the later malicious behavior of the malicious code, so automated customized simulation of the application layer protocol is required. By quantitatively studying the common forms of customized parameters at the network protocol layer of malicious code, knowledge of various customized network access requests is accumulated and a customized network access request knowledge base is established, so that the simulation subsystem can respond accurately to customized application layer protocol requests.
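The difference between generic and customized application layer responses can be sketched with a rule table consulted before a generic fallback. Everything in the table is an assumption made for illustration: the paths, the User-Agent pattern and the response bodies are hypothetical and stand in for entries of the customized network access request knowledge base.

```python
import re

# Hypothetical rules: (path pattern, User-Agent pattern, response body).
CUSTOM_RULES = [
    (re.compile(r"^/gate\.php\?id=\w+$"), re.compile(r"^Bot/"), b"cmd=idle"),
    (re.compile(r"^/cfg/.+\.bin$"), re.compile(r".*"), b"\x00" * 16),
]
GENERIC_BODY = b"OK"


def parse_request(raw: bytes):
    """Split a raw HTTP request into method, path and lower-cased headers."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")
    method, path, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, headers


def build_response(raw: bytes) -> bytes:
    """Use a customized body when a rule matches, else the generic body."""
    _method, path, headers = parse_request(raw)
    agent = headers.get("user-agent", "")
    body = GENERIC_BODY
    for path_re, agent_re, custom_body in CUSTOM_RULES:
        if path_re.match(path) and agent_re.match(agent):
            body = custom_body
            break
    return b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n" % len(body) + body


request = b"GET /gate.php?id=abc HTTP/1.1\r\nUser-Agent: Bot/1.0\r\n\r\n"
print(build_response(request))
```

Checking the customized rules first and falling back to the generic body mirrors the ordering in the text: generic simulation handles ordinary access, and the customized knowledge base handles the requests it cannot satisfy.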

S106: automatically simulating the special protocol of the malicious code: feeding back response information to the malicious sample according to the identified malicious sample family and special communication protocol, and further triggering malicious behaviors of the malicious code.

The above is merely an example flow of the technical solution of the present application; the execution order is not limited, and the order of the related steps may be changed.

The communication interaction data may include, for example, remote instruction requests such as the OpenHeart request of the malicious code, the Sure response of the C2 to the malicious code, Ping requests, and attack commands issued by the C2 to the malicious code. The network behavior response data may include, for example, response data obtained from common network behaviors of the malicious code, such as DNS (Domain Name System) requests, HTTP (Hypertext Transfer Protocol) requests and FTP (File Transfer Protocol) requests.

It is judged whether the malicious code family to which the malicious sample belongs performs a CRC32 special protocol check; if so, a response to the CRC32 special protocol check request is simulated to complete the check.

In a small number of malicious code families, after the malicious code establishes network communication with the C2 through the three-way handshake, a CRC32 special protocol check, generally called a Hello request, must be performed. Only after both parties pass the Hello request protocol check does the malicious code send an OpenHeart request to the C2. For example, in families such as XorDDoS and Mirai, the malicious code first sends a Hello special protocol check request to the C2 and sends the OpenHeart request only after the check succeeds.
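The Hello check can be pictured with the sketch below. The text does not specify the packet layout, so the layout used here is an assumption: a body followed by a 4-byte little-endian CRC32 of that body, computed with the standard `zlib.crc32`. The family set, function names and reply body are hypothetical.

```python
import struct
import zlib

# Hypothetical set of families that require the Hello (CRC32) check.
CRC_FAMILIES = {"family_a"}


def crc_tail(body: bytes) -> bytes:
    """CRC32 of the body as 4 little-endian bytes (assumed trailer layout)."""
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF)


def hello_is_valid(packet: bytes) -> bool:
    """Check that the last 4 bytes are the CRC32 of everything before them."""
    if len(packet) < 4:
        return False
    return packet[-4:] == crc_tail(packet[:-4])


def answer_hello(family: str, packet: bytes, reply_body: bytes = b"HELLO-ACK"):
    """Return a checksummed reply if the family uses the check and the Hello verifies."""
    if family not in CRC_FAMILIES or not hello_is_valid(packet):
        return None
    return reply_body + crc_tail(reply_body)


hello = b"hi" + crc_tail(b"hi")
print(answer_hello("family_a", hello))
```

Returning `None` for families outside `CRC_FAMILIES` corresponds to the judgment step above: the check is simulated only when the identified family actually performs it.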

After establishing network communication with the C2 through the three-way handshake, most malicious code sends an OpenHeart special protocol request to the C2. The OpenHeart request of each family has its own proprietary protocol format, and the corresponding C2 has a matching OpenHeart protocol format check mechanism that verifies whether the malicious code sending the OpenHeart request belongs to its family. In the OpenHeart special protocol check stage, the simulation subsystem therefore needs to establish an OpenHeart special protocol check mechanism from the protocol knowledge base so that the family of an OpenHeart request can be identified quickly and accurately.
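A per-family format check for the first (OpenHeart) packet might look like the sketch below. The two layouts are invented for illustration and do not describe any real family: one uses a magic value plus a length field, the other a fixed size with marker bytes. The table and function names are likewise hypothetical.

```python
import struct


def _check_family_a(pkt: bytes) -> bool:
    # Assumed layout: 4-byte magic, 4-byte little-endian body length, body.
    if len(pkt) < 8:
        return False
    magic, length = struct.unpack("<4sI", pkt[:8])
    return magic == b"OPHT" and length == len(pkt) - 8


def _check_family_b(pkt: bytes) -> bool:
    # Assumed layout: fixed 32 bytes, version byte 0x01, trailer byte 0xFF.
    return len(pkt) == 32 and pkt[0] == 0x01 and pkt[-1] == 0xFF


FIRST_PACKET_CHECKS = {"family_a": _check_family_a, "family_b": _check_family_b}


def first_packet_family(pkt: bytes, candidates=None):
    """Return the family whose first-packet format the packet satisfies, else None."""
    names = candidates if candidates is not None else FIRST_PACKET_CHECKS
    for name in names:
        check = FIRST_PACKET_CHECKS.get(name)
        if check is not None and check(pkt):
            return name
    return None


packet = b"OPHT" + struct.pack("<I", 3) + b"abc"
print(first_packet_family(packet))
```

The optional `candidates` argument lets an earlier identification step narrow the search to the families it already considers plausible.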

After the OpenHeart protocol format check succeeds, the C2 sends Sure special protocol data as the response to the OpenHeart special protocol request of the malicious code. On receiving the Sure data returned by the C2, the malicious code in turn checks the Sure special protocol format, and the two parties can communicate normally only after the Sure check succeeds. Sure special protocol checks are common in large botnet families, including Mayday, Xor, Gates, Dofloo, Gafgyt, Mirai and Gh0st.
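Once the family of the first packet is known, the confirmation (Sure) packet can be produced by a family-specific builder, as in the sketch below. The builders, the packet layouts and the idea of echoing a session token are assumptions for illustration only; real formats would come from the special protocol knowledge base.

```python
import struct


def _sure_family_a(first_packet: bytes) -> bytes:
    # Assumed: echo the 4-byte token that follows the 8-byte first-packet header.
    token = first_packet[8:12]
    return b"SURE" + struct.pack("<I", len(token)) + token


def _sure_family_b(first_packet: bytes) -> bytes:
    # Assumed: fixed 32-byte confirmation with type byte 0x02 and trailer 0xFF.
    return b"\x02" + bytes(30) + b"\xff"


SURE_BUILDERS = {"family_a": _sure_family_a, "family_b": _sure_family_b}


def build_sure(family: str, first_packet: bytes):
    """Return the confirmation packet for the family, or None if it is unknown."""
    builder = SURE_BUILDERS.get(family)
    return builder(first_packet) if builder is not None else None


first = b"OPHT" + struct.pack("<I", 4) + b"\x11\x22\x33\x44"
print(build_sure("family_a", first))
```

Because the malicious code validates the Sure format itself, a reply built for the wrong family would fail its check, which is why the builder is selected by the identified family rather than shared.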

Once the protocol checks have passed, normal communication interaction takes place. The malicious code sends network requests carrying data such as memory utilization, CPU utilization and network bandwidth to the C2 at specified intervals, generally called heartbeat or Ping requests, and the C2 can at any time send instruction requests to the malicious code, such as remote desktop control, CMD, file download, starting or stopping a DDoS attack, and file update. This communication phase can trigger more of the malicious code's malicious behaviors. However, the Ping requests and the various remote control instruction requests differ from family to family, and the requests of both parties trigger behavior only after passing the proprietary protocol check. If the check fails, a protocol parsing exception occurs, with the serious consequence that the target behavior is not triggered or that the process or even the system crashes. Therefore, to discover more malicious behaviors, the simulation subsystem must accurately identify the various proprietary protocol requests of each malicious code family, make the corresponding proprietary protocol responses, and then send to the malicious code various instruction requests that conform to the proprietary protocol format of its family, thereby inducing the malicious code to trigger its core malicious behaviors more comprehensively.
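The ordering of these stages can be summarized as a small state machine. The sketch works on abstract message labels rather than real packets; the `Stage` names, the `EmulatedSession` class and the command labels are assumptions introduced here, and encoding each label into a family-specific packet is deliberately left out.

```python
from enum import Enum, auto


class Stage(Enum):
    AWAIT_OPENHEART = auto()   # waiting for the first packet
    AWAIT_HEARTBEAT = auto()   # Sure sent, waiting for the first heartbeat
    ACTIVE = auto()            # heartbeats flowing, commands may be issued


class EmulatedSession:
    """Tracks one sample's dialogue with the simulated C2 (hypothetical class)."""

    def __init__(self, commands):
        self.stage = Stage.AWAIT_OPENHEART
        self.pending = list(commands)   # abstract command labels to issue in order

    def on_message(self, kind: str):
        """Return the labels of the replies to send for an identified message kind."""
        if self.stage is Stage.AWAIT_OPENHEART:
            if kind == "openheart":
                self.stage = Stage.AWAIT_HEARTBEAT
                return ["sure"]
            return []                   # nothing is answered before the first packet
        if kind == "heartbeat":
            self.stage = Stage.ACTIVE
            replies = ["heartbeat_ack"]
            if self.pending:
                replies.append(self.pending.pop(0))
            return replies
        return []


session = EmulatedSession(["cmd:file_update", "cmd:file_download"])
for kind in ["openheart", "heartbeat", "heartbeat", "heartbeat"]:
    print(kind, session.on_message(kind))
```

Issuing one command per heartbeat is a design choice of this sketch; it keeps each triggered behavior separable in the sandbox log.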

The present invention further provides an automated malicious code simulation detection system, as shown in fig. 2, including:

the database module 201, which establishes a special protocol database, a special protocol knowledge base, a general protocol database and a general protocol knowledge base by performing family classification on a known malicious code sample set and extracting information interaction data;

the special communication protocol identification module 202, which acquires a network request of a malicious sample and identifies the family and the special communication protocol of the malicious sample according to the special protocol knowledge base;

the general communication protocol identification module 203, which identifies the communication request type and the general communication protocol of the malicious sample according to the general protocol knowledge base;

the general protocol simulation module 204, which automatically simulates the general protocol of the malicious code: feeding back response information to the malicious sample according to the identified communication request type and general communication protocol, and triggering malicious behaviors of the malicious code;

the special protocol simulation module 205, which automatically simulates the special protocol of the malicious code: feeding back response information to the malicious sample according to the identified malicious sample family and special communication protocol, and further triggering malicious behaviors of the malicious code.

The system further comprises an application protocol simulation module 206, which is used for automatically simulating the malicious code application layer protocol: the network access request is customized according to known malicious codes, a customized network access request knowledge base is established, an application layer protocol of a malicious sample is identified, the network access request is customized according to the identified application layer protocol, and response information is fed back to the malicious codes.

In addition, the present invention provides a computer device according to an embodiment, whose structure is shown schematically in FIG. 3. The computer device includes a memory 301, a processor 302, and a computer program that is stored in the memory 301 and can run on the processor 302; when the processor 302 executes the computer program, the automatic malicious code simulation detection method of the above embodiments is implemented. The device may also include a communication interface for communication between the memory 301 and the processor 302. The memory 301 may comprise RAM and may also include non-volatile memory, such as at least one disk memory. The processor 302 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The memory 301 and the processor 302 may be disposed independently or integrated on one chip.

In order to implement the above embodiments, the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by the processor 302, implements the automated malicious code simulation detection method in the above embodiments.

From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software together with a necessary general-purpose hardware platform. The embodiments in this specification are described in a progressive manner; for parts that are the same or similar among the embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, it is described only briefly, and the relevant parts of the method embodiment description may be consulted. Although the present invention has been described with reference to the embodiments, those skilled in the art will appreciate that many variations and modifications are possible without departing from the spirit of the invention, and the appended claims are intended to cover such variations and modifications.

Claims

1. An automated malicious code emulation detection method, comprising:

2. The method of claim 1, further comprising automatically emulating malicious code application layer protocols: the network access request is customized according to known malicious codes, a customized network access request knowledge base is established, an application layer protocol of a malicious sample is identified, the network access request is customized according to the identified application layer protocol, and response information is fed back to the malicious codes.

3. The method of claim 1, wherein the family classification of the known malicious code sample set and the extraction of the information interaction data specifically include:

4. The method of claim 1, wherein feeding back response information to the malicious samples according to the identified malicious sample family and the proprietary communication protocol comprises: the automatic CRC32 special protocol checking simulation specifically comprises the following steps:

5. The method of any one of claims 1-4, wherein feeding back response information to the malicious sample based on the identified malicious sample family and the proprietary communication protocol, further comprising: the automatic first packet special protocol verification simulation specifically comprises the following steps:

6. The method of claim 5, wherein after the automated first-packet-specific protocol check emulation, further comprising: the verification simulation of the special protocol for the automatic confirmation packet specifically comprises the following steps:

7. The method of claim 6, wherein after the completion of the automated acknowledgement packet specific protocol check emulation, further comprising: the automatic heartbeat request verification simulation specifically comprises the following steps:

8. An automated malicious code emulation detection system, comprising:

9. The system of claim 8, further comprising an application protocol emulation module that automates emulation of malicious code application layer protocols by: the network access request is customized according to known malicious codes, a customized network access request knowledge base is established, an application layer protocol of a malicious sample is identified, the network access request is customized according to the identified application layer protocol, and response information is fed back to the malicious codes.

10. The system of claim 8, wherein the family classification of the known malicious code sample set and the extraction of the information interaction data specifically include:

11. The system of claim 8, wherein feeding back response information to the malicious samples according to the identified malicious sample family and the proprietary communication protocol comprises: the automatic CRC32 special protocol checking simulation specifically comprises the following steps:

12. The system according to any one of claims 8-11, wherein the feedback of response information to the malicious sample based on the identified malicious sample family and the proprietary communication protocol further comprises: the automatic first packet special protocol verification simulation specifically comprises the following steps:

13. The system of claim 12, wherein after the automated first-packet-specific protocol verification emulation, further comprising: the verification simulation of the special protocol for the automatic confirmation packet specifically comprises the following steps:

14. The system of claim 13, wherein after completion of the automated acknowledgement packet specific protocol verification emulation, further comprising: the automatic heartbeat request verification simulation specifically comprises the following steps:

15. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing an automated malicious code emulation detection method according to any one of claims 1 to 7 when executing the program.

16. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the automated malicious code emulation detection method of any of claims 1 to 7.