WO2014089744A1

WO2014089744A1 - Method and apparatus for detecting malicious code

Info

Publication number: WO2014089744A1
Application number: PCT/CN2012/086302
Authority: WO
Inventors: 诸葛建伟; 钱晓斌; 侯永干; 富键; 陆恂; 王若愚
Original assignee: 华为技术有限公司
Priority date: 2012-12-10
Filing date: 2012-12-10
Publication date: 2014-06-19
Also published as: CN103221960A; CN103221960B

Abstract

The present invention relates to the field of communications security. Provided is a method for detecting malicious code, which comprises: extracting JavaScript script code from a PDF file (101); starting a pre-selected script interpreter that supports the PDF standard to run a de-obfuscation process on the JavaScript script code, and obtaining code information corresponding to the JavaScript script code according to the de-obfuscation process, wherein the type of the code information comprises an operation code and a string variable (102); and detecting, according to a detection rule corresponding to the type of the code information, whether the JavaScript script code is malicious code (103). Also provided is an apparatus for detecting malicious code. The method and the apparatus can improve accuracy in detecting malicious JavaScript code.

Description

Method and device for detecting malicious code

The present invention relates to the field of communication security technologies, and in particular, to a method and apparatus for detecting malicious code.

Background

PDF (Portable Document Format) is an electronic file format. The format is independent of the reading software, hardware, and operating system, and can be used on any platform, including Windows, Linux, and Mac OS. JavaScript is a versatile scripting language widely used for client-side web development. Embedding JavaScript in PDF files is important for realizing their interactive features, such as the presentation of dynamic content, tables, and 3D interfaces.

A malicious JavaScript script is a newer type of virus among malicious attack code: by adding, changing, or deleting part of a script in a software system, it creates a hazard or compromises the integrity, confidentiality, availability, etc. of computer system functions and networks. It is usually written in the JavaScript scripting language. Malicious JavaScript scripts are flexible in form and are easily transformed by various code obfuscation techniques, so it is difficult for current anti-virus technologies to control and protect against them.

The spread of malicious JavaScript scripts is usually implemented through browsers, LAN sharing, instant messaging, and email. In recent years, with the growing maturity of PDF exploits, more and more malicious JavaScript has been placed in PDF files.

Code obfuscation, as the name suggests, is a technique that deliberately makes script code appear cluttered. In much commercial software, developers obfuscate code to protect their copyright and to make reverse engineering harder. In malicious scripts, obfuscation is used to evade virus signature database scanning in anti-virus software and firewalls, and to hinder manual analysis of malicious attack code.

Compared with malicious JavaScript scripts on web pages, malicious scripts in PDF files make more use of features of the PDF standard: replacing letters and digits in file definitions with their hexadecimal codes, using PDF stream objects to hide objects containing JavaScript scripts, and using the nested encoding capability of PDF stream objects to process JavaScript scripts with several encoding methods. Many existing browser-side de-obfuscation tools cannot de-obfuscate JavaScript scripts obfuscated by these methods, which allows malicious scripts to spread attacks through PDF files. Common attack methods include placing malicious PDF files in web pages and sending targeted phishing emails with malicious PDF attachments. A malicious PDF file is a PDF file carrying a malicious JavaScript script.

Existing methods for detecting scripts in PDF files are:

A simulated execution environment is used to execute the PDF file under test, and the behavior of the file in a normal system operating environment, such as the series of calls made when the file is executed, is monitored in order to discover malicious behavior. However, because of common JavaScript evasion and concealment techniques, this method cannot detect some malicious behavior, for example JavaScript scripts in a PDF file that exhibit malicious behavior only at a specific time or only when a specific plug-in is present.

In the process of implementing the present invention, the inventors have found that at least the following problem exists in the prior art: the malicious JavaScript code carried in a PDF file cannot be accurately detected, especially when that code has been obfuscated.

Summary of the invention

Embodiments of the invention provide a method and a device for detecting malicious code, which can improve the detection accuracy of malicious JavaScript code carried in a PDF file.

In order to achieve the above object, embodiments of the present invention use the following technical solutions:

In a first aspect, a method for detecting malicious code is provided, including:

Extracting JavaScript script code from a PDF file;

Starting a predetermined script interpreter that supports the PDF standard to run a de-obfuscation process on the JavaScript script code, and obtaining code information corresponding to the JavaScript script code according to the de-obfuscation process, the type of the code information including an operation code and a string variable;

Detecting, according to a detection rule corresponding to the type of the code information, whether the JavaScript script code is malicious code.

In a first possible implementation manner of the first aspect, before the starting a predetermined script interpreter that supports the PDF standard to run a de-obfuscation process on the JavaScript script code, the method further includes:

Injecting a library file into the de-obfuscation process of the script interpreter, where the library file is used to obtain the code information corresponding to the de-obfuscated JavaScript script code that the script interpreter generates while de-obfuscating the JavaScript script code.

With reference to the first aspect and the first possible implementation manner of the first aspect, in a second possible implementation manner of the first aspect, if the type of the code information is an operation code, the detecting, according to the detection rule corresponding to the type of the code information, whether the JavaScript script code is malicious code includes:

Matching the operation code corresponding to the JavaScript script code in a stored malicious operation code feature library;

If the operation code is matched in the stored malicious operation code feature library, determining that the JavaScript script code is malicious code;

If the operation code is not matched in the stored malicious operation code feature library, determining that the JavaScript script code is not malicious code.

With reference to the first aspect and the first possible implementation manner of the first aspect, in a third possible implementation manner of the first aspect, if the type of the code information is a string variable, the detecting, according to the detection rule corresponding to the type of the code information, whether the JavaScript script code is malicious code includes:

Obtaining the length of the string variable corresponding to the JavaScript script code;

If the length of the string variable is in a first interval, acquiring a first feature parameter corresponding to the string variable, and determining, according to a stack overflow detection model and the first feature parameter, whether the JavaScript script code is malicious code;

If the length of the string variable is in a second interval, acquiring a second feature parameter corresponding to the string variable, and determining, according to a heap injection detection model and the second feature parameter, whether the JavaScript script code is malicious code.

In conjunction with the third possible implementation of the first aspect, in a fourth possible implementation manner of the first aspect, the first characteristic parameter includes a combination of one or more of: a frequency of occurrence of a GetPC instruction, a frequency of occurrence of a flower instruction, and whether a fingerprint of a known shelling code is included; the second characteristic parameter includes a combination of one or more of: a string information entropy value and a frequency of occurrence of a NOP instruction.

With reference to the first aspect and the first possible implementation manner of the first aspect, in a fifth possible implementation manner of the first aspect, if the type of the code information is an operation code and a string variable, the detecting, according to the detection rule corresponding to the type of the code information, whether the JavaScript script code is malicious code includes:

Detecting the JavaScript script code according to the detection rule corresponding to the operation code; and

Detecting the JavaScript script code according to the detection rule corresponding to the string variable;

When the JavaScript script code is determined to be malicious code according to the detection rule corresponding to the operation code, or is determined to be malicious code according to the detection rule corresponding to the string variable, determining that the JavaScript script code is malicious code;

When the JavaScript script code is determined not to be malicious code according to the detection rule corresponding to the operation code, and is determined not to be malicious code according to the detection rule corresponding to the string variable, determining that the JavaScript script code is not malicious code;

The detecting, according to the detection rule corresponding to the operation code, whether the JavaScript script code is malicious code includes:

Matching the operation code corresponding to the JavaScript script code in the stored malicious operation code feature library; if the operation code is matched in the stored malicious operation code feature library, determining that the JavaScript script code is malicious code; if the operation code is not matched in the stored malicious operation code feature library, determining that the JavaScript script code is not malicious code; and

The detecting, according to the detection rule corresponding to the string variable, whether the JavaScript script code is malicious code includes:

Obtaining the length of the string variable corresponding to the JavaScript script code; if the length of the string variable is in a third interval, acquiring a third feature parameter corresponding to the string variable, and determining, according to the stack overflow detection model and the third feature parameter, whether the JavaScript script code is malicious code; if the length of the string variable is in a fourth interval, acquiring a fourth feature parameter corresponding to the string variable, and determining, according to the heap injection detection model and the fourth feature parameter, whether the JavaScript script code is malicious code.

With reference to the fifth possible implementation manner of the first aspect, in a sixth possible implementation manner of the first aspect, the third characteristic parameter includes a combination of one or more of: a frequency of occurrence of a GetPC instruction, a frequency of occurrence of a flower instruction, and whether a fingerprint of a known shelling code is included; the fourth characteristic parameter includes a combination of one or more of: a string information entropy value and a frequency of occurrence of a NOP instruction.

In a second aspect, a detection device for malicious code is provided, including:

a de-obfuscation module, configured to extract JavaScript script code from a PDF file; start a predetermined script interpreter supporting the PDF standard to run a de-obfuscation process on the JavaScript script code; and obtain, according to the de-obfuscation process, code information corresponding to the JavaScript script code, the type of the code information including an operation code and a string variable;

and a detecting module, configured to detect, according to the detection rule corresponding to the type of the code information obtained by the de-obfuscation module, whether the JavaScript script code is malicious code.

In a first possible implementation of the second aspect, the apparatus further includes: an instrumentation injection module, configured to inject a library file into the de-obfuscation process that the script interpreter runs on the JavaScript script code, where the library file is used to obtain the code information corresponding to the de-obfuscated JavaScript script code that the script interpreter generates while de-obfuscating the JavaScript script code.

With reference to the second aspect, and the first possible implementation manner of the first aspect, in a second possible implementation manner of the second aspect, the detecting module includes:

a first matching unit, configured to: if the type of the code information is an operation code, match the operation code corresponding to the JavaScript script code in a stored malicious operation code feature library; a first determining unit, configured to: when the first matching unit matches the operation code in the stored malicious operation code feature library, determine that the JavaScript script code is malicious code; and, when the first matching unit does not match the operation code in the stored malicious operation code feature library, determine that the JavaScript script code is not malicious code.

With reference to the second possible implementation of the second aspect, in a third possible implementation manner of the second aspect, the detecting module includes:

a first obtaining unit, configured to obtain the length of the string variable corresponding to the JavaScript script code if the type of the code information is a string variable;

a first determining unit, configured to: when the length of the string variable acquired by the first obtaining unit is in the first interval, acquire a first feature parameter corresponding to the string variable, and determine, according to the stack overflow detection model and the first feature parameter, whether the JavaScript script code is malicious code; and, when the length of the string variable acquired by the first obtaining unit is in the second interval, acquire a second feature parameter corresponding to the string variable, and determine, according to the heap injection detection model and the second feature parameter, whether the JavaScript script code is malicious code.

With reference to the second aspect, and the first possible implementation manner of the first aspect, in a fourth possible implementation manner of the second aspect, the first characteristic parameter includes a combination of one or more of: a frequency of occurrence of a GetPC instruction, a frequency of occurrence of a flower instruction, and whether a known shelling code fingerprint is included; the second characteristic parameter includes a combination of one or more of: a string information entropy value and a frequency of occurrence of a NOP instruction.

With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation manner of the second aspect, the detecting module is specifically configured to: if the type of the code information is an operation code and a string variable, detect the JavaScript script code according to the detection rule corresponding to the operation code, and detect the JavaScript script code according to the detection rule corresponding to the string variable;

when the JavaScript script code is determined to be malicious code according to the detection rule corresponding to the operation code, or is determined to be malicious code according to the detection rule corresponding to the string variable, determine that the JavaScript script code is malicious code;

and, when the JavaScript script code is determined not to be malicious code according to the detection rule corresponding to the operation code, and is determined not to be malicious code according to the detection rule corresponding to the string variable, determine that the JavaScript script code is not malicious code;

The detecting module further includes:

a second matching unit, configured to match the operation code corresponding to the JavaScript script code in the stored malicious operation code feature library;

a second determining unit, configured to: when the second matching unit matches the operation code in the stored malicious operation code feature library, determine that the JavaScript script code is malicious code; and, when the second matching unit does not match the operation code in the stored malicious operation code feature library, determine that the JavaScript script code is not malicious code; and the detecting module further includes:

a second obtaining unit, configured to acquire the length of the string variable corresponding to the JavaScript script code;

a second determining unit, configured to: when the length of the string variable acquired by the second obtaining unit is in the third interval, acquire a third feature parameter corresponding to the string variable, and determine, according to the stack overflow detection model and the third feature parameter, whether the JavaScript script code is malicious code; and, when the length of the string variable acquired by the second obtaining unit is in the fourth interval, acquire a fourth feature parameter corresponding to the string variable, and determine, according to the heap injection detection model and the fourth feature parameter, whether the JavaScript script code is malicious code.

In conjunction with the fifth possible implementation of the second aspect, in a sixth possible implementation manner of the second aspect, the third characteristic parameter includes a combination of one or more of: a frequency of occurrence of a GetPC instruction, a frequency of occurrence of a flower instruction, and whether a fingerprint of a known shelling code is included; the fourth characteristic parameter includes a combination of one or more of: a string information entropy value and a frequency of occurrence of a NOP instruction.

In a third aspect, a detection apparatus is provided, including a memory and a processor, wherein: the memory is configured to store code;

The processor is configured to read the code stored in the memory to perform the method provided by any of the first aspect, or any of the six possible implementations of the first aspect.

The method and device for detecting malicious code provided by the embodiments of the present invention start a predetermined script interpreter supporting the PDF standard to run a de-obfuscation process on the JavaScript script code, obtain the code information corresponding to the JavaScript script code, and detect whether the JavaScript script code is malicious code according to the detection rule corresponding to the type of the code information. Compared with the prior art, they can detect the malicious JavaScript code carried in a PDF file more accurately.

DRAWINGS

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without any creative work.

FIG. 1 is a flowchart of a method for detecting malicious code according to an embodiment of the present invention;

FIG. 2 is a flowchart of another method for detecting malicious code according to an embodiment of the present invention;

FIG. 3 is a flowchart of another method for detecting malicious code according to an embodiment of the present invention;

FIG. 4 is a flowchart of a method for detecting malicious code according to an embodiment of the present invention;

FIG. 5 is a block diagram of a device for detecting malicious code according to an embodiment of the present invention;

FIG. 6 is a block diagram of another device for detecting malicious code according to an embodiment of the present invention;

FIG. 7 is a block diagram of a detection module according to an embodiment of the present invention;

FIG. 8 is a structural block diagram of another detection module according to an embodiment of the present invention;

FIG. 9 is a structural block diagram of another detection module according to an embodiment of the present invention;

FIG. 10 is a structural block diagram of a detecting device according to an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of the present invention.

An embodiment of the present invention provides a method for detecting a malicious code, which can be executed by a detecting device. As shown in FIG. 1, the method includes:

101. Extract the JavaScript script code in the PDF file.

A JavaScript script embedded in a PDF file can implement the presentation and enhancement functions of the PDF file, but it can also be abused by attackers and malicious code to exploit vulnerabilities in PDF reader software and penetrate the targeted host. The PDF file to be detected may come from an e-mail attachment, web page content, etc.; the source is not limited herein.

The method for extracting the JavaScript script code from the PDF file may specifically include: locating the elements that hold the JavaScript stream in the PDF file according to the internationally adopted PDF format specification, and decoding the stream according to the compression and encoding method applied to it, so as to extract the JavaScript code contained in the PDF file.
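The extraction step can be pictured with a minimal Python sketch. It is not the embodiment's parser: it assumes streams are delimited by `stream` and `endstream`, handles only the `/FlateDecode` filter, and ignores nested filters, object streams, and encryption. The function name `extract_streams` is hypothetical.

```python
import re
import zlib
from typing import List

# Simplified pattern: a dictionary "<< ... >>" followed by "stream ... endstream".
# Real PDF parsing (cross-reference tables, /Length, nested filters) is more involved.
_STREAM_RE = re.compile(rb"<<(.*?)>>\s*stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)


def extract_streams(pdf_bytes: bytes) -> List[bytes]:
    """Return stream bodies, inflating those whose dictionary names /FlateDecode."""
    decoded = []
    for dictionary, body in _STREAM_RE.findall(pdf_bytes):
        if b"/FlateDecode" in dictionary:
            try:
                # decompressobj tolerates trailing bytes after the zlib stream
                body = zlib.decompressobj().decompress(body)
            except zlib.error:
                continue  # not valid zlib data; skip in this sketch
        decoded.append(body)
    return decoded
```

A caller would then inspect the decoded bodies for script text; deciding which stream actually holds JavaScript requires following the `/JS` references in the document, which this sketch leaves out.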

102. Start a predetermined script interpreter that supports the PDF standard to run a de-obfuscation process on the JavaScript script code, and obtain code information corresponding to the JavaScript script code according to the de-obfuscation process, where the type of the code information includes an operation code and a string variable.

The script interpreter supporting the PDF standard (that is, supporting the PDF format specification and capable of parsing PDF files generated according to it) may be a script interpreter embedded in a PDF reader, where the PDF reader may be any PDF reader with an embedded JavaScript script code interpretation engine. For example, the embodiment of the present invention uses Acrobat Reader, the PDF reading application provided by Adobe, and can be implemented by using the script interpreter embedded in Acrobat Reader. Most JavaScript script code is obfuscated. The script interpreter supporting the PDF standard may be pre-configured by an administrator of the detecting device before detection. The code information refers to the information that the script interpreter produces when it interprets and translates the script code and that it submits to the JavaScript virtual machine for execution. The JavaScript virtual machine is an abstract computer, simulated in software, that runs all JavaScript code. In this embodiment, the information submitted to the JavaScript virtual machine for execution contains information output by the script interpreter at different stages of interpreting and translating the script code, and may include at least the following two categories: opcodes and string variables.

Wherein, if the type of the code information includes an operation code, the operation code may be a command code used by the machine, and a typical operation code fragment is as follows:

[ 207] resolve_global r3, Array(@id10)

[ 212] get_by_id r1, r3, prototype(@id11)

[ 220] method_check

[ 221] get_by_id r0, r1, push(@id12)

[ 229] mov r2, Int32: 0(@k8)

[ 232] call r0, 2, 9

It should be noted that, in the present embodiment, for the sake of clarity, common English characters are used herein to illustrate the operation code, but in practice, the operation code may be expressed in binary.

Wherein, if the type of the code information includes a string variable, the string variable may be a string variable defined in the JavaScript script, and the typical existence form is as follows:

var str = "some value ...";

thisVar.replace("Monday", "Friday");

It should be noted that, in malicious code, the value of a string variable may itself be compiled instructions. For example, the value of the string variable thisVar above could be Unicode-encoded instructions.
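To make the idea of a string holding encoded instructions concrete, the sketch below converts a string of `%uXXXX` escapes into raw bytes. The escape syntax and the little-endian byte order (as on x86) are assumptions for illustration; the text above only says the value may be Unicode-encoded. The function name `unescape_to_bytes` is hypothetical.

```python
import re

# Assumed escape form: %u followed by four hex digits, one 16-bit code unit each.
_U_ESCAPE = re.compile(r"%u([0-9a-fA-F]{4})")


def unescape_to_bytes(text: str) -> bytes:
    """Turn each %uXXXX escape into two bytes, low byte first (little-endian)."""
    out = bytearray()
    for match in _U_ESCAPE.finditer(text):
        code_unit = int(match.group(1), 16)
        out += code_unit.to_bytes(2, "little")
    return bytes(out)
```

Under these assumptions, `unescape_to_bytes("%u9090%u9090")` yields four `0x90` bytes, which is the byte value of the x86 NOP instruction discussed later in this description.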

The code information corresponding to the JavaScript script code can be obtained by instrumentation; for the specific instrumentation injection process, refer to FIG. 4 and the corresponding description. Moreover, the two types of parameters, the opcode and the string variable, are intermediate parameters that may appear at different stages of the de-obfuscation process. Therefore, both can be obtained by monitoring the individual steps of the de-obfuscation process.

103. Detect whether the JavaScript script code is malicious code according to a detection rule corresponding to the type of the code information.

It should be noted that the detection method provided by the embodiment of the present invention differs according to the type of the code information. There are three cases: the type of the code information is an operation code, a string variable, or both an operation code and a string variable. Accordingly, step 103, detecting whether the JavaScript script code is malicious code according to the detection rule corresponding to the type of the code information, can be implemented by the following three detection methods:

The first method is shown in Figure 2, including:

A1031. If the type of the code information is an operation code, match the operation code corresponding to the JavaScript script code in a stored malicious operation code feature library; if the operation code is matched in the stored malicious operation code feature library, perform step A1032; if the operation code is not matched in the stored malicious operation code feature library, perform step A1033.

A1032. Determine that the JavaScript script code is malicious code.

A1033. Determine that the JavaScript script code is not malicious code.

The malicious operation code feature library is a pattern library formed from the operation code sequences executed by JavaScript script code in PDF files. For example, the characteristic pattern of the CVE-2009-0927 vulnerability is as follows:

getmethod "getIcon"

getgvar "var_1"

call
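The matching step of the first method can be sketched in Python as follows. This is an illustration under stated assumptions, not the embodiment's matcher: the trace is modelled as a list of `(opcode, operand)` pairs, a pattern matches if its entries appear in order (not necessarily adjacent), and an operand of `None` in a pattern acts as a wildcard because variable names such as `var_1` are likely to differ between samples. The names `SIGNATURES`, `contains_sequence`, and `match_library` are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Op = Tuple[str, Optional[str]]  # (opcode name, operand or None)

# Hypothetical feature library: signature name -> ordered opcode pattern.
SIGNATURES: Dict[str, List[Op]] = {
    "CVE-2009-0927": [("getmethod", "getIcon"), ("getgvar", None), ("call", None)],
}


def _op_matches(pattern: Op, actual: Op) -> bool:
    name, operand = pattern
    return name == actual[0] and (operand is None or operand == actual[1])


def contains_sequence(trace: Iterable[Op], pattern: List[Op]) -> bool:
    """True if the pattern entries occur in the trace in order (gaps allowed)."""
    it = iter(trace)  # shared iterator, so each search resumes after the last hit
    return all(any(_op_matches(p, op) for op in it) for p in pattern)


def match_library(trace: List[Op], library: Dict[str, List[Op]] = SIGNATURES) -> Optional[str]:
    """Return the name of the first matching signature, or None (steps A1032 / A1033)."""
    for name, pattern in library.items():
        if contains_sequence(trace, pattern):
            return name
    return None
```

Whether a real feature library requires adjacent opcodes or tolerates gaps is not stated in this description; gaps are allowed here only to keep the sketch tolerant of interleaved instructions.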

The second method is shown in Figure 3, including:

B1031. If the type of the code information is a string variable, obtain the length of the string variable corresponding to the JavaScript script code; if the length of the string variable is in the first interval, perform step B1032; if the length of the string variable is in the second interval, perform step B1034.

B1032. Acquire a first feature parameter corresponding to the string variable.

B1033. Determine, according to the stack overflow detection model and the first feature parameter, whether the JavaScript script code is malicious code.

B1034. Acquire a second feature parameter corresponding to the string variable.

B1035. Determine, according to the heap injection detection model and the second feature parameter, whether the JavaScript script code is malicious code.

The third method: if the type of the code information is an operation code and a string variable, detect whether the JavaScript script code is malicious code according to the detection rule corresponding to the operation code, and also detect whether the JavaScript script code is malicious code according to the detection rule corresponding to the string variable. When the JavaScript script code is determined to be malicious code according to the detection rule corresponding to the operation code, or is determined to be malicious code according to the detection rule corresponding to the string variable, determine that the JavaScript script code is malicious code. When the JavaScript script code is determined not to be malicious code according to the detection rule corresponding to the operation code, and is determined not to be malicious code according to the detection rule corresponding to the string variable, determine that the JavaScript script code is not malicious code.
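The way the three methods relate can be summarized in a short Python sketch: each rule is applied only to the type of code information that is present, and the script is flagged if any applied rule flags it. The function name `detect` and the callable parameters are hypothetical; the rules themselves are passed in rather than defined here.

```python
from typing import Callable, Optional, Sequence


def detect(
    opcodes: Optional[Sequence[str]],
    strings: Optional[Sequence[bytes]],
    opcode_rule: Callable[[Sequence[str]], bool],
    string_rule: Callable[[bytes], bool],
) -> bool:
    """Apply the rule for each available type; malicious if any rule says so."""
    verdicts = []
    if opcodes is not None:                      # first method
        verdicts.append(opcode_rule(opcodes))
    if strings is not None:                      # second method
        verdicts.append(any(string_rule(s) for s in strings))
    return any(verdicts)                         # third method: logical OR
```

With both inputs present this is the third method; with only one of them it reduces to the first or second method.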

In the first method above, the stored malicious operation code feature library holds features of the operation codes corresponding to JavaScript script code that has been confirmed in the technical field of the present invention to exhibit malicious behavior; the features may come from malicious operation code features disclosed by various authorities. In the embodiment of the present invention, the stored malicious operation code feature library is not fixed and may be updated periodically as required.

In the second method, the first interval and the second interval are set for two types of malicious code attack, stack overflow and heap injection, respectively. The intervals may be set with reference to empirical values. In general, the first interval may be set to 32-64K bytes, and the second interval may be set to larger than 64K bytes.
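The interval check can be sketched as below. The bounds are an assumed reading of the text: "32-64K bytes" is taken as 32 bytes up to and including 64 KiB, and anything longer falls into the second interval. A different reading (for example 32 KiB to 64 KiB) would only change the constants. The function name `select_model` and the returned labels are hypothetical.

```python
from typing import Optional

KIB = 1024
FIRST_LOW = 32          # assumed lower bound of the first interval, in bytes
FIRST_HIGH = 64 * KIB   # assumed upper bound of the first interval, in bytes


def select_model(length: int) -> Optional[str]:
    """Pick the detection model for a string of the given length in bytes."""
    if FIRST_LOW <= length <= FIRST_HIGH:
        return "stack_overflow"   # steps B1032 and B1033
    if length > FIRST_HIGH:
        return "heap_injection"   # steps B1034 and B1035
    return None                   # too short for either model
```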

In the above second method, the first characteristic parameter may at least include a GetPC command a combination of one or more of a current frequency, a frequency of occurrence of a flower instruction, and a fingerprint of a known husking code; the second characteristic parameter may include at least one of a string information entropy value, a frequency of occurrence of a NOP instruction, or A variety of combinations. The GetPC instruction refers to the instruction used to locate its own virtual address in the She 11 code; the flower instruction is the code used to interfere with the disassembly engine correctly implementing the disassembly; the frequency at which the GetPC instruction and the flower instruction appear in the string can be used as a string There is a partial basis of the shellcode; the shelling code fingerprint means that the shelled shellcode will always unpack itself when it is executed. The characteristics of these shelling codes are the shelling code fingerprint, and the existence of the fingerprint can be used as part of the existence of the shellcode; The string information entropy is an indicator for measuring the amount of string information. If the string information entropy is less than a certain threshold, there may be a heap injection; the NOP instruction is a CPU empty operation instruction, and when the string contains a large number of NOP instructions, the segment The NOP instruction may spawn the shellcode's leading code (Slidge) for the heap.

The first characteristic parameter corresponding to the string variable may be obtained as follows: GetPC instruction matching identifies the frequency of GetPC-type instructions in the string variable, flower instruction matching identifies the frequency of flower instructions in the string variable, and shelling code fingerprint matching identifies whether the string variable contains a known shelling code fingerprint. The second characteristic parameter corresponding to the string variable may be obtained as follows: the information entropy value of the string variable is calculated with a general information entropy formula and assessed by its degree of deviation from the statistical average information entropy value, and NOP instruction matching identifies the frequency of NOP instructions in the string.
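Part of this feature extraction can be sketched in Python. The sketch assumes the string variable has already been decoded to raw bytes, uses Shannon entropy as the "general information entropy formula", treats the x86 byte 0x90 as the NOP instruction, and matches two widely documented x86 GetPC idioms. The pattern list and function names are illustrative assumptions; flower instruction matching and shelling code fingerprint matching are omitted because the text does not specify their patterns.

```python
import math
from collections import Counter

# Two well-known x86 GetPC idioms, used here only as example patterns:
#   E8 00 00 00 00       call $+5
#   D9 EE D9 74 24 F4    fldz ; fnstenv [esp-0xc]
GETPC_PATTERNS = (
    bytes.fromhex("e800000000"),
    bytes.fromhex("d9eed97424f4"),
)
NOP = 0x90  # x86 single-byte NOP


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: sum of p * log2(1/p) over byte values."""
    if not data:
        return 0.0
    total = len(data)
    return sum(
        (count / total) * math.log2(total / count)
        for count in Counter(data).values()
    )


def nop_frequency(data: bytes) -> float:
    """Fraction of bytes in `data` that are NOP instructions."""
    return data.count(NOP) / len(data) if data else 0.0


def getpc_occurrences(data: bytes) -> int:
    """Number of non-overlapping matches of the example GetPC patterns."""
    return sum(data.count(pattern) for pattern in GETPC_PATTERNS)
```

A long run of identical bytes, such as a NOP sledge, gives an entropy close to zero, which is consistent with the remark that low entropy may indicate heap injection.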

In the second method above, the stack overflow detection model and the heap injection detection model are both trained in advance. The stack overflow detection model may take the GetPC instruction occurrence frequency, the flower instruction occurrence frequency, and whether a known shelling code fingerprint is included as its feature vector, and is trained on a standard data set to obtain the thresholds corresponding to the stack overflow detection model, for example the lowest GetPC instruction frequency, the lowest flower instruction frequency, and the presence of a known shelling code fingerprint. The heap injection detection model may take the information entropy value and the NOP instruction occurrence frequency as its feature vector, and is trained on a standard data set to obtain the thresholds corresponding to the heap injection detection model, for example the minimum information entropy and the minimum NOP instruction frequency. In the actual detection process, when one or more parameters of the first characteristic parameter exceed the thresholds corresponding to the stack overflow detection model, the JavaScript script code is determined to be malicious code; otherwise, the JavaScript script code is determined not to be malicious code. Likewise, when one or more parameters of the second characteristic parameter exceed the thresholds corresponding to the heap injection detection model, the JavaScript script code is determined to be malicious code; otherwise, the JavaScript script code is determined not to be malicious code.
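The threshold comparison described above can be sketched as two small model classes. All threshold values below are hypothetical placeholders rather than trained values, and the class and field names are assumptions. The entropy comparison flags values below the threshold, following the earlier remark that low entropy may indicate heap injection; that direction is an interpretation of the text, not something it states for the model itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackOverflowModel:
    # Hypothetical thresholds; the text obtains them by training.
    min_getpc_frequency: float = 0.001
    min_flower_frequency: float = 0.01

    def is_malicious(
        self,
        getpc_frequency: float,
        flower_frequency: float,
        has_shelling_fingerprint: bool,
    ) -> bool:
        """Malicious if any one feature crosses its threshold."""
        return (
            getpc_frequency > self.min_getpc_frequency
            or flower_frequency > self.min_flower_frequency
            or has_shelling_fingerprint
        )


@dataclass(frozen=True)
class HeapInjectionModel:
    # Hypothetical thresholds; the text obtains them by training.
    entropy_threshold: float = 1.0
    min_nop_frequency: float = 0.5

    def is_malicious(self, entropy: float, nop_frequency: float) -> bool:
        """Malicious if entropy is unusually low or NOPs are frequent."""
        return (
            entropy < self.entropy_threshold
            or nop_frequency > self.min_nop_frequency
        )
```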

In the third method above, the JavaScript script code is detected as follows. According to the detection rule corresponding to the operation code, the operation code corresponding to the JavaScript script code is matched in the stored malicious operation code feature library; if the operation code is matched in the stored malicious operation code feature library, the JavaScript script code is determined to be malicious code; if the operation code is not matched in the stored malicious operation code feature library, the JavaScript script code is determined not to be malicious code.

In addition, whether the JavaScript script code is malicious code is determined according to the detection rule corresponding to the string variable, specifically: the length of the string variable corresponding to the JavaScript script code is obtained; if the length of the string variable is in a third interval, a third feature parameter corresponding to the string variable is obtained, and whether the JavaScript script code is malicious code is determined according to the stack overflow detection model and the third feature parameter; if the length of the string variable is in a fourth interval, a fourth feature parameter corresponding to the string variable is obtained, and whether the JavaScript script code is malicious code is determined according to the heap injection detection model and the fourth feature parameter.

The third characteristic parameter may include at least one of, or a combination of, the GetPC instruction occurrence frequency, the flower instruction occurrence frequency, and whether a known shelling code fingerprint is included; the fourth characteristic parameter may include at least one of, or a combination of, the string information entropy value and the NOP instruction occurrence frequency.

It is to be noted that detecting the JavaScript script code according to the detection rule corresponding to the operation code may directly use the first method, specifically including the steps a1 to 033, and detecting the JavaScript script code according to the detection rule corresponding to the string variable may directly use the second method, specifically including step M031 to step bl 035. Accordingly, the third interval in the third method may use the setting of the first interval in the second method, and the fourth interval may use the setting of the second interval in the second method. The stack overflow detection model and the heap injection detection model may likewise reuse the models of the second method.
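How the third method combines the two rules can be sketched as below: the script is malicious if either rule flags it, and clean only if both rules pass. The function `detect_combined` and its callable parameters are hypothetical; in practice the callables would wrap the opcode matching of the first method and the length-based string checks of the second method.

```python
from typing import Callable, Iterable, Sequence


def detect_combined(
    opcodes: Sequence[str],
    string_variables: Iterable[bytes],
    opcode_rule: Callable[[Sequence[str]], bool],
    string_rule: Callable[[bytes], bool],
) -> bool:
    """Return True (malicious) if either detection rule fires."""
    opcode_hit = opcode_rule(opcodes)
    string_hit = any(string_rule(value) for value in string_variables)
    return opcode_hit or string_hit
```

Usage with stand-in rules: `detect_combined(ops, strings, lambda o: "eval" in o, lambda s: len(s) > 65536)`.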

In the embodiment of the present invention, before a PDF file is detected, the predetermined script interpreter supporting the PDF standard must be instrumented so that the code information corresponding to the JavaScript script code in the PDF file can be obtained. This embodiment takes a script interpreter embedded in a PDF reader as an example. As shown in FIG. 4, the specific process is as follows:

201. Start an application process of a predetermined PDF reader.

202. Inject the library file into the de-obfuscation process of the script interpreter embedded in the predetermined PDF reader.

The library file is a pre-written file in dll format. It is used to obtain the code information corresponding to the de-obfuscated JavaScript script code that the script interpreter of the predetermined PDF reader generates while de-obfuscating the JavaScript script code. Injecting a library file into a process means adding the execution of a dll file with a specific function to a currently running process without affecting the normal working state of that process.

It is worth noting that the instrumentation injection point must be selected according to the API provided by the predetermined PDF reader itself. For example, to obtain the operation code corresponding to the JavaScript script code, the API in the predetermined PDF reader that can output the operation code must be located and instrumented.

203. Initialize the injected library file.

Steps 201 to 203 above are a prerequisite for executing step 102. However, steps 201 to 203 only need to be executed once, when the application process of the predetermined PDF reader is started; they do not need to be executed again during the subsequent detection of PDF files.
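The "set up once, detect many times" relationship between steps 201 to 203 and step 102 can be sketched as follows. This is a simulation only: the class `ReaderSession` and its methods are hypothetical stand-ins, and no real process start or dll injection is performed.

```python
class ReaderSession:
    """Hypothetical stand-in for an instrumented PDF reader process."""

    def __init__(self) -> None:
        self.setup_runs = 0
        self._ready = False

    def ensure_ready(self) -> None:
        """Run steps 201-203 the first time only."""
        if self._ready:
            return
        # 201: start the reader process (placeholder)
        # 202: inject the library file (placeholder)
        # 203: initialize the injected library file (placeholder)
        self.setup_runs += 1
        self._ready = True

    def collect_code_info(self, script: str) -> dict:
        """Step 102 placeholder: returns empty opcodes and the raw script."""
        self.ensure_ready()
        return {"opcodes": [], "strings": [script]}
```

Calling `collect_code_info` for several scripts on the same session leaves `setup_runs` at 1.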

Further, it is worth noting that the stack overflow detection model and the heap injection detection model described above need to be established before starting the application process of the predetermined PDF reader, and can be used in the subsequent process of detecting the PDF file.

In addition, it is worth noting that after step 103 is executed, if the JavaScript script code is determined to be malicious code, the plaintext code corresponding to the JavaScript script code may also be obtained and the detection result associated with that plaintext code. For example, if the operation code corresponding to the JavaScript script code is a malicious operation code, the position in the plaintext code corresponding to the malicious operation code is marked, which facilitates technical research and integration. The plaintext code is obtained in the same way as the code information corresponding to the JavaScript script code.
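Marking the plaintext position of a malicious operation code might look like the sketch below. It assumes that the instrumentation reports, for each operation code, a character offset into the plaintext code; the text does not say how positions are recorded, so `OpcodeRecord`, its `offset` field, and `mark_malicious` are hypothetical.

```python
from typing import Iterable, NamedTuple, Set


class OpcodeRecord(NamedTuple):
    name: str
    offset: int  # assumed: character offset of the source construct


def mark_malicious(
    plaintext: str,
    records: Iterable[OpcodeRecord],
    malicious_names: Set[str],
    marker: str = ">>>",
) -> str:
    """Insert `marker` before each plaintext position tied to a malicious opcode."""
    offsets = sorted(
        {r.offset for r in records if r.name in malicious_names},
        reverse=True,  # insert from the end so earlier offsets stay valid
    )
    for offset in offsets:
        plaintext = plaintext[:offset] + marker + plaintext[offset:]
    return plaintext
```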

In this embodiment, the code information corresponding to the JavaScript script code is obtained by monitoring the de-obfuscation process that a predetermined script interpreter supporting the PDF standard, such as a script interpreter embedded in a PDF reader, runs on the JavaScript script code, and whether the JavaScript script code is malicious code is detected according to the detection rules corresponding to the different types of code information. Compared with the prior art, which cannot effectively identify malicious JavaScript script code propagated through PDF files, this embodiment can accurately detect malicious JavaScript code in PDF files and improves the security of network resources.

The embodiment of the invention further provides a device for detecting malicious code, which can implement the method steps shown in FIG. 1 to FIG. 4 above.

The device is shown in Figure 5 and includes:

The de-obfuscation module 31 is configured to extract the JavaScript script code in the PDF file; start a predetermined script interpreter supporting the PDF standard, such as a script interpreter embedded in the PDF reader, to run a de-obfuscation process on the JavaScript script code; and obtain the code information corresponding to the JavaScript script code according to the de-obfuscation process, where the type of the code information includes an operation code and/or a string variable.

The detecting module 32 is configured to detect, according to the detection rule corresponding to the type of the code information obtained by the de-obfuscating module 31, whether the JavaScript script code is malicious code.

Optionally, as shown in FIG. 6, the device further includes:

An instrumentation injection module 33, configured to inject a library file into the de-obfuscation process that the script interpreter runs on the JavaScript script code, where the library file is used to obtain the code information corresponding to the de-obfuscated JavaScript script code that the script interpreter generates while de-obfuscating the JavaScript script code.

Optionally, as shown in FIG. 7, the detecting module 32 includes:

The first matching unit 321 is configured to match the operation code corresponding to the JavaScript script code in the stored malicious operation code feature library if the type of the code information is an operation code.

The first determining unit 322 is configured to determine that the JavaScript script code is malicious code when the first matching unit 321 matches the operation code in the stored malicious operation code feature library, and to determine that the JavaScript script code is not malicious code when the first matching unit 321 does not match the operation code in the stored malicious operation code feature library.

Optionally, as shown in FIG. 8, the detecting module 32 includes:

The first obtaining unit 323 is configured to obtain a length of the string variable corresponding to the JavaScript script code if it is determined that the type of the code information is a string variable.

The first judging unit 324 is configured to: when the length of the string variable acquired by the first obtaining unit 323 is in the first interval, acquire the first feature parameter corresponding to the string variable, and determine, according to the stack overflow detection model and the first feature parameter, whether the JavaScript script code is malicious code; and when the length of the string variable acquired by the first obtaining unit 323 is in the second interval, acquire the second feature parameter corresponding to the string variable, and determine, according to the heap injection detection model and the second feature parameter, whether the JavaScript script code is malicious code.

Optionally, the first characteristic parameter may include at least one of, or a combination of, the GetPC instruction occurrence frequency, the flower instruction occurrence frequency, and whether a known shelling code fingerprint is included; the second characteristic parameter may include at least one of, or a combination of, the string information entropy value and the NOP instruction occurrence frequency. A GetPC instruction is an instruction that shellcode uses to locate its own virtual address. A flower instruction is code used to prevent a disassembly engine from disassembling correctly. The frequency of GetPC instructions and flower instructions in a string can serve as part of the basis for concluding that the string contains shellcode. A shelling code fingerprint arises because shelled shellcode always unshells itself when it is executed; the characteristics of this shelling code form the shelling code fingerprint, and the presence of the fingerprint can serve as part of the basis for concluding that shellcode is present. The string information entropy is an indicator of the amount of information in a string; if it is below a certain threshold, heap injection may be present. The NOP instruction is a CPU no-operation instruction; when a string contains a large number of NOP instructions, that run of NOP instructions may be the leading code (sledge) of heap-injected shellcode.

Optionally, the detecting module 32 is specifically configured to: if the type of the code information is an operation code and a string variable, detect whether the JavaScript script code is malicious code according to the detection rule corresponding to the operation code; detect whether the JavaScript script code is malicious code according to the detection rule corresponding to the string variable; and determine that the JavaScript script code is malicious code when the JavaScript script code is determined to be malicious code according to the detection rule corresponding to the operation code or according to the detection rule corresponding to the string variable.

As shown in FIG. 9, the detecting module 32 further includes:

The second matching unit 325 is configured to match an operation code corresponding to the JavaScript script code in the stored malicious operation code feature library.

The second determining unit 326 is configured to determine that the JavaScript script code is malicious code when the second matching unit 325 matches the operation code in the stored malicious operation code feature library, and to determine that the JavaScript script code is not malicious code when the second matching unit 325 does not match the operation code in the stored malicious operation code feature library.

As shown in FIG. 9, the detecting module may further include:

The second obtaining unit 327 is configured to obtain a length of the string variable corresponding to the JavaScript script code.

The second judging unit 328 is configured to: when the length of the string variable acquired by the second obtaining unit 327 is in the third interval, acquire a third feature parameter corresponding to the string variable, and determine, according to the stack overflow detection model and the third feature parameter, whether the JavaScript script code is malicious code; and when the length of the string variable acquired by the second obtaining unit 327 is in the fourth interval, acquire a fourth feature parameter corresponding to the string variable, and determine, according to the heap injection detection model and the fourth feature parameter, whether the JavaScript script code is malicious code.

Optionally, the third characteristic parameter may include at least one of, or a combination of, the GetPC instruction occurrence frequency, the flower instruction occurrence frequency, and whether a known shelling code fingerprint is included; the fourth characteristic parameter may include at least one of, or a combination of, the string information entropy value and the NOP instruction occurrence frequency.

In the present embodiment, the device for detecting malicious code obtains the code information corresponding to the JavaScript script code by monitoring the de-obfuscation process that a predetermined script interpreter supporting the PDF standard, such as a script interpreter embedded in the PDF reader, runs on the JavaScript script code, and detects whether the JavaScript script code is malicious code according to the detection rules corresponding to the different types of code information. Compared with the prior art, which cannot effectively identify malicious JavaScript script code propagated through PDF files, the device can accurately detect malicious JavaScript code in PDF files and improves the security of network resources.

The embodiment of the invention further provides a detecting device, which can implement the method steps shown in FIG. 1 to FIG. 4 above.

The device, as shown in FIG. 10, includes a processor 41 and a memory 42. The memory 42 may include random access memory (RAM) or the like. The memory 42 is configured to store program code; the processor 41 is configured to read the program code stored in the memory to perform the steps in the method embodiments. The processor 41 communicates with the memory 42 via a bus. The memory 42 is further configured to store the JavaScript script code in the PDF file and the code information corresponding to the JavaScript script code.

The processor 41 is configured to extract the JavaScript script code in a PDF file stored in the memory 42; start a predetermined script interpreter supporting the PDF standard to run a de-obfuscation process on the JavaScript script code; and obtain the code information corresponding to the JavaScript script code according to the de-obfuscation process. The type of the code information includes an operation code and a string variable.

The memory 42 is also used to store library files.

Optionally, the processor 41 is further configured to inject a library file stored in the memory 42 into the de-obfuscation process that the script interpreter supporting the PDF standard, such as a script interpreter embedded in a predetermined PDF reader, runs on the JavaScript script code. The library file is used to obtain the code information corresponding to the de-obfuscated JavaScript script code that the script interpreter generates while de-obfuscating the JavaScript script code.

Optionally, the processor 41 is configured to: if the type of the code information stored by the memory 42 is an operation code, match the operation code corresponding to the JavaScript script code in the stored malicious operation code feature library; if the operation code is matched in the stored malicious operation code feature library, determine that the JavaScript script code is malicious code; if the operation code is not matched in the stored malicious operation code feature library, determine that the JavaScript script code is not malicious code.

The memory 42 is configured to store a malicious operation code feature library.

Optionally, the processor 41 is configured to: if the type of the code information stored by the memory 42 is a string variable, obtain the length of the string variable corresponding to the JavaScript script code; if the length of the string variable is in the first interval, obtain the first feature parameter corresponding to the string variable, and determine, according to the stack overflow detection model and the first feature parameter, whether the JavaScript script code is malicious code; if the length of the string variable is in the second interval, obtain the second feature parameter corresponding to the string variable, and determine, according to the heap injection detection model and the second feature parameter, whether the JavaScript script code is malicious code.

The memory 42 is configured to store the length of the string variable corresponding to the JavaScript script code, the first feature parameter, the second feature parameter, the first interval, the second interval, the stack overflow detection model, and the heap injection detection model. The first characteristic parameter may include at least one of, or a combination of, the GetPC instruction occurrence frequency, the flower instruction occurrence frequency, and whether a known shelling code fingerprint is included; the second characteristic parameter may include at least one of, or a combination of, the string information entropy value and the NOP instruction occurrence frequency.

Optionally, the processor 41 is configured to: if the type of the code information stored by the memory 42 is an operation code and a string variable, detect whether the JavaScript script code is malicious code according to the detection rule corresponding to the operation code, and detect whether the JavaScript script code is malicious code according to the detection rule corresponding to the string variable; determine that the JavaScript script code is malicious code when the JavaScript script code is determined to be malicious code according to the detection rule corresponding to the operation code or according to the detection rule corresponding to the string variable; and determine that the JavaScript script code is not malicious code when the JavaScript script code is determined not to be malicious code according to both the detection rule corresponding to the operation code and the detection rule corresponding to the string variable.

Further, the processor 41 detects whether the JavaScript script code is malicious code according to the detection rule corresponding to the operation code specifically as follows:

Matching the operation code corresponding to the JavaScript script code in the stored malicious operation code feature library; if the operation code is matched in the stored malicious operation code feature library, determining that the JavaScript script code is malicious code; if the operation code is not matched in the stored malicious operation code feature library, determining that the JavaScript script code is not malicious code.

The processor 41 detects whether the JavaScript script code is malicious code according to the detection rule corresponding to the string variable specifically as follows:

Obtaining the length of the string variable corresponding to the JavaScript script code; if the length of the string variable is in the third interval, acquiring a third feature parameter corresponding to the string variable, and determining, according to the stack overflow detection model and the third feature parameter, whether the JavaScript script code is malicious code; if the length of the string variable is in the fourth interval, acquiring a fourth feature parameter corresponding to the string variable, and determining, according to the heap injection detection model and the fourth feature parameter, whether the JavaScript script code is malicious code.

Optionally, the third characteristic parameter may include at least one of, or a combination of, the GetPC instruction occurrence frequency, the flower instruction occurrence frequency, and whether a known shelling code fingerprint is included; the fourth characteristic parameter may include at least one of, or a combination of, the string information entropy value and the NOP instruction occurrence frequency.

In this embodiment, the detecting device obtains the code information corresponding to the JavaScript script code by monitoring the de-obfuscation process that a script interpreter embedded in a predetermined PDF reader runs on the JavaScript script code, and detects whether the JavaScript script code is malicious code according to the detection rules corresponding to the different types of code information. Compared with the prior art, which cannot effectively identify malicious JavaScript script code propagated through PDF files, the device can accurately detect malicious JavaScript code in the PDF file to be detected and improves the security of network resources.

Through the description of the above embodiments, those skilled in the art can clearly understand that the present invention can be implemented by software plus the necessary general-purpose hardware, or by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the part of the technical solution of the present invention that is essential, or that contributes to the prior art, may be embodied in the form of a software product stored in a readable storage medium, such as a computer floppy disk, hard disk, or optical disk, and including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.

The above is only a specific embodiment of the present invention, and the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Accordingly, the scope of protection of the invention shall be determined by the scope of the appended claims.

Claims

1. A malicious code detection method, characterized by including:

Extract JavaScript script code from a Portable Document Format (PDF) file;

Start a predetermined script interpreter that supports the PDF standard to run a deobfuscation process on the JavaScript script code, and obtain the code information corresponding to the JavaScript script code according to the deobfuscation process, where the types of the code information include operation codes and string variables;

Detect whether the JavaScript script code is malicious code according to the detection rules corresponding to the type of the code information.

2. The method according to claim 1, characterized in that, before starting a predetermined script interpreter that supports the PDF standard to run a deobfuscation process on the JavaScript script code, it also includes:

Instrument and inject a library file into the deobfuscation process that the script interpreter runs on the JavaScript script code, where the library file is used to obtain the code information corresponding to the deobfuscated JavaScript script code generated by the script interpreter while deobfuscating the JavaScript script code.

3. The method according to claim 1 or 2, characterized in that, if the type of the code information is an operation code, detecting whether the JavaScript script code is malicious code according to the detection rule corresponding to the type of the code information includes:

Match the opcode corresponding to the JavaScript script code in the stored malicious opcode signature library;

If the operation code is matched in the stored malicious operation code feature library, the JavaScript script code is determined to be malicious code;

If the operation code is not matched in the stored malicious operation code feature library, it is determined that the JavaScript script code is not malicious code.

4. The method according to claim 1 or 2, characterized in that, if the type of the code information is a string variable, detecting whether the JavaScript script code is malicious code according to the detection rule corresponding to the type of the code information includes:

Obtain the length of the string variable corresponding to the JavaScript script code;

If the length of the string variable is located in the first interval, obtain the first characteristic parameter corresponding to the string variable; determine, according to the stack overflow detection model and the first characteristic parameter, whether the JavaScript script code is malicious code;

If the length of the string variable is located in the second interval, obtain the second characteristic parameter corresponding to the string variable; determine, according to the heap injection detection model and the second characteristic parameter, whether the JavaScript script code is malicious code.

5. The method according to claim 4, wherein the first characteristic parameter includes one of, or a combination of, the GetPC instruction occurrence frequency, the flower instruction occurrence frequency, and whether a known shelling code fingerprint is included; the second characteristic parameter includes one of, or a combination of, the string information entropy value and the NOP instruction occurrence frequency.

6. The method according to claim 1 or 2, characterized in that, if the type of the code information is an operation code and a string variable, detecting whether the JavaScript script code is malicious code according to the detection rule corresponding to the type of the code information includes:

Detect the JavaScript script code according to the detection rule corresponding to the operation code; and,

Detect the JavaScript script code according to the detection rule corresponding to the string variable; when the JavaScript script code is determined to be malicious code according to the detection rule corresponding to the operation code, or according to the detection rule corresponding to the string variable, determine that the JavaScript script code is malicious code; when the JavaScript script code is determined not to be malicious code according to the detection rule corresponding to the operation code and also according to the detection rule corresponding to the string variable, determine that the JavaScript script code is not malicious code;

Wherein detecting the JavaScript script code according to the detection rule corresponding to the operation code includes:

Match the operation code corresponding to the JavaScript script code in the stored malicious operation code feature library; if the operation code is matched in the stored malicious operation code feature library, determine that the JavaScript script code is malicious code; if the operation code is not matched in the stored malicious operation code feature library, determine that the JavaScript script code is not malicious code;

Detecting the JavaScript script code according to the detection rule corresponding to the string variable includes:

Obtain the length of the string variable corresponding to the JavaScript script code; if the length of the string variable is located in the third interval, obtain the third characteristic parameter corresponding to the string variable, and determine, according to the stack overflow detection model and the third characteristic parameter, whether the JavaScript script code is malicious code; if the length of the string variable is located in the fourth interval, obtain the fourth characteristic parameter corresponding to the string variable, and determine, according to the heap injection detection model and the fourth characteristic parameter, whether the JavaScript script code is malicious code.

7. The method according to claim 6, characterized in that the third characteristic parameter includes one of, or a combination of, the GetPC instruction occurrence frequency, the flower instruction occurrence frequency, and whether a known shelling code fingerprint is included; the fourth characteristic parameter includes one of, or a combination of, the string information entropy value and the NOP instruction occurrence frequency.

8. A malicious code detection device, characterized by including:

A deobfuscation module, configured to extract the JavaScript script code in the PDF file; start a predetermined script interpreter that supports the PDF standard to run a deobfuscation process on the JavaScript script code; and obtain the code information corresponding to the JavaScript script code according to the deobfuscation process, where the types of the code information include operation codes and string variables;

A detection module, configured to detect whether the JavaScript script code is malicious code according to the detection rules corresponding to the type of code information obtained by the deobfuscation module.

9. The device according to claim 8, characterized in that the device further includes: an instrumentation injection module, configured to inject a library file, by instrumentation, into the deobfuscation process that the script interpreter runs on the JavaScript script code, where the library file is used to obtain the code information corresponding to the JavaScript script code that the script interpreter generates while deobfuscating the JavaScript script code.

10. The device according to claim 8 or 9, characterized in that, the detection module includes:

The first matching unit is configured to, if the type of the code information is an operation code, match the operation code corresponding to the JavaScript script code in the stored malicious operation code feature library;

a first determination unit, configured to determine that the JavaScript script code is malicious code when the first matching unit matches the operation code in the stored malicious operation code feature library; and to determine that the JavaScript script code is not malicious code when the first matching unit does not match the operation code in the stored malicious operation code feature library.
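The matching performed by the first matching unit and first determination unit can be sketched as follows. This is only an illustration under stated assumptions: the claims do not specify how an operation code is "matched", so the sketch treats each library entry as an opcode sequence that must appear as a contiguous run in the trace recorded during deobfuscation. The opcode names and the function names are hypothetical placeholders.

```python
from typing import Iterable, Sequence


def contains_signature(trace: Sequence[str], signature: Sequence[str]) -> bool:
    """True if `signature` occurs as a contiguous run inside `trace`."""
    n, m = len(trace), len(signature)
    if m == 0:
        return False  # an empty signature never counts as a match
    return any(list(trace[i:i + m]) == list(signature) for i in range(n - m + 1))


def is_malicious_by_opcodes(trace: Sequence[str],
                            feature_library: Iterable[Sequence[str]]) -> bool:
    """Malicious if any stored signature is matched in the trace, otherwise not."""
    return any(contains_signature(trace, sig) for sig in feature_library)


# Hypothetical opcode trace and feature library, for illustration only.
trace = ["OP_PUSH", "OP_CONCAT", "OP_CALL", "OP_POP"]
library = [("OP_CONCAT", "OP_CALL")]
verdict = is_malicious_by_opcodes(trace, library)
```

Because the trace is taken from the interpreter after deobfuscation, the comparison works on what the script actually executes rather than on its obfuscated source text.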

11. The device according to claim 8 or 9, characterized in that, the detection module includes:

The first acquisition unit is used to obtain the length of the string variable corresponding to the JavaScript script code if the type of the code information is a string variable;

The first judgment unit is configured to obtain the first characteristic parameter corresponding to the string variable when the length of the string variable obtained by the first acquisition unit is in the first interval, and to determine, according to the stack overflow detection model and the first characteristic parameter, whether the JavaScript script code is malicious code; and is configured to obtain the second characteristic parameter corresponding to the string variable when the length of the string variable obtained by the first acquisition unit is in the second interval, and to determine, according to the heap injection detection model and the second characteristic parameter, whether the JavaScript script code is malicious code.

12. The device according to claim 11, wherein the first characteristic parameter includes one or a combination of more than one of: the GetPC instruction occurrence frequency, the junk ("flower") instruction occurrence frequency, and whether known shelling code fingerprints are contained; the second characteristic parameter includes one or a combination of: the string information entropy value and the NOP instruction occurrence frequency.
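The length-based dispatch performed by the first acquisition unit and first judgment unit might look like the sketch below. The interval bounds are hypothetical, since the claims here give no numeric values, and the two detection models are passed in as plain callables because their internals are not defined in this section. Returning "not malicious" for a length outside both intervals is also an assumption.

```python
from typing import Callable

# Hypothetical interval bounds in bytes; the claims state no numeric values.
FIRST_INTERVAL = range(32, 64 * 1024)
SECOND_INTERVAL = range(64 * 1024, 2 ** 31)

Model = Callable[[bytes], bool]


def detect_by_string(value: bytes,
                     stack_overflow_model: Model,
                     heap_injection_model: Model,
                     first_interval: range = FIRST_INTERVAL,
                     second_interval: range = SECOND_INTERVAL) -> bool:
    """Choose a detection model by string length and return its verdict."""
    length = len(value)
    if length in first_interval:
        # Each model is expected to derive its own characteristic parameters.
        return stack_overflow_model(value)
    if length in second_interval:
        return heap_injection_model(value)
    # Assumption: a length outside both intervals triggers no string rule.
    return False
```

Passing the models as callables keeps the dispatch independent of how each model computes and weighs its characteristic parameters.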

13. The device according to claim 8 or 9, characterized in that the detection module is specifically configured to: if the type of the code information is operation codes and string variables, detect the JavaScript script code according to the detection rule corresponding to the operation code, and detect the JavaScript script code according to the detection rule corresponding to the string variable;

and, when the JavaScript script code is determined to be malicious code according to the detection rule corresponding to the operation code, or is determined to be malicious code according to the detection rule corresponding to the string variable, determine that the JavaScript script code is malicious code;

and, when the JavaScript script code is determined not to be malicious code according to the detection rule corresponding to the operation code and is also determined not to be malicious code according to the detection rule corresponding to the string variable, determine that the JavaScript script code is not malicious code;

The detection module further includes:

a second matching unit, configured to match the operation code corresponding to the JavaScript script code in the stored malicious operation code feature library;

a second determination unit, configured to determine that the JavaScript script code is malicious code when the second matching unit determines that the operation code is matched in the stored malicious operation code feature library; and to determine that the JavaScript script code is not malicious code when the second matching unit determines that the operation code is not matched in the stored malicious operation code feature library;

The detection module further includes:

The second acquisition unit is used to acquire the length of the string variable corresponding to the JavaScript script code;

The second judgment unit is used to obtain the third characteristic parameter corresponding to the string variable when the length of the string variable obtained by the second acquisition unit is in the third interval, and to determine, according to the stack overflow detection model and the third characteristic parameter, whether the JavaScript script code is malicious code; and is used to obtain the fourth characteristic parameter corresponding to the string variable when the length of the string variable obtained by the second acquisition unit is in the fourth interval, and to determine, according to the heap injection detection model and the fourth characteristic parameter, whether the JavaScript script code is malicious code.

14. The device according to claim 13, wherein the third characteristic parameter includes one or a combination of more than one of: the GetPC instruction occurrence frequency, the junk ("flower") instruction occurrence frequency, and whether known shelling code fingerprints are contained; the fourth characteristic parameter includes one or a combination of: the string information entropy value and the NOP instruction occurrence frequency.
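When both operation codes and string variables are available, the decision logic of claim 13 reduces to a logical OR of the two rule outcomes. The sketch below shows that combination only; the function and parameter names are hypothetical, the two rules are supplied as callables, and applying the string rule to each collected string in turn is an assumption.

```python
from typing import Callable, Iterable, Sequence


def combined_verdict(opcode_trace: Sequence[str],
                     strings: Iterable[bytes],
                     opcode_rule: Callable[[Sequence[str]], bool],
                     string_rule: Callable[[bytes], bool]) -> bool:
    """Malicious if either rule reports malicious; benign only if both report benign."""
    opcode_hit = opcode_rule(opcode_trace)
    # Assumption: one malicious string variable is enough for the string rule to fire.
    string_hit = any(string_rule(s) for s in strings)
    return opcode_hit or string_hit
```

Under this combination a script is reported as not malicious only when neither the operation code rule nor the string variable rule flags it.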

15. A detection device, characterized in that it includes a memory and a processor, wherein: the memory is configured to store codes;

The processor is configured to read the code stored in the memory and execute the method according to any one of claims 1 to 7.