WO2014089744A1 - Method and apparatus for detecting malicious code - Google Patents

Method and apparatus for detecting malicious code

Info

Publication number
WO2014089744A1
Authority
WO
WIPO (PCT)
Prior art keywords
code
malicious
javascript script
javascript
script code
Prior art date
Application number
PCT/CN2012/086302
Other languages
French (fr)
Chinese (zh)
Inventor
诸葛建伟
钱晓斌
侯永干
富键
陆恂
王若愚
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to CN201280002026.9A priority Critical patent/CN103221960B/en
Priority to PCT/CN2012/086302 priority patent/WO2014089744A1/en
Publication of WO2014089744A1 publication Critical patent/WO2014089744A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities

Definitions

  • the present invention relates to the field of communication security technologies, and in particular, to a method and apparatus for detecting malicious code.
  • PDF (Portable Document Format)
  • This format is not limited by reading software, hardware, and operating system and can be used on any platform including Windows, Linux, and Mac OS.
  • JavaScript is a scripting language widely used for client-side web development. This scripting language is very versatile. Embedding JavaScript scripting languages in PDFs is important for realizing the interactive nature of PDF files, such as the presentation of dynamic content, tables, and 3D interfaces.
  • A malicious JavaScript script is a new type of virus among malicious attack code. It adds, changes, or deletes part of a script in a software system in order to cause harm or to compromise the integrity, confidentiality, or availability of computer system functions and networks. It is usually written in the JavaScript scripting language. Malicious JavaScript scripts are written in a flexible form and are easily transformed by various code obfuscation techniques, so current anti-virus technologies struggle to control and protect against them.
  • malware scripts in PDF make more use of some of the features in the PDF standard.
  • For example, hexadecimal codes replace the corresponding letters and digits in the definitions of a file, PDF stream objects are used to hide objects containing JavaScript scripts, and the nested-encoding feature of PDF stream objects is used to process JavaScript scripts with a variety of encoding methods.
  • Many existing browser-side de-obfuscation tools cannot de-obfuscate JavaScript scripts that are obfuscated with the above methods, which allows malicious scripts to spread attacks through PDF files.
  • Common attack methods include placing malicious PDF files in web pages and sending targeted phishing emails that carry malicious PDF file attachments.
  • the malicious PDF file refers to a PDF file carrying a malicious JavaScript script.
  • In the prior art, a simulated execution environment is used to execute the PDF file under detection. By observing the behavior of the file in a normal system operating environment, operations such as the calls made while the file executes are detected, thereby discovering malicious behavior.
  • This method cannot detect malicious behavior that is concealed by common JavaScript hiding techniques, for example JavaScript scripts in a PDF file that exhibit malicious behavior only at a specific time or only when a specific plug-in is present.
  • the embodiment of the invention provides a method and a device for detecting malicious code, which can improve the detection accuracy of malicious JavaScript code carried in a PDF file.
  • a method for detecting malicious code including:
  • before starting the predetermined script interpreter supporting the PDF standard to run the de-obfuscation process on the JavaScript script code, the method further includes:
  • the library file is used to obtain code information corresponding to the de-obfuscated JavaScript script code generated by the script interpreter in the process of de-obfuscating the JavaScript script code.
  • detecting whether the JavaScript script code is malicious code includes:
  • the type of the code information is a string variable
  • the detecting according to the detection rule corresponding to the type of the code information includes:
  • if the length of the string variable is in the first interval, acquiring a first feature parameter corresponding to the string variable; determining, according to the stack overflow detection model and the first feature parameter, whether the JavaScript script code is malicious code;
  • if the length of the string variable is in the second interval, acquiring a second feature parameter corresponding to the string variable; determining, according to the heap injection detection model and the second feature parameter, whether the JavaScript script code is malicious code.
  • the first characteristic parameter includes a combination of one or more of: a frequency of occurrence of a GetPC instruction, a frequency of occurrence of a flower instruction, and whether a fingerprint of a known shelling code is included; the second characteristic parameter includes a combination of one or more of a string information entropy value and a NOP instruction occurrence frequency.
  • the type of the code information is an operation code and a character string
  • the detection rule corresponding to the type of the information, and detecting whether the JavaScript script code is malicious code includes:
  • detecting the JavaScript script code according to the detection rule corresponding to the operation code, and detecting the JavaScript script code according to the detection rule corresponding to the string variable; when the JavaScript script code is determined to be malicious code according to the detection rule corresponding to the operation code, or when the JavaScript script code is determined to be malicious code according to the detection rule corresponding to the string variable, determining that the JavaScript script code is malicious code;
  • the detecting rule corresponding to the operation code, detecting whether the JavaScript script code is malicious code includes:
  • matching an operation code corresponding to the JavaScript script code in the stored malicious operation code feature library; if the operation code is matched in the stored malicious operation code feature library, determining that the JavaScript script code is malicious code; if the operation code is not matched in the stored malicious operation code feature library, determining that the JavaScript script code is not malicious code; and the detecting, according to the detection rule corresponding to the string variable, whether the JavaScript script code is malicious code includes:
  • the third characteristic parameter includes a combination of one or more of: a frequency of occurrence of a GetPC instruction, a frequency of occurrence of a flower instruction, and whether a fingerprint of a known shelling code is included;
  • the fourth characteristic parameter includes a combination of one or more of a string information entropy value and a NOP instruction occurrence frequency.
  • a detection device for malicious code including:
  • a de-obfuscation module configured to extract JavaScript script code in a PDF file; start a predetermined script interpreter supporting the PDF standard to run a de-obfuscation process on the JavaScript script code; and obtain code information corresponding to the JavaScript script code according to the de-obfuscation process
  • the type of the code information includes an operation code and a string variable
  • a detecting module configured to detect, according to the detection rule corresponding to the type of the code information obtained by the de-obfuscating module, whether the JavaScript script code is malicious code.
  • the apparatus further includes: an instrumentation injection module, configured to inject a library file into a de-obfuscation process in which the script interpreter runs the JavaScript script code
  • the library file is used to obtain code information corresponding to the de-obfuscated JavaScript script code generated by the script interpreter in the process of de-obfuscating the JavaScript script code.
  • the detecting module includes:
  • a first matching unit configured to: if the type of the code information is an operation code, match an operation code corresponding to the JavaScript script code in a stored malicious operation code feature library; a first determining unit configured to: when the first matching unit matches the operation code in the stored malicious operation code feature library, determine that the JavaScript script code is malicious code; and when the first matching unit does not match the operation code in the stored malicious operation code feature library, determine that the JavaScript script code is not malicious code.
  • the detecting module includes:
  • a first obtaining unit configured to obtain a length of a string variable corresponding to the JavaScript script code if the type of the code information is a string variable
  • a first determining unit configured to: when the length of the string variable acquired by the first obtaining unit is in the first interval, acquire a first feature parameter corresponding to the string variable, and determine, according to the stack overflow detection model and the first feature parameter, whether the JavaScript script code is malicious code; and when the length of the string variable acquired by the first obtaining unit is in the second interval, acquire a second feature parameter corresponding to the string variable, and determine, according to the heap injection detection model and the second feature parameter, whether the JavaScript script code is malicious code.
  • the first characteristic parameter includes a combination of one or more of: a frequency of occurrence of a GetPC instruction, a frequency of occurrence of a flower instruction, and whether a fingerprint of a known shelling code is included; the second characteristic parameter includes a combination of one or more of a string information entropy value and a NOP instruction occurrence frequency.
  • the detecting module is specifically configured to: if the type of the code information is an operation code and a string variable, detect the JavaScript script code according to the detection rule corresponding to the operation code; and detect the JavaScript script code according to the detection rule corresponding to the string variable;
  • the detecting module further includes:
  • a second matching unit configured to match an operation code corresponding to the JavaScript script code in the stored malicious operation code feature library
  • a second determining unit configured to: when the second matching unit matches the operation code in the stored malicious operation code feature library, determine that the JavaScript script code is malicious code; and when the second matching unit does not match the operation code in the stored malicious operation code feature library, determine that the JavaScript script code is not malicious code; and the detecting module further includes:
  • a second obtaining unit configured to acquire a length of a string variable corresponding to the JavaScript script code
  • a second determining unit configured to: when the length of the string variable acquired by the second obtaining unit is in the third interval, acquire a third feature parameter corresponding to the string variable, and determine, according to the stack overflow detection model and the third feature parameter, whether the JavaScript script code is malicious code; and when the length of the string variable acquired by the second obtaining unit is in the fourth interval, acquire a fourth feature parameter corresponding to the string variable, and determine, according to the heap injection detection model and the fourth feature parameter, whether the JavaScript script code is malicious code.
  • the third characteristic parameter includes a combination of one or more of: a frequency of occurrence of the GetPC instruction, a frequency of occurrence of the flower instruction, and whether a fingerprint of a known shelling code is included;
  • the fourth characteristic parameter includes a combination of one or more of a string information entropy value and a NOP instruction occurrence frequency.
  • a detection apparatus including a memory and a processor, wherein: the memory is configured to store a code;
  • the processor is configured to read the code stored in the memory to perform the method provided by any of the first aspect, or any of the six possible implementations of the first aspect.
  • According to the method and apparatus for detecting malicious code provided by the embodiments of the present invention, a predetermined script interpreter supporting the PDF standard is started to run a de-obfuscation process on the JavaScript script code, code information corresponding to the JavaScript script code is obtained, and whether the JavaScript script code is malicious code is detected according to a detection rule corresponding to the type of the code information. Compared with the prior art, malicious JavaScript code carried in a PDF file can be detected more accurately.
  • FIG. 1 is a flowchart of a method for detecting malicious code according to an embodiment of the present invention
  • FIG. 2 is a flowchart of another method for detecting malicious code according to an embodiment of the present invention
  • FIG. 3 is a flowchart of another method for detecting malicious code according to an embodiment of the present invention
  • FIG. 4 is a flowchart of a method for detecting malicious code according to an embodiment of the present invention
  • FIG. 5 is a block diagram of a device for detecting malicious code according to an embodiment of the present invention
  • FIG. 6 is a block diagram of another device for detecting malicious code according to an embodiment of the present invention
  • FIG. 7 is a block diagram of a component of a detection module according to an embodiment of the present invention
  • FIG. 8 is a structural block diagram of another detection module according to an embodiment of the present invention.
  • FIG. 9 is a structural block diagram of another detection module according to an embodiment of the present invention.
  • FIG. 10 is a structural block diagram of a detecting device according to an embodiment of the present invention.
  • An embodiment of the present invention provides a method for detecting a malicious code, which can be executed by a detecting device. As shown in FIG. 1, the method includes:
  • JavaScript scripts embedded in a PDF file enable the display and interactive functions of the PDF file, but they are also abused by attackers and malicious code to exploit vulnerabilities in PDF reader software and compromise the host.
  • the PDF file to be detected may be derived from an attachment of an e-mail, a web page content, etc., and is not limited herein.
  • the method for extracting the JavaScript script code in the PDF file may specifically include: locating the JavaScript stream elements in the PDF file according to the international common format specification for PDF files, and performing the decoding that corresponds to the compression and encoding method of each JavaScript stream, so as to extract the JavaScript code contained in the PDF file.
  • the script interpreter supporting the PDF standard may be a script interpreter embedded in a PDF reader, where the PDF reader may be any PDF reader that has an embedded JavaScript script code interpretation engine. For example, the embodiment of the present invention uses Acrobat Reader, the PDF reading application provided by Adobe; the script interpreter embedded in Acrobat Reader can de-obfuscate most JavaScript script code.
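  • The extraction step can be sketched as follows. This is a minimal illustration, not the method of the patent: it assumes an unencrypted file, handles only the `/FlateDecode` filter, assumes `/Length` is a direct integer, and returns every decoded stream rather than locating the JavaScript streams through the document structure. The function name `extract_streams` is hypothetical.

```python
import re
import zlib

# Matches a stream dictionary followed by the "stream" keyword (simple objects only).
HEADER_RE = re.compile(rb"<<(.*?)>>\s*stream\r?\n", re.DOTALL)
# Assumption: /Length is a direct integer, not an indirect reference such as "5 0 R".
LENGTH_RE = re.compile(rb"/Length\s+(\d+)")


def extract_streams(pdf_bytes: bytes) -> list[bytes]:
    """Return the decoded payload of each stream this sketch can handle."""
    decoded = []
    for header in HEADER_RE.finditer(pdf_bytes):
        dictionary = header.group(1)
        length = LENGTH_RE.search(dictionary)
        if length is None:
            continue
        start = header.end()
        payload = pdf_bytes[start:start + int(length.group(1))]
        if b"/FlateDecode" in dictionary:
            try:
                payload = zlib.decompress(payload)
            except zlib.error:
                continue  # skip streams that fail to decompress
        decoded.append(payload)
    return decoded
```

  • A real extractor would also have to follow nested filters and `/JS` references, which is the encoding-nesting case the description mentions.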
  • the script interpreter supporting the PDF standard may be pre-configured by an administrator of the detecting device prior to detection.
  • the code information refers to the information that the script interpreter supporting the PDF standard used in the embodiment of the present invention produces when it interprets and translates the script code, and which it submits to the JavaScript virtual machine for execution.
  • the JavaScript virtual machine is an abstract computer that uses software simulation to run all JavaScript code.
  • the information submitted to the JavaScript virtual machine for execution contains information output by the script interpreter at different stages of interpreting and translating the script code, and may include at least the following two categories: opcodes and string variables.
  • the operation code may be a command code used by the machine, and a typical operation code fragment is as follows:
  • the string variable may be a string variable defined in the JavaScript script, and its typical form is as follows:
  • var str = "some value ...";
  • the value of a string variable itself may also be a compiled instruction.
  • the value of the string variable thisVar. replace can be a Unicode-encoded instruction.
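  • As an illustration of a string value that is itself encoded instructions, the sketch below decodes `%uXXXX` escapes into raw bytes. The description only says the value can be Unicode-encoded; the `%uXXXX` notation, the little-endian byte order of each 16-bit unit, and the function name `unescape_percent_u` are assumptions made for this example, and any characters outside an escape are ignored.

```python
import re


def unescape_percent_u(text: str) -> bytes:
    """Decode each %uXXXX escape into two bytes, low byte first."""
    out = bytearray()
    for match in re.finditer(r"%u([0-9a-fA-F]{4})", text):
        out += int(match.group(1), 16).to_bytes(2, "little")
    return bytes(out)
```

  • The resulting byte string is what the later feature extraction (instruction frequencies, entropy) would operate on.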
  • in this embodiment, the code information corresponding to the JavaScript script code can be obtained by instrumentation. For the specific instrumentation injection procedure, refer to the flow in FIG. 4 and the corresponding description.
  • the two different types of parameters, the opcode and the string variable, are intermediate parameters that may occur at different stages of the de-obfuscation process. Therefore, both the opcode and the string variable can be obtained by monitoring the individual steps of the de-obfuscation process.
  • the detection method provided by the embodiment of the present invention is different according to the type of the code information.
  • the type of the code information is the operation code
  • the type of the code information is a character string
  • the type of the code information is the operation code
  • Step 103, detecting whether the JavaScript script code is malicious code according to the detection rule corresponding to the type of the code information, can be implemented by the following three detection methods:
  • the first method is shown in Figure 2, including:
  • Step a1031: if the type of the code information is an operation code, match the operation code corresponding to the JavaScript script code in a stored malicious operation code feature library; if the operation code is matched in the stored malicious operation code feature library, step a1032 is performed; if the operation code is not matched in the stored malicious operation code feature library, step a1033 is performed.
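  • The first method can be sketched as a lookup of known-bad opcode sequences. The description does not say how a feature is represented, so this example assumes each feature is a contiguous sequence of opcode names; the opcode names and the function names are hypothetical.

```python
from typing import Iterable, Sequence


def contains_sequence(opcodes: Sequence[str], signature: Sequence[str]) -> bool:
    """True if signature occurs as a contiguous run inside opcodes."""
    n = len(signature)
    if n == 0:
        return False  # an empty signature must never match
    return any(
        tuple(opcodes[i:i + n]) == tuple(signature)
        for i in range(len(opcodes) - n + 1)
    )


def is_malicious_by_opcode(
    opcodes: Sequence[str], feature_library: Iterable[Sequence[str]]
) -> bool:
    """Malicious if any stored feature matches; otherwise not malicious."""
    return any(contains_sequence(opcodes, sig) for sig in feature_library)
```

  • Because the library is passed in as data, it can be replaced or extended without changing the matching code.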
  • the second method is shown in Figure 3, including:
  • Step b1031: if the type of the code information is a string variable, obtain the length of the string variable corresponding to the JavaScript script code; if the length of the string variable is in the first interval, step b1032 is performed; if the length of the string variable is in the second interval, step b1034 is performed.
  • b1032: Acquire a first feature parameter corresponding to the string variable.
  • b1034: Acquire a second feature parameter corresponding to the string variable.
  • a third method if the type of the code information is an operation code and a string variable, detecting whether the JavaScript script code is malicious code according to a detection rule corresponding to the operation code; and, corresponding to the string variable Detection rules that detect if the JavaScript script code is malicious code.
  • the stored malicious operation code feature library contains features of the operation codes corresponding to JavaScript script code that has been confirmed in the technical field of the present invention to exhibit malicious behavior; the features may come from malicious operation code features published by various authorities.
  • the stored malicious operation code feature library is not fixed, and may be updated periodically as required.
  • the first interval and the second interval are respectively set for two types of malicious code attack modes, such as stack overflow and heap injection.
  • the intervals may be set with reference to empirical values. For example, the first interval may be set to 32-64K bytes, and the second interval may be set to larger than 64K bytes.
  • the first characteristic parameter may include at least a combination of one or more of: a frequency of occurrence of a GetPC instruction, a frequency of occurrence of a flower instruction, and whether a fingerprint of a known shelling code is included; the second characteristic parameter may include at least a combination of one or more of a string information entropy value and a frequency of occurrence of a NOP instruction.
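  • The choice of detection model by string length can be sketched as below. The example reads "32-64K bytes" as 32 KiB to 64 KiB inclusive, which is an assumption, as are the function name and the returned labels.

```python
from typing import Optional

KIB = 1024
FIRST_INTERVAL_MIN = 32 * KIB  # assumed lower bound of the first interval
FIRST_INTERVAL_MAX = 64 * KIB  # assumed upper bound of the first interval


def select_model(string_length: int) -> Optional[str]:
    """Pick the detection model for a string of the given length in bytes."""
    if FIRST_INTERVAL_MIN <= string_length <= FIRST_INTERVAL_MAX:
        return "stack_overflow"   # first interval
    if string_length > FIRST_INTERVAL_MAX:
        return "heap_injection"   # second interval
    return None                   # outside both intervals: no model applies
```

  • Strings shorter than the first interval fall through without a verdict in this sketch; the description does not state how such strings are treated.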
  • the GetPC instruction refers to an instruction in shellcode that is used to locate its own virtual address; a flower instruction is code used to prevent a disassembly engine from disassembling correctly; the frequency at which GetPC instructions and flower instructions appear in a string can be used as partial evidence that the string contains shellcode.
  • the shelling code fingerprint refers to the fact that shelled (packed) shellcode always unpacks itself when it is executed. The characteristics of this shelling code are the shelling code fingerprint, and the presence of the fingerprint can be used as partial evidence that shellcode exists.
  • the string information entropy is an indicator that measures the amount of information in a string.
  • the NOP instruction is a CPU no-operation instruction. When a string contains a large number of NOP instructions, the run of NOP instructions may be the leading code (sled) of shellcode used for heap injection.
  • to obtain the first characteristic parameter corresponding to the string variable, GetPC instruction matching may be used to identify the frequency of GetPC-class instructions in the string variable, and flower instruction matching may be used to identify the frequency of flower instructions in the string variable.
  • to obtain the second characteristic parameter corresponding to the string variable, the information entropy value of the string variable may be calculated using the general information entropy formula and judged by its degree of deviation from the statistical average information entropy value, and NOP instruction matching may be used to identify the frequency of NOP instructions in the string.
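  • The two components of the second characteristic parameter can be sketched as follows. The entropy is the standard Shannon entropy over byte values. Treating the x86 single-byte NOP (`0x90`) as the only NOP encoding is an assumption of this example, since the description does not name an instruction set; the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(data).values()
    )


def nop_frequency(data: bytes) -> float:
    """Fraction of bytes equal to 0x90 (assumed x86 NOP encoding)."""
    if not data:
        return 0.0
    return data.count(b"\x90") / len(data)
```

  • A buffer made of one repeated byte has zero entropy, while a buffer in which all 256 byte values are equally likely reaches the maximum of 8 bits per byte.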
  • the stack overflow detection model and the heap injection detection model are both pre-trained. The stack overflow detection model can select the frequency of occurrence of the GetPC instruction, the frequency of occurrence of the flower instruction, and whether a fingerprint of a known shelling code is included as its feature vector.
  • the model is trained using a standard data set to obtain the thresholds corresponding to the stack overflow detection model, for example, the lowest frequency of the GetPC instruction, the lowest frequency of the flower instruction, and the fingerprints of known shelling code.
  • the heap injection detection model can select the information entropy value and the NOP instruction occurrence frequency as its feature vector and is trained using a standard data set to obtain the thresholds corresponding to the heap injection detection model, for example, the minimum information entropy and the minimum frequency of the NOP instruction.
  • when one or more parameters in the first feature parameter exceed the thresholds corresponding to the stack overflow detection model, the JavaScript script code is determined to be malicious code; otherwise, the JavaScript script code is determined not to be malicious code. When one or more parameters in the second feature parameter exceed the thresholds corresponding to the heap injection detection model, the JavaScript script code is determined to be malicious code; otherwise, the JavaScript script code is determined not to be malicious code.
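  • A threshold decision for the stack overflow detection model can be sketched as below. The thresholds would come from training on a standard data set; the values used in any call are placeholders. The class and function names are hypothetical, and treating a value equal to the lowest frequency as a hit is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackOverflowThresholds:
    """Trained thresholds (hypothetical container for this sketch)."""
    min_getpc_frequency: float
    min_flower_frequency: float


def stack_overflow_verdict(
    getpc_frequency: float,
    flower_frequency: float,
    has_known_fingerprint: bool,
    thresholds: StackOverflowThresholds,
) -> bool:
    """Malicious if at least one feature reaches its threshold."""
    return (
        getpc_frequency >= thresholds.min_getpc_frequency
        or flower_frequency >= thresholds.min_flower_frequency
        or has_known_fingerprint
    )
```

  • A heap injection model would follow the same shape with the entropy value and the NOP frequency as its features.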
  • in the third method, detecting the JavaScript script code according to the detection rule corresponding to the operation code specifically includes: matching the operation code corresponding to the JavaScript script code in the stored malicious operation code feature library; if the operation code is matched in the stored malicious operation code feature library, determining that the JavaScript script code is malicious code; if the operation code is not matched in the stored malicious operation code feature library, determining that the JavaScript script code is not malicious code.
  • determining, according to the detection rule corresponding to the string variable, whether the JavaScript script code is malicious code specifically includes: obtaining the length of the string variable corresponding to the JavaScript script code; if the length of the string variable is in a third interval, obtaining the third feature parameter corresponding to the string variable, and determining, according to the stack overflow detection model and the third feature parameter, whether the JavaScript script code is malicious code; if the length of the string variable is in a fourth interval, obtaining the fourth feature parameter corresponding to the string variable, and determining, according to the heap injection detection model and the fourth feature parameter, whether the JavaScript script code is malicious code.
  • the third characteristic parameter may include at least a combination of one or more of: a frequency of occurrence of a GetPC instruction, a frequency of occurrence of a flower instruction, and whether a fingerprint of a known shelling code is included; the fourth characteristic parameter may include at least a combination of one or more of a string information entropy value and a frequency of occurrence of a NOP instruction.
  • in the third method, detecting the JavaScript script code according to the detection rule corresponding to the operation code may directly use the first method, specifically steps a1031 to a1033.
  • detecting the JavaScript script code according to the detection rule corresponding to the string variable may directly use the second method, specifically steps b1031 to b1035. Therefore, the third interval described in the third method may use the setting of the first interval in the second method, and the fourth interval may use the setting of the second interval in the second method.
  • the stack overflow detection model and the heap injection detection model can also use the model in the second method accordingly.
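  • The third method combines the two rules: the script is reported as malicious if either rule flags it. The sketch below takes the two rules as callables so that the first and second methods can be plugged in unchanged; the function name and parameter names are hypothetical.

```python
from typing import Callable, Iterable, Optional, Sequence


def combined_verdict(
    opcodes: Optional[Sequence[str]],
    string_values: Iterable[bytes],
    opcode_rule: Callable[[Sequence[str]], bool],
    string_rule: Callable[[bytes], bool],
) -> bool:
    """Malicious if the opcode rule or the string-variable rule says so."""
    if opcodes is not None and opcode_rule(opcodes):
        return True
    return any(string_rule(value) for value in string_values)
```

  • Passing `None` for the opcodes or an empty list of strings reduces this to the second or first method respectively.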
  • to obtain the code information corresponding to the JavaScript script code in the PDF file, the predetermined script interpreter supporting the PDF standard must first be instrumented.
  • a script interpreter embedded in a PDF reader is taken as an example. As shown in FIG. 4, the specific process is as follows:
  • the library file is a pre-written file in DLL format, and is used to obtain code information corresponding to the de-obfuscated JavaScript script code generated by the predetermined PDF script interpreter in the process of de-obfuscating the JavaScript script code. Injecting a library file into a process means adding the execution of a DLL file with a specific function to a currently running process without affecting the normal working state of that process.
  • the position of the instrumentation injection needs to be selected according to the API provided by the predetermined PDF reader itself. For example, if you want to get the opcode corresponding to the JavaScript script code, you need to get the API that can output the opcode in the predetermined PDF reader for instrumentation.
  • the execution of the above steps 201 to 203 is a necessary step for the execution of the step 102, but the steps 201 to 203 only need to be executed once when the application process of the predetermined PDF reader is started, and the subsequent detection of the PDF file is performed. It does not need to be executed again during the process.
  • the stack overflow detection model and the heap injection detection model described above need to be established before starting the application process of the predetermined PDF reader, and can be used in the subsequent process of detecting the PDF file.
  • the plaintext code corresponding to the JavaScript script code may also be obtained.
  • the detection of the malicious code is associated with the plaintext code.
  • if the operation code corresponding to the JavaScript script code is a malicious operation code, the position corresponding to the malicious operation code in the plaintext code is marked to facilitate technical research.
  • the method for obtaining the plaintext code is the same as the method for determining the code information corresponding to the JavaScript script code.
  • in the embodiment of the present invention, the code information corresponding to the JavaScript script code is obtained by monitoring a predetermined script interpreter supporting the PDF standard, such as a script interpreter embedded in the PDF reader.
  • detecting the malicious JavaScript code in the PDF file improves the security of network resources.
  • the embodiment of the invention further provides a device for detecting malicious code, which can implement the method steps shown in FIG. 1 to FIG. 4 above.
  • the device is shown in Figure 5 and includes:
  • the de-obfuscation module 31 is configured to extract the JavaScript script code in the PDF file; start a predetermined script interpreter supporting the PDF standard, such as a script interpreter embedded in the PDF reader, to run a de-obfuscation process on the JavaScript script code; and obtain code information corresponding to the JavaScript script code according to the de-obfuscation process, where the type of the code information includes an operation code and/or a string variable.
  • the detecting module 32 is configured to detect, according to the detection rule corresponding to the type of the code information obtained by the de-obfuscating module 31, whether the JavaScript script code is malicious code.
  • the device further includes:
  • an instrumentation injection module 33 configured to inject a library file into the de-obfuscation process run by the script interpreter on the JavaScript script code, where the library file is used to obtain code information corresponding to the de-obfuscated JavaScript script code generated by the script interpreter in the process of de-obfuscating the JavaScript script code.
  • the detecting module 32 includes:
  • the first matching unit 321, is configured to match the operation code corresponding to the JavaScript script code in the stored malicious operation code feature library if the type of the code information is an operation code.
  • a first determining unit 322 configured to: when the first matching unit 321 matches the opcode in the stored malicious opcode feature library, determine that the JavaScript script code is malicious code; and when the first matching unit 321 does not match the opcode in the stored malicious opcode feature library, determine that the JavaScript script code is not malicious code.
  • the detecting module 32 includes:
  • the first obtaining unit 323 is configured to obtain a length of the string variable corresponding to the JavaScript script code if it is determined that the type of the code information is a string variable.
  • the first judging unit 324 is configured to: when the length of the string variable acquired by the first obtaining unit 323 is in the first interval, acquire the first feature parameter corresponding to the string variable and determine, according to the stack overflow detection model and the first feature parameter, whether the JavaScript script code is malicious code; and when the length of the string variable acquired by the first obtaining unit 323 is in the second interval, acquire the second feature parameter corresponding to the string variable and determine, according to the heap injection detection model and the second feature parameter, whether the JavaScript script code is malicious code.
  • the first characteristic parameter may include at least a combination of one or more of: the frequency of occurrence of GetPC instructions, the frequency of occurrence of flower instructions, and whether a known shelling code fingerprint is included; the second characteristic parameter may include at least a combination of one or more of a string information entropy value and a NOP instruction occurrence frequency.
  • the GetPC instruction is an instruction that shellcode uses to locate its own virtual address; a flower instruction is junk code inserted to prevent a disassembly engine from disassembling correctly; the frequency of GetPC instructions and flower instructions in a string can serve as part of the basis for judging whether the string contains shellcode.
  • a shelling code fingerprint arises because packed ("shelled") shellcode always unpacks itself first when it is executed; the characteristic features of this unpacking code are the shelling code fingerprint.
  • the presence of such a fingerprint can serve as part of the basis for judging that shellcode is present;
  • entropy is a measure of the amount of information in a string. If the string information entropy is less than a certain threshold, heap injection (heap spraying) may be present;
  • the NOP instruction is a CPU no-operation instruction. When a string contains a large number of NOP instructions, they may be the leading code (NOP sled) that precedes heap-injected shellcode.
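The two heap-injection features described above can be computed along the following lines. This is a sketch under stated assumptions: the string variable has already been converted to raw bytes, entropy is taken as Shannon entropy over byte values, and the NOP instruction is the single-byte x86 opcode 0x90. The function names are illustrative, and any thresholds applied to the results are left to the detection model.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def nop_frequency(data: bytes, nop: int = 0x90) -> float:
    """Fraction of bytes equal to the NOP opcode (x86 0x90 by default)."""
    return data.count(nop) / len(data) if data else 0.0
```

A buffer that is mostly a NOP sled has a NOP frequency close to 1 and a very low entropy, which is the combination the text associates with heap injection.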
  • the detecting module 32 is specifically configured to, if the type of the code information is an operation code and a string variable, detect whether the JavaScript script code is malicious code according to the detection rule corresponding to the operation code, and detect whether the JavaScript script code is malicious code according to the detection rule corresponding to the string variable.
  • the detecting module 32 further includes:
  • the second matching unit 325 is configured to match an operation code corresponding to the JavaScript script code in the stored malicious operation code feature library.
  • a second determining unit 326, configured to determine that the JavaScript script code is malicious code when the second matching unit 325 determines that the operation code is matched in the stored malicious operation code feature library, and to determine that the JavaScript script code is not malicious code when the second matching unit 325 determines that the operation code is not matched in the stored malicious operation code feature library.
  • the detecting module may further include:
  • the second obtaining unit 327 is configured to obtain a length of the string variable corresponding to the JavaScript script code.
  • a second judging unit 328, configured to: when the length of the string variable acquired by the second obtaining unit 327 is in the third interval, acquire a third feature parameter corresponding to the string variable and determine, according to the stack overflow detection model and the third feature parameter, whether the JavaScript script code is malicious code; and when the length of the string variable acquired by the second obtaining unit 327 is in the fourth interval, acquire a fourth feature parameter corresponding to the string variable and determine, according to the heap injection detection model and the fourth feature parameter, whether the JavaScript script code is malicious code.
  • the third characteristic parameter may include at least a combination of one or more of: the frequency of occurrence of GetPC instructions, the frequency of occurrence of flower instructions, and whether a known shelling code fingerprint is included; the fourth characteristic parameter may include at least a combination of one or more of a string information entropy value and a NOP instruction occurrence frequency.
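The length-based routing performed by the judging unit can be sketched as below. The interval bounds are hypothetical, since the source text does not give values, and the two detection models are passed in as plain callables standing in for whatever models an implementation actually uses.

```python
from typing import Callable, Optional, Tuple

Interval = Tuple[int, int]  # half-open range [low, high) of string lengths

# Hypothetical bounds; the source text does not give the interval values.
THIRD_INTERVAL: Interval = (64, 64 * 1024)
FOURTH_INTERVAL: Interval = (64 * 1024, 2**31)


def in_interval(length: int, interval: Interval) -> bool:
    """Return True if `length` lies inside the half-open interval."""
    low, high = interval
    return low <= length < high


def judge_string(
    value: bytes,
    stack_overflow_model: Callable[[bytes], bool],
    heap_injection_model: Callable[[bytes], bool],
) -> Optional[bool]:
    """Route a string to the detection model selected by its length.

    Returns the model's verdict, or None when the length falls in neither
    interval (the source text does not say what happens in that case).
    """
    if in_interval(len(value), THIRD_INTERVAL):
        return stack_overflow_model(value)
    if in_interval(len(value), FOURTH_INTERVAL):
        return heap_injection_model(value)
    return None
```

Keeping the models as injected callables means the routing logic stays the same whether a model is a simple threshold test or a trained classifier.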
  • the apparatus for detecting malicious code obtains the code information corresponding to the JavaScript script code by monitoring the de-obfuscation process that a predetermined script interpreter supporting the PDF standard, such as a script interpreter embedded in a PDF reader, runs on the JavaScript script code.
  • it then detects whether the JavaScript script code is malicious code according to the detection rules corresponding to the different types of code information. Compared with the prior art, which cannot effectively recognize malicious JavaScript script code propagated through PDF files, this accurately detects malicious JavaScript code in PDF files and improves the security of network resources.
  • the embodiment of the invention further provides a detecting device, which can implement the method steps shown in FIG. 1 to FIG. 4 above.
  • the device includes a processor 41 and a memory 42.
  • the memory 42 may include random access memory (RAM) or the like.
  • the memory 42 is configured to store program code; the processor 41 is configured to read program code stored in the memory to perform the steps in the method embodiments.
  • the processor 41 communicates with the memory 42 via a bus.
  • the memory 42 is further configured to store the JavaScript script code in the PDF file and the code information corresponding to the JavaScript script code.
  • the processor 41 is configured to extract the JavaScript script code from a PDF file stored in the memory 42, and start a predetermined script interpreter supporting the PDF standard to run a de-obfuscation process on the JavaScript script code.
  • the processor 41 obtains code information corresponding to the JavaScript script code according to the de-obfuscation process.
  • the type of the code information includes an operation code and a string variable.
  • the memory 42 is also used to store library files.
  • the processor 41 is further configured to inject a library file stored in the memory 42 into the script interpreter supporting the PDF standard, such as a script interpreter embedded in a predetermined PDF reader.
  • the library file is used to obtain the code information, corresponding to the de-obfuscated JavaScript script code, that the script interpreter generates in the process of de-obfuscating the JavaScript script code.
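The kind of record an injected library might collect can be sketched as a simple callback sink. This is an illustration only: the actual injection into a reader's script interpreter is not shown, and the class and method names (`CodeInfo`, `Recorder`, `on_opcode`, `on_string`) are hypothetical, not part of any real interpreter API.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class CodeInfo:
    """Code information gathered while the interpreter de-obfuscates a script."""
    opcodes: List[str] = field(default_factory=list)
    strings: List[str] = field(default_factory=list)

    def kinds(self) -> Set[str]:
        """Return which types of code information were observed."""
        found: Set[str] = set()
        if self.opcodes:
            found.add("opcode")
        if self.strings:
            found.add("string")
        return found


class Recorder:
    """Hypothetical callback sink that injected hooks would report to."""

    def __init__(self) -> None:
        self.info = CodeInfo()

    def on_opcode(self, name: str) -> None:
        """Record one operation code executed by the interpreter."""
        self.info.opcodes.append(name)

    def on_string(self, value: str) -> None:
        """Record one string variable produced during de-obfuscation."""
        self.info.strings.append(value)
```

The set returned by `kinds()` is what a detector would use to choose between the opcode rule, the string-variable rule, or both.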
  • the processor 41 is configured to: if the type of the code information stored by the memory 42 is an operation code, match the operation code corresponding to the JavaScript script code against a stored malicious operation code feature library; if the operation code is matched in the stored malicious operation code feature library, determine that the JavaScript script code is malicious code; and if the operation code is not matched in the stored malicious operation code feature library, determine that the JavaScript script code is not malicious code.
  • the memory 42 is configured to store a malicious operation code feature library.
  • the processor 41 is configured to: if the type of the code information stored by the memory 42 is a string variable, obtain the length of the string variable corresponding to the JavaScript script code; if the length of the string variable is in the first interval, obtain the first feature parameter corresponding to the string variable and determine, according to the stack overflow detection model and the first feature parameter, whether the JavaScript script code is malicious code; and if the length of the string variable is in the second interval, obtain the second feature parameter corresponding to the string variable and determine, according to the heap injection detection model and the second feature parameter, whether the JavaScript script code is malicious code.
  • the memory 42 is configured to store a length of a character string corresponding to the JavaScript script code, a first feature parameter, a second feature parameter, a first interval, a second interval, a stack overflow detection model, and a heap injection detection model.
  • the first characteristic parameter may include at least a combination of one or more of: the frequency of occurrence of GetPC instructions, the frequency of occurrence of flower instructions, and whether a known shelling code fingerprint is included; the second characteristic parameter may include at least a combination of one or more of a string information entropy value and a NOP instruction occurrence frequency.
  • the processor 41 is configured to: if the type of the code information stored by the memory 42 is an operation code and a string variable, detect whether the JavaScript script code is malicious code according to the detection rule corresponding to the operation code; and detect whether the JavaScript script code is malicious code according to the detection rule corresponding to the string variable.
  • the manner in which the processor 41 detects, according to the detection rule corresponding to the operation code, whether the JavaScript script code is malicious code includes:
  • the manner in which the processor 41 detects, according to the detection rule corresponding to the string variable, whether the JavaScript script code is malicious code specifically includes:
  • the third characteristic parameter may include at least a combination of one or more of: the frequency of occurrence of GetPC instructions, the frequency of occurrence of flower instructions, and whether a known shelling code fingerprint is included; the fourth characteristic parameter may include at least a combination of one or more of a string information entropy value and a NOP instruction occurrence frequency.
  • the detecting device obtains the code information corresponding to the JavaScript script code by monitoring the de-obfuscation process that a script interpreter embedded in a predetermined PDF reader runs on the JavaScript script code.
  • it detects whether the JavaScript script code is malicious code according to the detection rules corresponding to the different types of code information.
  • compared with the prior art, which cannot effectively recognize malicious JavaScript script code transmitted through PDF files, it can accurately detect malicious JavaScript code in the PDF file under test, which improves the security of network resources.
  • the present invention can be implemented by software plus the necessary general-purpose hardware, and of course also by hardware alone, but in many cases the former is the better implementation.
  • the part of the technical solution of the present invention that is essential, or that contributes to the prior art, may be embodied in the form of a software product stored in a readable storage medium, such as a floppy disk, a hard disk, or an optical disk of a computer.
  • the software product includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.

Abstract

The present invention relates to the field of communications security. Provided is a method for detecting malicious code, comprising: extracting JavaScript script code from a PDF file (101); starting a predetermined script interpreter that supports the PDF standard to run a de-obfuscation process on the JavaScript script code, and obtaining code information corresponding to the JavaScript script code according to the de-obfuscation process, wherein the type of the code information comprises an operation code and a character string variable (102); and detecting, according to a detection rule corresponding to the type of the code information, whether the JavaScript script code is malicious code (103). Also provided is an apparatus for detecting malicious code. The method and the apparatus can improve the accuracy of detecting malicious JavaScript code.

Description

Method and apparatus for detecting malicious code
Technical field
The present invention relates to the field of communication security technologies, and in particular, to a method and apparatus for detecting malicious code.
Background
PDF (portable document format) is an electronic file format. The format is independent of reading software, hardware, and operating system, and can be used on any platform, including Windows, Linux, and Mac OS. JavaScript is a scripting language widely used for client-side web development and supports a rich set of functions. Embedding JavaScript in a PDF is important for implementing the interactive features of PDF files, such as the presentation of dynamic content, forms, and 3D interfaces.
A malicious JavaScript script program is a new type of virus among malicious attack code. It adds, changes, or deletes parts of scripts in a software system in order to cause harm or to damage the functions of a computer system and the integrity, confidentiality, and availability of a network. It is usually written as a piece of JavaScript. Malicious JavaScript can be written in flexible forms and easily produces variants through various code obfuscation techniques, so current anti-virus technologies struggle to control and defend against it.
Malicious JavaScript scripts are usually spread through browsers, LAN sharing, instant messaging, and email. In recent years, as PDF exploitation techniques have matured, more and more malicious JavaScript has been placed in PDF files.
Code obfuscation, as the name suggests, is a technique for deliberately making script code appear messy and hard to understand. In much commercial software, developers may obfuscate code to protect copyright, which makes reverse engineering harder. In malicious scripts, obfuscation is used to evade scanning against the virus signature databases of anti-virus software and firewalls, and to hinder manual analysis of the malicious attack code.
Compared with malicious JavaScript scripts in web pages, malicious scripts in PDF files make greater use of features of the PDF standard for obfuscation. Examples include using hexadecimal codes of letters and digits in place of the corresponding characters in the definitions within the file, using PDF stream objects to hide objects that contain JavaScript, and using the nested encoding function of PDF stream objects to process JavaScript with several encoding methods. Many existing browser-side de-obfuscation tools cannot de-obfuscate JavaScript that has been obfuscated in these ways, which has encouraged the spread of attacks through malicious scripts in PDF files. Common attack methods include web pages containing malicious PDF files and targeted phishing emails carrying malicious PDF attachments, where a malicious PDF file is a PDF file carrying a malicious JavaScript script.
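Two of the PDF-level obfuscation tricks mentioned above, hexadecimal escapes in names and nested stream encodings, can be undone with a few lines of code. The sketch below is illustrative only: it handles just the ASCIIHexDecode and FlateDecode filters, assumes the stream bytes and the filter list have already been extracted from the file, and uses helper names that are not taken from the patent.

```python
import binascii
import re
import zlib


def decode_pdf_name(name: str) -> str:
    """Expand #xx hex escapes in a PDF name, e.g. '/J#61vaScript' -> '/JavaScript'."""
    return re.sub(r"#([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), name)


def ascii_hex_decode(data: bytes) -> bytes:
    """Decode ASCIIHexDecode data: ignore whitespace, stop at the '>' marker."""
    digits = re.sub(rb"\s", b"", data.split(b">", 1)[0])
    if len(digits) % 2:
        digits += b"0"  # an odd final digit is treated as if followed by 0
    return binascii.unhexlify(digits)


# Only two filters are covered here; a real PDF can use several others.
DECODERS = {
    "/ASCIIHexDecode": ascii_hex_decode,
    "/FlateDecode": zlib.decompress,
}


def decode_stream(data: bytes, filters: list) -> bytes:
    """Apply each filter in the order listed in the stream's /Filter array."""
    for name in filters:
        data = DECODERS[decode_pdf_name(name)](data)
    return data
```

Normalizing the filter names before lookup matters because the names themselves can be written with hex escapes, for example `/Fl#61teDecode`.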
Existing methods for detecting scripts in PDF files include the following:
An execution environment is simulated to execute the PDF file under test. By simulating the behavior of the PDF file in a normal system operating environment, a series of operations, such as the calls made when the file is executed, are examined in order to discover malicious behavior. However, this method cannot detect malicious behavior hidden by common JavaScript deception techniques, for example when the JavaScript in a PDF file is set to behave maliciously only during a specific time period or only when a specific plug-in is present.
In the process of implementing the present invention, the inventors found that the prior art has at least the following problem: malicious JavaScript code carried in a PDF file, especially malicious JavaScript code carried in a PDF file in obfuscated form, cannot be detected accurately.
Summary of the invention
Embodiments of the present invention provide a method and an apparatus for detecting malicious code, which can improve the accuracy of detecting malicious JavaScript code carried in a PDF file.
To achieve the above objective, the embodiments of the present invention adopt the following technical solutions:
In a first aspect, a method for detecting malicious code is provided, including:
extracting JavaScript script code from a PDF file;
starting a predetermined script interpreter that supports the PDF standard to run a de-obfuscation process on the JavaScript script code, and obtaining, according to the de-obfuscation process, code information corresponding to the JavaScript script code, where the type of the code information includes an operation code and a string variable; and
detecting, according to a detection rule corresponding to the type of the code information, whether the JavaScript script code is malicious code.
In a first possible implementation of the first aspect, before the starting of the predetermined script interpreter that supports the PDF standard to run the de-obfuscation process on the JavaScript script code, the method further includes:
injecting a library file, by instrumentation, into the de-obfuscation process of the script interpreter, where the library file is used to obtain the code information, corresponding to the de-obfuscated JavaScript script code, that the script interpreter generates in the process of de-obfuscating the JavaScript script code.
With reference to the first aspect and the first possible implementation of the first aspect, in a second possible implementation of the first aspect, if the type of the code information is an operation code, the detecting, according to the detection rule corresponding to the type of the code information, whether the JavaScript script code is malicious code includes:
matching the operation code corresponding to the JavaScript script code against a stored malicious operation code feature library;
if the operation code is matched in the stored malicious operation code feature library, determining that the JavaScript script code is malicious code; and
if the operation code is not matched in the stored malicious operation code feature library, determining that the JavaScript script code is not malicious code.
With reference to the first aspect and the first possible implementation of the first aspect, in a third possible implementation of the first aspect, if the type of the code information is a string variable, the detecting, according to the detection rule corresponding to the type of the code information, whether the JavaScript script code is malicious code includes:
obtaining the length of the string variable corresponding to the JavaScript script code;
if the length of the string variable is in a first interval, obtaining a first feature parameter corresponding to the string variable, and determining, according to a stack overflow detection model and the first feature parameter, whether the JavaScript script code is malicious code; and
if the length of the string variable is in a second interval, obtaining a second feature parameter corresponding to the string variable, and determining, according to a heap injection detection model and the second feature parameter, whether the JavaScript script code is malicious code.
With reference to the third possible implementation of the first aspect, in a fourth possible implementation of the first aspect, the first feature parameter includes a combination of one or more of: the frequency of occurrence of GetPC instructions, the frequency of occurrence of flower instructions, and whether a known shelling code fingerprint is included; the second feature parameter includes a combination of one or more of a string information entropy value and the frequency of occurrence of NOP instructions.
With reference to the first aspect and the first possible implementation of the first aspect, in a fifth possible implementation of the first aspect, if the type of the code information is an operation code and a string, the detecting, according to the detection rule corresponding to the type of the code information, whether the JavaScript script code is malicious code includes:
detecting the JavaScript script code according to the detection rule corresponding to the operation code; and
detecting the JavaScript script code according to the detection rule corresponding to the string;
when the JavaScript script code is determined to be malicious code according to the detection rule corresponding to the operation code, or is determined to be malicious code according to the detection rule corresponding to the string variable, determining that the JavaScript script code is malicious code; and
when the JavaScript script code is determined not to be malicious code according to the detection rule corresponding to the operation code, and is determined not to be malicious code according to the detection rule corresponding to the string variable, determining that the JavaScript script code is not malicious code;
where the detecting, according to the detection rule corresponding to the operation code, whether the JavaScript script code is malicious code includes:
matching the operation code corresponding to the JavaScript script code against a stored malicious operation code feature library; if the operation code is matched in the stored malicious operation code feature library, determining that the JavaScript script code is malicious code; and if the operation code is not matched in the stored malicious operation code feature library, determining that the JavaScript script code is not malicious code;
and the detecting, according to the detection rule corresponding to the string variable, whether the JavaScript script code is malicious code includes:
obtaining the length of the string variable corresponding to the JavaScript script code; if the length of the string variable is in a third interval, obtaining a third feature parameter corresponding to the string variable, and determining, according to a stack overflow detection model and the third feature parameter, whether the JavaScript script code is malicious code; and if the length of the string variable is in a fourth interval, obtaining a fourth feature parameter corresponding to the string variable, and determining, according to a heap injection detection model and the fourth feature parameter, whether the JavaScript script code is malicious code.
With reference to the fifth possible implementation of the first aspect, in a sixth possible implementation of the first aspect, the third feature parameter includes a combination of one or more of: the frequency of occurrence of GetPC instructions, the frequency of occurrence of flower instructions, and whether a known shelling code fingerprint is included; the fourth feature parameter includes a combination of one or more of a string information entropy value and the frequency of occurrence of NOP instructions.
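When both types of code information are available, the fifth possible implementation combines the two rules with a logical OR: the script is malicious if either rule flags it, and not malicious only if both rules clear it. A minimal sketch follows, with the two rules passed in as callables; the function and parameter names are illustrative and not taken from the patent.

```python
from typing import Callable, Iterable, Sequence


def is_malicious(
    opcodes: Sequence[str],
    strings: Iterable[bytes],
    opcode_rule: Callable[[Sequence[str]], bool],
    string_rule: Callable[[bytes], bool],
) -> bool:
    """Malicious if either rule fires; not malicious only if both rules pass."""
    if opcode_rule(opcodes):
        return True
    return any(string_rule(s) for s in strings)
```

Because the verdict is an OR, the opcode rule is evaluated first and the string rule is skipped once a match is found; this ordering is an implementation choice, not something the text prescribes.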
In a second aspect, an apparatus for detecting malicious code is provided, including:
a de-obfuscation module, configured to extract JavaScript script code from a PDF file; start a predetermined script interpreter that supports the PDF standard to run a de-obfuscation process on the JavaScript script code; and obtain, according to the de-obfuscation process, code information corresponding to the JavaScript script code, where the type of the code information includes an operation code and a string variable; and
a detecting module, configured to detect, according to a detection rule corresponding to the type of the code information obtained by the de-obfuscation module, whether the JavaScript script code is malicious code.
In a first possible implementation of the second aspect, the apparatus further includes: an instrumentation injection module, configured to inject a library file, by instrumentation, into the de-obfuscation process that the script interpreter runs on the JavaScript script code, where the library file is used to obtain the code information, corresponding to the de-obfuscated JavaScript script code, that the script interpreter generates in the process of de-obfuscating the JavaScript script code.
With reference to the second aspect and the first possible implementation of the first aspect, in a second possible implementation of the second aspect, the detecting module includes:
a first matching unit, configured to, if the type of the code information is an operation code, match the operation code corresponding to the JavaScript script code against a stored malicious operation code feature library; and
a first determining unit, configured to determine that the JavaScript script code is malicious code when the first matching unit matches the operation code in the stored malicious operation code feature library, and to determine that the JavaScript script code is not malicious code when the first matching unit does not match the operation code in the stored malicious operation code feature library.
With reference to the second possible implementation of the second aspect, in a third possible implementation of the second aspect, the detecting module includes:
a first obtaining unit, configured to, if the type of the code information is a string variable, obtain the length of the string variable corresponding to the JavaScript script code; and
a first judging unit, configured to: when the length of the string variable obtained by the first obtaining unit is in a first interval, obtain a first feature parameter corresponding to the string variable and determine, according to a stack overflow detection model and the first feature parameter, whether the JavaScript script code is malicious code; and when the length of the string variable obtained by the first obtaining unit is in a second interval, obtain a second feature parameter corresponding to the string variable and determine, according to a heap injection detection model and the second feature parameter, whether the JavaScript script code is malicious code.
With reference to the second aspect and the first possible implementation of the first aspect, in a fourth possible implementation of the second aspect, the first feature parameter includes a combination of one or more of: the frequency of occurrence of GetPC instructions, the frequency of occurrence of flower instructions, and whether a known shelling code fingerprint is included; the second feature parameter includes a combination of one or more of a string information entropy value and the frequency of occurrence of NOP instructions.
With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation of the second aspect, the detection module is specifically configured to: if the types of the code information are opcode and string variable, detect the JavaScript script code according to the detection rule corresponding to the opcode, and detect the JavaScript script code according to the detection rule corresponding to the string variable;
and to determine that the JavaScript script code is malicious code when it is determined to be malicious code according to the detection rule corresponding to the opcode, or according to the detection rule corresponding to the string variable;
and to determine that the JavaScript script code is not malicious code when it is determined not to be malicious code according to the detection rule corresponding to the opcode and also according to the detection rule corresponding to the string variable;
the detection module further includes:
a second matching unit, configured to match the opcode corresponding to the JavaScript script code against the stored malicious opcodes;
a second determining unit, configured to determine that the JavaScript script code is malicious code when the second matching unit finds the opcode in the stored malicious opcode signature library, and to determine that the JavaScript script code is not malicious code when the second matching unit does not find the opcode in the stored malicious opcode signature library;
the detection module further includes:
a second obtaining unit, configured to obtain the length of the string variable corresponding to the JavaScript script code; and
a second judging unit, configured to: when the length of the string variable obtained by the second obtaining unit falls within a third interval, obtain a third feature parameter corresponding to the string variable, and determine, according to the stack overflow detection model and the third feature parameter, whether the JavaScript script code is malicious code; and, when the length of the string variable obtained by the second obtaining unit falls within a fourth interval, obtain a fourth feature parameter corresponding to the string variable, and determine, according to the heap spray detection model and the fourth feature parameter, whether the JavaScript script code is malicious code.
With reference to the fifth possible implementation of the second aspect, in a sixth possible implementation of the second aspect, the third feature parameter includes one or a combination of the following: the frequency of GetPC instructions, the frequency of junk instructions, and whether a known unpacking-code fingerprint is present; the fourth feature parameter includes one or a combination of the following: the information entropy of the string and the frequency of NOP instructions.
In a third aspect, a detection device is provided, including a memory and a processor, wherein:
the memory is configured to store code; and
the processor is configured to read the code stored in the memory and perform the method provided by the first aspect or by any one of the six possible implementations of the first aspect.
In the method and apparatus for detecting malicious code provided by the embodiments of the present invention, a predetermined script interpreter that supports the PDF standard is started to run a de-obfuscation process on the JavaScript script code, code information corresponding to the JavaScript script code is obtained from that process, and whether the JavaScript script code is malicious code is detected according to the detection rule corresponding to the type of the code information. Compared with the prior art, malicious JavaScript code carried in a PDF file can be detected more accurately.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for detecting malicious code according to an embodiment of the present invention;
FIG. 2 is a flowchart of another method for detecting malicious code according to an embodiment of the present invention;
FIG. 3 is a flowchart of another method for detecting malicious code according to an embodiment of the present invention;
FIG. 4 is a flowchart of another method for detecting malicious code according to an embodiment of the present invention;
FIG. 5 is a block diagram of an apparatus for detecting malicious code according to an embodiment of the present invention;
FIG. 6 is a block diagram of another apparatus for detecting malicious code according to an embodiment of the present invention;
FIG. 7 is a block diagram of a detection module according to an embodiment of the present invention;
FIG. 8 is a block diagram of another detection module according to an embodiment of the present invention;
FIG. 9 is a block diagram of another detection module according to an embodiment of the present invention;
FIG. 10 is a block diagram of a detection device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
An embodiment of the present invention provides a method for detecting malicious code. The method may be performed by a detection device and, as shown in FIG. 1, includes:
101. Extract the JavaScript script code from a PDF file.
The JavaScript script is embedded in the PDF file and can enhance how the PDF file is displayed, but it is also abused by attackers and malicious code to exploit vulnerabilities in PDF reader software and compromise the host on which the reader runs. The PDF file to be detected may come from an e-mail attachment, web page content, and so on; its origin is not limited here.
Extracting the JavaScript script code from the PDF file may specifically include: parsing, according to the international standard format specification for PDF files, the positions of the JavaScript stream elements in the PDF file, and decoding each stream according to the compression encoding it uses, thereby obtaining the JavaScript code contained in the PDF file.
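The extraction step can be illustrated with a minimal Python sketch. It is not the extractor described in this embodiment: it assumes every stream is either uncompressed or encoded with FlateDecode (zlib), and it scans for `stream`/`endstream` markers instead of walking the PDF object tree to the JavaScript entries. The function name `extract_stream_payloads` is hypothetical.

```python
import re
import zlib
from typing import List

# Everything between "stream<EOL>" and "endstream", including the trailing EOL.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def decode_stream(raw: bytes) -> bytes:
    """Inflate a FlateDecode stream; fall back to the raw bytes otherwise."""
    try:
        # decompressobj stops at the end of the zlib data and ignores the
        # end-of-line bytes that precede "endstream".
        return zlib.decompressobj().decompress(raw)
    except zlib.error:
        # Assumption: anything that is not valid zlib data is stored uncompressed.
        return raw.rstrip(b"\r\n")


def extract_stream_payloads(pdf_bytes: bytes) -> List[str]:
    """Return the decoded payload of every stream object found in the file."""
    return [
        decode_stream(match.group(1)).decode("latin-1")
        for match in STREAM_RE.finditer(pdf_bytes)
    ]
```

A production extractor would additionally resolve object references, handle the other stream filters defined by the PDF specification, and keep only the streams that are referenced as JavaScript actions.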
102. Start a predetermined script interpreter that supports the PDF standard to run a de-obfuscation process on the JavaScript script code, and obtain, from the de-obfuscation process, the code information corresponding to the JavaScript script code, where the types of the code information include opcode and string variable.
The predetermined script interpreter that supports the PDF standard (that is, it supports the PDF format specification and can parse PDF files generated according to it) may be the script interpreter embedded in a PDF reader. The PDF reader may be any PDF reader that has an embedded JavaScript interpretation engine. For example, this embodiment uses Acrobat Reader, the PDF reading application provided by Adobe; the script interpreter embedded in Acrobat Reader can de-obfuscate the vast majority of current JavaScript script code. The script interpreter that supports the PDF standard may be configured in advance, before detection, by the administrator of the detection device.
The code information is the information that the script interpreter used in this embodiment submits to the JavaScript virtual machine for execution after interpreting and translating the script code. The JavaScript virtual machine is an abstract computer, simulated in software, that runs all JavaScript code. In this embodiment, the information submitted to the JavaScript virtual machine includes the information that the script interpreter outputs at different stages of interpreting and translating the script code, and includes at least the following two types: opcodes and string variables.
If the types of the code information include opcode, the opcode may be a command code used by the machine. A typical opcode fragment is as follows:
[ 207] resolve_global r3, Array(@id10)
[ 212] get_by_id r1, r3, prototype(@id11)
[ 220] method_check
[ 221] get_by_id r0, r1, push(@id12)
[ 229] mov r2, Int32: 0(@k8)
[ 232] call r0, 2, 9
It should be noted that, for clarity, ordinary English characters are used in this embodiment to illustrate the opcodes; in practice, opcodes may be represented in binary.
If the types of the code information include string variable, the string variable may be a string variable defined in the JavaScript script. Typical forms are as follows:
var str = "some value ...";
thisVar.replace("Monday", "Friday");
It should be noted that, in malicious code, the value of a string variable may itself be compiled instructions. For example, the value Monday passed to thisVar.replace may be a sequence of instructions encoded in Unicode.
Obtaining the code information corresponding to the JavaScript script code from the de-obfuscation process may be done by monitoring the de-obfuscation process through instrumentation injection; for the injection procedure, see FIG. 4 and the corresponding description below. The opcode and the string variable are intermediate parameters that appear at different stages of the de-obfuscation process, so both types of parameter can be obtained by monitoring the individual steps of that process.
103. Detect, according to the detection rule corresponding to the type of the code information, whether the JavaScript script code is malicious code.
It should be noted that the detection method provided by this embodiment differs with the type of the code information. For the three cases in which the type of the code information is opcode, string variable, or both opcode and string variable, step 103 may be implemented by the following three detection methods respectively:
The first method, as shown in FIG. 2, includes:
a1031. If the type of the code information is opcode, match the opcode corresponding to the JavaScript script code against the stored malicious opcode signature library. If the opcode is found in the stored malicious opcode signature library, perform step a1032; otherwise, perform step a1033.
a1032. Determine that the JavaScript script code is malicious code.
a1033. Determine that the JavaScript script code is not malicious code.
The malicious opcode signature library is a pattern library made up of the opcode sequences that JavaScript script code in PDF files executes when exploiting vulnerabilities. A typical signature pattern for the CVE-2009-0927 vulnerability is as follows:
getmethod "getIcon"
getgvar "var_1"
call
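Matching an observed opcode trace against such a library can be sketched in Python as follows. The source does not specify the matching semantics, so this sketch assumes a contiguous match on mnemonics, with an optional operand check; the `(mnemonic, operand)` representation, the wildcard convention, and the library contents are all hypothetical, with the single entry modeled on the CVE-2009-0927 pattern above.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# (mnemonic, operand); in a pattern, operand None means "any operand".
Op = Tuple[str, Optional[str]]

# Hypothetical signature library; the variable name is treated as a wildcard.
MALICIOUS_PATTERNS: Dict[str, List[Op]] = {
    "CVE-2009-0927": [("getmethod", "getIcon"), ("getgvar", None), ("call", None)],
}


def matches_at(trace: Sequence[Op], pattern: Sequence[Op], start: int) -> bool:
    """Check whether the pattern occurs in the trace beginning at index start."""
    for offset, (mnemonic, operand) in enumerate(pattern):
        seen_mnemonic, seen_operand = trace[start + offset]
        if seen_mnemonic != mnemonic:
            return False
        if operand is not None and seen_operand != operand:
            return False
    return True


def find_malicious_pattern(trace: Sequence[Op]) -> Optional[str]:
    """Return the name of the first matching signature, or None if none match."""
    for name, pattern in MALICIOUS_PATTERNS.items():
        for start in range(len(trace) - len(pattern) + 1):
            if matches_at(trace, pattern, start):
                return name
    return None
```

A non-None result corresponds to step a1032 (malicious), and None corresponds to step a1033 (not malicious).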
The second method, as shown in FIG. 3, includes:
b1031. If the type of the code information is string variable, obtain the length of the string variable corresponding to the JavaScript script code. If the length of the string variable falls within the first interval, perform step b1032; if it falls within the second interval, perform step b1034.
b1032. Obtain the first feature parameter corresponding to the string variable.
b1033. Determine, according to the stack overflow detection model and the first feature parameter, whether the JavaScript script code is malicious code.
b1034. Obtain the second feature parameter corresponding to the string variable.
b1035. Determine, according to the heap spray detection model and the second feature parameter, whether the JavaScript script code is malicious code.
The third method: if the types of the code information are opcode and string variable, detect whether the JavaScript script code is malicious code according to the detection rule corresponding to the opcode, and also detect whether the JavaScript script code is malicious code according to the detection rule corresponding to the string variable. If the JavaScript script code is determined to be malicious code according to the detection rule corresponding to the opcode, or according to the detection rule corresponding to the string variable, determine that the JavaScript script code is malicious code. If the JavaScript script code is determined not to be malicious code according to the detection rule corresponding to the opcode and also according to the detection rule corresponding to the string variable, determine that the JavaScript script code is not malicious code.
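The way the third method combines the two rules amounts to a logical OR of the individual verdicts. A minimal Python sketch, in which the two rule callables and the function name are hypothetical stand-ins for the opcode rule and the string-variable rule:

```python
from typing import Callable, Sequence


def is_malicious(
    opcodes: Sequence[str],
    strings: Sequence[str],
    opcode_rule: Callable[[Sequence[str]], bool],
    string_rule: Callable[[str], bool],
) -> bool:
    """Malicious if either rule fires; benign only if both rules pass."""
    opcode_verdict = bool(opcode_rule(opcodes))
    string_verdict = any(string_rule(value) for value in strings)
    return opcode_verdict or string_verdict
```

Applying the string rule to every captured string variable is an assumption of this sketch; the source only states that the string-variable rule is applied.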
In the first method, the stored malicious opcode signature library consists of signatures of the opcodes of JavaScript script code that has been confirmed, in the technical field of the present invention, to behave maliciously. These signatures may come from, for example, malicious opcode signatures published by authoritative organizations. In this embodiment, the stored malicious opcode signature library is not fixed and may be updated periodically as needed.
In the second method, the first interval and the second interval are set for the two malicious code attack techniques, stack overflow and heap spray, respectively. They may be set with reference to empirical values; in general, the first interval may be set to 32-64K bytes, and the second interval may be set to more than 64K bytes.
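The length-based dispatch of step b1031 can be sketched in Python as follows. The bounds are assumptions: the text's "32-64K bytes" is read here as 32 bytes to 64 KiB inclusive, and the lower bound should be changed to 32 KiB if that reading was intended. The function name and the returned labels are hypothetical.

```python
from typing import Optional

KIB = 1024


def select_model(
    length_in_bytes: int,
    first_low: int = 32,          # assumed lower bound of the first interval
    first_high: int = 64 * KIB,   # upper bound of the first interval
) -> Optional[str]:
    """Pick the detection model for a string variable from its length."""
    if first_low <= length_in_bytes <= first_high:
        return "stack_overflow"   # first interval: steps b1032 and b1033
    if length_in_bytes > first_high:
        return "heap_spray"       # second interval: steps b1034 and b1035
    return None                   # outside both intervals: no model applies
```

Strings shorter than the first interval are simply not analyzed by either model in this sketch.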
In the second method, the first feature parameter may include at least one or a combination of the following: the frequency of GetPC instructions, the frequency of junk instructions, and whether a known unpacking-code fingerprint is present. The second feature parameter may include at least one or a combination of the following: the information entropy of the string and the frequency of NOP instructions. A GetPC instruction is an instruction that shellcode uses to locate its own virtual address. A junk instruction is code intended to prevent a disassembly engine from disassembling correctly. The frequency of GetPC instructions and junk instructions in a string can serve as partial evidence that the string contains shellcode. Packed shellcode always unpacks itself when it executes; the characteristic features of this unpacking code are the unpacking-code fingerprint, and its presence can serve as partial evidence of shellcode. The information entropy of a string measures how much information the string carries; if it is below a certain threshold, a heap spray may be present. The NOP instruction is the CPU no-operation instruction; when a string contains a large number of NOP instructions, that run of NOP instructions may be the leading code (sled) of heap-spray shellcode.
The first feature parameter may be obtained as follows: GetPC instruction matching identifies the frequency of GetPC-type instructions in the string variable, junk instruction matching identifies the frequency of junk instructions in the string variable, and unpacking-code fingerprint matching identifies whether the string variable contains a known unpacking-code fingerprint. The second feature parameter may be obtained as follows: the information entropy of the string variable is calculated with the general information entropy formula, its degree of anomaly is determined by how far it deviates from the statistical average entropy, and NOP instruction matching identifies the frequency of NOP instructions in the string.
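The two components of the second feature parameter can be illustrated with a short Python sketch. It assumes the "general information entropy formula" is Shannon entropy over byte values, $H = -\sum_i p_i \log_2 p_i$, and that a NOP is the single x86 byte 0x90; neither detail is stated in the source, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def nop_frequency(data: bytes, nop: int = 0x90) -> float:
    """Fraction of bytes equal to the NOP opcode (assumed x86 0x90)."""
    if not data:
        return 0.0
    return data.count(bytes([nop])) / len(data)
```

A string made up almost entirely of one repeated byte has entropy close to zero, which is the low-entropy signal the text associates with heap spray, while varied content moves the value toward 8 bits per byte.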
In the second method, both the stack overflow detection model and the heap spray detection model are trained in advance. The stack overflow detection model may take the frequency of GetPC instructions, the frequency of junk instructions, and whether a known unpacking-code fingerprint is present as its feature vector, and is trained on a standard data set to obtain its thresholds, for example, the minimum GetPC instruction frequency, the minimum junk instruction frequency, and the presence of a known unpacking-code fingerprint. The heap spray detection model may take the information entropy and the frequency of NOP instructions as its feature vector, and is trained on a standard data set to obtain its thresholds, for example, the minimum information entropy and the minimum NOP instruction frequency.
During actual detection, when one or more parameters in the first feature parameter exceed the corresponding thresholds of the stack overflow detection model, the JavaScript script code is determined to be malicious code; otherwise it is determined not to be malicious code. When one or more parameters in the second feature parameter exceed the corresponding thresholds of the heap spray detection model, the JavaScript script code is determined to be malicious code; otherwise it is determined not to be malicious code.
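The threshold comparison can be sketched in Python as follows. The class and field names are hypothetical, the threshold values would come from training on a labeled data set and are not given in the source, and the entropy check uses "below the threshold" because the text states that low entropy suggests heap spray.

```python
from dataclasses import dataclass


@dataclass
class StackOverflowModel:
    min_getpc_freq: float   # learned threshold (hypothetical field)
    min_junk_freq: float    # learned threshold (hypothetical field)

    def is_malicious(self, getpc_freq: float, junk_freq: float,
                     has_unpacker_fingerprint: bool) -> bool:
        # Any single feature crossing its threshold is enough.
        return (getpc_freq > self.min_getpc_freq
                or junk_freq > self.min_junk_freq
                or has_unpacker_fingerprint)


@dataclass
class HeapSprayModel:
    max_entropy: float      # entropy below this value is suspicious
    min_nop_freq: float     # NOP frequency above this value is suspicious

    def is_malicious(self, entropy: float, nop_freq: float) -> bool:
        return entropy < self.max_entropy or nop_freq > self.min_nop_freq
```

For example, `HeapSprayModel(max_entropy=1.0, min_nop_freq=0.5)` would flag a string with entropy 0.2 regardless of its NOP frequency; these numbers are illustrative only.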
In the third method, detecting the JavaScript script code according to the detection rule corresponding to the opcode is specifically implemented as follows: match the opcode corresponding to the JavaScript script code against the stored malicious opcode signature library; if the opcode is found in the stored malicious opcode signature library, determine that the JavaScript script code is malicious code; if it is not found, determine that the JavaScript script code is not malicious code.
Detecting whether the JavaScript script code is malicious code according to the detection rule corresponding to the string variable is specifically implemented as follows: obtain the length of the string variable corresponding to the JavaScript script code; if the length falls within the third interval, obtain the third feature parameter corresponding to the string variable and determine, according to the stack overflow detection model and the third feature parameter, whether the JavaScript script code is malicious code; if the length falls within the fourth interval, obtain the fourth feature parameter corresponding to the string variable and determine, according to the heap spray detection model and the fourth feature parameter, whether the JavaScript script code is malicious code.
The third feature parameter may include at least one or a combination of the following: the frequency of GetPC instructions, the frequency of junk instructions, and whether a known unpacking-code fingerprint is present. The fourth feature parameter may include at least one or a combination of the following: the information entropy of the string and the frequency of NOP instructions.
It should be noted that detection according to the detection rule corresponding to the opcode may directly use the first method, specifically steps a1031 to a1033, and detection according to the detection rule corresponding to the string variable may directly use the second method, specifically steps b1031 to b1035. Accordingly, the third interval in the third method may use the setting of the first interval in the second method, and the fourth interval may use the setting of the second interval in the second method. The stack overflow detection model and the heap spray detection model may likewise be the models used in the second method.
In this embodiment, before a PDF file is detected, instrumentation injection must also be performed on the predetermined script interpreter that supports the PDF standard, so that the code information corresponding to the JavaScript script code in the PDF file can be obtained. This embodiment uses the script interpreter embedded in a PDF reader as an example. As shown in FIG. 4, the procedure is as follows:
201. Start the application process of the predetermined PDF reader.
202. Inject a library file into the de-obfuscation process of the script interpreter embedded in the predetermined PDF reader.
The library file is a pre-written file in DLL format that is used to obtain the code information corresponding to the de-obfuscated JavaScript script code produced while the predetermined PDF script interpreter de-obfuscates the JavaScript script code. Injecting the library file into the process means adding the execution of a DLL file with a specific function to a currently running process without affecting the normal operation of that process.
It should be noted that the injection point must be selected according to the API provided by the predetermined PDF reader itself. For example, to obtain the opcodes corresponding to the JavaScript script code, the injection must target the API in the predetermined PDF reader that outputs opcodes.
203. Initialize and run the injected library file.
Steps 201 to 203 are prerequisites for step 102. However, they need to be performed only once, when the application process of the predetermined PDF reader is started, and do not need to be repeated during subsequent detection of PDF files.
Furthermore, the stack overflow detection model and the heap spray detection model need to be established before the application process of the predetermined PDF reader is started, and can then be used throughout the subsequent detection of PDF files.
另外, 值得说明的是, 在执行步骤 103之后, 若确定所述 JavaScript 脚本代码为恶意代码,还可以获取 JavaScript脚本代码对应的明文代码, 并将恶意代码的检测 告和明文代码进行关联, 例如, JavaScript 脚本 代码对应的操作码为恶意操作码则在明文代码中与所述恶意操作码对应 的位置进行标示, 用以方便技术人员进行研究和整合。 其中, 明文代码的 获取方法与确定所述 JavaScript脚本代码对应的代码信息的实现方法相 同, In addition, it is worth noting that after executing step 103, if it is determined that the JavaScript script code is malicious code, the plaintext code corresponding to the JavaScript script code may also be obtained. And the detection of the malicious code is associated with the plaintext code. For example, the operation code corresponding to the JavaScript script code is a malicious operation code, and the position corresponding to the malicious operation code in the plaintext code is marked for convenient technical research. And integration. The method for obtaining the plaintext code is the same as the method for determining the code information corresponding to the JavaScript script code.
In this embodiment, the code information corresponding to the JavaScript script code is obtained by monitoring the deobfuscation process that a predetermined script interpreter supporting the PDF standard (for example, the script interpreter embedded in a PDF reader) runs on the JavaScript script code, and whether the JavaScript script code is malicious code is detected according to the detection rules corresponding to the different types of code information. Compared with the prior art, which cannot effectively identify malicious JavaScript script code propagated through PDF files, this approach can accurately detect malicious JavaScript code in a PDF file, which improves the security of network resources.
An embodiment of the present invention further provides an apparatus for detecting malicious code, which can implement the method steps shown in FIG. 1 to FIG. 4.
As shown in FIG. 5, the apparatus includes:
A deobfuscation module 31, configured to extract JavaScript script code from a PDF file; start a predetermined script interpreter supporting the PDF standard (for example, the script interpreter embedded in a PDF reader) to run a deobfuscation process on the JavaScript script code; and obtain, according to the deobfuscation process, code information corresponding to the JavaScript script code, where the type of the code information includes an opcode and/or a string variable.
A detection module 32, configured to detect whether the JavaScript script code is malicious code according to the detection rule corresponding to the type of the code information obtained by the deobfuscation module 31.
Optionally, as shown in FIG. 6, the apparatus further includes:
An instrumentation injection module 33, configured to inject a library file, by instrumentation, into the deobfuscation process that the script interpreter runs on the JavaScript script code, where the library file is used to obtain the code information corresponding to the deobfuscated JavaScript script code that the script interpreter produces while deobfuscating the JavaScript script code.
Optionally, as shown in FIG. 7, the detection module 32 includes:
A first matching unit 321, configured to, if the type of the code information is an opcode, match the opcode corresponding to the JavaScript script code against a stored malicious opcode signature library.
A first determining unit 322, configured to determine that the JavaScript script code is malicious code when the first matching unit 321 finds a match for the opcode in the stored malicious opcode signature library, and to determine that the JavaScript script code is not malicious code when the first matching unit 321 finds no match for the opcode in the stored malicious opcode signature library.
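The matching and determining units can be illustrated with a minimal Python sketch. The specification does not define how the signature library is laid out or how a match is computed; the sketch assumes, purely for illustration, that each signature is a short sequence of opcode names and that a match means the signature appears as a contiguous run in the opcode trace. The names `contains_run` and `is_malicious_by_opcode` and the opcode strings are hypothetical.

```python
from typing import Iterable, Sequence


def contains_run(trace: Sequence[str], signature: Sequence[str]) -> bool:
    """Return True if `signature` occurs as a contiguous run inside `trace`."""
    n, m = len(trace), len(signature)
    if m == 0 or m > n:
        return False
    wanted = tuple(signature)
    return any(tuple(trace[i:i + m]) == wanted for i in range(n - m + 1))


def is_malicious_by_opcode(trace: Sequence[str],
                           signature_library: Iterable[Sequence[str]]) -> bool:
    """Assumed rule: the script is malicious if any stored signature matches."""
    return any(contains_run(trace, sig) for sig in signature_library)


if __name__ == "__main__":
    # Hypothetical opcode names; a real interpreter defines its own opcode set.
    library = [("getprop", "call", "setelem")]
    trace = ["push", "getprop", "call", "setelem", "pop"]
    print(is_malicious_by_opcode(trace, library))
```

A production signature library would more likely use an indexed or automaton-based matcher; the linear scan here only makes the match/no-match decision explicit.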
Optionally, as shown in FIG. 8, the detection module 32 includes:
A first obtaining unit 323, configured to, if the type of the code information is determined to be a string variable, obtain the length of the string variable corresponding to the JavaScript script code.
A first judging unit 324, configured to, when the length of the string variable obtained by the first obtaining unit 323 falls within a first interval, obtain a first feature parameter corresponding to the string variable and judge, according to a stack overflow detection model and the first feature parameter, whether the JavaScript script code is malicious code; and configured to, when the length of the string variable obtained by the first obtaining unit 323 falls within a second interval, obtain a second feature parameter corresponding to the string variable and judge, according to a heap spray detection model and the second feature parameter, whether the JavaScript script code is malicious code.
Optionally, the first feature parameter may include at least one of, or a combination of, the following: the frequency of occurrence of GetPC instructions, the frequency of occurrence of junk instructions, and whether a known unpacking code fingerprint is present. The second feature parameter may include at least one of, or a combination of, the following: the information entropy of the string and the frequency of occurrence of NOP instructions. A GetPC instruction is an instruction that shellcode uses to locate its own virtual address. A junk instruction is code intended to prevent a disassembly engine from disassembling correctly. The frequency with which GetPC instructions and junk instructions occur in a string can serve as partial evidence of whether the string contains shellcode. As for the unpacking code fingerprint: packed shellcode always unpacks itself when it executes, and the characteristic features of this unpacking code constitute the unpacking code fingerprint, whose presence can serve as partial evidence that shellcode exists. String information entropy is a measure of the amount of information in a string; if the information entropy of a string is below a certain threshold, a heap spray may be present. A NOP instruction is a CPU no-operation instruction; when a string contains a large number of NOP instructions, that run of NOP instructions may be the leading code (Slidge, that is, a NOP sled) of heap spray shellcode.
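The two second-interval features can be sketched as follows. This is an illustration under stated assumptions, not the patented models: it assumes the string variable has already been decoded to raw bytes, uses Shannon entropy over byte values as the "string information entropy", and counts the single-byte x86 NOP (0x90) for the NOP frequency. The function names are hypothetical, and the specification gives no threshold values, so none are hard-coded.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def nop_frequency(data: bytes, nop: int = 0x90) -> float:
    """Fraction of bytes equal to the assumed single-byte NOP encoding."""
    if not data:
        return 0.0
    return data.count(nop) / len(data)


if __name__ == "__main__":
    sprayed = b"\x90" * 4096 + b"\xcc" * 32   # long NOP run plus a short payload stand-in
    varied = bytes(range(256)) * 16           # every byte value equally often
    for name, blob in (("sprayed", sprayed), ("varied", varied)):
        print(name, round(shannon_entropy(blob), 3), round(nop_frequency(blob), 3))
```

A heap spray repeats the same block many times, so its byte distribution is highly skewed and the entropy is low, while the NOP frequency is high; the detection model would compare both values against thresholds that it learns or is configured with.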
Optionally, the detection module 32 is specifically configured to, if the type of the code information is an opcode and a string variable, detect whether the JavaScript script code is malicious code according to the detection rule corresponding to the opcode, and detect whether the JavaScript script code is malicious code according to the detection rule corresponding to the string variable.
It is further configured to determine that the JavaScript script code is malicious code when the JavaScript script code is determined to be malicious code according to the detection rule corresponding to the opcode, or is determined to be malicious code according to the detection rule corresponding to the string variable.
It is further configured to determine that the JavaScript script code is not malicious code when the JavaScript script code is determined not to be malicious code according to the detection rule corresponding to the opcode and is also determined not to be malicious code according to the detection rule corresponding to the string variable.
As shown in FIG. 9, the detection module 32 further includes:
A second matching unit 325, configured to match the opcode corresponding to the JavaScript script code against the stored malicious opcode signature library.
A second determining unit 326, configured to determine that the JavaScript script code is malicious code when the second matching unit 325 finds a match for the opcode in the stored malicious opcode signature library, and to determine that the JavaScript script code is not malicious code when the second matching unit 325 finds no match for the opcode in the stored malicious opcode signature library.
As shown in FIG. 9, the detection module may further include:
A second obtaining unit 327, configured to obtain the length of the string variable corresponding to the JavaScript script code.
A second judging unit 328, configured to, when the length of the string variable obtained by the second obtaining unit 327 falls within a third interval, obtain a third feature parameter corresponding to the string variable and judge, according to the stack overflow detection model and the third feature parameter, whether the JavaScript script code is malicious code; and configured to, when the length of the string variable obtained by the second obtaining unit 327 falls within a fourth interval, obtain a fourth feature parameter corresponding to the string variable and judge, according to the heap spray detection model and the fourth feature parameter, whether the JavaScript script code is malicious code.
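The length-based dispatch performed by the obtaining and judging units can be sketched as below. The specification does not state the interval bounds or what happens when the length falls in neither interval, so the sketch makes labeled assumptions: intervals are half-open `(low, high)` pairs supplied by the caller, each model is any callable that returns a boolean verdict, and a length outside both intervals yields `None` (no verdict from the string rule). All names are hypothetical.

```python
from typing import Callable, Optional, Tuple

Interval = Tuple[int, int]          # assumed half-open: low <= length < high
Model = Callable[[bytes], bool]     # stand-in for a detection model


def in_interval(length: int, interval: Interval) -> bool:
    """True if `length` lies in the half-open interval."""
    low, high = interval
    return low <= length < high


def judge_string(value: bytes,
                 stack_interval: Interval,
                 heap_interval: Interval,
                 stack_model: Model,
                 heap_model: Model) -> Optional[bool]:
    """Pick the detection model by string length, then return its verdict."""
    length = len(value)
    if in_interval(length, stack_interval):
        return stack_model(value)   # shorter strings: stack overflow model
    if in_interval(length, heap_interval):
        return heap_model(value)    # longer strings: heap spray model
    return None                     # assumption: no verdict outside both intervals


if __name__ == "__main__":
    # Hypothetical bounds and toy models, for illustration only.
    verdict = judge_string(b"\x90" * 8192, (64, 4096), (4096, 1 << 30),
                           stack_model=lambda v: False,
                           heap_model=lambda v: v.count(0x90) / len(v) > 0.9)
    print(verdict)
```

Separating the interval test from the models keeps the dispatch independent of how either model computes its feature parameters.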
Optionally, the third feature parameter may include at least one of, or a combination of, the following: the frequency of occurrence of GetPC instructions, the frequency of occurrence of junk instructions, and whether a known unpacking code fingerprint is present. The fourth feature parameter may include at least one of, or a combination of, the following: the information entropy of the string and the frequency of occurrence of NOP instructions.
In this embodiment, the apparatus for detecting malicious code obtains the code information corresponding to the JavaScript script code by monitoring the deobfuscation process that a predetermined script interpreter supporting the PDF standard (for example, the script interpreter embedded in a PDF reader) runs on the JavaScript script code, and detects whether the JavaScript script code is malicious code according to the detection rules corresponding to the different types of code information. Compared with the prior art, which cannot effectively identify malicious JavaScript script code propagated through PDF files, the apparatus can accurately detect malicious JavaScript code in a PDF file, which improves the security of network resources.
An embodiment of the present invention further provides a detection device, which can implement the method steps shown in FIG. 1 to FIG. 4.
As shown in FIG. 10, the device includes a processor 41 and a memory 42. The memory 42 may include random access memory (RAM) or the like. The memory 42 is configured to store program code, and the processor 41 is configured to read the program code stored in the memory and thereby perform the steps of the method embodiments. The processor 41 and the memory 42 communicate over a bus. The memory 42 is further configured to store the JavaScript script code in the PDF file and the code information corresponding to the JavaScript script code.
The processor 41 is configured to extract the JavaScript script code from the PDF file stored in the memory 42; start a predetermined script interpreter supporting the PDF standard to run a deobfuscation process on the JavaScript script code; and obtain, according to the deobfuscation process, the code information corresponding to the JavaScript script code, where the type of the code information includes an opcode and a string variable.
The memory 42 is further configured to store the library file.
Optionally, the processor 41 is further configured to inject the library file stored in the memory 42, by instrumentation, into the deobfuscation process that the script interpreter supporting the PDF standard (for example, the script interpreter embedded in a predetermined PDF reader) runs on the JavaScript script code, where the library file is used to obtain the code information corresponding to the deobfuscated JavaScript script code that the script interpreter produces while deobfuscating the JavaScript script code.
Optionally, the processor 41 is configured to, if the type of the code information stored in the memory 42 is an opcode, match the opcode corresponding to the JavaScript script code against the stored malicious opcode signature library; if the opcode is matched in the stored malicious opcode signature library, determine that the JavaScript script code is malicious code; and if the opcode is not matched in the stored malicious opcode signature library, determine that the JavaScript script code is not malicious code.
The memory 42 is configured to store the malicious opcode signature library.
Optionally, the processor 41 is configured to, if the type of the code information stored in the memory 42 is a string variable, obtain the length of the string variable corresponding to the JavaScript script code; if the length of the string variable falls within the first interval, obtain the first feature parameter corresponding to the string variable and judge, according to the stack overflow detection model and the first feature parameter, whether the JavaScript script code is malicious code; and if the length of the string variable falls within the second interval, obtain the second feature parameter corresponding to the string variable and judge, according to the heap spray detection model and the second feature parameter, whether the JavaScript script code is malicious code.
The memory 42 is configured to store the length of the string corresponding to the JavaScript script code, the first feature parameter, the second feature parameter, the first interval, the second interval, the stack overflow detection model, and the heap spray detection model.
The first feature parameter may include at least one of, or a combination of, the following: the frequency of occurrence of GetPC instructions, the frequency of occurrence of junk instructions, and whether a known unpacking code fingerprint is present. The second feature parameter may include at least one of, or a combination of, the following: the information entropy of the string and the frequency of occurrence of NOP instructions.
Optionally, the processor 41 is configured to, if the type of the code information stored in the memory 42 is an opcode and a string variable, detect whether the JavaScript script code is malicious code according to the detection rule corresponding to the opcode, and detect whether the JavaScript script code is malicious code according to the detection rule corresponding to the string variable. When the JavaScript script code is determined to be malicious code according to the detection rule corresponding to the opcode, or is determined to be malicious code according to the detection rule corresponding to the string, the processor determines that the JavaScript script code is malicious code. When the JavaScript script code is determined not to be malicious code according to the detection rule corresponding to the opcode and is also determined not to be malicious code according to the detection rule corresponding to the string variable, the processor determines that the JavaScript script code is not malicious code.
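The way the two verdicts are combined amounts to a logical OR, which the following small Python sketch makes explicit. The function name `combined_verdict` is hypothetical; the inputs are assumed to be the boolean results of the opcode rule and the string variable rule.

```python
def combined_verdict(opcode_malicious: bool, string_malicious: bool) -> bool:
    """Malicious if either rule reports malicious; benign only if both report benign."""
    return opcode_malicious or string_malicious


if __name__ == "__main__":
    # Print the full truth table for the two rule outcomes.
    for op in (False, True):
        for st in (False, True):
            print(op, st, combined_verdict(op, st))
```

Because one positive rule is enough for a malicious verdict, an implementation could skip the second rule whenever the first one already reports malicious code.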
Further, the way in which the processor 41 detects, according to the detection rule corresponding to the opcode, whether the JavaScript script code is malicious code specifically includes:
Matching the opcode corresponding to the JavaScript script code against the stored malicious opcode signature library; if the opcode is matched in the stored malicious opcode signature library, determining that the JavaScript script code is malicious code; and if the opcode is not matched in the stored malicious opcode signature library, determining that the JavaScript script code is not malicious code.
The way in which the processor 41 detects, according to the detection rule corresponding to the string variable, whether the JavaScript script code is malicious code specifically includes:
Obtaining the length of the string variable corresponding to the JavaScript script code; if the length of the string variable falls within the third interval, obtaining the third feature parameter corresponding to the string variable and judging, according to the stack overflow detection model and the third feature parameter, whether the JavaScript script code is malicious code; and if the length of the string variable falls within the fourth interval, obtaining the fourth feature parameter corresponding to the string variable and judging, according to the heap spray detection model and the fourth feature parameter, whether the JavaScript script code is malicious code.
Optionally, the third feature parameter may include at least one of, or a combination of, the following: the frequency of occurrence of GetPC instructions, the frequency of occurrence of junk instructions, and whether a known unpacking code fingerprint is present. The fourth feature parameter may include at least one of, or a combination of, the following: the information entropy of the string and the frequency of occurrence of NOP instructions.
In this embodiment, the detection device obtains the code information corresponding to the JavaScript script code by monitoring the deobfuscation process that the script interpreter embedded in a predetermined PDF reader runs on the JavaScript script code, and detects whether the JavaScript script code is malicious code according to the detection rules corresponding to the different types of code information. Compared with the prior art, which cannot effectively identify malicious JavaScript script code propagated through PDF files, the device can accurately detect malicious JavaScript code in the PDF file under inspection, which improves the security of network resources.
From the description of the foregoing embodiments, a person skilled in the art can clearly understand that the present invention may be implemented by software plus the necessary general-purpose hardware, or of course by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the essence of the technical solution of the present invention, or the part of it that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, hard disk, or optical disc of a computer, and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The foregoing describes merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement that a person skilled in the art can readily conceive of within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method for detecting malicious code, characterized by comprising:
extracting JavaScript script code from a Portable Document Format (PDF) file;
starting a predetermined script interpreter supporting the PDF standard to run a deobfuscation process on the JavaScript script code, and obtaining, according to the deobfuscation process, code information corresponding to the JavaScript script code, wherein the type of the code information includes an opcode and a string variable; and
detecting whether the JavaScript script code is malicious code according to the detection rule corresponding to the type of the code information.
2. The method according to claim 1, characterized in that, before the starting of the predetermined script interpreter supporting the PDF standard to run the deobfuscation process on the JavaScript script code, the method further comprises:
injecting a library file, by instrumentation, into the deobfuscation process that the script interpreter runs on the JavaScript script code, wherein the library file is used to obtain the code information corresponding to the deobfuscated JavaScript script code that the script interpreter produces while deobfuscating the JavaScript script code.
3. The method according to claim 1 or 2, characterized in that, if the type of the code information is an opcode, the detecting whether the JavaScript script code is malicious code according to the detection rule corresponding to the type of the code information comprises:
matching the opcode corresponding to the JavaScript script code against a stored malicious opcode signature library;
if the opcode is matched in the stored malicious opcode signature library, determining that the JavaScript script code is malicious code; and
if the opcode is not matched in the stored malicious opcode signature library, determining that the JavaScript script code is not malicious code.
4. The method according to claim 1 or 2, characterized in that, if the type of the code information is a string variable, the detecting whether the JavaScript script code is malicious code according to the detection rule corresponding to the type of the code information comprises:
obtaining the length of the string variable corresponding to the JavaScript script code;
if the length of the string variable falls within a first interval, obtaining a first feature parameter corresponding to the string variable, and judging, according to a stack overflow detection model and the first feature parameter, whether the JavaScript script code is malicious code; and
if the length of the string variable falls within a second interval, obtaining a second feature parameter corresponding to the string variable, and judging, according to a heap spray detection model and the second feature parameter, whether the JavaScript script code is malicious code.
5. The method according to claim 4, characterized in that the first feature parameter includes one of, or a combination of, the following: the frequency of occurrence of GetPC instructions, the frequency of occurrence of junk instructions, and whether a known unpacking code fingerprint is present; and the second feature parameter includes one of, or a combination of, the following: the information entropy of the string and the frequency of occurrence of NOP instructions.
6. The method according to claim 1 or 2, characterized in that, if the type of the code information is an opcode and a string, the detecting whether the JavaScript script code is malicious code according to the detection rule corresponding to the type of the code information comprises:
detecting the JavaScript script code according to the detection rule corresponding to the opcode; and
detecting the JavaScript script code according to the detection rule corresponding to the string;
when the JavaScript script code is determined to be malicious code according to the detection rule corresponding to the opcode, or is determined to be malicious code according to the detection rule corresponding to the string variable, determining that the JavaScript script code is malicious code; and
when the JavaScript script code is determined not to be malicious code according to the detection rule corresponding to the opcode and is also determined not to be malicious code according to the detection rule corresponding to the string variable, determining that the JavaScript script code is not malicious code;
wherein the detecting the JavaScript script code according to the detection rule corresponding to the opcode comprises:
matching the opcode corresponding to the JavaScript script code against the stored malicious opcode signature library; if the opcode is matched in the stored malicious opcode signature library, determining that the JavaScript script code is malicious code; and if the opcode is not matched in the stored malicious opcode signature library, determining that the JavaScript script code is not malicious code; and
the detecting the JavaScript script code according to the detection rule corresponding to the string variable comprises:
obtaining the length of the string variable corresponding to the JavaScript script code; if the length of the string variable falls within a third interval, obtaining a third feature parameter corresponding to the string variable, and judging, according to the stack overflow detection model and the third feature parameter, whether the JavaScript script code is malicious code; and if the length of the string variable falls within a fourth interval, obtaining a fourth feature parameter corresponding to the string variable, and judging, according to the heap spray detection model and the fourth feature parameter, whether the JavaScript script code is malicious code.
7. The method according to claim 6, characterized in that the third feature parameter includes one of, or a combination of, the following: the frequency of occurrence of GetPC instructions, the frequency of occurrence of junk instructions, and whether a known unpacking code fingerprint is present; and the fourth feature parameter includes one of, or a combination of, the following: the information entropy of the string and the frequency of occurrence of NOP instructions.
8. An apparatus for detecting malicious code, characterized by comprising:
a deobfuscation module, configured to extract JavaScript script code from a PDF file; start a predetermined script interpreter supporting the PDF standard to run a deobfuscation process on the JavaScript script code; and obtain, according to the deobfuscation process, code information corresponding to the JavaScript script code, wherein the type of the code information includes an opcode and a string variable; and
a detection module, configured to detect whether the JavaScript script code is malicious code according to the detection rule corresponding to the type of the code information obtained by the deobfuscation module.
9. The apparatus according to claim 8, further comprising: an instrumentation injection module, configured to inject a library file, by instrumentation, into the deobfuscation process that the script interpreter runs on the JavaScript script code, wherein the library file is used to obtain the code information corresponding to the deobfuscated JavaScript script code that the script interpreter produces while deobfuscating the JavaScript script code.
10. The apparatus according to claim 8 or 9, wherein the detection module comprises:
a first matching unit, configured to, if the type of the code information is an opcode, match the opcode corresponding to the JavaScript script code against a stored malicious opcode feature library; and
a first determining unit, configured to determine that the JavaScript script code is malicious code when the first matching unit matches the opcode in the stored malicious opcode feature library, and to determine that the JavaScript script code is not malicious code when the first matching unit does not match the opcode in the stored malicious opcode feature library.
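The matching and determining units of claim 10 can be sketched as follows. The claim does not specify the format of the malicious opcode feature library or the matching algorithm, so this sketch assumes that the library is a collection of opcode sequences and that a match means a signature occurs as a contiguous run in the observed opcode trace. The opcode names in the usage note and both function names are hypothetical.

```python
from typing import Iterable, Sequence


def contains_sequence(trace: Sequence[str], signature: Sequence[str]) -> bool:
    """Return True if `signature` occurs as a contiguous run inside `trace`."""
    n, m = len(trace), len(signature)
    if m == 0:
        return False  # an empty signature is treated as matching nothing
    return any(
        tuple(trace[i:i + m]) == tuple(signature) for i in range(n - m + 1)
    )


def is_malicious_by_opcodes(
    trace: Sequence[str], feature_library: Iterable[Sequence[str]]
) -> bool:
    """Malicious if any stored signature matches the trace, otherwise not."""
    return any(contains_sequence(trace, sig) for sig in feature_library)
```

For example, with a library of `[("LOAD_STR", "CALL_EVAL")]`, a trace `["PUSH", "LOAD_STR", "CALL_EVAL"]` would be reported as malicious, while `["CALL_EVAL", "LOAD_STR"]` would not.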
11. The apparatus according to claim 8 or 9, wherein the detection module comprises:
a first obtaining unit, configured to, if the type of the code information is a string variable, obtain the length of the string variable corresponding to the JavaScript script code; and
a first judging unit, configured to, when the length of the string variable obtained by the first obtaining unit falls within a first interval, obtain a first characteristic parameter corresponding to the string variable and determine, according to a stack overflow detection model and the first characteristic parameter, whether the JavaScript script code is malicious code; and, when the length of the string variable obtained by the first obtaining unit falls within a second interval, obtain a second characteristic parameter corresponding to the string variable and determine, according to a heap spray detection model and the second characteristic parameter, whether the JavaScript script code is malicious code.
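The length-based dispatch in claim 11 can be sketched as below. The claim gives no bounds for the two intervals and does not define the detection models, so the intervals are passed in as half-open `(low, high)` pairs and each model is an arbitrary callable. Returning `False` for a length outside both intervals is also an assumption, since the claim is silent on that case. The name `classify_string` is hypothetical.

```python
from typing import Callable, Tuple

Interval = Tuple[int, int]  # assumed half-open: low <= length < high
Model = Callable[[bytes], bool]  # returns True when the string looks malicious


def classify_string(
    value: bytes,
    first_interval: Interval,
    second_interval: Interval,
    stack_overflow_model: Model,
    heap_spray_model: Model,
) -> bool:
    """Route a string to one detection model according to its length."""
    length = len(value)
    if first_interval[0] <= length < first_interval[1]:
        return stack_overflow_model(value)
    if second_interval[0] <= length < second_interval[1]:
        return heap_spray_model(value)
    # Assumption: strings outside both intervals are not flagged.
    return False
```

Passing the models as callables keeps the dispatch independent of how each model derives its characteristic parameter from the string.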
12. The apparatus according to claim 11, wherein the first characteristic parameter comprises one or a combination of: the frequency of occurrence of GetPC instructions, the frequency of occurrence of junk instructions, and whether a known unpacking-code fingerprint is present; and the second characteristic parameter comprises one or a combination of: the information entropy of the string, and the frequency of occurrence of NOP instructions.
13. The apparatus according to claim 8 or 9, wherein the detection module is specifically configured to, if the types of the code information are an opcode and a string variable, detect the JavaScript script code according to the detection rule corresponding to the opcode, and detect the JavaScript script code according to the detection rule corresponding to the string variable;
and is configured to determine that the JavaScript script code is malicious code when the JavaScript script code is determined to be malicious code according to the detection rule corresponding to the opcode, or is determined to be malicious code according to the detection rule corresponding to the string variable;
and is configured to determine that the JavaScript script code is not malicious code when the JavaScript script code is determined not to be malicious code according to the detection rule corresponding to the opcode and is also determined not to be malicious code according to the detection rule corresponding to the string variable;
the detection module further comprises:
a second matching unit, configured to match the opcode corresponding to the JavaScript script code against stored malicious opcodes; and
a second determining unit, configured to determine that the JavaScript script code is malicious code when the second matching unit determines that the opcode is matched in the stored malicious opcode feature library, and to determine that the JavaScript script code is not malicious code when the second matching unit determines that the opcode is not matched in the stored malicious opcode feature library;
and the detection module further comprises:
a second obtaining unit, configured to obtain the length of the string variable corresponding to the JavaScript script code; and
a second judging unit, configured to, when the length of the string variable obtained by the second obtaining unit falls within a third interval, obtain a third characteristic parameter corresponding to the string variable and determine, according to a stack overflow detection model and the third characteristic parameter, whether the JavaScript script code is malicious code; and, when the length of the string variable obtained by the second obtaining unit falls within a fourth interval, obtain a fourth characteristic parameter corresponding to the string variable and determine, according to a heap spray detection model and the fourth characteristic parameter, whether the JavaScript script code is malicious code.
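The way claim 13 combines the two rule results is a logical OR: the script is malicious if either rule flags it, and non-malicious only if both rules clear it. A minimal sketch follows, assuming that each rule is supplied as a callable and that the string rule is applied to every captured string variable. The name `detect` and the parameter names are hypothetical.

```python
from typing import Callable, Iterable, Sequence


def detect(
    opcodes: Sequence[str],
    strings: Iterable[bytes],
    opcode_rule: Callable[[Sequence[str]], bool],
    string_rule: Callable[[bytes], bool],
) -> bool:
    """Return True (malicious) if the opcode rule or the string rule fires."""
    opcode_hit = opcode_rule(opcodes)
    string_hit = any(string_rule(s) for s in strings)
    return opcode_hit or string_hit
```

Both rules are evaluated before the results are combined, which mirrors the claim wording that the script is detected under each rule and the verdicts are then merged.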
14. The apparatus according to claim 13, wherein the third characteristic parameter comprises one or a combination of: the frequency of occurrence of GetPC instructions, the frequency of occurrence of junk instructions, and whether a known unpacking-code fingerprint is present; and the fourth characteristic parameter comprises one or a combination of: the information entropy of the string, and the frequency of occurrence of NOP instructions.
15. A detection device, comprising a memory and a processor, wherein: the memory is configured to store code; and
the processor is configured to read the code stored in the memory and perform the method according to any one of claims 1 to 7.
PCT/CN2012/086302 2012-12-10 2012-12-10 Method and apparatus for detecting malicious code WO2014089744A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201280002026.9A CN103221960B (en) 2012-12-10 2012-12-10 The detection method of malicious code and device
PCT/CN2012/086302 WO2014089744A1 (en) 2012-12-10 2012-12-10 Method and apparatus for detecting malicious code

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2012/086302 WO2014089744A1 (en) 2012-12-10 2012-12-10 Method and apparatus for detecting malicious code

Publications (1)

Publication Number Publication Date
WO2014089744A1 true WO2014089744A1 (en) 2014-06-19

Family

ID=48818191

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/086302 WO2014089744A1 (en) 2012-12-10 2012-12-10 Method and apparatus for detecting malicious code

Country Status (2)

Country Link
CN (1) CN103221960B (en)
WO (1) WO2014089744A1 (en)

Families Citing this family (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902905B (en) * 2013-12-17 2017-02-15 哈尔滨安天科技股份有限公司 Malicious code generator identification method and system based on software structure cluster
CN104134019A (en) * 2014-07-25 2014-11-05 北京奇虎科技有限公司 Script virus detection method and device
CN104462986B (en) * 2014-11-28 2019-02-01 北京奇虎科技有限公司 The detection method and device that loophole threatens are triggered in PDF
CN104715195B (en) * 2015-03-12 2017-11-03 广东电网有限责任公司信息中心 Malicious code detection system and method based on dynamic pitching pile
CN106156120B (en) * 2015-04-07 2020-02-28 阿里巴巴集团控股有限公司 Method and device for classifying character strings
US10803165B2 (en) * 2015-06-27 2020-10-13 Mcafee, Llc Detection of shellcode
CN105117332B (en) * 2015-08-19 2018-08-14 电子科技大学 A kind of detection method of stack overflow position
CN105468972B (en) * 2015-11-17 2018-11-30 四川神琥科技有限公司 A kind of mobile terminal document detection method
CN105224873B (en) * 2015-11-17 2018-06-08 四川神琥科技有限公司 A kind of smart machine document authentication method
CN105243327B (en) * 2015-11-17 2018-08-31 四川神琥科技有限公司 A kind of secure file processing method
CN106855925B (en) * 2015-12-09 2020-02-18 中国电信股份有限公司 Stack injection detection method and device
CN106897211A (en) * 2015-12-21 2017-06-27 阿里巴巴集团控股有限公司 For the localization method and system of obscuring script
CN107203707B (en) * 2016-03-16 2020-05-12 阿里巴巴集团控股有限公司 Method and system for implementing program code confusion
CN105868630A (en) * 2016-03-24 2016-08-17 中国科学院信息工程研究所 Malicious PDF document detection method
CN107292168A (en) * 2016-03-30 2017-10-24 阿里巴巴集团控股有限公司 Detect method and device, the server of program code
CN106096405B (en) * 2016-04-26 2019-07-05 浙江工业大学 A kind of Android malicious code detecting method abstract based on Dalvik instruction
CN106022132A (en) * 2016-05-30 2016-10-12 南京邮电大学 Real-time webpage Trojan detection method based on dynamic content analysis
CN108062474B (en) * 2016-11-08 2022-01-11 阿里巴巴集团控股有限公司 File detection method and device
CN108171055A (en) * 2016-12-08 2018-06-15 武汉安天信息技术有限责任公司 A kind of remote control malicious code behavior triggering method and system
CN106650449B (en) * 2016-12-29 2020-05-22 哈尔滨安天科技集团股份有限公司 Script heuristic detection method and system based on variable name confusion degree
CN108664791B (en) * 2017-03-29 2023-05-16 腾讯科技(深圳)有限公司 Method and device for detecting back door of webpage in hypertext preprocessor code
CN107609399A (en) * 2017-09-09 2018-01-19 北京工业大学 Malicious code mutation detection method based on NIN neutral nets
CN108694042B (en) * 2018-06-15 2021-08-31 福州大学 JavaScript code confusion resolution method in webpage
CN109344615B (en) * 2018-07-27 2023-02-17 北京奇虎科技有限公司 Method and device for detecting malicious command
CN109408810A (en) * 2018-09-28 2019-03-01 东巽科技(北京)有限公司 A kind of malice PDF document detection method and device
CN110866252A (en) * 2018-12-21 2020-03-06 北京安天网络安全技术有限公司 Malicious code detection method and device, electronic equipment and storage medium
CN112329012B (en) * 2019-07-19 2023-05-30 中国人民解放军战略支援部队信息工程大学 Detection method for malicious PDF document containing JavaScript and electronic device
CN110569032B (en) * 2019-09-16 2023-03-14 郑州昂视信息科技有限公司 Method and device for judging application label of script language interpreter
CN110806980A (en) * 2019-11-04 2020-02-18 深信服科技股份有限公司 Detection method, device, equipment and storage medium
CN111368303B (en) * 2020-03-12 2023-12-29 深信服科技股份有限公司 PowerShell malicious script detection method and device
CN111881047B (en) * 2020-07-30 2022-09-06 山石网科通信技术股份有限公司 Method and device for processing obfuscated script
CN112231701A (en) * 2020-09-29 2021-01-15 广州威尔森信息科技有限公司 PDF file processing method and device
CN112528282B (en) * 2020-12-14 2022-10-18 山东小葱数字科技有限公司 Method and device for anti-obfuscating code and electronic equipment
CN112613034B (en) * 2020-12-18 2022-12-02 北京中科网威信息技术有限公司 Malicious document detection method and system, electronic device and storage medium
CN114912114A (en) * 2022-05-11 2022-08-16 北京天融信网络安全技术有限公司 Malicious PDF document detection method, device, equipment and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101359352A (en) * 2008-09-25 2009-02-04 中国人民解放军信息工程大学 API use action discovering and malice deciding method after confusion of multi-tier synergism
CN101482907A (en) * 2009-02-18 2009-07-15 中国科学技术大学 Main unit malice code behavior detection system based on expert system
CN102663284A (en) * 2012-03-21 2012-09-12 南京邮电大学 Malicious code identification method based on cloud computing
CN102663296A (en) * 2012-03-31 2012-09-12 杭州安恒信息技术有限公司 Intelligent detection method for Java script malicious code facing to the webpage

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7841010B2 (en) * 2007-01-08 2010-11-23 Apple Inc. Software or other information integrity verification using variable block length and selection
CN102043919B (en) * 2010-12-27 2012-11-21 北京安天电子设备有限公司 Universal vulnerability detection method and system based on script virtual machine
CN102708320B (en) * 2012-05-04 2015-05-06 北京奇虎科技有限公司 Method and device for recognition of virus APK (android package)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3557466A4 (en) * 2016-12-19 2020-07-22 Telefonica Digital, S.L.U. Method and system for detecting malicious programs integrated into an electronic document
US11301565B2 (en) 2016-12-19 2022-04-12 Telefonica Cybersecurity & Cloud Tech S.L.U. Method and system for detecting malicious software integrated in an electronic document

Also Published As

Publication number Publication date
CN103221960A (en) 2013-07-24
CN103221960B (en) 2016-05-25

Similar Documents

Publication Publication Date Title
WO2014089744A1 (en) Method and apparatus for detecting malicious code
US20240121266A1 (en) Malicious script detection
Aslan et al. Investigation of possibilities to detect malware using existing tools
US9135443B2 (en) Identifying malicious threads
EP2955658B1 (en) System and methods for detecting harmful files of different formats
US10140451B2 (en) Detection of malicious scripting language code in a network environment
Lu et al. De-obfuscation and detection of malicious PDF files with high accuracy
Carmony et al. Extract Me If You Can: Abusing PDF Parsers in Malware Detectors.
US10013555B2 (en) System and method for detecting harmful files executable on a virtual stack machine based on parameters of the files and the virtual stack machine
US9654486B2 (en) System and method for generating sets of antivirus records for detection of malware on user devices
CN102622543B (en) A kind of method and apparatus of dynamic detection malicious web pages script
WO2015101097A1 (en) Method and device for feature extraction
KR101874373B1 (en) A method and apparatus for detecting malicious scripts of obfuscated scripts
JP6687761B2 (en) Coupling device, coupling method and coupling program
CN104809391B (en) Buffer overflow attack detection device, method and security protection system
JP2009093615A (en) Method and device for analyzing exploit code in non-executable file using virtual environment
JPWO2019013266A1 (en) Determination device, determination method, and determination program
US10601867B2 (en) Attack content analysis program, attack content analysis method, and attack content analysis apparatus
US20160134652A1 (en) Method for recognizing disguised malicious document
Gupta et al. A client‐server JavaScript code rewriting‐based framework to detect the XSS worms from online social network
US10356108B2 (en) System and method of detecting malicious multimedia files
EP3333746B1 (en) System and method of execution of code by an interpreter
CN113886826A (en) Threat defense method and system based on anti-sandbox characteristics of malicious software
US20140283046A1 (en) Anti-malware scanning of database tables
SCHADE A CUCKOO’S EGG IN THE MALWARE NEST

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12890014

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12890014

Country of ref document: EP

Kind code of ref document: A1