WO2024079803A1

WO2024079803A1 - Vulnerability detection device, vulnerability detection method, and vulnerability detection program

Info

Publication number: WO2024079803A1
Application number: PCT/JP2022/037943
Authority: WO
Inventors: 利宣碓井; 裕平川古谷; 誠岩村
Original assignee: 日本電信電話株式会社
Priority date: 2022-10-11
Filing date: 2022-10-11
Publication date: 2024-04-18

Abstract

A vulnerability detection device (10) comprises: a virtual machine analysis unit (121) that analyzes a VM of a script engine; an instruction set architecture analysis unit (122) that analyzes an instruction set architecture, which is the instruction system of the VM, collects VM instructions, and determines the instruction content of the collected VM instructions; a calculation unit (123) that, on the basis of the architecture information acquired by the virtual machine analysis unit (121) and the instruction set architecture analysis unit (122), constructs a first control flow graph indicating an overall path; and a vulnerability detection unit (124) that alters an input value, constructs a second control flow graph indicating a path executed as a test by inputting the altered input value to an analysis target script, calculates code coverage, and, on the basis of the code coverage calculation result, selects a value to input and performs fuzzing of the analysis target script executed on the VM.

Description

Vulnerability detection device, vulnerability detection method, and vulnerability detection program

The present invention relates to a vulnerability detection device, a vulnerability detection method, and a vulnerability detection program.

There is a technique called dynamic testing that can be used to discover potential defects in software. Dynamic testing includes software testing. Software testing is carried out by actually providing input values to the target program, running it, and observing its behavior.

One metric for judging whether dynamic testing has been performed comprehensively is code coverage (also called the code coverage rate), which measures the percentage of the code in the target program that has been tested.

Some dynamic tests use code coverage as an indicator to evaluate the progress of testing and to plan tests.

One of these is fuzzing, a dynamic testing technique for discovering latent vulnerabilities in software. Fuzzing is a technique that discovers vulnerabilities by repeatedly generating or mutating input values while running the target program and observing the state of the program to search for inputs that cause problems such as crashes.

In this case, what is used as the input value and how the input value is generated or mutated are important factors in determining the efficiency of vulnerability discovery. For example, if there is a path that can only be executed within a specific range of input values and a vulnerability exists there, it will take a long time to discover the vulnerability unless the input values required to follow that path can be found efficiently.

In its most primitive form, fuzzing generates input values randomly and treats the program as a black box, observing only whether a crash or other problem occurs.

However, this method cannot efficiently detect vulnerabilities in the cases mentioned above.

For this reason, a technique called gray-box fuzzing is used. Unlike black-box fuzzing, gray-box fuzzing also observes the internal state of the program during execution, namely the execution path. It calculates the code coverage of the execution paths that have already been tested and prioritizes, as seeds for generation and mutation, the input values that increase this coverage.

Therefore, to implement gray-box fuzzing, it is necessary to observe the execution path and calculate code coverage.

Gray-box fuzzing makes it possible to search efficiently for vulnerabilities in a wider range of programs.

In Non-Patent Document 1, gray-box fuzzing is used to discover vulnerabilities in web applications written as PHP scripts. The method described in Non-Patent Document 1 measures code coverage by constructing an abstract syntax tree from the source code and performing static instrumentation.

Non-Patent Document 2 uses gray-box fuzzing to discover vulnerabilities in web applications written in JavaScript. As in Non-Patent Document 1, the method described in Non-Patent Document 2 constructs an abstract syntax tree from the JavaScript source code and performs static instrumentation to measure code coverage.

However, the techniques described in Non-Patent Documents 1 and 2 cannot enumerate all possible branches, so the code coverage for branches must be estimated from other information. In addition, because both techniques rely on static instrumentation, there are cases in which code coverage cannot be measured, such as dynamically evaluated scripts.

In order to accurately measure the code coverage of a script, dynamic instrumentation is required. To achieve dynamic instrumentation for a script, it is necessary to instrument the bytecode. This generally requires the use of support functions such as a debugger provided by the script engine. This is because the internal specifications of the virtual machine (VM) in the script engine that controls the execution of the script are often not made public, making it difficult to observe the execution path and analyze the bytecode required to measure code coverage without support functions.

However, if such support functions are not provided, it is necessary to reverse engineer the VM to reveal its internal specifications, independently observe the execution path and analyze the bytecode, and obtain the information needed to calculate code coverage.

Manually analyzing, designing, and implementing this for each individual script engine is not realistic given the amount of work involved.

The present invention has been made in consideration of the above, and aims to provide a vulnerability discovery device, a vulnerability discovery method, and a vulnerability discovery program that can realize gray box fuzzing based on accurate code coverage at runtime, without the need for manual individual analysis, design, and implementation, even for script engines that do not have support functions that can be used for code coverage measurement and whose internal specifications are unknown.

To solve the above problems and achieve the object, the vulnerability discovery device of the present invention is characterized by having: a first analysis unit that analyzes the virtual machine of a script engine; a second analysis unit that analyzes the instruction set architecture, which is the instruction system of the virtual machine, to collect virtual machine instructions and determine the instruction contents of the collected virtual machine instructions; a calculation unit that constructs a first control flow graph showing an entire path based on the architecture information acquired by the first analysis unit and the second analysis unit; and a vulnerability discovery unit that mutates an input value based on the architecture information acquired by the first analysis unit and the second analysis unit, constructs a second control flow graph showing a path executed as a test by inputting the mutated input value to a script to be analyzed, calculates code coverage, which is the ratio of the path executed as the test to the entire path, and selects an input value based on the calculation result of the code coverage to perform fuzzing on the script to be analyzed executed on the virtual machine.

According to the present invention, even for script engines that do not have support functions that can be used for code coverage measurement and whose internal specifications are unknown, it is possible to achieve gray-box fuzzing based on accurate code coverage at runtime without the need for individual manual analysis, design, and implementation.

FIG. 1 is a diagram illustrating an example of the configuration of a script engine.
FIG. 2 is a diagram showing pseudocode of a VM included in the script engine.
FIG. 3 is a diagram illustrating an example of the configuration of a vulnerability detecting device according to an embodiment.
FIG. 4 is a diagram showing an example of a test script used to detect a virtual program counter (VPC).
FIG. 5 is a diagram showing an example of a test script used for detecting a branch VM instruction.
FIG. 6 is a diagram illustrating an example of an execution trace.
FIG. 7 is a diagram illustrating an example of a VM execution trace.
FIG. 8 is a diagram illustrating the process of the VM instruction boundary detection unit.
FIG. 9 is a diagram for explaining the process of the virtual program counter detection unit.
FIG. 10 is a diagram illustrating the process of the dispatcher detection unit.
FIG. 11 is a diagram illustrating the process of the code cache detection unit.
FIG. 12 is a diagram illustrating the process of the VM instruction determination unit.
FIG. 13 is a diagram illustrating the process of the VM branch trace construction unit.
FIG. 14 is a diagram for explaining the control flow graph construction unit.
FIG. 15 is a flowchart showing a processing procedure of the vulnerability discovering process according to the embodiment.
FIG. 16 is a flowchart showing a processing procedure of the vulnerability discovering process according to the embodiment.
FIG. 17 is a flowchart of the execution trace acquisition process shown in FIG.
FIG. 18 is a flowchart illustrating a procedure of the VM instruction boundary detection process illustrated in FIG.
FIG. 19 is a flowchart illustrating the processing procedure of the virtual program counter detection processing shown in FIG.
FIG. 20 is a diagram for explaining the dispatcher detection process shown in FIG.
FIG. 21 is a flowchart showing the procedure of the conditional branch flag detection process shown in FIG.
FIG. 22 is a flowchart illustrating the procedure of the code cache detection process shown in FIG.
FIG. 23 is a flowchart illustrating the procedure of the VM execution trace acquisition process illustrated in FIG. 15.
FIG. 24 is a flowchart illustrating the procedure of the VM instruction collection process illustrated in FIG.
FIG. 25 is a flowchart illustrating a processing procedure of the VM instruction determination process shown in FIG.
FIG. 26 is a flowchart showing the procedure of the multi-path execution process shown in FIG.
FIG. 27 is a flowchart illustrating the procedure of the VM branch trace construction process illustrated in FIG. 15.
FIG. 28 is a flowchart of the control flow graph construction process shown in FIG.
FIG. 29 is a flowchart showing the procedure of the mutation process shown in FIG.
FIG. 30 is a flowchart showing the procedure of the execution process shown in FIG.
FIG. 31 is a flowchart illustrating the procedure of the code coverage calculation process shown in FIG.
FIG. 32 is a diagram showing an example of a computer that realizes a vulnerability discovering device by executing a program.

Below, embodiments of the vulnerability discovery device, vulnerability discovery method, and vulnerability discovery program according to the present application are described in detail with reference to the drawings. Note that the present invention is not limited to the embodiments described below.

[Embodiment]
The vulnerability detection device of the embodiment can realize gray-box fuzzing based on accurate code coverage at runtime, even for script engines that do not have support functions usable for code coverage measurement and whose internal specifications are unknown, without requiring individual manual analysis, design, and implementation.

The vulnerability discovery device according to the embodiment executes test scripts while monitoring the binaries of the script engine, and obtains branch traces and memory access traces as execution traces. The vulnerability discovery device analyzes the VM based on the execution traces, and obtains, as architecture information, VM instruction boundaries, a virtual program counter (VPC), a dispatcher, a conditional branch flag, and a code cache in which executed VM instructions are stored.

Then, the vulnerability detection device executes the test script while monitoring the VPC and the dispatcher, and obtains a VM execution trace. By analyzing the VM execution trace, the vulnerability detection device collects VM instructions, determines the contents of the VM instructions, and obtains information on the instruction set architecture.

Then, based on the acquired architecture information, the vulnerability detection device constructs a first control flow graph that shows the entire path exhaustively executed by multi-path execution.

The vulnerability detection device then mutates the input values based on the acquired architecture information, and constructs a second control flow graph that shows the path executed when the mutated input values are input to the script to be analyzed. The vulnerability detection device calculates code coverage, which is the ratio of paths executed as tests to all paths, selects input values based on the code coverage calculation results, and performs fuzzing on the script to be analyzed as it is executed on the VM. In doing so, the vulnerability detection device preferentially selects and mutates the input values that increase code coverage the most.

The configuration and function of a typical script engine will be described with reference to Figures 1 and 2. Figure 1 is a diagram for explaining an example of the configuration of a script engine. As shown in Figure 1, script engine 1 has a bytecode compiler 2 and a VM 3. The bytecode compiler 2 has a syntax analysis unit 4 and a bytecode generation unit 5. The VM 3 has a code cache unit 6, a fetch unit 7, a decode unit 8, and an execution unit 9. The fetch unit 7, decode unit 8, and execution unit 9 run repeatedly, and together they are called the interpreter loop. The script engine 1 accepts a script as input.

The syntax analysis unit 4 receives the script as input, and through lexical and syntactic analysis generates an Abstract Syntax Tree (AST), which it outputs to the bytecode generation unit 5. The bytecode generation unit 5 receives the AST as input, converts it into bytecode, and stores it in the code cache unit 6.

The fetch unit 7 fetches the VM opcode from the code cache unit 6 and outputs it to the decode unit 8. Here, the VM opcode refers to the opcode portion of the VM instruction. The decode unit 8 receives the VM opcode as input, interprets the VM opcode using a decoder/dispatcher, and dispatches it to the corresponding program. The execution unit 9 executes the program corresponding to the VM instruction. The contents written in the script are executed by executing the VM instructions one after another through a repeated interpreter loop.

The functions of the components of the script engine will be described with reference to Figure 2. Figure 2 is a diagram showing pseudocode for a VM in the script engine. As shown in Figure 2, the pseudocode first initializes the VPC (line 1). In the pseudocode, the while loop is the interpreter loop (line 2). In the pseudocode, the VM opcode pointed to by the VPC is obtained from the code cache (line 3), and is decoded and dispatched using a Switch statement (lines 4, 5, and 7). Then, in the pseudocode, the program corresponding to the VM opcode that was dispatched is executed (lines 6 and 8).
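Figure 2 is not reproduced here. As a non-limiting illustration, an interpreter loop of the kind described above can be sketched as follows. The three-instruction set (`PUSH`, `ADD`, `HALT`), the opcode numbers, and the function name `run` are assumptions made for this sketch and do not correspond to any particular script engine or to the line numbers of Figure 2.

```python
# Hypothetical three-instruction VM; opcodes and handlers are illustrative only.
PUSH, ADD, HALT = 0, 1, 2


def run(code_cache):
    """Execute the bytecode held in code_cache and return the final stack."""
    stack = []
    vpc = 0                           # initialize the VPC
    while True:                       # interpreter loop
        opcode = code_cache[vpc]      # fetch the VM opcode pointed to by the VPC
        if opcode == PUSH:            # decode and dispatch
            stack.append(code_cache[vpc + 1])
            vpc += 2                  # opcode plus one operand
        elif opcode == ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
            vpc += 1
        elif opcode == HALT:
            return stack
        else:
            raise ValueError(f"unknown VM opcode {opcode} at VPC {vpc}")


result = run([PUSH, 2, PUSH, 3, ADD, HALT])   # result == [5]
```

In this sketch the VPC is an index into the code cache, and each handler advances it by the length of its own instruction.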

In addition, a branch VM instruction is a VM instruction that causes a branch to occur within a script, and a conditional branch flag is an area that holds a flag indicating whether or not a branch will be taken when a conditional branch occurs.

[Configuration of vulnerability detection device]
Next, the configuration of the vulnerability discovery device 10 according to the embodiment will be described in detail with reference to FIG. 3. FIG. 3 is a diagram illustrating an example of the configuration of the vulnerability discovery device according to the embodiment.

As shown in FIG. 3, the vulnerability discovery device 10 has an input unit 11, a control unit 12, a storage unit 13, and an output unit 14. The vulnerability discovery device 10 accepts inputs of a test script, a script engine binary, and a script to be analyzed.

The input unit 11 is composed of input devices such as a keyboard and a mouse, and accepts information input from the outside and inputs it to the control unit 12. The input unit 11 also has a communication interface for sending and receiving various information to and from other devices connected via a wired connection or a network, etc., and accepts input of information sent from other devices. The input unit 11 accepts input of test scripts, script engine binaries, and scripts to be analyzed, and outputs them to the control unit 12.

The test script is a script that is input when dynamically analyzing the script engine to obtain an execution trace and a VM execution trace. Details of the test script are described later. The script engine binary is an executable file that constitutes the script engine. The script engine binary may be composed of multiple executable files. The analysis target script is the script to be analyzed.

The control unit 12 has an internal memory for storing programs that define various processing procedures and the necessary data, and executes various processes using these. For example, the control unit 12 is an electronic circuit such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit). The control unit 12 has a virtual machine analysis unit 121 (first analysis unit), an instruction set architecture analysis unit 122 (second analysis unit), a calculation unit 123, and a vulnerability discovery unit 124.

The virtual machine analysis unit 121 analyzes the VM of the script engine. The virtual machine analysis unit 121 obtains multiple execution traces by changing the conditions at run time, analyzes the multiple execution traces using differential execution analysis, and obtains VPCs and conditional branch flags. The virtual machine analysis unit 121 also analyzes the script engine binary to obtain VM instruction boundaries and dispatchers. The virtual machine analysis unit 121 detects a code cache from the VM execution trace. The VM instructions to be executed are stored in the code cache.

The virtual machine analysis unit 121 has an execution trace acquisition unit 1211 (first acquisition unit), a VM instruction boundary detection unit 1212 (first detection unit), a virtual program counter detection unit 1213 (second detection unit), a dispatcher detection unit 1214 (third detection unit), a conditional branch flag detection unit 1215 (fourth detection unit), and a code cache detection unit 1216.

The execution trace acquisition unit 1211 accepts the test script and the script engine binary as input. The execution trace acquisition unit 1211 acquires an execution trace by executing the test script while monitoring the execution of the script engine binary.

An execution trace consists of a branch trace and a memory access trace. A branch trace records the type of branch instruction at the time of execution, the branch source address, and the branch destination address. A memory access trace records the type of memory operation and the memory address of the operation target. It is known that branch traces and memory access traces can be acquired by instruction hooks. The execution trace acquired by the execution trace acquisition unit 1211 is stored in the execution trace DB 131.
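For example, the two kinds of records that make up an execution trace could be represented as follows. This is only a sketch of one possible data layout; the class names, field names, and the helper `read_counts` are hypothetical and are not taken from the embodiment.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class BranchRecord:
    kind: str          # type of branch instruction, e.g. "jmp", "call", "ret"
    source: int        # branch source address
    destination: int   # branch destination address


@dataclass(frozen=True)
class MemoryAccessRecord:
    kind: str          # type of memory operation: "read" or "write"
    address: int       # memory address of the operation target


@dataclass
class ExecutionTrace:
    branches: List[BranchRecord] = field(default_factory=list)
    memory_accesses: List[MemoryAccessRecord] = field(default_factory=list)

    def read_counts(self) -> Dict[int, int]:
        """Count memory reads per address (used later by differential analysis)."""
        counts: Dict[int, int] = {}
        for access in self.memory_accesses:
            if access.kind == "read":
                counts[access.address] = counts.get(access.address, 0) + 1
        return counts


trace = ExecutionTrace()
trace.branches.append(BranchRecord("jmp", 0x401000, 0x401080))
trace.memory_accesses.append(MemoryAccessRecord("read", 0x7FFE0010))
trace.memory_accesses.append(MemoryAccessRecord("read", 0x7FFE0010))
trace.memory_accesses.append(MemoryAccessRecord("write", 0x7FFE0018))
```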

The VM instruction boundary detection unit 1212 clusters the execution traces to detect the boundaries of each VM instruction. It detects, as VM instructions, the clusters whose execution count is equal to or greater than a threshold. The clustering detects contiguous code regions that are executed multiple times. For example, executed instructions that lie close to each other in the code may be grouped together, common subsequences of executed code blocks may be searched for, or other methods may be used. The vulnerability discovery device 10 detects, as boundaries, the start and end points of the contiguous instruction sequence that makes up each detected VM instruction. The VM instruction boundaries detected here are used in VPC detection and dispatcher detection.
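As one possible illustration of the first grouping method mentioned above (grouping executed instructions that are close to each other in the code), the following sketch keeps only frequently executed addresses and merges neighbors into clusters. The function name and the parameters `max_gap` and `min_executions` are assumptions of this sketch; the embodiment does not specify concrete values.

```python
from collections import Counter


def detect_vm_instruction_boundaries(executed_addresses, max_gap=16, min_executions=3):
    """Return (start, end) address pairs of code regions that look like VM instructions."""
    counts = Counter(executed_addresses)
    # Keep only addresses executed at least min_executions times.
    hot = sorted(addr for addr, n in counts.items() if n >= min_executions)
    clusters = []
    for addr in hot:
        if clusters and addr - clusters[-1][1] <= max_gap:
            clusters[-1][1] = addr           # extend the current cluster
        else:
            clusters.append([addr, addr])    # start a new cluster
    return [(start, end) for start, end in clusters]


# Two regions executed repeatedly, plus one address executed only once.
executed = [0x100, 0x104, 0x108] * 5 + [0x400, 0x404] * 4 + [0x900]
boundaries = detect_vm_instruction_boundaries(executed)
```

With this input the two repeatedly executed regions are reported as `(0x100, 0x108)` and `(0x400, 0x404)`, and the address executed only once is discarded.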

The virtual program counter detection unit 1213 retrieves and analyzes the execution traces for the first test script stored in the execution trace DB 131 to detect the VPC. It analyzes multiple execution traces by differential execution analysis, focusing on the number of memory reads and on the boundaries of each VM instruction detected by the VM instruction boundary detection unit 1212. The virtual program counter detection unit 1213 exploits the fact that a read of the memory holding the VPC always occurs after the execution of each VM instruction, and detects the VPC by finding the target of this read.

For this reason, the virtual program counter detection unit 1213 uses differential execution analysis that focuses on the number of memory reads to detect the VPC. It compares the execution traces acquired with multiple test scripts and finds memory locations whose read counts change in proportion to both the number of repetitions and the number of repeated statements. The virtual program counter detection unit 1213 then refers to the boundaries of each VM instruction detected by the VM instruction boundary detection unit 1212, and narrows the memory that was read down to that whose values always point to the start point of a VM instruction. The virtual program counter detection unit 1213 detects this memory as the VPC.
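The first step of this analysis, finding memory whose read count scales with the work done by the test script, might be sketched as follows. The sketch assumes that the expected amount of work per run can be modeled as the product of the number of repetitions and the number of repeated statements; that model, the function name, and the `tolerance` parameter are assumptions. The subsequent narrowing by VM instruction boundaries is omitted.

```python
def find_vpc_candidates(read_counts_by_run, work_by_run, tolerance=0.05):
    """Return addresses whose read count is proportional to the work of each run.

    read_counts_by_run: list of {address: read count}, one dict per test-script run.
    work_by_run: repetitions * repeated statements for the same runs (assumed model).
    """
    candidates = []
    common = set.intersection(*(set(counts) for counts in read_counts_by_run))
    for address in sorted(common):
        ratios = [counts[address] / work
                  for counts, work in zip(read_counts_by_run, work_by_run)]
        # Proportional means the reads-per-unit-of-work ratio stays nearly constant.
        if max(ratios) - min(ratios) <= tolerance * max(ratios):
            candidates.append(address)
    return candidates


# Three runs: 10 reps x 1 statement, 20 reps x 1 statement, 10 reps x 3 statements.
work = [10, 20, 30]
runs = [
    {0x5000: 40, 0x6000: 7, 0x7000: 10},
    {0x5000: 80, 0x6000: 7, 0x7000: 20},
    {0x5000: 120, 0x6000: 7, 0x7000: 10},
]
vpc_candidates = find_vpc_candidates(runs, work)
```

Here only address `0x5000` is read in proportion to both factors; `0x6000` is read a constant number of times and `0x7000` follows the repetition count alone.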

The dispatcher detection unit 1214 extracts each VM instruction portion from the script engine binary based on the boundaries of the VM instructions detected by the VM instruction boundary detection unit 1212, and detects the portions that are highly similar across the VM instructions as dispatchers. This presupposes that the dispatcher is realized by referencing the pointer cache and jumping to the pointer of the next VM instruction handler. Dispatchers are distributed at the end of each VM instruction handler, and their code is generally nearly identical. The vulnerability discovery device 10 detects dispatchers by searching, using a predetermined method, for highly similar code located at the end of such VM instruction handlers. To detect the highly similar portions, a sequence alignment algorithm may be used, for example, or other methods may be used.
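As a simplified stand-in for a similarity search, the following sketch looks for the longest instruction sequence that is exactly identical at the end of every handler. Exact matching is a deliberate simplification of the similarity-based detection described above, and the handler contents shown are invented, normalized instruction strings used only for illustration.

```python
def common_tail(handlers):
    """Return the longest instruction sequence shared by the end of every handler."""
    shortest = min(len(handler) for handler in handlers)
    length = 0
    # Walk backwards from the last instruction while all handlers agree.
    while length < shortest and len({h[-1 - length] for h in handlers}) == 1:
        length += 1
    first = handlers[0]
    return first[len(first) - length:]


# Hypothetical disassembly of three VM instruction handlers.
handlers = [
    ["pop rax", "add rax, rbx", "push rax",
     "movzx ecx, byte [rsi]", "inc rsi", "jmp [table + rcx*8]"],
    ["push rbx",
     "movzx ecx, byte [rsi]", "inc rsi", "jmp [table + rcx*8]"],
    ["mov rax, [rsp]", "push rax",
     "movzx ecx, byte [rsi]", "inc rsi", "jmp [table + rcx*8]"],
]
dispatcher = common_tail(handlers)
```

A practical implementation would tolerate small differences between handlers, for example by scoring aligned sequences, rather than requiring an exact match.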

The conditional branch flag detection unit 1215 extracts and analyzes the execution trace for the second test script stored in the execution trace DB 131 to discover the conditional branch flag. The conditional branch flag detection unit 1215 analyzes multiple execution traces using differential execution analysis that focuses on the number of times memory is read, and detects the conditional branch flag. The conditional branch flag detection unit 1215 executes conditional branches in various patterns, and detects the memory that stores the conditional branch flag by comparing the pattern of memory changes at that time with the conditional branch pattern in the test script.
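The pattern comparison described above can be illustrated as follows. The sketch assumes that the value held at each candidate address has already been sampled once per conditional branch executed by the test script; the function name and data layout are hypothetical.

```python
def find_branch_flag_candidates(expected_pattern, observed_values):
    """Return addresses whose values track the taken/not-taken pattern of the test script.

    expected_pattern: list of bool, True where the test script takes the branch.
    observed_values: {address: [value sampled at each conditional branch]}.
    """
    candidates = []
    for address, values in sorted(observed_values.items()):
        if len(values) != len(expected_pattern):
            continue
        taken = {v for v, t in zip(values, expected_pattern) if t}
        not_taken = {v for v, t in zip(values, expected_pattern) if not t}
        # A flag holds one value when taken and a different one when not taken.
        if len(taken) == 1 and len(not_taken) == 1 and taken != not_taken:
            candidates.append(address)
    return candidates


pattern = [True, False, True, True, False]
observed = {
    0x9000: [1, 0, 1, 1, 0],   # follows the pattern
    0x9008: [5, 5, 5, 5, 5],   # constant, unrelated to branching
    0x9010: [1, 2, 3, 4, 5],   # changes, but not with the pattern
}
flag_candidates = find_branch_flag_candidates(pattern, observed)
```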

Based on the execution trace, the VPC, and the VM execution trace, the code cache detection unit 1216 detects the code cache, in which the VM instructions to be executed are stored.

The code cache detection unit 1216 detects the memory area pointed to by the VPC as a code cache from the VM execution trace. The code cache detection unit 1216 detects the code location from which the memory allocation function that allocated this code cache was called from the execution trace. The code cache detection unit 1216 detects all memory areas allocated at this code location from the VM execution trace as code caches.

The code cache detection unit 1216 detects code locations that are writing to the code cache from the execution trace. The code cache detection unit 1216 detects writing by these code locations in the VM execution trace as updates to the code cache.
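The allocation-site step described above could be sketched as follows, assuming that calls to the memory allocation function have already been extracted from the execution trace as (call site, base address, size) tuples. That tuple format and the function name are assumptions of this sketch, and the detection of code cache updates is not covered.

```python
def detect_code_caches(allocations, vpc_values):
    """Return (base, size) of every area allocated at a site that produced a code cache.

    allocations: list of (call_site, base, size) for memory allocation calls.
    vpc_values: VPC values observed in the VM execution trace.
    """
    # Call sites whose allocation contains at least one address the VPC pointed to.
    cache_sites = {
        site
        for site, base, size in allocations
        if any(base <= vpc < base + size for vpc in vpc_values)
    }
    # Every area allocated from those call sites is treated as a code cache.
    return [(base, size) for site, base, size in allocations if site in cache_sites]


allocations = [
    (0x401200, 0x10000, 0x100),   # contains the observed VPC values
    (0x401200, 0x20000, 0x80),    # same call site, not yet executed
    (0x402500, 0x30000, 0x40),    # unrelated allocation
]
code_caches = detect_code_caches(allocations, vpc_values=[0x10004, 0x10008])
```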

The instruction set architecture analysis unit 122 analyzes the instruction set architecture, which is the system of VM instructions. The instruction set architecture analysis unit 122 collects VM instructions. It determines the instruction content of the collected VM instructions.

The instruction set architecture analysis unit 122 has a VM execution trace acquisition unit 1221 (second acquisition unit), a VM instruction collection unit 1222 (first collection unit), and a VM instruction determination unit 1223 (first determination unit).

Like the execution trace acquisition unit 1211, the VM execution trace acquisition unit 1221 accepts test scripts and script engine binaries as input. It acquires VM execution traces, which are traces of execution on the VM, by executing the test scripts while monitoring the execution of the script engine binary, specifically the VPC and the pointers to the VM instruction handlers dispatched by the dispatcher. When branch VM instructions are to be detected, the VM execution trace acquisition unit 1221 executes multiple test scripts to acquire VM execution traces. The VM execution trace acquisition unit 1221 links each pointer to a VM instruction with that VM instruction, and virtually assigns a VM opcode to each as an identifier.

A VM execution trace is a trace of execution on the VM that records, for each executed VM instruction, the VPC and a pointer to the executed VM instruction handler, to which a VM opcode is virtually assigned as an identifier. Specifically, a VM execution trace is composed of a VPC and a VM opcode for each executed VM instruction. The VPC can be recorded by monitoring the memory of the VPC detected by the virtual program counter detection unit 1213. A VM opcode is an identifier virtually assigned to each linked pair of a pointer to a VM instruction and the VM instruction itself. The VM execution trace acquired by the VM execution trace acquisition unit 1221 is stored in the VM execution trace DB 133.
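One simple way to assign virtual VM opcodes, shown here only as an illustration, is to number the handler pointers in the order in which they are first observed. The function name and the numbering scheme are assumptions; any scheme that gives each handler a unique identifier would serve.

```python
def build_vm_execution_trace(observations):
    """Turn (VPC, handler pointer) observations into (VPC, virtual VM opcode) records."""
    opcode_of = {}   # handler pointer -> virtually assigned VM opcode
    trace = []
    for vpc, handler in observations:
        # A handler seen for the first time receives the next unused opcode number.
        opcode = opcode_of.setdefault(handler, len(opcode_of))
        trace.append((vpc, opcode))
    return trace, opcode_of


observations = [(0x10000, 0x401000), (0x10002, 0x401040), (0x10003, 0x401000)]
vm_trace, opcode_of = build_vm_execution_trace(observations)
```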

The VM instruction collection unit 1222 receives the VPC and dispatcher as input, executes the script while monitoring the VPC and dispatcher, and obtains the VM execution trace. The VM instruction collection unit 1222 collects VM instructions from the VM execution trace.

The VM instruction determination unit 1223 determines the instruction content of the VM instructions collected by the VM instruction collection unit 1222. The VM instruction determination unit 1223 detects branch VM instructions based on the variation in the amount of change in VPC for each VM opcode in the VM execution trace.

The VM instruction determination unit 1223 retrieves and analyzes the VM execution traces stored in the VM execution trace DB 133 to determine whether the VM instruction is a branch VM instruction. For each VM opcode assigned as an identifier, the VM instruction determination unit 1223 collects the amount of change in VPC before and after its execution. If the VM opcode is other than a branch VM instruction, the amount of change in VPC is almost constant. On the other hand, if the VM opcode is a branch VM instruction, the VPC varies depending on the branch destination.

The VM instruction determination unit 1223 therefore determines whether an instruction is a branch VM instruction based on the dispersion in the amount of change in the virtual program counter for each VM opcode in the VM execution trace. Focusing on the fact that this dispersion differs between branch VM instructions and other VM instructions, the VM instruction determination unit 1223 sets a threshold and determines instructions with greater dispersion to be branch VM instructions. Specifically, the VM instruction determination unit 1223 evaluates the dispersion of the amount of change in the VPC for each VM opcode by its variance, and determines instructions whose variance is equal to or greater than a certain threshold to be branch VM instructions.
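The variance test can be sketched as follows, using the population variance from the Python standard library. The threshold value of 1.0 and the function name are assumptions made for illustration; the embodiment states only that some threshold is used.

```python
from collections import defaultdict
from statistics import pvariance


def detect_branch_opcodes(vm_trace, threshold=1.0):
    """Return VM opcodes whose VPC change varies by at least the threshold."""
    deltas = defaultdict(list)
    # Pair each executed VM instruction with its successor to get the VPC change.
    for (vpc, opcode), (next_vpc, _) in zip(vm_trace, vm_trace[1:]):
        deltas[opcode].append(next_vpc - vpc)
    return {op for op, changes in deltas.items() if pvariance(changes) >= threshold}


# (VPC, VM opcode) records: opcode 2 jumps backwards once and forwards once.
vm_trace = [(0, 0), (2, 1), (3, 2), (0, 0), (2, 1), (3, 2), (8, 0), (10, 3)]
branch_opcodes = detect_branch_opcodes(vm_trace)
```

In this trace opcodes 0 and 1 always advance the VPC by a fixed amount (variance 0), while opcode 2 changes it by -3 and +5 (variance 16) and is therefore reported as a branch VM instruction.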

In addition, in order to construct an accurate control flow graph, the VM instruction determination unit 1223 determines which of the branch VM instructions are conditional branch VM instructions. When a conditional branch occurs, access to a conditional branch flag is always generated to determine the branch destination. Therefore, a conditional branch VM instruction can be determined by verifying whether the conditional branch flag is accessed when each branch VM instruction is executed. In other words, if the conditional branch flag is accessed when a branch VM instruction is executed, it can be determined that it is a conditional branch VM instruction, and if it is not accessed, it is not a conditional branch VM instruction. Therefore, the VM instruction determination unit 1223 determines that, among the branch VM instructions, those that involve access to a conditional branch flag are conditional branch VM instructions based on the VM execution trace and memory access trace.
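The check described above might be expressed as follows, assuming that the memory access trace has already been split so that each execution of a branch VM instruction is paired with the set of addresses accessed during it. That pairing and the function name are assumptions of this sketch.

```python
def detect_conditional_branch_opcodes(branch_executions, flag_address):
    """Return branch VM opcodes that access the conditional branch flag on every execution.

    branch_executions: list of (opcode, set of addresses accessed during that execution).
    """
    always_accessed = {}
    for opcode, addresses in branch_executions:
        previous = always_accessed.get(opcode, True)
        always_accessed[opcode] = previous and flag_address in addresses
    return {opcode for opcode, always in always_accessed.items() if always}


executions = [
    (2, {0x9000, 0x5000}),   # opcode 2 reads the flag
    (2, {0x9000}),
    (4, {0x5000}),           # opcode 4 never touches the flag
    (4, {0x5000, 0x7000}),
]
conditional_opcodes = detect_conditional_branch_opcodes(executions, flag_address=0x9000)
```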

Furthermore, the VM instruction determination unit 1223 also determines call and return VM instructions. A branch caused by a call VM instruction is characterized in that the address immediately following the caller's bytecode is saved, and after the called subroutine is executed, the return VM instruction returns to that saved address. Thus, when a certain branch VM instruction is designated as instruction 1, and another subsequent branch VM instruction is designated as instruction 2, and instruction 2 returns to the address immediately following instruction 1's bytecode, the VM instruction determination unit 1223 determines that the pair of instructions 1 and 2 are call and return VM instructions.
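The pairing rule for instruction 1 and instruction 2 can be illustrated as follows. The sketch assumes that the byte length of each branch VM instruction is known so that the address immediately following it can be computed; that assumption, the event tuple format, and the function name are not taken from the embodiment. A practical implementation would also need to exclude ordinary jumps that happen to land just after an earlier branch.

```python
def detect_call_return_pairs(branch_events):
    """Return (call opcode, return opcode) pairs found in executed branch events.

    branch_events: list of (opcode, vpc_before, instruction_length, vpc_after).
    """
    pairs = set()
    pending = []   # stack of (opcode, address immediately after that branch)
    for opcode, vpc, length, target in branch_events:
        # Search the most recent pending branches first, like a call stack.
        for index in range(len(pending) - 1, -1, -1):
            call_opcode, return_address = pending[index]
            if target == return_address:
                pairs.add((call_opcode, opcode))
                del pending[index:]            # unwind past the matched call
                break
        else:
            pending.append((opcode, vpc + length))
    return pairs


events = [
    (5, 0x10, 3, 0x100),    # instruction 1: branches to a subroutine
    (6, 0x108, 1, 0x13),    # instruction 2: returns to 0x10 + 3
    (5, 0x20, 3, 0x100),
    (6, 0x108, 1, 0x23),
]
call_return_pairs = detect_call_return_pairs(events)
```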

The calculation unit 123 constructs a first control flow graph that indicates the entire path exhaustively executed by multi-path execution, based on the architecture information acquired by the virtual machine analysis unit 121 and the instruction set architecture analysis unit 122. The calculation unit 123 has a multi-path execution unit 1231 (first execution unit), a VM branch trace construction unit 1232 (first construction unit), and a control flow graph construction unit 1233 (second construction unit).

The multi-path execution unit 1231 performs multi-path execution of the script to be analyzed while acquiring a VM execution trace based on the architecture information acquired by the virtual machine analysis unit 121 and the instruction set architecture analysis unit 122. The multi-path execution unit 1231 monitors the VPC and VM instructions, and executes the script to be analyzed while acquiring a VM execution trace. The multi-path execution unit 1231 forks the execution state for each conditional branch instruction, leaving one as is and rewriting the conditional branch flag for the other, thereby performing multi-path execution by comprehensively executing multiple execution paths.
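The forking behavior can be illustrated on a toy program as follows. In this sketch an execution state is reduced to a VPC and a flag value, and the instruction names (`set_flag`, `jmp_if`, `jmp`, `nop`, `halt`) are invented for illustration; an actual implementation would fork the full state of the script engine process.

```python
def multi_path_execute(program):
    """Return every (source VPC, destination VPC) branch edge reached when both
    outcomes of each conditional branch are explored."""
    edges = set()
    worklist = [(0, False)]                   # forked states: (VPC, branch flag)
    while worklist:
        vpc, flag = worklist.pop()
        while vpc < len(program):
            op, arg = program[vpc]
            if op == "set_flag":
                flag = arg
                vpc += 1
            elif op == "jmp_if":
                forced = not flag             # forked state with the flag rewritten
                forced_dst = arg if forced else vpc + 1
                if (vpc, forced_dst) not in edges:
                    edges.add((vpc, forced_dst))
                    worklist.append((forced_dst, forced))
                dst = arg if flag else vpc + 1   # state left as is
                if (vpc, dst) in edges:
                    break                     # this outcome was already explored
                edges.add((vpc, dst))
                vpc = dst
            elif op == "jmp":
                if (vpc, arg) in edges:
                    break
                edges.add((vpc, arg))
                vpc = arg
            elif op == "halt":
                break
            else:                             # any other instruction falls through
                vpc += 1
    return edges


program = [
    ("set_flag", False),   # 0
    ("jmp_if", 4),         # 1: branch to 4 when the flag is set
    ("nop", None),         # 2
    ("jmp", 5),            # 3
    ("nop", None),         # 4
    ("halt", None),        # 5
]
all_edges = multi_path_execute(program)
```

A single ordinary run of this program would observe only the edges `(1, 2)` and `(3, 5)`; rewriting the flag in the forked state additionally reveals `(1, 4)`.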

The VM branch trace construction unit 1232 detects branch VM instructions from the VM execution trace 41, which records the opcodes and VPCs of executed VM instructions, and constructs a VM branch trace that associates the VPCs before and after the execution of the detected branch VM instructions.

The VM branch trace construction unit 1232 detects the branch VM instruction that was actually executed from the VM execution trace acquired by the multi-path execution unit 1231, and constructs a first VM branch trace that associates the VPC before and after the execution of the detected branch VM instruction.

The control flow graph construction unit 1233 uses the VM branch trace to construct a control flow graph in which basic blocks are nodes and branches resulting from execution of branch VM instructions are edges. The control flow graph construction unit 1233 constructs a first control flow graph based on the first VM branch trace. The first control flow graph is constructed based on information that was actually executed in multi-path execution, and is therefore a complete control flow graph in which all paths are shown. The control flow graph construction unit 1233 may also construct the graph by scanning the detected code cache and taking into account branch instructions that were not actually executed.

The vulnerability detection unit 124 constructs a second control flow graph indicating paths executed as tests based on the architecture information acquired by the virtual machine analysis unit 121 and the instruction set architecture analysis unit 122. The vulnerability detection unit 124 calculates code coverage based on the first control flow graph and the second control flow graph. Code coverage is the ratio of paths executed as tests to all paths. Based on the calculation result of code coverage, the vulnerability detection unit 124 selects input values that increase code coverage, mutates them, and performs fuzzing on the analysis target script executed on the VM.

The vulnerability detection unit 124 has a mutation unit 1241, a fuzzing execution unit 1242 (second execution unit), a VM branch trace construction unit 1243 (third construction unit), a control flow graph construction unit 1244 (fourth construction unit), and a code coverage calculation unit 1245 (first calculation unit).

The mutation unit 1241 accepts a seed input value and mutates the input value. The input value is a value to be input to the script to be analyzed. If the code coverage increases above a predetermined value, the vulnerability detection unit 124 selects the mutated input value and adds it to a dictionary of input values. The mutation unit 1241 then mutates input values based on the dictionary to which the mutated input value has been added.

The fuzzing execution unit 1242 executes the test target based on the architecture information acquired by the virtual machine analysis unit 121 and the instruction set architecture analysis unit 122. The fuzzing execution unit 1242 inputs the input value mutated by the mutation unit 1241 to the analysis target script and executes the test while acquiring a VM execution trace. The fuzzing execution unit 1242 records the VM instructions executed during this execution and acquires the VM execution trace.

The VM branch trace construction unit 1243 detects the branch VM instruction that was actually executed as a test from the VM execution trace acquired by the fuzzing execution unit 1242, and constructs a second VM branch trace that associates the VPC before and after the execution of the detected branch VM instruction.

The control flow graph construction unit 1244 constructs a second control flow graph based on the second VM branch trace, in which basic blocks are nodes and branches resulting from execution of branch VM instructions are edges.

The code coverage calculation unit 1245 calculates code coverage based on the first control flow graph and the second control flow graph. The code coverage calculation unit 1245 calculates, as code coverage, the ratio of the number of nodes and/or edges of the second control flow graph to the number of nodes and edges of the first control flow graph.
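
The ratio can be sketched as follows, assuming each control flow graph is held as a pair of sets: basic-block start VPCs as nodes and (source, destination) pairs as edges. This representation and the function name are assumptions. The sketch counts nodes and edges together, and intersects with the first graph so that only elements present in the complete graph are counted.

```python
def code_coverage(full_cfg, tested_cfg):
    """Ratio of tested nodes and edges to all nodes and edges.

    Each CFG is a (nodes, edges) pair: a set of basic-block start VPCs and a
    set of (src_block, dst_block) tuples.
    """
    full_nodes, full_edges = full_cfg
    tested_nodes, tested_edges = tested_cfg
    covered = len(tested_nodes & full_nodes) + len(tested_edges & full_edges)
    total = len(full_nodes) + len(full_edges)
    return covered / total if total else 0.0
```

For example, a test that reaches 3 of 4 nodes and 2 of 4 edges yields a coverage of 5/8.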

The storage unit 13 is realized by a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk, and stores the processing program that operates the vulnerability detection device 10 and data used during execution of the processing program. The storage unit 13 has an execution trace database (DB) 131, a VM execution trace DB 133, and an architecture information DB 132 that stores architecture information acquired by the virtual machine analysis unit 121 and the instruction set architecture analysis unit 122.

The execution trace DB 131 and the VM execution trace DB 133 store the execution traces and VM execution traces acquired by the execution trace acquisition unit 1211 and the VM execution trace acquisition unit 1221, respectively. The execution trace DB 131 and the VM execution trace DB 133 are managed by the vulnerability detection device 10. The execution trace DB 131 and the VM execution trace DB 133 may instead be managed by other devices (servers, etc.), in which case the execution trace acquisition unit 1211 and the VM execution trace acquisition unit 1221 output the acquired execution traces and VM execution traces, via the communication interface of the output unit 14, to the management servers or the like of the execution trace DB 131 and the VM execution trace DB 133, where they are stored.

The output unit 14 is, for example, an LCD display or a printer, and outputs various information including information related to the vulnerability detection device 10. The output unit 14 may also be an interface that handles the input and output of various data to and from an external device, and may output various information to the external device.

[Test script configuration]
Let us now explain the test script. A test script is a script that is input when dynamically analyzing a script engine. This test script focuses on the number of branch instruction executions and memory reads and writes, and is used to capture the difference in the behavior of the script engine that occurs when the test script is executed a different number of times. This test script is prepared in advance of the analysis and is created manually. Creating it requires knowledge of the specifications of the target script language.

Figure 4 shows an example of a test script (first test script) used to detect VPCs. The first test script uses a repetitive process (line 2). The first test script changes the execution conditions and generates differences by increasing or decreasing the number of repetitions (line 2) and the number of repeated statements (lines 3 to 5) in the test script.

FIG. 5 is a diagram showing an example of a test script (second test script) used to detect branch VM instructions. The second test script uses multiple conditional branches (lines 4 to 8). In the second test script, the branch conditions are controlled so that the multiple conditional branches are either taken or not taken in a specific order pattern (lines 1 and 5). In the second test script, the number of conditional branches and the order pattern of branch success or failure are changed to generate differences.

[Execution trace configuration]
Next, the execution trace will be described. Fig. 6 is a diagram showing an example of an execution trace. As described above, an execution trace is composed of a branch trace and a memory access trace. Fig. 6 shows an excerpt of an execution trace. The structure of an execution trace will be described below with reference to Fig. 6.

An execution trace has an element called trace. Trace indicates whether the log line is a branch trace or a memory access trace.

A branch trace log line has the format shown, for example, in lines 1 to 10 of Figure 6, and consists of three elements: type, src, and dst. type indicates whether the executed branch instruction was a call instruction, a jmp instruction, or a ret instruction. src indicates the address of the branch source, and dst indicates the address of the branch destination.

A log line of a memory access trace has the format shown, for example, in lines 11 to 13 of Figure 6, and consists of three elements: type, target, and value. Type indicates whether the memory access is a read or write. Target indicates the memory address that is the target of the memory access. Value stores the result of the memory access.

[VM Execution Trace Configuration]
Next, a VM execution trace will be described. Fig. 7 is a diagram showing an example of a VM execution trace. As described above, a VM execution trace is a record of a VM opcode and a VPC. Fig. 7 shows a part of a VM execution trace. The configuration of a VM execution trace will be described below with reference to Fig. 7.

A log line of a VM execution trace is, for example, in the format shown in Figure 7, and consists of two elements: vpc and vmop (vm opcode). vpc indicates the value of the VPC. Also, vmop indicates the value of the VM opcode that is virtually assigned to each pointer that points to the beginning of the VM instruction handler to be executed, obtained from the pointer cache.

[Processing of VM instruction boundary detection unit]
Next, a description will be given of the processing of the VM instruction boundary detection unit 1212. FIG. 8 is a diagram for explaining the processing of the VM instruction boundary detection unit 1212.

The VM instruction boundary detection unit 1212 detects the boundaries of each VM instruction. At this time, the VM instruction boundary detection unit 1212 detects VM instructions and their boundaries for threaded code type VMs, which do not have an interpreter loop and therefore make it difficult to grasp the boundaries of VM instructions. Specifically, the VM instruction boundary detection unit 1212 extracts execution traces from the execution trace DB 131. Then, as shown in FIG. 8, the VM instruction boundary detection unit 1212 clusters the execution traces using a predetermined method, and detects clusters with a threshold or more of execution counts as VM instructions (e.g., VM instruction handlers 1 to 3). The VM instruction boundary detection unit 1212 detects the start and end points of the consecutive instruction strings that make up a VM instruction as boundaries.

[Processing of Virtual Program Counter Detection Unit]
Next, the processing of the virtual program counter detection unit 1213 will be described. The virtual program counter detection unit 1213 detects the VPC and the pointer cache. The detection of the virtual program counter is realized by analyzing the log of the memory access trace of the acquired execution trace. The virtual program counter detection unit 1213 uses differential execution analysis focusing on the number of times memory is read. FIG. 9 is a diagram for explaining the processing of the virtual program counter detection unit 1213.

The virtual program counter detection unit 1213 extracts one execution trace obtained with the first test script from the execution trace DB 131. The number of times the VPC is read is proportional to the number of repetitions in the test script and the number of statements in the repetitive process. If the number of repetitions is N and the number of repeated statements is M, then approximately MN VPC reads will occur. For this reason, the virtual program counter detection unit 1213 extracts memory whose read count increases to approximately 4MN and 9MN in the execution traces for the first test script in which N and M have been increased to 2N and 2M, and to 3N and 3M, respectively. Specifically, as shown in FIG. 9, the virtual program counter detection unit 1213 extracts memory areas whose read/write counts increase monotonically with each VM instruction execution ((1) in FIG. 9).

Then, the virtual program counter detection unit 1213 detects as a VPC a memory value that has been read and that always points to the start point of a VM instruction. Specifically, the virtual program counter detection unit 1213 compares the VPC's pointing destination with the address of the VM instruction handler, and narrows it down to matching memory areas ((2) in FIG. 9).
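
The two narrowing steps can be sketched as below, assuming that a memory access trace is a list of (type, address, value) tuples and that a predicate is available to decide whether a read value points to the start of a VM instruction. The tuple layout, the `points_to_instruction_start` callback, and the tolerance are hypothetical choices for this illustration.

```python
from collections import Counter

def read_counts(memory_trace):
    """Count reads per memory address in a list of (type, address, value)."""
    return Counter(addr for kind, addr, _ in memory_trace if kind == "read")

def vpc_candidates(base_trace, scaled_trace, scale,
                   points_to_instruction_start, tolerance=0.2):
    """Narrow memory read destinations down to VPC candidates.

    base_trace:   trace for N repetitions of M statements (about M*N VPC reads).
    scaled_trace: trace for scale*N repetitions of scale*M statements.
    """
    base, scaled = read_counts(base_trace), read_counts(scaled_trace)
    expected = scale * scale   # M*N grows by scale^2, e.g. 4 for 2N and 2M
    result = set()
    for addr, count in base.items():
        ratio = scaled.get(addr, 0) / count
        if abs(ratio - expected) > tolerance * expected:
            continue   # read count does not follow the repetition count
        values = [v for kind, a, v in scaled_trace if kind == "read" and a == addr]
        if all(points_to_instruction_start(v) for v in values):
            result.add(addr)
    return result
```

If more than one address remains, the same filter can be applied again with another trace, for example one scaled to 3N and 3M.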

[Dispatcher Detection Processing]
Next, a description will be given of the process of the dispatcher detection unit 1214. The dispatcher detection unit 1214 detects a dispatcher by analyzing the binary of the script engine using a predetermined method. FIG. 10 is a diagram for explaining the process of the dispatcher detection unit 1214.

The dispatcher detection unit 1214 detects dispatchers. Based on the boundaries of VM instructions detected by the VM instruction boundary detection unit 1212, the dispatcher detection unit 1214 extracts each VM instruction portion from the script engine binary. Then, based on the assumption that the similarity of dispatcher code is high ((1) in FIG. 10), the dispatcher detection unit 1214 calculates the similarity between the codes of each VM instruction and detects the portion with high similarity between all VM instructions as a dispatcher. The dispatcher detection unit 1214 can detect the code that is commonly executed in the latter half of the VM instructions as a dispatcher ((1) in FIG. 10).
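
Since any similarity measure may be used, the following sketch substitutes the simplest possible one: it treats each VM instruction handler as a list of normalized instruction strings and takes the longest tail shared by all handlers as the dispatcher. The normalization and the function name are assumptions, and a real implementation would likely need a more tolerant similarity measure.

```python
def common_tail(handlers):
    """Return the longest instruction sequence shared at the end of all handlers.

    handlers: list of instruction sequences (lists of normalized strings),
              one sequence per VM instruction handler.
    """
    if not handlers:
        return []
    tail = []
    # Walk all handlers backwards in lockstep until they disagree.
    for column in zip(*(reversed(h) for h in handlers)):
        if len(set(column)) != 1:
            break
        tail.append(column[0])
    return tail[::-1]
```

For handlers ending in the same load, increment, and indirect jump sequence, the function returns exactly those three instructions.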

[Processing of code cache detection unit]
Next, a description will be given of the processing of the code cache detection unit 1216. FIG. 11 is a diagram for explaining the processing of the code cache detection unit 1216.

The code cache detection unit 1216 detects the memory area pointed to by the VPC as a code cache from the VM execution trace ((1) in FIG. 11).

The code cache detection unit 1216 detects the code location that called the memory allocation function that allocated this code cache from the execution trace ((2) in FIG. 11). The code cache detection unit 1216 detects all memory areas allocated at this code location from the VM execution trace as code caches ((3) in FIG. 11).

The code cache detection unit 1216 detects the code location that is writing to the code cache from the execution trace ((4) in FIG. 11). The code cache detection unit 1216 detects the writing by this code location in the VM execution trace as an update to the code cache ((5) in FIG. 11).

[Processing of VM instruction determination unit]
Next, the process of the VM instruction determination unit 1223 will be described. The VM instruction determination unit 1223 determines a branch VM instruction by analyzing the acquired VM execution trace log. The test script here may be any script that includes a branch VM instruction and that includes a branch control syntax. For example, the test script is prepared by collecting information from the Internet or obtaining information from official documents.

First, the VM instruction determination unit 1223 associates a pointer to a VM instruction with a VM instruction for each VM execution trace in the VM execution trace DB 133, and virtually assigns a VM opcode as an identifier to each of them. Figure 12 is a diagram explaining the processing of the VM instruction determination unit 1223.

Here, if a VM instruction is a branch instruction, the advance of the VPC changes depending on the branch destination. On the other hand, if it is not a branch instruction, the advance of the VPC changes depending on the size of the VM instruction. For this reason, when pairs of VM instruction opcodes and pointers to VM instructions are collected and the advance of the VPC is examined for each opcode, if it is a branch instruction, the advance of the VPC will vary depending on the branch destination.

Therefore, the VM instruction determination unit 1223 uses variance to evaluate how much the advance of the VPC varies for each VM instruction. The VM instruction determination unit 1223 calculates the variance of the amount of change in the VPC for each VM opcode, and narrows the candidates down to only the VM opcodes whose calculated variance is greater than a threshold value. In this way, the VM instruction determination unit 1223 associates the pointer with the VM instruction, and determines that a VM instruction whose VPC advance varies (VM instruction handler 3 in the example of FIG. 12) is a branch VM instruction ((1) in FIG. 12).

When the set O of VPC advances for a certain opcode is O = {o_0, o_1, ..., o_N}, the average of O is given by equation (1), and t is a threshold, whether or not the instruction is a branch instruction is determined based on the variance s (see equation (2)) as shown in equation (3). In this way, the VM instruction determination unit 1223 determines whether an instruction is a branch VM instruction.

In addition, for VM instructions other than branches, there is almost no variation, and the boundary between branch VM instructions and other VM instructions is often clear. For this reason, the threshold value is set to a value that can divide the two groups that result by plotting the obtained variance value on a number line, for example.
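
The variance test can be sketched as follows, assuming a VM execution trace given as a list of (vpc, vmop) pairs in execution order. The function name is hypothetical, and population variance is used here as an assumption, since either population or sample variance would separate the two groups.

```python
from collections import defaultdict
from statistics import pvariance

def branch_opcodes(vm_trace, threshold):
    """Return the VM opcodes whose VPC advance varies more than the threshold.

    vm_trace: list of (vpc, vmop) in execution order.
    """
    advances = defaultdict(list)
    # The advance of an instruction is the difference to the next recorded VPC.
    for (vpc, op), (next_vpc, _) in zip(vm_trace, vm_trace[1:]):
        advances[op].append(next_vpc - vpc)
    return {op for op, adv in advances.items() if pvariance(adv) > threshold}
```

A non-branch opcode always advances the VPC by its own size, so its variance is zero, whereas a branch opcode that jumps to different destinations produces a large variance.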

[Processing of VM branch trace construction unit]
Next, a description will be given of the processing of the VM branch trace construction units 1232 and 1243. Fig. 13 is a diagram for explaining the processing of the VM branch trace construction units.

As shown in FIG. 13, the VM branch trace construction units 1232 and 1243 detect a branch VM instruction from the VM execution trace 41 that records the opcode of the executed VM instruction and the VPC ((1) in FIG. 13). The branch VM instruction can be recognized by referring to the branch VM instruction list 42 detected by the VM instruction determination unit 1223.

Then, the VM branch trace construction units 1232 and 1243 construct a VM branch trace 43 that associates the VPCs before and after the execution of the detected branch VM instruction ((2) in FIG. 13). For example, the VM branch trace construction units 1232 and 1243 detect the branch VM instruction "0x1f" from line R41 of the VM execution trace, and construct the VM branch trace shown in line R61 based on line R41 and line R42, which is the line that follows line R41. That is, the VM branch trace construction units 1232 and 1243 associate the branch source VPC "0x555c7e48" of line R41 with the branch destination VPC "0x555c82a0" of line R42.

Similarly, when the VM branch trace construction units 1232 and 1243 detect the branch VM instruction "0x21" from line R51 of the VM execution trace, they associate the VPC "0x555c832c" of line R51 with the VPC "0x555c7514" of the next line R52 (line R71).
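
The construction above amounts to pairing each branch VM instruction with the VPC of the line that follows it. A minimal sketch, assuming the VM execution trace is a list of (vpc, vmop) pairs and the branch VM instruction list is a set of opcodes (both representations and the function name are assumptions):

```python
def build_vm_branch_trace(vm_trace, branch_ops):
    """Associate the VPCs before and after each executed branch VM instruction.

    vm_trace:   list of (vpc, vmop) in execution order.
    branch_ops: set of opcodes judged to be branch VM instructions.
    Returns a list of (source VPC, destination VPC) pairs.
    """
    return [
        (vpc, next_vpc)
        for (vpc, op), (next_vpc, _) in zip(vm_trace, vm_trace[1:])
        if op in branch_ops
    ]
```

With the values quoted above, the opcode "0x1f" at VPC 0x555c7e48 followed by VPC 0x555c82a0 yields the pair (0x555c7e48, 0x555c82a0), and the opcode "0x21" at VPC 0x555c832c followed by VPC 0x555c7514 yields (0x555c832c, 0x555c7514).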

[Processing of control flow graph construction unit]
FIG. 14 is a diagram illustrating the processing of the control flow graph construction units 1233 and 1244. The control flow graph construction units 1233 and 1244 use the VM branch trace 43 constructed in the VM branch trace construction process to construct a control flow graph in which basic blocks are nodes and each branch of the VM branch trace 43 is an edge ((1) in FIG. 14).

Specifically, the control flow graph construction units 1233 and 1244 designate the branch shown in row R61 of the VM branch trace 43 as edge E61, the basic block from which edge E61 branches as node N61, and the basic block to which edge E61 branches as node N62.

Then, the branch shown in row R71 of the VM branch trace 43 is designated as edge E71, the basic block from which edge E71 branches as node N71, and the basic block to which edge E71 branches as node N72. In the example of the control flow graph in FIG. 14, in addition to edge E71, node N71 has a branch shown as edge E72, and the basic block to which this branch leads is indicated by node N73. In this way, the control flow graph construction units 1233 and 1244 construct a control flow graph that expresses the branching of basic blocks in a graph structure based on the VM branch trace.
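
A minimal sketch of this construction is given below. It assumes the VM branch trace is a list of (source VPC, destination VPC) pairs and identifies each basic block by its start VPC. Treating every branch destination as the start of a basic block is a simplification made for this example, and the function name and `entry` parameter are hypothetical.

```python
from bisect import bisect_right

def build_cfg(branch_trace, entry):
    """Build (nodes, edges) from a VM branch trace.

    branch_trace: list of (src_vpc, dst_vpc) pairs.
    entry:        VPC at which execution of the script starts.
    Nodes are basic-block start VPCs; edges are (src_block, dst_block) pairs.
    """
    # Every branch destination (and the entry point) starts a basic block.
    leaders = sorted({entry} | {dst for _, dst in branch_trace})

    def block_of(vpc):
        # The enclosing block starts at the greatest leader not exceeding vpc.
        i = bisect_right(leaders, vpc) - 1
        return leaders[max(i, 0)]

    nodes = set(leaders)
    edges = {(block_of(src), dst) for src, dst in branch_trace}
    return nodes, edges
```

A conditional branch observed with two different destinations produces two edges leaving the same node, as with edges E71 and E72 leaving node N71.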

[Processing procedure of vulnerability detection device]
Next, the procedure of the vulnerability detection processing performed by the vulnerability detection device 10 will be described. Fig. 15 and Fig. 16 are flowcharts showing the procedure of the vulnerability detection processing according to the embodiment.

First, the input unit 11 receives a test script and a script engine binary as input (step S1).

Then, the execution trace acquisition unit 1211 performs an execution trace acquisition process in which the test script is executed while monitoring the binary of the script engine to acquire branch traces and memory access traces (step S2).

The VM instruction boundary detection unit 1212 detects VM instructions and performs VM instruction boundary detection processing to detect VM instruction boundaries (step S3). The virtual program counter detection unit 1213 extracts and analyzes the execution trace for the first test script stored in the execution trace DB 131, and performs virtual program counter detection processing to discover the VPC (step S4).

The dispatcher detection unit 1214 performs dispatcher detection processing to extract each VM instruction portion from the script engine binary and detect the portion with high similarity among the VM instructions as a dispatcher (step S5).

The conditional branch flag detection unit 1215 performs a conditional branch flag detection process to extract and analyze the execution trace for the second test script stored in the execution trace DB 131 and discover the conditional branch flag (step S6).

The code cache detection unit 1216 performs a code cache detection process based on the execution trace and the VPC to detect, as code caches, the memory areas allocated at the code location that called the memory allocation function, and to detect, as updates to the code cache, writes made by the code location that writes to the code cache (step S7).

The VM execution trace acquisition unit 1221 receives the test script and the script engine binary as input, and executes the test script while monitoring the execution of the script engine binary, thereby performing a VM execution trace acquisition process to acquire a VM execution trace (step S8).

The VM instruction collection unit 1222 performs a VM instruction collection process to acquire VM instructions from the VM execution trace (step S9). The VM instruction determination unit 1223 performs a VM instruction determination process to determine the instruction content of the collected VM instructions (step S10).

The input unit 11 accepts input of a script to be analyzed (step S11). The multi-path execution unit 1231 performs multi-path execution of the script to be analyzed while acquiring a VM execution trace, based on the architecture information acquired by the virtual machine analysis unit 121 and the instruction set architecture analysis unit 122 (step S12).

The VM branch trace construction unit 1232 detects a branch VM instruction from the VM execution trace, and executes a VM branch trace construction process to construct a VM branch trace that associates VPCs before and after the execution of the detected branch VM instruction (step S13). In step S13, the VM branch trace construction unit 1232 constructs a first VM branch trace based on the multi-path execution by the multi-path execution unit 1231.

The control flow graph construction unit 1233 performs a control flow graph construction process to construct a control flow graph using the VM branch trace (step S14). In step S14, the control flow graph construction unit 1233 constructs a first control flow graph based on the first VM branch trace.

The vulnerability detection unit 124 accepts the input of a seed input value (step S15) and creates a dictionary of input values (step S16).

The mutation unit 1241 performs a mutation process to mutate the input value to be input to the analysis target script (step S17). The fuzzing execution unit 1242 inputs the input value mutated by the mutation unit 1241 to the analysis target script and performs an execution process to execute a test while acquiring a VM execution trace (step S18).

The vulnerability detection unit 124 determines whether a problem such as a crash has occurred (step S19). If a problem such as a crash has occurred (step S19: Yes), the vulnerability detection unit 124 outputs the input value that caused the problem (step S25).

If no problem such as a crash has occurred (step S19: No), the VM branch trace construction unit 1243 performs a VM branch trace construction process to construct a second VM branch trace by performing the same process as in step S13 on the VM execution trace acquired by the fuzzing execution unit 1242 (step S20).

The control flow graph construction unit 1244 performs a control flow graph construction process to construct a second control flow graph by performing the same process as step S14 on the second VM branch trace (step S21).

The code coverage calculation unit 1245 performs a code coverage calculation process to calculate the code coverage based on the first control flow graph and the second control flow graph (step S22).

The vulnerability detection unit 124 determines whether the code coverage calculated in step S22 has increased beyond a predetermined value (step S23). The predetermined value may be set in advance or may be set dynamically according to the processing history of the vulnerability detection device 10.

If the code coverage has increased above the predetermined value (step S23: Yes), the vulnerability detection unit 124 selects the mutated input value and adds it to the dictionary of input values (step S24), and returns to step S17. Then, the vulnerability detection unit 124 causes the mutation unit 1241 to mutate the input value based on the dictionary to which the mutated input value has been added. If the code coverage has not increased above the predetermined value (step S23: No), the vulnerability detection unit 124 returns to step S17.
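
The loop of steps S15 to S25 can be summarized in the following sketch. The callbacks `run_target`, `coverage_of`, and `mutate`, the iteration limit, and the interpretation of the predetermined value as a minimum gain over the best coverage seen so far are all assumptions made for illustration.

```python
import random

def fuzz(seed, run_target, coverage_of, mutate,
         min_gain=0.0, iterations=100, rng=None):
    """Coverage-guided fuzzing loop (steps S15 to S25, simplified).

    run_target(value)  -> (crashed, vm_trace)  runs the script with one input.
    coverage_of(trace) -> float                code coverage for that run.
    mutate(value, rng) -> value                produces a mutated input value.
    Returns (crashing input or None, dictionary of input values).
    """
    rng = rng or random.Random(0)
    dictionary = [seed]                                    # S15, S16
    best = 0.0
    for _ in range(iterations):
        candidate = mutate(rng.choice(dictionary), rng)    # S17
        crashed, trace = run_target(candidate)             # S18
        if crashed:                                        # S19: Yes
            return candidate, dictionary                   # S25
        coverage = coverage_of(trace)                      # S20 to S22
        if coverage - best > min_gain:                     # S23: Yes
            best = coverage
            dictionary.append(candidate)                   # S24
    return None, dictionary
```

Inputs that do not raise coverage are discarded, so later mutations start only from inputs that reached new parts of the first control flow graph.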

[Processing procedure for execution trace acquisition processing]
Next, a flow of the execution trace acquisition process shown in Fig. 15 will be described below. Fig. 17 is a flowchart showing the processing procedure of the execution trace acquisition process shown in Fig. 15.

First, the execution trace acquisition unit 1211 receives a test script and a script engine binary as input (step S31). Then, the execution trace acquisition unit 1211 hooks the received script engine to acquire a branch trace (step S32). The execution trace acquisition unit 1211 also hooks the received script engine to acquire a memory access trace (step S33).

Then, the execution trace acquisition unit 1211 inputs the test script received in this state into the script engine for execution (step S34), and stores the execution trace acquired thereby in the execution trace DB 131 (step S35).

The execution trace acquisition unit 1211 determines whether or not all of the input test scripts have been executed (step S36). If all of the input test scripts have been executed (step S36: Yes), the execution trace acquisition unit 1211 ends the process. On the other hand, if not all of the input test scripts have been executed (step S36: No), the execution trace acquisition unit 1211 returns to the execution of the test scripts in step S34 and continues the process.

[Procedure of VM instruction boundary detection process]
Next, a description will be given of the flow of the VM instruction boundary detection process shown in Fig. 15. Fig. 18 is a flowchart showing the processing procedure of the VM instruction boundary detection process shown in Fig. 15.

First, the VM instruction boundary detection unit 1212 extracts execution traces from the execution trace DB 131 (step S41). The VM instruction boundary detection unit 1212 clusters the execution traces using a predetermined method (step S42). Any method may be used for clustering.

The VM instruction boundary detection unit 1212 detects clusters whose execution count is equal to or greater than a threshold as VM instructions (step S43). The VM instruction boundary detection unit 1212 then determines the start and end points of the continuous instruction sequence that constitutes the VM instruction as boundaries (step S44). The VM instruction boundary detection unit 1212 outputs the VM instruction boundary as a return value (step S45), and ends the VM instruction boundary detection process.

[Procedure for Virtual Program Counter Detection Processing]
Next, a description will be given of the flow of the virtual program counter detection process shown in Fig. 15. Fig. 19 is a flowchart showing the processing procedure of the virtual program counter detection process shown in Fig. 15.

First, the virtual program counter detection unit 1213 extracts one execution trace by the first test script from the execution trace DB 131 (step S51). Next, the virtual program counter detection unit 1213 focuses on memory access traces among the execution traces, and counts up the number of reads for each memory read destination (step S52).

The virtual program counter detection unit 1213 receives as input the first test script used to obtain the execution trace (step S53), and analyzes the first test script to obtain the number of repetitions and the number of repeated statements (step S54).

Then, the virtual program counter detection unit 1213 extracts from the execution trace DB 131 another execution trace by the first test script, which has a different number of repetitions and number of repeated statements (step S55). Then, the virtual program counter detection unit 1213 focuses on the memory access trace and counts the number of reads for each memory read destination (step S56). The virtual program counter detection unit 1213 also receives as input the first test script used to obtain the execution trace (step S57), analyzes the test script, and obtains the number of repetitions and the number of repeated statements (step S58).

Here, the virtual program counter detection unit 1213 narrows down the memory read destinations to only those whose read counts change in proportion to the number of repetitions or the increase or decrease in the number of repeated statements (step S59). Furthermore, the virtual program counter detection unit 1213 narrows down the memory read destinations narrowed down in step S59 to those whose read memory values always point to the start point of the VM instruction (step S60).

Then, the virtual program counter detection unit 1213 determines whether the memory read destinations have been narrowed down to only one (step S61). If the virtual program counter detection unit 1213 has not narrowed down the memory read destinations to only one (step S61: No), the process returns to step S55, where the virtual program counter detection unit 1213 retrieves the next execution trace and continues processing. On the other hand, if the virtual program counter detection unit 1213 has narrowed down the memory read destinations to only one (step S61: Yes), the virtual program counter detection unit 1213 stores the narrowed down memory read destination as a virtual program counter in the architecture information DB 132 (step S62), and ends processing.

[Processing procedure for dispatcher detection processing]
Next, a description will be given of the flow of the dispatcher detection process shown in Fig. 15. Fig. 20 is a flowchart showing the processing procedure of the dispatcher detection process shown in Fig. 15.

First, the dispatcher detection unit 1214 receives the script engine binary as input (step S71). The dispatcher detection unit 1214 receives the boundaries of VM instructions from the VM instruction boundary detection unit 1212 (step S72).

The dispatcher detection unit 1214 extracts each VM instruction portion from the script engine binary based on the boundaries of the VM instructions received from the VM instruction boundary detection unit 1212 (step S73). The dispatcher detection unit 1214 calculates the similarity between the codes of each VM instruction using a predetermined method (step S74). Any method may be used to calculate the similarity, as long as it is capable of calculating the similarity between codes.

The dispatcher detection unit 1214 extracts the portion that is highly similar across all VM instructions, based on the similarity calculated in step S74 (step S75). The dispatcher detection unit 1214 then determines whether the extracted portion is the end portion of the VM instructions (step S76).

If it is not the end portion of the VM instructions (step S76: No), the dispatcher detection unit 1214 returns to step S75 and continues processing. If it is the end portion of the VM instructions (step S76: Yes), the dispatcher detection unit 1214 outputs the extracted portion as the dispatcher (step S77) and ends processing.
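As a rough illustration of steps S73 to S77, the following sketch treats each VM instruction handler as a list of disassembled instruction strings and takes the dispatcher to be the longest sequence shared at the end of every handler. The exact-match comparison is a stand-in for the unspecified similarity method, and the function name `common_tail` and the sample handlers are hypothetical.

```python
from typing import List, Sequence


def common_tail(handlers: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest instruction sequence found at the end of every handler."""
    if not handlers:
        return []
    tail: List[str] = []
    shortest = min(len(h) for h in handlers)
    # Walk backwards from the end of each handler while all handlers agree.
    for offset in range(1, shortest + 1):
        candidates = {h[-offset] for h in handlers}
        if len(candidates) != 1:
            break
        tail.append(candidates.pop())
    tail.reverse()
    return tail


# Hypothetical handlers, already split at the detected VM instruction boundaries.
handlers = [
    ["pop r1", "pop r2", "add r1, r2", "push r1",
     "movzx eax, byte [vpc]", "inc vpc", "jmp [table + eax*8]"],
    ["pop r1", "neg r1", "push r1",
     "movzx eax, byte [vpc]", "inc vpc", "jmp [table + eax*8]"],
    ["push imm",
     "movzx eax, byte [vpc]", "inc vpc", "jmp [table + eax*8]"],
]
dispatcher = common_tail(handlers)
```

A real implementation would likely need a fuzzier similarity measure, since compilers may allocate registers differently in each handler.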

[Conditional Branch Flag Detection Processing Procedure]
Next, a description will be given of the flow of the conditional branch flag detection process shown in Fig. 15. Fig. 21 is a flowchart showing the processing procedure of the conditional branch flag detection process shown in Fig. 15.

First, the conditional branch flag detection unit 1215 extracts one execution trace by the second test script from the execution trace DB 131 (step S81). Then, the conditional branch flag detection unit 1215 focuses on the memory access trace and counts the number of reads for each memory read destination (step S82).

The conditional branch flag detection unit 1215 also receives as input the second test script used to obtain the execution trace (step S83), analyzes this second test script, and obtains the number of conditional branches and the True/False order pattern (step S84). The conditional branch flag detection unit 1215 then narrows down the memory read destinations to only those whose read count changes in proportion to the number of conditional branches (step S85). Furthermore, the conditional branch flag detection unit 1215 narrows down the memory read destinations to only those whose read memory value alternates between two values in accordance with the True/False order pattern (step S86).

The conditional branch flag detection unit 1215 determines whether the memory read destinations have been narrowed down to only one (step S87). If the conditional branch flag detection unit 1215 has not narrowed down the memory read destinations to only one (step S87: No), it returns to step S81, retrieves the next execution trace, and continues processing. On the other hand, if the conditional branch flag detection unit 1215 has narrowed down the memory read destinations to only one (step S87: Yes), it stores the narrowed down read destination in the architecture information DB 132 as the conditional branch flag (step S88), and ends processing.
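The narrowing in steps S81 to S88 can be sketched as follows. This is a minimal illustration, assuming each execution trace has been reduced to a mapping from memory address to the ordered list of values read from it, and that the test script's True/False order is given as a list of booleans. The names `flag_candidates` and `detect_flag` and the sample addresses are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Reads = Dict[int, List[int]]  # address -> values read from it, in order


def flag_candidates(reads: Reads, pattern: List[bool]) -> Set[int]:
    """Addresses whose reads are consistent with the branch pattern of one trace."""
    n = len(pattern)
    result: Set[int] = set()
    for addr, values in reads.items():
        # S85 analogue: the read count must be a positive multiple of the branch count.
        if n == 0 or not values or len(values) % n != 0:
            continue
        step = len(values) // n
        sampled = values[step - 1::step]  # one value per conditional branch
        # S86 analogue: exactly two values, one per outcome, following the pattern.
        seen: Dict[bool, int] = {}
        consistent = all(seen.setdefault(taken, v) == v
                         for taken, v in zip(pattern, sampled))
        if consistent and len(seen) == 2 and seen[True] != seen[False]:
            result.add(addr)
    return result


def detect_flag(traces: List[Tuple[Reads, List[bool]]]) -> Optional[int]:
    """Intersect candidates trace by trace until exactly one address remains."""
    candidates: Optional[Set[int]] = None
    for reads, pattern in traces:
        found = flag_candidates(reads, pattern)
        candidates = found if candidates is None else candidates & found
        if len(candidates) == 1:
            return next(iter(candidates))
    return None


traces = [
    ({0x1000: [1, 0, 1], 0x2000: [5, 6, 5], 0x3000: [7, 7, 7, 7]},
     [True, False, True]),
    ({0x1000: [0, 0, 1, 1], 0x2000: [5, 6, 5, 6]},
     [False, False, True, True]),
]
flag_address = detect_flag(traces)
```

In this example the first trace leaves two candidates, and the second trace, with a different True/False order, eliminates the address whose values merely alternate.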

[Code cache detection process procedure]
Next, a description will be given of the flow of the code cache detection process shown in Fig. 15. Fig. 22 is a flowchart showing the processing procedure of the code cache detection process shown in Fig. 15.

When the code cache detection unit 1216 receives an execution trace and a VM execution trace as input (step S91), it acquires the memory area pointed to by the VPC from the VM execution trace (step S92). The VM execution trace is acquired by the VM execution trace acquisition unit 1221.

The code cache detection unit 1216 obtains from the execution trace the code location of the caller of the memory allocation function that allocated the memory area obtained in step S92 (step S93). The code cache detection unit 1216 detects, from the VM execution trace, all areas allocated at the code location obtained in step S93 as code caches (step S94).

The code cache detection unit 1216 acquires the code location that is writing to the code cache from the execution trace (step S95). The code cache detection unit 1216 detects all areas in the VM execution trace that are written to at the code location acquired in step S95 as code cache updates (step S96). The code cache detection unit 1216 returns the detected code cache and its updated location (step S97), and ends the code cache detection process.
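Steps S92 to S94 can be illustrated with a small sketch. It assumes the execution trace has been reduced to a list of allocation records `(call_site, base, size)` and that the VM execution trace supplies the addresses the VPC pointed to. These record shapes and the name `detect_code_caches` are assumptions made for illustration, and the detection of code cache updates (steps S95 and S96) is not covered.

```python
from typing import List, Sequence, Tuple

Allocation = Tuple[int, int, int]  # (caller code location, base address, size)


def detect_code_caches(allocations: Sequence[Allocation],
                       vpc_values: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (base, size) of every region allocated by a code-cache call site."""
    # S92/S93 analogue: find call sites that allocated a region the VPC points into.
    sites = {
        site
        for site, base, size in allocations
        if any(base <= vpc < base + size for vpc in vpc_values)
    }
    # S94 analogue: every region allocated from those call sites is a code cache.
    return [(base, size) for site, base, size in allocations if site in sites]


allocations = [
    (0x401000, 0x7000, 0x100),  # bytecode buffer for one script function
    (0x402000, 0x8000, 0x040),  # unrelated heap object
    (0x401000, 0x9000, 0x100),  # bytecode buffer not yet executed
]
code_caches = detect_code_caches(allocations, vpc_values=[0x7010, 0x7012])
```

The second bytecode buffer is reported even though the VPC never pointed into it, because it was allocated from the same code location.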

[Procedure of VM Execution Trace Acquisition Processing]
Next, a description will be given of the flow of the VM execution trace acquisition process shown in Fig. 15. Fig. 23 is a flowchart showing the procedure of the VM execution trace acquisition process shown in Fig. 15.

First, the VM execution trace acquisition unit 1221 receives a test script and a script engine binary as input (step S101). Then, the VM execution trace acquisition unit 1221 applies a hook to the received script engine to record the VPC and VM opcode (step S102).

The VM execution trace acquisition unit 1221 inputs the received test script in this state into the script engine for execution (step S103), and stores the VM execution trace acquired thereby in the VM execution trace DB 133 (step S104).

The VM execution trace acquisition unit 1221 determines whether all of the input test scripts have been executed (step S105). If all of the input test scripts have been executed (step S105: Yes), the VM execution trace acquisition unit 1221 ends the process. If not all of the input test scripts have been executed (step S105: No), the VM execution trace acquisition unit 1221 returns to the execution of the test scripts in step S103 and continues the process.

[Procedure of VM instruction collection process]
Next, a description will be given of the flow of the VM instruction collection process shown in Fig. 15. Fig. 24 is a flowchart showing the procedure of the VM instruction collection process shown in Fig. 15.

The VM instruction collection unit 1222 receives the VPC and dispatcher as input (step S111) and acquires various scripts from the Internet (step S112). The VM instruction collection unit 1222 executes the scripts while monitoring the VPC and dispatcher, and acquires a VM execution trace (step S113).

The VM instruction collection unit 1222 acquires VM instructions from the VM execution trace (step S114) and adds them to a list of VM instructions (step S115). If the VM instruction collection unit 1222 finds a VM instruction that is not in the list (step S116: No), it returns to step S112. If the VM instruction collection unit 1222 finds no VM instructions that are not in the list (step S116: Yes), it returns the list of VM instructions (step S117) and ends the VM instruction collection process.

[Processing procedure of VM instruction determination processing]
Next, a description will be given of the flow of the VM instruction determination process shown in Fig. 15. Fig. 25 is a flowchart showing the processing procedure of the VM instruction determination process shown in Fig. 15.

The VM instruction determination unit 1223 extracts one VM execution trace from the VM execution trace DB 133 (step S121). The VM instruction determination unit 1223 associates a pointer to the VM instruction with the VM instruction, and assigns a VM opcode to each as an identifier (step S122). Then, the VM instruction determination unit 1223 counts the amount of change in VPC before and after execution for each VM opcode (step S123).

The VM instruction determination unit 1223 determines whether all VM execution traces in the VM execution trace DB 133 have been processed (step S124). If not all VM execution traces in the VM execution trace DB 133 have been processed (step S124: No), the VM instruction determination unit 1223 returns to step S121 and retrieves and processes the next VM execution trace.

If all VM execution traces in the VM execution trace DB 133 have been processed (step S124: Yes), the VM instruction determination unit 1223 calculates the variance of the amount of change in VPC for each VM opcode (step S125). Then, the VM instruction determination unit 1223 receives a threshold value as an input (step S126). The VM instruction determination unit 1223 narrows down to only VM opcodes whose variance is greater than the threshold value (step S127), stores them as branch VM instructions in the architecture information DB 132 (step S128), and ends the process.
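The variance test in steps S123 to S127 can be sketched as follows, assuming each VM execution trace is a list of `(vpc, opcode)` entries in execution order. The function name `branch_opcodes`, the opcode mnemonics, and the threshold value are hypothetical. A non-branch instruction always advances the VPC by its own length, so its variance is zero, whereas a branch instruction moves the VPC by varying amounts.

```python
from collections import defaultdict
from statistics import pvariance
from typing import Dict, List, Sequence, Set, Tuple

VmTrace = Sequence[Tuple[int, str]]  # (VPC, VM opcode) per executed VM instruction


def branch_opcodes(traces: Sequence[VmTrace], threshold: float) -> Set[str]:
    """Return opcodes whose VPC change varies more than the threshold."""
    deltas: Dict[str, List[int]] = defaultdict(list)
    for trace in traces:
        # S123 analogue: record the VPC change caused by each executed opcode.
        for (vpc, opcode), (next_vpc, _) in zip(trace, trace[1:]):
            deltas[opcode].append(next_vpc - vpc)
    # S125 to S127 analogue: keep opcodes whose variance exceeds the threshold.
    return {op for op, d in deltas.items() if pvariance(d) > threshold}


trace = [
    (0, "PUSH"), (2, "PUSH"), (4, "CMP"), (5, "JNZ"),   # JNZ taken: 5 -> 10
    (10, "PUSH"), (12, "CMP"), (13, "JNZ"),             # JNZ not taken: 13 -> 15
    (15, "ADD"), (16, "HALT"),
]
branches = branch_opcodes([trace], threshold=1.0)
```

An unconditional jump that always targets the same offset would not be caught by this measure in a single trace, which is one reason the procedure aggregates over all traces in the DB.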

[Processing procedure for multi-path execution processing]
Next, a description will be given of the flow of the multi-path execution process shown in Fig. 15. Fig. 26 is a flowchart showing the processing procedure of the multi-path execution process shown in Fig. 15.

The multi-path execution unit 1231 receives the script to be analyzed as input (step S131). The multi-path execution unit 1231 receives the VPC, dispatcher, and conditional branch flag as input (step S132).

The multi-path execution unit 1231 monitors the VPC and VM instructions, and executes the script to be analyzed while acquiring the VM execution trace (step S133). The multi-path execution unit 1231 forks the execution state for each conditional branch instruction, leaving one as is and rewriting the conditional branch flag for the other, thereby comprehensively executing multiple execution paths (step S134).

If not all execution paths have been exhaustively executed (step S135: No), the multi-path execution unit 1231 returns to step S133. If all execution paths have been exhaustively executed (step S135: Yes), the multi-path execution unit 1231 returns the VM execution trace (step S136) and ends the multi-path execution process.
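The forking in step S134 can be illustrated on a toy interpreter. The sketch below is not the script engine instrumentation itself: it assumes a hypothetical acyclic bytecode program with a `JT` (jump if flag is true) instruction, and models the fork by pushing a copy of the state whose conditional branch flag has been inverted onto a worklist.

```python
from typing import List, Optional, Tuple

Instruction = Tuple[str, Optional[int]]  # (opcode, jump target or None)
VmTrace = List[Tuple[int, str]]          # (VPC, opcode) per executed instruction


def explore(program: List[Instruction], initial_flag: bool = False) -> List[VmTrace]:
    """Execute every path of an acyclic toy program by forking at each JT."""
    finished: List[VmTrace] = []
    worklist: List[Tuple[int, bool, VmTrace]] = [(0, initial_flag, [])]
    while worklist:
        vpc, flag, trace = worklist.pop()
        while True:
            opcode, operand = program[vpc]
            trace = trace + [(vpc, opcode)]
            if opcode == "HALT":
                break
            if opcode == "JT":
                # Fork: the copy runs with the rewritten (inverted) flag.
                forked_vpc = operand if not flag else vpc + 1
                worklist.append((forked_vpc, not flag, trace))
                # The original state continues with the flag left as is.
                vpc = operand if flag else vpc + 1
            elif opcode == "JMP":
                vpc = operand
            else:
                vpc += 1
        finished.append(trace)
    return finished


program: List[Instruction] = [
    ("CMP", None),   # 0
    ("JT", 4),       # 1: conditional branch
    ("PUSH", None),  # 2: fall-through side
    ("JMP", 5),      # 3
    ("POP", None),   # 4: taken side
    ("HALT", None),  # 5
]
traces = explore(program)
paths = sorted([vpc for vpc, _ in t] for t in traces)
```

A program with loops would need a bound or a visited-state check to terminate, which this sketch omits.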

[Procedure of VM branch trace construction process]
Next, a description will be given of the flow of the VM branch trace construction process shown in Fig. 15. Fig. 27 is a flowchart showing the processing procedure of the VM branch trace construction process shown in Fig. 15.

The VM branch trace construction units 1232 and 1243 receive the VM execution trace and the VM branch instruction list as input (step S141).

The VM branch trace construction units 1232 and 1243 extract an entry of the VM execution trace (step S142). The VM branch trace construction units 1232 and 1243 determine whether the VM opcode is present in the VM branch instruction list (step S143).

If the VM opcode exists in the VM branch instruction list (step S143: Yes), the VM branch trace construction units 1232 and 1243 save the VPC as the branch source and the VPC of the next entry as the branch destination in the VM branch trace (step S144).

If the VM opcode does not exist in the VM branch instruction list (step S143: No), or after step S144 is completed, the VM branch trace construction units 1232 and 1243 determine whether all entries in the VM execution trace have been processed (step S145).

If not all entries in the VM execution trace have been processed (step S145: No), the VM branch trace construction units 1232 and 1243 extract the next entry in the VM execution trace (step S146). Then, the VM branch trace construction units 1232 and 1243 return to step S143 and determine whether the VM opcode for the next entry is present in the VM branch instruction list.

On the other hand, if all entries in the VM execution trace have been processed (step S145: Yes), the VM branch trace construction units 1232 and 1243 end the VM branch trace construction process.
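Steps S141 to S146 amount to a single pass over the VM execution trace. A minimal sketch, assuming trace entries are `(vpc, opcode)` pairs and the branch instruction list is a set of opcode names (both representations, and the name `build_branch_trace`, are assumptions):

```python
from typing import List, Sequence, Set, Tuple


def build_branch_trace(vm_trace: Sequence[Tuple[int, str]],
                       branch_opcodes: Set[str]) -> List[Tuple[int, int]]:
    """Return (branch source VPC, branch destination VPC) pairs."""
    branch_trace: List[Tuple[int, int]] = []
    for i, (vpc, opcode) in enumerate(vm_trace):
        # S143/S144 analogue: for a branch opcode, the next entry's VPC is the destination.
        if opcode in branch_opcodes and i + 1 < len(vm_trace):
            branch_trace.append((vpc, vm_trace[i + 1][0]))
    return branch_trace


vm_trace = [
    (0, "PUSH"), (2, "PUSH"), (4, "CMP"), (5, "JNZ"),
    (10, "PUSH"), (12, "CMP"), (13, "JNZ"),
    (15, "ADD"), (16, "HALT"),
]
branch_trace = build_branch_trace(vm_trace, {"JNZ"})
```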

[Processing procedure for constructing control flow graph]
Next, a flow of the control flow graph construction process shown in Fig. 15 will be described below. Fig. 28 is a flowchart showing the processing procedure of the control flow graph construction process shown in Fig. 15.

When the control flow graph construction units 1233 and 1244 receive a VM branch trace as input (step S151), they extract an entry of the VM branch trace (step S152).

The control flow graph construction units 1233 and 1244 add a basic block starting from the branch destination address as a node to the control flow graph (step S153). The control flow graph construction units 1233 and 1244 add an edge from the branch source address to the branch destination address to the control flow graph (step S154). The control flow graph construction units 1233 and 1244 determine whether all entries of the VM branch trace have been processed (step S155).

If the control flow graph construction units 1233 and 1244 have not processed all the entries of the VM branch trace (step S155: No), they extract the next entry of the VM branch trace (step S156). Then, the control flow graph construction units 1233 and 1244 return to step S153 and add the basic block starting from the branch destination address of the next entry as a node to the control flow graph.

When all entries of the VM branch trace have been processed (step S155: Yes), the control flow graph construction units 1233 and 1244 output the constructed control flow graph (step S157).
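The graph construction in steps S151 to S157 can be sketched with two sets. This illustration assumes the VM branch trace is a list of `(source VPC, destination VPC)` pairs and identifies each basic block by its starting VPC. The name `build_cfg` is hypothetical.

```python
from typing import Iterable, Set, Tuple

Cfg = Tuple[Set[int], Set[Tuple[int, int]]]  # (nodes, edges)


def build_cfg(branch_trace: Iterable[Tuple[int, int]]) -> Cfg:
    """Build a control flow graph from (branch source, branch destination) pairs."""
    nodes: Set[int] = set()
    edges: Set[Tuple[int, int]] = set()
    for src, dst in branch_trace:
        nodes.add(dst)         # S153: basic block starting at the destination
        edges.add((src, dst))  # S154: edge from source address to destination
    return nodes, edges


# The repeated (5, 10) entry models a branch executed twice, e.g. in a loop.
nodes, edges = build_cfg([(5, 10), (13, 15), (5, 10)])
```

Using sets means that repeated executions of the same branch do not inflate the node and edge counts used later for code coverage.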

[Mutation processing procedure]
Next, a flow of the mutation process shown in Fig. 16 will be described below. Fig. 29 is a flowchart showing the processing procedure of the mutation process shown in Fig. 16.

The mutation unit 1241 accepts a dictionary of input values as input (step S161) and extracts one input value from the dictionary (step S162).

The mutation unit 1241 mutates the extracted input value using a predetermined method, such as randomly changing the value (step S163). The mutation unit 1241 adds the mutated input value to the dictionary (step S164).

If the mutation unit 1241 has not mutated all of the input values to be mutated (step S165: No), it returns to step S163 and mutates the input values to be mutated. If the mutation unit 1241 has mutated all of the input values to be mutated (step S165: Yes), it returns the dictionary of updated input values (step S166) and ends the mutation process.
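The mutation loop can be sketched as below. The specification leaves the mutation method open ("such as randomly changing the value"), so the single random bit flip used here is only one possible choice. The function name `mutate_dictionary` and the seed inputs are hypothetical.

```python
import random
from typing import List


def mutate_dictionary(dictionary: List[bytes], rng: random.Random) -> List[bytes]:
    """Return the dictionary extended with one bit-flipped variant per input value."""
    mutated: List[bytes] = []
    for value in dictionary:           # S162: take one input value at a time
        data = bytearray(value)
        if data:
            # S163 analogue: flip one randomly chosen bit.
            position = rng.randrange(len(data))
            data[position] ^= 1 << rng.randrange(8)
        mutated.append(bytes(data))    # S164: add the mutated value
    return dictionary + mutated        # S166: updated dictionary


seeds = [b"var x = 1;", b"print(x)"]
updated = mutate_dictionary(seeds, random.Random(0))
```

Passing the random number generator explicitly keeps a fuzzing run reproducible from its seed.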

[Execution process procedure]
Next, a description will be given of the flow of the execution process shown in Fig. 16. Fig. 30 is a flowchart showing the processing procedure of the execution process shown in Fig. 16.

The fuzzing execution unit 1242 receives the script to be analyzed as input (step S171). The fuzzing execution unit 1242 receives the VPC, dispatcher, and conditional branch flag as input (step S172).

The fuzzing execution unit 1242 monitors the VM instructions and VPC being executed (step S173), and executes the script to be analyzed (step S174). The fuzzing execution unit 1242 records the VM instructions executed during execution, and obtains a VM execution trace (step S175). The fuzzing execution unit 1242 returns the VM execution trace (step S176), and ends the execution process.

[Code coverage calculation process procedure]
Next, a description will be given of the flow of the code coverage calculation process shown in Fig. 16. Fig. 31 is a flowchart showing the processing procedure of the code coverage calculation process shown in Fig. 16.

The code coverage calculation unit 1245 counts the number of all nodes and edges in the first control flow graph (step S181). The code coverage calculation unit 1245 counts the number of all nodes and edges in the second control flow graph (step S182).

The code coverage calculation unit 1245 calculates the ratio of the number of nodes and edges of the second control flow graph to the number of nodes and edges of the first control flow graph (step S183). The code coverage calculation unit 1245 may calculate, as code coverage, both the ratio of the number of nodes of the second control flow graph to the number of nodes of the first control flow graph and the ratio of the number of edges of the second control flow graph to the number of edges of the first control flow graph, or may calculate either one of them.

The code coverage calculation unit 1245 returns the calculated percentage as the code coverage (step S184) and ends the code coverage calculation process.
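Steps S181 to S184 reduce to two ratios. A minimal sketch, assuming each control flow graph is represented as a `(nodes, edges)` pair of sets and that the first graph is non-empty (the representation and the name `code_coverage` are assumptions):

```python
from typing import Set, Tuple

Cfg = Tuple[Set[int], Set[Tuple[int, int]]]  # (nodes, edges)


def code_coverage(first: Cfg, second: Cfg) -> Tuple[float, float]:
    """Return (node coverage, edge coverage) of the second graph against the first."""
    first_nodes, first_edges = first
    second_nodes, second_edges = second
    node_coverage = len(second_nodes) / len(first_nodes)
    edge_coverage = len(second_edges) / len(first_edges)
    return node_coverage, edge_coverage


# First graph: the entire path from multi-path execution (hypothetical values).
first_cfg: Cfg = ({10, 15, 20, 30},
                  {(5, 10), (5, 15), (13, 20), (13, 30), (25, 10)})
# Second graph: the path executed by one fuzzing test.
second_cfg: Cfg = ({10, 20}, {(5, 10), (13, 20)})
node_cov, edge_cov = code_coverage(first_cfg, second_cfg)
```

Here the test covers 2 of 4 nodes and 2 of 5 edges.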

[Effects of the embodiment]
In this way, the vulnerability detection device 10 according to the embodiment analyzes the VM of the script engine, collects VM instructions, and determines the contents of the collected VM instructions to obtain information on the instruction set architecture, which is the system of VM instructions. Based on the obtained architecture information, the vulnerability detection device 10 constructs a first control flow graph showing an entire path. Based on the obtained architecture information, the vulnerability detection device 10 mutates an input value, and constructs a second control flow graph showing a path executed as a test by inputting the mutated input value to the analysis target script. The vulnerability detection device 10 calculates code coverage, selects an input value based on the code coverage calculation result, and performs fuzzing on the analysis target script executed on the virtual machine.

For this reason, even for script engines that provide no support functions usable for code coverage measurement and whose internal specifications are unknown, the vulnerability detection device 10 can analyze the VM of the script engine and obtain information on the instruction set architecture, which is the system of VM instructions. This achieves gray-box fuzzing based on accurate code coverage at runtime without the need for individual manual analysis, design, and implementation.

Specifically, the vulnerability detection device 10 executes the test script while monitoring the binary of the script engine, and obtains branch traces and memory access traces as execution traces. The vulnerability detection device 10 analyzes the virtual machine based on the execution traces, and obtains architecture information on the VM instruction boundaries, VPC, dispatcher, conditional branch flag, and code cache. Furthermore, the vulnerability detection device 10 executes the test script while monitoring the VPC and dispatcher, and obtains a VM execution trace. By analyzing the VM execution trace, the vulnerability detection device 10 collects VM instructions, determines the contents of the VM instructions, and obtains information on the instruction set architecture.

In this way, even for a script engine whose VM's internal specifications are unknown, the vulnerability detection device 10 can detect architecture information including information indicating where in the VM the bytecode generated by the script engine is stored, and information on the instruction set architecture of the bytecode that the VM can interpret.

Then, based on the acquired architecture information, the vulnerability detection device 10 constructs a first control flow graph of the entire path comprehensively through multi-path execution. Based on the acquired architecture information, the vulnerability detection device 10 repeats the following steps to fuzz the script: mutating the input value, inputting the mutated input value into the script to be analyzed, executing the test, constructing a second control flow graph showing the path executed as the test, and calculating the code coverage.

As a result, the vulnerability detection device 10 can detect various architectural information by analyzing the execution trace and VM execution trace obtained, even for script engines whose VM internal specifications are unknown, and can perform gray-box fuzzing on running scripts without requiring manual reverse engineering.

In addition, the vulnerability detection device 10 selectively chooses and mutates input values that will increase the code coverage based on the results of the code coverage calculation, allowing for more appropriate execution of gray box fuzzing.
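The selection of inputs that increase code coverage can be sketched as a small feedback loop. The sketch below is a generic coverage-guided loop under stated assumptions, not the device's actual procedure: `run_target` stands in for executing the analysis target script and building the second control flow graph, `toy_target` is a made-up target with three edges, and all names are hypothetical.

```python
import random
from typing import Callable, List, Set, Tuple

Edge = Tuple[str, str]


def fuzz(run_target: Callable[[bytes], Set[Edge]], seeds: List[bytes],
         total_edges: int, rounds: int, rng: random.Random) -> Tuple[List[bytes], float]:
    """Keep mutated inputs that reach new edges; return the corpus and edge coverage."""
    corpus = list(seeds)
    covered: Set[Edge] = set()
    for seed in corpus:
        covered |= run_target(seed)
    for _ in range(rounds):
        parent = rng.choice(corpus)
        data = bytearray(parent)
        data[rng.randrange(len(data))] = rng.randrange(256)  # random byte mutation
        child = bytes(data)
        new_edges = run_target(child) - covered
        if new_edges:
            # Coverage increased, so the mutated input is kept for further mutation.
            covered |= new_edges
            corpus.append(child)
    return corpus, len(covered) / total_edges


def toy_target(data: bytes) -> Set[Edge]:
    """Made-up target: the executed edge depends on the first input byte."""
    edges: Set[Edge] = {("entry", "check")}
    edges.add(("check", "high") if data[0] > 127 else ("check", "low"))
    return edges


seeds = [b"\x00"]
corpus, coverage = fuzz(toy_target, seeds, total_edges=3, rounds=200,
                        rng=random.Random(0))
```

Inputs that add no new edge are discarded, so the corpus grows only when coverage does.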

Furthermore, the vulnerability detection device 10 can automatically perform gray-box fuzzing on scripts for a variety of script engines as long as a test script is prepared, so that the fuzzing can be performed without the need for individual design or implementation.

As a result, the vulnerability detection device 10 can perform gray-box fuzzing on running scripts, even those written in various scripting languages, and discover potential vulnerabilities.

In this way, the vulnerability detection device 10 according to this embodiment can perform gray-box fuzzing on scripts written in a wide variety of scripting languages by analyzing a script engine whose VM's internal specifications are unknown and acquiring information about the VM's architecture and instruction set architecture.

In addition, the vulnerability detection device 10 according to this embodiment is useful for discovering vulnerabilities in a wide variety of scripts, and is suitable for efficiently discovering vulnerabilities that lie in scripts, because gray-box fuzzing takes into account the code coverage achieved by each input value.

Therefore, the vulnerability detection device 10 according to this embodiment can be used to discover and fix potential vulnerabilities by performing gray-box fuzzing on various scripts.

[System configuration of the embodiment]
Each component of the vulnerability detection device 10 shown in Fig. 3 is a functional concept, and does not necessarily have to be physically configured as shown. In other words, the specific form of distribution and integration of the functions of the vulnerability detection device 10 is not limited to that shown in the figure, and all or part of it can be functionally or physically distributed or integrated in any unit depending on various loads, usage conditions, etc.

Furthermore, each process performed by the vulnerability detection device 10 may be realized, in whole or in part, by a CPU and a program analyzed and executed by the CPU. Furthermore, each process performed by the vulnerability detection device 10 may be realized as hardware using wired logic.

Furthermore, among the processes described in the embodiments, all or part of the processes described as being performed automatically can be performed manually. Alternatively, all or part of the processes described as being performed manually can be performed automatically using known methods. In addition, the information including the processing procedures, control procedures, specific names, various data, and parameters described above and illustrated in the drawings can be changed as appropriate unless otherwise specified.

[Program]
Fig. 32 is a diagram showing an example of a computer in which a program is executed to realize the vulnerability detection device 10. The computer 1000 has, for example, a memory 1010 and a CPU 1020. The computer 1000 also has a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These components are connected by a bus 1080.

The memory 1010 includes a ROM 1011 and a RAM 1012. The ROM 1011 stores a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1090. The disk drive interface 1040 is connected to a disk drive 1100. A removable storage medium such as a magnetic disk or optical disk is inserted into the disk drive 1100. The serial port interface 1050 is connected to a mouse 1110 and a keyboard 1120, for example. The video adapter 1060 is connected to a display 1130, for example.

The hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the programs that define each process of the vulnerability detection device 10 are implemented as program modules 1093 in which code executable by the computer 1000 is written. The program modules 1093 are stored, for example, in the hard disk drive 1090. For example, a program module 1093 for executing processes similar to the functional configuration of the vulnerability detection device 10 is stored in the hard disk drive 1090. The hard disk drive 1090 may be replaced by an SSD (Solid State Drive).

Furthermore, the setting data used in the processing of the above-mentioned embodiment is stored as program data 1094, for example, in memory 1010 or hard disk drive 1090. Then, the CPU 1020 reads the program module 1093 or program data 1094 stored in memory 1010 or hard disk drive 1090 into RAM 1012 as necessary and executes it.

The program module 1093 and program data 1094 may not necessarily be stored in the hard disk drive 1090, but may be stored in a removable storage medium, for example, and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and program data 1094 may be stored in another computer connected via a network (such as a LAN (Local Area Network), WAN (Wide Area Network)). The program module 1093 and program data 1094 may then be read by the CPU 1020 from the other computer via the network interface 1070.

The above describes an embodiment of the invention made by the inventor, but the present invention is not limited to the descriptions and drawings that form part of the disclosure of the present invention according to this embodiment. In other words, other embodiments, examples, and operational techniques made by those skilled in the art based on this embodiment are all included in the scope of the present invention.

REFERENCE SIGNS LIST
10 Vulnerability detection device
11 Input unit
12 Control unit
13 Memory unit
14 Output unit
121 Virtual machine analysis unit
122 Instruction set architecture analysis unit
123 Calculation unit
124 Vulnerability detection unit
131 Execution trace database (DB)
132 Architecture information DB
133 VM execution trace DB
1211 Execution trace acquisition unit
1212 VM instruction boundary detection unit
1213 Virtual program counter detection unit
1214 Dispatcher detection unit
1215 Conditional branch flag detection unit
1216 Code cache detection unit
1221 VM execution trace acquisition unit
1222 VM instruction collection unit
1223 VM instruction determination unit
1231 Multi-path execution unit
1232, 1243 VM branch trace construction unit
1233, 1244 Control flow graph construction unit
1241 Mutation unit
1242 Fuzzing execution unit
1245 Code coverage calculation unit

Claims

1. A vulnerability detection device comprising:
a first analysis unit that analyzes a virtual machine of a script engine;
a second analysis unit that analyzes an instruction set architecture, which is an instruction system of the virtual machine, collects virtual machine instructions, and determines instruction contents of the collected virtual machine instructions;
a calculation unit that constructs a first control flow graph that indicates an entire path based on architecture information acquired by the first analysis unit and the second analysis unit; and
a vulnerability detection unit that mutates an input value based on architecture information acquired by the first analysis unit and the second analysis unit, constructs a second control flow graph showing a path executed as a test by inputting the mutated input value to an analysis target script, calculates code coverage which is a ratio of the path executed as the test to the entire path, selects an input value based on a calculation result of the code coverage, and performs fuzzing on the analysis target script executed on a virtual machine.
2. The vulnerability detection device according to claim 1, characterized in that the calculation unit constructs the first control flow graph showing the entire path comprehensively executed by multi-path execution based on the architecture information acquired by the first analysis unit and the second analysis unit.
3. The vulnerability detection device according to claim 1, characterized in that, when the code coverage increases above a predetermined value, the vulnerability detection unit selects the mutated input value, adds it to the input values, and inputs a further mutated input value obtained from the added input value to the analysis target script.
4. The vulnerability detection device according to claim 1, characterized in that the first analysis unit uses differential execution analysis to analyze multiple execution traces obtained by changing execution conditions, and detects a conditional branch flag, which is an area that holds a flag indicating whether or not a branch is taken when a conditional branch occurs in the execution state.
5. The vulnerability detection device according to claim 4, wherein the first analysis unit comprises:
a first acquisition unit that acquires a plurality of execution traces by changing execution conditions;
a first detection unit that clusters the execution traces to detect boundaries of each virtual machine instruction;
a second detection unit that analyzes the execution traces using a differential execution analysis focusing on the number of memory reads and the boundaries of each virtual machine instruction detected by the first detection unit, and detects a virtual program counter that is a variable indicating the next instruction of the virtual machine to be executed;
a third detection unit that analyzes the binary of the script engine based on the boundaries of each virtual machine instruction detected by the first detection unit and detects a dispatcher; and
a fourth detection unit that analyzes the plurality of execution traces using differential execution analysis focusing on the number of memory reads and detects a conditional branch flag that is an area that holds a flag indicating whether or not a branch is to be taken at the time of a conditional branch in an execution state.
6. The vulnerability detection device according to claim 5, wherein the second analysis unit comprises:
a second acquisition unit that acquires a virtual machine execution trace, which is an execution trace executed in the virtual machine;
a first collection unit that executes a test script while monitoring the virtual program counter and the dispatcher to obtain the virtual machine execution trace, and collects virtual machine instructions from the virtual machine execution trace; and
a first determination unit that determines whether a virtual machine instruction is a branch virtual machine instruction based on a variation in a change amount of a virtual program counter for each virtual machine opcode of the virtual machine execution trace.
7. The vulnerability detection device according to claim 6, wherein
the calculation unit comprises:
a first execution unit that executes a script to be analyzed in multiple paths while acquiring a virtual machine execution trace, which is an execution trace executed in the virtual machine, based on architecture information acquired by the first analysis unit and the second analysis unit;
a first construction unit that detects a branch virtual machine instruction from the virtual machine execution trace acquired by the first execution unit and constructs a first virtual machine branch trace in which virtual program counters before and after execution of the detected branch virtual machine instruction are associated with each other; and
a second construction unit that constructs the first control flow graph based on the first virtual machine branch trace, with basic blocks as nodes and branches resulting from execution of branch virtual machine instructions as edges,
the vulnerability detection unit comprises:
a mutation unit that mutates an input value;
a second execution unit that inputs the mutated input value into the analysis target script and executes the test while acquiring a virtual machine execution trace;
a third construction unit that detects a branch virtual machine instruction from the virtual machine execution trace acquired by the second execution unit and constructs a second virtual machine branch trace in which virtual program counters before and after execution of the detected branch virtual machine instruction are associated with each other;
a fourth construction unit that constructs the second control flow graph based on the second virtual machine branch trace, with basic blocks as nodes and branches resulting from execution of branch virtual machine instructions as edges; and
a first calculation unit that calculates, as the code coverage, a ratio of a number of nodes and/or a number of edges of the second control flow graph to a number of nodes and/or a number of edges of the first control flow graph, and
the vulnerability detection unit, when the code coverage increases to a value greater than a predetermined value, selects the mutated input value, adds it to the input values, and causes the mutation unit to mutate it, and the first execution unit inputs the mutated input value into the analysis target script and executes the test.
8. A vulnerability detection method executed by a vulnerability detection device, the method comprising:
a first analysis step of analyzing a virtual machine of a script engine;
a second analysis step of analyzing an instruction set architecture, which is an instruction system of the virtual machine, to collect virtual machine instructions and determine the instruction contents of the collected virtual machine instructions;
a calculation step of constructing a first control flow graph showing an entire path based on architecture information acquired in the first analysis step and the second analysis step; and
an execution step of mutating an input value based on architecture information acquired in the first analysis step and the second analysis step, constructing a second control flow graph showing a path executed as a test by inputting the mutated input value to an analysis target script, calculating a code coverage that is a ratio of the path executed as the test to the entire path, selecting an input value based on a calculation result of the code coverage, and performing fuzzing on the analysis target script executed on a virtual machine.
9. A vulnerability detection program that causes a computer to execute:
a first analysis step of analyzing a virtual machine of a script engine;
a second analysis step of analyzing an instruction set architecture, which is an instruction system of the virtual machine, to collect virtual machine instructions and determine instruction contents of the collected virtual machine instructions;
a calculation step of constructing a first control flow graph showing an entire path based on architecture information acquired in the first analysis step and the second analysis step; and
an execution step of mutating an input value based on architecture information acquired in the first analysis step and the second analysis step, constructing a second control flow graph showing a path executed as a test by inputting the mutated input value to an analysis target script, calculating code coverage which is the ratio of the path executed as the test to the entire path, selecting an input value based on a calculation result of the code coverage, and performing fuzzing on the analysis target script executed on a virtual machine.