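WO2024079793A1 - Vulnerability discovery device, vulnerability discovery method, and vulnerability discovery program - Google Patents
Vulnerability discovery device, vulnerability discovery method, and vulnerability discovery program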
- Publication number: WO2024079793A1 (PCT/JP2022/037924)
- Authority: WO, WIPO (PCT)
- Prior art keywords: instruction, virtual machine, execution, unit, branch
- Legal status: Ceased (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/57—Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
Definitions
- The present invention relates to a vulnerability discovery device, a vulnerability discovery method, and a vulnerability discovery program.
- Dynamic testing is a type of software testing that actually provides input values to the target program, runs it, and observes its behavior.
- Fuzzing is a technique for discovering vulnerabilities that may exist in software. It repeatedly generates and mutates input values, runs the target program, observes the program's execution state, and searches for inputs that cause problems, such as crashes, that could lead to vulnerabilities.
- Script engines are also known as interpreters. Because script engines may execute untrusted scripts received from outside, it is important to discover their vulnerabilities before attackers do.
- In the first method of fuzzing a script engine, a script is used as the fuzzing seed and is mutated while being input to the script engine to search for vulnerabilities.
- In the second method, the bytecode generated from the script is used as the seed and is likewise input to the script engine.
- As an example of the first method, Non-Patent Document 1 defines a unique intermediate representation to fuzz the PHP script engine.
- The PHP script is converted into this intermediate representation, mutated in that form, and then converted back into a PHP script that is used as the input value.
- Non-Patent Document 2 fuzzes JavaScript (registered trademark) engines. It statically analyzes JavaScript code to obtain type information and mutates abstract syntax trees, which makes it possible to mutate code while preserving types and structure.
- Non-Patent Document 3 proposes a method of mutating DEX bytecode to fuzz Android's ART VM.
- In Non-Patent Document 4, to fuzz Java VMs (JVMs) efficiently, the type of mutation applied to Java bytecode is selected using Markov Chain Monte Carlo, with the uniqueness of the code coverage observed at runtime as the indicator.
- Non-Patent Document 5 uses mutation and selection techniques that aim to keep the mutated bytecode valid, that is, executable by the JVM, so that fuzzing can discover defects hidden deeper within the JVM.
- Non-Patent Documents 1 and 2 fuzz only by mutating scripts, so the comprehensiveness of their tests is limited when viewed in terms of bytecode sequences.
- Non-Patent Documents 3, 4, and 5 rely on VM-specific information to mutate bytecode, so they cannot be applied to VMs whose internal specifications are unknown.
- The present invention has been made in view of the above. It aims to provide a vulnerability discovery device, a vulnerability discovery method, and a vulnerability discovery program that can fuzz a script engine using bytecode as the input value, even when the internal specifications of its VM are unknown, without requiring individual manual analysis, design, and implementation.
- The vulnerability discovery device of the present invention includes three units. A first analysis unit analyzes the virtual machine of a script engine. A second analysis unit analyzes the instruction set architecture, which is the instruction system of the virtual machine, by collecting virtual machine instructions and determining the instruction contents of the collected instructions. An execution unit fuzzes the virtual machine with mutated code based on the architecture information obtained by the first and second analysis units.
- According to the present invention, fuzzing can be performed with bytecode as the input, even for script engines whose VM internal specifications are unknown, without individual manual analysis, design, and implementation.
- FIG. 1 is a diagram illustrating an example of the configuration of a script engine.
- FIG. 2 is a diagram showing pseudo code of a VM included in the script engine.
- FIG. 3 is a diagram illustrating an example of the configuration of the vulnerability discovery device according to the embodiment.
- FIG. 4 is a diagram showing an example of a test script used for detecting a virtual program counter (VPC).
- FIG. 5 is a diagram showing an example of a test script used for detecting a branch VM instruction.
- FIG. 6 is a diagram illustrating an example of an execution trace.
- FIG. 7 illustrates an example of a VM execution trace.
- FIG. 8 is a diagram illustrating the process of the VM instruction boundary detection unit.
- FIG. 9 is a diagram for explaining the process of the virtual program counter detection unit.
- FIG. 10 is a diagram illustrating the process of the dispatcher detection unit.
- FIG. 11 is a diagram illustrating the process of the code cache detection unit.
- FIG. 12 is a diagram illustrating the process of the VM instruction determination unit.
- FIG. 13 is a diagram for explaining the process of the mutation unit.
- FIG. 14 is a flowchart illustrating a processing procedure of the analysis process according to the embodiment.
- FIG. 15 is a flowchart illustrating the procedure of the execution trace acquisition process shown in FIG.
- FIG. 16 is a flowchart illustrating a procedure of the VM instruction boundary detection process illustrated in FIG.
- FIG. 17 is a flowchart showing the processing procedure of the virtual program counter detection processing shown in FIG.
- FIG. 18 is a flowchart illustrating the procedure of the dispatcher detection process shown in FIG.
- FIG. 19 is a diagram for explaining the conditional branch flag detection process shown in FIG.
- FIG. 20 is a flowchart illustrating the processing procedure of the code cache detection processing shown in FIG.
- FIG. 21 is a flowchart illustrating the procedure of the VM execution trace acquisition process illustrated in FIG. 14.
- FIG. 22 is a flowchart illustrating the procedure of the VM instruction collection process illustrated in FIG.
- FIG. 23 is a flowchart illustrating a processing procedure of the VM command determination processing shown in FIG.
- FIG. 24 is a flowchart of the branch VM instruction determination process shown in FIG.
- FIG. 25 is a flowchart illustrating the procedure of the conditional branch VM instruction determination process illustrated in FIG.
- FIG. 26 is a flowchart illustrating the procedure of the call VM instruction determination process illustrated in FIG.
- FIG. 27 is a flowchart illustrating the procedure of the return VM instruction determination process illustrated in FIG.
- FIG. 28 is a flowchart of the bytecode extraction process shown in FIG.
- FIG. 29 is a flowchart showing the procedure of the mutation process shown in FIG.
- FIG. 30 is a flowchart showing the procedure of the execution process shown in FIG.
- FIG. 31 is a diagram illustrating an example of a computer in which a program is executed to realize vulnerability discovery.
- The vulnerability discovery device of the embodiment can fuzz using bytecode as the input value, even for script engines whose VM internal specifications are unknown, without requiring individual manual analysis, design, and implementation.
- First, the vulnerability discovery device executes test scripts while monitoring the script engine binary, and acquires branch traces and memory access traces as execution traces.
- The vulnerability discovery device then analyzes the VM based on the execution traces. It acquires the following as architecture information: the VM instruction boundaries, the virtual program counter (VPC), the dispatcher, the conditional branch flag, and the code cache in which the VM instructions to be executed are stored.
- Next, the vulnerability discovery device executes test scripts while monitoring the VPC and the dispatcher to acquire VM execution traces. By analyzing the VM execution traces, it collects VM instructions, determines their contents, and obtains information on the instruction set architecture.
- Finally, the vulnerability discovery device fuzzes the VM with mutated code based on the acquired architecture information.
- Specifically, it achieves fuzzing by repeatedly extracting, mutating, embedding, and executing bytecode based on the acquired architecture information.
- That is, it executes a seed script while monitoring the code cache to extract bytecode from the code cache. It then mutates the extracted bytecode and embeds it back into the code cache for execution.
- FIG. 1 is a diagram for explaining an example of the configuration of a script engine.
- As shown in FIG. 1, the script engine 1 has a bytecode compiler 2 and a VM 3.
- The bytecode compiler 2 has a syntax analysis unit 4 and a bytecode generation unit 5.
- The VM 3 has a code cache unit 6, a fetch unit 7, a decode unit 8, and an execution unit 9. The fetch unit 7, decode unit 8, and execution unit 9 are executed repeatedly, and this repetition is called the interpreter loop. The script engine 1 accepts a script as input.
- The syntax analysis unit 4 receives the script, generates an Abstract Syntax Tree (AST) through lexical and syntactic analysis, and outputs the AST to the bytecode generation unit 5.
- The bytecode generation unit 5 receives the AST, converts it into bytecode, and stores the bytecode in the code cache unit 6.
- The fetch unit 7 fetches a VM opcode from the code cache unit 6 and outputs it to the decode unit 8.
- A VM opcode is the opcode portion of a VM instruction.
- The decode unit 8 receives the VM opcode, interprets it using a decoder/dispatcher, and dispatches it to the corresponding program.
- The execution unit 9 executes the program corresponding to the VM instruction. The contents of the script are carried out by executing VM instructions one after another in the repeated interpreter loop.
- FIG. 2 is a diagram showing pseudocode for the VM in the script engine.
- The pseudocode first initializes the VPC (line 1).
- The while loop is the interpreter loop (line 2).
- The VM opcode pointed to by the VPC is fetched from the code cache (line 3). It is then decoded and dispatched using a switch statement (lines 4, 5, and 7).
- The program corresponding to the dispatched VM opcode is executed (lines 6 and 8).
- A branch VM instruction is a VM instruction that causes a branch within the script.
- A conditional branch flag is an area holding a flag that indicates whether a branch is taken when a conditional branch occurs.
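The interpreter loop in FIG. 2 can be illustrated with a toy stack VM. The sketch below is hypothetical. The opcode names, the stack-machine design, and the bytecode layout are assumptions made for illustration, not the structure of any particular script engine. It shows the VPC, the fetch/decode/dispatch/execute cycle, a conditional branch flag, and branch VM instructions.

```python
from typing import List

# Hypothetical opcodes for a toy stack VM (illustrative only).
PUSH, ADD, LT, JUMP_IF, JUMP, HALT = range(6)


def run(code: List[int]) -> List[int]:
    """Run bytecode in a toy interpreter loop and return the final stack."""
    vpc = 0                # virtual program counter: index into the code cache
    stack: List[int] = []
    cond_flag = False      # conditional branch flag
    while True:            # interpreter loop
        opcode = code[vpc]                 # fetch the VM opcode pointed to by the VPC
        if opcode == PUSH:                 # decode/dispatch to the matching handler
            stack.append(code[vpc + 1])
            vpc += 2
        elif opcode == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            vpc += 1
        elif opcode == LT:                 # compare and write the flag
            b, a = stack.pop(), stack.pop()
            cond_flag = a < b
            vpc += 1
        elif opcode == JUMP_IF:            # conditional branch VM instruction
            vpc = code[vpc + 1] if cond_flag else vpc + 2
        elif opcode == JUMP:               # unconditional branch VM instruction
            vpc = code[vpc + 1]
        elif opcode == HALT:
            return stack
        else:
            raise ValueError(f"unknown opcode {opcode} at VPC {vpc}")
```

For example, `run([PUSH, 2, PUSH, 3, LT, JUMP_IF, 10, PUSH, 0, HALT, PUSH, 1, HALT])` takes the branch because 2 < 3. It jumps to index 10 and returns `[1]`.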
- FIG. 3 is a diagram illustrating an example of the configuration of the vulnerability discovery device according to the embodiment.
- As shown in FIG. 3, the vulnerability discovery device 10 has an input unit 11, a control unit 12, a storage unit 13, and an output unit 14.
- The vulnerability discovery device 10 accepts a test script, a script engine binary, and a seed script as input.
- The input unit 11 consists of input devices such as a keyboard and a mouse. It accepts information from the outside and inputs it to the control unit 12.
- The input unit 11 also has a communication interface for exchanging various information with other devices connected by wire or over a network, and accepts information sent from those devices.
- The input unit 11 accepts the test scripts, the script engine binary, and the seed script, and outputs them to the control unit 12.
- A test script is a script input when dynamically analyzing a script engine to obtain an execution trace and a VM execution trace. Test scripts are described in detail later.
- A script engine binary is an executable file that constitutes a script engine.
- A script engine binary may consist of multiple executable files.
- A seed script is a script containing the bytecode that serves as the initial input value.
- The control unit 12 has an internal memory for storing programs that define various processing procedures and the necessary data, and executes various processes using them.
- The control unit 12 is an electronic circuit such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit).
- The control unit 12 has a virtual machine analysis unit 121 (first analysis unit), an instruction set architecture analysis unit 122 (second analysis unit), and a vulnerability discovery unit 123.
- The virtual machine analysis unit 121 analyzes the VM of the script engine.
- It acquires multiple execution traces under different run-time conditions, analyzes them using differential execution analysis, and obtains the VPC and the conditional branch flag.
- It also analyzes the script engine binary to obtain the VM instruction boundaries and the dispatcher.
- It further detects the code cache from the VM execution trace.
- The VM instructions to be executed are stored in the code cache.
- The virtual machine analysis unit 121 has an execution trace acquisition unit 1211 (first acquisition unit), a VM instruction boundary detection unit 1212 (first detection unit), a virtual program counter detection unit 1213 (second detection unit), a dispatcher detection unit 1214 (third detection unit), a conditional branch flag detection unit 1215 (fourth detection unit), and a code cache detection unit 1216 (fifth detection unit).
- The execution trace acquisition unit 1211 accepts the test script and the script engine binary as input.
- It acquires an execution trace by executing the test script while monitoring the execution of the script engine binary.
- An execution trace consists of a branch trace and a memory access trace.
- A branch trace records, for each executed branch instruction, its type, the branch source address, and the branch destination address.
- A memory access trace records the type of each memory operation and the memory address it targets. Branch traces and memory access traces can be acquired with instruction hooks, which is a known technique.
- The execution trace acquired by the execution trace acquisition unit 1211 is stored in the execution trace DB 131.
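The sketch below is a minimal illustration of how the two trace types could be represented. The class and field names are hypothetical. Recording the value read or written is an assumption that goes beyond the two fields the text names. The read-count helper computes the per-address statistic that differential execution analysis compares across runs.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class BranchRecord:
    kind: str      # branch instruction type, e.g. "jmp", "call", "ret" (illustrative labels)
    source: int    # branch source address
    dest: int      # branch destination address


@dataclass(frozen=True)
class MemAccessRecord:
    op: str         # memory operation type: "read" or "write"
    address: int    # memory address of the operation target
    value: int = 0  # value read or written (assumption, not named in the text)


@dataclass
class ExecutionTrace:
    branches: List[BranchRecord] = field(default_factory=list)
    mem_accesses: List[MemAccessRecord] = field(default_factory=list)

    def read_counts(self) -> Dict[int, int]:
        """Number of reads per memory address."""
        return dict(Counter(m.address for m in self.mem_accesses if m.op == "read"))
```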
- The VM instruction boundary detection unit 1212 clusters the execution traces to detect the boundaries of each VM instruction.
- Specifically, it clusters the execution traces and detects clusters whose execution count is at or above a threshold as VM instructions. Clustering detects contiguous code regions that are executed multiple times. For example, executed instructions that are close to each other in the code may be grouped together, common subsequences of executed code blocks may be searched for, or other methods may be used.
- The vulnerability discovery device 10 detects the start and end points of the contiguous instruction sequence that makes up each detected VM instruction as its boundaries.
- The VM instruction boundaries detected here are used in VPC detection and dispatcher detection.
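One of the clustering options mentioned above is grouping executed instructions that are close to each other in the code. It can be sketched as follows. Several choices here are assumptions made for illustration: the input format (a flat list of executed native instruction addresses), the `max_gap` distance, and taking a region's peak hit count as its execution count.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def detect_vm_instruction_boundaries(
    executed: Iterable[int], max_gap: int = 16, min_count: int = 2
) -> List[Tuple[int, int]]:
    """Group executed addresses into nearby regions; keep regions run >= min_count times."""
    hits = Counter(executed)
    addrs = sorted(hits)
    if not addrs:
        return []
    boundaries: List[Tuple[int, int]] = []
    start = prev = addrs[0]
    peak = hits[start]
    for addr in addrs[1:]:
        if addr - prev <= max_gap:
            peak = max(peak, hits[addr])      # still inside the same region
        else:
            if peak >= min_count:
                boundaries.append((start, prev))
            start, peak = addr, hits[addr]    # open a new region
        prev = addr
    if peak >= min_count:
        boundaries.append((start, prev))
    return boundaries
```

Each returned pair is the (start, end) address of one candidate VM instruction.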
- The virtual program counter detection unit 1213 retrieves the execution traces for the first test script from the execution trace DB 131 and analyzes them to detect the VPC.
- It analyzes multiple execution traces using differential execution analysis that focuses on the number of memory reads, together with the VM instruction boundaries detected by the VM instruction boundary detection unit 1212.
- It relies on the fact that the memory holding the VPC is always read after each VM instruction executes, and detects the VPC by finding the target of this read.
- It compares the execution traces acquired with multiple test scripts. It then finds memory whose read count changes in proportion to both the number of repetitions and the number of repeated statements as these are increased or decreased.
- It then refers to the VM instruction boundaries detected by the VM instruction boundary detection unit 1212. It narrows the candidates down to memory whose read values always point to the start point of a VM instruction.
- The virtual program counter detection unit 1213 detects this memory as the VPC.
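The two VPC filters can be sketched as follows, under several assumptions:

- Each run is summarized as (repetitions, statements, read count per address).
- Read counts grow linearly with repetitions times statements, plus a constant start-up overhead. The slope between runs must therefore be positive and roughly constant.
- `read_values` holds every value read from each address.

The function name, tolerance, and data layout are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Run = Tuple[int, int, Dict[int, int]]  # (repetitions, statements, read count per address)


def detect_vpc_candidates(
    runs: Sequence[Run],
    read_values: Dict[int, Set[int]],
    vm_insn_starts: Set[int],
    rel_tol: float = 0.1,
) -> List[int]:
    """Return addresses whose reads scale with the workload and always yield VM instruction starts."""
    ordered = sorted(runs, key=lambda r: r[0] * r[1])
    xs = [reps * stmts for reps, stmts, _ in ordered]
    addresses = set().union(*(counts.keys() for _, _, counts in ordered))
    candidates = []
    for addr in sorted(addresses):
        ys = [counts.get(addr, 0) for _, _, counts in ordered]
        # Slope of read count versus workload between consecutive runs.
        slopes = [
            (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
            for i in range(len(xs) - 1)
            if xs[i + 1] != xs[i]
        ]
        if not slopes or min(slopes) <= 0:
            continue  # read count does not grow with the workload
        if max(slopes) - min(slopes) > rel_tol * max(slopes):
            continue  # growth is not proportional
        values = read_values.get(addr, set())
        # Second filter: every value read must point to the start of a VM instruction.
        if values and values <= vm_insn_starts:
            candidates.append(addr)
    return candidates
```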
- The dispatcher detection unit 1214 extracts the portion belonging to each VM instruction from the script engine binary, based on the VM instruction boundaries detected by the VM instruction boundary detection unit 1212. It then detects the parts that are highly similar across VM instructions as dispatchers.
- A dispatcher is realized by referencing the pointer cache and jumping to the pointer of the next VM instruction handler.
- Dispatchers are distributed at the end of each VM instruction handler, and their code is generally almost identical.
- The vulnerability discovery device therefore detects the dispatcher by searching, with a specified method, for highly similar code at the end of the VM instruction handlers. To find the highly similar parts, a sequence alignment algorithm may be used, for example, or other methods may be used.
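As a much simpler stand-in for sequence alignment, the shared tail of the handlers can be found with a longest-common-suffix comparison. This sketch assumes each handler is available as a list of disassembled instruction strings; the mnemonics in the test are made up. It also assumes the dispatcher code is textually identical across handlers. Real handlers may differ in registers or offsets, and an alignment algorithm with a similarity score handles that case better.

```python
from typing import List, Sequence


def common_dispatcher_tail(handlers: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest instruction sequence shared by the end of every handler."""
    tail: List[str] = []
    # Walk all handlers backwards in lockstep; zip stops at the shortest handler.
    for column in zip(*(reversed(h) for h in handlers)):
        if all(insn == column[0] for insn in column):
            tail.append(column[0])
        else:
            break
    return list(reversed(tail))
```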
- The conditional branch flag detection unit 1215 retrieves the execution traces for the second test script from the execution trace DB 131 and analyzes them to discover the conditional branch flag.
- It analyzes multiple execution traces using differential execution analysis that focuses on the number of memory reads, and detects the conditional branch flag.
- Specifically, it executes conditional branches in various taken/not-taken patterns. It then detects the memory that stores the conditional branch flag by comparing the pattern of memory changes observed at that time with the conditional branch pattern in the test script.
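The comparison step can be sketched as follows. Each run of the second test script is assumed to be summarized as its taken/not-taken pattern plus, for each memory address, the sequence of values written to it. The sketch further assumes one write per conditional branch, with non-zero meaning taken. The function name and data layout are hypothetical. An address survives only if its write pattern matches the branch pattern in every run, which is how varying the patterns narrows the candidates.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

FlagRun = Tuple[List[bool], Dict[int, List[int]]]  # (branch pattern, values written per address)


def detect_cond_branch_flag(runs: Sequence[FlagRun]) -> Set[int]:
    """Return addresses whose written values track the taken/not-taken pattern in every run."""
    candidates: Optional[Set[int]] = None
    for pattern, writes in runs:
        matching = {
            addr for addr, values in writes.items()
            if [v != 0 for v in values] == pattern
        }
        candidates = matching if candidates is None else candidates & matching
    return candidates or set()
```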
- The code cache detection unit 1216 detects the code cache, the cache in which the VM instructions to be executed are stored. It does so based on the execution trace, the VPC, and the VM execution trace.
- It detects the memory area pointed to by the VPC in the VM execution trace as a code cache.
- From the execution trace, it identifies the code location that called the memory allocation function which allocated this code cache.
- It then detects, from the VM execution trace, all memory areas allocated at this code location as code caches.
- It also detects, from the execution trace, the code locations that write to the code cache.
- It treats writes by these code locations in the VM execution trace as updates to the code cache.
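The code cache detection steps can be sketched as follows. The sketch assumes the execution trace yields each allocation as (calling code location, base address, size) and each write as (writing code location, target address). These record layouts and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Set, Tuple


@dataclass(frozen=True)
class Allocation:
    call_site: int  # code location that called the memory allocation function
    base: int       # start address of the allocated area
    size: int

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


def detect_code_caches(allocs: Sequence[Allocation], vpc_values: Iterable[int]) -> List[Allocation]:
    """Areas allocated at the same call site as any area the VPC points into."""
    sites: Set[int] = {a.call_site for v in vpc_values for a in allocs if a.contains(v)}
    return [a for a in allocs if a.call_site in sites]


def code_cache_writers(caches: Sequence[Allocation], writes: Iterable[Tuple[int, int]]) -> Set[int]:
    """Code locations that write into any code cache, from (location, target address) pairs."""
    return {loc for loc, addr in writes if any(c.contains(addr) for c in caches)}
```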
- The instruction set architecture analysis unit 122 analyzes the instruction set architecture, which is the system of VM instructions.
- It collects VM instructions and determines the instruction contents of the collected VM instructions.
- The instruction set architecture analysis unit 122 has a VM execution trace acquisition unit 1221 (second acquisition unit), a VM instruction collection unit 1222 (first collection unit), and a VM instruction determination unit 1223 (first determination unit).
- The VM execution trace acquisition unit 1221 accepts the test scripts and the script engine binary as input.
- It acquires VM execution traces by monitoring the VPC and the pointers to the VM instruction handlers dispatched by the dispatcher.
- That is, it acquires VM execution traces, which are execution traces at the level of the VM, by executing test scripts while monitoring the execution of the script engine binary.
- It executes multiple test scripts to acquire VM execution traces.
- It links each pointer to a VM instruction handler with the corresponding VM instruction, and virtually assigns a VM opcode to each as an identifier.
- A VM execution trace is an execution trace at the level of the VM. For each executed VM instruction, it records the VPC and the VM opcode. The VM opcode is the identifier virtually assigned to the pointer to the executed VM instruction handler.
- The VPC can be recorded by monitoring the memory of the VPC detected by the virtual program counter detection unit 1213.
- A VM opcode here is an identifier virtually assigned to each linked pair of a VM instruction handler pointer and a VM instruction.
- The VM execution trace acquired by the VM execution trace acquisition unit 1221 is stored in the VM execution trace DB 133.
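The sketch below shows one way virtual VM opcodes could be assigned. It assumes each dispatch is observed as a (VPC value, handler pointer) pair. Numbering handlers in order of first appearance is an illustrative choice; any consistent identifier would do.

```python
from typing import Dict, Iterable, List, Tuple


def build_vm_execution_trace(
    dispatches: Iterable[Tuple[int, int]],  # (VPC value, pointer to the dispatched handler)
) -> Tuple[List[Tuple[int, int]], Dict[int, int]]:
    """Return the VM execution trace as (VPC, VM opcode) pairs plus the handler-to-opcode map."""
    opcode_of: Dict[int, int] = {}
    trace: List[Tuple[int, int]] = []
    for vpc, handler in dispatches:
        # A handler seen for the first time gets the next unused virtual opcode.
        opcode = opcode_of.setdefault(handler, len(opcode_of))
        trace.append((vpc, opcode))
    return trace, opcode_of
```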
- The VM instruction collection unit 1222 receives the VPC and the dispatcher as input, executes the script while monitoring them, and obtains the VM execution trace.
- It collects VM instructions from the VM execution trace.
- The VM instruction determination unit 1223 determines the instruction contents of the VM instructions collected by the VM instruction collection unit 1222. First, it determines whether each is a branch VM instruction based on the variation in the change of the VPC for each VM opcode in the VM execution trace.
- To do so, it retrieves and analyzes the VM execution traces stored in the VM execution trace DB 133. For each VM opcode assigned as an identifier, it collects the change in the VPC before and after that opcode executes. If the VM opcode is not a branch VM instruction, the change in the VPC is almost constant. If it is a branch VM instruction, the change varies with the branch destination.
- The VM instruction determination unit 1223 therefore determines whether an instruction is a branch VM instruction based on the variation in the change of the VPC for each VM opcode in the VM execution trace.
- This variation differs between branch VM instructions and other VM instructions. The unit therefore sets a threshold and determines instructions with greater variation to be branch VM instructions.
- Concretely, it evaluates the variation in the change of the VPC for each VM opcode using the variance. Opcodes whose variance is equal to or greater than a certain threshold are determined to be branch VM instructions.
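The variance test can be sketched with the standard library. The sketch assumes the VM execution trace is a list of (VPC, VM opcode) pairs in execution order. The change in the VPC caused by each instruction is then the next entry's VPC minus its own. The threshold value is an assumption left to the caller.

```python
from collections import defaultdict
from statistics import pvariance
from typing import Dict, List, Sequence, Set, Tuple


def find_branch_opcodes(trace: Sequence[Tuple[int, int]], threshold: float) -> Set[int]:
    """Return VM opcodes whose change in VPC has variance >= threshold."""
    deltas: Dict[int, List[int]] = defaultdict(list)
    # Pair each executed instruction with the next one to get its change in VPC.
    for (vpc, opcode), (next_vpc, _) in zip(trace, trace[1:]):
        deltas[opcode].append(next_vpc - vpc)
    # Opcodes observed only once are skipped: one sample says nothing about variation.
    return {
        opcode for opcode, ds in deltas.items()
        if len(ds) >= 2 and pvariance(ds) >= threshold
    }
```

In a trace where opcode 0 always advances the VPC by 2 but opcode 1 moves it by 16, -20, and 6, only opcode 1 exceeds a threshold of 1.0.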
- The VM instruction determination unit 1223 then determines which of the branch VM instructions are conditional branch VM instructions. When a conditional branch occurs, the conditional branch flag is always accessed to determine the branch destination. A conditional branch VM instruction can therefore be identified by checking whether the conditional branch flag is accessed when each branch VM instruction executes. If the flag is accessed, the branch VM instruction is a conditional branch VM instruction; if it is not accessed, it is not. Accordingly, based on the VM execution trace and the memory access trace, the VM instruction determination unit 1223 determines that branch VM instructions involving access to the conditional branch flag are conditional branch VM instructions.
- The VM instruction determination unit 1223 also determines call VM instructions and return VM instructions.
- A branch caused by a call VM instruction is characterized in that the address immediately following the caller's bytecode is saved. After the called subroutine executes, the return VM instruction returns to that saved address.
- Based on this characteristic, the VM instruction determination unit 1223 identifies a pair of branch VM instructions in which the later one branches back to the address immediately following the earlier one. It determines the earlier instruction to be a call VM instruction and the later one to be a return VM instruction.
- To determine conditional branch VM instructions, the VM instruction determination unit 1223 retrieves three inputs: the list of branch VM instruction opcodes from the architecture information DB 132, one VM execution trace from the VM execution trace DB 133, and the corresponding execution trace from the execution trace DB 131.
- It extracts one location in the VM execution trace where a branch VM instruction is executed. It then extracts from the execution trace the memory access trace corresponding to that execution.
- Based on the memory access trace, it determines whether the extracted branch VM instruction accesses the conditional branch flag. If it does, the VM instruction determination unit 1223 determines that the extracted branch VM instruction is a conditional branch VM instruction.
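The check can be sketched as follows. The sketch assumes each executed branch VM instruction has already been paired with the set of memory addresses accessed during its execution. That pairing comes from aligning the VM execution trace with the memory access trace. The function name is hypothetical.

```python
from typing import Iterable, Set, Tuple


def find_conditional_branch_opcodes(
    branch_execs: Iterable[Tuple[int, Set[int]]],  # (VM opcode, addresses accessed while it ran)
    branch_opcodes: Set[int],
    flag_addr: int,
) -> Set[int]:
    """Return branch VM opcodes that accessed the conditional branch flag when executed."""
    return {
        opcode for opcode, accessed in branch_execs
        if opcode in branch_opcodes and flag_addr in accessed
    }
```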
- To determine call VM instructions, the VM instruction determination unit 1223 scans the VM execution trace following a given branch VM instruction. If a later branch VM instruction branches to the location immediately following the given branch VM instruction, the given instruction is determined to be a call VM instruction.
- It retrieves the list of branch VM instruction opcodes from the architecture information DB 132 and one VM execution trace from the VM execution trace DB 133.
- It extracts one location in the VM execution trace where a branch VM instruction is executed.
- It scans the VM execution trace for branch VM instructions that appear after the extracted branch VM instruction. Based on the scan results, it determines whether any of them branches to the location immediately following the extracted branch VM instruction. If one does, the VM instruction determination unit 1223 determines that the extracted branch VM instruction is a call VM instruction.
- To determine return VM instructions, the VM instruction determination unit 1223 extracts a call VM instruction from the VM execution trace. If a branch VM instruction branches to the location immediately following the extracted call VM instruction, that branch VM instruction is determined to be a return VM instruction.
- It retrieves the list of branch VM instruction opcodes from the architecture information DB 132 and one VM execution trace from the VM execution trace DB 133.
- It extracts one location in the VM execution trace where a call VM instruction is executed.
- It scans the VM execution trace for branch VM instructions that appear after the extracted call VM instruction. It then determines whether any of them branches to the location immediately following the extracted call VM instruction.
- If so, the VM instruction determination unit 1223 determines that this branch VM instruction is a return VM instruction.
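The call/return heuristic can be sketched as follows. The VM execution trace is assumed to be a list of (VPC, VM opcode) pairs. `insn_len`, the length of each branch VM instruction in the code cache, is a hypothetical input the text does not specify. It is needed to compute the location immediately following a candidate call.

```python
from typing import Dict, Sequence, Set, Tuple


def find_call_return_opcodes(
    trace: Sequence[Tuple[int, int]],  # (VPC, VM opcode) in execution order
    branch_opcodes: Set[int],
    insn_len: Dict[int, int],          # hypothetical: instruction length per opcode
) -> Tuple[Set[int], Set[int]]:
    """Return (call opcodes, return opcodes) found by matching branches back to call sites."""
    calls: Set[int] = set()
    returns: Set[int] = set()
    for i, (vpc, opcode) in enumerate(trace):
        if opcode not in branch_opcodes:
            continue
        resume_at = vpc + insn_len[opcode]  # location immediately following this branch
        for j in range(i + 1, len(trace) - 1):
            later_op = trace[j][1]
            # Does a later branch land exactly on the location after the candidate call?
            if later_op in branch_opcodes and trace[j + 1][0] == resume_at:
                calls.add(opcode)
                returns.add(later_op)
                break
    return calls, returns
```

Consider a trace in which opcode 5 at VPC 2 branches away and opcode 6 later branches to VPC 4, immediately after it. Here opcode 5 is classified as a call and opcode 6 as a return. The heuristic only looks for a later branch landing right after a candidate, so a loop back-edge that happens to satisfy the same condition could also be matched.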
- The vulnerability discovery unit 123 fuzzes the VM with mutated code based on the architecture information acquired by the virtual machine analysis unit 121 and the instruction set architecture analysis unit 122.
- It executes a seed script while monitoring the code cache to extract bytecode from the code cache.
- It then mutates the extracted bytecode and embeds it back into the code cache for execution.
- The vulnerability discovery unit 123 has a mutation unit 1231 and a fuzzing execution unit 1232.
- The mutation unit 1231 mutates the bytecode extracted by the fuzzing execution unit 1232 using a predetermined method, such as adding, deleting, or changing VM instructions. It outputs the updated bytecode to the fuzzing execution unit 1232.
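The sketch below applies the three mutation operators named in the text: adding, deleting, and changing VM instructions. It assumes the extracted bytecode has already been split into instructions, represented here as hypothetical (opcode, operand bytes) tuples. It also assumes `known` is a non-empty list of instructions collected from the VM execution traces, from which additions and replacements are drawn.

```python
import random
from typing import List, Sequence, Tuple

Insn = Tuple[int, bytes]  # hypothetical: (virtual VM opcode, raw operand bytes)


def mutate(code: Sequence[Insn], known: Sequence[Insn], rng: random.Random) -> List[Insn]:
    """Return a copy of code with one VM instruction added, deleted, or changed."""
    out = list(code)
    operator = rng.choice(("add", "delete", "change")) if out else "add"
    if operator == "add":
        out.insert(rng.randrange(len(out) + 1), rng.choice(known))
    elif operator == "delete":
        del out[rng.randrange(len(out))]
    else:
        out[rng.randrange(len(out))] = rng.choice(known)
    return out
```

Mutating whole instructions rather than raw bytes keeps each mutant aligned on the detected instruction boundaries.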
- The fuzzing execution unit 1232 accepts the seed script, the VPC, and the code cache as input. It executes the seed script while monitoring the code cache and extracts bytecode from the code cache. It then re-embeds the mutated code at the location in the code cache pointed to by the VPC and resumes execution. The fuzzing execution unit 1232 outputs any input value (bytecode) that causes a problem, such as a crash, during execution.
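The extract-mutate-embed-execute cycle can be sketched as a loop. The steps of embedding the mutant at the VPC target and resuming execution are abstracted into a caller-supplied `execute` callback that returns False when the run crashes. This callback, the `mutate` callback, and the byte-level example in the test are hypothetical stand-ins for the code cache manipulation described in the text.

```python
import random
from typing import Callable, List, Sequence

Mutator = Callable[[bytes, random.Random], bytes]
Executor = Callable[[bytes], bool]  # embeds bytecode at the VPC target and runs it; False on crash


def fuzz_loop(
    seeds: Sequence[bytes],
    mutate: Mutator,
    execute: Executor,
    iterations: int,
    rng: random.Random,
) -> List[bytes]:
    """Repeatedly mutate bytecode extracted from seed runs and collect crashing inputs."""
    crashes: List[bytes] = []
    for _ in range(iterations):
        bytecode = rng.choice(seeds)   # bytecode extracted from the code cache
        mutant = mutate(bytecode, rng)
        if not execute(mutant):        # re-embed and resume; False signals a crash
            crashes.append(mutant)
    return crashes
```

The returned list corresponds to the input values the fuzzing execution unit outputs.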
- The storage unit 13 is realized by a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or by a storage device such as a hard disk or an optical disk. It stores the processing program that operates the vulnerability discovery device 10, the data used while the processing program runs, and so on.
- The storage unit 13 has an execution trace database (DB) 131, a VM execution trace DB 133, and an architecture information DB 132. The architecture information DB 132 stores the architecture information acquired by the virtual machine analysis unit 121 and the instruction set architecture analysis unit 122.
- the execution trace DB 131 and the VM execution trace DB 133 store the execution traces and VM execution traces acquired by the execution trace acquisition unit 1211 and the VM execution trace acquisition unit 1221, respectively.
- the execution trace DB 131 and the VM execution trace DB 133 are managed by the vulnerability discovery device 10.
- the execution trace DB 131 and the VM execution trace DB 133 may be managed by other devices (servers, etc.), in which case the execution trace acquisition units 1211 and 1221 output the acquired execution traces and VM execution traces to management servers, etc., of the execution trace DB 131 and the VM execution trace DB 133 via the communication interface of the output unit 14, and store them in the execution trace DB 131 and the VM execution trace DB 133.
- the output unit 14 is, for example, a liquid crystal display or a printer, and outputs various information including information related to the vulnerability detection device 10.
- the output unit 14 may also be an interface that handles the input and output of various data to and from an external device, and may output various information to the external device.
- Test script configuration: next, the test script will be described.
- a test script is a script that is input when dynamically analyzing a script engine. The test script focuses on the number of branch instruction executions and of memory reads and writes, and is used to capture the differences in script engine behavior that arise when parts of the test script are executed a different number of times. The test script is created manually in advance of the analysis, which requires knowledge of the specifications of the target script language.
- Figure 4 shows an example of a test script (first test script) used to detect VPCs.
- the first test script uses a repetitive process (line 2).
- the first test script changes the execution conditions and generates differences by increasing or decreasing the number of repetitions (line 2) and the number of repeated statements (lines 3 to 5) in the test script.
- FIG. 5 is a diagram showing an example of a test script (second test script) used to detect branch VM instructions.
- the second test script uses multiple conditional branches (lines 4 to 8).
- the branch conditions are controlled so that the multiple conditional branches are either taken or not taken in a specific order pattern (lines 1 and 5).
- the number of conditional branches and the order pattern of branch success or failure are changed to generate differences.
- Fig. 6 is a diagram showing an example of an execution trace. As described above, an execution trace is composed of a branch trace and a memory access trace. Fig. 6 shows an excerpt of an execution trace. The structure of an execution trace will be described below with reference to Fig. 6.
- Trace indicates whether the log line is a branch trace or a memory access trace.
- a branch trace log line has the format shown, for example, in lines 1 to 10 of Figure 6, and consists of three elements: type, src, and dst.
- type indicates whether the executed branch instruction was a call instruction, a jmp instruction, or a ret instruction.
- src indicates the address of the branch source, and dst indicates the address of the branch destination.
- a log line of a memory access trace has the format shown, for example, in lines 11 to 13 of Figure 6, and consists of three elements: type, target, and value.
- Type indicates whether the memory access is a read or write.
- Target indicates the memory address that is the target of the memory access. Value stores the result of the memory access.
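As an illustration of the record structure described above, the following Python sketch parses individual log lines into branch trace and memory access trace records. Fig. 6 itself is not reproduced here, so the textual layout the sketch assumes (`branch <type> <src> <dst>` and `mem <type> <target> <value>`, with hexadecimal numbers) is hypothetical; only the field names type, src, dst, target, and value come from the description.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class BranchRecord:
    type: str  # "call", "jmp", or "ret"
    src: int   # address of the branch source
    dst: int   # address of the branch destination


@dataclass(frozen=True)
class MemoryRecord:
    type: str    # "read" or "write"
    target: int  # memory address that was accessed
    value: int   # result of the memory access


def parse_trace_line(line: str) -> Union[BranchRecord, MemoryRecord]:
    """Parse one log line in the hypothetical layout described above."""
    kind, op, first, second = line.split()
    if kind == "branch" and op in ("call", "jmp", "ret"):
        return BranchRecord(op, int(first, 16), int(second, 16))
    if kind == "mem" and op in ("read", "write"):
        return MemoryRecord(op, int(first, 16), int(second, 16))
    raise ValueError(f"unrecognized trace line: {line!r}")
```

For example, `parse_trace_line("branch jmp 0x4010a0 0x4011c0")` yields a `BranchRecord` whose `type` is `"jmp"`.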
- Fig. 7 is a diagram showing an example of a VM execution trace.
- a VM execution trace is a record of a VM opcode and a VPC.
- Fig. 7 shows a part of a VM execution trace. The configuration of a VM execution trace will be described below with reference to Fig. 7.
- a log line of a VM execution trace is, for example, in the format shown in Figure 7, and consists of two elements: vpc and vmop (vm opcode).
- vpc indicates the value of the VPC.
- vmop indicates the value of the VM opcode that is virtually assigned to each pointer that points to the beginning of the VM instruction handler to be executed, obtained from the pointer cache.
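The VM opcode recorded in a VM execution trace is not a real opcode but an identifier assigned to each distinct handler pointer. A minimal sketch of that assignment follows. It assumes the tracer yields (VPC, handler pointer) pairs and numbers handlers in order of first appearance. Both are assumptions for illustration, not the patented encoding.

```python
from typing import Dict, Iterable, List, Tuple


def assign_virtual_opcodes(
    samples: Iterable[Tuple[int, int]],  # (vpc, pointer to the VM instruction handler)
) -> Tuple[List[Tuple[int, int]], Dict[int, int]]:
    """Turn (vpc, handler pointer) samples into VM execution trace rows (vpc, vmop)."""
    opcode_of: Dict[int, int] = {}
    rows: List[Tuple[int, int]] = []
    for vpc, handler in samples:
        # A handler seen for the first time receives the next unused opcode number.
        vmop = opcode_of.setdefault(handler, len(opcode_of))
        rows.append((vpc, vmop))
    return rows, opcode_of
```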
- the VM instruction boundary detection unit 1212 detects the boundaries of each VM instruction. In particular, it detects VM instructions and their boundaries for threaded code type VMs, which have no interpreter loop and whose VM instruction boundaries are therefore difficult to identify. Specifically, the VM instruction boundary detection unit 1212 extracts execution traces from the execution trace DB 131. Then, as shown in FIG. 8, the VM instruction boundary detection unit 1212 clusters the execution traces using a predetermined method, and detects clusters whose execution count is equal to or greater than a threshold as VM instructions (e.g., VM instruction handlers 1 to 3). The VM instruction boundary detection unit 1212 detects the start and end points of the consecutive instruction sequences that make up each VM instruction as boundaries.
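The clustering method is left open ("a predetermined method"). The sketch below substitutes the simplest possible grouping. It treats the straight-line code between consecutive branches in a branch trace as a block, counts identical blocks, and reports blocks executed at least `threshold` times as VM instruction handlers, given as (start, end) boundaries. Real handlers may contain internal branches, so this is an assumption-laden illustration only, and the function name is hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def detect_vm_instruction_boundaries(
    branches: Sequence[Tuple[int, int]],  # ordered (src, dst) pairs from a branch trace
    threshold: int,
) -> List[Tuple[int, int]]:
    """Return (start, end) addresses of blocks executed at least `threshold` times."""
    counts = Counter()
    # A block starts at one branch destination and ends at the next branch source.
    for (_, start), (end, _) in zip(branches, branches[1:]):
        counts[(start, end)] += 1
    # Frequently executed blocks are taken to be VM instruction handlers.
    return sorted(block for block, n in counts.items() if n >= threshold)
```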
- the virtual program counter detection unit 1213 detects the VPC and the pointer cache. The detection of the virtual program counter is realized by analyzing the log of the memory access trace of the acquired execution trace. The virtual program counter detection unit 1213 uses differential execution analysis focusing on the number of times memory is read.
- FIG. 9 is a diagram for explaining the processing of the virtual program counter detection unit 1213.
- the virtual program counter detection unit 1213 extracts one execution trace obtained with the first test script from the execution trace DB 131.
- the number of times the VPC is read is proportional to the number of repetitions in the test script and to the number of statements in the repetitive process. If the number of repetitions is N and the number of repeated statements is M, then approximately MN VPC reads occur. Accordingly, in the execution traces for first test scripts in which N and M are scaled to 2N and 2M, and to 3N and 3M, the virtual program counter detection unit 1213 extracts memory whose read count grows to about 4MN and 9MN, respectively. Specifically, as shown in FIG. 9, the virtual program counter detection unit 1213 extracts memory areas whose reads and writes increase monotonically with each VM instruction execution ((1) in FIG. 9).
- the virtual program counter detection unit 1213 detects as a VPC a memory value that has been read and that always points to the start point of a VM instruction. Specifically, the virtual program counter detection unit 1213 compares the VPC's pointing destination with the address of the VM instruction handler, and narrows it down to matching memory areas ((2) in FIG. 9).
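The two filters of FIG. 9 can be sketched as follows. The sketch assumes each run of the first test script is summarized as its repetition count N, its repeated-statement count M, and the ordered list of (address, value) memory reads. It also assumes the start addresses of the VM instructions are already known. The relative `tolerance` is a hypothetical parameter for the check against the expected N·M ratio, and the function name is illustrative.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple

MemRead = Tuple[int, int]  # (address read, value read)


def detect_vpc(
    runs: Sequence[Tuple[int, int, List[MemRead]]],  # (N, M, reads); at least two runs
    instruction_starts: Set[int],
    tolerance: float = 0.1,
) -> List[int]:
    """Return addresses whose read count scales with N*M and whose values
    always point to the start of a VM instruction."""
    base_n, base_m, base_reads = runs[0]
    base_counts = Counter(addr for addr, _ in base_reads)
    candidates = set(base_counts)
    # (1) in FIG. 9: keep addresses whose read count grows in proportion to N*M.
    for n, m, reads in runs[1:]:
        counts = Counter(addr for addr, _ in reads)
        expected = (n * m) / (base_n * base_m)
        candidates = {
            addr for addr in candidates
            if abs(counts[addr] / base_counts[addr] - expected) <= tolerance * expected
        }
    # (2) in FIG. 9: every value read from the VPC must point to an instruction start.
    return sorted(
        addr for addr in candidates
        if all(value in instruction_starts
               for run in runs for a, value in run[2] if a == addr)
    )
```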
- the dispatcher detection unit 1214 detects a dispatcher by analyzing the binary of the script engine using a predetermined method.
- FIG. 10 is a diagram for explaining the processing of the dispatcher detection unit 1214.
- the dispatcher detection unit 1214 detects dispatchers. Based on the VM instruction boundaries detected by the VM instruction boundary detection unit 1212, the dispatcher detection unit 1214 extracts each VM instruction portion from the script engine binary. Then, on the assumption that dispatcher code is highly similar across VM instructions ((1) in FIG. 10), the dispatcher detection unit 1214 calculates the similarity between the codes of the VM instructions and detects the portion that is highly similar across all VM instructions as a dispatcher. In this way, the dispatcher detection unit 1214 can detect the code that is executed in common in the latter half of the VM instructions as a dispatcher ((1) in FIG. 10).
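The similarity measure is likewise unspecified. As a stand-in, the sketch below compares handlers instruction by instruction, working backward from the last instruction, and returns the longest tail shared by every handler. This matches the observation that the dispatcher is code executed in common at the end of each VM instruction. Representing instructions as disassembly strings is an assumption, and so is requiring an exact match instead of a similarity score.

```python
from typing import List, Sequence


def common_dispatcher_suffix(handlers: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest instruction sequence shared by the tail of every handler."""
    if not handlers:
        return []
    suffix: List[str] = []
    # Walk all handlers backward in lockstep until they disagree.
    for column in zip(*(reversed(h) for h in handlers)):
        if any(insn != column[0] for insn in column):
            break
        suffix.append(column[0])
    return suffix[::-1]
```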
- the code cache detection unit 1216 detects the memory area pointed to by the VPC as a code cache from the VM execution trace ((1) in FIG. 11).
- the code cache detection unit 1216 detects the code location that called the memory allocation function that allocated this code cache from the execution trace ((2) in FIG. 11).
- the code cache detection unit 1216 detects all memory areas allocated at this code location from the VM execution trace as code caches ((3) in FIG. 11).
- the code cache detection unit 1216 detects the code location that is writing to the code cache from the execution trace ((4) in FIG. 11). The code cache detection unit 1216 detects the writing by this code location in the VM execution trace as an update to the code cache ((5) in FIG. 11).
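Steps (1) to (5) of FIG. 11 reduce to a few set operations once allocations and writes have been extracted from the traces. The sketch assumes each allocation is recorded with the code location that called the memory allocation function, its base address, and its size. It also assumes writes are recorded as (writer code location, address) pairs. These record shapes and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple


@dataclass(frozen=True)
class Allocation:
    call_site: int  # code location that called the memory allocation function
    base: int       # start address of the allocated area
    size: int       # size of the allocated area in bytes

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


def detect_code_caches(
    vpc_targets: Sequence[int],
    allocations: Sequence[Allocation],
    writes: Sequence[Tuple[int, int]],  # (writer code location, written address)
) -> Tuple[List[Allocation], Set[int]]:
    # (1)-(2): allocation sites of the areas the VPC points into.
    sites = {a.call_site for a in allocations if any(a.contains(v) for v in vpc_targets)}
    # (3): every area allocated at one of those sites is treated as a code cache.
    caches = [a for a in allocations if a.call_site in sites]
    # (4)-(5): code locations writing into a code cache; their writes are cache updates.
    writers = {w for w, addr in writes if any(c.contains(addr) for c in caches)}
    return caches, writers
```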
- the VM instruction determination unit 1223 first analyzes the acquired VM execution trace log to determine a branch VM instruction.
- the test script here may be any script that includes a branch VM instruction and includes a branch control syntax.
- the test script is prepared by collecting information from the Internet or obtaining information from official documents.
- the VM instruction determination unit 1223 associates a pointer to a VM instruction with a VM instruction for each VM execution trace in the VM execution trace DB 133, and virtually assigns a VM opcode as an identifier to each of them.
- Figure 12 is a diagram explaining the processing of the VM instruction determination unit 1223.
- if a VM instruction is a branch instruction, the advance of the VPC changes depending on the branch destination, whereas for other VM instructions the advance of the VPC depends on the size of the VM instruction. For this reason, when pairs of VM opcodes and pointers to VM instructions are collected and the advance of the VPC is examined for each opcode, the advance of a branch instruction varies with its branch destination.
- the VM instruction determination unit 1223 uses variance to evaluate how much the advance of the VPC fluctuates for each VM instruction.
- the VM instruction determination unit 1223 calculates the variance of the amount of change in the VPC for each VM opcode, and narrows it down to only VM opcodes whose calculated variance is greater than a threshold. In this way, the VM instruction determination unit 1223 associates the pointer with the VM instruction, and determines that the VM instruction with variance in the advance of the VPC (VM instruction handler 3 in the example of FIG. 12) is a branch VM instruction ((1) in FIG. 12).
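A minimal sketch of the variance test follows. It assumes the VM execution trace is an ordered list of (vpc, vmop) rows, so the VPC advance of each executed instruction is the difference from the next row's VPC. It uses the population variance from Python's `statistics` module. The threshold is an input, as in step S126, and the function name is illustrative.

```python
from collections import defaultdict
from statistics import pvariance
from typing import Dict, List, Sequence, Tuple


def find_branch_opcodes(
    trace: Sequence[Tuple[int, int]],  # VM execution trace rows (vpc, vmop)
    threshold: float,
) -> List[int]:
    """Return VM opcodes whose VPC advance has variance above `threshold`."""
    deltas: Dict[int, List[int]] = defaultdict(list)
    # The advance of each executed instruction is the VPC change to the next row.
    for (vpc, vmop), (next_vpc, _) in zip(trace, trace[1:]):
        deltas[vmop].append(next_vpc - vpc)
    return sorted(op for op, d in deltas.items() if pvariance(d) > threshold)
```

An opcode observed only once has zero variance, so in practice the trace must exercise each branch instruction with different destinations before it can be flagged.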
- based on the branch VM instructions, the VM instruction determination unit 1223 determines whether each branch VM instruction is a conditional branch VM instruction, a call VM instruction, or a return VM instruction.
- the mutation unit 1231 refers to the list of VM instructions ((1) in FIG. 13). This list of VM instructions is collected by the VM instruction collection unit 1222. Then, one bytecode is extracted from the bytecode dictionary ((2) in FIG. 13). The mutation unit 1231 mutates the extracted bytecode in a predetermined manner, such as by adding, deleting, or changing a VM instruction ((3) in FIG. 13).
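The add/delete/change mutation can be sketched as below. The bytecode encoding (a list of tuples whose first element is the VM opcode) is hypothetical. Drawing inserted or replacement instructions uniformly from the collected list of VM instructions is an assumption, not the method prescribed by the description.

```python
import random
from typing import List, Optional, Sequence, Tuple

Instr = Tuple[int, ...]  # hypothetical encoding: (vm_opcode, *operands)


def mutate(
    bytecode: Sequence[Instr],
    vm_instructions: Sequence[Instr],
    rng: Optional[random.Random] = None,
) -> List[Instr]:
    """Apply one add, delete, or change mutation at a random position."""
    rng = rng or random.Random()
    out = list(bytecode)  # never modify the caller's bytecode in place
    op = rng.choice(["add", "delete", "change"]) if out else "add"
    if op == "add":
        out.insert(rng.randrange(len(out) + 1), rng.choice(vm_instructions))
    elif op == "delete":
        del out[rng.randrange(len(out))]
    else:
        out[rng.randrange(len(out))] = rng.choice(vm_instructions)
    return out
```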
- Fig. 14 is a flowchart showing the procedure of the analysis process according to the embodiment.
- the input unit 11 receives a test script and a script engine binary as input (step S1).
- the execution trace acquisition unit 1211 performs an execution trace acquisition process in which the test script is executed while monitoring the binary of the script engine to acquire branch traces and memory access traces (step S2).
- the VM instruction boundary detection unit 1212 detects VM instructions and performs VM instruction boundary detection processing to detect VM instruction boundaries (step S3).
- the virtual program counter detection unit 1213 extracts and analyzes the execution trace for the first test script stored in the execution trace DB 131, and performs virtual program counter detection processing to discover the VPC (step S4).
- the dispatcher detection unit 1214 performs dispatcher detection processing to extract each VM instruction portion from the script engine binary and detect the portion that is highly similar across all VM instructions as a dispatcher (step S5).
- the conditional branch flag detection unit 1215 performs a conditional branch detection process to extract and analyze the execution trace for the second test script stored in the execution trace DB 131 and discover the conditional branch flag (step S6).
- the code cache detection unit 1216 performs a code cache detection process based on the execution trace and the VPC, detecting as code caches the memory areas allocated by the code location that called the memory allocation function, and detecting writes to those areas as updates to the code cache (step S7).
- the VM execution trace acquisition unit 1221 receives the test script and the script engine binary as input, and executes the test script while monitoring the execution of the script engine binary, thereby performing a VM execution trace acquisition process to acquire a VM execution trace (step S8).
- the VM instruction collection unit 1222 performs a VM instruction collection process to acquire VM instructions from the VM execution trace (step S9).
- the VM instruction determination unit 1223 performs a VM instruction determination process to determine the instruction content of the collected VM instructions (step S10).
- the fuzzing execution unit 1232 executes a bytecode extraction process to execute the seed script while monitoring the code cache and extract bytecode from the code cache (step S12).
- the mutation unit 1231 performs a mutation process to mutate the bytecode extracted by the fuzzing execution unit 1232 using a predetermined method such as adding, deleting, or changing a VM instruction (step S13).
- the fuzzing execution unit 1232 re-embeds the mutated code at the destination pointed to by the VPC in the code cache, and performs an execution process (step S14).
- the vulnerability discovery unit 123 determines whether or not a problem such as a crash has occurred during the execution process (step S15). If a problem such as a crash has occurred (step S15: Yes), the vulnerability discovery unit 123 outputs the input value where the problem occurred (step S16). If a problem such as a crash has not occurred (step S15: No), the process returns to the mutation process of step S13 and continues fuzzing the VM.
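The loop of steps S13 to S16 reduces to the following sketch. The callables `mutate` and `run_in_code_cache` are hypothetical placeholders. The latter stands for re-embedding the bytecode at the VPC target in the code cache, resuming the script engine, and reporting whether a crash occurred. Two choices are made for the illustration: the originally extracted bytecode is mutated on every iteration, and the loop is bounded by `max_iterations`.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Instr = Tuple[int, ...]  # hypothetical encoding: (vm_opcode, *operands)


def fuzz_vm(
    seed: Sequence[Instr],
    mutate: Callable[[List[Instr], random.Random], List[Instr]],
    run_in_code_cache: Callable[[List[Instr]], bool],
    max_iterations: int,
    rng: random.Random,
) -> Optional[List[Instr]]:
    """Return the first mutated bytecode whose execution crashes, or None."""
    for _ in range(max_iterations):
        candidate = mutate(list(seed), rng)   # step S13
        if run_in_code_cache(candidate):      # steps S14-S15: True means a crash
            return candidate                  # step S16
    return None                               # budget exhausted without a crash
```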
- FIG. 15 is a flowchart showing the processing procedure of the execution trace acquisition process shown in Fig. 14.
- the execution trace acquisition unit 1211 receives a test script and a script engine binary as input (step S21). Then, the execution trace acquisition unit 1211 hooks the received script engine to acquire a branch trace (step S22). The execution trace acquisition unit 1211 also hooks the received script engine to acquire a memory access trace (step S23).
- the execution trace acquisition unit 1211 inputs the test script received in this state into the script engine and executes it (step S24), and stores the execution trace acquired thereby in the execution trace DB 131 (step S25).
- the execution trace acquisition unit 1211 determines whether or not all of the input test scripts have been executed (step S26). If all of the input test scripts have been executed (step S26: Yes), the execution trace acquisition unit 1211 ends the process. On the other hand, if not all of the input test scripts have been executed (step S26: No), the execution trace acquisition unit 1211 returns to the execution of the test scripts in step S24 and continues the process.
- Fig. 16 is a flowchart showing the processing procedure of the VM instruction boundary detection process shown in Fig. 14.
- the VM instruction boundary detection unit 1212 extracts execution traces from the execution trace DB 131 (step S31).
- the VM instruction boundary detection unit 1212 clusters the execution traces using a predetermined method (step S32). Any method may be used for the clustering.
- the VM instruction boundary detection unit 1212 detects clusters whose execution count is equal to or exceeds a threshold as VM instructions (step S33). Then, the VM instruction boundary detection unit 1212 determines the start and end points of a sequence of consecutive instructions that constitute a VM instruction as boundaries (step S34). The VM instruction boundary detection unit 1212 outputs the VM instruction boundary as a return value (step S35), and ends the VM instruction boundary detection process.
- Fig. 17 is a flowchart showing the processing procedure of the virtual program counter detection process shown in Fig. 14.
- the virtual program counter detection unit 1213 extracts one execution trace obtained with the first test script from the execution trace DB 131 (step S41). Next, the virtual program counter detection unit 1213 focuses on the memory access traces in the execution trace, and counts the number of reads for each memory read destination (step S42).
- the virtual program counter detection unit 1213 receives as input the first test script used to obtain the execution trace (step S43), and analyzes the first test script to obtain the number of repetitions and the number of repeated statements (step S44).
- the virtual program counter detection unit 1213 extracts from the execution trace DB 131 another execution trace obtained with a first test script that has a different number of repetitions and number of repeated statements (step S45). Then, the virtual program counter detection unit 1213 focuses on the memory access trace and counts the number of reads for each memory read destination (step S46). The virtual program counter detection unit 1213 also receives as input the first test script used to obtain the execution trace (step S47), analyzes the test script, and obtains the number of repetitions and the number of repeated statements (step S48).
- the virtual program counter detection unit 1213 narrows down the memory read destinations to only those whose read counts change in proportion to the number of repetitions or the increase or decrease in the number of repeated statements (step S49). Furthermore, the virtual program counter detection unit 1213 narrows down the memory read destinations narrowed down in step S49 to those whose read memory values always point to the start point of the VM instruction (step S50).
- the virtual program counter detection unit 1213 determines whether the memory read destinations have been narrowed down to only one (step S51). If the virtual program counter detection unit 1213 has not narrowed down the memory read destinations to only one (step S51: No), the process returns to step S45, where the virtual program counter detection unit 1213 retrieves the next execution trace and continues processing. On the other hand, if the virtual program counter detection unit 1213 has narrowed down the memory read destinations to only one (step S51: Yes), the virtual program counter detection unit 1213 stores the narrowed down memory read destination as a virtual program counter in the architecture information DB 132 (step S52), and ends processing.
- Fig. 18 is a flowchart showing the processing procedure of the dispatcher detection process shown in Fig. 14.
- the dispatcher detection unit 1214 receives the script engine binary as input (step S61).
- the dispatcher detection unit 1214 receives the VM instruction boundaries from the VM instruction boundary detection unit 1212 (step S62).
- the dispatcher detection unit 1214 extracts each VM instruction portion from the script engine binary based on the boundaries of the VM instructions received from the VM instruction boundary detection unit 1212 (step S63).
- the dispatcher detection unit 1214 calculates the similarity between the codes of each VM instruction using a predetermined method (step S64). Any method for calculating the similarity may be used as long as it is a method that can calculate the similarity between codes.
- the dispatcher detection unit 1214 extracts the part that is highly similar across all VM instructions based on the similarity calculated in step S64 (step S65). The dispatcher detection unit 1214 then determines whether the extracted part is at the end of the VM instructions (step S66).
- If it is not at the end of the VM instructions (step S66: No), the dispatcher detection unit 1214 returns to step S65 and continues processing. If it is at the end of the VM instructions (step S66: Yes), the dispatcher detection unit 1214 outputs the extracted part as a dispatcher (step S67) and ends processing.
- Fig. 19 is a flowchart showing the processing procedure of the conditional branch flag detection process shown in Fig. 14.
- the conditional branch flag detection unit 1215 extracts one execution trace obtained with the second test script from the execution trace DB 131 (step S71). Then, the conditional branch flag detection unit 1215 focuses on the memory access trace and counts the number of reads for each memory read destination (step S72).
- the conditional branch flag detection unit 1215 also receives as input the second test script used to obtain the execution trace (step S73), analyzes this second test script, and obtains the number of conditional branches and the True/False sequence pattern (step S74). The conditional branch flag detection unit 1215 then narrows down the memory read destinations to only those whose read count changes in proportion to the number of conditional branches (step S75). Furthermore, the conditional branch flag detection unit 1215 narrows down the memory read destinations to only those whose read memory value alternates between two values in accordance with the True/False sequence pattern (step S76).
- the conditional branch flag detection unit 1215 determines whether the memory read destinations have been narrowed down to only one (step S77). If the conditional branch flag detection unit 1215 has not narrowed down the memory read destinations to only one (step S77: No), it returns to step S71, retrieves the next execution trace, and continues processing. On the other hand, if the conditional branch flag detection unit 1215 has narrowed down the memory read destinations to only one (step S77: Yes), it stores the narrowed-down read destination in the architecture information DB 132 as the conditional branch flag (step S78), and ends processing.
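The two narrowing steps (S75 and S76) can be sketched as follows for a single trace. The sketch simplifies step S75 by assuming the flag is read exactly once per conditional branch, whereas the description only requires the read count to be proportional to the number of branches. The data layout and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def detect_branch_flag(
    reads: Sequence[Tuple[int, int]],  # ordered (address, value) memory reads
    pattern: Sequence[bool],           # True/False order of the conditional branches
) -> List[int]:
    """Return addresses read once per conditional branch whose two values follow `pattern`."""
    if True not in pattern or False not in pattern:
        raise ValueError("pattern must contain both True and False")
    values: Dict[int, List[int]] = defaultdict(list)
    for addr, value in reads:
        values[addr].append(value)
    hits = []
    for addr, vals in values.items():
        # S75 (simplified): exactly one read per conditional branch, two distinct values.
        if len(vals) != len(pattern) or len(set(vals)) != 2:
            continue
        # S76: the value must switch between its two values in step with the pattern.
        true_value = vals[list(pattern).index(True)]
        if all((v == true_value) == taken for v, taken in zip(vals, pattern)):
            hits.append(addr)
    return sorted(hits)
```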
- Fig. 20 is a flowchart showing the processing procedure of the code cache detection process shown in Fig. 14.
- when the code cache detection unit 1216 receives the execution trace, the VPC, and the VM execution trace as input (step S81), it acquires the memory area pointed to by the VPC from the VM execution trace (step S82). The VM execution trace is acquired by the VM execution trace acquisition unit 1221.
- the code cache detection unit 1216 obtains from the execution trace the code location that called the memory allocation function that allocated the memory area obtained in step S82 (step S83).
- the code cache detection unit 1216 detects all areas that were allocated at the code location obtained in step S83 from the VM execution trace as code caches (step S84).
- the code cache detection unit 1216 acquires the code location that is writing to the code cache from the execution trace (step S85). The code cache detection unit 1216 detects all areas in the VM execution trace that are written to at the code location acquired in step S85 as code cache updates (step S86). The code cache detection unit 1216 returns the detected code cache and its updated location (step S87), and ends the code cache detection process.
- FIG. 21 is a flowchart showing the procedure of the VM execution trace acquisition process shown in Fig. 14.
- the VM execution trace acquisition unit 1221 receives a test script and a script engine binary as input (step S91). Then, the VM execution trace acquisition unit 1221 hooks the received script engine to record the VPC and VM opcode (step S92).
- the VM execution trace acquisition unit 1221 inputs the received test script in this state into the script engine for execution (step S93), and stores the VM execution trace acquired thereby in the VM execution trace DB 133 (step S94).
- the VM execution trace acquisition unit 1221 determines whether or not all of the input test scripts have been executed (step S95). If all of the input test scripts have been executed (step S95: Yes), the VM execution trace acquisition unit 1221 ends the process. If not all of the input test scripts have been executed (step S95: No), the VM execution trace acquisition unit 1221 returns to the execution of the test scripts in step S93 and continues the process.
- Fig. 22 is a flowchart showing the procedure of the VM instruction collection process shown in Fig. 14.
- the VM instruction collection unit 1222 receives the VPC and dispatcher as input (step S101) and acquires various scripts from the Internet (step S102).
- the VM instruction collection unit 1222 executes the scripts while monitoring the VPC and dispatcher, and acquires a VM execution trace (step S103).
- the VM instruction collection unit 1222 acquires VM instructions from the VM execution trace (step S104) and adds them to a list of VM instructions (step S105). If the VM instruction collection unit 1222 finds a VM instruction that is not in the list (step S106: No), it returns to step S102. If the VM instruction collection unit 1222 finds no VM instructions that are not in the list (step S106: Yes), it returns the list of VM instructions (step S107) and ends the VM instruction collection process.
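The collection loop stops once a fresh batch of scripts exposes no VM instruction that is not already in the list. In the sketch below, `script_batches` and `trace_opcodes` are hypothetical stand-ins. The first represents fetching scripts from the Internet. The second represents running a script under VPC and dispatcher monitoring and reading the VM opcodes from the resulting VM execution trace.

```python
from typing import Callable, Iterable, List, Set


def collect_vm_instructions(
    script_batches: Iterable[List[str]],
    trace_opcodes: Callable[[str], Iterable[int]],
) -> Set[int]:
    """Keep fetching scripts until a batch reveals no unseen VM instruction."""
    known: Set[int] = set()
    for batch in script_batches:                                 # step S102
        found = {op for s in batch for op in trace_opcodes(s)}  # steps S103-S104
        new = found - known
        known |= found                                           # step S105
        if not new:                                              # step S106: Yes
            break
    return known                                                 # step S107
```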
- Fig. 23 is a flowchart showing the processing procedure of the VM instruction determination process shown in Fig. 14.
- the VM instruction determination unit 1223 performs a branch VM instruction determination process to determine branch VM instructions from among the VM instructions collected by the VM instruction collection unit 1222 (step S111). The VM instruction determination unit 1223 determines that, among the branch VM instructions, a branch VM instruction that accesses a conditional branch flag is a conditional branch VM instruction (step S112).
- for each branch VM instruction, the VM instruction determination unit 1223 scans the VM execution trace, and if there is a branch VM instruction that branches to the location immediately after that branch VM instruction, performs a call VM instruction determination process that determines the first branch VM instruction to be a call VM instruction (step S113).
- the VM instruction determination unit 1223 extracts the call VM instructions from the VM execution trace, and performs a return VM instruction determination process to determine the corresponding return VM instructions (step S114).
- the VM instruction determination unit 1223 determines other VM instructions using a predetermined method based on the difference in the VM execution traces obtained using multiple test scripts in which the VM instruction is called (step S115).
- the VM instruction determination unit 1223 adds the determined VM instructions to a VM instruction list (step S116).
- the VM instruction determination unit 1223 returns the list of VM instructions (step S117) and ends the VM instruction determination process.
- FIG. 24 is a flowchart of the branch VM instruction determination process shown in FIG.
- the VM instruction determination unit 1223 extracts one VM execution trace from the VM execution trace DB 133 (step S121).
- the VM instruction determination unit 1223 links a pointer to the VM instruction with the VM instruction, and assigns a VM opcode to each as an identifier (step S122). Then, the VM instruction determination unit 1223 counts the amount of change in VPC before and after execution for each VM opcode (step S123).
- the VM instruction determination unit 1223 determines whether all VM execution traces in the VM execution trace DB 133 have been processed (step S124). If all VM execution traces in the VM execution trace DB 133 have not been processed (step S124: No), the VM instruction determination unit 1223 returns to step S121 and retrieves and processes the next VM execution trace.
- the VM instruction determination unit 1223 calculates the variance of the amount of change in VPC for each VM opcode (step S125). Then, the VM instruction determination unit 1223 receives a threshold value as an input (step S126). The VM instruction determination unit 1223 narrows down to only VM opcodes whose variance is greater than the threshold value (step S127), stores them as branch VM instructions in the architecture information DB 132 (step S128), and ends the process.
- FIG. 25 is a flowchart illustrating the procedure of the conditional branch VM instruction determination process illustrated in FIG.
- the VM instruction determination unit 1223 extracts a list of opcodes of branch VM instructions from the architecture information DB 132 (step S131).
- the VM instruction determination unit 1223 extracts one VM execution trace from the VM execution trace DB 133 (step S132).
- the VM instruction determination unit 1223 extracts the corresponding execution trace from the execution trace DB 131 (step S133).
- the VM instruction determination unit 1223 extracts one location where a branch VM instruction is being executed from the VM execution trace (step S134).
- the VM instruction determination unit 1223 extracts a memory access trace corresponding to the execution of the extracted branch VM instruction from the execution trace (step S135).
- the VM instruction determination unit 1223 determines whether the branch VM instruction extracted in step S134 accesses a conditional branch flag based on the memory access trace (step S136).
- If the conditional branch flag is being accessed (step S136: Yes), the VM instruction determination unit 1223 determines that the branch VM instruction extracted in step S134 is a conditional branch VM instruction (step S137). The VM instruction determination unit 1223 stores the opcode of this conditional branch VM instruction in the architecture information DB 132 (step S138).
- If the conditional branch flag has not been accessed (step S136: No), or after step S138 is completed, the VM instruction determination unit 1223 determines whether all locations where a branch VM instruction is being executed have been processed (step S139).
- If not all locations where a branch VM instruction is being executed have been processed (step S139: No), the VM instruction determination unit 1223 extracts the next location where a branch VM instruction is being executed (step S140), returns to step S135, and processes the extracted branch VM instruction.
- If all locations where a branch VM instruction is being executed have been processed (step S139: Yes), the VM instruction determination unit 1223 determines whether all VM execution traces have been processed (step S141).
- If not all VM execution traces have been processed (step S141: No), the VM instruction determination unit 1223 retrieves the next VM execution trace (step S142), returns to step S133, and performs processing using the retrieved VM execution trace. If all VM execution traces have been processed (step S141: Yes), the VM instruction determination unit 1223 ends the conditional branch VM instruction determination process.
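The determination in steps S134 to S137 becomes a membership test once each step of the VM execution trace has been aligned with the memory reads it performed in the execution trace. That alignment, the (vmop, read addresses) layout, and the function name are assumptions for this sketch.

```python
from typing import Iterable, List, Set, Tuple


def find_conditional_branch_opcodes(
    steps: Iterable[Tuple[int, List[int]]],  # (vmop, addresses read while that VM instruction ran)
    branch_ops: Set[int],
    flag_addr: int,
) -> Set[int]:
    """Return branch VM opcodes whose execution reads the conditional branch flag."""
    return {vmop for vmop, reads in steps if vmop in branch_ops and flag_addr in reads}
```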
- FIG. 26 is a flowchart illustrating the procedure of the call VM instruction determination process illustrated in FIG.
- the VM instruction determination unit 1223 extracts a list of opcodes of branch VM instructions from the architecture information DB 132 (step S151).
- the VM instruction determination unit 1223 extracts one VM execution trace from the VM execution trace DB 133 (step S152).
- the VM instruction determination unit 1223 extracts one location where a branch VM instruction is being executed from the VM execution trace (step S153).
- the VM instruction determination unit 1223 scans the VM execution trace for branch VM instructions that appear after the branch VM instruction retrieved in step S153 (step S154). Based on the results of the scan in step S154, the VM instruction determination unit 1223 determines whether there is a branch VM instruction that branches immediately after the branch VM instruction retrieved in step S153 (step S155).
- the VM instruction determination unit 1223 determines that the branch VM instruction retrieved in step S153 is a call VM instruction (step S156).
- the VM instruction determination unit 1223 stores the opcode of the call VM instruction in the architecture information DB 132 (step S157).
- step S158 determines whether or not all locations where the branch VM instruction is being executed have been processed.
- step S158 If not all locations where a branch VM instruction is being executed have been processed (step S158: No), the VM instruction determination unit 1223 extracts the next location where a branch VM instruction is being executed (step S159), returns to step S154, and processes the extracted branch VM instruction.
- step S158 If all locations where the branch VM instruction is being executed have been processed (step S158: Yes), the VM instruction determination unit 1223 determines whether all VM execution traces have been processed (step S160).
- step S160: No the VM instruction determination unit 1223 retrieves the next VM execution trace (step S161), returns to step S153, and performs processing using the retrieved VM execution trace. If all VM execution traces have been processed (step S160: Yes), the VM instruction determination unit 1223 ends the called VM instruction determination process.
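The call VM instruction determination loop (steps S151 to S161) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation. The trace record `VMStep` and its fields `vpc`, `opcode`, `size`, and `next_vpc` are hypothetical. The sketch also reads "a branch VM instruction that branches immediately after" as "a later branch whose next executed VPC equals the earlier branch's VPC plus its encoded length". The text does not specify either detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMStep:
    """One record of a VM execution trace (hypothetical layout)."""
    vpc: int       # virtual program counter of this VM instruction
    opcode: int    # opcode of this VM instruction
    size: int      # encoded length of this VM instruction (assumed known)
    next_vpc: int  # VPC of the VM instruction executed next


def find_call_opcodes(traces, branch_opcodes):
    """Return opcodes judged to be call VM instructions (steps S151 to S161)."""
    call_opcodes = set()
    for trace in traces:                                # S152, S160/S161: each trace
        for i, step in enumerate(trace):                # S153, S158/S159: each branch site
            if step.opcode not in branch_opcodes:
                continue
            fall_through = step.vpc + step.size         # location right after this branch
            for later in trace[i + 1:]:                 # S154: scan later branches
                # S155: does a later branch land right after this one?
                if later.opcode in branch_opcodes and later.next_vpc == fall_through:
                    call_opcodes.add(step.opcode)       # S156/S157: record as call
                    break
    return call_opcodes
```

Consider a trace where opcode `0x10` at VPC 0 (length 2) jumps to VPC 100, and the branch `0x11` at VPC 101 later jumps to VPC 2. Here `find_call_opcodes([trace], {0x10, 0x11, 0x12})` reports `{0x10}`. The heuristic only looks for a later branch that lands on the fall-through address, so other control flow that happens to target that address (for example, a jump past a loop) could also be misclassified.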
- FIG. 27 is a flowchart illustrating the procedure of the return VM instruction determination process illustrated in FIG.
- The VM instruction determination unit 1223 extracts a list of opcodes of branch VM instructions from the architecture information DB 132 (step S171).
- The VM instruction determination unit 1223 extracts one VM execution trace from the VM execution trace DB 133 (step S172).
- The VM instruction determination unit 1223 extracts, from the VM execution trace, one location where a call VM instruction is executed (step S173).
- The VM instruction determination unit 1223 scans the VM execution trace for branch VM instructions that appear after the call VM instruction extracted in step S173 (step S174).
- The VM instruction determination unit 1223 determines whether there is a branch VM instruction that branches to the location immediately after the call VM instruction extracted in step S173 (step S175).
- If there is such a branch VM instruction (step S175: Yes), the VM instruction determination unit 1223 determines that this branch VM instruction is a return VM instruction (step S176) and stores the opcode of the return VM instruction in the architecture information DB 132 (step S177).
- The VM instruction determination unit 1223 determines whether all locations where branch VM instructions are executed have been processed (step S178).
- If not all such locations have been processed (step S178: No), the VM instruction determination unit 1223 extracts the next location where a branch VM instruction is executed (step S179), returns to step S174, and processes the extracted branch VM instruction.
- If all such locations have been processed (step S178: Yes), the VM instruction determination unit 1223 determines whether all VM execution traces have been processed (step S180).
- If not all VM execution traces have been processed (step S180: No), the VM instruction determination unit 1223 retrieves the next VM execution trace (step S181), returns to step S173, and performs processing using the retrieved VM execution trace. If all VM execution traces have been processed (step S180: Yes), the VM instruction determination unit 1223 ends the return VM instruction determination process.
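The return VM instruction determination loop (steps S171 to S181) can be sketched the same way. This is a minimal Python illustration and it makes three assumptions. It uses a hypothetical trace record (`VMStep` with `vpc`, `opcode`, `size`, `next_vpc`). It passes the already-identified call opcodes in as a parameter `call_opcodes`, which is how this sketch locates call sites for step S173. It also treats "branches immediately after the call" as "the next executed VPC equals the call's VPC plus its encoded length".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMStep:
    """One record of a VM execution trace (hypothetical layout)."""
    vpc: int       # virtual program counter of this VM instruction
    opcode: int    # opcode of this VM instruction
    size: int      # encoded length of this VM instruction (assumed known)
    next_vpc: int  # VPC of the VM instruction executed next


def find_return_opcodes(traces, branch_opcodes, call_opcodes):
    """Return opcodes judged to be return VM instructions (steps S171 to S181)."""
    return_opcodes = set()
    for trace in traces:                                # S172, S180/S181: each trace
        for i, step in enumerate(trace):                # S173, S178/S179: each call site
            if step.opcode not in call_opcodes:
                continue
            fall_through = step.vpc + step.size         # location right after the call
            for later in trace[i + 1:]:                 # S174: scan later branches
                # S175: the first branch that lands right after the call
                if later.opcode in branch_opcodes and later.next_vpc == fall_through:
                    return_opcodes.add(later.opcode)    # S176/S177: record as return
                    break
    return return_opcodes
```

Take a trace where call opcode `0x10` sits at VPC 0 with length 2 and branch `0x11` later lands on VPC 2. In that case `find_return_opcodes([trace], {0x10, 0x11, 0x12}, {0x10})` yields `{0x11}`. Nested calls are handled naturally, because each inner return lands on its own call's fall-through address.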
- Fig. 28 is a flowchart showing the processing procedure of the bytecode extraction process shown in Fig. 14.
- The vulnerability discovery unit 123 accepts a seed script as input (step S191).
- The vulnerability discovery unit 123 accepts a code cache as input (step S192).
- The fuzzing execution unit 1232 executes the seed script while monitoring the code cache (step S193).
- The vulnerability discovery unit 123 extracts the bytecode from the code cache (step S194) and stores it in the dictionary (step S195).
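Steps S193 to S195 amount to watching writes into a known memory range and keeping what lands there. The Python sketch below models only that bookkeeping and rests on three assumptions. The code cache is a contiguous range (`base`, `size`) taken from the architecture information. Some external mechanism, such as a watchpoint or binary instrumentation that is not shown here, calls the hypothetical hook `on_write` for each memory write. The "dictionary" maps a start address to the bytes written there.

```python
from dataclasses import dataclass, field


@dataclass
class CodeCacheMonitor:
    """Records bytecode written into a watched code-cache range (hypothetical model)."""
    base: int   # start address of the code cache, from the architecture information
    size: int   # length of the code-cache region in bytes
    dictionary: dict[int, bytes] = field(default_factory=dict)  # address -> bytecode

    def contains(self, addr: int, length: int) -> bool:
        """True if [addr, addr + length) lies entirely inside the code cache."""
        return self.base <= addr and addr + length <= self.base + self.size

    def on_write(self, addr: int, data: bytes) -> None:
        """Hook invoked for each memory write observed while the seed script runs (S193)."""
        if self.contains(addr, len(data)):
            # S194/S195: keep the bytes the engine wrote into the code cache.
            # A real engine may emit bytecode in several writes; this sketch
            # simply stores each write under its start address.
            self.dictionary[addr] = bytes(data)
```

For example, with `CodeCacheMonitor(base=0x1000, size=0x100)`, a write of `b"\x10\x05\x11"` at `0x1000` is recorded, while a write at `0x2000` falls outside the code cache and is ignored.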
- Fig. 29 is a flowchart showing the processing procedure of the mutation process shown in Fig. 14.
- The mutation unit 1231 receives as input the bytecode dictionary obtained in the bytecode extraction process (step S201).
- The mutation unit 1231 receives a list of VM instructions as input (step S202).
- The mutation unit 1231 retrieves one bytecode from the bytecode dictionary (step S203) and mutates it using a predetermined method, such as adding, deleting, or changing a VM instruction (step S204). The mutation unit 1231 then adds the mutated bytecode to the bytecode dictionary (step S205).
- If the mutation unit 1231 has not yet mutated all of the bytecodes to be mutated (step S206: No), the process returns to step S203. If it has mutated all of them (step S206: Yes), the mutation unit 1231 returns the updated bytecode dictionary (step S207).
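The mutation process (steps S201 to S207) can be sketched in Python under the following assumptions. The "list of VM instructions" is modeled as a hypothetical table `isa` that maps each opcode to its operand length. A bytecode is a list of `(opcode, operand_bytes)` pairs with one-byte opcodes. The bytecode dictionary is modeled as a plain list. Because new instructions are drawn only from the ISA table with the right number of operand bytes, mutants stay well-formed as VM instructions.

```python
import random

# Hypothetical ISA table derived from the architecture information:
# opcode -> number of operand bytes that follow it.
ISA = dict[int, int]
Instruction = tuple[int, bytes]  # (opcode, operand bytes)


def random_instruction(rng: random.Random, isa: ISA) -> Instruction:
    """Pick a known opcode and fill in the right number of operand bytes."""
    opcode = rng.choice(sorted(isa))
    operands = bytes(rng.randrange(256) for _ in range(isa[opcode]))
    return (opcode, operands)


def mutate(bytecode: list[Instruction], isa: ISA, rng: random.Random) -> list[Instruction]:
    """S204: add, delete, or change one VM instruction in a copy of the bytecode."""
    mutated = list(bytecode)
    choice = rng.choice(["add", "delete", "change"])
    if choice == "add" or not mutated:
        mutated.insert(rng.randrange(len(mutated) + 1), random_instruction(rng, isa))
    elif choice == "delete":
        del mutated[rng.randrange(len(mutated))]
    else:
        mutated[rng.randrange(len(mutated))] = random_instruction(rng, isa)
    return mutated


def mutation_process(dictionary: list[list[Instruction]], isa: ISA,
                     rng: random.Random) -> list[list[Instruction]]:
    """S201-S207: mutate every bytecode currently in the dictionary and append the results."""
    for bytecode in list(dictionary):                  # S203/S206: iterate over a snapshot
        dictionary.append(mutate(bytecode, isa, rng))  # S204/S205
    return dictionary                                  # S207


def encode(bytecode: list[Instruction]) -> bytes:
    """Serialize instructions, assuming one-byte opcodes followed by their operands."""
    return b"".join(bytes([op]) + operands for op, operands in bytecode)
```

For instance, `encode([(0x10, b"\x05"), (0x11, b"")])` gives `b"\x10\x05\x11"`, which is the form that would be written back into a code cache. Iterating over a snapshot keeps freshly added mutants from being mutated again in the same pass.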
- Fig. 30 is a flowchart showing the processing procedure of the execution process shown in Fig. 14.
- The fuzzing execution unit 1232 accepts a seed script as input (step S211).
- The fuzzing execution unit 1232 accepts the VPC and the code cache as input (step S212).
- The fuzzing execution unit 1232 executes the seed script while monitoring the VPC and the code cache (step S213).
- The fuzzing execution unit 1232 stops execution at the moment the bytecode is executed (step S214) and retrieves a bytecode from the bytecode dictionary (step S215).
- The fuzzing execution unit 1232 places the retrieved bytecode at the location in the code cache pointed to by the VPC (step S216) and then resumes execution (step S217).
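The pause, overwrite, and resume idea in steps S213 to S217 can be shown on a toy interpreter. In the Python sketch below, `ToyVM` and its three-opcode ISA (HALT, INC, ADD imm8) are invented purely for illustration. The trigger condition is also an assumption: "the bytecode is executed" is modeled as "the VPC reaches `trigger_vpc`". A real script engine would be paused and patched through a debugger or instrumentation instead.

```python
class ToyVM:
    """Tiny stand-in for a script engine's VM; the ISA below is made up for illustration."""

    def __init__(self, code_cache: bytearray, entry: int):
        self.code_cache = code_cache  # memory that holds the bytecode
        self.vpc = entry              # virtual program counter
        self.acc = 0                  # single accumulator register

    def step(self) -> bool:
        """Execute one VM instruction; return False on HALT."""
        op = self.code_cache[self.vpc]
        if op == 0x00:                # HALT
            return False
        if op == 0x01:                # INC: acc += 1
            self.acc += 1
            self.vpc += 1
        elif op == 0x02:              # ADD imm8: acc += next byte
            self.acc += self.code_cache[self.vpc + 1]
            self.vpc += 2
        else:                         # unknown opcode: treated as a crash
            raise ValueError(f"invalid opcode {op:#04x} at VPC {self.vpc}")
        return True


def run_with_injection(vm: ToyVM, trigger_vpc: int, mutated: bytes) -> int:
    """S213-S217: run while watching the VPC; when execution reaches trigger_vpc,
    pause, write the mutated bytecode at the VPC in the code cache, then resume."""
    injected = False
    while True:
        if not injected and vm.vpc == trigger_vpc:       # S214: pause here
            end = vm.vpc + len(mutated)
            if end > len(vm.code_cache):
                raise ValueError("mutated bytecode does not fit in the code cache")
            vm.code_cache[vm.vpc:end] = mutated          # S216: place at the VPC
            injected = True                              # S217: resume
        if not vm.step():
            return vm.acc
```

As a usage example, `ToyVM(bytearray([0x01, 0x01, 0x00, 0x00]), entry=0)` normally runs INC, INC, HALT and returns 2. If `bytes([0x02, 0x05, 0x00])` is injected at VPC 1, it runs INC, ADD 5, HALT and returns 6. An injected invalid opcode raises `ValueError`, which a fuzzing harness would record as a crash.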
- The vulnerability discovery device 10 analyzes the VM of the script engine, collects VM instructions, and determines what each collected VM instruction does, thereby obtaining information on the instruction set architecture, that is, the system of instructions of the VM.
- The vulnerability discovery device 10 then fuzzes the VM with code mutated on the basis of the obtained architecture information. The vulnerability discovery device 10 can therefore perform fuzzing with bytecode as the input value even for script engines whose VM internal specifications are unknown.
- The vulnerability discovery device 10 executes a test script while monitoring the binary of the script engine, and obtains branch traces and memory access traces as execution traces.
- The vulnerability discovery device 10 analyzes the VM based on the execution traces and obtains architecture information on the VM instruction boundary, the VPC, the dispatcher, the conditional branch flag, and the code cache.
- The vulnerability discovery device 10 executes the test script while monitoring the VPC and the dispatcher, and obtains VM execution traces. By analyzing the VM execution traces, it collects VM instructions, determines what they do, and obtains information on the instruction set architecture.
- As a result, the vulnerability discovery device 10 can obtain architecture information that includes where in the VM the bytecode generated by the script engine is stored and the instruction set architecture of the bytecode that the VM can interpret.
- The vulnerability discovery device 10 detects the code cache based on the execution traces, the VPC, and the VM execution traces.
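The architecture information listed above can be pictured as one record that the analysis stages fill in and the fuzzing stage consumes. The Python sketch below is only a hypothetical container. The field names and types are assumptions, since the text does not say how each item is represented. The single check reflects that the execution process takes the VPC and the code cache as input.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ArchitectureInfo:
    """Hypothetical record of what device 10 learns about a script engine's VM."""
    # From analyzing the execution traces (branch and memory access traces).
    # The representations below are assumptions made for illustration.
    vm_instruction_boundary: Optional[int] = None  # e.g. address marking the end of one VM instruction
    vpc: Optional[str] = None                      # where the VPC is held (register name or address)
    dispatcher: Optional[int] = None               # address of the dispatcher in the engine binary
    conditional_branch_flag: Optional[str] = None  # location consulted by conditional branches
    code_cache: Optional[tuple[int, int]] = None   # (start, end) of the code cache
    # Instruction set architecture, from analyzing the VM execution traces.
    branch_opcodes: set[int] = field(default_factory=set)
    conditional_branch_opcodes: set[int] = field(default_factory=set)
    call_opcodes: set[int] = field(default_factory=set)
    return_opcodes: set[int] = field(default_factory=set)

    def ready_for_fuzzing(self) -> bool:
        """Bytecode injection needs at least the VPC and the code cache location."""
        return self.vpc is not None and self.code_cache is not None
```

Using `field(default_factory=set)` gives each record its own opcode sets instead of sharing one mutable default between instances.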
- The vulnerability discovery device 10 executes the seed script while monitoring the code cache to extract bytecode from it, mutates the extracted bytecode, and re-embeds it into the code cache for execution. Based on the acquired architecture information, the vulnerability discovery device 10 can therefore extract the current bytecode from the VM of the script engine, mutate it while keeping it within the range of valid bytecode instructions, and re-embed it into the VM.
- Even for script engines whose VM internal specifications are unknown, the vulnerability discovery device 10 detects the various kinds of architecture information by analyzing the obtained execution traces and VM execution traces. This makes fuzzing with bytecode as input possible without manual reverse engineering.
- As long as a test script is prepared, the vulnerability discovery device 10 can automatically fuzz a variety of script engines with bytecode as input, so bytecode fuzzing does not need to be designed or run individually for each engine.
- The vulnerability discovery device 10 can analyze a script engine whose VM internal specifications are unknown and obtain information on where the bytecode is stored and on the instruction set architecture. This enables fuzzing with bytecode as the input value for script engines of a wide variety of scripting languages.
- The vulnerability discovery device 10 is useful for discovering vulnerabilities in a wide variety of script engines. Because it fuzzes with bytecode as the input value, it is suited to finding vulnerabilities hidden in behavior that is difficult to reach when a script is used as the input value.
- By using the vulnerability discovery device 10 to fuzz various script engines with bytecode as input, potential vulnerabilities can be discovered and countermeasures such as fixes can be taken.
- Each component of the vulnerability discovery device 10 shown in Fig. 3 is functionally conceptual and does not necessarily have to be physically configured as illustrated.
- The specific form of distribution and integration of the functions of the vulnerability discovery device 10 is not limited to that shown in the figure. All or part of it can be functionally or physically distributed or integrated in any unit depending on various loads, usage conditions, and the like.
- Each process performed by the vulnerability discovery device 10 may be realized, in whole or in part, by a CPU and a program analyzed and executed by the CPU. Alternatively, each process performed by the vulnerability discovery device 10 may be realized as hardware using wired logic.
- [Program] FIG. 31 is a diagram showing an example of a computer that executes a program to realize the vulnerability discovery device 10.
- The computer 1000 has, for example, a memory 1010 and a CPU 1020.
- The computer 1000 also has a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These components are connected by a bus 1080.
- The memory 1010 includes a ROM 1011 and a RAM 1012.
- The ROM 1011 stores a boot program such as a BIOS (Basic Input Output System).
- The hard disk drive interface 1030 is connected to a hard disk drive 1090.
- The disk drive interface 1040 is connected to a disk drive 1100.
- A removable storage medium such as a magnetic disk or optical disk is inserted into the disk drive 1100.
- The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120.
- The video adapter 1060 is connected to, for example, a display 1130.
- The hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the program that defines each process of the vulnerability discovery device 10 is implemented as a program module 1093 in which code executable by the computer 1000 is written.
- The program module 1093 is stored, for example, in the hard disk drive 1090.
- A program module 1093 for executing processes similar to the functional configuration of the vulnerability discovery device 10 is stored in the hard disk drive 1090.
- The hard disk drive 1090 may be replaced by an SSD (Solid State Drive).
- The setting data used in the processing of the above-described embodiment is stored as program data 1094 in, for example, the memory 1010 or the hard disk drive 1090.
- The CPU 1020 reads the program module 1093 or program data 1094 stored in the memory 1010 or the hard disk drive 1090 into the RAM 1012 as necessary and executes it.
- The program module 1093 and program data 1094 need not be stored in the hard disk drive 1090. They may instead be stored in, for example, a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like.
- Alternatively, the program module 1093 and program data 1094 may be stored in another computer connected via a network (such as a LAN (Local Area Network) or a WAN (Wide Area Network)).
- The program module 1093 and program data 1094 may then be read by the CPU 1020 from the other computer via the network interface 1070.
Reference Signs List
| Sign | Element |
|---|---|
| 10 | Vulnerability discovery device |
| 11 | Input unit |
| 12 | Control unit |
| 13 | Memory unit |
| 14 | Output unit |
| 121 | Virtual machine analysis unit |
| 122 | Instruction set architecture analysis unit |
| 123 | Vulnerability discovery unit |
| 131 | Execution trace database (DB) |
| 132 | Architecture information DB |
| 133 | VM execution trace DB |
| 1211 | Execution trace acquisition unit |
| 1212 | VM instruction boundary detection unit |
| 1213 | Virtual program counter detection unit |
| 1214 | Dispatcher detection unit |
| 1215 | Conditional branch flag detection unit |
| 1216 | Code cache detection unit |
| 1221 | VM execution trace acquisition unit |
| 1222 | VM instruction collection unit |
| 1223 | VM instruction determination unit |
| 1231 | Mutation unit |
| 1232 | Fuzzing execution unit |
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
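| PCT/JP2022/037924 WO2024079793A1 (ja) | 2022-10-11 | 2022-10-11 | Vulnerability discovery device, vulnerability discovery method, and vulnerability discovery program |
| JP2024550947A JP7838662B2 (ja) | 2022-10-11 | 2022-10-11 | Vulnerability discovery device, vulnerability discovery method, and vulnerability discovery program |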
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2022/037924 WO2024079793A1 (ja) | 2022-10-11 | 2022-10-11 | Vulnerability discovery device, vulnerability discovery method, and vulnerability discovery program |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024079793A1 (ja) | 2024-04-18 |
Family ID: 90668967
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2022/037924 Ceased WO2024079793A1 (ja) | 2022-10-11 | 2022-10-11 | Vulnerability discovery device, vulnerability discovery method, and vulnerability discovery program |
Country Status (2)
| Country | Link |
|---|---|
| JP (1) | JP7838662B2 |
| WO (1) | WO2024079793A1 |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119402400A (zh) * | 2024-11-01 | 2025-02-07 | Zhejiang University of Technology | Network protocol fuzz testing method and device based on state guidance and seed mutation |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
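| WO2022180702A1 (ja) * | 2021-02-24 | 2022-09-01 | Nippon Telegraph and Telephone Corporation | Analysis function imparting device, analysis function imparting program, and analysis function imparting method |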
Worldwide Applications
- 2022-10-11: JP2024550947A (JP7838662B2), Active
- 2022-10-11: PCT/JP2022/037924 (WO2024079793A1), Ceased
Non-Patent Citations (3)
| Title |
|---|
| Kyle, Stephen; Leather, Hugh; Franke, Björn; Butcher, Dave: "Application of Domain-aware Binary Fuzzing to Aid Android Virtual Machine Testing", ACM SIGPLAN Notices, Association for Computing Machinery, US, vol. 50, no. 7, 14 March 2015, pages 121-132, XP058493219, ISSN: 0362-1340, DOI: 10.1145/2817817.2731198 * |
| Usui, Toshinori; Ikuse, Tomonori; Kawakoya, Yuhei; Iwamura, Makoto; Matsuura, Kanta: "Automatically Appending Execution Stall/Stop Prevention to Vanilla Script Engines", IPSJ Computer Security Symposium (CSS 2021), October 26-29, 2021, Information Processing Society of Japan (IPSJ), vol. 2021, pages 794-801, XP009554372 * |
| Chen, Yuting; Su, Ting; Su, Zhendong: "Deep differential testing of JVM implementations", Software Engineering, IEEE Press, 25-31 May 2019, pages 1257-1268, XP058432850, DOI: 10.1109/ICSE.2019.00127 * |
Also Published As
| Publication number | Publication date |
|---|---|
| JPWO2024079793A1 | 2024-04-18 |
| JP7838662B2 (ja) | 2026-04-01 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: The EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22962013; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: 2024550947; Country of ref document: JP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 22962013; Country of ref document: EP; Kind code of ref document: A1 |