WO2024079793A1 - Vulnerability discovery device, vulnerability discovery method, and vulnerability discovery program - Google Patents

Info

Publication number
WO2024079793A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
virtual machine
execution
unit
branch
Application number
PCT/JP2022/037924
Other languages
French (fr)
Japanese (ja)
Inventor
利宣 碓井 (Toshinori Usui)
裕平 川古谷 (Yuhei Kawakoya)
誠 岩村 (Makoto Iwamura)
Original Assignee
日本電信電話株式会社 (Nippon Telegraph and Telephone Corporation)
Application filed by 日本電信電話株式会社
Priority to PCT/JP2022/037924
Publication of WO2024079793A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities

Definitions

  • The present invention relates to a vulnerability discovery device, a vulnerability discovery method, and a vulnerability discovery program.
  • Dynamic testing is a type of software testing in which input values are actually supplied to the target program, which is then run while its behavior is observed.
  • Fuzzing is a technique for discovering vulnerabilities that may exist in software. It repeatedly generates and mutates input values, runs the target program on them, observes the program's execution state, and searches for inputs that cause problems that could lead to vulnerabilities, such as crashes.
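The generic loop just described (mutate an input, run the target, watch for failures) can be sketched in a few lines of Python. This is a minimal illustration only: the byte-level mutations, the `target` callable, and treating any raised exception as a "crash" are assumptions made for the sketch, not part of the described device.

```python
import random
from typing import Callable, List


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one random byte-level mutation: bit flip, byte insertion, or byte deletion."""
    buf = bytearray(data)
    op = rng.choice(("flip", "insert", "delete")) if buf else "insert"
    if op == "flip":
        i = rng.randrange(len(buf))
        buf[i] ^= 1 << rng.randrange(8)
    elif op == "insert":
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    else:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)


def fuzz(target: Callable[[bytes], None], seeds: List[bytes],
         iterations: int, seed: int = 0) -> List[bytes]:
    """Repeatedly mutate inputs, run the target, and collect inputs that make it fail."""
    rng = random.Random(seed)
    corpus = list(seeds)
    crashes: List[bytes] = []
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        try:
            target(candidate)            # run the program under test
        except Exception:                # any exception stands in for a crash
            crashes.append(candidate)
        else:
            corpus.append(candidate)     # keep non-failing inputs as further seeds
    return crashes
```

For example, fuzzing a target that raises `ValueError` on any input of two or more bytes, starting from the seed `b"A"`, quickly yields two-byte crashing inputs.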
  • Fuzzing is also applied to script engines, also known as interpreters. Because script engines may execute untrusted scripts from external sources, it is important to discover their vulnerabilities before attackers do.
  • There are broadly two methods for fuzzing script engines. The first method uses a script as the fuzzing seed and mutates it while feeding it to the script engine to search for vulnerabilities.
  • The second method uses the bytecode generated from the script as the seed and likewise feeds it to the script engine.
  • Non-Patent Document 1, an example of the first method, defines a custom intermediate representation for fuzzing the PHP script engine.
  • A PHP script is converted into this intermediate representation, mutated in that form, and then converted back into a PHP script to be used as an input value.
  • Non-Patent Document 2 fuzzes JavaScript (registered trademark) engines by statically analyzing JavaScript code to obtain type information and mutating abstract syntax trees, which makes it possible to apply mutations that preserve types and structure.
  • Non-Patent Document 3 proposes a method of mutating DEX bytecode to fuzz Android's ART VM.
  • Non-Patent Document 4 fuzzes Java VMs (JVMs) efficiently by using Markov Chain Monte Carlo to select the type of mutation applied to Java bytecode, with the uniqueness of runtime code coverage as the guiding metric.
  • Non-Patent Document 5 aims to discover defects hidden deeper within the JVM through fuzzing, using mutation and selection techniques that keep the mutated bytecode valid, that is, executable by the JVM.
  • Because Non-Patent Documents 1 and 2 fuzz only by mutating scripts, the comprehensiveness of their tests is limited when viewed at the level of bytecode sequences.
  • Because Non-Patent Documents 3, 4, and 5 rely on knowledge of the VM to mutate bytecode, they cannot be applied to VMs whose internal specifications are unknown.
  • The present invention has been made in view of the above, and aims to provide a vulnerability discovery device, a vulnerability discovery method, and a vulnerability discovery program that can fuzz with bytecode as the input value, even for script engines whose VM internal specifications are unknown, without requiring individual manual analysis, design, and implementation.
  • The vulnerability discovery device of the present invention is characterized by having a first analysis unit that analyzes the virtual machine of a script engine; a second analysis unit that analyzes the instruction set architecture, which is the instruction system of the virtual machine, to collect virtual machine instructions and determine the contents of the collected instructions; and an execution unit that fuzzes the virtual machine with mutated code based on the architecture information obtained by the first and second analysis units.
  • According to the present invention, fuzzing can be performed with bytecode as input, even for script engines whose VM internal specifications are unknown, without individual manual analysis, design, and implementation.
  • FIG. 1 is a diagram illustrating an example of the configuration of a script engine.
  • FIG. 2 is a diagram showing pseudo code of a VM included in the script engine.
  • FIG. 3 is a diagram illustrating an example configuration of the vulnerability discovery device according to an embodiment.
  • FIG. 4 is a diagram showing an example of a test script used for detecting a virtual program counter (VPC).
  • FIG. 5 is a diagram showing an example of a test script used for detecting a branch VM instruction.
  • FIG. 6 is a diagram illustrating an example of an execution trace.
  • FIG. 7 illustrates an example of a VM execution trace.
  • FIG. 8 is a diagram illustrating the process of the VM instruction boundary detection unit.
  • FIG. 9 is a diagram for explaining the process of the virtual program counter detection unit.
  • FIG. 10 is a diagram illustrating the process of the dispatcher detection unit.
  • FIG. 11 is a diagram illustrating the process of the code cache detection unit.
  • FIG. 12 is a diagram illustrating the process of the VM instruction determination unit.
  • FIG. 13 is a diagram for explaining the process of the mutation unit.
  • FIG. 14 is a flowchart illustrating a processing procedure of the analysis process according to the embodiment.
  • FIG. 15 is a flowchart illustrating the procedure of the execution trace acquisition process shown in FIG.
  • FIG. 16 is a flowchart illustrating a procedure of the VM instruction boundary detection process illustrated in FIG.
  • FIG. 17 is a flowchart showing the processing procedure of the virtual program counter detection processing shown in FIG.
  • FIG. 18 is a flowchart illustrating the procedure of the dispatcher detection process shown in FIG.
  • FIG. 19 is a diagram for explaining the conditional branch flag detection process shown in FIG.
  • FIG. 20 is a flowchart illustrating the processing procedure of the code cache detection processing shown in FIG.
  • FIG. 21 is a flowchart illustrating the procedure of the VM execution trace acquisition process illustrated in FIG. 14.
  • FIG. 22 is a flowchart illustrating the procedure of the VM instruction collection process illustrated in FIG.
  • FIG. 23 is a flowchart illustrating a processing procedure of the VM instruction determination processing shown in FIG.
  • FIG. 24 is a flowchart of the branch VM instruction determination process shown in FIG.
  • FIG. 25 is a flowchart illustrating the procedure of the conditional branch VM instruction determination process illustrated in FIG.
  • FIG. 26 is a flowchart illustrating the procedure of the called VM instruction determination process illustrated in FIG.
  • FIG. 27 is a flowchart illustrating the procedure of the return VM instruction determination process illustrated in FIG.
  • FIG. 28 is a flowchart of the bytecode extraction process shown in FIG.
  • FIG. 29 is a flowchart showing the procedure of the mutation process shown in FIG.
  • FIG. 30 is a flowchart showing the procedure of the execution process shown in FIG.
  • FIG. 31 is a diagram illustrating an example of a computer in which a program is executed to realize vulnerability discovery.
  • The vulnerability discovery device of the embodiment can fuzz with bytecode as the input value, even for script engines whose VM internal specifications are unknown, without requiring individual manual analysis, design, and implementation.
  • The vulnerability discovery device executes test scripts while monitoring the script engine binaries, and obtains branch traces and memory access traces as execution traces.
  • The vulnerability discovery device analyzes the VM based on the execution traces and obtains, as architecture information, the VM instruction boundaries, the virtual program counter (VPC), the dispatcher, the conditional branch flag, and the code cache in which the VM instructions to be executed are stored.
  • The vulnerability discovery device then executes test scripts while monitoring the VPC and the dispatcher to obtain VM execution traces. By analyzing these VM execution traces, it collects VM instructions, determines their contents, and obtains information on the instruction set architecture.
  • The vulnerability discovery device fuzzes the VM with mutated code based on the acquired architecture information.
  • Specifically, it achieves fuzzing by repeatedly extracting, mutating, embedding, and executing bytecode based on the acquired architecture information.
  • It executes a seed script while monitoring the code cache to extract bytecode from the code cache, mutates the extracted bytecode, and embeds the result back into the code cache for execution.
  • FIG. 1 is a diagram for explaining an example configuration of a script engine.
  • As shown in FIG. 1, script engine 1 has a bytecode compiler 2 and a VM 3.
  • The bytecode compiler 2 has a syntax analysis unit 4 and a bytecode generation unit 5.
  • The VM 3 has a code cache unit 6, a fetch unit 7, a decode unit 8, and an execution unit 9. The fetch unit 7, decode unit 8, and execution unit 9 are executed repeatedly, and this repetition is called the interpreter loop. Script engine 1 accepts a script as input.
  • The syntax analysis unit 4 receives the script, generates an abstract syntax tree (AST) through lexical and syntactic analysis, and outputs it to the bytecode generation unit 5.
  • The bytecode generation unit 5 receives the AST, converts it into bytecode, and stores the bytecode in the code cache unit 6.
  • The fetch unit 7 fetches a VM opcode from the code cache unit 6 and outputs it to the decode unit 8.
  • The VM opcode is the opcode portion of a VM instruction.
  • The decode unit 8 receives the VM opcode, interprets it with a decoder/dispatcher, and dispatches control to the corresponding program.
  • The execution unit 9 executes the program corresponding to the VM instruction. The contents of the script are carried out by executing VM instructions one after another in the repeated interpreter loop.
  • FIG. 2 is a diagram showing pseudocode for the VM in the script engine.
  • The pseudocode first initializes the VPC (line 1).
  • The while loop is the interpreter loop (line 2).
  • The VM opcode pointed to by the VPC is fetched from the code cache (line 3), then decoded and dispatched by a switch statement (lines 4, 5, and 7).
  • The program corresponding to the dispatched VM opcode is executed (lines 6 and 8).
  • A branch VM instruction is a VM instruction that causes a branch within a script.
  • A conditional branch flag is a memory area that holds a flag indicating whether a branch will be taken when a conditional branch is executed.
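To make the FIG. 2 structure concrete, here is a minimal Python sketch of a switch-style interpreter loop with a VPC, a code cache, and a conditional branch flag. The opcode set (`PUSH`, `DUP`, `ADD`, `LT`, `JZ`, `JMP`, `HALT`) and the sample `PROGRAM` are hypothetical and do not come from any real script engine; an `if`/`elif` chain stands in for the switch statement.

```python
from typing import List, Tuple

# Hypothetical opcodes of a toy stack VM (not taken from any real script engine).
PUSH, DUP, ADD, LT, JZ, JMP, HALT = range(7)


def run(code_cache: List[int]) -> Tuple[List[int], List[int]]:
    """Interpret the bytecode in `code_cache`.

    Returns the final operand stack and the VPC value at which each VM
    instruction was fetched.
    """
    vpc = 0                   # initialize the virtual program counter
    stack: List[int] = []
    branch_flag = False       # conditional branch flag, set by LT and read by JZ
    fetched: List[int] = []
    while True:               # interpreter loop
        fetched.append(vpc)
        opcode = code_cache[vpc]           # fetch the VM opcode pointed to by the VPC
        if opcode == PUSH:                 # decode/dispatch, then run the handler
            stack.append(code_cache[vpc + 1])
            vpc += 2
        elif opcode == DUP:
            stack.append(stack[-1])
            vpc += 1
        elif opcode == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            vpc += 1
        elif opcode == LT:
            b, a = stack.pop(), stack.pop()
            branch_flag = a < b
            vpc += 1
        elif opcode == JZ:                 # conditional branch: jump if the flag is false
            vpc = vpc + 2 if branch_flag else code_cache[vpc + 1]
        elif opcode == JMP:                # unconditional branch
            vpc = code_cache[vpc + 1]
        elif opcode == HALT:
            return stack, fetched
        else:
            raise ValueError(f"invalid opcode {opcode} at VPC {vpc}")


# i = 0; while i < 3: i += 1
PROGRAM = [PUSH, 0, DUP, PUSH, 3, LT, JZ, 13, PUSH, 1, ADD, JMP, 2, HALT]
```

Running `run(PROGRAM)` leaves `[3]` on the stack. Note that the VPC advances by a fixed amount for ordinary instructions but jumps for `JZ` and `JMP`, which is the property the branch determination described later relies on.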
  • FIG. 3 is a diagram illustrating an example configuration of the vulnerability discovery device according to the embodiment.
  • As shown in FIG. 3, the vulnerability discovery device 10 has an input unit 11, a control unit 12, a storage unit 13, and an output unit 14.
  • The vulnerability discovery device 10 accepts a test script, a script engine binary, and a seed script as input.
  • The input unit 11 consists of input devices such as a keyboard and a mouse; it accepts information from outside and passes it to the control unit 12.
  • The input unit 11 also has a communication interface for exchanging information with other devices connected by wire or over a network, and accepts information sent from those devices.
  • The input unit 11 accepts the test scripts, script engine binaries, and seed scripts and outputs them to the control unit 12.
  • A test script is a script that is input when the script engine is dynamically analyzed to obtain execution traces and VM execution traces. Test scripts are described in detail later.
  • A script engine binary is an executable file that constitutes the script engine.
  • A script engine binary may consist of multiple executable files.
  • A seed script is a script containing the bytecode that serves as the initial input value.
  • The control unit 12 has internal memory for storing programs that define various processing procedures and the required data, and executes various processes using them.
  • The control unit 12 is an electronic circuit such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit).
  • The control unit 12 has a virtual machine analysis unit 121 (first analysis unit), an instruction set architecture analysis unit 122 (second analysis unit), and a vulnerability discovery unit 123.
  • The virtual machine analysis unit 121 analyzes the VM of the script engine.
  • The virtual machine analysis unit 121 obtains multiple execution traces by varying the run-time conditions, analyzes them with differential execution analysis, and obtains the VPC and the conditional branch flag.
  • The virtual machine analysis unit 121 also analyzes the script engine binary to obtain the VM instruction boundaries and the dispatcher.
  • The virtual machine analysis unit 121 detects the code cache from the VM execution trace.
  • The code cache stores the VM instructions to be executed.
  • The virtual machine analysis unit 121 has an execution trace acquisition unit 1211 (first acquisition unit), a VM instruction boundary detection unit 1212 (first detection unit), a virtual program counter detection unit 1213 (second detection unit), a dispatcher detection unit 1214 (third detection unit), a conditional branch flag detection unit 1215 (fourth detection unit), and a code cache detection unit 1216 (fifth detection unit).
  • The execution trace acquisition unit 1211 accepts the test script and the script engine binary as input.
  • The execution trace acquisition unit 1211 acquires an execution trace by executing the test script while monitoring the execution of the script engine binary.
  • An execution trace consists of a branch trace and a memory access trace.
  • A branch trace records, for each executed branch instruction, its type, the branch source address, and the branch destination address.
  • A memory access trace records the type of each memory operation and the memory address it targets. Branch traces and memory access traces can be acquired with known techniques such as instruction hooking.
  • The execution traces acquired by the execution trace acquisition unit 1211 are stored in the execution trace DB 131.
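One possible in-memory representation of such an execution trace is sketched below in Python. The field names, the `BranchKind` categories, and the `read_counts` helper are illustrative assumptions; the description only specifies which pieces of information each trace records.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Union


class BranchKind(Enum):
    JUMP = "jump"
    CONDITIONAL = "conditional"
    CALL = "call"
    RET = "ret"


@dataclass(frozen=True)
class BranchRecord:
    kind: BranchKind   # type of branch instruction
    src: int           # branch source address
    dst: int           # branch destination address


@dataclass(frozen=True)
class MemAccess:
    op: str            # type of memory operation: "read" or "write"
    addr: int          # memory address targeted by the operation


TraceEvent = Union[BranchRecord, MemAccess]


def read_counts(trace: Iterable[TraceEvent]) -> Counter:
    """Count memory reads per address, a typical input to differential execution analysis."""
    return Counter(e.addr for e in trace
                   if isinstance(e, MemAccess) and e.op == "read")
```

Keeping branch and memory events in one ordered stream preserves their interleaving, which later steps need when aligning memory accesses with individual VM instructions.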
  • The VM instruction boundary detection unit 1212 clusters the execution traces to detect the boundaries of each VM instruction.
  • Specifically, the VM instruction boundary detection unit 1212 clusters the execution traces and detects clusters whose execution count is at or above a threshold as VM instructions. Clustering detects contiguous code regions that are executed many times. For example, executed instructions that lie close to each other in the code may be grouped together, common subsequences of executed code blocks may be searched for, or other methods may be used.
  • The vulnerability discovery device 10 detects the start and end points of the contiguous instruction sequence that makes up each detected VM instruction as its boundaries.
  • The VM instruction boundaries detected here are used in VPC detection and dispatcher detection.
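As an illustration of the first clustering option (grouping executed instructions that lie close together in the code), the Python sketch below keeps addresses executed at least `min_count` times and merges neighbours no more than `max_gap` bytes apart. Both thresholds are hypothetical tuning parameters, and a real handler layout may need the common-subsequence option instead.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def detect_vm_instruction_boundaries(executed_addrs: Iterable[int],
                                     min_count: int,
                                     max_gap: int) -> List[Tuple[int, int]]:
    """Group frequently executed native-instruction addresses into contiguous regions.

    executed_addrs: address of every executed native instruction, in trace order.
    min_count: minimum execution count for an address to be considered (threshold).
    max_gap: maximum distance between neighbouring addresses in one region.
    Returns (start, end) address pairs, one per candidate VM instruction handler.
    """
    counts = Counter(executed_addrs)
    hot = sorted(addr for addr, n in counts.items() if n >= min_count)
    regions: List[Tuple[int, int]] = []
    for addr in hot:
        if regions and addr - regions[-1][1] <= max_gap:
            regions[-1] = (regions[-1][0], addr)   # extend the current region
        else:
            regions.append((addr, addr))           # start a new region
    return regions
```

With the trace `[0x10, 0x14, 0x18] * 5 + [0x100, 0x104] * 5 + [0x200]`, `min_count=3`, and `max_gap=8`, the rarely executed `0x200` is dropped and two regions remain: `(0x10, 0x18)` and `(0x100, 0x104)`.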
  • The virtual program counter detection unit 1213 retrieves and analyzes the execution traces for the first test script stored in the execution trace DB 131 to detect the VPC.
  • The virtual program counter detection unit 1213 detects the VPC by analyzing multiple execution traces with differential execution analysis, focusing on the number of memory reads and on the VM instruction boundaries detected by the VM instruction boundary detection unit 1212.
  • The virtual program counter detection unit 1213 exploits the fact that the memory holding the VPC is always read after each VM instruction is executed, and detects the VPC by identifying the target of this read.
  • To do so, the virtual program counter detection unit 1213 uses differential execution analysis that focuses on the number of memory reads.
  • The virtual program counter detection unit 1213 compares the execution traces obtained by running multiple test scripts and finds memory locations whose read counts change in proportion to both the increase or decrease in the number of repetitions and the number of repeated statements.
  • The virtual program counter detection unit 1213 then refers to the VM instruction boundaries detected by the VM instruction boundary detection unit 1212 and narrows these candidates down to those whose read values always point to the start point of a VM instruction.
  • The virtual program counter detection unit 1213 detects the remaining memory as the VPC.
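A minimal Python sketch of this differential execution analysis follows. It assumes each test-script run can be summarised by a single `scale` (here, repetitions multiplied by repeated statements) and by per-address read counts, and that the values read from each address were also recorded (`values_read`); these inputs are assumptions of the sketch. An address survives if its read count grows by the same positive amount per unit of scale across all runs and every value read from it is a VM instruction start point.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, List, Set, Tuple


def detect_vpc(runs: List[Tuple[int, Counter]],
               values_read: Dict[int, Set[int]],
               vm_insn_starts: Set[int]) -> List[int]:
    """Differential execution analysis over memory read counts.

    runs: (scale, reads-per-address) for each test-script execution, where scale
          is the number of executed statements (repetitions x repeated statements).
    values_read: values observed when reading each address.
    vm_insn_starts: VM instruction start points from boundary detection.
    """
    runs = sorted(runs, key=lambda r: r[0])
    base_scale, base_counts = runs[0]
    addrs = set().union(*(counts.keys() for _, counts in runs))
    candidates = []
    for addr in addrs:
        # read-count growth per unit of scale, relative to the smallest run
        slopes = {Fraction(counts[addr] - base_counts[addr], scale - base_scale)
                  for scale, counts in runs[1:] if scale != base_scale}
        if len(slopes) == 1 and next(iter(slopes)) > 0:   # proportional growth
            candidates.append(addr)
    # keep only addresses whose read values always point to a VM instruction start
    return sorted(a for a in candidates
                  if values_read.get(a) and values_read[a] <= vm_insn_starts)
```

Exact `Fraction` slopes avoid floating-point noise when checking that the growth is strictly proportional.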
  • The dispatcher detection unit 1214 extracts each VM instruction portion from the script engine binary based on the VM instruction boundaries detected by the VM instruction boundary detection unit 1212, and detects the parts that are highly similar across VM instructions as the dispatcher.
  • The dispatcher is implemented by referencing the pointer cache and jumping to the pointer of the next VM instruction handler.
  • Dispatchers are distributed at the tail of each VM instruction handler, and their code is generally nearly identical.
  • The vulnerability discovery device detects the dispatcher by searching, with a predetermined method, for highly similar code at the tails of the VM instruction handlers. To find the highly similar parts, a sequence alignment algorithm, for example, or other methods may be used.
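A deliberately simple stand-in for this similarity search is sketched below in Python: it returns the longest instruction sequence shared by the tails of all handlers. This longest-common-suffix approach substitutes for the sequence alignment the text mentions and tolerates no gaps or variation; the handler disassembly used in the example is invented.

```python
from typing import List, Sequence


def common_handler_tail(handlers: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest instruction sequence that ends every handler.

    handlers: disassembled native instructions of each VM instruction handler,
    as delimited by the detected VM instruction boundaries.
    """
    tail: List[str] = []
    # walk all handlers backwards in lockstep; stop at the first mismatch
    for column in zip(*(reversed(h) for h in handlers)):
        if len(set(column)) != 1:
            break
        tail.append(column[0])
    return tail[::-1]
```

For two handlers ending in `movzx ecx, byte [rsi]`, `inc rsi`, `jmp [rdi+rcx*8]` but differing before that, those three instructions are returned as the dispatcher candidate.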
  • The conditional branch flag detection unit 1215 retrieves and analyzes the execution traces for the second test script stored in the execution trace DB 131 to discover the conditional branch flag.
  • The conditional branch flag detection unit 1215 detects the conditional branch flag by analyzing multiple execution traces with differential execution analysis that focuses on the number of memory reads.
  • The conditional branch flag detection unit 1215 executes conditional branches in various patterns and detects the memory that stores the conditional branch flag by comparing the pattern of memory changes observed during execution with the conditional branch pattern in the test script.
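The pattern comparison can be sketched in Python as follows. This sketch assumes, beyond what the description states, that the values written to memory are also recorded, so each address yields a sequence of written values; an address matches if the truthiness of those values follows the taken/not-taken pattern of the test script, or its inverse.

```python
from typing import Dict, List, Sequence


def detect_branch_flag(written: Dict[int, Sequence[int]],
                       pattern: Sequence[bool]) -> List[int]:
    """Find memory whose writes mirror the test script's conditional-branch pattern.

    written: address -> values written there, in order, while the test script runs.
    pattern: whether each conditional branch in the test script is taken.
    """
    expected = [bool(p) for p in pattern]
    inverted = [not p for p in expected]     # the flag may encode "not taken"
    hits = []
    for addr, values in written.items():
        observed = [v != 0 for v in values]
        if observed == expected or observed == inverted:
            hits.append(addr)
    return sorted(hits)
```

Using several distinct patterns (for example taken, not taken, not taken, taken) makes accidental matches by unrelated memory less likely.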
  • The code cache detection unit 1216 detects the code cache, which is the cache that stores the VM instructions to be executed, based on the execution trace, the VPC, and the VM execution trace.
  • The code cache detection unit 1216 detects the memory area pointed to by the VPC in the VM execution trace as a code cache.
  • From the execution trace, the code cache detection unit 1216 identifies the code location that called the memory allocation function that allocated this code cache.
  • The code cache detection unit 1216 detects all memory areas allocated at this code location in the VM execution trace as code caches.
  • From the execution trace, the code cache detection unit 1216 identifies the code locations that write to the code cache.
  • The code cache detection unit 1216 treats writes by these code locations in the VM execution trace as updates to the code cache.
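The three steps above are sketched in Python below. The `Allocation` record (calling code location, base address, size) is a hypothetical input that could be gathered by hooking the memory allocation function; it is not a structure defined in the description.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Allocation:
    call_site: int   # code location that called the memory allocation function
    base: int        # start address of the allocated area
    size: int

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


def detect_code_caches(vpc_values: Iterable[int],
                       allocations: List[Allocation],
                       writes: Iterable[Tuple[int, int]]) -> Tuple[List[Allocation], Set[int]]:
    """Return the code cache areas and the code locations that update them.

    vpc_values: VPC values from the VM execution trace.
    writes: (writing code location, target address) pairs from the memory access trace.
    """
    vpcs = list(vpc_values)
    # 1. allocation sites whose areas the VPC points into
    sites = {a.call_site for a in allocations if any(a.contains(v) for v in vpcs)}
    # 2. every area allocated at those sites is treated as a code cache
    caches = [a for a in allocations if a.call_site in sites]
    # 3. code locations that write into any code cache are code cache updaters
    writers = {loc for loc, addr in writes if any(c.contains(addr) for c in caches)}
    return caches, writers
```

Generalising from the allocation site, rather than keeping only the area the VPC was seen in, also captures code caches for functions that the analysed test scripts never happened to execute.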
  • The instruction set architecture analysis unit 122 analyzes the instruction set architecture, which is the system of VM instructions.
  • The instruction set architecture analysis unit 122 collects VM instructions and determines the instruction content of the collected VM instructions.
  • The instruction set architecture analysis unit 122 has a VM execution trace acquisition unit 1221 (second acquisition unit), a VM instruction collection unit 1222 (first collection unit), and a VM instruction determination unit 1223 (first determination unit).
  • The VM execution trace acquisition unit 1221 accepts test scripts and script engine binaries as input.
  • The VM execution trace acquisition unit 1221 acquires VM execution traces by monitoring the VPC and the pointers to the VM instruction handlers dispatched by the dispatcher.
  • The VM execution trace acquisition unit 1221 acquires VM execution traces, which are execution traces at the VM level, by executing test scripts while monitoring the execution of the script engine binaries.
  • The VM execution trace acquisition unit 1221 executes multiple test scripts to acquire VM execution traces.
  • The VM execution trace acquisition unit 1221 links each VM instruction handler pointer with its VM instruction and virtually assigns a VM opcode to each pair as an identifier.
  • A VM execution trace is an execution trace at the VM level in which VM opcodes are virtually assigned as identifiers and the pointer to each executed VM instruction handler and the VPC are recorded.
  • In other words, a VM execution trace records the pointer to each executed VM instruction handler together with the VPC.
  • A VM execution trace consists of a VPC and a VM opcode for each executed VM instruction.
  • The VPC can be recorded by monitoring the memory location of the VPC detected by the virtual program counter detection unit 1213.
  • A VM opcode is an identifier virtually assigned to each linked pair of a VM instruction handler pointer and a VM instruction.
  • The VM execution traces acquired by the VM execution trace acquisition unit 1221 are stored in the VM execution trace DB 133.
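The virtual opcode assignment can be illustrated with a short Python sketch. It assumes the monitoring step yields one `(VPC, handler pointer)` pair per executed VM instruction; numbering handlers in order of first appearance is an arbitrary choice made for the sketch.

```python
from typing import Dict, Iterable, List, Tuple


def build_vm_execution_trace(
        events: Iterable[Tuple[int, int]]) -> Tuple[List[Tuple[int, int]], Dict[int, int]]:
    """Turn monitored (VPC, handler pointer) pairs into a VM execution trace.

    Each distinct handler pointer receives a virtual VM opcode, numbered in
    order of first appearance, so no knowledge of the real opcode encoding is needed.
    """
    opcode_of: Dict[int, int] = {}
    trace: List[Tuple[int, int]] = []
    for vpc, handler in events:
        opcode = opcode_of.setdefault(handler, len(opcode_of))
        trace.append((vpc, opcode))
    return trace, opcode_of
```

Because the identifiers come only from which handler was dispatched, the resulting trace is usable even when the VM's real opcode values are unknown.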
  • The VM instruction collection unit 1222 receives the VPC and the dispatcher as input, executes the script while monitoring them, and obtains the VM execution trace.
  • The VM instruction collection unit 1222 collects VM instructions from the VM execution trace.
  • The VM instruction determination unit 1223 determines the instruction content of the VM instructions collected by the VM instruction collection unit 1222. First, it determines whether each is a branch VM instruction based on the variation in the VPC change for each VM opcode in the VM execution trace.
  • The VM instruction determination unit 1223 retrieves and analyzes the VM execution traces stored in the VM execution trace DB 133 to determine whether each VM instruction is a branch VM instruction. For each VM opcode assigned as an identifier, it collects the change in the VPC before and after execution. If the VM opcode is not a branch VM instruction, the change in the VPC is almost constant; if it is a branch VM instruction, the change varies with the branch destination.
  • The VM instruction determination unit 1223 therefore determines whether an instruction is a branch VM instruction based on the variance in the change of the virtual program counter for each VM opcode in the VM execution trace.
  • That is, the VM instruction determination unit 1223 exploits the fact that the spread of the VPC change differs between branch VM instructions and other VM instructions, sets a threshold, and determines instructions with a larger spread to be branch VM instructions.
  • Concretely, the VM instruction determination unit 1223 quantifies the spread of the VPC change for each VM opcode as a variance, and determines instructions whose variance is at or above a certain threshold to be branch VM instructions.
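A minimal Python sketch of this variance test, operating on a VM execution trace of `(VPC, virtual opcode)` pairs, follows. The `threshold` value is a hypothetical parameter; population variance via the standard `statistics.pvariance` is one reasonable choice of spread measure.

```python
from collections import defaultdict
from statistics import pvariance
from typing import Dict, List, Sequence, Set, Tuple


def find_branch_opcodes(vm_trace: Sequence[Tuple[int, int]],
                        threshold: float) -> Set[int]:
    """Classify VM opcodes as branches by the variance of their VPC change.

    vm_trace: (VPC, virtual VM opcode) for each executed VM instruction, in order.
    """
    deltas: Dict[int, List[int]] = defaultdict(list)
    for (vpc, op), (next_vpc, _) in zip(vm_trace, vm_trace[1:]):
        deltas[op].append(next_vpc - vpc)    # VPC change caused by executing `op`
    return {op for op, d in deltas.items() if pvariance(d) >= threshold}
```

In the trace `[(0, 0), (2, 1), (3, 2), (10, 0), (12, 1), (13, 2), (15, 0), (17, 1)]`, opcodes 0 and 1 always advance the VPC by 2 and 1, while opcode 2 moves it by 7 and then 2, so only opcode 2 is reported. A conditional branch that is never taken in the trace shows no spread, so test scripts need to exercise both outcomes.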
  • The VM instruction determination unit 1223 then determines which of the branch VM instructions are conditional branch VM instructions. When a conditional branch occurs, the conditional branch flag is always accessed to determine the branch destination. A conditional branch VM instruction can therefore be identified by checking whether the conditional branch flag is accessed when each branch VM instruction is executed: if it is accessed, the branch VM instruction is a conditional branch VM instruction; if not, it is not. Accordingly, based on the VM execution trace and the memory access trace, the VM instruction determination unit 1223 determines that the branch VM instructions that access the conditional branch flag are conditional branch VM instructions.
  • The VM instruction determination unit 1223 also determines call and return VM instructions.
  • A branch caused by a call VM instruction is characterized in that the address immediately following the caller's bytecode is saved, and after the called subroutine is executed, a return VM instruction returns to that saved address.
  • Accordingly, when a branch VM instruction (instruction 1) is later followed by a branch VM instruction (instruction 2) that branches to the address immediately following instruction 1, the VM instruction determination unit 1223 determines that instructions 1 and 2 are a call VM instruction and a return VM instruction, respectively.
  • To determine conditional branch VM instructions, the VM instruction determination unit 1223 retrieves the list of opcodes of branch VM instructions from the architecture information DB 132, retrieves one VM execution trace from the VM execution trace DB 133, and retrieves the corresponding execution trace from the execution trace DB 131.
  • The VM instruction determination unit 1223 extracts one location where a branch VM instruction is executed from the VM execution trace, and extracts the memory access trace corresponding to the execution of that branch VM instruction from the execution trace.
  • Based on the memory access trace, the VM instruction determination unit 1223 determines whether the extracted branch VM instruction accesses the conditional branch flag. If it does, the VM instruction determination unit 1223 determines that the extracted branch VM instruction is a conditional branch VM instruction.
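Once each executed VM instruction is aligned with the memory it accessed, this check reduces to a set filter, sketched in Python below. The `steps` input (virtual opcode plus the set of addresses accessed while that instruction ran) is an assumed pre-aligned form of the VM execution trace and memory access trace.

```python
from typing import FrozenSet, Iterable, Set, Tuple


def find_conditional_branch_opcodes(
        steps: Iterable[Tuple[int, FrozenSet[int]]],
        branch_opcodes: Set[int],
        flag_addr: int) -> Set[int]:
    """Return branch VM opcodes that access the conditional branch flag.

    steps: (virtual VM opcode, memory addresses accessed while executing it),
           obtained by aligning the VM execution trace with the memory access trace.
    """
    return {op for op, accessed in steps
            if op in branch_opcodes and flag_addr in accessed}
```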
  • The VM instruction determination unit 1223 scans the VM execution trace for each branch VM instruction, and if a later branch VM instruction branches to the address immediately following it, determines that the earlier branch VM instruction is a call VM instruction.
  • To do so, the VM instruction determination unit 1223 retrieves the list of opcodes of branch VM instructions from the architecture information DB 132 and retrieves one VM execution trace from the VM execution trace DB 133.
  • The VM instruction determination unit 1223 extracts one location where a branch VM instruction is executed from the VM execution trace.
  • The VM instruction determination unit 1223 scans the VM execution trace for branch VM instructions that appear after the extracted branch VM instruction, and based on the scan results determines whether any of them branches to the address immediately following the extracted branch VM instruction. If one does, the VM instruction determination unit 1223 determines that the extracted branch VM instruction is a call VM instruction.
  • Likewise, the VM instruction determination unit 1223 extracts a call VM instruction from the VM execution trace, and if a later branch VM instruction branches to the address immediately following the extracted call VM instruction, determines that this branch VM instruction is a return VM instruction.
  • To do so, the VM instruction determination unit 1223 retrieves the list of opcodes of branch VM instructions from the architecture information DB 132 and retrieves one VM execution trace from the VM execution trace DB 133.
  • The VM instruction determination unit 1223 extracts one location where a call VM instruction is executed from the VM execution trace.
  • The VM instruction determination unit 1223 scans the VM execution trace for branch VM instructions that appear after the extracted call VM instruction and determines whether any of them branches to the address immediately following it.
  • If one does, the VM instruction determination unit 1223 determines that this branch VM instruction is a return VM instruction.
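The saved-return-address heuristic can be sketched in Python as below. Two inputs go beyond the description and are assumptions: `insn_len`, the bytecode length of each branch opcode (needed to compute "the address immediately following"), and `conditional_opcodes`, which the sketch excludes from pairing to cut false matches such as a loop-exit jump landing just after a backward jump.

```python
from typing import Dict, Sequence, Set, Tuple


def find_call_return(vm_trace: Sequence[Tuple[int, int]],
                     branch_opcodes: Set[int],
                     conditional_opcodes: Set[int],
                     insn_len: Dict[int, int]) -> Tuple[Set[int], Set[int]]:
    """Pair call and return VM opcodes using the saved-return-address property.

    vm_trace: (VPC, virtual VM opcode) for each executed VM instruction, in order.
    insn_len: assumed bytecode length of each branch opcode (hypothetical input).
    Conditional branches are skipped here as an extra assumption to cut false pairs.
    """
    candidates = branch_opcodes - conditional_opcodes
    steps = list(zip(vm_trace, vm_trace[1:]))    # ((vpc, op), (next_vpc, next_op))
    calls: Set[int] = set()
    returns: Set[int] = set()
    for i, ((vpc, op), (next_vpc, _)) in enumerate(steps):
        if op not in candidates:
            continue
        return_addr = vpc + insn_len[op]         # address just after this branch
        if next_vpc == return_addr:
            continue                             # it fell through, so no call here
        for (_, later_op), (later_next, _) in steps[i + 1:]:
            if later_op in candidates and later_next == return_addr:
                calls.add(op)                    # branched away, and...
                returns.add(later_op)            # ...a later branch came back
                break
    return calls, returns
```

In the trace `[(0, 0), (2, 5), (10, 0), (12, 6), (4, 3), (20, 9)]`, opcode 5 at VPC 2 jumps to 10, and opcode 6 later jumps back to 4, the address just after it, so 5 is reported as the call and 6 as the return; the plain jump (opcode 3) is not paired.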
  • The vulnerability discovery unit 123 fuzzes the VM with mutated code based on the architecture information acquired by the virtual machine analysis unit 121 and the instruction set architecture analysis unit 122.
  • The vulnerability discovery unit 123 executes a seed script while monitoring the code cache to extract bytecode from the code cache.
  • The vulnerability discovery unit 123 then mutates the extracted bytecode and embeds it back into the code cache for execution.
  • The vulnerability discovery unit 123 has a mutation unit 1231 and a fuzzing execution unit 1232.
  • The mutation unit 1231 mutates the bytecode extracted by the fuzzing execution unit 1232 with a predetermined method, such as adding, deleting, or changing VM instructions, and outputs the updated bytecode dictionary to the fuzzing execution unit 1232.
  • The fuzzing execution unit 1232 accepts the seed script, the VPC, and the code cache as input. It executes the seed script while monitoring the code cache to extract bytecode from the code cache. The fuzzing execution unit 1232 then re-embeds the mutated code at the location in the code cache pointed to by the VPC and resumes execution. The fuzzing execution unit 1232 outputs any input value (bytecode) that causes a problem such as a crash during execution.
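The extract, mutate, re-embed, and resume cycle is sketched in Python below, operating on whole VM instructions so that mutations never split an instruction. The instruction `pool` to draw from, the `execute` callback that resumes the engine at the VPC, and raising an exception as the crash signal are all hypothetical stand-ins for the engine-specific monitoring the description assumes.

```python
import random
from typing import Callable, List, Sequence

Insn = bytes   # one encoded VM instruction (opcode plus operands)


def mutate_insns(insns: Sequence[Insn], pool: Sequence[Insn],
                 rng: random.Random) -> List[Insn]:
    """Add, delete, or change one whole VM instruction (never a partial one)."""
    out = list(insns)
    op = rng.choice(("add", "delete", "change")) if out else "add"
    if op == "add":
        out.insert(rng.randrange(len(out) + 1), rng.choice(pool))
    elif op == "delete":
        del out[rng.randrange(len(out))]
    else:
        out[rng.randrange(len(out))] = rng.choice(pool)
    return out


def embed(code_cache: bytearray, vpc: int, insns: Sequence[Insn]) -> None:
    """Write the mutated bytecode into the code cache at the location the VPC points to."""
    blob = b"".join(insns)
    if vpc + len(blob) > len(code_cache):
        raise ValueError("mutated bytecode does not fit in the code cache")
    code_cache[vpc:vpc + len(blob)] = blob


def fuzz(seed_insns: Sequence[Insn], pool: Sequence[Insn], code_cache: bytes,
         vpc: int, execute: Callable[[bytearray, int], None],
         iterations: int, seed: int = 0) -> List[bytes]:
    """Extracted bytecode -> mutate -> re-embed -> resume; collect crashing bytecode."""
    rng = random.Random(seed)
    crashes: List[bytes] = []
    for _ in range(iterations):
        mutated = mutate_insns(seed_insns, pool, rng)
        cache = bytearray(code_cache)            # fresh copy of the original code cache
        try:
            embed(cache, vpc, mutated)
        except ValueError:
            continue                             # skip candidates that do not fit
        try:
            execute(cache, vpc)                  # resume execution at the VPC
        except Exception:
            crashes.append(b"".join(mutated))
    return crashes
```

Mutating at instruction granularity, rather than flipping raw bytes, keeps most candidates decodable by the VM, so more executions reach the handlers instead of failing at decode time.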
  • The storage unit 13 is implemented by a semiconductor memory element such as RAM (Random Access Memory) or flash memory, or by a storage device such as a hard disk or an optical disc, and stores the processing program that operates the vulnerability discovery device 10, data used while that program runs, and so on.
  • The storage unit 13 has an execution trace database (DB) 131, a VM execution trace DB 133, and an architecture information DB 132 that stores the architecture information acquired by the virtual machine analysis unit 121 and the instruction set architecture analysis unit 122.
  • the execution trace DB 131 and the VM execution trace DB 133 store the execution traces and VM execution traces acquired by the execution trace acquisition unit 1211 and the VM execution trace acquisition unit 1221, respectively.
  • the execution trace DB 131 and the VM execution trace DB 133 are managed by the vulnerability discovery device 10.
  • alternatively, the execution trace DB 131 and the VM execution trace DB 133 may be managed by another device (such as a server). In that case, the execution trace acquisition unit 1211 and the VM execution trace acquisition unit 1221 output the acquired execution traces and VM execution traces, respectively, via the communication interface of the output unit 14 to the management server or other device that manages the execution trace DB 131 and the VM execution trace DB 133, where they are stored.
  • the output unit 14 is, for example, a liquid crystal display (LCD) or a printer, and outputs various information, including information related to the vulnerability discovery device 10.
  • the output unit 14 may also be an interface that handles the input and output of various data to and from an external device, and may output various information to the external device.
  • Next, the configuration of the test script is explained.
  • a test script is a script that is input to the script engine when the engine is dynamically analyzed. The test script focuses on the number of branch instruction executions and on memory reads and writes, and is used to capture differences in the behavior of the script engine that arise when the test script is executed under different conditions, such as a different number of repetitions. Test scripts are created manually before the analysis, which requires knowledge of the specifications of the target script language.
  • Figure 4 shows an example of a test script (first test script) used to detect VPCs.
  • the first test script uses a repetitive process (line 2).
  • the first test script changes the execution conditions and generates differences by increasing or decreasing the number of repetitions (line 2) and the number of repeated statements (lines 3 to 5) in the test script.
  • FIG. 5 is a diagram showing an example of a test script (second test script) used to detect branch VM instructions.
  • the second test script uses multiple conditional branches (lines 4 to 8).
  • the branch conditions are controlled so that the multiple conditional branches are either taken or not taken in a specific order pattern (lines 1 and 5).
  • the number of conditional branches and the order pattern of branch success or failure are changed to generate differences.
  • Fig. 6 is a diagram showing an example of an execution trace. As described above, an execution trace is composed of a branch trace and a memory access trace. Fig. 6 shows an excerpt of an execution trace. The structure of an execution trace will be described below with reference to Fig. 6.
  • Trace indicates whether the log line is a branch trace or a memory access trace.
  • a branch trace log line has the format shown, for example, in lines 1 to 10 of Figure 6, and consists of three elements: type, src, and dst.
  • type indicates whether the executed branch instruction was a call instruction, a jmp instruction, or a ret instruction.
  • src indicates the address of the branch source, and dst indicates the address of the branch destination.
  • a log line of a memory access trace has the format shown, for example, in lines 11 to 13 of Figure 6, and consists of three elements: type, target, and value.
  • type indicates whether the memory access is a read or a write.
  • target indicates the memory address accessed, and value stores the result of the memory access.
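The two log-line layouts described above map naturally onto two record types. The Python sketch below is illustrative only: it assumes a hypothetical whitespace-separated text form such as `branch call 0x401000 0x402000` or `mem read 0x7ffe10 0x2a`, because the exact textual layout of Fig. 6 is not reproduced here. The names `BranchEntry`, `MemoryEntry`, and `parse_line` are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class BranchEntry:
    type: str  # "call", "jmp", or "ret"
    src: int   # address of the branch source
    dst: int   # address of the branch destination


@dataclass(frozen=True)
class MemoryEntry:
    type: str    # "read" or "write"
    target: int  # memory address that was accessed
    value: int   # result of the memory access


TraceEntry = Union[BranchEntry, MemoryEntry]


def parse_line(line: str) -> TraceEntry:
    """Parse one line of the hypothetical '<trace> <type> <a> <b>' format."""
    trace, kind, a, b = line.split()
    if trace == "branch":
        if kind not in ("call", "jmp", "ret"):
            raise ValueError(f"unknown branch type: {kind}")
        return BranchEntry(kind, int(a, 16), int(b, 16))
    if trace == "mem":
        if kind not in ("read", "write"):
            raise ValueError(f"unknown memory access type: {kind}")
        return MemoryEntry(kind, int(a, 16), int(b, 16))
    raise ValueError(f"unknown trace kind: {trace}")
```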
  • Fig. 7 is a diagram showing an example of a VM execution trace.
  • a VM execution trace is a record of a VM opcode and a VPC.
  • Fig. 7 shows a part of a VM execution trace. The configuration of a VM execution trace will be described below with reference to Fig. 7.
  • a log line of a VM execution trace is, for example, in the format shown in Figure 7, and consists of two elements: vpc and vmop (vm opcode).
  • vpc indicates the value of the VPC.
  • vmop indicates the VM opcode, an identifier virtually assigned to each pointer (obtained from the pointer cache) that points to the beginning of the VM instruction handler to be executed.
  • the VM instruction boundary detection unit 1212 detects the boundaries of each VM instruction. In particular, it detects VM instructions and their boundaries for threaded-code VMs, which have no interpreter loop and whose VM instruction boundaries are therefore difficult to identify. Specifically, the VM instruction boundary detection unit 1212 extracts execution traces from the execution trace DB 131, clusters them using a predetermined method as shown in FIG. 8, and detects clusters whose execution count is equal to or greater than a threshold as VM instructions (e.g., VM instruction handlers 1 to 3). It then detects the start and end points of the consecutive instruction sequence that makes up each VM instruction as that instruction's boundaries.
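The clustering method is left open in the description above. As a minimal sketch, assuming a threaded-code VM in which each VM instruction handler ends with a `jmp` to the next handler, the code below treats each code segment running from one jump's destination to the next jump's source as a candidate, groups identical segments as a deliberately simple form of clustering, and keeps segments executed at least `threshold` times. The function name and input shape are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def detect_vm_instruction_boundaries(
    jumps: List[Tuple[int, int]], threshold: int
) -> List[Tuple[int, int]]:
    """jumps: (src, dst) pairs of jmp branches in execution order.

    Each code segment runs from the destination of one jump to the source
    of the next jump. Identical segments form one cluster; clusters executed
    at least `threshold` times are treated as VM instruction handlers, and
    their (start, end) addresses are returned as boundaries.
    """
    segments = Counter(
        (prev_dst, next_src)
        for (_, prev_dst), (next_src, _) in zip(jumps, jumps[1:])
    )
    return sorted(seg for seg, count in segments.items() if count >= threshold)
```

Exact matching is the crudest possible clustering; a real analysis would likely need to tolerate handlers that take internal branches.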
  • the virtual program counter detection unit 1213 detects the VPC and the pointer cache. The detection of the virtual program counter is realized by analyzing the log of the memory access trace of the acquired execution trace. The virtual program counter detection unit 1213 uses differential execution analysis focusing on the number of times memory is read.
  • FIG. 9 is a diagram for explaining the processing of the virtual program counter detection unit 1213.
  • the virtual program counter detection unit 1213 extracts, from the execution trace DB 131, one execution trace obtained with the first test script.
  • the number of times the VPC is read is proportional to the number of repetitions in the test script and to the number of statements in the repeated process. If the number of repetitions is N and the number of repeated statements is M, approximately MN VPC reads occur. The virtual program counter detection unit 1213 therefore extracts memory locations whose read counts grow to approximately 4MN and 9MN in the execution traces of first test scripts in which N and M are both doubled (2N and 2M) and both tripled (3N and 3M), respectively. Specifically, as shown in FIG. 9, the virtual program counter detection unit 1213 extracts memory areas whose reads and writes increase monotonically with each VM instruction execution ((1) in FIG. 9).
  • the virtual program counter detection unit 1213 then detects as the VPC the memory location whose read values always point to the start point of a VM instruction. Specifically, it compares the destinations pointed to by the candidate VPC values with the addresses of the VM instruction handlers and narrows the candidates down to the matching memory areas ((2) in FIG. 9).
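The differential read-count analysis and the handler-start check can be sketched as follows. This illustrates the idea under stated assumptions and is not the described unit itself: read counts per address are assumed to be available for a baseline run (scale factor 1) and for runs in which N and M are both multiplied by k, so VPC reads should scale by roughly k times k. The `tolerance` parameter, the name `vpc_candidates`, and the dictionary layouts are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def vpc_candidates(
    runs: Dict[int, Dict[int, int]],
    values_read: Dict[int, Iterable[int]],
    handler_starts: Set[int],
    tolerance: float = 0.1,
) -> List[int]:
    """runs maps a scale factor k (N and M both multiplied by k) to an
    {address: read count} table; k = 1 must be the baseline run.
    values_read maps each address to the values read from it.
    """
    base = runs[1]
    survivors = []
    for addr, base_count in base.items():
        if base_count == 0:
            continue
        # (1) With N and M both scaled by k, VPC reads should scale by ~k*k.
        proportional = all(
            abs(table.get(addr, 0) / base_count - k * k) <= tolerance * k * k
            for k, table in runs.items()
        )
        # (2) Every value read must point at the start of a VM handler.
        values = list(values_read.get(addr, []))
        if proportional and values and all(v in handler_starts for v in values):
            survivors.append(addr)
    return sorted(survivors)
```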
  • the dispatcher detection unit 1214 detects a dispatcher by analyzing the binary of the script engine using a predetermined method.
  • FIG. 10 is a diagram for explaining the processing of the dispatcher detection unit 1214.
  • the dispatcher detection unit 1214 extracts each VM instruction portion from the script engine binary based on the VM instruction boundaries detected by the VM instruction boundary detection unit 1212. Then, on the assumption that dispatcher code is highly similar across VM instructions ((1) in FIG. 10), the dispatcher detection unit 1214 calculates the similarity between the code of each VM instruction and detects the portion that is highly similar across all VM instructions as the dispatcher. In this way, the dispatcher detection unit 1214 can detect code that is commonly executed in the latter half of the VM instructions as the dispatcher ((1) in FIG. 10).
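Any code similarity measure may be used here. As a simplified sketch, the code below replaces the similarity calculation with exact matching and returns the longest instruction sequence shared by the tails of all VM instruction handlers, reflecting the observation that the dispatcher runs in the latter half of each VM instruction. The disassembly strings and the name `common_dispatcher` are hypothetical.

```python
from typing import List, Sequence


def common_dispatcher(handlers: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest instruction sequence shared by the tails of all
    VM instruction handlers (each handler is a list of disassembled
    instructions). Exact matching stands in for a similarity measure.
    """
    if not handlers:
        return []
    suffix: List[str] = []
    for i in range(1, min(len(h) for h in handlers) + 1):
        tail = handlers[0][-i]
        if all(h[-i] == tail for h in handlers):
            suffix.insert(0, tail)  # grow the shared tail backwards
        else:
            break
    return suffix
```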
  • the code cache detection unit 1216 detects the memory area pointed to by the VPC as a code cache from the VM execution trace ((1) in FIG. 11).
  • the code cache detection unit 1216 detects the code location that called the memory allocation function that allocated this code cache from the execution trace ((2) in FIG. 11).
  • the code cache detection unit 1216 detects all memory areas allocated at this code location from the VM execution trace as code caches ((3) in FIG. 11).
  • the code cache detection unit 1216 detects the code location that is writing to the code cache from the execution trace ((4) in FIG. 11). The code cache detection unit 1216 detects the writing by this code location in the VM execution trace as an update to the code cache ((5) in FIG. 11).
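Steps (1) to (5) can be sketched as set operations over the traces. The sketch assumes the execution trace has already been reduced to allocation records (calling code location, base address, size) and write records (writing code location, written address); these record shapes and the names `Allocation` and `detect_code_caches` are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Allocation(NamedTuple):
    site: int  # code location that called the memory allocation function
    base: int  # start address of the allocated area
    size: int  # size of the allocated area in bytes

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


def detect_code_caches(
    vpc_values: Iterable[int],
    allocations: List[Allocation],
    writes: List[Tuple[int, int]],
) -> Tuple[List[Allocation], Set[int]]:
    """Return (code caches, code locations that update them).

    writes holds (writing code location, written address) pairs.
    """
    vpcs = list(vpc_values)
    # (1)-(2): allocation sites of the areas the VPC points into.
    sites = {a.site for a in allocations if any(a.contains(v) for v in vpcs)}
    # (3): every area allocated at those sites is treated as a code cache.
    caches = [a for a in allocations if a.site in sites]
    # (4)-(5): code locations that write into a code cache are updaters.
    updaters = {site for site, addr in writes
                if any(c.contains(addr) for c in caches)}
    return caches, updaters
```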
  • the VM instruction determination unit 1223 first analyzes the acquired VM execution trace log to determine a branch VM instruction.
  • the test script used here may be any script that contains branch control syntax and therefore executes branch VM instructions.
  • the test script is prepared by collecting information from the Internet or obtaining information from official documents.
  • the VM instruction determination unit 1223 associates a pointer to a VM instruction with a VM instruction for each VM execution trace in the VM execution trace DB 133, and virtually assigns a VM opcode as an identifier to each of them.
  • Figure 12 is a diagram explaining the processing of the VM instruction determination unit 1223.
  • normally, the VPC advances by an amount that depends on the size of the VM instruction. If the VM instruction is a branch instruction, however, the advance of the VPC changes depending on the branch destination. Therefore, when pairs of VM opcodes and pointers to VM instructions are collected and the VPC advance is examined for each opcode, the advance varies for branch instructions according to the branch destination.
  • the VM instruction determination unit 1223 uses statistical variance to evaluate how much the VPC advance varies for each VM instruction (that is, for each pointer to a VM instruction).
  • the VM instruction determination unit 1223 calculates the variance of the amount of change in the VPC for each VM opcode, and narrows it down to only VM opcodes whose calculated variance is greater than a threshold. In this way, the VM instruction determination unit 1223 associates the pointer with the VM instruction, and determines that the VM instruction with variance in the advance of the VPC (VM instruction handler 3 in the example of FIG. 12) is a branch VM instruction ((1) in FIG. 12).
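The variance test can be sketched in a few lines. This sketch assumes the VM execution trace is a list of `(vpc, vmop)` records in execution order, measures the VPC advance as the difference between consecutive VPC values, and uses the population variance from Python's `statistics` module; the function name is hypothetical and the threshold is left to the caller.

```python
from collections import defaultdict
from statistics import pvariance
from typing import Dict, List, Set, Tuple


def branch_vm_opcodes(trace: List[Tuple[int, int]], threshold: float) -> Set[int]:
    """trace: (vpc, vmop) records in execution order.

    The VPC advance of each record is the next VPC minus the current VPC.
    Opcodes whose advances have a population variance above `threshold`
    are reported as branch VM instructions.
    """
    advances: Dict[int, List[int]] = defaultdict(list)
    for (vpc, vmop), (next_vpc, _) in zip(trace, trace[1:]):
        advances[vmop].append(next_vpc - vpc)
    return {op for op, deltas in advances.items() if pvariance(deltas) > threshold}
```

A non-branch instruction of fixed size always advances the VPC by the same amount and so has zero variance, which is why a small positive threshold already separates the two groups in idealized traces.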
  • the VM instruction determination unit 1223 then classifies each branch VM instruction as a conditional branch VM instruction, a call VM instruction, or a return VM instruction.
  • the mutation unit 1231 refers to the list of VM instructions collected by the VM instruction collection unit 1222 ((1) in FIG. 13) and extracts one bytecode from the bytecode dictionary ((2) in FIG. 13). The mutation unit 1231 then mutates the extracted bytecode in a predetermined manner, such as by adding, deleting, or changing a VM instruction ((3) in FIG. 13).
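A single mutation step might look like the following. The sketch models bytecode as a flat list of VM opcodes and ignores operands, which a real implementation would need to handle; the function name and the uniform random choice among add, delete, and change are assumptions.

```python
import random
from typing import List, Optional, Sequence


def mutate(bytecode: Sequence[int], vm_instructions: Sequence[int],
           rng: Optional[random.Random] = None) -> List[int]:
    """Apply one random add, delete, or change of a VM instruction.

    bytecode is modeled as a flat list of VM opcodes (operands ignored);
    vm_instructions is the collected list of known opcodes.
    """
    rng = rng or random.Random()
    out = list(bytecode)
    ops = ["add", "delete", "change"] if out else ["add"]
    op = rng.choice(ops)
    if op == "add":
        out.insert(rng.randrange(len(out) + 1), rng.choice(vm_instructions))
    elif op == "delete":
        del out[rng.randrange(len(out))]
    else:
        out[rng.randrange(len(out))] = rng.choice(vm_instructions)
    return out
```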
  • Fig. 14 is a flowchart showing the procedure of the analysis process according to the embodiment.
  • the input unit 11 receives a test script and a script engine binary as input (step S1).
  • the execution trace acquisition unit 1211 performs an execution trace acquisition process in which the test script is executed while monitoring the binary of the script engine to acquire branch traces and memory access traces (step S2).
  • the VM instruction boundary detection unit 1212 detects VM instructions and performs VM instruction boundary detection processing to detect VM instruction boundaries (step S3).
  • the virtual program counter detection unit 1213 extracts and analyzes the execution trace for the first test script stored in the execution trace DB 131, and performs virtual program counter detection processing to discover the VPC (step S4).
  • the dispatcher detection unit 1214 performs dispatcher detection processing, extracting each VM instruction portion from the script engine binary and detecting the portion that is highly similar across VM instructions as the dispatcher (step S5).
  • the conditional branch flag detection unit 1215 performs conditional branch flag detection processing, extracting and analyzing the execution trace for the second test script stored in the execution trace DB 131 to discover the conditional branch flag (step S6).
  • the code cache detection unit 1216 performs code cache detection processing based on the execution trace and the VPC: it detects the memory areas allocated at the code location that called the memory allocation function as code caches, and detects writes to those areas as code cache updates (step S7).
  • the VM execution trace acquisition unit 1221 receives the test script and the script engine binary as input, and executes the test script while monitoring the execution of the script engine binary, thereby performing a VM execution trace acquisition process to acquire a VM execution trace (step S8).
  • the VM instruction collection unit 1222 performs a VM instruction collection process to acquire VM instructions from the VM execution trace (step S9).
  • the VM instruction determination unit 1223 performs a VM instruction determination process to determine the instruction content of the collected VM instructions (step S10).
  • the fuzzing execution unit 1232 executes a bytecode extraction process to execute the seed script while monitoring the code cache and extract bytecode from the code cache (step S12).
  • the mutation unit 1231 performs a mutation process to mutate the bytecode extracted by the fuzzing execution unit 1232 using a predetermined method such as adding, deleting, or changing a VM instruction (step S13).
  • the fuzzing execution unit 1232 re-embeds the mutated code at the destination pointed to by the VPC in the code cache, and performs an execution process (step S14).
  • the vulnerability discovery unit 123 determines whether or not a problem such as a crash has occurred during the execution process (step S15). If a problem such as a crash has occurred (step S15: Yes), the vulnerability discovery unit 123 outputs the input value where the problem occurred (step S16). If a problem such as a crash has not occurred (step S15: No), the process returns to the mutation process of step S13 and continues fuzzing the VM.
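The loop from step S13 to step S16 can be sketched with the engine-specific parts passed in as callbacks. Both `mutate` and `execute` are hypothetical: `execute` stands for re-embedding the bytecode at the location pointed to by the VPC in the code cache, resuming the script engine, and reporting whether a problem such as a crash occurred. Accumulating mutations across iterations and the `max_iterations` cap are design choices of this sketch, not details from the description.

```python
from typing import Callable, List, Optional, Sequence


def fuzz_vm(
    bytecode: Sequence[int],
    mutate: Callable[[List[int]], List[int]],
    execute: Callable[[List[int]], bool],
    max_iterations: int = 10_000,
) -> Optional[List[int]]:
    """Return the first input that triggers a problem, or None."""
    current = list(bytecode)
    for _ in range(max_iterations):
        candidate = mutate(current)   # step S13: mutate the bytecode
        if execute(candidate):        # steps S14-S15: re-embed, run, check
            return candidate          # step S16: output the failing input
        current = candidate           # step S15: No, so back to step S13
    return None
```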
  • FIG. 15 is a flowchart showing the processing procedure of the execution trace acquisition process shown in Fig. 14.
  • the execution trace acquisition unit 1211 receives a test script and a script engine binary as input (step S21). Then, the execution trace acquisition unit 1211 hooks the received script engine to acquire a branch trace (step S22). The execution trace acquisition unit 1211 also hooks the received script engine to acquire a memory access trace (step S23).
  • the execution trace acquisition unit 1211 inputs the test script received in this state into the script engine and executes it (step S24), and stores the execution trace acquired thereby in the execution trace DB 131 (step S25).
  • the execution trace acquisition unit 1211 determines whether or not all of the input test scripts have been executed (step S26). If all of the input test scripts have been executed (step S26: Yes), the execution trace acquisition unit 1211 ends the process. On the other hand, if all of the input test scripts have not been executed (step S26: No), the execution trace acquisition unit 1211 returns to the execution of the test scripts in step S24 and continues the process.
  • Fig. 16 is a flowchart showing the processing procedure of the VM instruction boundary detection process shown in Fig. 14.
  • the VM instruction boundary detection unit 1212 extracts execution traces from the execution trace DB 131 (step S31).
  • the VM instruction boundary detection unit 1212 clusters the execution traces using a predetermined method (step S32). Any method may be used for the clustering.
  • the VM instruction boundary detection unit 1212 detects clusters whose execution count is equal to or exceeds a threshold as VM instructions (step S33). Then, the VM instruction boundary detection unit 1212 determines the start and end points of a sequence of consecutive instructions that constitute a VM instruction as boundaries (step S34). The VM instruction boundary detection unit 1212 outputs the VM instruction boundary as a return value (step S35), and ends the VM instruction boundary detection process.
  • Fig. 17 is a flowchart showing the processing procedure of the virtual program counter detection process shown in Fig. 14.
  • the virtual program counter detection unit 1213 extracts, from the execution trace DB 131, one execution trace obtained with the first test script (step S41). Next, focusing on the memory access traces in the execution trace, it counts the number of reads for each memory read destination (step S42).
  • the virtual program counter detection unit 1213 receives as input the first test script used to obtain the execution trace (step S43), and analyzes the first test script to obtain the number of repetitions and the number of repeated statements (step S44).
  • the virtual program counter detection unit 1213 extracts from the execution trace DB 131 another execution trace, obtained with a first test script that has a different number of repetitions and repeated statements (step S45). Then, focusing on the memory access trace, it counts the number of reads for each memory read destination (step S46). It also receives as input the first test script used to obtain this execution trace (step S47), analyzes the test script, and obtains the number of repetitions and the number of repeated statements (step S48).
  • the virtual program counter detection unit 1213 narrows down the memory read destinations to only those whose read counts change in proportion to the number of repetitions or the increase or decrease in the number of repeated statements (step S49). Furthermore, the virtual program counter detection unit 1213 narrows down the memory read destinations narrowed down in step S49 to those whose read memory values always point to the start point of the VM instruction (step S50).
  • the virtual program counter detection unit 1213 determines whether the memory read destinations have been narrowed down to only one (step S51). If the virtual program counter detection unit 1213 has not narrowed down the memory read destinations to only one (step S51: No), the process returns to step S45, where the virtual program counter detection unit 1213 retrieves the next execution trace and continues processing. On the other hand, if the virtual program counter detection unit 1213 has narrowed down the memory read destinations to only one (step S51: Yes), the virtual program counter detection unit 1213 stores the narrowed down memory read destination as a virtual program counter in the architecture information DB 132 (step S52), and ends processing.
  • Fig. 18 is a flowchart showing the processing procedure of the dispatcher detection process shown in Fig. 14.
  • the dispatcher detection unit 1214 receives the script engine binary as input (step S61).
  • the dispatcher detection unit 1214 receives the VM instruction boundaries from the VM instruction boundary detection unit 1212 (step S62).
  • the dispatcher detection unit 1214 extracts each VM instruction portion from the script engine binary based on the boundaries of the VM instructions received from the VM instruction boundary detection unit 1212 (step S63).
  • the dispatcher detection unit 1214 calculates the similarity between the codes of each VM instruction using a predetermined method (step S64). Any method for calculating the similarity may be used as long as it is a method that can calculate the similarity between codes.
  • the dispatcher detection unit 1214 extracts the part that is highly similar among all VM instructions based on the similarity calculated in step S64 (step S65). The dispatcher detection unit 1214 then determines whether this part is the end of the VM instruction (step S66).
  • if it is not the end of the VM instruction (step S66: No), the dispatcher detection unit 1214 returns to step S65 and continues processing. If it is the end of the VM instruction (step S66: Yes), the dispatcher detection unit 1214 outputs the extracted part as the dispatcher (step S67) and ends processing.
  • Fig. 19 is a flowchart showing the processing procedure of the conditional branch flag detection process shown in Fig. 14.
  • the conditional branch flag detection unit 1215 extracts, from the execution trace DB 131, one execution trace obtained with the second test script (step S71). Then, focusing on the memory access trace, it counts the number of reads for each memory read destination (step S72).
  • the conditional branch flag detection unit 1215 also receives as input the second test script used to obtain the execution trace (step S73), analyzes this second test script, and obtains the number of conditional branches and the True/False sequence pattern (step S74). The conditional branch flag detection unit 1215 then narrows down the memory read destinations to only those whose read count changes in proportion to the number of conditional branches (step S75). Furthermore, the conditional branch flag detection unit 1215 narrows down the memory read destinations to only those whose read memory value alternates between two values in accordance with the True/False sequence pattern (step S76).
  • the conditional branch flag detection unit 1215 determines whether the memory read destinations have been narrowed down to only one (step S77). If not (step S77: No), it returns to step S71, retrieves the next execution trace, and continues processing. If the memory read destinations have been narrowed down to only one (step S77: Yes), it stores the remaining read destination in the architecture information DB 132 as the conditional branch flag (step S78) and ends processing.
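Step S76 can be sketched as a consistency check between the values read and the True/False pattern. The sketch simplifies step S75 by assuming the flag is read exactly once per conditional branch; `flag_candidates` and its input layout are hypothetical.

```python
from typing import Dict, List, Sequence


def flag_candidates(reads: Dict[int, Sequence[int]], pattern: Sequence[bool]) -> List[int]:
    """reads maps each address to the values read from it, in order;
    pattern is the True/False outcome sequence of the test script's
    conditional branches.

    An address survives if it is read once per conditional branch (a
    simplifying assumption for "proportional to the number of branches")
    and its values take exactly two distinct values that map one-to-one
    onto True and False following the pattern.
    """
    survivors = []
    for addr, values in reads.items():
        if len(values) != len(pattern):
            continue
        mapping: Dict[bool, int] = {}
        consistent = all(mapping.setdefault(taken, v) == v
                         for taken, v in zip(pattern, values))
        if consistent and len(mapping) == 2 and len(set(mapping.values())) == 2:
            survivors.append(addr)
    return sorted(survivors)
```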
  • Fig. 20 is a flowchart showing the processing procedure of the code cache detection process shown in Fig. 14.
  • when the code cache detection unit 1216 receives the execution trace, the VPC, and the VM execution trace as input (step S81), it acquires the memory area pointed to by the VPC from the VM execution trace (step S82). The VM execution trace is acquired by the VM execution trace acquisition unit 1221.
  • the code cache detection unit 1216 obtains from the execution trace the code location that called the memory allocation function that allocated the memory area obtained in step S82 (step S83).
  • the code cache detection unit 1216 detects all areas that were allocated at the code location obtained in step S83 from the VM execution trace as code caches (step S84).
  • the code cache detection unit 1216 acquires the code location that is writing to the code cache from the execution trace (step S85). The code cache detection unit 1216 detects all areas in the VM execution trace that are written to at the code location acquired in step S85 as code cache updates (step S86). The code cache detection unit 1216 returns the detected code cache and its updated location (step S87), and ends the code cache detection process.
  • FIG. 21 is a flowchart showing the procedure of the VM execution trace acquisition process shown in Fig. 14.
  • the VM execution trace acquisition unit 1221 receives a test script and a script engine binary as input (step S91). Then, the VM execution trace acquisition unit 1221 hooks the received script engine to record the VPC and VM opcode (step S92).
  • the VM execution trace acquisition unit 1221 inputs the received test script in this state into the script engine for execution (step S93), and stores the VM execution trace acquired thereby in the VM execution trace DB 133 (step S94).
  • the VM execution trace acquisition unit 1221 determines whether or not all of the input test scripts have been executed (step S95). If all of the input test scripts have been executed (step S95: Yes), the VM execution trace acquisition unit 1221 ends the process. If all of the input test scripts have not been executed (step S95: No), the VM execution trace acquisition unit 1221 returns to the execution of the test scripts in step S93 and continues the process.
  • Fig. 22 is a flowchart showing the procedure of the VM instruction collection process shown in Fig. 14.
  • the VM instruction collection unit 1222 receives the VPC and the dispatcher as input (step S101) and acquires various scripts from the Internet (step S102).
  • the VM instruction collection unit 1222 executes the scripts while monitoring the VPC and the dispatcher, and acquires a VM execution trace (step S103).
  • the VM instruction collection unit 1222 acquires VM instructions from the VM execution trace (step S104) and adds them to a list of VM instructions (step S105). If the VM instruction collection unit 1222 finds a VM instruction that is not in the list (step S106: No), it returns to step S102. If the VM instruction collection unit 1222 finds no VM instructions that are not in the list (step S106: Yes), it returns the list of VM instructions (step S107) and ends the VM instruction collection process.
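The collection loop stops once newly acquired scripts reveal no new VM instructions. In the sketch below, `script_batches` stands in for repeatedly acquiring scripts (step S102), and `trace_opcodes` is a hypothetical callback that runs one script while monitoring the VPC and dispatcher and returns the observed VM opcodes (steps S103 and S104).

```python
from typing import Callable, Iterable, List, Set


def collect_vm_instructions(
    script_batches: Iterable[List[str]],
    trace_opcodes: Callable[[str], Set[int]],
) -> List[int]:
    """Grow the VM instruction list until a batch of scripts adds nothing new."""
    known: Set[int] = set()
    for batch in script_batches:
        new: Set[int] = set()
        for script in batch:
            new |= trace_opcodes(script) - known
        if not new:      # step S106: Yes, no new VM instruction was found
            break
        known |= new     # step S105, then back to step S102
    return sorted(known)
```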
  • Fig. 23 is a flowchart showing the processing procedure of the VM instruction determination process shown in Fig. 14.
  • the VM instruction determination unit 1223 performs a branch VM instruction determination process to determine branch VM instructions from among the VM instructions collected by the VM instruction collection unit 1222 (step S111). The VM instruction determination unit 1223 determines that, among the branch VM instructions, a branch VM instruction that accesses a conditional branch flag is a conditional branch VM instruction (step S112).
  • for each branch VM instruction, the VM instruction determination unit 1223 scans the VM execution trace and, if a later branch VM instruction branches to the position immediately after that branch VM instruction, performs a call VM instruction determination process that determines it to be a call VM instruction (step S113).
  • the VM instruction determination unit 1223 extracts the call VM instruction from the VM execution trace and performs a return VM instruction determination process to identify the corresponding return VM instruction (step S114).
  • the VM instruction determination unit 1223 determines other VM instructions using a predetermined method based on the difference in the VM execution traces obtained using multiple test scripts in which the VM instruction is called (step S115).
  • the VM instruction determination unit 1223 adds the determined VM instructions to a VM instruction list (step S116).
  • the VM instruction determination unit 1223 returns the list of VM instructions (step S117) and ends the VM instruction determination process.
  • FIG. 24 is a flowchart showing the procedure of the branch VM instruction determination process (step S111).
  • the VM instruction determination unit 1223 extracts one VM execution trace from the VM execution trace DB 133 (step S121).
  • the VM instruction determination unit 1223 links a pointer to the VM instruction with the VM instruction, and assigns a VM opcode to each as an identifier (step S122). Then, the VM instruction determination unit 1223 counts the amount of change in VPC before and after execution for each VM opcode (step S123).
  • the VM instruction determination unit 1223 determines whether all VM execution traces in the VM execution trace DB 133 have been processed (step S124). If all VM execution traces in the VM execution trace DB 133 have not been processed (step S124: No), the VM instruction determination unit 1223 returns to step S121 and retrieves and processes the next VM execution trace.
  • the VM instruction determination unit 1223 calculates the variance of the amount of change in VPC for each VM opcode (step S125). Then, the VM instruction determination unit 1223 receives a threshold value as an input (step S126). The VM instruction determination unit 1223 narrows down to only VM opcodes whose variance is greater than the threshold value (step S127), stores them as branch VM instructions in the architecture information DB 132 (step S128), and ends the process.
  • FIG. 25 is a flowchart illustrating the procedure of the conditional branch VM instruction determination process (step S112).
  • the VM instruction determination unit 1223 extracts a list of opcodes of branch VM instructions from the architecture information DB 132 (step S131).
  • the VM instruction determination unit 1223 extracts one VM execution trace from the VM execution trace DB 133 (step S132).
  • the VM instruction determination unit 1223 extracts the corresponding execution trace from the execution trace DB 131 (step S133).
  • the VM instruction determination unit 1223 extracts one location where a branch VM instruction is being executed from the VM execution trace (step S134).
  • the VM instruction determination unit 1223 extracts a memory access trace corresponding to the execution of the extracted branch VM instruction from the execution trace (step S135).
  • the VM instruction determination unit 1223 determines whether the branch VM instruction extracted in step S134 accesses a conditional branch flag based on the memory access trace (step S136).
  • if the conditional branch flag is being accessed (step S136: Yes), the VM instruction determination unit 1223 determines that the branch VM instruction extracted in step S134 is a conditional branch VM instruction (step S137) and stores the opcode of this conditional branch VM instruction in the architecture information DB 132 (step S138).
  • if the conditional branch flag is not being accessed (step S136: No), or after step S138 is completed, the VM instruction determination unit 1223 determines whether all locations where a branch VM instruction is being executed have been processed (step S139).
  • if not all such locations have been processed (step S139: No), the VM instruction determination unit 1223 extracts the next location where a branch VM instruction is being executed (step S140), returns to step S135, and processes the extracted branch VM instruction.
  • if all such locations have been processed (step S139: Yes), the VM instruction determination unit 1223 determines whether all VM execution traces have been processed (step S141).
  • if not all VM execution traces have been processed (step S141: No), the VM instruction determination unit 1223 retrieves the next VM execution trace (step S142), returns to step S133, and performs processing using the retrieved VM execution trace. If all VM execution traces have been processed (step S141: Yes), the VM instruction determination unit 1223 ends the conditional branch VM instruction determination process.
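Once each branch VM instruction execution is aligned with the memory reads it performs, the classification in step S136 reduces to a membership test. The aligned input format and the name `conditional_branch_opcodes` are assumptions of this sketch.

```python
from typing import Iterable, Set, Tuple


def conditional_branch_opcodes(
    executions: Iterable[Tuple[int, Set[int]]],
    branch_opcodes: Set[int],
    flag_address: int,
) -> Set[int]:
    """executions: (vmop, addresses read while that VM instruction ran),
    built by aligning the VM execution trace with the memory access trace.
    Branch opcodes that read the conditional branch flag are conditional.
    """
    return {op for op, accessed in executions
            if op in branch_opcodes and flag_address in accessed}
```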
  • FIG. 26 is a flowchart illustrating the procedure of the call VM instruction determination process (step S113).
  • the VM instruction determination unit 1223 extracts a list of opcodes of branch VM instructions from the architecture information DB 132 (step S151).
  • the VM instruction determination unit 1223 extracts one VM execution trace from the VM execution trace DB 133 (step S152).
  • the VM instruction determination unit 1223 extracts one location where a branch VM instruction is being executed from the VM execution trace (step S153).
  • the VM instruction determination unit 1223 scans the VM execution trace for branch VM instructions that appear after the branch VM instruction retrieved in step S153 (step S154). Based on the results of the scan in step S154, the VM instruction determination unit 1223 determines whether there is a branch VM instruction that branches to the location immediately following the branch VM instruction retrieved in step S153 (step S155).
  • If there is such a branch VM instruction (step S155: Yes), the VM instruction determination unit 1223 determines that the branch VM instruction retrieved in step S153 is a call VM instruction (step S156).
  • the VM instruction determination unit 1223 stores the opcode of the call VM instruction in the architecture information DB 132 (step S157).
  • If there is no such branch VM instruction (step S155: No), or after step S157 is completed, the VM instruction determination unit 1223 determines whether all locations where the branch VM instruction is executed have been processed (step S158).
  • If not all locations where a branch VM instruction is executed have been processed (step S158: No), the VM instruction determination unit 1223 extracts the next location where a branch VM instruction is executed (step S159), returns to step S154, and processes the extracted branch VM instruction.
  • If all locations where the branch VM instruction is executed have been processed (step S158: Yes), the VM instruction determination unit 1223 determines whether all VM execution traces have been processed (step S160).
  • If not all VM execution traces have been processed (step S160: No), the VM instruction determination unit 1223 retrieves the next VM execution trace (step S161), returns to step S153, and performs processing using the retrieved VM execution trace. If all VM execution traces have been processed (step S160: Yes), the VM instruction determination unit 1223 ends the call VM instruction determination process.
  • FIG. 27 is a flowchart illustrating the procedure of the return VM instruction determination process illustrated in FIG. 23.
  • the VM instruction determination unit 1223 extracts a list of opcodes of branch VM instructions from the architecture information DB 132 (step S171).
  • the VM instruction determination unit 1223 extracts one VM execution trace from the VM execution trace DB 133 (step S172).
  • the VM instruction determination unit 1223 extracts one location where a call VM instruction is executed from the VM execution trace (step S173).
  • the VM instruction determination unit 1223 scans the VM execution trace for branch VM instructions that appear after the call VM instruction extracted in step S173 (step S174).
  • the VM instruction determination unit 1223 determines whether there is a branch VM instruction that branches to the location immediately following the call VM instruction retrieved in step S173 (step S175).
  • If there is such a branch VM instruction (step S175: Yes), the VM instruction determination unit 1223 determines that this branch VM instruction is a return VM instruction (step S176). The VM instruction determination unit 1223 stores the opcode of the return VM instruction in the architecture information DB 132 (step S177).
  • If there is no such branch VM instruction (step S175: No), or after step S177 is completed, the VM instruction determination unit 1223 determines whether all locations where branch VM instructions are executed have been processed (step S178).
  • If not all locations where a branch VM instruction is executed have been processed (step S178: No), the VM instruction determination unit 1223 extracts the next location where a branch VM instruction is executed (step S179), returns to step S174, and processes the extracted branch VM instruction.
  • If all locations where the branch VM instruction is executed have been processed (step S178: Yes), the VM instruction determination unit 1223 determines whether all VM execution traces have been processed (step S180).
  • If not all VM execution traces have been processed (step S180: No), the VM instruction determination unit 1223 retrieves the next VM execution trace (step S181), returns to step S173, and performs processing using the retrieved VM execution trace. If all VM execution traces have been processed (step S180: Yes), the VM instruction determination unit 1223 ends the return VM instruction determination process.
  • Fig. 28 is a flowchart showing the processing procedure of the bytecode extraction process shown in Fig. 14.
  • the vulnerability discovery unit 123 accepts a seed script as input (step S191).
  • the vulnerability discovery unit 123 accepts a code cache as input (step S192).
  • the fuzzing execution unit 1232 executes the seed script while monitoring the code cache (step S193).
  • the vulnerability discovery unit 123 extracts the bytecode from the code cache (step S194) and stores the bytecode in the dictionary (step S195).
  • Fig. 29 is a flowchart showing the processing procedure of the mutation process shown in Fig. 14.
  • the mutation unit 1231 receives as input the dictionary of bytecodes obtained in the bytecode extraction process (step S201).
  • the mutation unit 1231 receives as input a list of VM instructions (step S202).
  • the mutation unit 1231 retrieves one bytecode from the bytecode dictionary (step S203), and mutates the retrieved bytecode using a predetermined method, such as adding, deleting, or changing a VM instruction (step S204). The mutation unit 1231 then adds the mutated bytecode to the bytecode dictionary (step S205).
  • If the mutation unit 1231 has not mutated all of the bytecodes to be mutated (step S206: No), the process returns to step S203. If the mutation unit 1231 has mutated all of the bytecodes to be mutated (step S206: Yes), the mutation unit 1231 returns the updated bytecode dictionary (step S207).
  • Fig. 30 is a flowchart showing the processing procedure of the execution process shown in Fig. 14.
  • the fuzzing execution unit 1232 accepts a seed script as input (step S211).
  • the fuzzing execution unit 1232 accepts a VPC and a code cache as input (step S212).
  • the fuzzing execution unit 1232 executes the seed script while monitoring the VPC and the code cache (step S213).
  • the fuzzing execution unit 1232 suspends execution at the moment the bytecode starts to be executed (step S214). The fuzzing execution unit 1232 then retrieves a bytecode from the bytecode dictionary (step S215).
  • the fuzzing execution unit 1232 places the retrieved bytecode at the destination pointed to by the VPC in the code cache (step S216). The fuzzing execution unit 1232 then resumes execution (step S217).
  • the vulnerability discovery device 10 analyzes the VM of the script engine, collects VM instructions, and determines the contents of the collected VM instructions to obtain information on the instruction set architecture, which is the system of instructions for the VM.
  • the vulnerability discovery device 10 fuzzes the VM using mutated code based on the obtained architecture information. Therefore, the vulnerability discovery device 10 can realize fuzzing using bytecode as an input value even for script engines whose VM internal specifications are unknown.
  • the vulnerability discovery device 10 executes the test script while monitoring the binary of the script engine, and obtains branch traces and memory access traces as execution traces.
  • the vulnerability discovery device 10 analyzes the VM based on the execution trace, and obtains architecture information of the VM instruction boundary, VPC, dispatcher, conditional branch flags, and code cache.
  • the vulnerability discovery device 10 executes the test script while monitoring the VPC and dispatcher, and obtains a VM execution trace. By analyzing the VM execution trace, the vulnerability discovery device 10 collects VM instructions, determines the contents of the VM instructions, and obtains information on the instruction set architecture.
  • the vulnerability detection device 10 can obtain architecture information including information indicating where in the VM the bytecode generated by the script engine is stored, and information on the instruction set architecture of the bytecode that the VM can interpret.
  • the vulnerability discovery device 10 detects the code cache based on the execution trace, the VPC, and the VM execution trace.
  • the vulnerability discovery device 10 executes the seed script while monitoring the code cache to extract bytecode from the code cache, mutates the extracted bytecode, and re-embeds it into the code cache for execution. Therefore, based on the acquired architecture information, the vulnerability discovery device 10 is able to extract the current bytecode from the VM of the script engine, mutate it within the correct range as a bytecode instruction, and re-embed it into the VM.
  • the vulnerability discovery device 10 detects various architectural information by analyzing the execution trace and VM execution trace obtained, even for script engines whose VM internal specifications are unknown, making it possible to realize fuzzing using bytecode as input without requiring manual reverse engineering.
  • the vulnerability discovery device 10 can automatically perform fuzzing with bytecode as input on a variety of script engines as long as a test script is prepared, so bytecode fuzzing can be performed without individual design or implementation for each engine.
  • the vulnerability discovery device 10 can analyze a script engine whose VM's internal specifications are unknown, and obtain information about the storage destination of the bytecode and the instruction set architecture, thereby enabling fuzzing of script engines of a wide variety of scripting languages using the bytecode as an input value.
  • the vulnerability discovery device 10 is useful for discovering vulnerabilities in a wide variety of script engines. Because it fuzzes with bytecode as the input value, it is suited to discovering vulnerabilities hidden in behavior that is difficult to elicit when a script is used as the input value.
  • By using the vulnerability discovery device 10 to fuzz various script engines with bytecode as input, potential vulnerabilities can be discovered and measures such as fixing them can be taken.
  • Each component of vulnerability discovery device 10 shown in Fig. 3 is a functional concept, and does not necessarily have to be physically configured as shown.
  • the specific form of distribution and integration of the functions of vulnerability discovery device 10 is not limited to that shown in the figure, and all or part of it can be functionally or physically distributed or integrated in any unit depending on various loads, usage conditions, etc.
  • each process performed by the vulnerability discovery device 10 may be realized, in whole or in part, by a CPU and a program analyzed and executed by the CPU. Furthermore, each process performed by the vulnerability discovery device 10 may be realized as hardware using wired logic.
  • [Program] FIG. 31 is a diagram showing an example of a computer that realizes the vulnerability detecting device 10 by executing a program.
  • the computer 1000 has, for example, a memory 1010 and a CPU 1020.
  • the computer 1000 also has a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These components are connected by a bus 1080.
  • the memory 1010 includes a ROM 1011 and a RAM 1012.
  • the ROM 1011 stores a boot program such as a BIOS (Basic Input Output System).
  • the hard disk drive interface 1030 is connected to a hard disk drive 1090.
  • the disk drive interface 1040 is connected to a disk drive 1100.
  • a removable storage medium such as a magnetic disk or optical disk is inserted into the disk drive 1100.
  • the serial port interface 1050 is connected to a mouse 1110 and a keyboard 1120, for example.
  • the video adapter 1060 is connected to a display 1130, for example.
  • the hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the programs that define each process of the vulnerability detection device 10 are implemented as program modules 1093 in which code executable by the computer 1000 is written.
  • the program modules 1093 are stored, for example, in the hard disk drive 1090.
  • a program module 1093 for executing processes similar to the functional configuration of the vulnerability detection device 10 is stored in the hard disk drive 1090.
  • the hard disk drive 1090 may be replaced by an SSD (Solid State Drive).
  • the setting data used in the processing of the above-mentioned embodiment is stored as program data 1094, for example, in memory 1010 or hard disk drive 1090.
  • the CPU 1020 reads the program module 1093 or program data 1094 stored in memory 1010 or hard disk drive 1090 into RAM 1012 as necessary and executes it.
  • the program module 1093 and program data 1094 need not be stored in the hard disk drive 1090; they may instead be stored, for example, in a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like.
  • the program module 1093 and program data 1094 may be stored in another computer connected via a network (such as a LAN (Local Area Network), WAN (Wide Area Network)).
  • the program module 1093 and program data 1094 may then be read by the CPU 1020 from the other computer via the network interface 1070.
  • Reference signs list
      10 Vulnerability discovery device
      11 Input unit
      12 Control unit
      13 Memory unit
      14 Output unit
      121 Virtual machine analysis unit
      122 Instruction set architecture analysis unit
      123 Vulnerability discovery unit
      131 Execution trace database (DB)
      132 Architecture information DB
      133 VM execution trace DB
      1211 Execution trace acquisition unit
      1212 VM instruction boundary detection unit
      1213 Virtual program counter detection unit
      1214 Dispatcher detection unit
      1215 Conditional branch flag detection unit
      1216 Code cache detection unit
      1221 VM execution trace acquisition unit
      1222 VM instruction collection unit
      1223 VM instruction determination unit
      1231 Mutation unit
      1232 Fuzzing execution unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

This vulnerability discovery device (10) includes: a virtual machine analysis unit (121) that analyzes the VM of a script engine; an instruction set architecture analysis unit (122) that analyzes the instruction set architecture, which is the system of VM instructions, collects VM instructions, and determines the content of the collected VM instructions; and a vulnerability discovery unit (123) that fuzzes the VM with mutated code on the basis of the architecture information acquired by the virtual machine analysis unit (121) and the instruction set architecture analysis unit (122).

Description

Vulnerability discovery device, vulnerability discovery method, and vulnerability discovery program
The present invention relates to a vulnerability discovery device, a vulnerability discovery method, and a vulnerability discovery program.
One technique for discovering latent defects in software is dynamic testing. Dynamic testing is a type of software testing in which input values are actually supplied to the target program, the program is run, and its behavior is observed.
One type of dynamic testing is fuzzing, a technique for discovering vulnerabilities that may be latent in software. Fuzzing repeatedly generates or mutates input values, runs the target program on them, observes the program's execution state, and searches for inputs that cause problems that could lead to vulnerabilities, such as crashes.
In fuzzing, what is used as the input value and how it is generated or mutated are important factors in the efficiency of vulnerability discovery. For example, if a vulnerability lies on a path that is executed only for a specific range of input values, discovering it will take a long time unless input values that follow that path can be found efficiently.
In current fuzzing, it is common to provide an initial input value called a seed and then mutate it. For this reason, it is important to decide what to use as the seed and how to mutate it.
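The seed-and-mutate approach described above can be sketched as a minimal mutational fuzzing loop. This is a generic illustration, not the method of this document or of any cited document: `mutate` and `fuzz` are hypothetical names, a raised Python exception stands in for a crash, and the seed list is never extended (practical fuzzers typically also keep mutated inputs that reach new code coverage).

```python
import random
from typing import Callable, List


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one random byte-level mutation: insert, delete, or overwrite a byte."""
    buf = bytearray(data)
    kind = rng.randrange(3) if buf else 0   # an empty input can only grow
    if kind == 0:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif kind == 1:
        del buf[rng.randrange(len(buf))]
    else:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(target: Callable[[bytes], None], seeds: List[bytes],
         iterations: int, rng: random.Random) -> List[bytes]:
    """Repeatedly mutate a seed, run the target on it, and collect crashing inputs."""
    crashes = []
    for _ in range(iterations):
        candidate = mutate(rng.choice(seeds), rng)
        try:
            target(candidate)
        except Exception:                    # stand-in for a crash of the target
            crashes.append(candidate)
    return crashes
```

Passing a seeded `random.Random` makes a fuzzing run reproducible, which helps when a crashing input has to be replayed.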
In recent years, script engines (also called interpreters) have become one of the kinds of software in which discovering vulnerabilities is increasingly important. Because script engines may execute untrusted scripts from external sources, it is important to discover their vulnerabilities before attackers do.
There are two main methods for fuzzing execution environments that use a virtual machine (VM), such as script engines. The first method uses a script as the fuzzing seed and searches for vulnerabilities by inputting it into the script engine while mutating it. The second method uses the bytecode generated from the script as the seed and likewise inputs it into the script engine.
Non-Patent Document 1, an example of the first method, defines a custom intermediate representation for fuzzing the PHP script engine. The technique described in Non-Patent Document 1 converts a PHP script into this intermediate representation, mutates it in that form, and then converts it back into a PHP script to be used as an input value.
The technique described in Non-Patent Document 2 fuzzes JavaScript (registered trademark) engines: it obtains type information by statically analyzing JavaScript code and performs mutations on the abstract syntax tree, enabling mutations that preserve types and structure.
As an example of the second method, Non-Patent Document 3 proposes a technique that mutates DEX bytecode to fuzz Android's ART VM.
The technique described in Non-Patent Document 4 fuzzes the Java VM (JVM) efficiently by selecting, when mutating Java bytecode, which mutation to apply using Markov chain Monte Carlo, with the uniqueness of runtime code coverage as the guiding metric.
To discover defects hidden deeper within the JVM through fuzzing, the technique described in Non-Patent Document 5 devises mutation and selection methods aimed at keeping the mutated bytecode valid, so that the JVM can still execute it.
However, the techniques described in Non-Patent Documents 1 and 2 perform fuzzing only by mutating scripts, so the comprehensiveness of the tests, viewed at the level of bytecode sequences, is limited.
Furthermore, the techniques described in Non-Patent Documents 3, 4, and 5 rely on information about the VM to mutate bytecode, so they cannot be applied to VMs whose internal specifications are unknown.
To mutate bytecode, two pieces of information must be known in advance: where in the VM the bytecode generated by the script engine is stored, and the instruction set architecture of the bytecode that the VM can interpret. Both are needed because the current bytecode must be taken out of the script engine's VM, mutated within the range that remains valid as bytecode instructions, and then embedded back into the VM.
However, the internal specifications of a VM are often not made public and cannot easily be learned. It is therefore generally necessary to reverse engineer the VM to reveal its internal specifications and obtain this information independently.
Analyzing, designing, and implementing this manually and individually for each script engine is not realistic in terms of the effort involved.
The present invention has been made in view of the above, and aims to provide a vulnerability discovery device, a vulnerability discovery method, and a vulnerability discovery program that can realize fuzzing with bytecode as the input value, even for script engines whose VM internal specifications are unknown, without requiring individual manual analysis, design, and implementation.
In order to solve the above problems and achieve the objective, the vulnerability discovery device of the present invention includes: a first analysis unit that analyzes the virtual machine of a script engine; a second analysis unit that analyzes the instruction set architecture, which is the system of instructions of the virtual machine, collects virtual machine instructions, and determines the instruction contents of the collected virtual machine instructions; and an execution unit that fuzzes the virtual machine using mutated code on the basis of the architecture information acquired by the first analysis unit and the second analysis unit.
According to the present invention, fuzzing with bytecode as the input value can be realized even for script engines whose VM internal specifications are unknown, without requiring individual manual analysis, design, and implementation.
FIG. 1 is a diagram illustrating an example of the configuration of a script engine.
FIG. 2 is a diagram showing pseudocode of the VM included in the script engine.
FIG. 3 is a diagram illustrating an example of the configuration of the vulnerability discovery device according to the embodiment.
FIG. 4 is a diagram showing an example of a test script used to detect the virtual program counter (VPC).
FIG. 5 is a diagram showing an example of a test script used to detect branch VM instructions.
FIG. 6 is a diagram illustrating an example of an execution trace.
FIG. 7 is a diagram illustrating an example of a VM execution trace.
FIG. 8 is a diagram illustrating the process of the VM instruction boundary detection unit.
FIG. 9 is a diagram illustrating the process of the virtual program counter detection unit.
FIG. 10 is a diagram illustrating the process of the dispatcher detection unit.
FIG. 11 is a diagram illustrating the process of the code cache detection unit.
FIG. 12 is a diagram illustrating the process of the VM instruction determination unit.
FIG. 13 is a diagram illustrating the process of the mutation unit.
FIG. 14 is a flowchart illustrating the procedure of the analysis process according to the embodiment.
FIG. 15 is a flowchart illustrating the procedure of the execution trace acquisition process shown in FIG. 14.
FIG. 16 is a flowchart illustrating the procedure of the VM instruction boundary detection process shown in FIG. 14.
FIG. 17 is a flowchart illustrating the procedure of the virtual program counter detection process shown in FIG. 14.
FIG. 18 is a flowchart illustrating the procedure of the dispatcher detection process shown in FIG. 14.
FIG. 19 is a diagram for explaining the conditional branch flag detection process shown in FIG. 14.
FIG. 20 is a flowchart illustrating the procedure of the code cache detection process shown in FIG. 14.
FIG. 21 is a flowchart illustrating the procedure of the VM execution trace acquisition process shown in FIG. 14.
FIG. 22 is a flowchart illustrating the procedure of the VM instruction collection process shown in FIG. 14.
FIG. 23 is a flowchart illustrating the procedure of the VM instruction determination process shown in FIG. 14.
FIG. 24 is a flowchart illustrating the procedure of the branch VM instruction determination process shown in FIG. 23.
FIG. 25 is a flowchart illustrating the procedure of the conditional branch VM instruction determination process shown in FIG. 23.
FIG. 26 is a flowchart illustrating the procedure of the call VM instruction determination process shown in FIG. 23.
FIG. 27 is a flowchart illustrating the procedure of the return VM instruction determination process shown in FIG. 23.
FIG. 28 is a flowchart illustrating the procedure of the bytecode extraction process shown in FIG. 14.
FIG. 29 is a flowchart illustrating the procedure of the mutation process shown in FIG. 14.
FIG. 30 is a flowchart illustrating the procedure of the execution process shown in FIG. 14.
FIG. 31 is a diagram illustrating an example of a computer that realizes the vulnerability discovery device by executing a program.
Embodiments of the vulnerability discovery device, vulnerability discovery method, and vulnerability discovery program according to the present application are described in detail below with reference to the drawings. The present invention is not limited to the embodiments described below.
[Embodiment]
The vulnerability discovery device according to the embodiment can perform fuzzing with bytecode as the input value, even for script engines whose VM internal specifications are unknown, without requiring individual manual analysis, design, and implementation.
The vulnerability discovery device according to the embodiment executes test scripts while monitoring the binary of the script engine and obtains branch traces and memory access traces as execution traces. The vulnerability discovery device analyzes the VM on the basis of the execution traces and obtains, as architecture information, the VM instruction boundaries, the virtual program counter (VPC), the dispatcher, the conditional branch flag, and the code cache in which the VM instructions to be executed are stored.
The vulnerability discovery device executes test scripts while monitoring the VPC and the dispatcher to obtain VM execution traces. By analyzing the VM execution traces, the vulnerability discovery device collects VM instructions, determines their contents, and obtains information on the instruction set architecture.
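The opcode-classification rules used when determining VM instruction contents (the branch, conditional branch, call, and return determination processes of FIGS. 24 to 27) can be illustrated with a minimal sketch. It assumes a hypothetical trace record `VMStep`, treats an opcode as a branch if the VPC ever jumps after it, assumes the conditional branch flag address `flag_addr` is already known from the VM analysis, and reads "branches immediately after" as "branches to the location immediately following". It is a heuristic illustration, not the device's implementation; on real traces, loop structures may produce mislabels that additional checks would have to filter out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMStep:
    """One entry of a hypothetical VM execution trace (one dispatched VM instruction)."""
    vpc: int                            # VPC value when the VM instruction was dispatched
    opcode: int                         # VM opcode fetched at that VPC
    size: int                           # instruction length (from VM instruction boundaries)
    next_vpc: int                       # VPC value at the next dispatch
    mem_reads: frozenset = frozenset()  # native addresses read while the handler ran


def is_branch(step: VMStep) -> bool:
    # The VPC did not simply advance to the next VM instruction.
    return step.next_vpc != step.vpc + step.size


def classify_opcodes(trace: list, flag_addr: int) -> dict:
    """Label opcodes as branch / conditional branch / call / return (sketch)."""
    branch = {s.opcode for s in trace if is_branch(s)}
    # Conditional branch: a branch VM instruction whose handler reads the flag.
    conditional = {s.opcode for s in trace
                   if s.opcode in branch and flag_addr in s.mem_reads}
    call, ret = set(), set()
    for i, s in enumerate(trace):
        if s.opcode not in branch:
            continue
        fallthrough = s.vpc + s.size    # location immediately after this instruction
        for later in trace[i + 1:]:
            if is_branch(later) and later.next_vpc == fallthrough:
                call.add(s.opcode)      # a later branch came back here: a call
                ret.add(later.opcode)   # ... and that later branch is a return
                break
    return {"branch": branch, "conditional": conditional,
            "call": call, "return": ret}
```

For a toy trace in which opcode 2 jumps to a subroutine, opcode 4 reads the flag at `0x1000` and jumps, and opcode 5 jumps back to the instruction after opcode 2, the sketch labels 2 as a call, 4 as a conditional branch, and 5 as a return.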
The vulnerability discovery device then fuzzes the VM with mutated code on the basis of the acquired architecture information. It achieves fuzzing by repeatedly extracting, mutating, and embedding bytecode and then executing it: it runs a seed script while monitoring the code cache to extract bytecode from the code cache, mutates the extracted bytecode, and embeds it back into the code cache for execution.
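The extract, mutate, and embed cycle can be sketched under several assumptions: the code cache is modelled as a `bytearray`, the recovered instruction set information is reduced to a hypothetical `INSN_SIZE` table mapping each VM opcode to its instruction length, and `split_instructions`, `mutate_bytecode`, and `embed` are illustrative names rather than the device's actual units. Splitting on the recovered instruction boundaries before mutating is one way to keep mutations within the range that remains valid as bytecode instructions: whole instructions are added, deleted, or replaced, and only operand bytes are randomized.

```python
import random

# Hypothetical ISA information recovered by the analysis:
# VM opcode -> total instruction length in bytes (opcode byte + operand bytes).
INSN_SIZE = {0x01: 2, 0x02: 1, 0x03: 3}


def split_instructions(code: bytes, sizes: dict) -> list:
    """Cut raw bytecode into whole VM instructions using the recovered boundaries."""
    insns, pc = [], 0
    while pc < len(code):
        n = sizes[code[pc]]             # KeyError here means an unknown opcode
        insns.append(bytes(code[pc:pc + n]))
        pc += n
    return insns


def mutate_bytecode(insns: list, sizes: dict, rng: random.Random) -> list:
    """Add, delete, or change one VM instruction; operands are random bytes."""
    out = list(insns)
    op = rng.choice(["add", "delete", "change"]) if out else "add"
    if op == "delete":
        del out[rng.randrange(len(out))]
        return out
    opcode = rng.choice(sorted(sizes))
    operands = bytes(rng.randrange(256) for _ in range(sizes[opcode] - 1))
    new = bytes([opcode]) + operands
    if op == "add":
        out.insert(rng.randrange(len(out) + 1), new)
    else:
        out[rng.randrange(len(out))] = new
    return out


def embed(cache: bytearray, vpc: int, insns: list) -> None:
    """Write bytecode into the code cache at the location the VPC points to."""
    blob = b"".join(insns)
    if vpc + len(blob) > len(cache):
        raise ValueError("mutated bytecode does not fit in the code cache")
    cache[vpc:vpc + len(blob)] = blob
```

In a full pipeline, `embed` would be called at the moment execution reaches the bytecode, with `vpc` being the code-cache location the VPC points to, after which execution resumes.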
 図1及び図2を参照して、一般的なスクリプトエンジンの構成とそれらの働きについて説明する。図1は、スクリプトエンジンの構成の一例を説明するための図である。図1に示すように、スクリプトエンジン1は、バイトコードコンパイラ2とVM3を有する。また、バイトコードコンパイラ2は、構文解析部4、バイトコード生成部5を有する。また、VM3は、コードキャッシュ部6、フェッチ部7、デコード部8、実行部9を有する。これらのフェッチ部7、デコード部8、実行部9は、繰り返し実行され、インタプリタループと呼ばれる。そして、スクリプトエンジン1は、スクリプトの入力を受け付ける。 The configuration and function of a typical script engine will be described with reference to Figures 1 and 2. Figure 1 is a diagram for explaining an example of the configuration of a script engine. As shown in Figure 1, script engine 1 has a bytecode compiler 2 and a VM 3. Furthermore, bytecode compiler 2 has a syntax analysis unit 4 and a bytecode generation unit 5. Furthermore, VM 3 has a code cache unit 6, a fetch unit 7, a decode unit 8, and an execution unit 9. These fetch unit 7, decode unit 8, and execution unit 9 are executed repeatedly and are called an interpreter loop. Then, script engine 1 accepts the input of a script.
 構文解析部4は、スクリプトを入力として受け取り、字句解析及び構文解析を経て、抽象構文木(Abstract Syntax Tree:AST)を生成し、バイトコード生成部5に出力する。バイトコード生成部5は、ASTを入力として受け取り、バイトコードに変換してコードキャッシュ部6に格納する。 The syntax analysis unit 4 receives the script as input, and through lexical and syntactic analysis generates an Abstract Syntax Tree (AST), which it outputs to the bytecode generation unit 5. The bytecode generation unit 5 receives the AST as input, converts it into bytecode, and stores it in the code cache unit 6.
The fetch unit 7 fetches a VM opcode from the code cache unit 6 and outputs it to the decode unit 8. Here, a VM opcode is the opcode portion of a VM instruction. The decode unit 8 receives the VM opcode, interprets it with a decoder/dispatcher, and dispatches it to the corresponding program. The execution unit 9 executes the program corresponding to the VM instruction. Each pass of the interpreter loop executes one VM instruction, so repeating the loop carries out what the script describes.
The operation of the script engine's components is described with reference to Figure 2. Figure 2 is a diagram showing pseudocode for the VM of the script engine. As shown in Figure 2, the pseudocode first initializes the VPC (line 1). The while loop is the interpreter loop (line 2). The VM opcode that the VPC points to is fetched from the code cache (line 3) and is decoded and dispatched with a switch statement (lines 4, 5, and 7). The program corresponding to the dispatched VM opcode is then executed (lines 6 and 8).
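The Python sketch below illustrates a fetch/decode/dispatch loop driven by a VPC, in the spirit of Figure 2. The four opcodes and their encoding are hypothetical and exist only for illustration. Each opcode occupies one word, and some opcodes are followed by one operand word. An if/elif chain stands in for the switch statement.

```python
# Hypothetical opcode numbering for a toy stack VM (illustration only).
PUSH, ADD, JMP_IF_ZERO, HALT = range(4)

def run(code_cache: list[int]) -> list[int]:
    """Interpret bytecode stored in code_cache and return the final stack."""
    stack: list[int] = []
    vpc = 0  # virtual program counter: index of the next VM instruction
    while True:  # the interpreter loop
        opcode = code_cache[vpc]  # fetch the VM opcode the VPC points to
        # decode and dispatch to the handler for this opcode
        if opcode == PUSH:
            stack.append(code_cache[vpc + 1])  # operand follows the opcode
            vpc += 2
        elif opcode == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            vpc += 1
        elif opcode == JMP_IF_ZERO:
            target = code_cache[vpc + 1]
            # branch VM instruction: the VPC change depends on the condition
            vpc = target if stack.pop() == 0 else vpc + 2
        elif opcode == HALT:
            return stack
        else:
            raise ValueError(f"unknown opcode {opcode} at VPC {vpc}")
```

For example, `run([PUSH, 2, PUSH, 3, ADD, HALT])` leaves `[5]` on the stack. Note that only `JMP_IF_ZERO` changes the VPC by a data-dependent amount, which is the property the branch VM instruction detection described below relies on.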
A branch VM instruction is a VM instruction that causes a branch within a script. The conditional branch flag is the area that holds a flag indicating whether a conditional branch is taken.
[Configuration of the vulnerability discovery device]
Next, the configuration of the vulnerability discovery device 10 according to the embodiment is described in detail with reference to Figure 3. Figure 3 is a diagram illustrating an example configuration of the vulnerability discovery device according to the embodiment.
As shown in Figure 3, the vulnerability discovery device 10 has an input unit 11, a control unit 12, a storage unit 13, and an output unit 14. The vulnerability discovery device 10 accepts a test script, a script engine binary, and a seed script as input.
The input unit 11 consists of input devices such as a keyboard and a mouse. It accepts information from outside and passes it to the control unit 12. The input unit 11 also has a communication interface for exchanging information with other devices connected by wire or over a network, and it accepts information sent from those devices. The input unit 11 accepts the test scripts, the script engine binary, and the seed script, and outputs them to the control unit 12.
A test script is a script given as input when the script engine is analyzed dynamically to obtain execution traces and VM execution traces. Test scripts are described in detail later. The script engine binary is the executable file that makes up the script engine, and it may consist of multiple executable files. A seed script is a script containing the bytecode that serves as the initial input value.
The control unit 12 has an internal memory that stores the required data and the programs that define various processing procedures. The control unit 12 uses these to execute various processes. It is, for example, an electronic circuit such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit). The control unit 12 has a virtual machine analysis unit 121 (first analysis unit), an instruction set architecture analysis unit 122 (second analysis unit), and a vulnerability discovery unit 123.
The virtual machine analysis unit 121 analyzes the VM of the script engine. It obtains multiple execution traces under different run-time conditions and analyzes them with differential execution analysis to obtain the VPC and the conditional branch flag. It also analyzes the script engine binary to obtain the VM instruction boundaries and the dispatcher. Finally, it detects the code cache from the VM execution trace. The code cache is where the VM instructions to be executed are stored.
The virtual machine analysis unit 121 has six sub-units:
- an execution trace acquisition unit 1211 (first acquisition unit)
- a VM instruction boundary detection unit 1212 (first detection unit)
- a virtual program counter detection unit 1213 (second detection unit)
- a dispatcher detection unit 1214 (third detection unit)
- a conditional branch flag detection unit 1215 (fourth detection unit)
- a code cache detection unit 1216 (fifth detection unit)
The execution trace acquisition unit 1211 accepts the test script and the script engine binary as input. It acquires an execution trace by executing the test script while monitoring the execution of the script engine binary.
An execution trace consists of a branch trace and a memory access trace. The branch trace records three things for each executed branch instruction: its type, its source address, and its destination address. The memory access trace records the type of each memory operation and the memory address it operates on. Both kinds of trace are known to be obtainable with instruction hooks. The execution traces acquired by the execution trace acquisition unit 1211 are stored in the execution trace DB 131.
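The sketch below shows one way to model the two kinds of trace record in code. The class and field names are assumptions made for illustration. The field is named `kind` rather than the trace's `type` so that it does not shadow Python's built-in. The real on-disk format is whatever the instruction-hook tool emits.

```python
from dataclasses import dataclass
from typing import Literal, Union

@dataclass(frozen=True)
class BranchEvent:
    kind: Literal["call", "jmp", "ret"]  # type of the executed branch instruction
    src: int  # branch source address
    dst: int  # branch destination address

@dataclass(frozen=True)
class MemoryEvent:
    op: Literal["read", "write"]  # type of memory operation
    addr: int  # memory address operated on

TraceEvent = Union[BranchEvent, MemoryEvent]

def split_trace(trace: list[TraceEvent]) -> tuple[list[BranchEvent], list[MemoryEvent]]:
    """Separate an interleaved execution trace into its branch and memory access parts."""
    branches = [e for e in trace if isinstance(e, BranchEvent)]
    memory = [e for e in trace if isinstance(e, MemoryEvent)]
    return branches, memory
```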
The VM instruction boundary detection unit 1212 clusters the execution traces to detect the boundaries of each VM instruction. Clusters whose execution count is at or above a threshold are treated as VM instructions. Clustering finds contiguous code regions that are executed multiple times. For example, executed instructions that lie close to one another in the code may be grouped together, common subsequences of executed code blocks may be searched for, or other methods may be used. The vulnerability discovery device 10 takes the start and end points of the contiguous instruction sequence that makes up each detected VM instruction as that instruction's boundaries. These boundaries are used in VPC detection and dispatcher detection.
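The following is a minimal sketch of the "group by code distance" option mentioned above. It assumes that the trace has already been reduced to executed code blocks given as `(start, end)` address pairs. The threshold `min_count` and merge distance `max_gap` are hypothetical tuning parameters. This is one possible clustering, not the method fixed by the source.

```python
from collections import Counter

def detect_vm_instruction_boundaries(
    executed_blocks: list[tuple[int, int]],  # (start, end) of each executed code block, in execution order
    min_count: int,  # keep only blocks executed at least this many times
    max_gap: int,  # blocks closer than this in the code are merged into one cluster
) -> list[tuple[int, int]]:
    """Return (start, end) boundaries of frequently executed, contiguous code regions."""
    counts = Counter(executed_blocks)
    hot = sorted(block for block, n in counts.items() if n >= min_count)
    clusters: list[tuple[int, int]] = []
    for start, end in hot:
        if clusters and start - clusters[-1][1] <= max_gap:
            first, last = clusters[-1]
            clusters[-1] = (first, max(last, end))  # extend the current cluster
        else:
            clusters.append((start, end))  # start a new cluster
    return clusters
```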
The virtual program counter detection unit 1213 retrieves the execution traces for the first test script from the execution trace DB 131 and analyzes them to detect the VPC. The analysis combines two inputs. The first is differential execution analysis focused on the number of memory reads. The second is the set of VM instruction boundaries detected by the VM instruction boundary detection unit 1212. The detection relies on the fact that the memory holding the VPC is always read after each VM instruction executes, so finding the target of this read reveals the VPC.
The virtual program counter detection unit 1213 therefore detects the VPC with differential execution analysis focused on the number of memory reads. It compares the execution traces obtained from multiple variants of the test script. It looks for memory locations whose read count changes in proportion to both the number of loop iterations and the number of repeated statements. It then refers to the VM instruction boundaries detected by the VM instruction boundary detection unit 1212. Only candidates whose read value always points to the start point of a VM instruction are kept. The virtual program counter detection unit 1213 detects such a memory location as the VPC.
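The two filtering steps can be sketched as follows. This sketch assumes that each run of the first test script has been summarized as its loop iteration count, its statements per iteration, and a `Counter` of reads per address. The proportionality test shown here is a ratio check with a hypothetical `tolerance`. It ignores constant start-up overhead, so a real implementation might fit a linear model instead.

```python
from collections import Counter

def vpc_candidates(
    runs: list[tuple[int, int, Counter]],  # (loop iterations, statements per iteration, reads per address)
    tolerance: float = 0.1,
) -> set[int]:
    """Addresses whose read count scales with iterations x statements across all runs."""
    shared = set.intersection(*(set(reads) for _, _, reads in runs))
    found = set()
    for addr in shared:
        ratios = [reads[addr] / (iters * stmts) for iters, stmts, reads in runs]
        lo, hi = min(ratios), max(ratios)
        # a roughly constant ratio means the read count is proportional to the work done
        if lo > 0 and (hi - lo) / lo <= tolerance:
            found.add(addr)
    return found

def filter_by_boundaries(
    candidates: set[int],
    read_values: dict[int, list[int]],  # address -> values read from it during the run
    instruction_starts: set[int],  # start points of detected VM instructions
) -> set[int]:
    """Keep candidates whose read values always point to a VM instruction start."""
    kept = set()
    for addr in candidates:
        values = read_values.get(addr, [])
        if values and all(v in instruction_starts for v in values):
            kept.add(addr)
    return kept
```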
The dispatcher detection unit 1214 uses the VM instruction boundaries detected by the VM instruction boundary detection unit 1212 to cut out each VM instruction portion from the script engine binary. It detects the portions that are highly similar across VM instructions as the dispatcher. The premise is that the dispatcher consists of a reference to the pointer cache followed by a jump to the pointer of the next VM instruction handler. Copies of the dispatcher are distributed across the tails of the VM instruction handlers, and these copies are generally nearly identical. The vulnerability discovery device therefore detects the dispatcher with a predetermined method that searches for highly similar code at the tails of the VM instruction handlers. The highly similar portions may be found with, for example, a sequence alignment algorithm, or with other methods.
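The simplified sketch below shows how the shared handler tail could be found. It deliberately replaces sequence alignment with a plain longest common suffix over the raw handler bytes. That only works when the dispatcher copies are byte-identical, which is an assumption; real binaries may need alignment over disassembled instructions.

```python
def common_tail(handlers: list[bytes]) -> bytes:
    """Longest byte suffix shared by every VM instruction handler.

    A crude stand-in for sequence alignment: it assumes the inlined
    dispatcher copies at the end of each handler are byte-identical.
    """
    if not handlers:
        return b""
    shortest = min(len(h) for h in handlers)
    n = 0
    # grow the suffix while every handler has the same byte at position -1-n
    while n < shortest and len({h[-1 - n] for h in handlers}) == 1:
        n += 1
    return handlers[0][len(handlers[0]) - n:]
```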
The conditional branch flag detection unit 1215 retrieves the execution traces for the second test script from the execution trace DB 131 and analyzes them to find the conditional branch flag. It analyzes multiple execution traces using differential execution analysis focused on the number of memory reads. Specifically, it executes conditional branches in various patterns. It then matches the resulting pattern of memory changes against the pattern of conditional branches in the test script, which identifies the memory that stores the conditional branch flag.
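The matching step can be sketched as follows. The sketch assumes that, for each candidate address, the value seen there at each conditional branch has already been extracted from the traces. The expected taken/not-taken pattern is the one the second test script was written to produce. Accepting either polarity is a design choice of this sketch, because a flag may encode "taken" as zero or as non-zero.

```python
def find_branch_flag(
    expected: list[bool],  # taken (True) / not taken (False) pattern of the test script
    observed: dict[int, list[int]],  # address -> value seen there at each conditional branch
) -> list[int]:
    """Addresses whose values track the expected branch pattern."""
    matches = []
    for addr, values in observed.items():
        if len(values) != len(expected):
            continue  # different number of observations: cannot be the flag
        as_bool = [v != 0 for v in values]
        if as_bool == expected or as_bool == [not b for b in expected]:
            matches.append(addr)
    return sorted(matches)
```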
The code cache detection unit 1216 uses the execution trace, the VPC, and the VM execution trace to detect the code cache, which is the cache that stores the VM instructions to be executed.
The code cache detection unit 1216 first detects, in the VM execution trace, the memory area pointed to by the VPC as a code cache. Next, it uses the execution trace to find the code location that called the memory allocation function that allocated this code cache. Finally, it detects every memory area in the VM execution trace that was allocated from this code location as a code cache.
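The three steps above are sketched below. The sketch assumes that allocation events have been pulled from the execution trace as `(call_site, base, size)` records. The `Allocation` type is hypothetical. The logic finds the allocations that contain a VPC value, collects their call sites, and then returns every allocation made from those sites.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Allocation:
    call_site: int  # address of the code that called the memory allocation function
    base: int  # start of the allocated area
    size: int  # size of the allocated area in bytes

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size

def detect_code_caches(allocs: list[Allocation], vpc_values: list[int]) -> list[Allocation]:
    """All allocations made from a call site that produced the area the VPC points into."""
    sites = {a.call_site for a in allocs if any(a.contains(v) for v in vpc_values)}
    return [a for a in allocs if a.call_site in sites]
```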
The code cache detection unit 1216 also uses the execution trace to find the code locations that write to the code cache. It treats writes made by these code locations in the VM execution trace as updates to the code cache.
The instruction set architecture analysis unit 122 analyzes the instruction set architecture, that is, the system of VM instructions. It collects VM instructions and determines what each collected VM instruction does.
The instruction set architecture analysis unit 122 has a VM execution trace acquisition unit 1221 (second acquisition unit), a VM instruction collection unit 1222 (first collection unit), and a VM instruction determination unit 1223 (first determination unit).
Like the execution trace acquisition unit 1211, the VM execution trace acquisition unit 1221 accepts the test scripts and the script engine binary as input. It acquires VM execution traces, which record execution on the VM, by executing the test scripts while monitoring the execution of the script engine binary. To do so, it monitors the VPC and the pointers to the VM instruction handlers that the dispatcher dispatches to. For branch VM instruction detection, it executes many test scripts to acquire VM execution traces. It links each pointer to a VM instruction with that VM instruction and assigns each pair a virtual VM opcode as an identifier.
A VM execution trace is a trace of execution on the VM. For each executed VM instruction, it records the pointer to the executed VM instruction handler and the VPC, with a virtual VM opcode serving as the identifier of each handler. Concretely, a VM execution trace consists of a VPC and a VM opcode for each executed VM instruction. The VPC can be recorded by monitoring the VPC memory detected by the virtual program counter detection unit 1213. The VM opcode is the identifier virtually assigned to each linked pair of a VM instruction pointer and a VM instruction. The VM execution traces acquired by the VM execution trace acquisition unit 1221 are stored in the VM execution trace DB 133.
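The sketch below shows how virtual opcodes might be assigned while building the VM execution trace. It assumes that each dispatch has been observed as a `(VPC value, handler pointer)` pair. The first handler seen becomes opcode 0, the next new handler becomes opcode 1, and so on. This numbering scheme is an illustrative choice.

```python
def build_vm_trace(events: list[tuple[int, int]]) -> tuple[list[tuple[int, int]], dict[int, int]]:
    """Turn (VPC, handler pointer) observations into a (VPC, virtual opcode) trace.

    Returns the VM execution trace and the handler-pointer -> opcode mapping.
    """
    opcode_of: dict[int, int] = {}
    trace: list[tuple[int, int]] = []
    for vpc, handler in events:
        # a handler not seen before gets the next unused virtual opcode
        opcode = opcode_of.setdefault(handler, len(opcode_of))
        trace.append((vpc, opcode))
    return trace, opcode_of
```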
The VM instruction collection unit 1222 receives the VPC and the dispatcher as input. It executes scripts while monitoring both of them, obtains VM execution traces, and collects VM instructions from those traces.
The VM instruction determination unit 1223 determines the content of the VM instructions collected by the VM instruction collection unit 1222. First, it identifies branch VM instructions from how much the VPC change varies for each VM opcode in the VM execution trace.
The VM instruction determination unit 1223 retrieves the VM execution traces stored in the VM execution trace DB 133 and analyzes them to identify branch VM instructions. For each VM opcode assigned as an identifier, it collects the change in the VPC across that opcode's execution. For a VM opcode that is not a branch VM instruction, this change is almost constant. For a branch VM instruction, the change instead varies with the branch destination.
The VM instruction determination unit 1223 therefore identifies branch VM instructions from how much the change in the virtual program counter varies for each VM opcode in the VM execution trace. This variation is larger for branch VM instructions than for other VM instructions. The unit therefore sets a threshold and classifies opcodes with larger variation as branch VM instructions. Specifically, it measures the variation of the VPC change for each VM opcode as a variance. Opcodes whose variance is at or above a given threshold are classified as branch VM instructions.
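The variance test is sketched below. It assumes the VM execution trace is a list of `(VPC, virtual opcode)` pairs in execution order. The VPC change for an instruction is taken as the next entry's VPC minus its own. The threshold value is left to the caller, because the source does not fix one.

```python
from collections import defaultdict
from statistics import pvariance

def branch_opcodes(vm_trace: list[tuple[int, int]], threshold: float) -> set[int]:
    """Opcodes whose VPC change varies enough to be classified as branch VM instructions."""
    deltas: dict[int, list[int]] = defaultdict(list)
    # pair each executed instruction with its successor to get the VPC change
    for (vpc, op), (next_vpc, _) in zip(vm_trace, vm_trace[1:]):
        deltas[op].append(next_vpc - vpc)
    return {op for op, d in deltas.items() if pvariance(d) >= threshold}
```

In this model, a straight-line opcode such as one that always advances the VPC by 2 has variance 0, whereas a jump opcode whose targets differ between executions has a large variance.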
To build an accurate control flow graph, the VM instruction determination unit 1223 also determines which branch VM instructions are conditional branch VM instructions. A conditional branch always accesses the conditional branch flag in order to determine its destination. A conditional branch VM instruction can therefore be identified by checking whether the conditional branch flag is accessed when each branch VM instruction executes. If the flag is accessed, the instruction is a conditional branch VM instruction; otherwise it is not. Accordingly, the VM instruction determination unit 1223 uses the VM execution trace and the memory access trace to find the branch VM instructions that access the conditional branch flag, and it classifies them as conditional branch VM instructions.
The VM instruction determination unit 1223 also identifies call and return VM instructions. A branch caused by a call VM instruction saves the caller's bytecode address immediately following the call. After the called subroutine finishes, a return VM instruction goes back to that saved address. Consider one branch VM instruction, instruction 1, and a later branch VM instruction, instruction 2. If instruction 2 returns to the address immediately following instruction 1 in the bytecode, the VM instruction determination unit 1223 determines that instructions 1 and 2 form a call/return pair.
Next, the VM instruction determination unit 1223 classifies the branch VM instructions that access the conditional branch flag as conditional branch VM instructions. To do this, it retrieves three inputs: the list of branch VM instruction opcodes from the architecture information DB 132, one VM execution trace from the VM execution trace DB 133, and the corresponding execution trace from the execution trace DB 131.
The VM instruction determination unit 1223 takes one location in the VM execution trace where a branch VM instruction is executed. From the execution trace, it extracts the memory access trace corresponding to that execution.
Based on the memory access trace, the VM instruction determination unit 1223 determines whether the extracted branch VM instruction accesses the conditional branch flag. If it does, the unit classifies that branch VM instruction as a conditional branch VM instruction.
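The sketch below shows the check described above. It assumes that each execution of a VM instruction has been paired with the set of addresses accessed while its handler ran, using the memory access trace. It also assumes that the branch opcodes and the flag address come from the earlier steps. An opcode counts as a conditional branch if any of its executions touches the flag.

```python
def conditional_branch_opcodes(
    executions: list[tuple[int, set[int]]],  # (virtual opcode, addresses accessed while its handler ran)
    branch_ops: set[int],  # opcodes already classified as branch VM instructions
    flag_addr: int,  # address of the conditional branch flag
) -> set[int]:
    """Branch opcodes whose handler accessed the conditional branch flag at least once."""
    return {op for op, accessed in executions if op in branch_ops and flag_addr in accessed}
```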
The VM instruction determination unit 1223 then scans the VM execution trace for branch VM instructions. Suppose a later branch VM instruction branches to the address immediately following a given branch VM instruction. In that case, the given branch VM instruction is determined to be a call VM instruction.
To do this, the VM instruction determination unit 1223 retrieves the list of branch VM instruction opcodes from the architecture information DB 132 and one VM execution trace from the VM execution trace DB 133. It then takes one location in the VM execution trace where a branch VM instruction is executed.
The VM instruction determination unit 1223 scans the branch VM instructions that appear in the VM execution trace after the extracted branch VM instruction. It checks whether any of them branches to the address immediately following the extracted instruction. If one does, the extracted branch VM instruction is determined to be a call VM instruction.
The VM instruction determination unit 1223 also takes a call VM instruction from the VM execution trace. If a later branch VM instruction branches to the address immediately following that call VM instruction, the unit determines that branch VM instruction to be a return VM instruction.
To do this, the VM instruction determination unit 1223 retrieves the list of branch VM instruction opcodes from the architecture information DB 132 and one VM execution trace from the VM execution trace DB 133. It then takes one location in the VM execution trace where a call VM instruction is executed.
The VM instruction determination unit 1223 scans the branch VM instructions that appear in the VM execution trace after the extracted call VM instruction. It checks whether any of them branches to the address immediately following that call VM instruction.
If such a branch VM instruction exists, the VM instruction determination unit 1223 determines it to be a return VM instruction.
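The sketch below combines the call and return detection described above into a single pass over the trace. It assumes a VM execution trace of `(VPC, virtual opcode)` pairs and a hypothetical `length_of` table giving each opcode's bytecode length, which is needed to compute "the address immediately following" an instruction. The sketch takes the first later branch that lands on that address. Recursive or nested calls would need a more careful, stack-based treatment.

```python
def call_return_pairs(
    vm_trace: list[tuple[int, int]],  # (VPC, virtual opcode) per executed VM instruction
    branch_ops: set[int],  # opcodes classified as branch VM instructions
    length_of: dict[int, int],  # hypothetical: bytecode length of each branch opcode
) -> set[tuple[int, int]]:
    """(call opcode, return opcode) pairs found in the trace."""
    pairs: set[tuple[int, int]] = set()
    for i in range(len(vm_trace) - 1):
        vpc, op = vm_trace[i]
        if op not in branch_ops:
            continue
        fall_through = vpc + length_of[op]  # address immediately following this instruction
        if vm_trace[i + 1][0] == fall_through:
            continue  # the branch was not taken here, so it cannot act as a call
        # look for a later branch that lands back on the fall-through address
        for j in range(i + 1, len(vm_trace) - 1):
            op_j = vm_trace[j][1]
            if op_j in branch_ops and vm_trace[j + 1][0] == fall_through:
                pairs.add((op, op_j))
                break
    return pairs
```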
The vulnerability discovery unit 123 fuzzes the VM with mutated code, based on the architecture information acquired by the virtual machine analysis unit 121 and the instruction set architecture analysis unit 122. It executes a seed script while monitoring the code cache and extracts bytecode from the code cache. It then mutates the extracted bytecode and embeds it back into the code cache for execution. The vulnerability discovery unit 123 has a mutation unit 1231 and a fuzzing execution unit 1232.
The mutation unit 1231 mutates the bytecode extracted by the fuzzing execution unit 1232 with a predetermined method, such as adding, deleting, or changing VM instructions. It outputs the updated bytecode dictionary to the fuzzing execution unit 1232.
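The following is a minimal sketch of instruction-level mutation. It assumes that the extracted bytecode has already been split into whole VM instructions (one `bytes` object each) using the recovered boundaries. It applies one random add, delete, or change per call. Drawing replacement and inserted instructions from the input itself is an illustrative choice. The `b"\x00"` fallback for empty input is a hypothetical placeholder.

```python
import random

def mutate(instructions: list[bytes], rng: random.Random) -> list[bytes]:
    """Return a copy with one VM instruction added, deleted, or changed."""
    out = list(instructions)
    choice = rng.choice(["add", "delete", "change"]) if out else "add"
    if choice == "add":
        # insert a copy of an existing instruction (hypothetical placeholder if none exist)
        donor = rng.choice(instructions) if instructions else b"\x00"
        out.insert(rng.randrange(len(out) + 1), donor)
    elif choice == "delete":
        del out[rng.randrange(len(out))]
    else:
        # replace one instruction with another instruction taken from the input
        out[rng.randrange(len(out))] = rng.choice(instructions)
    return out
```

Mutating whole instructions rather than raw bytes keeps most candidates decodable by the VM. This is one reason the recovered instruction set architecture is useful for fuzzing.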
The fuzzing execution unit 1232 accepts the seed script, the VPC, and the code cache as input. It executes the seed script while monitoring the code cache and extracts bytecode from the code cache. It then re-embeds the mutated code at the location in the code cache that the VPC points to and resumes execution. The fuzzing execution unit 1232 outputs the input values (bytecode) that caused problems such as crashes during execution.
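The outer loop can be sketched as follows. The `execute` callback is hypothetical. It stands for "embed this bytecode at the VPC in the code cache, resume execution, and report whether a crash occurred", which in practice requires hooking the real script engine. The sketch keeps non-crashing candidates as new inputs for further mutation and records crashing ones.

```python
import random
from typing import Callable

Program = list[bytes]  # bytecode as a list of whole VM instructions

def fuzz(
    seed: Program,
    execute: Callable[[Program], bool],  # hypothetical hook: embed at the VPC, resume, True on crash
    mutate: Callable[[Program, random.Random], Program],
    iterations: int,
    rng: random.Random,
) -> list[Program]:
    """Repeatedly mutate and run bytecode; return the inputs that caused a crash."""
    corpus: list[Program] = [seed]
    crashes: list[Program] = []
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        if execute(candidate):
            crashes.append(candidate)  # report the bytecode that triggered the problem
        else:
            corpus.append(candidate)  # keep it as a base for further mutation
    return crashes
```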
The storage unit 13 is implemented by a semiconductor memory element such as a RAM (Random Access Memory) or flash memory, or by a storage device such as a hard disk or an optical disk. It stores the processing program that operates the vulnerability discovery device 10 and the data used while that program runs. The storage unit 13 has three databases: an execution trace database (DB) 131, a VM execution trace DB 133, and an architecture information DB 132. The architecture information DB 132 stores the architecture information acquired by the virtual machine analysis unit 121 and the instruction set architecture analysis unit 122.
The execution trace DB 131 stores the execution traces acquired by the execution trace acquisition unit 1211. The VM execution trace DB 133 stores the VM execution traces acquired by the VM execution trace acquisition unit 1221. Both DBs are managed by the vulnerability discovery device 10. They may instead be managed by another device, such as a server. In that case, the two acquisition units send the acquired traces through the communication interface of the output unit 14 to the server that manages the execution trace DB 131 and the VM execution trace DB 133, and the server stores them in those DBs.
The output unit 14 is, for example, a liquid crystal display or a printer, and outputs various information, including information about the vulnerability discovery device 10. The output unit 14 may also be an interface that handles the input and output of various data to and from an external device, and it may output various information to that device.
[Configuration of test scripts]
The test scripts are described next. A test script is a script given as input when the script engine is analyzed dynamically. Test scripts target the number of branch instruction executions and the number of memory reads and writes. They are used to capture how the script engine's behavior differs when test scripts with different repetition counts are executed. Test scripts are written manually before the analysis, which requires knowledge of the specification of the target script language.
FIG. 4 shows an example of a test script (first test script) used to detect the VPC. The first test script uses a loop (line 2). By increasing or decreasing the number of iterations (line 2) and the number of statements in the loop body (lines 3 to 5), the first test script changes the execution conditions and produces differences between runs.
FIG. 5 shows an example of a test script (second test script) used to detect branch VM instructions. The second test script executes multiple conditional branches (lines 4 to 8). The branch conditions are controlled (lines 1 and 5) so that these conditional branches are taken or not taken in a specific order pattern. The second test script produces differences by changing the number of conditional branches and the order pattern of taken and not-taken branches.
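As a rough illustration of how such script variants might be produced in bulk, the Python sketch below emits JavaScript-like source text. The target language, the helper names `loop_script` and `branch_script`, and the script shapes are all hypothetical; they only mirror the structure described above (a loop whose iteration count N and body size M can be scaled, and a sequence of conditions taken or not taken in a chosen order), not the actual scripts in FIGS. 4 and 5.

```python
from typing import List


def loop_script(n_iter: int, n_stmts: int) -> str:
    """First-test-script variant: a loop of n_iter iterations with n_stmts body statements."""
    body = "\n".join(f"  x = x + {i};" for i in range(1, n_stmts + 1))
    return f"var x = 0;\nfor (var i = 0; i < {n_iter}; i++) {{\n{body}\n}}\n"


def branch_script(pattern: List[bool]) -> str:
    """Second-test-script variant: one conditional branch per entry, taken iff the entry is True."""
    conds = ", ".join("true" if taken else "false" for taken in pattern)
    return (
        f"var conds = [{conds}];\n"
        "var hits = 0;\n"
        "for (var i = 0; i < conds.length; i++) {\n"
        "  if (conds[i]) {\n"
        "    hits = hits + 1;\n"
        "  }\n"
        "}\n"
    )


# Scale N and M together: (N, M), (2N, 2M), (3N, 3M) for differential execution.
loop_variants = [loop_script(10 * k, 3 * k) for k in (1, 2, 3)]
branch_variants = [branch_script([True, False, True]), branch_script([False, False, True, True])]
```

Scaling N and M by the same factor k makes the expected number of VPC reads grow by k squared, which is what the differential analysis of the first test script looks for.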
[Execution trace configuration]
Next, the execution trace will be described. FIG. 6 shows an example of an execution trace. As described above, an execution trace consists of a branch trace and a memory access trace. FIG. 6 is an excerpt of an execution trace; the structure of an execution trace is described below with reference to it.
An execution trace has an element called trace, which indicates whether the log line belongs to the branch trace or to the memory access trace.
A branch trace log line has the format shown, for example, in lines 1 to 10 of FIG. 6, and consists of three elements: type, src, and dst. type indicates whether the executed branch instruction was a call, jmp, or ret instruction; src indicates the address of the branch source; and dst indicates the address of the branch destination.
A memory access trace log line has the format shown, for example, in lines 11 to 13 of FIG. 6, and consists of three elements: type, target, and value. type indicates whether the memory access is a read or a write; target indicates the memory address accessed; and value holds the value resulting from the memory access.
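FIG. 6 is not reproduced here, so the textual layout of a log line is unknown. The sketch below assumes a hypothetical `key=value` layout (for example `trace=branch type=call src=0x401000 dst=0x402000` and `trace=mem type=read target=0x7ffd0010 value=0x1`) and parses it into typed records carrying the elements named above. The class and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class BranchRecord:
    kind: str  # "call", "jmp" or "ret"
    src: int   # address of the branch source
    dst: int   # address of the branch destination


@dataclass(frozen=True)
class MemRecord:
    kind: str    # "read" or "write"
    target: int  # memory address accessed
    value: int   # value resulting from the access


Record = Union[BranchRecord, MemRecord]


def parse_line(line: str) -> Record:
    """Parse one hypothetical 'key=value ...' log line into a typed record."""
    fields = dict(token.split("=", 1) for token in line.split())
    if fields["trace"] == "branch":
        return BranchRecord(fields["type"], int(fields["src"], 16), int(fields["dst"], 16))
    if fields["trace"] == "mem":
        return MemRecord(fields["type"], int(fields["target"], 16), int(fields["value"], 16))
    raise ValueError(f"unknown trace kind: {fields['trace']!r}")
```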
[VM execution trace configuration]
Next, the VM execution trace will be described. FIG. 7 shows an example of a VM execution trace. As described above, a VM execution trace records VM opcodes and the VPC. FIG. 7 is an excerpt of a VM execution trace; the structure of a VM execution trace is described below with reference to it.
A VM execution trace log line has, for example, the format shown in FIG. 7, and consists of two elements: vpc and vmop (VM opcode). vpc indicates the value of the VPC. vmop indicates the VM opcode virtually assigned to each pointer, obtained from the pointer cache, that points to the beginning of the VM instruction handler to be executed.
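The virtual opcode assignment can be pictured as numbering each distinct handler pointer in order of first appearance. In this minimal sketch, the input is assumed to be a list of `(vpc, handler_pointer)` pairs already recovered from the pointer cache; `VMTraceEntry` and `assign_opcodes` are hypothetical names, and first-appearance numbering is just one convenient choice.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class VMTraceEntry:
    vpc: int   # value of the virtual program counter
    vmop: int  # virtual opcode assigned to the handler pointer


def assign_opcodes(steps: Iterable[Tuple[int, int]]) -> Tuple[List[VMTraceEntry], Dict[int, int]]:
    """steps: (vpc, handler_pointer) pairs in execution order.
    Each distinct handler pointer gets the next free opcode number when first seen."""
    opcode_of: Dict[int, int] = {}
    entries: List[VMTraceEntry] = []
    for vpc, handler in steps:
        op = opcode_of.setdefault(handler, len(opcode_of))
        entries.append(VMTraceEntry(vpc, op))
    return entries, opcode_of
```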
[Processing of the VM instruction boundary detection unit]
Next, the processing of the VM instruction boundary detection unit 1212 will be described. FIG. 8 is a diagram explaining the processing of the VM instruction boundary detection unit 1212.
The VM instruction boundary detection unit 1212 detects the boundaries of each VM instruction. It targets threaded-code VMs in particular: because such VMs have no interpreter loop, the boundaries of their VM instructions are hard to identify. Specifically, the VM instruction boundary detection unit 1212 retrieves execution traces from the execution trace DB 131. Then, as shown in FIG. 8, it clusters the execution traces using a predetermined method and detects clusters whose execution count is at or above a threshold as VM instructions (e.g., VM instruction handlers 1 to 3). It takes the start and end points of the consecutive instruction sequence that makes up each VM instruction as that instruction's boundaries.
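The clustering method is left open. As one simple stand-in (not the method of the embodiment), the sketch below treats the straight-line region between consecutive branches in the branch trace, from one branch's destination to the next branch's source, as a candidate cluster, counts how often each region executes, and keeps regions at or above the threshold. In a threaded-code VM each handler typically ends with an indirect jump to the next handler, so frequently executed regions of this kind are plausible VM instruction handlers. The `(src, dst)` tuple input and the name `frequent_regions` are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def frequent_regions(branches: List[Tuple[int, int]], threshold: int) -> Dict[Tuple[int, int], int]:
    """branches: (src, dst) pairs from the branch trace, in execution order.
    Returns {(start, end): count} for straight-line regions executed >= threshold times."""
    counts: Counter = Counter()
    # Code between a branch's destination and the next branch's source runs without branching.
    for (_, dst), (next_src, _) in zip(branches, branches[1:]):
        counts[(dst, next_src)] += 1
    return {region: n for region, n in counts.items() if n >= threshold}


example_branches = [(0x10, 0x100), (0x120, 0x200), (0x230, 0x100),
                    (0x120, 0x200), (0x230, 0x100), (0x120, 0x300)]
```

Here region `(0x100, 0x120)` runs three times and `(0x200, 0x230)` twice, so a threshold of 3 keeps only the first as a VM instruction candidate; its start and end addresses are the boundaries.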
[Processing of the virtual program counter detection unit]
Next, the processing of the virtual program counter detection unit 1213 will be described. The virtual program counter detection unit 1213 detects the VPC and the pointer cache. The VPC is detected by analyzing the memory access trace logs in the acquired execution traces, using differential execution analysis that focuses on the number of memory reads. FIG. 9 is a diagram explaining the processing of the virtual program counter detection unit 1213.
The virtual program counter detection unit 1213 retrieves one execution trace produced by the first test script from the execution trace DB 131. The number of VPC reads is proportional to both the number of iterations in the test script and the number of statements in the loop body: with N iterations and M repeated statements, roughly MN VPC reads occur. The virtual program counter detection unit 1213 therefore takes the execution traces for first test scripts in which N and M are scaled to 2N and 2M, and to 3N and 3M, and extracts the memory locations whose read counts grow to about 4MN and 9MN, respectively. Specifically, as shown in FIG. 9, it extracts memory areas that are read or written on every VM instruction execution and increase monotonically ((1) in FIG. 9).
The virtual program counter detection unit 1213 then detects as the VPC the memory location whose read value always points to the start of a VM instruction. Specifically, it compares the destination pointed to by each candidate with the addresses of the VM instruction handlers and narrows the candidates down to the matching memory areas ((2) in FIG. 9).
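The two filtering steps can be sketched as follows, assuming each run is summarized as `(N, M, reads)`, where `reads` maps each memory address to the list of values read from it. Step (1) keeps addresses whose read count scales with N*M relative to the first run, within a tolerance `tol` (an assumed parameter); step (2), following the text literally, keeps addresses whose read values always equal the start address of some VM instruction handler. All names and the data layout are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Run = Tuple[int, int, Dict[int, List[int]]]  # (iterations N, statements M, address -> values read)


def vpc_candidates(runs: Sequence[Run], handler_starts: Set[int], tol: float = 0.1) -> List[int]:
    """Return addresses that behave like a virtual program counter across all runs."""
    base_n, base_m, base_reads = runs[0]
    base_work = base_n * base_m
    # Step (1): read counts must scale in proportion to N * M.
    candidates = {a for a, vals in base_reads.items() if vals}
    for n, m, reads in runs[1:]:
        expected = (n * m) / base_work
        candidates = {
            a for a in candidates
            if a in reads
            and abs(len(reads[a]) / len(base_reads[a]) - expected) <= tol * expected
        }
    # Step (2): every value read must point at the start of a VM instruction handler.
    return sorted(
        a for a in candidates
        if all(v in handler_starts for _, _, reads in runs for v in reads[a])
    )


example_runs = [
    (10, 3, {0x50: [0xA0] * 30, 0x60: [1] * 5, 0x70: [5] * 30}),
    (20, 6, {0x50: [0xA0] * 120, 0x60: [1] * 5, 0x70: [5] * 120}),
    (30, 9, {0x50: [0xA0, 0xB0] * 135, 0x60: [1] * 5, 0x70: [5] * 270}),
]
```

In the example, address 0x60 fails step (1) because its read count does not grow, and 0x70 passes step (1) but fails step (2) because its values are not handler addresses, leaving 0x50 as the VPC.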
[Processing of the dispatcher detection unit]
Next, the processing of the dispatcher detection unit 1214 will be described. The dispatcher detection unit 1214 detects the dispatcher by analyzing the script engine binary using a predetermined method. FIG. 10 is a diagram explaining the processing of the dispatcher detection unit 1214.
The dispatcher detection unit 1214 detects the dispatcher. Based on the VM instruction boundaries detected by the VM instruction boundary detection unit 1212, it extracts each VM instruction's portion of the script engine binary. Then, on the assumption that dispatcher code is highly similar across VM instructions ((1) in FIG. 10), it calculates the code similarity between VM instructions and detects the portion that is highly similar across all VM instructions as the dispatcher. In this way, the dispatcher detection unit 1214 can detect the code commonly executed in the latter part of the VM instructions as the dispatcher ((1) in FIG. 10).
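Any code-similarity measure may be used. The sketch below substitutes the simplest one, exact matching, and returns the longest instruction sequence shared by the ends of all handlers, which corresponds to the dispatcher code executed in the latter part of every VM instruction. It assumes each handler has already been cut out at its boundaries and disassembled into a list of instruction strings; `common_tail` is a hypothetical name and the register names in the example are illustrative only.

```python
from typing import List, Sequence


def common_tail(handlers: Sequence[Sequence[str]]) -> List[str]:
    """Longest instruction sequence that ends every handler (exact-match similarity)."""
    if not handlers:
        return []
    tail: List[str] = []
    # Walk all handlers backwards in lockstep until they disagree.
    for column in zip(*(reversed(h) for h in handlers)):
        if all(ins == column[0] for ins in column):
            tail.append(column[0])
        else:
            break
    return tail[::-1]


example_handlers = [
    ["mov eax, 1", "add rsi, 8", "mov rax, [rsi]", "jmp rax"],
    ["xor ebx, ebx", "add rsi, 8", "mov rax, [rsi]", "jmp rax"],
    ["push rbp", "pop rbp", "add rsi, 8", "mov rax, [rsi]", "jmp rax"],
]
```

Exact matching is brittle against register renaming or instruction reordering between handlers, which is presumably why the embodiment speaks of similarity rather than identity; a fuzzier measure would slot in where the equality test is.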
[Processing of the code cache detection unit]
Next, the processing of the code cache detection unit 1216 will be described. FIG. 11 is a diagram explaining the processing of the code cache detection unit 1216.
The code cache detection unit 1216 detects the memory area pointed to by the VPC as a code cache, using the VM execution trace ((1) in FIG. 11).
The code cache detection unit 1216 detects, from the execution trace, the code location that called the memory allocation function that allocated this code cache ((2) in FIG. 11). It then detects all memory areas in the VM execution trace that were allocated at this code location as code caches ((3) in FIG. 11).
The code cache detection unit 1216 detects, from the execution trace, the code location that writes to the code cache ((4) in FIG. 11), and detects the writes made by this code location in the VM execution trace as code cache updates ((5) in FIG. 11).
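Steps (1) to (5) can be sketched as set operations over trace data. The input layout is assumed: allocations as `(call_site, base, size)` tuples recovered from the execution trace, the VPC values seen in the VM execution trace, and memory writes as `(pc, target)` pairs. For simplicity, a single list of writes stands in for both the execution trace and the VM execution trace, and `find_code_caches` is a hypothetical name.

```python
from typing import Iterable, List, Optional, Tuple

Alloc = Tuple[int, int, int]  # (call site of the allocation function, base address, size)
Write = Tuple[int, int]       # (pc of the writing instruction, target address)


def find_code_caches(allocs: List[Alloc], vpc_values: Iterable[int],
                     writes: List[Write]) -> Tuple[List[Tuple[int, int]], List[Write]]:
    def owner(addr: int) -> Optional[Alloc]:
        for alloc in allocs:
            _, base, size = alloc
            if base <= addr < base + size:
                return alloc
        return None

    # (1)-(2): allocations the VPC points into, and the call sites that made them.
    sites = {alloc[0] for alloc in map(owner, vpc_values) if alloc is not None}
    # (3): every area allocated at those call sites is treated as a code cache.
    caches = [(base, size) for site, base, size in allocs if site in sites]

    def in_cache(addr: int) -> bool:
        return any(base <= addr < base + size for base, size in caches)

    # (4): code locations that write into a code cache.
    writer_pcs = {pc for pc, target in writes if in_cache(target)}
    # (5): writes performed by those code locations count as code cache updates.
    updates = [(pc, target) for pc, target in writes if pc in writer_pcs]
    return caches, updates


example_allocs = [(0x4010, 0x1000, 0x100), (0x4010, 0x3000, 0x100), (0x5020, 0x2000, 0x100)]
example_writes = [(0x6000, 0x3004), (0x7000, 0x2004)]
```

With VPC values inside 0x1000..0x10FF, call site 0x4010 is identified, so the area at 0x3000 is also a code cache even though the VPC never pointed into it, and the write from pc 0x6000 is reported as an update.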
[Processing of the VM instruction determination unit]
Next, the processing of the VM instruction determination unit 1223 will be described. The VM instruction determination unit 1223 first determines branch VM instructions by analyzing the logs of the acquired VM execution traces. The test script used here only needs to contain branch VM instructions, so any script that contains branch control syntax will do. Such test scripts can be prepared, for example, by collecting scripts from the Internet or taking them from official documentation.
First, for each VM execution trace in the VM execution trace DB 133, the VM instruction determination unit 1223 associates each pointer to a VM instruction with that VM instruction, and virtually assigns a VM opcode to each as an identifier. FIG. 12 is a diagram explaining the processing of the VM instruction determination unit 1223.
If a VM instruction is a branch instruction, how far the VPC advances depends on the branch destination. Otherwise, the VPC advance depends on the size of the VM instruction. Therefore, if pairs of VM opcodes and pointers to VM instructions are collected and the VPC advance is examined per opcode, the VPC advance of a branch instruction varies with the branch destination, whereas that of any other instruction stays essentially constant.
The VM instruction determination unit 1223 therefore uses the variance to quantify this spread. It calculates the variance of the VPC advance for each VM opcode and keeps only the opcodes whose variance exceeds a threshold. In this way, while associating pointers with VM instructions, it determines the VM instructions whose VPC advance varies (VM instruction handler 3 in the example of FIG. 12) to be branch VM instructions ((1) in FIG. 12).
Let O = {o_0, o_1, ..., o_N} be the set of VPC advances observed for a given opcode, let \bar{o} be their mean (equation (1)), and let t be a threshold. Whether the opcode is a branch instruction is then determined from the variance s (equation (2)) as shown in equation (3). In this way, the VM instruction determination unit 1223 identifies branch VM instructions.
\bar{o} = \frac{1}{|O|} \sum_{o_i \in O} o_i \qquad (1)

s = \frac{1}{|O|} \sum_{o_i \in O} \left( o_i - \bar{o} \right)^2 \qquad (2)

\mathrm{isBranch}(O) = \begin{cases} 1 & \text{if } s > t \\ 0 & \text{if } s \le t \end{cases} \qquad (3)
For VM instructions other than branches, the VPC advance shows almost no variation, so the boundary between branch VM instructions and other VM instructions is usually clear. The threshold can therefore be set, for example, by plotting the obtained variances on a number line and choosing a value that separates the two resulting groups. Starting from the branch VM instructions, the VM instruction determination unit 1223 then determines which of them are conditional branch VM instructions, call VM instructions, and return VM instructions.
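Equations (1) to (3) and the threshold choice can be sketched as below. The VM execution trace is assumed to be a list of `(vpc, vmop)` pairs in execution order, and the VPC advance of an entry is taken as the next entry's VPC minus its own. The population variance is used, and t is placed at the midpoint of the widest gap between the sorted variance values, which is one concrete way of choosing a value that separates the two groups; both choices and all function names are assumptions rather than the embodiment's prescribed method.

```python
from collections import defaultdict
from statistics import pvariance
from typing import Dict, Iterable, List, Set, Tuple


def vpc_advance_variance(vm_trace: List[Tuple[int, int]]) -> Dict[int, float]:
    """Population variance of the VPC advance, per virtual opcode (equations (1) and (2))."""
    advances: Dict[int, List[int]] = defaultdict(list)
    for (vpc, op), (next_vpc, _) in zip(vm_trace, vm_trace[1:]):
        advances[op].append(next_vpc - vpc)
    return {op: pvariance(a) for op, a in advances.items()}


def split_threshold(variances: Iterable[float]) -> float:
    """Midpoint of the widest gap between distinct sorted variance values."""
    vals = sorted(set(variances))
    if len(vals) < 2:
        raise ValueError("need at least two distinct variance values")
    _, i = max((hi - lo, i) for i, (lo, hi) in enumerate(zip(vals, vals[1:])))
    return (vals[i] + vals[i + 1]) / 2


def branch_opcodes(vm_trace: List[Tuple[int, int]]) -> Set[int]:
    """Opcodes whose VPC-advance variance s exceeds the threshold t (equation (3))."""
    variances = vpc_advance_variance(vm_trace)
    t = split_threshold(variances.values())
    return {op for op, s in variances.items() if s > t}


# Opcodes 0 and 1 always advance the VPC by 8 and 4; opcode 2 jumps around.
example_trace = [(0, 0), (8, 1), (12, 2), (40, 0), (48, 1), (52, 2),
                 (0, 0), (8, 1), (12, 2), (100, 0)]
```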
[Processing of the mutation unit]
Next, the processing of the mutation unit 1231 will be described. FIG. 13 is a diagram explaining the processing of the mutation unit 1231.
The mutation unit 1231 refers to the list of VM instructions collected by the VM instruction collection unit 1222 ((1) in FIG. 13). When one bytecode is taken from the bytecode dictionary ((2) in FIG. 13), the mutation unit 1231 mutates it by a predetermined method such as adding, deleting, or changing VM instructions ((3) in FIG. 13).
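A single mutation step might look like the following sketch. Bytecode is modeled as a plain list of VM instructions (for example `(opcode, operand)` tuples), and `vm_instructions` stands for the list gathered by the VM instruction collection unit 1222; this representation, the uniform random choice among the three operations, and the name `mutate` are assumptions.

```python
import random
from typing import List, Sequence, TypeVar

Instr = TypeVar("Instr")


def mutate(bytecode: Sequence[Instr], vm_instructions: Sequence[Instr],
           rng: random.Random) -> List[Instr]:
    """Return a copy of bytecode with one VM instruction added, deleted, or changed."""
    out = list(bytecode)
    ops = ["add"] + (["delete", "change"] if out else [])
    op = rng.choice(ops)
    if op == "add":
        out.insert(rng.randrange(len(out) + 1), rng.choice(vm_instructions))
    elif op == "delete":
        del out[rng.randrange(len(out))]
    else:  # "change": overwrite one instruction with a collected one
        out[rng.randrange(len(out))] = rng.choice(vm_instructions)
    return out
```

Drawing inserted and replacement instructions only from the collected list keeps every mutant made of opcodes the VM actually implements, so executions tend to exercise handler logic rather than fail at decoding.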
[Processing procedure of the vulnerability discovery device]
Next, the procedure of the analysis process performed by the vulnerability discovery device 10 will be described. FIG. 14 is a flowchart showing the procedure of the analysis process according to the embodiment.
First, the input unit 11 receives a test script and a script engine binary as input (step S1).
The execution trace acquisition unit 1211 then performs the execution trace acquisition process: it executes the test script while monitoring the script engine binary and acquires branch traces and memory access traces (step S2).
The VM instruction boundary detection unit 1212 performs the VM instruction boundary detection process, detecting VM instructions and their boundaries (step S3). The virtual program counter detection unit 1213 performs the virtual program counter detection process, retrieving and analyzing the execution traces for the first test script stored in the execution trace DB 131 to find the VPC (step S4).
The dispatcher detection unit 1214 performs the dispatcher detection process, extracting each VM instruction's portion from the script engine binary and detecting the portion that is highly similar across VM instructions as the dispatcher (step S5).
The conditional branch flag detection unit 1215 performs the conditional branch flag detection process, retrieving and analyzing the execution traces for the second test script stored in the execution trace DB 131 to find the conditional branch flag (step S6).
The code cache detection unit 1216 performs the code cache detection process (step S7). Based on the execution trace and the VPC, it detects the memory areas allocated by the code location that called the memory allocation function as code caches, and detects writes into those areas as code cache updates.
The VM execution trace acquisition unit 1221 performs the VM execution trace acquisition process: it receives the test script and the script engine binary as input and executes the test script while monitoring the execution of the script engine binary, thereby acquiring VM execution traces (step S8).
The VM instruction collection unit 1222 performs the VM instruction collection process, which collects VM instructions from the VM execution traces (step S9). The VM instruction determination unit 1223 performs the VM instruction determination process, which determines what each collected VM instruction does (step S10).
When the input unit 11 accepts a seed script (step S11), the fuzzing execution unit 1232 performs the bytecode extraction process: it executes the seed script while monitoring the code cache and extracts the bytecode from the code cache (step S12).
The mutation unit 1231 performs the mutation process, mutating the bytecode extracted by the fuzzing execution unit 1232 by a predetermined method such as adding, deleting, or changing VM instructions (step S13). The fuzzing execution unit 1232 writes the mutated code back into the code cache at the location pointed to by the VPC and executes it (step S14).
The vulnerability discovery unit 123 determines whether a problem such as a crash occurred during execution (step S15). If so (step S15: Yes), it outputs the input that triggered the problem (step S16). If not (step S15: No), the process returns to the mutation process in step S13 and fuzzing of the VM continues.
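The loop of steps S13 to S16 can be sketched generically. `mutate` and `run` are hypothetical callables: `run` is assumed to write the bytecode back at the location pointed to by the VPC in the code cache, execute it, and return True when a crash or similar problem occurs. Two simplifications are assumptions, not part of the described procedure: each iteration mutates the originally extracted bytecode, and an iteration cap `max_iters` stops the loop, whereas the described process simply keeps fuzzing.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

Instr = TypeVar("Instr")


def fuzz(seed_bytecode: Sequence[Instr],
         mutate: Callable[[List[Instr]], List[Instr]],
         run: Callable[[List[Instr]], bool],
         max_iters: int = 10_000) -> Optional[List[Instr]]:
    """Return the first mutated bytecode whose execution crashes, or None if none does."""
    for _ in range(max_iters):
        candidate = mutate(list(seed_bytecode))  # S13: mutate the extracted bytecode
        if run(candidate):                       # S14-S15: execute and check for a crash
            return candidate                     # S16: report the input that caused it
    return None
```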
[Procedure of the execution trace acquisition process]
Next, the flow of the execution trace acquisition process shown in FIG. 14 will be described. FIG. 15 is a flowchart showing the procedure of the execution trace acquisition process shown in FIG. 14.
First, the execution trace acquisition unit 1211 receives a test script and a script engine binary as input (step S21). It then installs hooks in the received script engine to acquire branch traces (step S22), and also installs hooks to acquire memory access traces (step S23).
In this state, the execution trace acquisition unit 1211 feeds the received test script to the script engine and executes it (step S24), and stores the resulting execution trace in the execution trace DB 131 (step S25).
The execution trace acquisition unit 1211 determines whether all the input test scripts have been executed (step S26). If so (step S26: Yes), it ends the process. If not (step S26: No), it returns to step S24 and executes the next test script.
[Procedure of the VM instruction boundary detection process]
Next, the flow of the VM instruction boundary detection process shown in FIG. 14 will be described. FIG. 16 is a flowchart showing the procedure of the VM instruction boundary detection process shown in FIG. 14.
First, the VM instruction boundary detection unit 1212 retrieves execution traces from the execution trace DB 131 (step S31) and clusters them using a predetermined method (step S32). Any clustering method may be used.
The VM instruction boundary detection unit 1212 detects clusters whose execution count is at or above a threshold as VM instructions (step S33), and takes the start and end points of the consecutive instruction sequence that makes up each VM instruction as its boundaries (step S34). It outputs the VM instruction boundaries as its return value (step S35) and ends the VM instruction boundary detection process.
[Procedure of the virtual program counter detection process]
Next, the flow of the virtual program counter detection process shown in FIG. 14 will be described. FIG. 17 is a flowchart showing the procedure of the virtual program counter detection process shown in FIG. 14.
First, the virtual program counter detection unit 1213 retrieves one execution trace produced by the first test script from the execution trace DB 131 (step S41). Focusing on the memory access trace within that execution trace, it counts the number of reads for each memory read destination (step S42).
The virtual program counter detection unit 1213 receives as input the first test script used to obtain that execution trace (step S43), and analyzes it to obtain the number of iterations and the number of repeated statements (step S44).
Next, the virtual program counter detection unit 1213 retrieves from the execution trace DB 131 another execution trace, produced by a first test script with a different number of iterations and repeated statements (step S45). Again focusing on the memory access trace, it counts the number of reads for each memory read destination (step S46). It also receives as input the first test script used to obtain this execution trace (step S47), and analyzes it to obtain the number of iterations and the number of repeated statements (step S48).
The virtual program counter detection unit 1213 then narrows the memory read destinations down to those whose read counts change in proportion to the changes in the number of iterations and repeated statements (step S49). It further narrows the destinations remaining after step S49 down to those whose read values always point to the start of a VM instruction (step S50).
The virtual program counter detection unit 1213 then determines whether the memory read destinations have been narrowed down to exactly one (step S51). If not (step S51: No), it returns to step S45, retrieves the next execution trace, and continues processing. If so (step S51: Yes), it stores the remaining memory read destination in the architecture information DB 132 as the virtual program counter (step S52) and ends the process.
[Procedure of the dispatcher detection process]
Next, the flow of the dispatcher detection process shown in FIG. 14 will be described. FIG. 18 is a flowchart showing the procedure of the dispatcher detection process shown in FIG. 14.
First, the dispatcher detection unit 1214 receives the script engine binary as input (step S61), and receives the VM instruction boundaries from the VM instruction boundary detection unit 1212 (step S62).
Based on these boundaries, the dispatcher detection unit 1214 extracts each VM instruction's portion from the script engine binary (step S63), and calculates the code similarity between VM instructions using a predetermined method (step S64). Any method capable of computing similarity between pieces of code may be used.
Based on the similarities calculated in step S64, the dispatcher detection unit 1214 extracts a portion that is highly similar across all VM instructions (step S65), and determines whether that portion lies at the end of the VM instructions (step S66).
If it does not (step S66: No), the dispatcher detection unit 1214 returns to step S65 and continues processing. If it does (step S66: Yes), the dispatcher detection unit 1214 outputs the extracted portion as the dispatcher (step S67) and ends the process.
[Procedure of the conditional branch flag detection process]
Next, the flow of the conditional branch flag detection process shown in FIG. 14 will be described. FIG. 19 is a flowchart showing the procedure of the conditional branch flag detection process shown in FIG. 14.
 まず、条件分岐フラグ検出部1215は、実行トレースDB131から第2のテストスクリプトによる実行トレースを一つ取り出す(ステップS71)。そして、条件分岐フラグ検出部1215は、メモリアクセストレースに着目し、メモリ読み込み先ごとに読み込み回数を数え上げる(ステップS72)。 First, the conditional branch flag detection unit 1215 extracts one execution trace by the second test script from the execution trace DB 131 (step S71). Then, the conditional branch flag detection unit 1215 focuses on the memory access trace and counts the number of reads for each memory read destination (step S72).
 また、条件分岐フラグ検出部1215は、実行トレースの取得に用いた第2のテストスクリプトを、入力として受け取り(ステップS73)、この第2のテストスクリプトを解析して、条件分岐の回数とTrue/Falseの順序パターンを取得する(ステップS74)。そして、条件分岐フラグ検出部1215は、条件分岐の回数に比例して読み込み回数が変化するメモリ読み込み先のみに絞り込む(ステップS75)。さらに、条件分岐フラグ検出部1215は、読み込んだメモリの値がTrue/Falseの順序パターンに合わせて二つの値を行き来しているメモリ読み込み先のみに絞り込む(ステップS76)。 The conditional branch flag detection unit 1215 also receives as input the second test script used to obtain the execution trace (step S73), analyzes this second test script, and obtains the number of conditional branches and the True/False sequence pattern (step S74). The conditional branch flag detection unit 1215 then narrows down the memory read destinations to only those whose read count changes in proportion to the number of conditional branches (step S75). Furthermore, the conditional branch flag detection unit 1215 narrows down the memory read destinations to only those whose read memory value alternates between two values in accordance with the True/False sequence pattern (step S76).
The conditional branch flag detection unit 1215 determines whether the memory read destinations have been narrowed down to exactly one (step S77). If not (step S77: No), it returns to step S71, retrieves the next execution trace, and continues processing. If so (step S77: Yes), it stores the narrowed-down read destination in the architecture information DB 132 as the conditional branch flag (step S78) and ends processing.
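Steps S72 to S77 can be sketched as a filter over memory reads. This is a minimal illustration, not the device's implementation: it assumes the memory access trace has already been reduced to `(address, value)` read records in execution order, and it simplifies "proportional to the number of conditional branches" (S75) to "a whole multiple of the branch count". The function name `detect_branch_flag` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def detect_branch_flag(
    mem_reads: List[Tuple[int, int]],  # (address, value read), in execution order (assumed format)
    pattern: List[bool],               # True/False outcomes of the test script's conditional branches
) -> Optional[int]:
    """Return the single address that behaves like a conditional branch flag, else None."""
    if not pattern:
        return None
    n = len(pattern)
    reads: Dict[int, List[int]] = defaultdict(list)
    for addr, value in mem_reads:      # S72: group reads by read destination
        reads[addr].append(value)

    candidates = []
    for addr, values in reads.items():
        # S75 (simplified): k reads per conditional branch, for some integer k >= 1
        if len(values) % n != 0:
            continue
        k = len(values) // n
        # S76: exactly two distinct values, each tied consistently to one outcome
        if len(set(values)) != 2:
            continue
        outcome_to_value: Dict[bool, int] = {}
        consistent = True
        for i, value in enumerate(values):
            outcome = pattern[i // k]
            if outcome_to_value.setdefault(outcome, value) != value:
                consistent = False
                break
        if consistent and len(outcome_to_value) == 2:
            candidates.append(addr)

    # S77: succeed only if exactly one read destination survives
    return candidates[0] if len(candidates) == 1 else None


# Usage with synthetic reads: 0x100 tracks the branch outcomes, the rest is noise.
pattern = [True, False, True, True]
mem_reads: List[Tuple[int, int]] = []
for flag in [1, 0, 1, 1]:
    mem_reads += [(0x200, 5), (0x100, flag), (0x400, 7), (0x400, 8)]
mem_reads += [(0x300, 9)] * 3
print(hex(detect_branch_flag(mem_reads, pattern)))  # 0x100
```

A script whose branches all go the same way cannot separate the flag from other memory, which is one reason the procedure loops back to S71 with another trace when more than one candidate remains.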
[Code Cache Detection Processing Procedure]
Next, the flow of the code cache detection process shown in FIG. 14 is described. FIG. 20 is a flowchart showing the processing procedure of the code cache detection process shown in FIG. 14.
When the code cache detection unit 1216 receives the execution trace, the VPC, and the VM execution trace as input (step S81), it obtains the memory area pointed to by the VPC from the VM execution trace (step S82). The VM execution trace is acquired by the VM execution trace acquisition unit 1221.
The code cache detection unit 1216 obtains, from the execution trace, the code location that called the memory allocation function which allocated the memory area obtained in step S82 (step S83). It then detects, as code caches, all areas in the VM execution trace that were allocated at the code location obtained in step S83 (step S84).
The code cache detection unit 1216 obtains, from the execution trace, the code locations that write to the code cache (step S85). It detects, as code cache updates, all areas in the VM execution trace that are written at the code locations obtained in step S85 (step S86). Finally, it returns the detected code caches and their update locations (step S87) and ends the code cache detection process.
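The allocation-site reasoning of steps S82 to S86 can be sketched as follows. This is an illustrative reduction, assuming the execution trace has already been summarized as allocation records (allocator call site, base, size) and write records (writing code location, written address); the `Alloc` type and `detect_code_cache` function are hypothetical names.

```python
from typing import List, NamedTuple, Set, Tuple


class Alloc(NamedTuple):
    call_site: int  # code address that called the memory allocation function
    base: int       # start address of the allocated region
    size: int       # size of the region in bytes


def inside(region: Alloc, addr: int) -> bool:
    return region.base <= addr < region.base + region.size


def detect_code_cache(
    allocs: List[Alloc],            # allocations observed in the execution trace
    vpc_values: List[int],          # VPC values observed in the VM execution trace
    writes: List[Tuple[int, int]],  # (writing code location, written address)
) -> Tuple[List[Alloc], List[Tuple[int, int]]]:
    # S82-S83: allocation sites of the regions the VPC points into
    sites: Set[int] = {a.call_site for a in allocs
                       if any(inside(a, v) for v in vpc_values)}
    # S84: every region allocated at those sites is treated as a code cache
    caches = [a for a in allocs if a.call_site in sites]
    # S85: code locations that write into any detected code cache
    writers = {loc for loc, addr in writes if any(inside(c, addr) for c in caches)}
    # S86: every write performed at those locations counts as a code cache update
    updates = [(loc, addr) for loc, addr in writes if loc in writers]
    return caches, updates  # S87


allocs = [Alloc(0x401000, 0x1000, 0x100), Alloc(0x401000, 0x2000, 0x100),
          Alloc(0x402000, 0x3000, 0x40)]
caches, updates = detect_code_cache(
    allocs, [0x1010, 0x1014],
    [(0x405000, 0x1010), (0x405000, 0x2008), (0x406000, 0x3000)])
```

Generalizing from one VPC-pointed region to its allocation site is what lets the sketch find the second cache region at `0x2000`, which the VPC never touched in this trace.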
[VM Execution Trace Acquisition Processing Procedure]
Next, the flow of the VM execution trace acquisition process shown in FIG. 14 is described. FIG. 21 is a flowchart showing the processing procedure of the VM execution trace acquisition process shown in FIG. 14.
First, the VM execution trace acquisition unit 1221 receives test scripts and a script engine binary as input (step S91). It then hooks the received script engine so that the VPC and the VM opcode are recorded (step S92).
With the hooks in place, the VM execution trace acquisition unit 1221 inputs a received test script into the script engine and executes it (step S93), and stores the resulting VM execution trace in the VM execution trace DB 133 (step S94).
The VM execution trace acquisition unit 1221 determines whether all of the input test scripts have been executed (step S95). If so (step S95: Yes), it ends the process. If not (step S95: No), it returns to step S93 to execute the next test script.
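Hooking a real script engine binary requires a dynamic binary instrumentation tool, which is out of scope here. As a stand-in, the sketch below uses a hypothetical toy stack VM (`run_toy_vm`) whose dispatch loop calls a hook with `(vpc, opcode)` before each VM instruction, mirroring what the hooks of step S92 record, and then runs every test script as in steps S93 to S95.

```python
from typing import Callable, List, Optional, Tuple

Instr = Tuple[str, Optional[int]]   # (opcode, operand) of the toy bytecode
Hook = Callable[[int, str], None]   # called with (vpc, opcode)


def run_toy_vm(bytecode: List[Instr], hook: Hook) -> List[int]:
    """Hypothetical stack VM standing in for the script engine's VM."""
    vpc, stack = 0, []
    while vpc < len(bytecode):
        op, arg = bytecode[vpc]
        hook(vpc, op)                  # S92: record VPC and opcode before dispatch
        if op == "PUSH":
            stack.append(arg)
            vpc += 1
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            vpc += 1
        elif op == "JMP":
            vpc = arg
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {op}")
    return stack


def acquire_vm_traces(test_scripts: List[List[Instr]]) -> List[List[Tuple[int, str]]]:
    trace_db = []                      # stands in for the VM execution trace DB 133
    for script in test_scripts:        # S93, S95: run every test script
        trace: List[Tuple[int, str]] = []
        run_toy_vm(script, lambda vpc, op, t=trace: t.append((vpc, op)))
        trace_db.append(trace)         # S94
    return trace_db


scripts = [[("PUSH", 1), ("PUSH", 2), ("ADD", None), ("HALT", None)],
           [("JMP", 2), ("PUSH", 9), ("HALT", None)]]
traces = acquire_vm_traces(scripts)
```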
[VM Instruction Collection Processing Procedure]
Next, the flow of the VM instruction collection process shown in FIG. 14 is described. FIG. 22 is a flowchart showing the processing procedure of the VM instruction collection process shown in FIG. 14.
The VM instruction collection unit 1222 receives the VPC and the dispatcher as input (step S101) and acquires a variety of scripts from the Internet (step S102). It executes the scripts while monitoring the VPC and the dispatcher, and acquires VM execution traces (step S103).
The VM instruction collection unit 1222 acquires VM instructions from the VM execution traces (step S104) and adds them to a list of VM instructions (step S105). If a VM instruction not yet in the list was observed (step S106: No), it returns to step S102. Once no VM instruction outside the list is observed any longer (step S106: Yes), it returns the list of VM instructions (step S107) and ends the VM instruction collection process.
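The stopping rule of steps S102 to S107 is a saturation loop: keep fetching scripts until a round yields no unseen VM instruction. The sketch assumes scripts arrive in batches and that a caller-supplied `run_and_trace` (hypothetical) returns a VM execution trace of `(vpc, opcode)` pairs.

```python
from typing import Callable, Iterable, List, Tuple

Trace = List[Tuple[int, str]]  # (vpc, opcode) pairs


def collect_vm_instructions(
    script_batches: Iterable[List[str]],    # S102: successive batches of gathered scripts
    run_and_trace: Callable[[str], Trace],  # S103: run one script under VPC/dispatcher monitoring
) -> List[str]:
    known: List[str] = []
    seen = set()
    for batch in script_batches:
        found_new = False
        for script in batch:
            for _vpc, opcode in run_and_trace(script):  # S104
                if opcode not in seen:                 # S105: extend the list
                    seen.add(opcode)
                    known.append(opcode)
                    found_new = True
        if not found_new:                              # S106: Yes, saturated
            break
    return known                                       # S107


# Toy tracer: a "script" is a space-separated opcode sequence.
fake_trace = lambda s: list(enumerate(s.split()))
batches = [["PUSH ADD", "PUSH JMP"], ["ADD RET"], ["PUSH"], ["CALL"]]
collected = collect_vm_instructions(batches, fake_trace)
```

The example also shows the limit of this criterion: `CALL` appears only in a later batch and is never collected, because the third batch added nothing new.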
[VM Instruction Determination Processing Procedure]
Next, the flow of the VM instruction determination process shown in FIG. 14 is described. FIG. 23 is a flowchart showing the processing procedure of the VM instruction determination process shown in FIG. 14.
The VM instruction determination unit 1223 performs a branch VM instruction determination process that identifies branch VM instructions among the VM instructions collected by the VM instruction collection unit 1222 (step S111). Among the branch VM instructions, it determines those that access the conditional branch flag to be conditional branch VM instructions (step S112).
The VM instruction determination unit 1223 then performs a call VM instruction determination process (step S113): it scans the branch VM instructions in the VM execution trace and, if a later branch instruction branches to the location immediately following a given branch VM instruction, determines that branch VM instruction to be a call VM instruction.
The VM instruction determination unit 1223 then extracts the call VM instructions from the VM execution trace and performs a return VM instruction determination process that determines which VM instructions are return VM instructions based on the extracted call VM instructions (step S114).
For the remaining VM instructions, the VM instruction determination unit 1223 determines their contents by a predetermined method based on the differences between VM execution traces obtained with multiple test scripts that invoke each VM instruction (step S115).
The VM instruction determination unit 1223 then adds the determined VM instructions to the VM instruction list (step S116), returns the list of VM instructions (step S117), and ends the VM instruction determination process.
[Branch VM Instruction Determination Processing Procedure]
FIG. 24 is a flowchart showing the processing procedure of the branch VM instruction determination process shown in FIG. 23.
The VM instruction determination unit 1223 extracts one VM execution trace from the VM execution trace DB 133 (step S121). It links each pointer to a VM instruction with that VM instruction and assigns each a VM opcode as an identifier (step S122). It then aggregates, for each VM opcode, the amount by which the VPC changes before and after execution (step S123).
The VM instruction determination unit 1223 determines whether all VM execution traces in the VM execution trace DB 133 have been processed (step S124). If not (step S124: No), it returns to step S121 and retrieves and processes the next VM execution trace.
If all VM execution traces in the VM execution trace DB 133 have been processed (step S124: Yes), the VM instruction determination unit 1223 calculates, for each VM opcode, the variance of the VPC change amounts (step S125). It then receives a threshold as input (step S126), narrows the VM opcodes down to those whose variance exceeds the threshold (step S127), stores them in the architecture information DB 132 as branch VM instructions (step S128), and ends the process.
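The variance test of steps S123 to S127 can be sketched in a few lines. The intuition: a non-branch VM instruction always advances the VPC by roughly its own length, whereas a branch VM instruction moves it by varying amounts. This sketch assumes each VM execution trace is a list of `(vpc, opcode)` pairs and uses the population variance from Python's `statistics` module; `find_branch_opcodes` is a hypothetical name.

```python
from collections import defaultdict
from statistics import pvariance
from typing import Dict, List, Set, Tuple


def find_branch_opcodes(traces: List[List[Tuple[int, str]]], threshold: float) -> Set[str]:
    deltas: Dict[str, List[int]] = defaultdict(list)
    for trace in traces:                                   # S121, S124: every trace
        for (vpc, op), (next_vpc, _) in zip(trace, trace[1:]):
            deltas[op].append(next_vpc - vpc)              # S123: VPC change per opcode
    # S125-S127: keep opcodes whose VPC change varies more than the threshold
    return {op for op, d in deltas.items() if pvariance(d) > threshold}


trace = [(0, "PUSH"), (2, "PUSH"), (4, "JMP"), (20, "ADD"),
         (21, "JMP"), (2, "PUSH"), (4, "ADD")]
branches = find_branch_opcodes([trace], threshold=1.0)
```

Here `PUSH` always advances by 2 and `ADD` by 1 (variance 0), while `JMP` moves by +16 and -19 (variance 306.25). A branch that happens to jump by the same offset every time in the collected traces would go undetected, which is why the traces should exercise varied control flow.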
[Conditional Branch VM Instruction Determination Processing Procedure]
FIG. 25 is a flowchart showing the processing procedure of the conditional branch VM instruction determination process shown in FIG. 23.
The VM instruction determination unit 1223 extracts the list of opcodes of branch VM instructions from the architecture information DB 132 (step S131). It extracts one VM execution trace from the VM execution trace DB 133 (step S132) and the corresponding execution trace from the execution trace DB 131 (step S133).
The VM instruction determination unit 1223 extracts from the VM execution trace one location where a branch VM instruction is executed (step S134). It then extracts, from the execution trace, the memory access trace corresponding to the execution of that branch VM instruction (step S135).
Based on the memory access trace, the VM instruction determination unit 1223 determines whether the branch VM instruction extracted in step S134 accesses the conditional branch flag (step S136).
If the conditional branch flag is accessed (step S136: Yes), the VM instruction determination unit 1223 determines that the branch VM instruction extracted in step S134 is a conditional branch VM instruction (step S137) and stores the opcode of this conditional branch VM instruction in the architecture information DB 132 (step S138).
If the conditional branch flag is not accessed (step S136: No), or after step S138 is completed, the VM instruction determination unit 1223 determines whether all locations where branch VM instructions are executed have been processed (step S139).
If not (step S139: No), the VM instruction determination unit 1223 extracts the next location where a branch VM instruction is executed (step S140) and returns to step S135 to process that branch VM instruction.
If all such locations have been processed (step S139: Yes), the VM instruction determination unit 1223 determines whether all VM execution traces have been processed (step S141).
If not (step S141: No), the VM instruction determination unit 1223 retrieves the next VM execution trace (step S142) and returns to step S133 to process it. If all VM execution traces have been processed (step S141: Yes), it ends the conditional branch VM instruction determination process.
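Steps S132 to S142 amount to a nested scan. In this sketch each trace pair is assumed to be a VM execution trace of `(vpc, opcode)` pairs plus a mapping from step index to the memory addresses that step accessed (a simplified stand-in for aligning the memory access trace with the VM execution trace in S135); the function name is hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

VMTrace = List[Tuple[int, str]]     # (vpc, opcode) per executed VM instruction
Accesses = Dict[int, List[int]]     # step index -> memory addresses accessed (assumed)


def find_conditional_branch_opcodes(
    traces: Iterable[Tuple[VMTrace, Accesses]],
    branch_opcodes: Set[str],       # S131: branch VM opcodes from the architecture information
    flag_addr: int,                 # conditional branch flag found earlier
) -> Set[str]:
    conditional: Set[str] = set()
    for vm_trace, accesses in traces:                # S132-S133, S141-S142
        for step, (_vpc, op) in enumerate(vm_trace):  # S134, S139-S140
            if op not in branch_opcodes:
                continue
            if flag_addr in accesses.get(step, []):   # S135-S136
                conditional.add(op)                   # S137-S138
    return conditional


pairs = [([(0, "PUSH"), (2, "JZ"), (10, "JMP"), (30, "ADD")],
          {1: [0x100, 0x500], 2: [0x500], 3: [0x100]})]
cond = find_conditional_branch_opcodes(pairs, {"JZ", "JMP"}, flag_addr=0x100)
```

`ADD` also reads the flag address in this example but is ignored, because only instructions already classified as branch VM instructions are examined.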
[Call VM Instruction Determination Processing Procedure]
FIG. 26 is a flowchart showing the processing procedure of the call VM instruction determination process shown in FIG. 23.
The VM instruction determination unit 1223 extracts the list of opcodes of branch VM instructions from the architecture information DB 132 (step S151). It extracts one VM execution trace from the VM execution trace DB 133 (step S152) and then extracts from it one location where a branch VM instruction is executed (step S153).
The VM instruction determination unit 1223 scans the branch VM instructions that appear in the VM execution trace after the branch VM instruction extracted in step S153 (step S154). Based on the scan result, it determines whether any of them branches to the location immediately following the branch VM instruction extracted in step S153 (step S155).
If such a branch VM instruction exists (step S155: Yes), the VM instruction determination unit 1223 determines that the branch VM instruction extracted in step S153 is a call VM instruction (step S156) and stores the opcode of the call VM instruction in the architecture information DB 132 (step S157).
If no such branch VM instruction exists (step S155: No), or after step S157 is completed, the VM instruction determination unit 1223 determines whether all locations where branch VM instructions are executed have been processed (step S158).
If not (step S158: No), the VM instruction determination unit 1223 extracts the next location where a branch VM instruction is executed (step S159) and returns to step S154 to process that branch VM instruction.
If all such locations have been processed (step S158: Yes), the VM instruction determination unit 1223 determines whether all VM execution traces have been processed (step S160).
If not (step S160: No), the VM instruction determination unit 1223 retrieves the next VM execution trace (step S161) and returns to step S153 to process it. If all VM execution traces have been processed (step S160: Yes), it ends the call VM instruction determination process.
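The call test of steps S153 to S157 rests on one observation: a call is a branch whose fall-through address is later jumped to by some other branch (the matching return). The sketch assumes each trace entry carries the encoded length of its VM instruction, so the fall-through address is `vpc + size`; how that length is recovered is left open here, and `VMStep`, `taken_target`, and `find_call_opcodes` are hypothetical names.

```python
from typing import List, NamedTuple, Optional, Set


class VMStep(NamedTuple):
    vpc: int     # VPC at the start of the VM instruction
    opcode: str  # VM opcode
    size: int    # encoded length of the VM instruction (assumed recoverable)


def taken_target(trace: List[VMStep], j: int) -> Optional[int]:
    """VPC reached after step j if step j actually jumped, else None."""
    if j + 1 >= len(trace):
        return None
    nxt = trace[j + 1].vpc
    return None if nxt == trace[j].vpc + trace[j].size else nxt


def find_call_opcodes(trace: List[VMStep], branch_opcodes: Set[str]) -> Set[str]:
    calls: Set[str] = set()
    for i, step in enumerate(trace):                 # S153, S158-S159
        if step.opcode not in branch_opcodes:
            continue
        fall_through = step.vpc + step.size          # location right after step i
        for j in range(i + 1, len(trace)):           # S154: scan later branches
            if (trace[j].opcode in branch_opcodes
                    and taken_target(trace, j) == fall_through):
                calls.add(step.opcode)               # S155-S157
                break
    return calls


TRACE = [VMStep(0, "PUSH", 2), VMStep(2, "CALL", 3), VMStep(20, "ADD", 1),
         VMStep(21, "RET", 1), VMStep(5, "JMP", 3), VMStep(30, "ADD", 1)]
calls = find_call_opcodes(TRACE, {"CALL", "JMP", "RET"})
```

`CALL` at VPC 2 has fall-through 5, and `RET` later jumps to 5, so `CALL` is classified as a call; `JMP` and `RET` have no such later branch.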
[Return VM Instruction Determination Processing Procedure]
FIG. 27 is a flowchart showing the processing procedure of the return VM instruction determination process shown in FIG. 23.
The VM instruction determination unit 1223 extracts the list of opcodes of branch VM instructions from the architecture information DB 132 (step S171). It extracts one VM execution trace from the VM execution trace DB 133 (step S172) and then extracts from it one location where a call VM instruction is executed (step S173).
The VM instruction determination unit 1223 scans the branch VM instructions that appear in the VM execution trace after the call VM instruction extracted in step S173 (step S174).
The VM instruction determination unit 1223 determines whether any of them branches to the location immediately following the call VM instruction extracted in step S173 (step S175).
If such a branch VM instruction exists (step S175: Yes), the VM instruction determination unit 1223 determines that this branch VM instruction is a return VM instruction (step S176) and stores the opcode of the return VM instruction in the architecture information DB 132 (step S177).
If no such branch VM instruction exists (step S175: No), or after step S177 is completed, the VM instruction determination unit 1223 determines whether all locations where branch VM instructions are executed have been processed (step S178).
If not (step S178: No), the VM instruction determination unit 1223 extracts the next location where a branch VM instruction is executed (step S179) and returns to step S174 to process that branch VM instruction.
If all such locations have been processed (step S178: Yes), the VM instruction determination unit 1223 determines whether all VM execution traces have been processed (step S180).
If not (step S180: No), the VM instruction determination unit 1223 retrieves the next VM execution trace (step S181) and returns to step S173 to process it. If all VM execution traces have been processed (step S180: Yes), it ends the return VM instruction determination process.
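The return test of steps S173 to S177 is the mirror image of the call test: starting from each executed call VM instruction, the first later branch VM instruction that actually jumps to the call's fall-through address is classified as a return. As before, this sketch assumes trace entries carry the VM instruction length, and all names are hypothetical.

```python
from typing import List, NamedTuple, Optional, Set


class VMStep(NamedTuple):
    vpc: int     # VPC at the start of the VM instruction
    opcode: str  # VM opcode
    size: int    # encoded length of the VM instruction (assumed recoverable)


def taken_target(trace: List[VMStep], j: int) -> Optional[int]:
    """VPC reached after step j if step j actually jumped, else None."""
    if j + 1 >= len(trace):
        return None
    nxt = trace[j + 1].vpc
    return None if nxt == trace[j].vpc + trace[j].size else nxt


def find_return_opcodes(trace: List[VMStep], branch_opcodes: Set[str],
                        call_opcodes: Set[str]) -> Set[str]:
    returns: Set[str] = set()
    for i, step in enumerate(trace):                 # S173: each executed call
        if step.opcode not in call_opcodes:
            continue
        fall_through = step.vpc + step.size
        for j in range(i + 1, len(trace)):           # S174: scan later branches
            if (trace[j].opcode in branch_opcodes
                    and taken_target(trace, j) == fall_through):
                returns.add(trace[j].opcode)         # S175-S177
                break
    return returns


TRACE = [VMStep(0, "PUSH", 2), VMStep(2, "CALL", 3), VMStep(20, "ADD", 1),
         VMStep(21, "RET", 1), VMStep(5, "JMP", 3), VMStep(30, "ADD", 1)]
returns = find_return_opcodes(TRACE, {"CALL", "JMP", "RET"}, {"CALL"})
```

Stopping at the first match keeps nested calls apart: an inner return jumps to the inner call's fall-through, not the outer one's.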
[Bytecode Extraction Processing Procedure]
Next, the flow of the bytecode extraction process shown in FIG. 14 is described. FIG. 28 is a flowchart showing the processing procedure of the bytecode extraction process shown in FIG. 14.
The vulnerability discovery unit 123 receives a seed script as input (step S191) and a code cache as input (step S192).
The fuzzing execution unit 1232 executes the seed script while monitoring the code cache (step S193). The vulnerability discovery unit 123 extracts bytecode from the code cache (step S194) and stores it in the bytecode dictionary (step S195).
[Mutation Processing Procedure]
Next, the flow of the mutation process shown in FIG. 14 is described. FIG. 29 is a flowchart showing the processing procedure of the mutation process shown in FIG. 14.
The mutation unit 1231 receives as input the bytecode dictionary obtained in the bytecode extraction process (step S201) and the list of VM instructions (step S202).
The mutation unit 1231 retrieves one bytecode from the bytecode dictionary (step S203) and mutates it by a predetermined method, such as adding, deleting, or changing VM instructions (step S204). It then adds the mutated bytecode to the bytecode dictionary (step S205).
If not all bytecodes to be mutated have been mutated yet (step S206: No), the mutation unit 1231 returns to step S203. Once all of them have been mutated (step S206: Yes), it returns the updated bytecode dictionary (step S207).
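One possible instance of the "predetermined method" of step S204 is sketched below: a single random insertion, deletion, or replacement drawn from the collected VM instruction list, which keeps every mutant built from instructions the VM can decode. The representation of bytecode as a list of opcode names (operands omitted) and the names `mutate` and `mutate_dictionary` are assumptions for illustration.

```python
import random
from typing import List


def mutate(bytecode: List[str], vm_instructions: List[str], rng: random.Random) -> List[str]:
    code = list(bytecode)  # never modify the dictionary entry in place
    kind = rng.choice(["add", "delete", "replace"]) if code else "add"
    if kind == "add":
        code.insert(rng.randrange(len(code) + 1), rng.choice(vm_instructions))
    elif kind == "delete":
        del code[rng.randrange(len(code))]
    else:
        code[rng.randrange(len(code))] = rng.choice(vm_instructions)
    return code


def mutate_dictionary(dictionary: List[List[str]], vm_instructions: List[str],
                      seed: int = 0) -> List[List[str]]:
    rng = random.Random(seed)               # seeded for reproducible fuzzing runs
    for bytecode in list(dictionary):       # S203, S206: each original entry once
        dictionary.append(mutate(bytecode, vm_instructions, rng))  # S204-S205
    return dictionary                       # S207


vocab = ["PUSH", "ADD", "JMP", "RET"]
d = mutate_dictionary([["PUSH", "PUSH", "ADD"], ["JMP"]], vocab, seed=1)
```

Iterating over a copy of the dictionary keeps the loop from mutating the mutants it has just appended.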
[Execution Processing Procedure]
Next, the flow of the execution process shown in FIG. 14 is described. FIG. 30 is a flowchart showing the processing procedure of the execution process shown in FIG. 14.
The fuzzing execution unit 1232 receives a seed script as input (step S211) and the VPC and the code cache as input (step S212).
The fuzzing execution unit 1232 executes the seed script while monitoring the VPC and the code cache (step S213) and halts execution at the moment the bytecode is about to be executed (step S214). It then retrieves a bytecode from the bytecode dictionary (step S215).
The fuzzing execution unit 1232 places the retrieved bytecode in the code cache at the location pointed to by the VPC (step S216) and resumes execution (step S217).
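Steps S214 to S217 can be simulated with a `bytearray` standing in for the code cache. The sketch assumes a hypothetical one-byte accumulator VM (`0x01` increments, `0x02` decrements, `0xFF` halts); "halting" is modeled by injecting before the interpreter loop starts, and the bounds check keeps an oversized mutant from spilling past the detected code cache region.

```python
OPS = {0x01: 1, 0x02: -1}  # hypothetical INC/DEC opcodes of a toy accumulator VM
HALT = 0xFF


def inject(code_cache: bytearray, vpc: int, bytecode: bytes) -> None:
    # S216: overwrite the code cache starting where the VPC points
    if vpc < 0 or vpc + len(bytecode) > len(code_cache):
        raise ValueError("bytecode does not fit in the code cache")
    code_cache[vpc:vpc + len(bytecode)] = bytecode


def run_fuzz_case(code_cache: bytearray, entry_vpc: int, bytecode: bytes) -> int:
    # S214: execution is paused just before the first VM instruction runs
    inject(code_cache, entry_vpc, bytecode)  # S215-S216
    # S217: resume the toy interpreter on the modified code cache
    acc, vpc = 0, entry_vpc
    while vpc < len(code_cache) and code_cache[vpc] != HALT:
        op = code_cache[vpc]
        if op not in OPS:
            raise RuntimeError(f"invalid opcode {op:#x} at VPC {vpc}")
        acc += OPS[op]
        vpc += 1
    return acc


cache = bytearray([0x01, 0x01, HALT, 0x00])
result = run_fuzz_case(cache, 0, bytes([0x02, 0x02, 0x02, HALT]))  # -3
```

In a real engine, a crash or sanitizer report during the resumed execution would be the signal of interest; the toy VM simply raises on an undecodable opcode.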
[Effects of the Embodiment]
In this way, the vulnerability discovery device 10 according to the embodiment analyzes the VM of a script engine, collects VM instructions, and determines what each collected VM instruction does, thereby obtaining information on the instruction set architecture, that is, the VM's system of instructions. Based on the obtained architecture information, the vulnerability discovery device 10 fuzzes the VM with mutated code. The vulnerability discovery device 10 can therefore perform fuzzing with bytecode as the input value even for script engines whose VM internal specifications are unknown.
Specifically, the vulnerability discovery device 10 executes test scripts while monitoring the script engine binary and obtains branch traces and memory access traces as execution traces. It analyzes the VM based on these execution traces and obtains the following architecture information: the VM instruction boundaries, the VPC, the dispatcher, the conditional branch flag, and the code cache. Furthermore, the vulnerability discovery device 10 executes test scripts while monitoring the VPC and the dispatcher to obtain VM execution traces. By analyzing the VM execution traces, it collects VM instructions, determines their contents, and obtains information on the instruction set architecture.
In this way, even for a script engine whose VM internal specifications are unknown, the vulnerability discovery device 10 can obtain architecture information, including where in the VM the bytecode generated by the script engine is stored, and information on the instruction set architecture of the bytecode that the VM can interpret.
The vulnerability discovery device 10 then detects the code cache based on the execution trace, the VPC, and the VM execution trace. It executes the seed script while monitoring the code cache, extracts bytecode from the code cache, mutates the extracted bytecode, and re-embeds it into the code cache for execution. Therefore, based on the acquired architecture information, the vulnerability discovery device 10 can extract the current bytecode from the VM of the script engine, mutate it within the range that remains valid as bytecode instructions, and re-embed it into the VM.
Because the vulnerability discovery device 10 detects these various pieces of architecture information by analyzing the acquired execution traces and VM execution traces, it can perform fuzzing with bytecode as input without manual reverse engineering, even for script engines whose VM internal specifications are unknown.
Furthermore, as long as test scripts are prepared, the vulnerability discovery device 10 can automatically perform fuzzing with bytecode as input on a variety of script engines, without engine-specific design or execution.
This allows the vulnerability discovery device 10 to perform fuzzing with bytecode as input on a wide variety of script engine implementations and to discover latent vulnerabilities.
In this way, the vulnerability discovery device 10 according to this embodiment analyzes script engines whose VM internal specifications are unknown and obtains information on where bytecode is stored and on the instruction set architecture, thereby enabling fuzzing with bytecode as input for script engines of a wide variety of scripting languages.
The vulnerability discovery device 10 is thus useful for discovering vulnerabilities in a wide variety of script engines. Fuzzing with bytecode as the input value is particularly suited to discovering vulnerabilities latent in behavior that is difficult to trigger when scripts are used as the input value.
Therefore, by using the vulnerability discovery device 10 to fuzz various script engines with bytecode as input, latent vulnerabilities can be discovered and addressed, for example by fixing them.
[System configuration of the embodiment]
Each component of the vulnerability discovery device 10 shown in Fig. 3 is functional and conceptual, and need not be physically configured as illustrated. That is, the specific form in which the functions of the vulnerability discovery device 10 are distributed or integrated is not limited to the one illustrated; all or part of them may be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.
All or any part of each process performed by the vulnerability discovery device 10 may be implemented by a CPU and a program interpreted and executed by that CPU, or as hardware based on wired logic.
Of the processes described in the embodiment, all or part of those described as performed automatically may instead be performed manually, and all or part of those described as performed manually may be performed automatically by known methods. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters described above and shown in the drawings may be changed as appropriate unless otherwise specified.
[Program]
Fig. 31 is a diagram showing an example of a computer on which a program is executed to implement the vulnerability discovery device 10. The computer 1000 has, for example, a memory 1010 and a CPU 1020. The computer 1000 also has a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These components are connected by a bus 1080.
The memory 1010 includes a ROM 1011 and a RAM 1012. The ROM 1011 stores a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1090, and the disk drive interface 1040 is connected to a disk drive 1100, into which a removable storage medium such as a magnetic disk or an optical disk is inserted. The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120. The video adapter 1060 is connected to, for example, a display 1130.
The hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the program that defines each process of the vulnerability discovery device 10 is implemented as a program module 1093 in which code executable by the computer 1000 is written. The program module 1093 is stored, for example, in the hard disk drive 1090; for instance, a program module 1093 for executing processing equivalent to the functional configuration of the vulnerability discovery device 10 is stored there. The hard disk drive 1090 may be replaced by an SSD (Solid State Drive).
The setting data used in the processing of the embodiment described above is stored as program data 1094, for example, in the memory 1010 or the hard disk drive 1090. The CPU 1020 reads the program module 1093 and program data 1094 stored in the memory 1010 or the hard disk drive 1090 into the RAM 1012 as necessary and executes them.
The program module 1093 and program data 1094 need not be stored in the hard disk drive 1090; they may instead be stored, for example, in a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and program data 1094 may be stored in another computer connected via a network such as a LAN (Local Area Network) or a WAN (Wide Area Network), and read from that computer by the CPU 1020 via the network interface 1070.
An embodiment to which the invention made by the present inventors is applied has been described above; however, the present invention is not limited by the description and drawings that form part of the disclosure of the present invention according to this embodiment. That is, all other embodiments, examples, operational techniques, and the like made by those skilled in the art based on this embodiment fall within the scope of the present invention.
REFERENCE SIGNS LIST
10 Vulnerability discovery device
11 Input unit
12 Control unit
13 Storage unit
14 Output unit
121 Virtual machine analysis unit
122 Instruction set architecture analysis unit
123 Vulnerability discovery unit
131 Execution trace database (DB)
132 Architecture information DB
133 VM execution trace DB
1211 Execution trace acquisition unit
1212 VM instruction boundary detection unit
1213 Virtual program counter detection unit
1214 Dispatcher detection unit
1215 Conditional branch flag detection unit
1216 Code cache detection unit
1221 VM execution trace acquisition unit
1222 VM instruction collection unit
1223 VM instruction determination unit
1231 Mutation unit
1232 Fuzzing execution unit

Claims (7)

  1.  A vulnerability discovery device comprising:
      a first analysis unit that analyzes a virtual machine of a script engine;
      a second analysis unit that analyzes an instruction set architecture, which is the instruction system of the virtual machine, collects virtual machine instructions, and determines the instruction content of the collected virtual machine instructions; and
      an execution unit that fuzzes the virtual machine using mutated code based on architecture information acquired by the first analysis unit and the second analysis unit.
  2.  The vulnerability discovery device according to claim 1, wherein the first analysis unit includes:
      a first acquisition unit that acquires a plurality of execution traces while varying execution conditions;
      a first detection unit that clusters the execution traces to detect the boundaries of each virtual machine instruction;
      a second detection unit that analyzes the plurality of execution traces using differential execution analysis focused on the number of memory reads together with the boundaries of each virtual machine instruction detected by the first detection unit, and detects a virtual program counter, which is a variable indicating the virtual machine instruction to be executed next;
      a third detection unit that analyzes the binary of the script engine based on the boundaries of each virtual machine instruction detected by the first detection unit, and detects a dispatcher; and
      a fourth detection unit that analyzes the plurality of execution traces using differential execution analysis focused on the number of memory reads, and detects a conditional branch flag, which is an area holding a flag indicating whether a branch is taken at a conditional branch during execution.
  3.  The vulnerability discovery device according to claim 2, wherein the second analysis unit includes:
      a second acquisition unit that acquires a virtual machine execution trace, which is an execution trace of execution in the virtual machine;
      a first collection unit that executes a test script while monitoring the virtual program counter and the dispatcher to acquire the virtual machine execution trace, and collects virtual machine instructions from the virtual machine execution trace; and
      a first determination unit that determines the instruction content of the virtual machine instructions collected by the first collection unit.
  4.  The vulnerability discovery device according to claim 3, wherein
      the first analysis unit includes a fifth detection unit that detects, from the virtual machine execution trace and based on the execution trace, the virtual program counter, and the virtual machine execution trace, a code cache, which is a cache in which the virtual machine instructions to be executed are stored, and
      the execution unit executes a seed script while monitoring the code cache, extracts bytecode from the code cache, mutates the extracted bytecode, and embeds it back into the code cache for execution.
  5.  The vulnerability discovery device according to claim 4, wherein the first determination unit:
      determines branch virtual machine instructions based on the variation in the amount of change of the virtual program counter for each virtual machine opcode in the virtual machine execution trace;
      determines, among the branch virtual machine instructions, a branch virtual machine instruction that accesses the conditional branch flag to be a conditional branch virtual machine instruction;
      scans the virtual machine execution trace for any branch virtual machine instruction and, if there is a branch instruction that branches to the position immediately following that branch virtual machine instruction, determines that branch virtual machine instruction to be a call virtual machine instruction; and
      extracts the call virtual machine instruction from the virtual machine execution trace and, if there is a branch virtual machine instruction that branches to the position immediately following the extracted call virtual machine instruction, determines that branch virtual machine instruction to be a return virtual machine instruction.
  6.  A vulnerability discovery method executed by a vulnerability discovery device, the method comprising:
      a first analysis step of analyzing a virtual machine of a script engine;
      a second analysis step of analyzing an instruction set architecture, which is the instruction system of the virtual machine, to collect virtual machine instructions and determine the instruction content of the collected virtual machine instructions; and
      an execution step of fuzzing the virtual machine using mutated code based on architecture information acquired in the first analysis step and the second analysis step.
  7.  A vulnerability discovery program for causing a computer to execute:
      a first analysis step of analyzing a virtual machine of a script engine;
      a second analysis step of analyzing an instruction set architecture, which is the instruction system of the virtual machine, to collect virtual machine instructions and determine the instruction content of the collected virtual machine instructions; and
      an execution step of fuzzing the virtual machine using mutated code based on architecture information acquired in the first analysis step and the second analysis step.
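Claim 5 recites how the first determination unit classifies collected VM instructions into branch, conditional branch, call, and return instructions. The Python sketch below is one possible reading of that procedure on a toy trace, not the patented implementation. Assumptions not stated in the source: each trace record carries the opcode, the VPC before and after execution, the encoded instruction length, and whether the handler accessed the conditional branch flag; "variation" is taken to mean a nonzero population variance of the VPC change; and only later-executed, non-conditional branches with a different opcode count as landing immediately after a candidate call. `TraceRecord`, `classify`, and the opcode values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import pvariance
from typing import Dict, List, Set


@dataclass(frozen=True)
class TraceRecord:
    """One executed VM instruction in a (hypothetical) VM execution trace."""
    opcode: int
    vpc: int               # virtual program counter before execution
    next_vpc: int          # virtual program counter after execution
    length: int            # encoded instruction size (assumed known)
    reads_cond_flag: bool  # handler accessed the conditional branch flag


def classify(trace: List[TraceRecord]) -> Dict[str, Set[int]]:
    # Branch: the VPC change for this opcode varies across executions.
    deltas: Dict[int, List[int]] = defaultdict(list)
    for r in trace:
        deltas[r.opcode].append(r.next_vpc - r.vpc)
    branch = {op for op, ds in deltas.items() if pvariance(ds) > 0}

    # Conditional branch: a branch whose handler accesses the flag.
    cond = {r.opcode for r in trace if r.opcode in branch and r.reads_cond_flag}
    uncond = branch - cond

    # Call: some later branch of another opcode lands on the instruction
    # immediately following this one (i.e. on its fall-through address).
    call: Set[int] = set()
    for i, r in enumerate(trace):
        if r.opcode in uncond and any(
            s.opcode in uncond and s.opcode != r.opcode
            and s.next_vpc == r.vpc + r.length
            for s in trace[i + 1:]
        ):
            call.add(r.opcode)

    # Return: a branch that lands immediately after an executed call.
    ret: Set[int] = set()
    for i, r in enumerate(trace):
        if r.opcode in call:
            ret |= {s.opcode for s in trace[i + 1:]
                    if s.opcode in uncond - call and s.next_vpc == r.vpc + r.length}
    return {"branch": branch, "conditional": cond, "call": call, "return": ret}


# Toy trace: two call sites of one subroutine and an if/else-style JZ/JMP pair.
NOP, JZ, JMP, CALL, RET = 0x01, 0x20, 0x30, 0x40, 0x50
LENGTH = {NOP: 1, JZ: 2, JMP: 2, CALL: 2, RET: 1}
STEPS = [(CALL, 0, 20), (NOP, 20, 21), (RET, 21, 2),
         (CALL, 2, 20), (NOP, 20, 21), (RET, 21, 4),
         (JZ, 4, 6), (JMP, 6, 10), (JMP, 10, 0),   # pass 1: JZ not taken
         (CALL, 0, 20), (NOP, 20, 21), (RET, 21, 2),
         (CALL, 2, 20), (NOP, 20, 21), (RET, 21, 4),
         (JZ, 4, 12)]                              # pass 2: JZ taken
EXAMPLE_TRACE = [TraceRecord(op, vpc, nxt, LENGTH[op], op == JZ)
                 for op, vpc, nxt in STEPS]

if __name__ == "__main__":
    print(classify(EXAMPLE_TRACE))
```

On the toy trace, the taken JZ lands on VPC 12, immediately after the JMP at VPC 10; the non-conditional restriction is what keeps that JMP from being classified as a call. An opcode whose VPC change never varies within the trace (for example, a jump executed only once) is not detected as a branch, so the diversity of the collected trace matters.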

Priority Applications (1)

PCT/JP2022/037924 (WO2024079793A1): priority date 2022-10-11, filing date 2022-10-11, "Vulnerability discovery device, vulnerability discovery method, and vulnerability discovery program"

Publications (1)

WO2024079793A1, published 2024-04-18

Family ID: 90668967

Patent Citations (1) (* cited by examiner)

WO2022180702A1 *, Nippon Telegraph and Telephone Corporation, "Analysis function addition device, analysis function addition program, and analysis function addition method", priority date 2021-02-24, published 2022-09-01

Non-Patent Citations (3) (* cited by examiner)

Stephen Kyle, Hugh Leather, Björn Franke, Dave Butcher, et al., "Application of Domain-aware Binary Fuzzing to Aid Android Virtual Machine Testing", ACM SIGPLAN Notices, vol. 50, no. 7, Association for Computing Machinery, 14 March 2015, pp. 121-132, XP058493219, ISSN 0362-1340, DOI 10.1145/2817817.2731198 *
Toshinori Usui et al., "Automatically Appending Execution Stall/Stop Prevention to Vanilla Script Engine", IPSJ Computer Security Symposium 2021, 2021, pp. 794-801 *
Yuting Chen, Ting Su, Zhendong Su, "Deep differential testing of JVM implementations", Software Engineering (ICSE), IEEE Press, 25-31 May 2019, pp. 1257-1268, XP058432850, DOI 10.1109/ICSE.2019.00127 *
