WO2024214263A1 - Analysis function providing device, analysis function providing method, and analysis function providing program

Analysis function providing device, analysis function providing method, and analysis function providing program

Info

Publication number
WO2024214263A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
symbol table
script
virtual machine
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2023/015094
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
利宣 碓井
裕平 川古谷
誠 岩村
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NTT Inc
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP2025513734A (JPWO2024214263A1, ja)
Priority to PCT/JP2023/015094 (WO2024214263A1, ja)
Publication of WO2024214263A1
Anticipated expiration
Current legal status: Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements

Definitions

  • the present invention relates to an analysis function providing device, an analysis function providing method, and an analysis function providing program.
  • Script analysis techniques are used for a variety of purposes, such as compiler optimization for just-in-time (JIT) compilation, software testing and debugging, fuzzing, and malware analysis.
  • Instrumentation is a technique for obtaining information about the execution state of a program at run time by adding analysis code to the program to be analyzed and then executing it.
  • For example, instrumentation can insert logging code after each instruction of a program to determine the total number of instructions executed.
  • Similarly, it can insert logging code after each branch to determine the control flow that was executed.
  • This type of instrumentation is an important technology that is widely used for software testing as well as cybersecurity purposes such as malware analysis and vulnerability detection.
  • Instrumentation can be divided into dynamic instrumentation and static instrumentation.
  • Dynamic instrumentation is a technique for dynamically adding code for analysis by using techniques that dynamically change the behavior of a program as it is executed.
  • Static instrumentation is a technique for statically adding code for analysis by using techniques that rewrite the program before it is executed.
  • Instrumentation can be applied to a wide range of objects, including source code, scripts, executable binaries (hereafter referred to as binaries), and bytecode.
  • Dynamic binary instrumentation is a technique for obtaining information about the execution state by dynamically adding analytical code to the binary program being analyzed at runtime.
  • Injection of analysis code is mainly achieved by using hooks.
  • The process for injecting analysis code is as follows. First, the code to be injected is placed in memory. Next, a hook is installed so that execution branches to the injected code when a specific, predetermined instruction or function is executed. The injected code then runs. At its end, execution returns to the branch source and the original processing resumes.
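  • As a rough illustration of the hook flow described above, the following Python sketch replaces a function with a stub that first runs injected analysis code and then resumes the original processing. The names `install_hook`, `log_call`, `target`, and `engine` are hypothetical and chosen only for this example; real dynamic binary instrumentation patches machine code in memory rather than entries in a Python dictionary.

```python
call_log = []  # filled by the injected analysis code


def log_call(name, args):
    """Injected analysis code: record which function was hooked and its arguments."""
    call_log.append((name, args))


def install_hook(namespace, func_name, injected):
    """Replace namespace[func_name] with a stub that runs `injected`
    and then resumes the original function (the branch source)."""
    original = namespace[func_name]

    def stub(*args, **kwargs):
        injected(func_name, args)         # branch to the injected code
        return original(*args, **kwargs)  # return and resume the original processing

    namespace[func_name] = stub
    return original


def target(x):
    return x * 2


# A dictionary stands in for the address space that holds the hook target.
engine = {"target": target}
install_hook(engine, "target", log_call)
result = engine["target"](21)
```

  • The stub keeps a reference to the original function, so the hooked call still returns the original result while the analysis code observes every invocation.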
  • obfuscation involves applying conversions to programs that make them difficult to interpret, primarily to hinder static analysis.
  • In obfuscated scripts, parts of the script are often encoded or encrypted and are only decoded or decrypted dynamically at run time, immediately before execution. In such cases, the script that will actually run is not known until it is executed, which makes static analysis of the script difficult.
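  • A minimal sketch of this kind of obfuscation, assuming Base64 as the encoding (the source does not name a specific scheme): the payload below is hypothetical, and its logic is not visible as plain text until `run_obfuscated` decodes it at run time.

```python
import base64

# Hypothetical obfuscated payload: the real logic is hidden in an encoded string.
encoded = base64.b64encode(b"result = 6 * 7").decode("ascii")


def run_obfuscated(payload):
    """Decode the payload only at run time and execute it."""
    source = base64.b64decode(payload).decode("ascii")
    scope = {}
    exec(source, scope)  # the script text only becomes visible here
    return scope["result"]


value = run_obfuscated(encoded)
```

  • A static scan of `encoded` finds no trace of the arithmetic it performs, which is why dynamic techniques such as instrumentation are attractive for such scripts.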
  • Script execution method: Scripts are executed by a script engine (also called an interpreter). Scripts are generally converted into bytecodes at run time, and the bytecodes are interpreted and executed by a virtual machine (VM). For this reason, scripts are analyzed before execution, and the bytecodes are analyzed at run time.
  • A method has been proposed to realize dynamic instrumentation for Java bytecode (Non-Patent Document 1), and another to realize dynamic instrumentation for ActionScript3 bytecode (Non-Patent Document 2).
  • Non-Patent Documents 1 and 2 use information specific to each VM, and have the problem of requiring individual design and implementation for each of the various scripting languages.
  • Dynamic bytecode instrumentation in scripts generally requires the use of support functions such as a debugger provided by the script engine. This is because the internal specifications of the VM in the script engine that controls the execution of the script are often not made public, making it difficult to monitor the execution state or change the execution required for instrumentation without support functions.
  • the present invention has been made in consideration of the above, and aims to provide an analysis function providing device, an analysis function providing method, and an analysis function providing program that can provide dynamic bytecode instrumentation functionality to script engines that do not have support functions such as a debugger and whose internal specifications are unknown, without the need for manual individual analysis, design, and implementation.
  • the analysis function providing device of the present invention is characterized by having a first analysis unit that analyzes the virtual machine of the script engine based on an execution trace obtained by executing a test script while monitoring the binary of the script engine, and detects a symbol table that holds information about variables based on the analysis result; a second analysis unit that analyzes the instruction set architecture, which is the instruction system of the virtual machine, to collect virtual machine instructions, and analyzes the collected virtual machine instructions based on the analysis result of the virtual machine of the script engine and the analysis result of the instruction set architecture; an extraction unit that extracts instrumentation bytecode and symbol table by executing a transplantation script based on the analysis result of the virtual machine and the analysis result of the instruction set architecture; and a transplantation unit that transplants the instrumentation bytecode and symbol table extracted by the extraction unit into the bytecode and symbol table generated by the script engine in the memory space of the script engine based on the script to be analyzed.
  • FIG. 1 is a diagram illustrating an example of the configuration of a script engine.
  • FIG. 2 is a diagram showing pseudo code of a VM included in the script engine.
  • FIG. 3 is a diagram for explaining an example of instrumentation by bytecode transplantation.
  • FIG. 4 is a diagram illustrating an example of a configuration of an analysis function providing device according to an embodiment.
  • FIG. 5 is a diagram showing an example of a first test script used for detecting a virtual program counter (VPC).
  • FIG. 6 is a diagram showing an example of the second test script.
  • FIG. 7 is a diagram showing an example of the third test script.
  • FIG. 8 is a diagram showing an example of the fourth test script.
  • FIG. 9 is a diagram showing an example of the fifth test script.
  • FIG. 10 is a diagram showing an example of the fifth test script.
  • FIG. 11 is a diagram showing an example of a porting script.
  • FIG. 12 is a diagram showing the bytecode and symbol table generated by executing the porting script shown in FIG.
  • FIG. 13 is a diagram illustrating an example of a bytecode.
  • FIG. 14 is a diagram illustrating an example of an execution trace.
  • FIG. 15 illustrates an example of a VM execution trace.
  • FIG. 16 is a diagram showing an example of the structure of a symbol table.
  • FIG. 17 is a diagram illustrating the process of the VM instruction boundary detection unit.
  • FIG. 18 is a diagram illustrating the process of the virtual program counter detection unit.
  • FIG. 19 is a diagram illustrating the process of the dispatcher detection unit.
  • FIG. 20 is a diagram illustrating the process of the code cache detection unit.
  • FIG. 21 is a diagram illustrating the process of the branch VM instruction determination unit.
  • FIG. 22 is a diagram illustrating the process of the branch VM instruction analyzing unit.
  • FIG. 23 is a diagram illustrating the process of the branch VM instruction analyzing unit.
  • FIG. 24 is a diagram illustrating the process of the branch VM instruction analyzing unit.
  • FIG. 25 is a diagram illustrating the process of the extraction unit.
  • FIG. 26 is a diagram illustrating the process of the extraction unit.
  • FIG. 27 is a flowchart illustrating a processing procedure of the analysis process according to the embodiment.
  • FIG. 28 is a flowchart of the execution trace acquisition process shown in FIG.
  • FIG. 29 is a flowchart illustrating a processing procedure of the VM instruction boundary detection processing illustrated in FIG.
  • FIG. 30 is a flowchart illustrating the processing procedure of the virtual program counter detection processing shown in FIG.
  • FIG. 31 is a flowchart showing the procedure of the dispatcher detection process shown in FIG.
  • FIG. 32 is a flowchart of the code cache detection process shown in FIG.
  • FIG. 33 is a flowchart showing the processing procedure of the symbol table detection processing shown in FIG.
  • FIG. 34 is a flowchart illustrating the procedure of the VM execution trace acquisition process illustrated in FIG.
  • FIG. 35 is a flowchart illustrating the procedure of the VM instruction collection process illustrated in FIG.
  • FIG. 36 is a flowchart illustrating the processing procedure of the branch VM instruction determination processing shown in FIG.
  • FIG. 37 is a flowchart illustrating the procedure of the branch VM instruction analysis process shown in FIG.
  • FIG. 38 is a flowchart showing the procedure of the immediate addressing method analysis process shown in FIG.
  • FIG. 39 is a flowchart showing the procedure of the direct addressing method analysis process shown in FIG.
  • FIG. 40 is a flowchart showing the procedure of the relative addressing method analysis process shown in FIG.
  • FIG. 41 is a flowchart showing the processing procedure of the symbol table VM instruction determination processing shown in FIG.
  • FIG. 42 is a flowchart showing the processing procedure of the symbol table VM instruction analysis processing shown in FIG.
  • FIG. 43 is a flowchart showing the processing procedure of the symbol table VM instruction analysis processing shown in FIG.
  • FIG. 44 is a flowchart showing the processing procedure of the extraction process shown in FIG.
  • FIG. 45 is a flowchart of the bytecode extraction process shown in FIG.
  • FIG. 46 is a flowchart showing the processing procedure of the symbol table extraction processing shown in FIG.
  • FIG. 47 is a flowchart showing the processing procedure of the transplantation processing shown in FIG.
  • FIG. 48 is a flowchart showing the procedure of the bytecode porting process shown in FIG.
  • FIG. 49 is a flowchart showing the processing procedure of the symbol table transplantation processing shown in FIG.
  • FIG. 50 is a flowchart showing the processing procedure of the execution process shown in FIG.
  • FIG. 51 is a diagram showing an example of a computer that realizes the analysis function providing device by executing a program.
  • An analysis function adding device first analyzes the VM of a script engine.
  • the analysis function adding device executes a test script while monitoring the binary of the script engine, and acquires a branch trace and a memory access trace as an execution trace.
  • the analysis function adding device analyzes a virtual machine (VM) based on the execution trace, and acquires, as architecture information, a VM instruction boundary, a virtual program counter (VPC), and a code cache in which VM instructions to be executed are stored. Then, the analysis function adding device detects a symbol table that holds information about variables based on the analysis result of the VM of the script engine.
  • the analysis function adding device analyzes the instruction set architecture, which is the system of VM instructions for the script engine.
  • the analysis function adding device executes the test script while monitoring the VPC and dispatcher, and obtains a VM execution trace.
  • the analysis function adding device collects VM instructions by analyzing the VM execution trace.
  • the analysis function adding device determines which branch VM instructions cause branches to occur in the script, and analyzes the addressing method of the branch destination of the branch VM instruction and the operands of the branch VM instruction. Furthermore, it determines which symbol table VM instructions access a symbol table, and analyzes the access destination of the symbol table VM instruction. In this way, the analysis function adding device obtains information on variables managed by the symbol table.
  • the analysis function adding device adds dynamic bytecode instrumentation function to the script engine based on the analysis results of the VM and the analysis results of the instruction set architecture. Based on the acquired architecture information, the analysis function adding device executes a porting script to extract bytecode and symbol table for instrumentation, and ports the extracted bytecode and symbol table to the bytecode and symbol table generated by the script engine in its memory space based on the script to be analyzed, thereby adding dynamic bytecode instrumentation to the script engine.
  • the script to be analyzed is the subject of analysis by the script engine with dynamic bytecode instrumentation function.
  • Figure 1 is a diagram for explaining an example of the configuration of a script engine.
  • script engine 1 has a bytecode compiler 2 and a VM 3.
  • bytecode compiler 2 has a syntax analysis unit 4 and a bytecode generation unit 5.
  • VM 3 has a code cache unit 6, a fetch unit 7, a decode unit 8, and an execution unit 9. These fetch unit 7, decode unit 8, and execution unit 9 are executed repeatedly and are called an interpreter loop. Then, script engine 1 accepts the input of a script.
  • the syntax analysis unit 4 receives the script as input, and through lexical and syntactic analysis generates an Abstract Syntax Tree (AST), which it outputs to the bytecode generation unit 5.
  • the bytecode generation unit 5 receives the AST as input, converts it into bytecode, and stores it in the code cache unit 6.
  • the fetch unit 7 fetches the VM opcode from the code cache unit 6 and outputs it to the decode unit 8.
  • the VM opcode refers to the opcode portion of the VM instruction.
  • the decode unit 8 receives the VM opcode as input, interprets the VM opcode using a decoder/dispatcher, and dispatches it to the corresponding program.
  • the execution unit 9 executes the program corresponding to the VM instruction. The contents written in the script are executed by executing the VM instructions one after another through a repeated interpreter loop.
  • FIG. 2 is a diagram showing pseudocode of the VM of the script engine.
  • the pseudocode first initializes the VPC (line 1).
  • the while loop is the interpreter loop (lines 2 to 7).
  • the fetch unit 7 obtains the VM opcode of the VM instruction at the position pointed to by the VPC in the code cache that holds the bytecode (line 3).
  • the decoder uses a Switch statement to interpret the VM instruction (line 4), and the dispatcher calls the instruction handler based on the VM opcode (lines 5 and on).
  • the instruction handler then performs the operation corresponding to the instruction. Input and output are performed using a virtual stack and virtual registers (line 6), and constants and variables are referenced via a symbol table (line 7).
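  • The interpreter loop described for FIG. 2 can be sketched as follows. This is a toy stack VM written for illustration only: the opcode names (`LOAD`, `PUSH`, `ADD`, `STORE`) and the `(opcode, operand)` instruction layout are assumptions, not the instruction set of any real script engine.

```python
def run_vm(code_cache, symbol_table):
    """Toy stack VM. `code_cache` is a list of (opcode, operand) pairs."""
    stack = []  # virtual stack used for handler input and output
    vpc = 0     # initialize the virtual program counter

    def op_load(operand):   # read a variable via the symbol table
        stack.append(symbol_table[operand])

    def op_push(operand):   # push a constant
        stack.append(operand)

    def op_add(_operand):
        b, a = stack.pop(), stack.pop()
        stack.append(a + b)

    def op_store(operand):  # write a variable via the symbol table
        symbol_table[operand] = stack.pop()

    handlers = {"LOAD": op_load, "PUSH": op_push, "ADD": op_add, "STORE": op_store}

    while vpc < len(code_cache):           # interpreter loop
        opcode, operand = code_cache[vpc]  # fetch the instruction the VPC points to
        handler = handlers[opcode]         # decode and dispatch to the handler
        handler(operand)                   # execute
        vpc += 1
    return symbol_table


symbols = run_vm(
    [("LOAD", "a"), ("PUSH", 2), ("ADD", None), ("STORE", "b")],
    {"a": 40},
)
```

  • The VPC, code cache, dispatcher, and symbol table in this sketch correspond to the architecture information that the device later tries to locate in an engine whose internals are unknown.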
  • the analysis function adding device analyzes a script engine whose internal specifications are unknown, acquires information for dynamic bytecode instrumentation, and realizes adding dynamic bytecode instrumentation to the script engine.
  • the analysis function adding device adds dynamic bytecode instrumentation functionality, for example, instrumentation functionality through bytecode porting, to the script engine.
  • Figure 3 is a diagram explaining an example of instrumentation using bytecode porting.
  • the script engine executes the script to be analyzed, and if there is any processing that meets the hook conditions, it saves that processing (Figure 3 (1)).
  • the stub includes a process to obtain a specific variable from the symbol table using a symbol table VM instruction, a process to call the ported bytecode using a branch VM instruction after passing the obtained variable, and a process to branch back to the original location after the ported bytecode is executed.
  • the stub calls the ported bytecode, executes it (Figure 3 (4)), and branches back to the original location after the ported bytecode is executed. Then, after branching back to the original location by executing the stub, the saved processing is restored and processing is resumed (Figure 3 (5)).
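  • The following Python sketch illustrates the general idea of a stub on a toy VM. It is a simplification under stated assumptions: the opcodes, the `port_bytecode` helper, and the `LOG` payload are hypothetical, and instead of restoring the saved instruction in place, the sketch relocates it to the end of the stub (a trampoline), which is a swapped-in technique and not necessarily how the device restores the saved processing.

```python
def run(code, symbols, log):
    """Toy VM: each instruction is an (opcode, operand) pair."""
    stack, vpc = [], 0
    while vpc < len(code):
        op, arg = code[vpc]
        vpc += 1
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":        # symbol table read
            stack.append(symbols[arg])
        elif op == "STORE":       # symbol table write
            symbols[arg] = stack.pop()
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "LOG":         # stands in for the ported analysis bytecode
            log.append(stack.pop())
        elif op == "JMP":         # branch instruction
            vpc = arg
    return symbols


def port_bytecode(code, hook_index, variable, ported):
    """Return a patched copy of `code` with a stub appended and a branch at hook_index."""
    patched = list(code)
    saved = patched[hook_index]              # save the hooked instruction
    stub_start = len(patched) + 1            # one guard instruction precedes the stub
    stub = [("LOAD", variable)]              # fetch the variable from the symbol table
    stub += list(ported)                     # ported bytecode consumes the variable
    stub += [saved, ("JMP", hook_index + 1)] # run the saved instruction, branch back
    guard = ("JMP", stub_start + len(stub))  # normal fall-through skips the stub
    patched[hook_index] = ("JMP", stub_start)
    return patched + [guard] + stub


original = [("PUSH", 40), ("STORE", "a"), ("LOAD", "a"),
            ("PUSH", 2), ("ADD", None), ("STORE", "b")]
log = []
patched = port_bytecode(original, hook_index=2, variable="a", ported=[("LOG", None)])
symbols = run(patched, {}, log)
```

  • The patched program computes the same result as the original while the ported bytecode records the value of `a` at the hook point.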
  • the analysis function adding device executes a test script while monitoring the binary of the script engine in a stage prior to the process of adding the analysis function, and obtains architecture information including the VPC, dispatcher, code cache, and symbol table. Furthermore, the analysis function adding device executes a test script while monitoring the VPC and dispatcher, and analyzes the VM execution trace obtained, thereby determining the VM instructions of the script engine and obtaining information on the instruction set architecture. In particular, the analysis function adding device analyzes branch VM instructions and symbol table VM instructions for script engines whose internal specifications are unknown in a stage prior to the process of adding the analysis function.
  • the analysis function adding device can detect various architectural information through analysis based on the execution trace and VM execution trace, even for script engines whose VM internal specifications are unknown, and can add dynamic bytecode instrumentation functionality to the script engine without the need for manual reverse engineering or the use of support functions such as a debugger.
  • FIG. 4 is a diagram illustrating an example of the configuration of the analysis function providing device 10 according to the embodiment.
  • the analysis function providing device 10 has an input unit 11, a control unit 12, a storage unit 13, and an output unit 14.
  • the analysis function providing device 10 accepts input of a test script, a script engine binary, and a seed script.
  • the input unit 11 is composed of input devices such as a keyboard and a mouse, and accepts information input from the outside and inputs it to the control unit 12.
  • the input unit 11 also has a communication interface for sending and receiving various information to and from other devices connected via a wired connection or a network, etc., and accepts input of information sent from other devices.
  • the input unit 11 accepts input of test scripts, script engine binaries, porting scripts, scripts to be analyzed, and hook settings, and outputs them to the control unit 12.
  • the test script is a script that is input when dynamically analyzing the script engine to obtain an execution trace and a VM execution trace.
  • the test script includes a first test script for VM analysis, a second test script for symbol table detection, a third test script for branch VM instruction detection, and a fourth test script and a fifth test script for symbol table VM instruction analysis.
  • the script engine binary is an executable file that constitutes the script engine.
  • the script engine binary may be composed of multiple executable files.
  • Test script configuration: the test scripts are described below.
  • a test script is a script that is input when dynamically analyzing a script engine. This test script focuses on the number of branch instruction executions and memory reads and writes, and is used to capture the difference in the behavior of the script engine that occurs when the test script is executed a different number of times. This test script is prepared before the analysis and is created manually. Creating it requires knowledge of the specifications of the target script language.
  • FIG. 5 shows an example of a first test script used to detect VPCs.
  • the first test script uses a repetitive process (line 2).
  • the first test script changes the execution conditions and generates differences by increasing or decreasing the number of repetitions (line 2) and the number of repeated statements (lines 3 to 5) in the test script.
  • the first test script is subject to processing from the execution trace acquisition process to the code cache detection process, which will be described later.
  • FIG. 6 is a diagram showing an example of a second test script.
  • In the second test script, a characteristic value is used so that values can be matched when detecting the symbol table. In the example of FIG. 6, the characteristic value is "3735928559".
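  • The value 3735928559 is 0xDEADBEEF in hexadecimal, which is unlikely to occur by chance in a trace. One way such a value could be used is sketched below: scan a memory access trace for that value to find candidate locations. The trace tuple layout `(operation, address, value)` and the addresses are hypothetical and chosen only for this example.

```python
MAGIC = 3735928559  # 0xDEADBEEF, the characteristic value of the second test script

# Hypothetical memory access trace: (operation, address, value)
trace = [
    ("write", 0x7F0010, 1),
    ("write", 0x7F2040, MAGIC),
    ("read", 0x7F2040, MAGIC),
    ("write", 0x7F0018, 2),
]


def find_candidate_addresses(trace, magic):
    """Return the sorted addresses where the characteristic value was read or written."""
    return sorted({addr for _op, addr, value in trace if value == magic})


candidates = find_candidate_addresses(trace, MAGIC)
```

  • Addresses that hold the characteristic value are only candidates; further analysis is needed to decide which of them belongs to the symbol table.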
  • FIG. 7 is a diagram showing an example of a third test script.
  • the third test script is used to detect a branch VM instruction.
  • the third test script uses multiple conditional branches (lines 4 to 8).
  • the branch conditions are controlled so that the multiple conditional branches are either taken or not taken in a specific order pattern (lines 1 and 5).
  • the number of conditional branches and the order pattern of branch success or failure are changed to generate differences.
  • the third test script is subject to the execution trace acquisition process, VM execution trace acquisition process, branch VM instruction determination process, and branch VM instruction analysis process described below.
  • FIG. 8 is a diagram showing an example of a fourth test script.
  • the fourth test script is a test script that accesses variables (e.g., "a" and "b") in two different scopes (e.g., "func1" and "func2"). From the access trace by the fourth test script, the difference between each part accessed during the execution of each symbol table VM instruction of the first combination is compared with the position of the operand to determine whether the difference is an operand that specifies a symbol table.
  • FIGS. 9 and 10 are diagrams showing an example of a fifth test script.
  • the fifth test script is a test script that accesses two different variables (e.g., "a" and "b") in the same scope. From the access trace by the fifth test script, the difference between each part accessed during the execution of each symbol table VM instruction of the second combination is compared with the position of the operand to determine whether the difference is an operand that specifies a variable. Note that both the fourth and fifth test scripts are subject to the execution trace acquisition process, VM execution trace acquisition process, and symbol table VM instruction analysis process described below.
  • FIG. 11 is a diagram showing an example of a porting script. By executing the porting script, bytecode and a symbol table to be ported during dynamic bytecode instrumentation are extracted.
  • Fig. 12 is a diagram showing the bytecode and the symbol table generated by executing the porting script shown in Fig. 11.
  • the porting script includes a function 202 to be ported and a call section 204 of the function to be ported.
  • the function 202 to be ported includes the process content 203 to be ported.
  • bytecode 205 (see FIG. 12) corresponding to the processing portion of the process content 203 to be ported is extracted and ported to the bytecode generated by the script engine in its own memory space based on the script to be analyzed.
  • the call section 204 of the function to be ported calls the function 202 to be ported, generating the bytecode and the symbol table 210 (see FIG. 12) and making extraction possible.
  • the symbol table 210 is ported to a symbol table that the script engine generates in its own memory space based on the script to be analyzed.
  • FIG. 13 is a diagram showing an example of a bytecode.
  • Bytecode 205 is just an example, and a bytecode exists for each routine.
  • a bytecode has a data structure including an opcode 2051 and an operand 2052.
  • Opcode 2051 is a value that specifies the type of operation, and is actually specified by a corresponding hexadecimal value.
  • Operand 2052 is a value that is the target of the operation, and is generally specified by an ordinal number in a symbol table.
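  • The opcode and operand structure can be sketched as a small data type. The fixed two-byte encoding (one opcode byte followed by one operand byte), the opcode values `0x10` and `0x11`, and the names `VMInstruction` and `decode` are assumptions made for this example; actual encodings differ between script engines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMInstruction:
    opcode: int   # hexadecimal value that selects the operation
    operand: int  # here, an ordinal (index) into the symbol table


def decode(raw: bytes):
    """Decode a hypothetical fixed 2-byte encoding: 1 opcode byte + 1 operand byte."""
    if len(raw) % 2:
        raise ValueError("truncated bytecode")
    return [VMInstruction(raw[i], raw[i + 1]) for i in range(0, len(raw), 2)]


symbol_table = ["a", "b"]
program = decode(bytes([0x10, 0x00, 0x11, 0x01]))
names = [symbol_table[ins.operand] for ins in program]
```

  • Resolving each operand through `symbol_table` shows how an ordinal in the bytecode maps back to a variable.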
  • the storage unit 13 is realized by a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk, and stores the processing program that operates the analysis function providing device 10, data used during execution of the processing program, etc.
  • the storage unit 13 has an execution trace database (DB) 131, a VM execution trace DB 133, and an architecture information DB 132 that stores architecture information acquired by the virtual machine analysis unit 121 and the instruction set architecture analysis unit 122 (described later).
  • the execution trace DB 131 and the VM execution trace DB 133 store the execution traces and VM execution traces acquired by the execution trace acquisition unit 1211 and the VM execution trace acquisition unit 1221, respectively.
  • the execution trace DB 131 and the VM execution trace DB 133 are managed by the analysis function providing device 10.
  • the execution trace DB 131 and the VM execution trace DB 133 may be managed by another device (such as a server), in which case the execution trace acquisition unit 1211 (described later) and the VM execution trace acquisition unit 1221 (described later) output the acquired execution traces and VM execution traces to a management server or the like for the execution trace DB 131 and the VM execution trace DB 133 via the communication interface of the output unit 14, and store them in the execution trace DB 131 and the VM execution trace DB 133.
  • Fig. 14 is a diagram showing an example of an execution trace. As described above, an execution trace is composed of a branch trace and a memory access trace. Fig. 14 shows an excerpt of an execution trace. The structure of an execution trace will be described below with reference to Fig. 14.
  • Trace indicates whether the log line is a branch trace or a memory access trace.
  • a branch trace log line has the format shown, for example, in lines 1 to 10 of Figure 14, and consists of three elements: type, src, and dst.
  • type indicates whether the executed branch instruction was a call instruction, a jmp instruction, or a ret instruction.
  • src indicates the address of the branch source, and dst indicates the address of the branch destination.
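  • A branch trace record with these three elements could be represented and parsed as below. The textual layout `type src dst` with hexadecimal addresses is a hypothetical format assumed for this sketch; the actual format is the one shown in FIG. 14.

```python
from typing import NamedTuple


class BranchRecord(NamedTuple):
    type: str  # "call", "jmp", or "ret"
    src: int   # address of the branch source
    dst: int   # address of the branch destination


def parse_branch_line(line: str) -> BranchRecord:
    """Parse a hypothetical 'type src dst' line with hexadecimal addresses."""
    kind, src, dst = line.split()
    if kind not in {"call", "jmp", "ret"}:
        raise ValueError(f"unknown branch type: {kind}")
    return BranchRecord(kind, int(src, 16), int(dst, 16))


record = parse_branch_line("call 0x401000 0x402000")
```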
  • Fig. 15 is a diagram showing an example of a VM execution trace.
  • a VM execution trace is a record of a VM opcode and a VPC.
  • Fig. 15 shows a part of a VM execution trace. The configuration of a VM execution trace will be described below with reference to Fig. 15.
  • a log line of a VM execution trace is, for example, in the format shown in Figure 15, and consists of two elements: vpc and vmop (vm opcode).
  • vpc indicates the value of the VPC.
  • vmop indicates the value of the VM opcode that is virtually assigned to each pointer that points to the beginning of the VM instruction handler to be executed, obtained from the pointer cache.
  • the control unit 12 has an internal memory for storing programs that define various processing procedures and the like, and necessary data, and executes various processes using these.
  • the control unit 12 is an electronic circuit such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit).
  • the control unit 12 has a virtual machine analysis unit 121 (first analysis unit), an instruction set architecture analysis unit 122 (second analysis unit), an extraction unit 1231, a transplantation unit 1232, and an execution unit 1233.
  • the virtual machine analysis unit 121 analyzes the VM of the script engine.
  • the virtual machine analysis unit 121 obtains multiple execution traces by changing the execution conditions, analyzes the multiple execution traces using differential execution analysis, and obtains the VPC.
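  • The idea of differential execution analysis can be sketched as follows, under a simplifying assumption that is not taken from the source: a location behaving like a VPC is written once per executed VM instruction, so its write count scales with the number of loop iterations in the test script. The trace layout and the helper names `write_counts`, `vpc_candidates`, and `fake_trace` are hypothetical.

```python
from collections import Counter


def write_counts(trace):
    """Count writes per address in a trace of (operation, address, value) tuples."""
    return Counter(addr for op, addr, _value in trace if op == "write")


def vpc_candidates(trace_small, trace_large, ratio):
    """Return addresses whose write count is exactly `ratio` times larger in the second run."""
    small, large = write_counts(trace_small), write_counts(trace_large)
    return sorted(a for a in small if large.get(a, 0) == small[a] * ratio)


def fake_trace(iterations):
    """Synthetic trace: one setup write plus one VPC-like write per iteration."""
    trace = [("write", 0x1000, 0)]          # one-off write, independent of the loop
    for i in range(iterations):
        trace.append(("write", 0x2000, i))  # behaves like a VPC update
    return trace


candidates = vpc_candidates(fake_trace(10), fake_trace(20), ratio=2)
```

  • Real traces contain noise and constant overhead, so a practical analysis would likely need a tolerance rather than the exact equality used here.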
  • the virtual machine analysis unit 121 also analyzes the script engine binary to obtain the boundaries and dispatchers of VM instructions.
  • the virtual machine analysis unit 121 detects a code cache from the VM execution trace.
  • the VM instructions to be executed are stored in the code cache.
  • the virtual machine analysis unit 121 detects a symbol table.
  • the virtual machine analysis unit 121 has an execution trace acquisition unit 1211 (first acquisition unit), a VM instruction boundary detection unit 1212 (first detection unit), a virtual program counter detection unit 1213 (second detection unit), a dispatcher detection unit 1214 (third detection unit), a code cache detection unit 1215 (fourth detection unit), and a symbol table detection unit 1216 (fifth detection unit).
  • the execution trace acquisition unit 1211 accepts the first to fifth test scripts and the script engine binary as input.
  • the execution trace acquisition unit 1211 acquires an execution trace by executing the first to fifth test scripts while monitoring the execution of the script engine binary.
  • An execution trace consists of a branch trace and a memory access trace.
  • a branch trace records the type of branch instruction at the time of execution, the branch source address, and the branch destination address.
  • a memory access trace records the type of memory operation at the time of execution (read/write), and the memory address and value of the operation target. It is known that branch traces and memory access traces can be obtained by hooking a memory operation instruction, inserting code for log output, and executing it.
  • the execution trace obtained by the execution trace acquisition unit 1211 is stored in the execution trace DB 131.
  • the execution trace acquisition unit 1211 acquires an API (Application Programming Interface) trace and stores it in the execution trace DB 131.
  • the API trace is a record of the system API called during execution and its arguments.
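  • As an illustration of the three kinds of records described above, the following is a minimal sketch of how an execution trace could be represented. The class and field names (`BranchRecord`, `MemoryRecord`, `ApiRecord`, `ExecutionTrace`, `read_counts`) are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Tuple


@dataclass(frozen=True)
class BranchRecord:
    kind: str   # type of branch instruction, e.g. "jmp", "call", "ret"
    src: int    # branch source address
    dst: int    # branch destination address


@dataclass(frozen=True)
class MemoryRecord:
    op: str       # "read" or "write"
    address: int  # memory address of the operation target
    value: int    # value that was read or written


@dataclass(frozen=True)
class ApiRecord:
    name: str                # system API that was called
    args: Tuple[object, ...]  # its arguments


@dataclass
class ExecutionTrace:
    branches: list = field(default_factory=list)   # branch trace
    memory: list = field(default_factory=list)     # memory access trace
    api_calls: list = field(default_factory=list)  # API trace


def read_counts(trace):
    """Count how many times each address was read (used by later analyses)."""
    counts = {}
    for rec in trace.memory:
        if rec.op == "read":
            counts[rec.address] = counts.get(rec.address, 0) + 1
    return counts
```

  • A per-address read count such as the one returned by `read_counts` is the kind of quantity the VPC detection described later compares across runs.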
  • the VM instruction boundary detection unit 1212 clusters the execution trace for the first test script to detect the boundaries of each VM instruction.
  • the VM instruction boundary detection unit 1212 clusters the execution trace and detects clusters with a threshold or more of execution counts as VM instructions. In clustering, consecutive code regions that are executed multiple times are detected. For example, executed instructions that are close in distance to each other in the code may be grouped together, common subsequences of executed code blocks may be found, or other methods may be used.
  • the analysis function providing device 10 detects the start and end points of consecutive instruction sequences that make up the detected VM instruction as boundaries.
  • the VM instruction boundaries detected here are used in VPC detection and dispatcher detection.
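  • The clustering step can be sketched as follows. This is only one possible reading: it assumes that the trace has been reduced to a list of executed instruction addresses, keeps addresses executed at least `min_count` times, and groups addresses that lie within `max_gap` bytes of each other. The function name and both parameters are assumptions made for illustration; the specification leaves the clustering method open.

```python
from collections import Counter


def detect_vm_instruction_boundaries(executed_addresses, min_count=3, max_gap=8):
    """Return (start, end) address pairs of frequently executed code regions."""
    counts = Counter(executed_addresses)
    # Keep only addresses executed at least min_count times.
    hot = sorted(addr for addr, c in counts.items() if c >= min_count)

    clusters = []
    for addr in hot:
        if clusters and addr - clusters[-1][1] <= max_gap:
            clusters[-1][1] = addr          # extend the current cluster
        else:
            clusters.append([addr, addr])   # start a new cluster
    # The first and last address of each cluster serve as its boundaries.
    return [(start, end) for start, end in clusters]
```

  • Each returned pair corresponds to the start and end points of one candidate VM instruction handler.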
  • the virtual program counter detection unit 1213 extracts and analyzes the execution trace for the first test script stored in the execution trace DB 131 to detect the VPC.
  • the virtual program counter detection unit 1213 analyzes multiple execution traces using differential execution analysis focusing on the number of memory reads and the boundaries of each VM instruction detected by the VM instruction boundary detection unit 1212 to detect the VPC.
  • the virtual program counter detection unit 1213 uses the fact that a read into the memory that holds the VPC always occurs after the execution of each VM instruction, and detects the VPC by discovering the destination of this read.
  • the virtual program counter detection unit 1213 uses differential execution analysis that focuses on the number of memory reads to detect VPCs.
  • the virtual program counter detection unit 1213 compares the execution traces acquired from multiple variants of the first test script, and finds memory locations whose read count changes in proportion to both the number of repetitions and the number of repeated statements.
  • the virtual program counter detection unit 1213 then refers to the boundaries of each VM instruction detected by the VM instruction boundary detection unit 1212, and narrows these candidates down to memory whose read values always point to the start point of a VM instruction.
  • the virtual program counter detection unit 1213 detects this memory as a VPC.
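  • The two filtering steps above can be sketched as follows, assuming that each run has already been summarized as a per-address read count and that the values observed for each candidate are available. The function name, the `tolerance` parameter, and the input layout are hypothetical.

```python
def find_vpc_candidates(runs, read_values, handler_starts, tolerance=0.2):
    """Differential execution analysis on memory read counts.

    runs:           list of ((n, m), {address: read_count}); the first entry
                    is the baseline run with n repetitions of m statements.
    read_values:    {address: values observed for that candidate}
    handler_starts: set of start addresses of the detected VM instructions.
    """
    (n0, m0), base = runs[0]
    candidates = {addr for addr, count in base.items() if count > 0}

    # Step 1: keep addresses whose read count scales in proportion to n * m.
    for (n, m), counts in runs[1:]:
        expected = (n * m) / (n0 * m0)
        candidates = {
            addr for addr in candidates
            if abs(counts.get(addr, 0) / base[addr] - expected) <= tolerance * expected
        }

    # Step 2: keep addresses whose values always point to a handler start.
    return sorted(
        addr for addr in candidates
        if read_values.get(addr)
        and all(v in handler_starts for v in read_values[addr])
    )
```

  • The tolerance is needed because the relation is only approximate: reads unrelated to the repeated statements also contribute to the counts.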
  • the dispatcher detection unit 1214 extracts each VM instruction portion from the script engine binary based on the boundaries of the VM instructions detected by the VM instruction boundary detection unit 1212, and detects the portions with high similarity between each VM instruction as dispatchers.
  • the dispatcher is realized by referencing the pointer cache and jumping to the pointer of the next VM instruction handler.
  • dispatchers are distributed across the VM instruction handlers, one at the end of each handler, and their code is generally almost identical.
  • the analysis function adding device detects dispatchers using a specified method by searching for code with high similarity that exists at the rear of such VM instruction handlers. To detect the portions with high similarity, for example, a sequence alignment algorithm or other methods may be used.
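  • As a simplified stand-in for the similarity search (not a sequence alignment algorithm), the following sketch takes the longest instruction sequence shared at the end of every handler. It assumes each handler is available as a list of normalized instruction strings; the function name and the sample instructions are hypothetical.

```python
def common_dispatcher_suffix(handlers):
    """Return the instruction sequence shared at the end of every handler."""
    if not handlers:
        return []
    suffix = []
    # Walk all handlers backwards in lockstep until they disagree.
    for column in zip(*(reversed(h) for h in handlers)):
        if len(set(column)) != 1:
            break
        suffix.append(column[0])
    return suffix[::-1]
```

  • An exact-match suffix is stricter than the "high similarity" criterion in the text; an alignment-based comparison would tolerate small differences such as register allocation.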
  • the code cache detection unit 1215 detects the code cache, which is the cache that stores the VM instructions to be executed, based on the execution trace, the VPC, and the VM execution trace.
  • the code cache detection unit 1215 detects the memory area pointed to by the VPC as a code cache from the VM execution trace.
  • the code cache detection unit 1215 detects the code location from which the memory allocation function that allocated this code cache was called from the execution trace.
  • the code cache detection unit 1215 detects all memory areas allocated at this code location from the VM execution trace as code caches.
  • the code cache detection unit 1215 detects code locations that are writing to the code cache from the execution trace.
  • the code cache detection unit 1215 detects writing by these code locations in the VM execution trace as updates to the code cache.
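  • The steps above can be sketched as follows, under the assumption that memory allocations are recorded as (call site, base, size) tuples and writes as (code location, address) pairs. These record layouts and the function name are hypothetical simplifications of the traces.

```python
def detect_code_caches(allocations, vpc_values, writes):
    """allocations: [(call_site, base, size)], vpc_values: [int],
    writes: [(code_location, address)]."""

    def region_of(addr, regions):
        for region in regions:
            _, base, size = region
            if base <= addr < base + size:
                return region
        return None

    # 1. Memory areas pointed to by the VPC are code caches.
    seeds = {region_of(v, allocations) for v in vpc_values} - {None}
    # 2. Code locations that called the allocation function for them.
    sites = {site for site, _, _ in seeds}
    # 3. Every area allocated from those code locations is a code cache.
    cache_regions = [r for r in allocations if r[0] in sites]
    caches = [(base, size) for _, base, size in cache_regions]
    # 4./5. Writes into a code cache are updates; their origins are the writers.
    updates = [(loc, addr) for loc, addr in writes if region_of(addr, cache_regions)]
    writers = {loc for loc, _ in updates}
    return caches, writers, updates
```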
  • the symbol table detection unit 1216 detects architecture information of the symbol table that holds information about variables based on the analysis results of the VM of the script engine.
  • Figure 16 is a diagram showing an example of the structure of a symbol table.
  • the symbol table has information about variables and constants, and information about functions.
  • the first column shows ordinal numbers.
  • the second column shows information that identifies variables, constants, or functions.
  • the third column shows the address of the object entity.
  • four pieces of information related to the symbol table are relevant. The first is the location of the symbol table in memory and the code that allocates the memory area for the symbol table.
  • the second is the structure of the symbol table.
  • the third is a list of the opcodes of the symbol table VM instruction.
  • the fourth is the operands (targets of the instruction) of the symbol table VM instruction, in particular which operands are responsible for identifying the scope and variables.
  • the symbol table detection unit 1216 detects the first and second pieces of information related to the symbol table, the location of the symbol table in memory and the structure of the symbol table.
  • the symbol table detection unit 1216 detects the position of the symbol table in the memory area and the structure of the symbol table by using the second test script, in which a characteristic value is used, and the memory access trace obtained by executing it. From this memory access trace, the symbol table detection unit 1216 identifies where the characteristic value is stored in the memory area and which location references it, and from these detects the position of the symbol table and the structure through which the characteristic value is referenced. From the API trace, the symbol table detection unit 1216 detects the code location that allocates the memory area for the symbol table. The symbol table detection unit 1216 outputs the position of the symbol table in the memory area and the code location that allocates that area. These correspond to the first and second of the four pieces of information related to the symbol table.
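  • The search for the characteristic value can be sketched as follows. The sketch assumes that the memory access trace is reduced to (address, value) write records and that a symbol table entry holds the exact address at which the characteristic value is stored; real object layouts may add headers or offsets. The function name is hypothetical.

```python
def locate_symbol_table_entries(memory_writes, characteristic_value):
    """memory_writes: [(address, value)] taken from the memory access trace."""
    # Where was the characteristic value stored?
    value_locations = {
        addr for addr, val in memory_writes if val == characteristic_value
    }
    # Which memory locations hold a reference to one of those places?
    referrers = sorted(
        addr for addr, val in memory_writes if val in value_locations
    )
    return sorted(value_locations), referrers
```

  • The referrers are candidate symbol table entries; the memory area that contains them indicates the position of the symbol table.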
  • the instruction set architecture analysis unit 122 analyzes the instruction set architecture, which is the system of VM instructions.
  • the instruction set architecture analysis unit 122 collects VM instructions. Based on the results of the script engine's VM analysis and the results of the instruction set architecture analysis, the instruction set architecture analysis unit 122 determines, among the collected VM instructions, a branch VM instruction that generates a branch within a script, and analyzes the addressing method of the branch destination of the branch VM instruction and the operands of the branch VM instruction.
  • the instruction set architecture analysis unit 122 determines, among the collected VM instructions, a symbol table VM instruction that accesses a symbol table, and analyzes the access destination of the symbol table VM instruction.
  • the instruction set architecture analysis unit 122 has a VM execution trace acquisition unit 1221 (second acquisition unit), a VM instruction collection unit 1222 (first collection unit), a branch VM instruction determination unit 1223 (first determination unit), a branch VM instruction analysis unit 1224 (third analysis unit), a symbol table VM instruction determination unit 1225 (second determination unit), and a symbol table VM instruction analysis unit 1226 (fourth analysis unit).
  • the VM execution trace acquisition unit 1221 accepts the test script and the script engine binary as input.
  • the VM execution trace acquisition unit 1221 acquires the VM execution trace by monitoring the VPC and the pointer of the VM instruction handler dispatched by the dispatcher.
  • the VM execution trace acquisition unit 1221 acquires the VM execution trace, which is the execution trace executed on the VM, by executing the third to fifth test scripts while monitoring the execution of the script engine binary.
  • the VM execution trace acquisition unit 1221 links the pointer to the VM instruction with the VM instruction, and virtually assigns a VM opcode as an identifier to each.
  • a VM execution trace is an execution trace executed in a VM, in which a VM opcode is virtually assigned as an identifier, and in which a pointer to the executed VM handler and a VPC are recorded.
  • a VM execution trace is a record of a pointer to an executed VM instruction handler and a VPC.
  • a VM execution trace is composed of a VPC and a VM opcode for each executed VM instruction.
  • the recording of a VPC can be achieved by monitoring the memory of the VPC detected by the virtual program counter detection unit 1213.
  • a VM opcode is an identifier virtually assigned to each of a pointer to a VM instruction and a VM instruction that are linked together.
  • the VM execution trace acquired by the VM execution trace acquisition unit 1221 is stored in the VM execution trace DB 133.
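  • The virtual assignment of VM opcodes can be sketched as follows: each distinct handler pointer receives the next unused integer as its identifier, and the trace records a (vpc, vmop) pair per executed VM instruction. The function name and the choice of sequential integers are assumptions made for illustration.

```python
def build_vm_execution_trace(observations):
    """observations: iterable of (vpc, handler_pointer) pairs."""
    opcode_of = {}  # handler pointer -> virtually assigned VM opcode
    trace = []
    for vpc, handler in observations:
        # A handler seen for the first time gets the next free identifier.
        vmop = opcode_of.setdefault(handler, len(opcode_of))
        trace.append((vpc, vmop))
    return trace, opcode_of
```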
  • the VM instruction collection unit 1222 accepts the VPC and dispatcher as input, executes the test script while monitoring the VPC and dispatcher, and obtains the VM execution trace.
  • the VM instruction collection unit 1222 collects VM instructions from the VM execution trace.
  • branching is done by a branch VM instruction that handles the branching.
  • a branch VM instruction generally has information about the branch destination in an operand.
  • VPC: virtual program counter.
  • in the direct addressing method, the memory area in which the branch destination address is stored is specified. This memory area may be a variable, a virtual stack, or a virtual register. The address stored at the specified location becomes the value of the VPC.
  • the addressing method thus determines how the branch destination is obtained.
  • three pieces of information about branch VM instructions are relevant. The first is a list of the opcodes of the branch VM instructions;
  • the second is information about the operands of the branch VM instruction, specifically which operand holds the branch destination; and the third is the addressing method for the branch destination.
  • the branch VM instruction determination unit 1223 determines which VM instructions are branch VM instructions from among the VM instructions collected by the VM instruction collection unit 1222, and obtains a list of opcodes of the branch VM instructions.
  • the branch VM instruction determination unit 1223 determines which VM instructions are branch VM instructions from among the VM instructions collected by the VM instruction collection unit 1222, based on the variation in the amount of change in VPC for each VM opcode in the VM execution trace for the third test script.
  • the branch VM instruction determination unit 1223 extracts and analyzes the VM execution trace for the third test script stored in the VM execution trace DB 133 to determine the branch VM instruction. For each VM opcode assigned as an identifier, the branch VM instruction determination unit 1223 collects the amount of change in VPC before and after its execution. If the VM opcode does not belong to a branch VM instruction, the amount of change in VPC is almost constant. On the other hand, if the VM opcode belongs to a branch VM instruction, the amount of change in VPC varies depending on the branch destination.
  • the branch VM instruction determination unit 1223 therefore determines whether an instruction is a branch VM instruction based on the variance in the amount of change in the virtual program counter for each VM opcode in the VM execution trace for the third test script.
  • the branch VM instruction determination unit 1223 focuses on the fact that the spread of the amount of change in the VPC differs between branch VM instructions and other VM instructions, sets a threshold value, and determines instructions whose spread is larger to be branch VM instructions.
  • the branch VM instruction determination unit 1223 evaluates the spread of the amount of change in the VPC for each VM opcode using the statistical variance, and determines instructions whose variance is equal to or greater than a certain threshold to be branch VM instructions.
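  • The variance test can be sketched as follows, assuming the VM execution trace is a list of (vpc, vmop) pairs in execution order. The function name is hypothetical, and the population variance from the standard library is used as one reasonable choice of variance.

```python
from collections import defaultdict
from statistics import pvariance


def find_branch_opcodes(vm_trace, threshold):
    """vm_trace: [(vpc, vmop)] in execution order."""
    deltas = defaultdict(list)
    # Amount of change in the VPC caused by each executed VM instruction.
    for (vpc, vmop), (next_vpc, _) in zip(vm_trace, vm_trace[1:]):
        deltas[vmop].append(next_vpc - vpc)
    # Opcodes whose VPC change varies strongly are treated as branches.
    return sorted(
        vmop for vmop, changes in deltas.items()
        if pvariance(changes) >= threshold
    )
```

  • In the usage below, opcode 0 always advances the VPC by 8, while opcode 1 jumps backwards once and forwards once, so only opcode 1 exceeds the threshold.

```python
trace = [(0, 0), (8, 0), (16, 1), (0, 0), (8, 0), (16, 1), (40, 0), (48, 0)]
print(find_branch_opcodes(trace, threshold=1.0))  # [1]
```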
  • the branch VM instruction analysis unit 1224 analyzes the addressing method of the branch destination of the branch VM instruction determined by the branch VM instruction determination unit 1223 and the operands of the branch VM instruction based on the analysis results of the VM of the script engine and the analysis results of the instruction set architecture.
  • the branch VM instruction analysis unit 1224 analyzes whether the addressing method of the branch destination of the branch VM instruction is one of the three methods of immediate addressing, direct addressing, or relative addressing, and determines the operand of the branch destination of the branch VM instruction being analyzed.
  • the symbol table VM instruction determination unit 1225 determines that, among the VM instructions collected by the VM instruction collection unit 1222, a VM instruction that accessed the symbol table memory area during VM execution is a symbol table VM instruction.
  • the symbol table VM instruction determination unit 1225 determines that, among the VM instructions that accessed the symbol table memory area, a VM instruction that commands a read is a read symbol table VM instruction.
  • the symbol table VM instruction determination unit 1225 determines that, among the VM instructions that accessed the symbol table memory area, a VM instruction that commands a write is a write symbol table VM instruction.
  • based on the determination result for each VM instruction, the symbol table VM instruction determination unit 1225 outputs a list indicating the VM instructions determined to be symbol table VM instructions and their types. In this way, the symbol table VM instruction determination unit 1225 obtains a list of opcodes of symbol table VM instructions, which is the third piece of information related to the symbol table.
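  • The determination can be sketched as follows, assuming that each memory access has already been attributed to the VM opcode that was executing when it occurred. The (vmop, op, address) record layout and the function name are hypothetical.

```python
def classify_symbol_table_opcodes(accesses, table_base, table_size):
    """accesses: [(vmop, op, address)] with op being "read" or "write"."""
    kinds = {}
    for vmop, op, address in accesses:
        # Only accesses that fall inside the symbol table area count.
        if table_base <= address < table_base + table_size:
            kinds.setdefault(vmop, set()).add(op)
    # Map each symbol table VM opcode to its type(s): read and/or write.
    return {vmop: sorted(ops) for vmop, ops in sorted(kinds.items())}
```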
  • the symbol table VM instruction analysis unit 1226 obtains the fourth piece of information related to the symbol table, that is, information about the operand (target of the instruction) of the symbol table VM instruction.
  • the symbol table VM instruction analysis unit 1226 retrieves and analyzes the VM execution trace stored in the VM execution trace DB 133. For a combination of two symbol table VM instructions retrieved from the VM execution trace, the symbol table VM instruction analysis unit 1226 compares the parts of the memory access trace that were accessed during the execution of each instruction; if the difference lies at the position of an operand, it determines that the differing part is an operand of the symbol table VM instruction. The operand of a VM instruction can be located from the VPC and opcode of the VM instruction.
  • Operands are generally located next to the opcode in memory, i.e., between the current opcode and the next opcode, such as "[opcode A][operand 1][operand 2][opcode B]", so if the VPC and opcode are known, the position of the operand can be found.
  • the symbol table VM instruction analysis unit 1226 determines the operand that specifies the symbol table by using the VM execution trace and the memory access trace obtained with the fourth test script, which accesses variables in two different scopes.
  • for a first combination of two symbol table VM instructions extracted from the VM execution trace of the fourth test script, the symbol table VM instruction analysis unit 1226 extracts, from the memory access trace of the fourth test script, the parts accessed during execution of each symbol table VM instruction of the first combination. If the difference between the two extracted parts lies at the position of an operand, the symbol table VM instruction analysis unit 1226 determines that the differing part is an operand that specifies a symbol table.
  • the symbol table VM instruction analysis unit 1226 determines the operand that specifies the variable by using the VM execution trace and the memory access trace obtained with the fifth test script, which accesses two different variables in the same scope.
  • for a second combination of two symbol table VM instructions extracted from the VM execution trace of the fifth test script, the symbol table VM instruction analysis unit 1226 extracts, from the memory access trace of the fifth test script, the parts accessed during execution of each symbol table VM instruction of the second combination. If the difference between the two extracted parts lies at the position of an operand, the symbol table VM instruction analysis unit 1226 determines that the differing part is an operand that specifies a variable.
  • the symbol table VM instruction analysis unit 1226 outputs information on operands that specify a symbol table and information on operands that specify a variable.
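  • The operand comparison can be sketched as follows. The sketch assumes a hypothetical bytecode layout with a one-byte opcode followed by one-byte operands, and it compares operand slots directly instead of going through the memory access trace. The function names and the sample encoding are illustrative only.

```python
def operands_at(code, vpc, next_vpc, opcode_size=1, operand_size=1):
    """Operands lie between the current opcode and the next opcode."""
    raw = code[vpc + opcode_size:next_vpc]
    return [raw[i:i + operand_size] for i in range(0, len(raw), operand_size)]


def differing_operand_positions(first, second):
    """Indices of operand slots whose contents differ between two instructions."""
    return [i for i, (a, b) in enumerate(zip(first, second)) if a != b]
```

  • With a script that accesses the same variable in two scopes, the differing slot is taken to select the symbol table; with two variables in the same scope, it is taken to select the variable:

```python
# Hypothetical encoding: [opcode 0x10][scope][variable]
code = bytes([0x10, 0x00, 0x05,   # scope 0, variable 5
              0x10, 0x01, 0x05,   # scope 1, variable 5
              0x10, 0x00, 0x06])  # scope 0, variable 6
a = operands_at(code, 0, 3)
b = operands_at(code, 3, 6)
c = operands_at(code, 6, 9)
print(differing_operand_positions(a, b))  # [0]
print(differing_operand_positions(a, c))  # [1]
```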
  • the instrumentation unit 123 provides the script engine with dynamic bytecode instrumentation functionality.
  • the instrumentation unit 123 accepts as input a porting script, a script to be analyzed, and hook settings.
  • the instrumentation unit 123 then ports the instrumentation bytecode and symbol table, extracted by executing the porting script, into the bytecode and symbol table that the script engine generates in its memory space from the script to be analyzed. It sets execution to branch to a stub every time a process that meets the hook condition is executed, and then executes the script to be analyzed.
  • the instrumentation unit 123 has an extraction unit 1231, a porting unit 1232, and an execution unit 1233.
  • the extraction unit 1231 extracts bytecode and symbol tables for instrumentation by executing a porting script based on information about the architecture obtained by analysis by the virtual machine analysis unit 121 and information about the instruction set architecture obtained by analysis by the instruction set architecture analysis unit 122.
  • the extraction unit 1231 executes the porting script while monitoring the writing and execution of the code cache detected by the code cache detection unit 1215, and extracts the data written to the code cache as bytecode for instrumentation.
  • the extraction unit 1231 extracts the entry point of the subroutine in the bytecode.
  • the extraction unit 1231 executes the porting script while monitoring the writing of the symbol table detected by the symbol table detection unit 1216, and extracts the data written to the symbol table as a symbol table.
  • the porting unit 1232 ports the instrumentation bytecode and symbol table extracted by executing the porting script into the bytecode and symbol table generated by the script engine in its memory space based on the script to be analyzed.
  • the porting unit 1232 receives as input the bytecode and symbol table extracted by the execution of the script to be analyzed.
  • the porting unit 1232 sets a branch to a stub that executes the extracted bytecode. In this way, the porting unit 1232 performs bytecode porting.
  • when the analysis target script is executed, the porting unit 1232 adds a reference to the extracted symbol table to the symbol table of the analysis target script to ensure consistency. In this way, the porting unit 1232 performs symbol table porting.
  • when a VM instruction is executed, the porting unit 1232 temporarily stops the execution of the script to be analyzed and maps (expands) the extracted bytecode into the memory space. During porting, the address at which the ported bytecode is mapped in memory generally differs from its address at the porting source. Therefore, if there is a branch that uses immediate or direct addressing, the porting unit 1232 updates the address in its operand to match the mapping destination of the bytecode, so that the two remain consistent.
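  • The operand update can be sketched as a simple relocation, assuming that the bytecode has been decoded into (vmop, operand) pairs and that the addressing method of each branch opcode is already known from the earlier analysis. The function name, the decoded form, and the range check are assumptions made for illustration.

```python
def relocate_bytecode(instructions, addressing, old_base, new_base, length):
    """instructions: [(vmop, operand)], addressing: {vmop: method name}."""
    delta = new_base - old_base
    relocated = []
    for vmop, operand in instructions:
        mode = addressing.get(vmop)
        # Absolute addresses into the ported bytecode must follow the mapping;
        # relative offsets and non-branch operands stay unchanged.
        if mode in ("immediate", "direct") and old_base <= operand < old_base + length:
            operand += delta
        relocated.append((vmop, operand))
    return relocated
```

  • Relative branches need no update because the distance between the branch and its destination is preserved when the whole bytecode is moved.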
  • the porting unit 1232 adds instrumentation processing to the stub by porting bytecode, and sets it up so that it branches to the stub every time a process that satisfies the hook condition is executed.
  • the porting unit 1232 maps the extracted symbol table into the memory space and adds a reference to the extracted symbol table to the symbol table of the script to be analyzed, thereby ensuring consistency. This is necessary because, during porting, the symbol table mapped into memory for porting generally differs from the original symbol table. Next, if there is a symbol table VM instruction, the porting unit 1232 updates its operand to match the consistent symbol table.
  • the execution unit 1233 executes the analysis target script that has been ported by the porting unit 1232 in a script engine. In other words, the execution unit 1233 executes the analysis target script that is set so that execution transitions to a stub every time a process that meets the hook condition described in the hook setting is executed. The execution unit 1233 outputs the execution result to the output unit 14.
  • the output unit 14 is, for example, a liquid crystal display or a printer, and outputs various information including information related to the analysis function imparting device 10.
  • the output unit 14 may also be an interface that handles the input and output of various data between an external device, and may output various information to an external device.
  • the VM instruction boundary detection unit 1212 detects the boundaries of each VM instruction. At this time, the VM instruction boundary detection unit 1212 detects VM instructions and their boundaries for threaded code type VMs, which do not have an interpreter loop and therefore make it difficult to grasp the boundaries of VM instructions. Specifically, the VM instruction boundary detection unit 1212 extracts execution traces from the execution trace DB 131. Then, as shown in FIG. 17, the VM instruction boundary detection unit 1212 clusters the execution traces using a predetermined method, and detects clusters with a threshold or more of execution counts as VM instructions (e.g., VM instruction handlers 1 to 3). The VM instruction boundary detection unit 1212 detects the start and end points of the consecutive instruction strings that make up a VM instruction as boundaries.
  • the virtual program counter detection unit 1213 detects the VPC and the pointer cache. The detection of the virtual program counter is realized by analyzing the memory access trace log of the acquired execution trace. The virtual program counter detection unit 1213 uses differential execution analysis focusing on the number of times memory is read.
  • FIG. 18 is a diagram for explaining the processing of the virtual program counter detection unit 1213.
  • the virtual program counter detection unit 1213 extracts one execution trace by the first test script from the execution trace DB 131.
  • the number of times the VPC is read is proportional to the number of repetitions in the test script and the number of statements in the repeated process. If the number of repetitions is N and the number of repeated statements is M, approximately MN VPC reads occur. For this reason, the virtual program counter detection unit 1213 extracts memory whose read count grows to approximately 4MN and 9MN in the execution traces for the first test script in which N and M have been increased to 2N and 2M, and to 3N and 3M, respectively. Specifically, as shown in FIG. 18, the virtual program counter detection unit 1213 extracts memory areas whose read/write count increases monotonically with each VM instruction execution ((1) in FIG. 18).
  • the virtual program counter detection unit 1213 detects as a VPC a memory value that has been read and that always points to the start point of a VM instruction. Specifically, the virtual program counter detection unit 1213 compares the VPC's pointing destination with the address of the VM instruction handler, and narrows it down to matching memory areas ((2) in FIG. 18).
  • the dispatcher detection unit 1214 detects a dispatcher by analyzing the binary of the script engine using a predetermined method.
  • FIG. 19 is a diagram for explaining the process of the dispatcher detection unit 1214.
  • the dispatcher detection unit 1214 detects dispatchers. Based on the boundaries of VM instructions detected by the VM instruction boundary detection unit 1212, the dispatcher detection unit 1214 extracts each VM instruction portion from the script engine binary. Then, based on the assumption that the similarity of dispatcher code is high ((1) in FIG. 19), the dispatcher detection unit 1214 calculates the similarity between the codes of each VM instruction and detects the portion with high similarity between all VM instructions as a dispatcher. The dispatcher detection unit 1214 can detect the code that is commonly executed in the latter half of the VM instructions as a dispatcher ((1) in FIG. 19).
  • the code cache detection unit 1215 detects the code location that called the memory allocation function that allocated this code cache from the execution trace ((2) in FIG. 20). The code cache detection unit 1215 detects all memory areas allocated at this code location from the VM execution trace as code caches ((3) in FIG. 20).
  • the code cache detection unit 1215 detects the code location that is writing to the code cache from the execution trace ((4) in FIG. 20). The code cache detection unit 1215 detects the writing by this code location from the VM execution trace as an update to the code cache ((5) in FIG. 20).
  • the branch VM instruction determination unit 1223 first determines a branch VM instruction by analyzing the acquired VM execution trace log.
  • the test script (third test script) here is only required to include a branch VM instruction, and may be any script including a branch control syntax, not limited to the example shown in FIG. 7.
  • the test script is prepared by collecting information from the Internet or obtaining information from official documents.
  • the branch VM instruction determination unit 1223 associates a pointer to a VM instruction with a VM instruction for each VM execution trace in the VM execution trace DB 133, and virtually assigns a VM opcode as an identifier to each.
  • Figure 21 is a diagram explaining the processing of the branch VM instruction determination unit 1223.
  • if a VM instruction is a branch instruction, the advancement of the VPC changes depending on the branch destination. Otherwise, the advancement of the VPC changes depending on the size of the VM instruction. For this reason, when pairs of VM instruction opcodes and pointers to VM instructions are collected and the advancement of the VPC is examined for each opcode, the advancement varies with the branch destination if the instruction is a branch instruction.
  • the branch VM instruction determination unit 1223 therefore uses the statistical variance to evaluate how much the advancement of the VPC varies for each VM instruction.
  • the branch VM instruction determination unit 1223 calculates the variance of the VPC change amount for each VM opcode, and narrows it down to only VM opcodes whose calculated variance is greater than a threshold. In this way, the branch VM instruction determination unit 1223 associates the pointer with the VM instruction, and determines that a VM instruction with variance in the advance of the VPC (VM instruction handler 3 in the example of FIG. 21) is a branch VM instruction ((1) in FIG. 21).
  • the threshold value is set to a value that can divide the two groups that result by plotting the obtained variance value on a number line, for example.
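  • One way to pick such a threshold automatically is to place it in the middle of the widest gap between the sorted variance values. This is a heuristic sketch, not the method prescribed by the text, and the function name is hypothetical.

```python
def gap_threshold(variances):
    """Midpoint of the widest gap between sorted distinct variance values."""
    values = sorted(set(variances))
    if len(values) < 2:
        raise ValueError("need at least two distinct variance values")
    # (gap width, lower neighbour, upper neighbour) for each adjacent pair.
    gaps = [(b - a, a, b) for a, b in zip(values, values[1:])]
    _, low, high = max(gaps)
    return (low + high) / 2
```

  • For variances `[0.0, 0.5, 0.2, 400.0, 380.0]`, the widest gap lies between 0.5 and 380.0, so the threshold separates the near-constant group from the strongly varying group.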
  • the branch VM instruction analysis unit 1224 analyzes whether the addressing method of the branch destination of the branch VM instruction is one of the three methods of immediate addressing, direct addressing, or relative addressing, and determines the operand of the branch destination of the branch VM instruction to be analyzed.
  • the immediate addressing method is a method in which the address of the branch destination is directly specified. With the immediate addressing method, the VPC after the branch becomes this specified value. Figures 22 to 24 explain the processing of the branch VM instruction analysis unit 1224.
  • the branch VM instruction analysis unit 1224 then assigns a taint tag T1 (see FIG. 22) to the code cache of the branch VM instruction being analyzed, and executes the branch VM instruction while propagating the taint tag in accordance with the movement of data. Execution of the branch VM instruction changes the VPC, causing a transition in execution ((1) in FIG. 22).
  • if taint tag T1 has been propagated to the VPC by the execution of the branch VM instruction, the branch VM instruction analysis unit 1224 determines that the addressing method of the branch destination of this branch VM instruction is the immediate addressing method ((2) in FIG. 22). The branch VM instruction analysis unit 1224 then determines that the original data to which taint tag T1 was assigned, and which was moved to the VPC, is the branch destination operand of the branch VM instruction being analyzed.
  • the direct addressing method is a method of specifying the memory area in which the branch destination address is stored. With direct addressing, the address stored at the destination of the specified address becomes the value of the VPC.
  • the branch VM instruction analysis unit 1224 therefore assigns a first tag T21 (see FIG. 23) to the code cache of the branch VM instruction being analyzed, and executes the branch VM instruction while propagating the first tag T21 in accordance with the movement of data. If data with the first tag T21 is referenced by a pointer during execution of the branch VM instruction being analyzed, the branch VM instruction analysis unit 1224 assigns a second tag T22 to the referenced data ((1) in FIG. 23).
  • if data carrying the second tag T22 is moved to the VPC, the branch VM instruction analysis unit 1224 determines that the addressing method for the branch destination of this branch VM instruction is the direct addressing method ((2) in FIG. 23). Then, the branch VM instruction analysis unit 1224 traces back to the first tag T21 that triggered the assignment of the second tag T22 to the data moved to the VPC, and determines that the original data to which that first tag was assigned is the branch destination operand.
  • the relative addressing method specifies an offset from the current VPC. Therefore, with the relative addressing method, the value of the VPC after branching is the current VPC value plus this offset.
  • the branch VM instruction analysis unit 1224 then assigns a first tag T31 (see FIG. 24) to the VPC of the branch VM instruction being analyzed, and assigns a second tag T32 (see FIG. 24) to the code cache of the branch VM instruction being analyzed.
  • the branch VM instruction analysis unit 1224 then executes the branch VM instruction being analyzed while propagating the tags (first tag T31 and second tag T32) in accordance with the movement of data.
  • the branch VM instruction analysis unit 1224 determines that the addressing method of the branch destination of the branch VM instruction to be analyzed is the relative addressing method ((1) in FIG. 24). Then, the branch VM instruction analysis unit 1224 determines that the data portion from which the second tag T32 originated, which was added to the data carrying the first tag T31, is the offset operand of the branch destination.
  • Extraction unit: FIGS. 25 and 26 are diagrams for explaining the processing of the extraction unit 1231.
  • the extraction unit 1231 executes a porting script while monitoring writing and execution of the code cache detected by the code cache detection unit 1215, and extracts the data written in the code cache as bytecode for instrumentation.
  • the extraction unit 1231 calls the function to be ported by the call unit 204 of the function to be ported 207 in the porting script 201, thereby causing the script engine to generate bytecode 225, for example ((1) in FIG. 25).
  • the extraction unit 1231 monitors the writing of the bytecode 225 to the code cache by the script engine ((2) in FIG. 25) and the execution of the VM command ((4) in FIG. 25) of the function call by the script engine ((3) in FIG. 25), and extracts the entry point of the subroutine in the bytecode ((4) in FIG. 25).
  • the extraction unit 1231 executes the porting script 201 while monitoring the writing of the symbol table detected by the symbol table detection unit 1216, and extracts the data written to the symbol table as a symbol table.
  • the script engine generates, for example, the symbol table 210 by executing the function 202 to be ported ((1) in FIG. 26).
  • the extraction unit 1231 monitors the code portion where the script engine secures memory space for the symbol table 210 and the writing by the script engine with the symbol table VM command ((2) in FIG. 26), and extracts the symbol table ((3) in FIG. 26).
  • Fig. 27 is a flowchart showing the procedure of the analysis process according to the embodiment.
  • the input unit 11 receives a test script and a script engine binary as input (step S1).
  • the test script includes the first to fifth test scripts.
  • the execution trace acquisition unit 1211 performs an execution trace acquisition process in which the test script is executed while monitoring the binary of the script engine to acquire branch traces and memory access traces (step S2).
  • the VM instruction boundary detection unit 1212 detects VM instructions and performs VM instruction boundary detection processing to detect VM instruction boundaries (step S3).
  • the virtual program counter detection unit 1213 extracts and analyzes the execution trace for the first test script stored in the execution trace DB 131, and performs virtual program counter detection processing to discover the VPC (step S4).
  • the dispatcher detection unit 1214 performs dispatcher detection processing to extract each VM command portion from the script engine binary and detect the portion with high similarity between each VM command as a dispatcher (step S5).
  • the code cache detection unit 1215 performs a code cache detection process based on the execution trace and VPC to detect the area of the code location from which the memory allocation function was called as a code cache, and to detect the area in which writing is being done to the code location area as an update to the code cache (step S6).
  • the symbol table detection unit 1216 performs a symbol table detection process to detect architecture information of the symbol table using the second test script and the memory access trace by the second test script (step S7).
  • the VM execution trace acquisition unit 1221 receives the test script and the script engine binary as input, and executes the test script while monitoring the execution of the script engine binary, thereby performing a VM execution trace acquisition process to acquire a VM execution trace (step S8).
  • the VM instruction collection unit 1222 performs a VM instruction collection process to collect VM instructions from the VM execution trace (step S9).
  • the branch VM instruction determination unit 1223 performs a branch VM instruction determination process to determine branch VM instructions from among the VM instructions collected by the VM instruction collection unit 1222 (step S10).
  • the branch VM instruction analysis unit 1224 performs a branch VM instruction analysis process to analyze the address specification method of the branch destination of the branch VM instruction being analyzed and the operands of the branch VM instruction (step S11).
  • the symbol table VM instruction determination unit 1225 performs a symbol table VM instruction determination process to determine that a VM instruction that accessed the symbol table memory area during VM execution is a symbol table VM instruction, among the VM instructions collected by the VM instruction collection unit 1222 (step S12).
  • the symbol table VM instruction analysis unit 1226 performs a symbol table VM instruction analysis process to revise the operands of the symbol table VM instruction to be analyzed and determine the operands that specify the symbol table and the operands that specify the variables (step S13).
  • the extraction unit 1231 performs an extraction process to extract bytecode and symbol tables for instrumentation by executing a porting script based on the architecture information and instruction set architecture information acquired by the virtual machine analysis unit 121 and the instruction set architecture analysis unit 122 (step S14).
  • the transplantation unit 1232 performs a transplantation process in which the instrumentation bytecode and symbol table extracted by executing the transplantation script are transplanted into the bytecode and symbol table generated by the script engine in the memory space of the script engine based on the script to be analyzed (step S15).
  • the execution unit 1233 performs an execution process to execute the analysis target script for which the porting process has been performed (step S16).
  • the output unit 14 outputs the information obtained by the instrumentation (step S17).
  • Fig. 28 is a flowchart showing the processing procedure of the execution trace acquisition process shown in Fig. 27.
  • the execution trace acquisition unit 1211 receives a test script and a script engine binary as input (step S21). Then, the execution trace acquisition unit 1211 hooks the received script engine to acquire a branch trace (step S22). The execution trace acquisition unit 1211 also hooks the received script engine to acquire a memory access trace (step S23).
  • the execution trace acquisition unit 1211 inputs the test script received in this state into the script engine for execution (step S24), and stores the execution trace acquired thereby in the execution trace DB 131 (step S25).
  • the execution trace acquisition unit 1211 determines whether or not all of the input test scripts have been executed (step S26). If all of the input test scripts have been executed (step S26: Yes), the execution trace acquisition unit 1211 ends the process. On the other hand, if not all of the input test scripts have been executed (step S26: No), the execution trace acquisition unit 1211 returns to step S24, executes the next test script, and continues the process.
  • Fig. 29 is a flowchart showing the processing procedure of the VM instruction boundary detection process shown in Fig. 27.
  • the VM instruction boundary detection unit 1212 extracts execution traces from the execution trace DB 131 (step S31).
  • the VM instruction boundary detection unit 1212 clusters the execution traces using a predetermined method (step S32). Any method may be used for clustering.
  • the VM instruction boundary detection unit 1212 detects clusters whose execution count is equal to or greater than a threshold as VM instructions (step S33). Then, the VM instruction boundary detection unit 1212 determines the start and end points of the continuous instruction sequence that constitutes each VM instruction as its boundaries (step S34). The VM instruction boundary detection unit 1212 outputs the VM instruction boundaries as a return value (step S35), and ends the VM instruction boundary detection process.
  • Fig. 30 is a flowchart showing the processing procedure of the virtual program counter detection process shown in Fig. 27.
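  • Steps S32 to S34 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the branch trace has already been split into code blocks given as (start address, end address) pairs, and that clustering is reduced to grouping identical blocks; the source allows any clustering method. The function name and the trace layout are hypothetical.

```python
from collections import Counter

def detect_vm_instruction_boundaries(block_trace, threshold):
    """block_trace: list of (start_addr, end_addr) pairs, one per executed code block."""
    # Step S32 (simplified assumption): identical blocks form one cluster.
    counts = Counter(block_trace)
    # Steps S33-S34: keep clusters executed at least `threshold` times; their
    # start and end addresses serve as the VM instruction boundaries.
    return sorted(block for block, n in counts.items() if n >= threshold)

# Hypothetical trace: two handlers run repeatedly, one block runs only once.
trace = [(0x1000, 0x1020)] * 5 + [(0x2000, 0x2010)] * 4 + [(0x9000, 0x9004)]
boundaries = detect_vm_instruction_boundaries(trace, threshold=3)
print([(hex(s), hex(e)) for s, e in boundaries])
```

  • In this sketch the block that executes only once is discarded, which mirrors the idea that VM instruction handlers are the code regions executed many times while the interpreter runs.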
  • the virtual program counter detection unit 1213 extracts one execution trace by the test script from the execution trace DB 131 (step S41). Next, the virtual program counter detection unit 1213 focuses on memory access traces among the execution traces, and counts up the number of reads for each memory read destination (step S42).
  • the virtual program counter detection unit 1213 receives as input the test script used to obtain the execution trace (step S43), and analyzes the first test script to obtain the number of repetitions and the number of repeated statements (step S44).
  • the virtual program counter detection unit 1213 extracts from the execution trace DB 131 another execution trace by the first test script, which has a different number of repetitions and number of repeated statements (step S45). Then, the virtual program counter detection unit 1213 focuses on the memory access trace and counts the number of reads for each memory read destination (step S46). The virtual program counter detection unit 1213 also receives as input the test script used to obtain the execution trace (step S47), and analyzes the first test script to obtain the number of repetitions and the number of repeated statements (step S48).
  • the virtual program counter detection unit 1213 narrows down the memory read destinations to only those whose read counts change in proportion to the number of repetitions or the increase or decrease in the number of repeated statements (step S49). Furthermore, the virtual program counter detection unit 1213 narrows down the memory read destinations narrowed down in step S49 to those whose read memory values always point to the start point of the VM instruction (step S50).
  • the virtual program counter detection unit 1213 determines whether the memory read destinations have been narrowed down to only one (step S51). If the virtual program counter detection unit 1213 has not narrowed down the memory read destinations to only one (step S51: No), the process returns to step S45, where the virtual program counter detection unit 1213 retrieves the next execution trace and continues processing. On the other hand, if the virtual program counter detection unit 1213 has narrowed down the memory read destinations to only one (step S51: Yes), the virtual program counter detection unit 1213 stores the narrowed down memory read destination in the architecture information DB 132 as a virtual program counter (step S52), and ends processing.
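  • The narrowing in steps S49 and S50 can be sketched as follows. The sketch assumes that each run is summarized by its number of repetitions and a list of (address, value) memory reads, and that proportionality is exact; a real trace would likely need a tolerance for constant overhead. All names and the data layout are hypothetical.

```python
from collections import Counter

def narrow_vpc_candidates(run_a, run_b, instruction_starts):
    """Each run is (repetitions, reads); reads is a list of (address, value) pairs."""
    (reps_a, reads_a), (reps_b, reads_b) = run_a, run_b
    count_a = Counter(addr for addr, _ in reads_a)  # step S42
    count_b = Counter(addr for addr, _ in reads_b)  # step S46
    candidates = set()
    for addr in count_a.keys() & count_b.keys():
        # Step S49: the read count must scale with the number of repetitions
        # (count_a / reps_a == count_b / reps_b, written without division).
        if count_a[addr] * reps_b != count_b[addr] * reps_a:
            continue
        # Step S50: every value read there must be the start of a VM instruction.
        values = [v for a, v in reads_a + reads_b if a == addr]
        if all(v in instruction_starts for v in values):
            candidates.add(addr)
    return candidates

def make_run(reps):
    # 0x10 behaves like a VPC, 0x20 like a loop counter, 0x30 is read once per run.
    reads = [(0x10, 0x500), (0x10, 0x504), (0x20, 7)] * reps + [(0x30, 0x500)]
    return reps, reads

vpc_candidates = narrow_vpc_candidates(make_run(2), make_run(3), {0x500, 0x504})
print([hex(a) for a in vpc_candidates])
```

  • If more than one candidate remains, the flow in step S51 repeats the comparison with a further execution trace, which in this sketch would correspond to calling the function again and intersecting the results.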
  • Fig. 31 is a flowchart showing the processing procedure of the dispatcher detection process shown in Fig. 27.
  • the dispatcher detection unit 1214 receives the script engine binary as input (step S61).
  • the dispatcher detection unit 1214 receives the boundaries of VM commands from the VM command boundary detection unit 1212 (step S62).
  • the dispatcher detection unit 1214 extracts each VM instruction portion from the script engine binary based on the boundaries of the VM instructions received from the VM instruction boundary detection unit 1212 (step S63).
  • the dispatcher detection unit 1214 calculates the similarity between the codes of each VM instruction using a predetermined method (step S64). Any method for calculating the similarity may be used as long as it is a method that can calculate the similarity between codes.
  • the dispatcher detection unit 1214 extracts the part with high similarity among all VM commands based on the similarity calculated in step S64 (step S65). The dispatcher detection unit 1214 then determines whether it is the end part of the VM command (step S66).
  • If it is not the end of the VM command (step S66: No), the dispatcher detection unit 1214 returns to step S65 and continues processing. If it is the end of the VM command (step S66: Yes), the dispatcher detection unit 1214 outputs the extracted part as a dispatcher (step S67) and ends processing.
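  • As a rough illustration of steps S63 to S67, the sketch below treats each VM instruction portion as a list of instruction mnemonics and takes the longest tail shared by all of them as the dispatcher. Exact matching stands in for the similarity measure, which the source leaves open, and the mnemonics are made up.

```python
def common_tail(handlers):
    """handlers: list of instruction-mnemonic lists, one per VM instruction portion."""
    tail = []
    # Walk all handlers backwards in lockstep and stop at the first mismatch.
    for column in zip(*(reversed(h) for h in handlers)):
        if len(set(column)) != 1:
            break
        tail.append(column[0])
    return tail[::-1]

# Hypothetical handlers that all end with the same fetch-and-jump sequence.
handlers = [
    ["pop", "pop", "add", "push", "inc vpc", "load op", "jmp table"],
    ["load", "store", "inc vpc", "load op", "jmp table"],
    ["cmp", "inc vpc", "load op", "jmp table"],
]
dispatcher = common_tail(handlers)
print(dispatcher)
```

  • Restricting the search to the tail reflects the check in step S66 that the highly similar part lies at the end of the VM instruction.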
  • Fig. 32 is a flowchart showing the processing procedure of the code cache detection process shown in Fig. 27.
  • When the code cache detection unit 1215 receives an execution trace and a VM execution trace as input (step S71), it acquires the memory area pointed to by the VPC from the VM execution trace (step S72). The VM execution trace is acquired by the VM execution trace acquisition unit 1221.
  • the code cache detection unit 1215 obtains from the execution trace the code location of the caller of the memory allocation function that allocated the memory area obtained in step S72 (step S73).
  • the code cache detection unit 1215 detects, from the VM execution trace, all areas allocated at the code location obtained in step S73 as code caches (step S74).
  • the code cache detection unit 1215 acquires the code location that is writing to the code cache from the execution trace (step S75). The code cache detection unit 1215 detects all areas in the VM execution trace that are written to at the code location acquired in step S75 as code cache updates (step S76). The code cache detection unit 1215 returns the detected code cache and its updated location (step S77), and ends the code cache detection process.
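  • Steps S72 to S74 can be sketched as follows, assuming the execution trace yields allocation records of the form (caller code location, base address, size) and the VM execution trace yields the observed VPC values. The record layout and names are assumptions, and the update detection in steps S75 and S76 is not covered.

```python
def detect_code_caches(allocations, vpc_values):
    """allocations: list of (caller_site, base, size) records from the execution trace."""
    def contains(alloc, addr):
        _, base, size = alloc
        return base <= addr < base + size

    # Steps S72-S73: find the code locations whose allocations hold a VPC target.
    sites = {a[0] for a in allocations for v in vpc_values if contains(a, v)}
    # Step S74: every area allocated from those code locations is a code cache.
    return [a for a in allocations if a[0] in sites]

# Hypothetical records: two areas allocated at 0x401000, one at 0x402000.
allocations = [
    (0x401000, 0x7000, 0x100),
    (0x401000, 0x8000, 0x100),
    (0x402000, 0x9000, 0x40),
]
code_caches = detect_code_caches(allocations, vpc_values=[0x7004, 0x7008])
print([hex(base) for _, base, _ in code_caches])
```

  • The second area is reported even though the VPC never pointed into it in this run, because it was allocated at the same code location as an area that the VPC did point into.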
  • Fig. 33 is a flowchart showing the processing procedure of the symbol table detection process shown in Fig. 27.
  • the symbol table detection unit 1216 receives as input the second test script and a memory access trace by the second test script (step S81). The symbol table detection unit 1216 extracts characteristic values used in the second test script (step S82).
  • the symbol table detection unit 1216 detects the storage location in memory of the characteristic value extracted in step S82 by matching the characteristic value from the memory access trace received in step S81 (step S83).
  • the symbol table detection unit 1216 detects structures that reference the characteristic values detected in step S83 from the memory access trace (step S84). Since it is possible to determine which pointers reference which areas from the memory access trace, the symbol table detection unit 1216 can detect what kind of structure is present by connecting the reference relationships between the values in the memory access trace. The symbol table detection unit 1216 may also detect structures by employing existing methods for analyzing the structure of structures.
  • the symbol table detection unit 1216 detects the memory area where references to multiple values are grouped together as a location in the symbol table memory area (step S85).
  • the symbol table detection unit 1216 extracts function calls for memory allocation from the API trace (step S86). By detecting the code location that made the "function call for memory allocation" as the "code that allocates memory for the symbol table," and by monitoring this code location and recording the address of the allocated memory, the location of the symbol table in memory can be determined each time a new symbol table is created.
  • the symbol table detection unit 1216 detects from the API trace the code that reserves a memory area containing a symbol table, i.e., the code portion that reserves the memory area for the symbol table (step S87).
  • the symbol table detection unit 1216 makes it possible to identify the position of the symbol table in the memory area and the code portion that reserves that area, and outputs this as a symbol table (step S88).
  • Fig. 34 is a flowchart showing the procedure of the VM execution trace acquisition process shown in Fig. 27.
  • the VM execution trace acquisition unit 1221 receives a test script and a script engine binary as input (step S91). Then, the VM execution trace acquisition unit 1221 applies a hook to the received script engine to record the VPC and VM opcode (step S92).
  • the VM execution trace acquisition unit 1221 inputs the received test script in this state into the script engine for execution (step S93), and stores the VM execution trace acquired thereby in the VM execution trace DB 133 (step S94).
  • the VM execution trace acquisition unit 1221 determines whether or not all of the input test scripts have been executed (step S95). If all of the input second test scripts have been executed (step S95: Yes), the VM execution trace acquisition unit 1221 ends the process. If not all of the input second test scripts have been executed (step S95: No), the VM execution trace acquisition unit 1221 returns to the execution of the test scripts in step S93 and continues the process.
  • Fig. 35 is a flowchart showing the procedure of the VM command collection process shown in Fig. 27.
  • the VM command collection unit 1222 receives the VPC and dispatcher as input (step S101) and acquires various scripts from the Internet (step S102).
  • the VM command collection unit 1222 executes the scripts while monitoring the VPC and dispatcher, and acquires a VM execution trace (step S103).
  • the VM instruction collection unit 1222 acquires VM instructions from the VM execution trace (step S104) and adds them to a list of VM instructions (step S105). If the VM instruction collection unit 1222 finds a VM instruction that is not in the list (step S106: No), it returns to step S102. If the VM instruction collection unit 1222 finds no VM instructions that are not in the list (step S106: Yes), it returns the list of VM instructions (step S107) and ends the VM instruction collection process.
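  • The loop in steps S102 to S107 is a fixed-point iteration: scripts are executed until a batch contributes no VM instruction that is not already in the list. A minimal sketch, with a hypothetical run_script callback standing in for monitored execution and made-up opcode names:

```python
def collect_vm_instructions(script_batches, run_script):
    """run_script(script) returns a VM execution trace as a list of (vpc, opcode)."""
    known = set()
    for batch in script_batches:                                        # step S102
        seen = {op for script in batch for _, op in run_script(script)}  # S103-S104
        new = seen - known
        known |= new                                                     # step S105
        if not new:                                                      # step S106: Yes
            break
    return sorted(known)                                                 # step S107

# Hypothetical traces keyed by script name.
traces = {
    "a.js": [(0, "PUSH"), (2, "ADD")],
    "b.js": [(0, "ADD"), (1, "JMP")],
    "c.js": [(0, "PUSH")],
    "d.js": [(0, "CALL")],
}

def fake_run(script):
    return traces[script]

vm_instructions = collect_vm_instructions(
    [["a.js"], ["b.js"], ["c.js"], ["d.js"]], fake_run
)
print(vm_instructions)
```

  • In this example the loop stops after the third batch, so the opcode that only the fourth script would exercise is never collected. This is a property of the stopping rule as sketched here, not a statement about the embodiment.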
  • Fig. 36 is a flowchart showing the processing procedure of the branch VM instruction determination process shown in Fig. 27.
  • the branch VM instruction determination unit 1223 extracts one VM execution trace for the third test script from the VM execution trace DB 133 (step S111).
  • the branch VM instruction determination unit 1223 links a pointer to the VM instruction with the VM instruction, and assigns a VM opcode to each as an identifier (step S112). Then, the branch VM instruction determination unit 1223 counts the amount of change in VPC before and after execution for each VM opcode (step S113).
  • the branch VM instruction determination unit 1223 determines whether or not the VM execution traces for all the third test scripts in the VM execution trace DB 133 have been processed (step S114). If not all of the VM execution traces in the VM execution trace DB 133 have been processed (step S114: No), the branch VM instruction determination unit 1223 returns to step S111 and extracts and processes one VM execution trace for the next third test script.
  • the branch VM instruction determination unit 1223 calculates the variance of the amount of change in VPC for each VM opcode (step S115). Then, the branch VM instruction determination unit 1223 receives a threshold value as an input (step S116). The branch VM instruction determination unit 1223 narrows down the VM opcodes to only those whose variance is greater than the threshold value (step S117), stores them as a list of opcodes of branch VM instructions in the architecture information DB 132 (step S118), and ends the process.
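  • The variance test in steps S113 to S117 can be sketched as follows. The sketch assumes a VM execution trace given as (VPC, opcode) pairs in execution order and uses the population variance from the Python standard library; the opcode names and the threshold are made up.

```python
from collections import defaultdict
from statistics import pvariance

def find_branch_opcodes(vm_trace, threshold):
    """vm_trace: list of (vpc, opcode) pairs in execution order."""
    deltas = defaultdict(list)
    # Step S113: record how far the VPC moves after each opcode executes.
    for (vpc, opcode), (next_vpc, _) in zip(vm_trace, vm_trace[1:]):
        deltas[opcode].append(next_vpc - vpc)
    # Steps S115-S117: keep opcodes whose VPC change varies more than the threshold.
    return sorted(op for op, d in deltas.items() if pvariance(d) > threshold)

vm_trace = [
    (0, "PUSH"), (2, "ADD"), (3, "JMP"),
    (0, "PUSH"), (2, "ADD"), (3, "JMP"),
    (10, "PUSH"), (12, "ADD"), (13, "HALT"),
]
branch_opcodes = find_branch_opcodes(vm_trace, threshold=1)
print(branch_opcodes)
```

  • Non-branch instructions advance the VPC by their own length every time, so their variance is zero, whereas the jump in this trace moves the VPC by -3 once and by +7 once.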
  • Fig. 37 is a flowchart showing the processing procedure of the branch VM instruction analysis process shown in Fig. 27.
  • the branch VM instruction analysis unit 1224 performs an immediate addressing method analysis process to analyze whether the addressing method of the branch destination of the branch VM instruction is the immediate addressing method (step S121).
  • the branch VM instruction analysis unit 1224 performs a direct addressing method analysis process to analyze whether the addressing method of the branch destination of the branch VM instruction is a direct addressing method (step S122).
  • the branch VM instruction analysis unit 1224 performs a relative addressing method analysis process to analyze whether the addressing method of the branch destination of the branch VM instruction is a relative addressing method (step S123).
  • the branch VM instruction analysis unit 1224 outputs the analysis results, i.e., the address specification method for the branch destination of the branch VM instruction and the operands of the branch VM instruction (step S124).
  • Fig. 38 is a flowchart showing the processing procedure of the immediate addressing method analysis process shown in Fig. 37.
  • the branch VM instruction analysis unit 1224 receives the VPC and the code cache as input (step S131).
  • the branch VM instruction analysis unit 1224 receives the script acquired in the VM instruction collection process as input (step S132).
  • the branch VM instruction analysis unit 1224 executes the script (step S133).
  • the branch VM instruction analysis unit 1224 stops execution when the bytecode is written to the code cache (step S134).
  • the branch VM instruction analysis unit 1224 assigns a taint tag to the bytecode in the code cache of the branch VM instruction to be analyzed (step S135).
  • the branch VM instruction analysis unit 1224 resumes execution of the branch VM instruction being analyzed while propagating the taint tag in accordance with the data movement (step S136).
  • the branch VM instruction analysis unit 1224 determines whether or not there is tagged data that has been moved to the VPC during execution of the branch VM instruction being analyzed (step S137).
  • If there is tagged data that has been moved to the VPC during execution of the branch VM instruction (step S137: Yes), the branch VM instruction analysis unit 1224 determines that the addressing method for the branch destination of the branch VM instruction being analyzed is the immediate addressing method (step S138).
  • the branch VM instruction analysis unit 1224 determines that the original data portion to which the taint tag has been added and which has been moved to the VPC is the branch destination operand of the branch VM instruction being analyzed (step S139).
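  • If there is no tagged data that has been moved to the VPC during execution of the branch VM instruction (step S137: No), the branch VM instruction analysis unit 1224 determines that the addressing method for the branch destination of the branch VM instruction being analyzed is not the immediate addressing method (step S140).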
  • the branch VM instruction analysis unit 1224 outputs the addressing method of the branch VM instruction resulting from the analysis and the operands of the branch VM instruction (step S141).
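  • The immediate addressing check in steps S135 to S139 can be sketched with a toy taint tracker. It assumes that the data movements observed while the branch VM instruction executes are available as an ordered list of (destination, source) pairs; the location names such as "cc+1" and "r2" are hypothetical labels for code cache bytes and registers.

```python
def analyze_immediate(moves, code_cache_locs, vpc_loc):
    """moves: ordered (dst, src) data movements seen during the branch VM instruction."""
    # Step S135: tag each bytecode location with itself as the origin.
    taint = {loc: {loc} for loc in code_cache_locs}
    # Step S136: the destination inherits the tags of the source (or none).
    for dst, src in moves:
        taint[dst] = set(taint.get(src, set()))
    # Steps S137-S139: tags that reach the VPC name the branch destination operand.
    origins = taint.get(vpc_loc, set())
    return bool(origins), sorted(origins)

# Hypothetical jump: the opcode at cc+0 is decoded, the operand at cc+1 lands in the VPC.
moves = [("r1", "cc+0"), ("r2", "cc+1"), ("vpc", "r2")]
is_immediate, operand = analyze_immediate(moves, ["cc+0", "cc+1"], "vpc")
print(is_immediate, operand)
```

  • Keeping the origin location inside the tag is what allows the operand to be read back from the tag once it arrives at the VPC.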
  • Fig. 39 is a flowchart showing the processing procedure of the direct addressing method analysis process shown in Fig. 37.
  • the branch VM instruction analysis unit 1224 receives the VPC and the code cache as input (step S151).
  • the branch VM instruction analysis unit 1224 receives the script acquired in the VM instruction collection process as input (step S152).
  • the branch VM instruction analysis unit 1224 executes the script (step S153).
  • the branch VM instruction analysis unit 1224 stops execution when the bytecode is written to the code cache (step S154).
  • the branch VM instruction analysis unit 1224 assigns a first tag to the bytecode in the code cache of the branch VM instruction to be analyzed (step S155).
  • the branch VM instruction analysis unit 1224 resumes execution of the branch VM instruction being analyzed while propagating the tag in accordance with the movement of data (step S156).
  • the branch VM instruction analysis unit 1224 determines whether the first tagged data is referenced by a pointer when the branch VM instruction to be analyzed is executed (step S157).
  • If the data with the first tag is referenced by a pointer (step S157: Yes), the branch VM instruction analysis unit 1224 assigns a second tag to the referenced data (step S158).
  • If the first tagged data is not referenced by a pointer (step S157: No), or after processing of step S158, the branch VM instruction analysis unit 1224 determines whether there is second tagged data that has been moved to the VPC during execution of the branch VM instruction being analyzed (step S159).
  • If there is second tagged data that was moved to the VPC during execution of the branch VM instruction being analyzed (step S159: Yes), the branch VM instruction analysis unit 1224 determines that the addressing method for the branch destination of this branch VM instruction is the direct addressing method (step S160).
  • If there is no second tagged data moved to the VPC during execution of the branch VM instruction (step S159: No), the branch VM instruction analysis unit 1224 determines that the addressing method for the branch destination of this branch VM instruction is not the direct addressing method (step S162).
  • the branch VM instruction analysis unit 1224 outputs the addressing method of the branch VM instruction resulting from the analysis and the operands of the branch VM instruction (step S163).
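  • The two-tag scheme for direct addressing can be sketched in the same toy setting. Here the observed events are assumed to be either ("mov", dst, src) or ("deref", dst, src), where "deref" means that dst receives the memory contents addressed by the value held in src. The event encoding and location names are hypothetical.

```python
def analyze_direct(events, code_cache_locs, vpc_loc):
    """events: ordered ("mov" | "deref", dst, src) tuples seen during the instruction."""
    first = {loc: {loc} for loc in code_cache_locs}  # step S155: first tag on bytecode
    second = {}
    for kind, dst, src in events:                    # step S156: propagate tags
        if kind == "mov":
            first[dst] = set(first.get(src, set()))
            second[dst] = set(second.get(src, set()))
        elif kind == "deref":
            # Step S158: data reached through a first-tagged pointer gets the
            # second tag, which remembers the first tag that triggered it.
            first[dst] = set()
            second[dst] = set(first.get(src, set()))
    origins = second.get(vpc_loc, set())             # steps S159-S160
    return bool(origins), sorted(origins)

# Hypothetical jump whose operand at cc+1 is the address of the branch target.
events = [("mov", "r1", "cc+1"), ("deref", "r2", "r1"), ("mov", "vpc", "r2")]
is_direct, operand = analyze_direct(events, ["cc+0", "cc+1"], "vpc")
print(is_direct, operand)
```

  • An operand that is copied straight into the VPC carries only the first tag in this sketch, so it is not reported as direct addressing.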
  • Fig. 40 is a flowchart showing the processing procedure of the relative addressing method analysis process shown in Fig. 37.
  • the branch VM instruction analysis unit 1224 receives the VPC and the code cache as input (step S171).
  • the branch VM instruction analysis unit 1224 receives the script acquired in the VM instruction collection process as input (step S172).
  • the branch VM instruction analysis unit 1224 executes the script (step S173).
  • the branch VM instruction analysis unit 1224 stops execution when the bytecode is written to the code cache (step S174).
  • the branch VM instruction analysis unit 1224 assigns a first tag to the VPC of the branch VM instruction to be analyzed (step S175).
  • the branch VM instruction analysis unit 1224 assigns a second tag to the bytecode in the code cache of the branch VM instruction being analyzed (step S176).
  • the branch VM instruction analysis unit 1224 determines whether the first tag and the second tag are added and moved to the VPC during execution of the branch VM instruction to be analyzed (step S178).
  • If the first tag and the second tag are added and moved to the VPC during execution of the branch VM instruction (step S178: Yes), the branch VM instruction analysis unit 1224 determines that the addressing method for the branch destination of the branch VM instruction being analyzed is the relative addressing method (step S179).
  • the branch VM instruction analysis unit 1224 determines that the original data portion of the second tag added to the first tag is the offset operand of the branch destination (step S180).
  • If the first tag and the second tag are not added and moved to the VPC during execution of the branch VM instruction (step S178: No), the branch VM instruction analysis unit 1224 determines that the addressing method for the branch destination of this branch VM instruction is not a relative addressing method (step S181).
  • the branch VM instruction analysis unit 1224 outputs the addressing method of the branch VM instruction resulting from the analysis and the operands of the branch VM instruction (step S182).
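  • For relative addressing, the check is whether a value derived from both the old VPC and a bytecode operand ends up in the VPC. The sketch below assumes events of the form (operation, dst, src, ...) in which the destination receives the union of the tags of all sources; the tag labels "T31" and "T32" follow FIG. 24, while the event encoding and location names are hypothetical.

```python
def analyze_relative(events, code_cache_locs, vpc_loc):
    """events: ordered (op, dst, src1[, src2]) tuples seen during the instruction."""
    tags = {loc: {("T32", loc)} for loc in code_cache_locs}  # step S176
    tags[vpc_loc] = {("T31", vpc_loc)}                       # step S175
    for _op, dst, *srcs in events:
        # "mov" and "add" alike: dst carries the tags of every source operand.
        tags[dst] = set().union(*(tags.get(s, set()) for s in srcs))
    kinds = {kind for kind, _ in tags[vpc_loc]}
    is_relative = kinds == {"T31", "T32"}                    # steps S178-S179
    # Step S180: the origin of the second tag is the offset operand.
    offsets = sorted(loc for kind, loc in tags[vpc_loc] if kind == "T32")
    return is_relative, offsets if is_relative else []

# Hypothetical relative jump: vpc = vpc + operand stored at cc+1.
events = [("mov", "r1", "cc+1"), ("add", "vpc", "vpc", "r1")]
is_relative, offset_operand = analyze_relative(events, ["cc+0", "cc+1"], "vpc")
print(is_relative, offset_operand)
```

  • If the operand simply overwrites the VPC, only the second tag arrives there and the instruction is not classified as relative in this sketch.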
  • Fig. 41 is a flowchart showing the processing procedure of the symbol table VM command determination process shown in Fig. 27.
  • the symbol table VM instruction determination unit 1225 receives as input the VM execution trace and memory access trace from the VM execution trace DB 133 (step S191).
  • the symbol table VM instruction determination unit 1225 receives as input the position in the memory area of the symbol table detected in the symbol table detection process (step S192).
  • the symbol table VM instruction determination unit 1225 extracts one VM instruction from the VM execution trace (step S193).
  • the VM instruction consists of a VPC value and a VM opcode value.
  • the symbol table VM instruction determination unit 1225 checks the memory area accessed during execution of the extracted VM instruction (step S194) and determines whether or not the memory area of the symbol table was accessed (step S195).
  • If the symbol table memory area is not accessed (step S195: No), the symbol table VM command determination unit 1225 determines that the VM command extracted in step S193 is not a symbol table VM command (step S196).
  • If the symbol table memory area is accessed (step S195: Yes), the symbol table VM command determination unit 1225 determines whether the access made by the VM command extracted in step S193 was a read (step S197).
  • If it is a read (step S197: Yes), the symbol table VM command determination unit 1225 determines that the VM command extracted in step S193 is a read symbol table VM command (step S198).
  • If it is not a read (step S197: No), the symbol table VM command determination unit 1225 determines that the VM command extracted in step S193 is a write symbol table VM command (step S199).
  • the symbol table VM instruction determination unit 1225 determines whether all VM instructions in the VM execution trace have been checked as to whether they are symbol table VM instructions (step S200).
  • If not all VM instructions have been checked (step S200: No), the symbol table VM instruction determination unit 1225 extracts the next VM instruction from the VM execution trace (step S201) and proceeds to step S194 to continue processing.
  • the symbol table VM instruction determination unit 1225 outputs a list of the opcodes of the VM instructions determined to be symbol table VM instructions (step S202) and ends the symbol table VM instruction determination process.
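  • The classification in steps S193 to S202 can be sketched as follows, assuming that the memory accesses have already been grouped per executed VM instruction and that the symbol table memory areas are known as half-open address ranges. The data layout and opcode names are hypothetical.

```python
def classify_symbol_table_opcodes(vm_trace, accesses, table_ranges):
    """vm_trace: list of (vpc, opcode); accesses[i]: list of (kind, address) made
    by the i-th VM instruction, where kind is "r" (read) or "w" (write)."""
    def in_table(addr):
        return any(lo <= addr < hi for lo, hi in table_ranges)

    result = {}
    for (_vpc, opcode), acc in zip(vm_trace, accesses):
        kinds = {kind for kind, addr in acc if in_table(addr)}  # steps S194-S195
        if not kinds:
            continue                                            # step S196
        # Steps S197-S199: a pure read is a read instruction, anything else a write.
        result[opcode] = "read" if kinds == {"r"} else "write"
    return result

vm_trace = [(0, "LOADVAR"), (2, "ADD"), (3, "STOREVAR")]
accesses = [[("r", 0x5008)], [("r", 0x100)], [("w", 0x5010)]]
symbol_table_opcodes = classify_symbol_table_opcodes(
    vm_trace, accesses, table_ranges=[(0x5000, 0x5100)]
)
print(symbol_table_opcodes)
```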
  • Fig. 42 and Fig. 43 are flowcharts showing the processing procedure of the symbol table VM command analysis process shown in Fig. 27.
  • the symbol table VM command analysis unit 1226 receives as input the memory access trace and VM execution trace from the fourth test script (step S211).
  • the symbol table VM instruction analysis unit 1226 extracts a combination of two symbol table VM instructions from the VM execution trace received in step S211 (step S212).
  • the symbol table VM instruction analysis unit 1226 extracts the combination of symbol table VM instructions by referring to the list of VM instructions determined to be symbol table VM instructions output in the symbol table VM instruction determination process.
  • the symbol table VM instruction analysis unit 1226 extracts the parts accessed during the execution of each symbol table VM instruction from the memory access trace received in step S211 (step S213).
  • the symbol table VM command analyzer 1226 compares the parts extracted in step S213 and extracts the differences (step S214).
  • the symbol table VM command analysis unit 1226 determines whether the difference exists at the operand position (step S215).
  • If the difference does not exist at the operand position (step S215: No), the symbol table VM instruction analysis unit 1226 determines whether or not all symbol table VM instructions of the VM execution trace received as input in step S211 have been processed (step S216).
  • If the symbol table VM command analysis unit 1226 has not processed all the symbol table VM commands in the VM execution trace (step S216: No), it extracts the next combination of symbol table VM commands from the VM execution trace received as input in step S211 (step S217). Then, the symbol table VM command analysis unit 1226 proceeds to the processing of step S213.
  • If the difference exists at the operand position (step S215: Yes), the symbol table VM command analysis unit 1226 determines that the operand at the position of the difference is the operand that specifies the symbol table (step S218).
  • If all symbol table VM commands of the VM execution trace received as input in step S211 have been processed (step S216: Yes), or after processing in step S218, the symbol table VM command analysis unit 1226 receives as input the memory access trace and VM execution trace by the fifth test script (step S219).
  • the symbol table VM instruction analysis unit 1226 extracts a combination of two symbol table VM instructions from the VM execution trace received in step S219 (step S220). The symbol table VM instruction analysis unit 1226 extracts the parts accessed during the execution of each symbol table VM instruction from the memory access trace received in step S219 (step S221).
  • the symbol table VM command analysis unit 1226 compares the parts extracted in step S221 and extracts the differences (step S222).
  • the symbol table VM command analysis unit 1226 determines whether the difference exists at the operand position (step S223).
  • If the difference does not exist at the operand position (step S223: No), the symbol table VM instruction analysis unit 1226 determines whether all symbol table VM instructions in the VM execution trace received in step S219 have been processed (step S224).
  • If the symbol table VM instruction analysis unit 1226 has not processed all the symbol table VM instructions in the VM execution trace (step S224: No), it extracts the next combination of symbol table VM instructions from the VM execution trace received in step S219 (step S225). It then returns to the processing of step S221.
  • If the difference exists at the operand position (step S223: Yes), the symbol table VM instruction analysis unit 1226 determines that the differing part is an operand that specifies a variable (step S226).
  • If all symbol table VM instructions have been processed (step S224: Yes), or after the processing of step S226, the symbol table VM instruction analysis unit 1226 outputs information on the operands that specify the symbol table and information on the operands that specify the variables, based on the determination results of steps S218 and S226 (step S227).
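The comparison in steps S212 to S227 can be illustrated with a minimal sketch. It assumes, purely for illustration, that each symbol table VM instruction is available as the raw bytes read while it executed, that the opcode occupies the first byte, and that the instruction pairs from the fourth test script differ only in the symbol table they refer to while those from the fifth differ only in the variable. The names `diff_offsets` and `operand_offsets` are hypothetical and do not appear in the embodiment.

```python
from typing import List, Sequence, Tuple


def diff_offsets(first: bytes, second: bytes) -> List[int]:
    """Return the byte offsets at which two instruction encodings differ."""
    return [i for i, (x, y) in enumerate(zip(first, second)) if x != y]


def operand_offsets(pairs: Sequence[Tuple[bytes, bytes]],
                    opcode_size: int = 1) -> List[int]:
    """Collect differing offsets that fall in the operand region.

    opcode_size is an assumption of this sketch: everything after the
    opcode bytes is treated as the operand position.
    """
    found = set()
    for first, second in pairs:
        for offset in diff_offsets(first, second):
            if offset >= opcode_size:  # difference lies at an operand position
                found.add(offset)
    return sorted(found)


# Hypothetical pairs: the first differs in byte 1, the second in byte 2.
fourth_script_pairs = [(bytes([0x10, 0x00, 0x05]), bytes([0x10, 0x01, 0x05]))]
fifth_script_pairs = [(bytes([0x10, 0x00, 0x05]), bytes([0x10, 0x00, 0x07]))]

table_operand = operand_offsets(fourth_script_pairs)    # symbol table operand
variable_operand = operand_offsets(fifth_script_pairs)  # variable operand
```

Running the same comparison on two differently constructed traces is what lets the two operand roles be told apart without knowing the instruction format in advance.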
  • Fig. 44 is a flowchart showing the processing procedure of the extraction process shown in Fig. 27.
  • the extraction unit 1231 receives the porting script as input (step S231).
  • the extraction unit 1231 executes the porting script while monitoring the writing and execution of the code cache, and performs a bytecode extraction process to extract the data written to the code cache as bytecode (step S232).
  • the extraction unit 1231 executes the porting script while monitoring the writing of the symbol table detected by the symbol table detection unit 1216, and performs a symbol table extraction process to extract the data written to the symbol table as a symbol table (step S233).
  • the extraction unit 1231 outputs the extracted bytecode and symbol table (step S234).
  • Fig. 45 is a flowchart showing the processing procedure of the bytecode extraction process shown in Fig. 44.
  • the extraction unit 1231 accepts the porting script (step S241).
  • the extraction unit 1231 executes the porting script (step S243) while monitoring the writing and execution of the code cache (step S242).
  • the extraction unit 1231 determines whether or not writing to the code cache has occurred during execution of the porting script (step S244). If writing to the code cache has occurred (step S244: Yes), the extraction unit 1231 extracts the written data as bytecode (step S245).
  • If no writing to the code cache has occurred (step S244: No), or after the processing of step S245, the extraction unit 1231 determines whether a subroutine in the written bytecode has been called during execution of the porting script (step S246).
  • If a subroutine in the written bytecode has been called (step S246: Yes), the extraction unit 1231 extracts the entry point of this subroutine (step S247).
  • If no subroutine in the written bytecode has been called (step S246: No), or after the processing of step S247, the extraction unit 1231 determines whether execution of the porting script has ended (step S248).
  • If execution of the porting script has not ended (step S248: No), the extraction unit 1231 continues execution of the porting script (step S250) and returns to the determination of step S244.
  • If execution of the porting script has ended (step S248: Yes), the extraction unit 1231 outputs the bytecode and entry points extracted in steps S245 and S247 (step S249).
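The loop of steps S244 to S249 can be sketched as follows. The sketch assumes that monitoring has already been reduced to a stream of events, each a tuple of kind, address, and written data; this event format, the address range of the code cache, and the name `extract_bytecode` are assumptions made for illustration only.

```python
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, int, bytes]  # (kind, address, written data); hypothetical format


def extract_bytecode(events: Iterable[Event],
                     cache_start: int,
                     cache_end: int) -> Tuple[Dict[int, bytes], List[int]]:
    """Collect code cache writes as bytecode and called addresses as entry points."""
    bytecode: Dict[int, bytes] = {}
    entry_points: List[int] = []
    for kind, address, data in events:
        if kind == "write" and cache_start <= address < cache_end:
            # Steps S244/S245: data written to the code cache is taken as bytecode.
            bytecode[address] = data
        elif kind == "call":
            # Steps S246/S247: a call that lands inside written bytecode
            # marks a subroutine entry point.
            inside = any(base <= address < base + len(chunk)
                         for base, chunk in bytecode.items())
            if inside and address not in entry_points:
                entry_points.append(address)
    # Step S249: output once the event stream (script execution) has ended.
    return bytecode, entry_points
```

The symbol table extraction process of Fig. 46 follows the same pattern with the monitored region set to the detected symbol table and without the entry point check.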
  • Fig. 46 is a flowchart showing the processing procedure of the symbol table extraction process shown in Fig. 44.
  • the extraction unit 1231 receives the porting script (step S261).
  • the extraction unit 1231 executes the porting script (step S263) while monitoring the writing of the symbol table (step S262).
  • the extraction unit 1231 determines whether or not writing has been performed to the symbol table during execution of the porting script (step S264). If writing has been performed to the symbol table (step S264: Yes), the extraction unit 1231 extracts the written data as the symbol table (step S265).
  • If no writing to the symbol table has occurred (step S264: No), or after the processing of step S265, the extraction unit 1231 determines whether execution of the porting script has ended (step S266).
  • If execution of the porting script has not ended (step S266: No), the extraction unit 1231 continues execution of the porting script (step S267) and returns to the determination of step S264.
  • If execution of the porting script has ended (step S266: Yes), the extraction unit 1231 outputs the symbol table extracted in step S265 (step S268).
  • Fig. 47 is a flowchart showing the processing procedure of the porting process shown in Fig. 27.
  • the porting unit 1232 receives the bytecode and symbol table extracted from the porting script in the extraction process (step S271).
  • the porting unit 1232 receives the script to be analyzed as input (step S272).
  • the porting unit 1232 executes the script to be analyzed (step S274) while monitoring the execution of the VM command based on the dispatcher (step S273).
  • the porting unit 1232 stops execution when the first VM command is executed (step S275).
  • the porting unit 1232 performs a bytecode porting process (step S276). In this process, the instrumentation bytecode is ported into the bytecode that the script engine generated from the script to be analyzed, within the memory space of the script engine, and execution is set to branch to a stub every time a process that satisfies the hook condition is executed.
  • the porting unit 1232 performs a symbol table porting process (step S277). In this process, the instrumentation symbol table is ported into the symbol table that the script engine generated from the script to be analyzed, within the memory space of the script engine.
  • Fig. 48 is a flowchart showing the processing procedure of the bytecode porting process shown in Fig. 47.
  • the porting unit 1232 accepts the bytecode extracted by executing the porting script (step S281).
  • the porting unit 1232 accepts the hook condition (step S282).
  • the porting unit 1232 maps the bytecode to the memory space of the script engine (step S283).
  • the porting unit 1232 then scans the bytecode (step S284) and determines whether or not there is a branch using the immediate/direct addressing method (step S285).
  • If there is no branch using the immediate/direct addressing method (step S285: No), or after the processing of step S286, the porting unit 1232 determines whether it has scanned up to the return VM instruction at the end of the bytecode (step S287).
  • If the return VM instruction has not been scanned (step S287: No), the porting unit 1232 continues scanning (step S288) and returns to the determination of step S285.
  • If the return VM instruction has been scanned (step S287: Yes), the porting unit 1232 reserves an area for the stub in the memory space of the script engine (step S289).
  • the porting unit 1232 adds instrumentation processing by bytecode porting to the stub (step S290).
  • the instrumentation processing by bytecode porting is, for example, the processing shown in FIG. 3.
  • the porting unit 1232 sets the program to branch to the stub every time a process that satisfies the hook condition is executed (step S291), and ends the bytecode porting process.
  • Fig. 49 is a flowchart showing the processing procedure of the symbol table porting process shown in Fig. 47.
  • the porting unit 1232 accepts the symbol table extracted by executing the porting script (step S301).
  • the porting unit 1232 maps the extracted symbol table to the memory space of the script engine (step S302).
  • the porting unit 1232 adds a reference to the extracted symbol table to the symbol table of the script to be analyzed (step S303), and makes the symbol table of the script to be analyzed consistent with the extracted symbol table (step S304).
  • the porting unit 1232 scans the bytecode (step S305) and determines whether or not a symbol table VM command is present (step S306).
  • If there is a symbol table VM instruction (step S306: Yes), the porting unit 1232 updates its operands to match the symbol table that has been made consistent (step S307).
  • If there is no symbol table VM instruction (step S306: No), or after the processing of step S307, the porting unit 1232 determines whether it has scanned up to the return VM instruction at the end of the bytecode (step S308).
  • If the return VM instruction has not been scanned (step S308: No), the porting unit 1232 continues scanning (step S309) and returns to the determination of step S306.
  • If the return VM instruction has been scanned (step S308: Yes), the porting unit 1232 ends the symbol table porting process.
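Steps S303 to S308 can be sketched as follows. The sketch models a symbol table as a list of names indexed by position and bytecode as a list of (opcode, operand) pairs, and it models "ensuring consistency" as merging the two tables and recording how each extracted index maps to the merged one. These representations, the opcode names, and the functions `port_symbol_table` and `rewrite_operands` are assumptions for illustration, not the embodiment's actual data layout.

```python
from typing import Dict, List, Set, Tuple

Instruction = Tuple[str, int]  # (opcode, operand); hypothetical encoding


def port_symbol_table(target: List[str],
                      extracted: List[str]) -> Tuple[List[str], Dict[int, int]]:
    """Merge the extracted table into the target table and return an index remap."""
    merged = list(target)
    remap: Dict[int, int] = {}
    for old_index, name in enumerate(extracted):
        if name in merged:
            remap[old_index] = merged.index(name)  # symbol already present
        else:
            remap[old_index] = len(merged)         # append as a new entry
            merged.append(name)
    return merged, remap


def rewrite_operands(bytecode: List[Instruction],
                     symbol_opcodes: Set[str],
                     remap: Dict[int, int],
                     return_opcode: str = "RET") -> List[Instruction]:
    """Scan up to the return instruction and update symbol table operands."""
    rewritten: List[Instruction] = []
    for opcode, operand in bytecode:
        if opcode in symbol_opcodes:       # steps S306/S307
            operand = remap[operand]
        rewritten.append((opcode, operand))
        if opcode == return_opcode:        # step S308: end of the scan
            break
    return rewritten
```

Only the instructions previously determined to be symbol table VM instructions are rewritten, so operands of unrelated instructions that happen to hold the same numeric value are left alone.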
  • Fig. 50 is a flowchart showing the processing procedure of the execution process shown in Fig. 27.
  • the execution unit 1233 accepts the hook setting as input (step S311) and sets the execution to transition to the stub each time a process that satisfies the hook condition described in the hook setting is executed (step S312).
  • the execution unit 1233 resumes execution of the script to be analyzed (step S313) and determines whether execution of the script to be analyzed has ended (step S314).
  • If execution of the script to be analyzed has not ended (step S314: No), the execution unit 1233 continues the execution (step S315) and returns to the determination of step S314.
  • If execution of the script to be analyzed has ended (step S314: Yes), the execution unit 1233 outputs the execution result (step S316) and ends the execution process.
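The effect of the hook setting in steps S312 to S316 can be illustrated with a toy stack VM. The three-opcode instruction set, the use of a Python callable in place of a ported bytecode stub, and the name `run_with_hooks` are all assumptions of this sketch; an actual script engine would branch into the stub area reserved in its own memory space.

```python
from typing import Callable, Dict, List, Tuple

Instruction = Tuple[str, int]                       # (opcode, operand)
Stub = Callable[[int, str, int, List[int]], object]  # stands in for stub bytecode


def run_with_hooks(program: List[Instruction],
                   hooks: Dict[str, Stub]) -> Tuple[List[int], List[object]]:
    """Execute a toy program, branching to a stub whenever a hooked opcode runs."""
    stack: List[int] = []
    log: List[object] = []
    pc = 0
    while pc < len(program):
        opcode, operand = program[pc]
        stub = hooks.get(opcode)
        if stub is not None:
            # Hook condition satisfied: run the instrumentation first,
            # passing a copy of the stack so the stub cannot alter VM state.
            log.append(stub(pc, opcode, operand, list(stack)))
        if opcode == "PUSH":
            stack.append(operand)
        elif opcode == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "RET":
            break
        pc += 1
    return stack, log


program = [("PUSH", 2), ("PUSH", 3), ("ADD", 0), ("RET", 0)]
hooks = {"ADD": lambda pc, opcode, operand, stack: (pc, tuple(stack))}
result, observed = run_with_hooks(program, hooks)
```

Here the hook condition is simply "the opcode is ADD", and the stub records the program counter and the operand stack at that moment, which is the kind of fine-grained observation the instrumentation is meant to enable.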
  • the virtual machine analysis unit 121 analyzes the VM of the script engine, acquires information on the architecture of the script engine, and acquires architecture information of the symbol table based on the analysis result.
  • the instruction set architecture analysis unit 122 analyzes the instruction set architecture, which is the system of instructions for the VM, collects VM instructions, and analyzes the VM instructions.
  • the analysis function adding device 10 executes a porting script to extract instrumentation bytecode and symbol table, and ports them to the bytecode and symbol table generated by the script engine in the memory space of the script engine based on the script to be analyzed when the script to be analyzed is executed.
  • the analysis function providing device 10 detects various pieces of architecture information through analysis based on acquired execution traces and VM execution traces. It can therefore provide dynamic bytecode instrumentation functionality even to script engines whose internal specifications are unknown and that lack support functions such as a debugger, without the need for individual manual analysis, design, and implementation.
  • the analysis function providing device 10 also ports the instrumentation bytecode and symbol table extracted by executing the porting script into the bytecode and symbol table that the script engine generated from the script to be analyzed, within the memory space of the script engine. This makes it possible to analyze the script executed on the script engine.
  • the analysis function adding device 10 can automatically add dynamic bytecode instrumentation functionality to a variety of script engines as long as test scripts are prepared, so dynamic bytecode instrumentation can be realized without individual design or implementation. This makes dynamic bytecode instrumentation possible for scripts written in various script languages and enables fine-grained analysis.
  • the analysis function adding device 10 of this embodiment analyzes the script engine and adds a function to realize dynamic bytecode instrumentation later, so that dynamic bytecode instrumentation function can be automatically added to script engines of a wide variety of script languages.
  • this embodiment is useful for analyzing scripts written in a wide variety of scripting languages, and is suitable for implementing dynamic bytecode instrumentation even for scripts for which it is difficult to implement dynamic bytecode instrumentation due to the absence of support functions such as a debugger or unknown internal specifications of the VM.
  • By using the analysis function providing device, analysis function providing method, and analysis function providing program according to this embodiment to give various script engines dynamic bytecode instrumentation functionality, it is possible to analyze script behavior in detail and to use the results for understanding the behavior of malicious scripts in cybersecurity and for debugging in software development.
  • Each component of the analysis function-imparting device 10 shown in Fig. 4 is a functional concept, and does not necessarily have to be physically configured as shown in the figure.
  • the specific form of distribution and integration of the functions of the analysis function-imparting device 10 is not limited to that shown in the figure, and all or part of it can be functionally or physically distributed or integrated in any unit depending on various loads, usage conditions, etc.
  • each process performed by the analysis function-imparting device 10 may be realized, in whole or in part, by a CPU and a program that is analyzed and executed by the CPU. Furthermore, each process performed by the analysis function-imparting device 10 may be realized as hardware using wired logic.
  • [Program] Fig. 51 is a diagram showing an example of a computer in which a program is executed to realize the analysis function imparting device 10.
  • the computer 1000 has, for example, a memory 1010 and a CPU 1020.
  • the computer 1000 also has a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These components are connected by a bus 1080.
  • the memory 1010 includes a ROM 1011 and a RAM 1012.
  • the ROM 1011 stores a boot program such as a BIOS (Basic Input Output System).
  • the hard disk drive interface 1030 is connected to a hard disk drive 1090.
  • the disk drive interface 1040 is connected to a disk drive 1100.
  • a removable storage medium such as a magnetic disk or optical disk is inserted into the disk drive 1100.
  • the serial port interface 1050 is connected to a mouse 1110 and a keyboard 1120, for example.
  • the video adapter 1060 is connected to a display 1130, for example.
  • the hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the programs that define each process of the analysis function-imparting device 10 are implemented as program modules 1093 in which code executable by the computer 1000 is written.
  • the program modules 1093 are stored, for example, in the hard disk drive 1090.
  • a program module 1093 for executing processes similar to the functional configuration of the analysis function-imparting device 10 is stored in the hard disk drive 1090.
  • the hard disk drive 1090 may be replaced by an SSD (Solid State Drive).
  • the setting data used in the processing of the above-mentioned embodiment is stored as program data 1094, for example, in memory 1010 or hard disk drive 1090.
  • the CPU 1020 reads the program module 1093 or program data 1094 stored in memory 1010 or hard disk drive 1090 into RAM 1012 as necessary and executes it.
  • the program module 1093 and program data 1094 may not necessarily be stored in the hard disk drive 1090, but may be stored in a removable storage medium, for example, and read by the CPU 1020 via the disk drive 1100 or the like.
  • the program module 1093 and program data 1094 may be stored in another computer connected via a network such as a LAN (Local Area Network) or a WAN (Wide Area Network).
  • the program module 1093 and program data 1094 may then be read by the CPU 1020 from the other computer via the network interface 1070.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Health & Medical Sciences (AREA)
  • Debugging And Monitoring (AREA)
PCT/JP2023/015094 2023-04-13 2023-04-13 解析機能付与装置、解析機能付与方法及び解析機能付与プログラム Ceased WO2024214263A1 (ja)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2025513734A JPWO2024214263A1 2023-04-13 2023-04-13
PCT/JP2023/015094 WO2024214263A1 (ja) 2023-04-13 2023-04-13 解析機能付与装置、解析機能付与方法及び解析機能付与プログラム

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2023/015094 WO2024214263A1 (ja) 2023-04-13 2023-04-13 解析機能付与装置、解析機能付与方法及び解析機能付与プログラム

Publications (1)

Publication Number Publication Date
WO2024214263A1 (ja) 2024-10-17

Family

ID=93058921

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/015094 Ceased WO2024214263A1 (ja) 2023-04-13 2023-04-13 解析機能付与装置、解析機能付与方法及び解析機能付与プログラム

Country Status (2)

Country Link
JP (1) JPWO2024214263A1
WO (1) WO2024214263A1

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7843957B1 (ja) * 2025-09-26 2026-04-10 三菱電機株式会社 制御システム

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009521737A (ja) * 2005-11-10 2009-06-04 株式会社エヌ・ティ・ティ・ドコモ Javascriptプログラムの不安全動作安を検出するため、及び防止するための方法及び装置
US20170102981A1 (en) * 2015-10-13 2017-04-13 International Business Machines Corporation Dynamic instrumentation based on detected errors
WO2021070393A1 (ja) * 2019-10-11 2021-04-15 日本電信電話株式会社 解析機能付与装置、解析機能付与方法及び解析機能付与プログラム
WO2022180702A1 (ja) * 2021-02-24 2022-09-01 日本電信電話株式会社 解析機能付与装置、解析機能付与プログラム及び解析機能付与方法

Also Published As

Publication number Publication date
JPWO2024214263A1 2024-10-17

Similar Documents

Publication Publication Date Title
Sun et al. {KSG}: Augmenting kernel fuzzing with system call specification generation
JP7517585B2 (ja) 解析機能付与装置、解析機能付与プログラム及び解析機能付与方法
US7409717B1 (en) Metamorphic computer virus detection
Brooks Survey of automated vulnerability detection and exploit generation techniques in cyber reasoning systems
Zhang et al. Intelligen: Automatic driver synthesis for fuzz testing
Magill et al. Automating object transformations for dynamic software updating
US11989291B2 (en) System, method, and apparatus for software verification
US20040088690A1 (en) Method for accelerating a computer application by recompilation and hardware customization
Cheng et al. VERI: A large-scale open-source components vulnerability detection in IoT firmware
WO2023067668A1 (ja) 解析機能付与方法、解析機能付与装置及び解析機能付与プログラム
WO2024214263A1 (ja) 解析機能付与装置、解析機能付与方法及び解析機能付与プログラム
US11868465B2 (en) Binary image stack cookie protection
US11886589B2 (en) Process wrapping method for evading anti-analysis of native codes, recording medium and device for performing the method
JP7838662B2 (ja) 脆弱性発見装置、脆弱性発見方法及び脆弱性発見プログラム
Song et al. An empirical study on the performance of EVMs and wasm vms for smart contract execution
WO2024214260A1 (ja) 解析装置、解析方法及び解析プログラム
WO2023067663A1 (ja) 解析機能付与方法、解析機能付与装置及び解析機能付与プログラム
WO2024214264A1 (ja) 解析装置、解析方法及び解析プログラム
WO2024214262A1 (ja) 解析機能付与装置、解析機能付与方法及び解析機能付与プログラム
Wang et al. Probebuilder: Uncovering opaque kernel data structures for automatic probe construction
Staniloiu et al. Safer Linux Kernel Modules Using the D Programming Language
Liu et al. Automated vulnerability detection in embedded devices
JP7794327B2 (ja) 解析機能付与装置、解析機能付与方法および解析機能付与プログラム
WO2024214261A1 (ja) 解析装置、解析方法及び解析プログラム
WO2024214265A1 (ja) 解析装置、解析方法及び解析プログラム

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23933039

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2025513734

Country of ref document: JP

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 2025513734

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 23933039

Country of ref document: EP

Kind code of ref document: A1