WO2024214262A1 - Analysis function providing device, analysis function providing method, and analysis function providing program - Google Patents

Analysis function providing device, analysis function providing method, and analysis function providing program

Info

Publication number
WO2024214262A1
Authority
WO
WIPO (PCT)
Prior art keywords
virtual machine
analysis
instruction
execution
unit
Legal status
Ceased
Application number
PCT/JP2023/015090
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
利宣 碓井
裕平 川古谷
誠 岩村
Current Assignee
NTT Inc
Original Assignee
Nippon Telegraph and Telephone Corp
Application filed by Nippon Telegraph and Telephone Corp
Priority to PCT/JP2023/015090
Priority to JP2025513733A
Publication of WO2024214262A1
Anticipated expiration
Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements

Definitions

  • The present invention relates to an analysis function providing device, an analysis function providing method, and an analysis function providing program.
  • Script analysis techniques are used for a variety of purposes, such as compiler optimization for just-in-time (JIT) compilation, software testing and debugging, fuzzing, and malware analysis.
  • Dynamic analysis is a technique that analyzes the behavior of a program by actually executing it and observing its behavior.
  • Static analysis is a technique that analyzes the functions of a program by interpreting its meaning without executing it.
  • Instrumentation is a technique for obtaining information about the execution state of a running program by adding code with analysis functions to the program to be analyzed and then executing it.
  • For example, instrumentation can insert logging code after every executed instruction to determine the total number of instructions executed.
  • Likewise, instrumentation can insert logging code after every executed branch to determine the control flow that was actually taken.
  • This type of instrumentation is an important technology that is widely used for software testing as well as cybersecurity purposes such as malware analysis and vulnerability detection.
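  • As a concrete illustration of counting executed instructions, the following minimal Python sketch uses CPython's own tracing hook (`sys.settrace` with `frame.f_trace_opcodes`) in place of inserted logging code. It is an analogy on a VM whose internals are already known, not the method of this invention; `count_executed_opcodes` and `loop_sum` are hypothetical names.

```python
import sys


def count_executed_opcodes(func, *args):
    """Call func(*args) and count the bytecode instructions executed in func's frame.

    CPython's tracing hook stands in for the inserted logging code: with
    frame.f_trace_opcodes enabled, the tracer runs once per instruction.
    Calls into other Python functions are deliberately not counted.
    """
    executed = 0

    def tracer(frame, event, arg):
        nonlocal executed
        if event == "call":
            if frame.f_code is not func.__code__:
                return None                  # leave unrelated frames untraced
            frame.f_trace_opcodes = True     # request per-instruction events
            frame.f_trace_lines = False      # line events are not needed
            return tracer
        if event == "opcode":
            executed += 1                    # the "logging code" for one instruction
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)               # always remove the instrumentation
    return result, executed


def loop_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total
```

Calling `count_executed_opcodes(loop_sum, 20)` reports more instructions than `count_executed_opcodes(loop_sum, 10)` because the loop body runs more often; the exact counts depend on the CPython version.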
  • Instrumentation can be divided into dynamic instrumentation and static instrumentation.
  • Dynamic instrumentation is a technique for dynamically adding code for analysis by using techniques that dynamically change the behavior of a program as it is executed.
  • Static instrumentation is a technique for statically adding code for analysis by using techniques that rewrite the program before it is executed.
  • Instrumentation also covers a wide range of targets, including source code, scripts, executable binaries (hereafter referred to as binaries), and bytecode.
  • Dynamic binary instrumentation is a technique for obtaining information about the execution state by dynamically adding analysis code to the binary program being analyzed at runtime.
  • The injection of analysis code is mainly achieved by using hooks.
  • The process for injecting analysis code is as follows. First, the code to be injected is placed in memory. Next, a hook is placed on a specific, predetermined instruction or function so that, when it is executed, control branches to the injected code. The injected code is then executed. At the end of the injected code, execution returns to the branch source and the original processing resumes.
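  • The hook sequence above can be sketched at the function level in Python. This is a simplified analogue, assuming that replacing a function reference is an acceptable stand-in for patching a branch into machine code; `install_hook`, `engine`, and `open_file` are hypothetical.

```python
import functools
from types import SimpleNamespace


def install_hook(target, name, analysis_code):
    """Hook target.<name> so that analysis_code runs before the original.

    Returns the original callable so the hook can later be removed with
    setattr(target, name, original).
    """
    original = getattr(target, name)

    @functools.wraps(original)
    def trampoline(*args, **kwargs):
        analysis_code(name, args)           # control branches to the injected code
        return original(*args, **kwargs)    # then returns to the original processing

    setattr(target, name, trampoline)
    return original


# Usage: observe calls to a (hypothetical) engine API without changing its result.
calls = []
engine = SimpleNamespace(open_file=lambda path: f"opened {path}")
install_hook(engine, "open_file", lambda name, args: calls.append((name, args)))
result = engine.open_file("/tmp/a.txt")
```

After the call, `result` is unchanged (`"opened /tmp/a.txt"`), while `calls` holds the logged invocation, which mirrors how injected code observes execution and then hands control back.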
  • [Script execution method and analysis] Scripts are executed by a script engine (also called an interpreter).
  • A script input to a script engine is generally converted into bytecode at execution time, and the bytecode is interpreted and executed by a VM. For this reason, it is common to analyze a script statically before execution and to analyze the bytecode dynamically at execution time.
  • In particular, when analyzing obfuscated scripts, the bytecode must be the analysis target. Obfuscation is a technique that hampers analysis by making the script difficult to interpret. For example, in one type of obfuscation the script functions only as a bytecode loader, dynamically retrieving and executing precompiled bytecode at run time.
  • Dynamic instrumentation of bytecode, i.e. dynamic bytecode instrumentation, is therefore an important technology for achieving script analysis through instrumentation.
  • Bytecode is an intermediate representation converted from a script, and is composed of a set of instructions that the VM can interpret and execute (VM instructions). Each VM instruction is responsible for a small unit of operation to realize the function of the script, and the operations include, for example, arithmetic operations, logical operations, data transfer, comparison, and branching.
  • A VM instruction consists of an opcode (called a VM opcode) that indicates which operation it performs, and an operand (called a VM operand) that is the target of the operation.
  • A VM instruction takes the objects to be operated on as input and outputs the result of the operation as necessary.
  • For example, an addition VM instruction takes an augend and an addend as input and outputs their sum.
  • Input and output for VM instructions are generally passed via data areas called virtual stacks or virtual registers.
  • References to constants and variables are made via a data structure called a symbol table.
  • The instruction system formed by these VM instructions is the VM's instruction set architecture (ISA).
  • For example, a conditional branch in a script is realized by a set of VM instructions.
  • Understanding the conditional expressions related to the branch is important for understanding the behavior, and such analysis is indispensable. Therefore, analysis of the bytecode at the VM instruction level is important.
  • Some script engines provide analysis support functions such as debuggers and bytecode disassemblers, which can be used to perform the analysis.
  • However, script debuggers generally provide analysis at the level of the script's expressions or statements, and often lack functionality for analysis at the VM instruction level.
  • Knowledge of the ISA is essential for constructing basic analysis tools such as bytecode disassemblers and debuggers. The results of ISA analysis are also used when constructing advanced analysis engines for dynamic bytecode instrumentation, symbolic execution, and taint analysis.
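  • The dependence of a disassembler on the ISA can be shown with a small Python sketch. The two-byte encoding and the opcode table `TOY_ISA` are hypothetical; with an empty or wrong table, every instruction decodes as `UNKNOWN_...`, which is the situation an analyst faces when the ISA is undocumented.

```python
from typing import Dict, List, Tuple

# Hypothetical ISA for illustration only: every VM instruction is two bytes,
# a 1-byte VM opcode followed by a 1-byte VM operand.
TOY_ISA: Dict[int, str] = {
    0x01: "LOAD_CONST",
    0x02: "LOAD_VAR",
    0x03: "STORE_VAR",
    0x10: "ADD",
    0x11: "CMP_EQ",
    0x20: "JUMP_IF_FALSE",
}


def disassemble(bytecode: bytes, isa: Dict[int, str] = TOY_ISA) -> List[Tuple[int, str, int]]:
    """Decode bytecode into (offset, mnemonic, operand) triples using an ISA table."""
    listing = []
    for offset in range(0, len(bytecode) - 1, 2):   # a trailing odd byte is ignored
        opcode, operand = bytecode[offset], bytecode[offset + 1]
        mnemonic = isa.get(opcode, f"UNKNOWN_{opcode:#04x}")
        listing.append((offset, mnemonic, operand))
    return listing


# c = a + b, with operands indexing a symbol table ["a", "b", "c"]
listing = disassemble(bytes([0x02, 0, 0x02, 1, 0x10, 0, 0x03, 2]))
# [(0, 'LOAD_VAR', 0), (2, 'LOAD_VAR', 1), (4, 'ADD', 0), (6, 'STORE_VAR', 2)]
```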
  • For many script engines, however, the internal specifications, including the ISA, are not publicly available.
  • Moreover, when the source code is not available, the ISA cannot be understood without reverse engineering.
  • A method has been proposed to realize dynamic instrumentation for Java bytecode (Non-Patent Document 1), and another to realize dynamic instrumentation for ActionScript3 bytecode (Non-Patent Document 2).
  • However, Non-Patent Documents 1 and 2 only realize dynamic bytecode instrumentation for VMs with known ISAs. Achieving the same for a variety of scripting languages therefore required analyzing each VM and designing and implementing an instrumentation engine individually.
  • The present invention has been made in consideration of the above, and aims to provide an analysis function providing device, an analysis function providing method, and an analysis function providing program that can provide dynamic bytecode instrumentation functionality to script engines that lack support functions such as a debugger and whose internal specifications, including the ISA, are unknown, without the need for manual individual analysis, design, and implementation.
  • To solve the above problem, the analysis function providing device of the present invention is characterized by having: a first analysis unit that analyzes the virtual machine of the script engine based on a first execution trace obtained by executing a first test script while monitoring the binary of the script engine; a second analysis unit that analyzes the instruction set architecture, which is the instruction system of the virtual machine, by collecting virtual machine instructions, determining the operation of the collected virtual machine instructions, and detecting the input and output of the virtual machine instructions using a second test script; and an adding unit that, based on the analysis results of the virtual machine and of the instruction set architecture, adds to the binary of the script engine an analysis function that monitors the execution of the virtual machine instructions and performs analysis processing according to their input and output values.
  • The present invention makes it possible to provide dynamic bytecode instrumentation functionality to script engines that lack support functions such as a debugger and whose internal specifications, including the ISA, are unknown, without the need for manual individual analysis, design, and implementation.
  • FIG. 1 is a diagram for explaining an example of the configuration of a malicious script to be analyzed.
  • FIG. 2 is a diagram illustrating an example of the configuration of a script engine.
  • FIG. 3 is a diagram showing pseudo code of a VM included in the script engine.
  • FIG. 4 illustrates an example dynamic bytecode instrumentation engine.
  • FIG. 5 is a diagram illustrating an example of a configuration of an analysis function providing device according to an embodiment.
  • FIG. 6 is a diagram showing an example of a first test script used for detecting a virtual program counter (VPC).
  • FIG. 7 is a diagram showing an example of the second test script.
  • FIG. 8 is a diagram showing an example of the second test script.
  • FIG. 9 is a diagram showing an example of the second test script.
  • FIG. 10 is a diagram illustrating an example of an analysis function source code.
  • FIG. 11 is a diagram illustrating an example of an analysis function source code.
  • FIG. 12 is a diagram illustrating an example of an execution trace.
  • FIG. 13 illustrates an example of a VM execution trace.
  • FIG. 14 is a diagram showing an example of a data configuration of the analytical function information database (DB).
  • FIG. 15 is a diagram illustrating the process of the VM instruction boundary detection unit.
  • FIG. 16 is a diagram illustrating the process of the virtual program counter detection unit.
  • FIG. 17 is a diagram illustrating the process of the dispatcher detection unit.
  • FIG. 18 is a diagram illustrating the process of the VM instruction operation determination unit.
  • FIG. 19 is a flowchart illustrating a processing procedure of the analysis process according to the embodiment.
  • FIG. 20 is a flowchart illustrating the procedure of the first execution trace acquisition process shown in FIG.
  • FIG. 21 is a flowchart illustrating a processing procedure of the VM instruction boundary detection processing illustrated in FIG.
  • FIG. 22 is a flowchart showing the processing procedure of the virtual program counter detection processing shown in FIG.
  • FIG. 23 is a flowchart of the dispatcher detection process shown in FIG.
  • FIG. 24 is a diagram illustrating the VM execution trace acquisition process shown in FIG.
  • FIG. 25 is a flowchart illustrating the procedure of the VM instruction collection process illustrated in FIG. 19.
  • FIG. 26 is a flowchart illustrating the procedure of the VM instruction operation determination process shown in FIG. 19.
  • FIG. 27 is a flowchart illustrating the procedure of the VM instruction operation determination process illustrated in FIG.
  • FIG. 28 is a flowchart showing the procedure of the second execution trace acquisition process shown in FIG.
  • FIG. 29 is a flowchart illustrating the processing procedure of the VM instruction input/output detection process illustrated in FIG.
  • FIG. 30 is a flowchart illustrating the processing procedure of the analysis function generating process shown in FIG.
  • FIG. 31 is a flowchart showing the processing procedure of the analysis function adding process shown in FIG.
  • FIG. 32 is a flowchart showing the processing steps of the VM instruction input/output acquisition process.
  • FIG. 33 is a flowchart showing the procedure of the analysis process.
  • FIG. 34 is a diagram illustrating an example of a computer that realizes analysis by executing a program.
  • FIG. 1 is a diagram for explaining an example of the configuration of a malicious script to be analyzed.
  • The malicious script shown in FIG. 1 is an actual malicious script that was manually deobfuscated, then partially excerpted and formatted.
  • The malicious script shown in FIG. 1 has an anti-analysis mechanism that uses a conditional branch to obtain the system locale (line 1) and terminate execution if it is not a specific value (lines 2 and 3). Consequently, if the analysis environment does not meet this condition, the malicious behavior (line 4) cannot be observed. The behavior also depends on the locale (lines 4, 6, and 7), so if the execution path is forcibly changed and the script is executed in a locale that does not meet the condition, execution will not remain consistent and the script may exhibit behavior that would never actually occur. For example, it may open a file that does not exist or write to a file other than the intended one (line 7 onwards).
  • The analysis function providing device therefore first obtains the information about the script engine's instruction set architecture (ISA) that dynamic bytecode instrumentation requires, and then provides a dynamic bytecode instrumentation function to the script engine even though its internal specifications are unknown.
  • Dynamic bytecode instrumentation constantly monitors the execution of bytecode on the VM (virtual machine) and inserts analysis processing when a specific event occurs, such as basic block execution or a function call. At the start and end points of such events, a callback is made to a pre-registered analysis handler (called an analysis function) to observe the state before and after execution.
  • The analysis function is called with information about the execution state as its arguments. This makes it possible to observe behavior by logging that information, and to change behavior by rewriting the execution state.
  • In the example of FIG. 1, each VM instruction is monitored, and the values it operates on are passed as arguments when the analysis function is called back. This allows the analysis function to reconstruct the conditional expression from the operations on the loc variable and then rewrite loc to a value that satisfies that expression, thereby bypassing the check.
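  • The bypass described above can be sketched with a toy stack VM in Python. Everything here is hypothetical: the opcodes, the bytecode for the locale check, the placeholder locale `xx-XX`, and the callback signatures. Only `CMP_EQ` is instrumented, with an entry callback that may rewrite its inputs and an exit callback that may rewrite its output.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Instruction = Tuple[str, Any]
EnterFn = Callable[[str, List[Any]], List[Any]]   # (opcode, inputs) -> inputs
ExitFn = Callable[[str, Any], Any]                # (opcode, output) -> output


def run_vm(code: List[Instruction], variables: Dict[str, Any],
           enter: Optional[Dict[str, EnterFn]] = None,
           exit_: Optional[Dict[str, ExitFn]] = None) -> List[str]:
    """Toy stack VM whose CMP_EQ instruction calls back into analysis functions."""
    enter, exit_ = enter or {}, exit_ or {}
    stack: List[Any] = []
    events: List[str] = []
    vpc = 0
    while vpc < len(code):
        opcode, operand = code[vpc]
        vpc += 1
        if opcode == "LOAD_VAR":
            stack.append(variables[operand])
        elif opcode == "LOAD_CONST":
            stack.append(operand)
        elif opcode == "CMP_EQ":
            rhs = stack.pop()
            lhs = stack.pop()
            inputs = [lhs, rhs]
            if opcode in enter:
                inputs = enter[opcode](opcode, inputs)   # entry callback may rewrite inputs
            output = inputs[0] == inputs[1]
            if opcode in exit_:
                output = exit_[opcode](opcode, output)   # exit callback may rewrite output
            stack.append(output)
        elif opcode == "JUMP_IF_FALSE":
            if not stack.pop():
                vpc = operand
        elif opcode == "EMIT":
            events.append(operand)
        else:
            raise ValueError(f"unknown VM opcode {opcode!r}")
    return events


# Hypothetical bytecode for "if (loc != required) exit; <payload>".
LOCALE_CHECK: List[Instruction] = [
    ("LOAD_VAR", "loc"),
    ("LOAD_CONST", "xx-XX"),        # placeholder for the required locale
    ("CMP_EQ", None),
    ("JUMP_IF_FALSE", 5),           # jump past the end, i.e. terminate
    ("EMIT", "payload reached"),
]

observed = []


def bypass_locale_check(opcode, inputs):
    observed.append((opcode, list(inputs)))   # observe: log the compared values
    required = inputs[1]
    return [required, required]               # modify: make loc satisfy the condition


plain = run_vm(LOCALE_CHECK, {"loc": "en-US"})
hooked = run_vm(LOCALE_CHECK, {"loc": "en-US"}, enter={"CMP_EQ": bypass_locale_check})
```

Without the analysis function, `plain` is empty because the check fails and execution ends; with it, `hooked` reaches the payload and `observed` records the compared values `["en-US", "xx-XX"]`.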
  • Dynamic bytecode instrumentation is typically implemented as an additional feature of the script engine's VM.
  • A script engine generally parses the input script into an abstract syntax tree and generates bytecode from it. The generated bytecode is then interpreted and executed by the VM, which executes the script. Therefore, understanding the VM architecture is important for understanding and controlling the execution state of the script.
  • There are two main types of VMs for interpretive execution: the decode-dispatch type and the threaded code type.
  • In the following, the more common decode-dispatch type is used as an example. Note that this embodiment can handle both the decode-dispatch type and the threaded code type.
  • Figure 2 is a diagram for explaining an example of the configuration of a script engine.
  • The script engine 1 has a bytecode compiler 2 and a VM 3.
  • The bytecode compiler 2 has a syntax analysis unit 4 and a bytecode generation unit 5.
  • The VM 3 has a code cache unit 6, a fetch unit 7, a decode unit 8, and an execution unit 9.
  • The fetch unit 7, decode unit 8, and execution unit 9 are executed repeatedly; this repetition is called the interpreter loop.
  • The script engine 1 accepts a script as input.
  • The syntax analysis unit 4 receives the script as input, generates an abstract syntax tree (AST) through lexical and syntactic analysis, and outputs it to the bytecode generation unit 5.
  • The bytecode generation unit 5 receives the AST as input, converts it into bytecode, and stores the bytecode in the code cache unit 6.
  • The fetch unit 7 fetches the VM opcode from the code cache unit 6 and outputs it to the decode unit 8.
  • Here, the VM opcode refers to the opcode portion of a VM instruction.
  • The decode unit 8 receives the VM opcode as input, interprets it using a decoder/dispatcher, and dispatches it to the corresponding program (instruction handler).
  • The execution unit 9 executes the program corresponding to the VM instruction. The contents of the script are carried out by executing VM instructions one after another through the repeated interpreter loop.
  • FIG. 3 is a diagram showing pseudocode of the VM of the script engine.
  • The pseudocode first initializes the VPC (line 1).
  • The while loop is the interpreter loop (lines 2 to 7).
  • The fetch unit 7 obtains the VM opcode of the VM instruction at the position pointed to by the VPC in the code cache that holds the bytecode (line 3).
  • The decoder interprets the VM instruction using a switch statement (line 4), and the dispatcher calls the instruction handler corresponding to the VM opcode (line 5 onward).
  • The instruction handler performs the operation corresponding to the instruction. Input and output are performed using a virtual stack and virtual registers (line 6), and constants and variables are referenced via a symbol table (line 7).
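  • The interpreter loop of FIG. 3 can be sketched as a decode-dispatch VM in Python. The opcodes, the operand layout, and the class name are hypothetical; the pseudocode's switch statement is replaced here by a dictionary that maps each VM opcode to its instruction handler, so the dictionary plays the role of the dispatcher.

```python
from typing import Callable, Dict, List

PUSH_INT, LOAD_VAR, STORE_VAR, ADD = 0x01, 0x02, 0x03, 0x04   # hypothetical VM opcodes


class DecodeDispatchVM:
    """Toy decode-dispatch interpreter mirroring the structure of FIG. 3."""

    def __init__(self, code_cache: List[int], symbol_table: List[str]):
        self.code_cache = code_cache        # bytecode laid out as opcode, operand, ...
        self.symbol_table = symbol_table    # operand index -> variable name
        self.variables: Dict[str, int] = {}
        self.stack: List[int] = []          # virtual stack for instruction input/output
        self.vpc = 0                        # virtual program counter
        # Dispatcher: VM opcode -> instruction handler (stands in for the switch).
        self.handlers: Dict[int, Callable[[int], None]] = {
            PUSH_INT: self.op_push_int,
            LOAD_VAR: self.op_load_var,
            STORE_VAR: self.op_store_var,
            ADD: self.op_add,
        }

    def op_push_int(self, operand: int) -> None:
        self.stack.append(operand)

    def op_load_var(self, operand: int) -> None:
        self.stack.append(self.variables[self.symbol_table[operand]])

    def op_store_var(self, operand: int) -> None:
        self.variables[self.symbol_table[operand]] = self.stack.pop()

    def op_add(self, _operand: int) -> None:
        addend = self.stack.pop()
        augend = self.stack.pop()
        self.stack.append(augend + addend)

    def run(self) -> Dict[str, int]:
        self.vpc = 0                                      # initialize the VPC
        while self.vpc < len(self.code_cache):            # interpreter loop
            opcode = self.code_cache[self.vpc]            # fetch
            operand = self.code_cache[self.vpc + 1]
            self.vpc += 2
            handler = self.handlers.get(opcode)           # decode
            if handler is None:
                raise ValueError(f"unknown VM opcode {opcode:#x}")
            handler(operand)                              # dispatch and execute
        return self.variables


# a = 0x12345678; b = 0x12345679; c = a + b
program = [PUSH_INT, 0x12345678, STORE_VAR, 0,
           PUSH_INT, 0x12345679, STORE_VAR, 1,
           LOAD_VAR, 0, LOAD_VAR, 1, ADD, 0, STORE_VAR, 2]
result = DecodeDispatchVM(program, ["a", "b", "c"]).run()
```

After `run()`, `c` holds 0x2468ACF1. The VPC, the dispatcher table, and the handler boundaries in such a loop correspond to the architecture information the analysis function providing device has to recover from a binary without source code.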
  • The analysis function providing device first automatically analyzes the VM's architecture and ISA in order to add a dynamic bytecode instrumentation engine.
  • [Configuration of the analysis function providing device] FIG. 5 is a diagram illustrating an example of the configuration of the analysis function providing device 10 according to the embodiment.
  • The configuration of the analysis function providing device 10 according to the embodiment will be specifically described with reference to FIG. 5.
  • The analysis function providing device 10 executes the first test script while monitoring the binary of the script engine, and obtains a branch trace and a memory access trace as a first execution trace.
  • The analysis function providing device 10 analyzes the virtual machine (VM) based on the first execution trace, and obtains the VM instruction boundaries, the virtual program counter (VPC), and the dispatcher as architecture information.
  • The analysis function providing device 10 executes the second test script while monitoring the VPC and the dispatcher to obtain a first VM execution trace.
  • The analysis function providing device 10 analyzes the VM execution trace to collect VM instructions and determines the operation of each VM instruction using a predetermined algorithm.
  • Using the second test script and the script engine binary, the analysis function providing device 10 obtains a second execution trace that targets only the VM instructions subject to determination, and obtains ISA information by analyzing the second execution trace to detect the input and output of those VM instructions.
  • Based on the obtained VM architecture and ISA information, the analysis function providing device 10 adds a dynamic bytecode instrumentation engine (see FIG. 4) to the script engine binary.
  • As a result, the analysis function providing device 10 can add dynamic bytecode instrumentation functionality to script engines that lack support functions such as a debugger and whose internal specifications, including the ISA, are unknown, without manual individual analysis, design, and implementation.
  • The analysis function providing device 10 has an input unit 11, a control unit 12, a storage unit 13, and an output unit 14.
  • The analysis function providing device 10 accepts input of a test script, a script engine binary, and a seed script.
  • The input unit 11 is composed of input devices such as a keyboard and a mouse, and accepts information input from outside and passes it to the control unit 12.
  • The input unit 11 also has a communication interface for sending and receiving various information to and from other devices connected via a wired connection, a network, or the like, and accepts input of information sent from those devices.
  • The input unit 11 accepts input of test scripts, script engine binaries, and analysis function source code, and outputs them to the control unit 12.
  • A test script is a script that is input when dynamically analyzing the script engine to obtain an execution trace and a VM execution trace. Test scripts include a first test script for VM analysis and a second test script for ISA analysis.
  • The script engine binary is an executable file that constitutes the script engine.
  • The script engine binary may be composed of multiple executable files.
  • The analysis function source code is source code that contains analysis functions.
  • [Test script configuration] Test scripts are described below.
  • A test script is a script that is input when dynamically analyzing a script engine. It focuses on the number of branch instruction executions and on memory reads and writes, and is used to capture differences in the script engine's behavior that arise when execution conditions, such as the number of repetitions, are varied. Test scripts are prepared manually before the analysis, which requires knowledge of the specifications of the target script language.
  • FIG. 6 shows an example of a first test script used to detect VPCs.
  • The first test script uses a repetitive process (line 2).
  • The first test script changes the execution conditions, and thereby produces differences, by increasing or decreasing the number of repetitions (line 2) and the number of repeated statements (lines 3 to 5).
  • the second test script is intended to execute the VM instructions to be analyzed for ISA analysis. However, what instructions the VM has is unknown at the stage of creating the second test script. For this reason, a second test script is created for each operation provided by the scripting language. When the second test script is executed, the VM instruction corresponding to this operation is determined. Examples of operations include arithmetic operations, logical operations, comparisons, and manipulation of variables and constants.
  • FIG. 7 shows an example of a second test script.
  • the second test script shown in FIG. 7 was created to test an addition operation.
  • The second test script is created according to the following criteria. First, write only the minimum expressions and statements related to the operation to be determined (addition in the example of FIG. 7). Second, use distinctive values that are easy to match as the operands; if there are multiple values, use different values that differ only slightly ("0x12345678" and "0x12345679" in the example of FIG. 7). Third, do not operate on constants directly; use one or more variables instead.
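  • The value of the second and third criteria can be shown with a minimal Python sketch that matches a second execution trace against the values in FIG. 7. It assumes, hypothetically, that the engine stores the operands as raw 32-bit integers and that the trace is a list of (kind, address, value) tuples; real engines often box or tag values, which a matcher would have to account for. Because the two operands differ by one, reads of each are distinguishable, and the expected sum 0x12345678 + 0x12345679 = 0x2468ACF1 identifies the output write. The function name `match_add_io` is illustrative.

```python
from typing import Dict, Iterable, List, Tuple

MemoryAccess = Tuple[str, int, int]       # (read/write, target address, value)
AUGEND, ADDEND = 0x12345678, 0x12345679   # distinctive values from the test script


def match_add_io(trace: Iterable[MemoryAccess]) -> Dict[str, List[int]]:
    """Locate candidate input and output locations of an addition VM instruction."""
    expected_sum = (AUGEND + ADDEND) & 0xFFFFFFFF      # 0x2468ACF1
    found: Dict[str, List[int]] = {"inputs": [], "outputs": []}
    for kind, target, value in trace:
        if kind == "read" and value in (AUGEND, ADDEND):
            found["inputs"].append(target)
        elif kind == "write" and value == expected_sum:
            found["outputs"].append(target)
    return found
```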
  • FIGS. 8 and 9 are diagrams showing other examples of the second test script.
  • The second test script in FIG. 8 was created to determine an equality comparison operation.
  • The second test script in FIG. 9 was created to determine a local variable store operation.
  • The analysis function source code is source code that contains analysis functions.
  • FIGS. 10 and 11 are diagrams showing examples of the analysis function source code.
  • The analysis function source code shown in FIG. 10 contains an analysis function named analysis_func_VMOP1_enter.
  • The VM instruction to be monitored is "VMOP1", and this analysis function is injected at its entry (box W1).
  • This analysis function takes as arguments the VM opcode to be monitored and the input values of the monitored VM instruction (boxes W2 and W3), and includes processing to record and rewrite those input values.
  • The analysis function source code shown in FIG. 11 contains an analysis function named analysis_func_VMOP1_exit.
  • The VM instruction to be monitored is "VMOP1", and this analysis function is injected at its exit (box W11).
  • This analysis function takes as arguments the VM opcode to be monitored and the output value of the monitored VM instruction (box W13), and includes processing to record and rewrite that output value.
  • The storage unit 13 is realized by a semiconductor memory element such as RAM (random access memory) or flash memory, or by a storage device such as a hard disk or an optical disk, and stores the processing program that operates the analysis function providing device 10, data used during execution of the processing program, and the like.
  • The storage unit 13 has an execution trace DB 131, an architecture information DB 132 that stores the architecture information acquired by the virtual machine analysis unit 121 and the instruction set architecture analysis unit 122, a VM execution trace DB 133, and an analysis function information DB 134.
  • The execution trace DB 131 stores the execution traces acquired by the first execution trace acquisition unit 1211 and the second execution trace acquisition unit 1224 (both described later), and the VM execution trace DB 133 stores the VM execution traces acquired by the VM execution trace acquisition unit 1221 (described later).
  • The execution trace DB 131 and the VM execution trace DB 133 are managed by the analysis function providing device 10.
  • Alternatively, the execution trace DB 131 and the VM execution trace DB 133 may be managed by another device (such as a server). In that case, the first execution trace acquisition unit 1211, the second execution trace acquisition unit 1224, and the VM execution trace acquisition unit 1221 output the acquired execution traces and VM execution traces via the communication interface of the output unit 14 to the management server of the execution trace DB 131 and the VM execution trace DB 133, which stores them in those DBs.
  • Fig. 12 is a diagram showing an example of an execution trace. As described above, an execution trace is composed of a branch trace and a memory access trace. Fig. 12 shows an excerpt of an execution trace. The configuration of an execution trace will be described below with reference to Fig. 12.
  • The Trace field indicates whether the log line belongs to the branch trace or the memory access trace.
  • A branch trace log line has the format shown, for example, in lines 1 to 10 of FIG. 12, and consists of three elements: type, src, and dst.
  • type indicates whether the executed branch instruction was a call, jmp, or ret instruction.
  • src indicates the address of the branch source, and dst indicates the address of the branch destination.
  • A memory access trace log line has the format shown, for example, in lines 11 to 13 of FIG. 12, and consists of three elements: type, target, and value.
  • type indicates whether the memory access is a read or a write.
  • target indicates the memory address accessed, and value stores the result of the memory access.
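  • A parser for such log lines could look like the following Python sketch. The exact textual layout of FIG. 12 is not reproduced here, so the line format `<trace> <type> <hex> <hex>`, with `branch` or `mem` as the trace field, is an assumption, as are the class and function names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Union


@dataclass(frozen=True)
class BranchRecord:
    kind: str   # "call", "jmp" or "ret"
    src: int    # branch source address
    dst: int    # branch destination address


@dataclass(frozen=True)
class MemoryRecord:
    kind: str    # "read" or "write"
    target: int  # accessed memory address
    value: int   # value read or written


def parse_execution_trace(lines: Iterable[str]) -> List[Union[BranchRecord, MemoryRecord]]:
    """Parse lines of the assumed form '<trace> <type> <hex> <hex>'."""
    records: List[Union[BranchRecord, MemoryRecord]] = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue                                  # skip blank lines
        trace, kind, first, second = fields           # ValueError if malformed
        a, b = int(first, 16), int(second, 16)        # accepts an optional 0x prefix
        if trace == "branch" and kind in ("call", "jmp", "ret"):
            records.append(BranchRecord(kind, a, b))
        elif trace == "mem" and kind in ("read", "write"):
            records.append(MemoryRecord(kind, a, b))
        else:
            raise ValueError(f"unrecognized trace line: {line!r}")
    return records
```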
  • Fig. 13 is a diagram showing an example of a VM execution trace.
  • A VM execution trace is a record of VM opcodes and VPC values.
  • FIG. 13 shows part of a VM execution trace. The configuration of a VM execution trace is described below with reference to FIG. 13.
  • A VM execution trace log line has, for example, the format shown in FIG. 13 and consists of two elements: vpc and vmop (VM opcode).
  • vpc indicates the value of the VPC.
  • vmop indicates the VM opcode value obtained from the pointer cache, that is, a value virtually assigned to each pointer to the beginning of a VM instruction handler to be executed.
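  • The assignment of virtual opcodes can be sketched in Python as follows. It assumes, hypothetically, that each dispatch observed at the dispatcher yields the current VPC and the address of the handler it jumps to; numbering handler addresses in order of first appearance is one plausible way to assign the virtual values, and the function name is illustrative.

```python
from typing import Dict, Iterable, List, Tuple


def build_vm_execution_trace(
    dispatches: Iterable[Tuple[int, int]],
) -> Tuple[List[Tuple[int, int]], Dict[int, int]]:
    """Turn (vpc, handler_address) observations into (vpc, vmop) log lines.

    Each distinct handler address gets the next free virtual opcode number,
    in order of first appearance; the mapping plays the role of a pointer cache.
    """
    pointer_cache: Dict[int, int] = {}
    vm_trace: List[Tuple[int, int]] = []
    for vpc, handler_address in dispatches:
        # len(pointer_cache) is the next unused virtual opcode for a new handler
        vmop = pointer_cache.setdefault(handler_address, len(pointer_cache))
        vm_trace.append((vpc, vmop))
    return vm_trace, pointer_cache
```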
  • The analysis function information DB 134 stores the analysis functions generated by the analysis function generating unit 1231.
  • FIG. 14 is a diagram showing an example of the data configuration of the analysis function information DB 134.
  • The VM instruction item shown in FIG. 14 is an area that stores which VM instruction the analysis function is for.
  • Entry/Exit is an area that stores whether the analysis function is for the entry or the exit of the VM instruction.
  • Binary is an area that stores the compiled binary of the analysis function.
  • Entry point address is an area to which the address of the analysis function's entry point is added once the function has been loaded into memory during analysis with the generated product.
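  • The record layout of FIG. 14 can be mirrored by a small in-memory Python structure. The class names, the `"entry"`/`"exit"` strings, and the placeholder binary in the usage are hypothetical; the sketch only shows how the four items relate, with the entry point address filled in after loading.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class AnalysisFunctionRecord:
    vm_instruction: str                        # VM instruction the function is for
    position: str                              # "entry" or "exit"
    binary: bytes                              # compiled analysis function
    entry_point_address: Optional[int] = None  # filled in once loaded into memory


class AnalysisFunctionDB:
    """In-memory stand-in for the analysis function information DB."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[str, str], AnalysisFunctionRecord] = {}

    def add(self, record: AnalysisFunctionRecord) -> None:
        if record.position not in ("entry", "exit"):
            raise ValueError(f"position must be 'entry' or 'exit', got {record.position!r}")
        self._records[(record.vm_instruction, record.position)] = record

    def set_entry_point(self, vm_instruction: str, position: str, address: int) -> None:
        self._records[(vm_instruction, position)].entry_point_address = address

    def lookup(self, vm_instruction: str, position: str) -> Optional[AnalysisFunctionRecord]:
        return self._records.get((vm_instruction, position))
```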
  • the control unit 12 has an internal memory for storing programs that define various processing procedures and the like, and necessary data, and executes various processes using these.
  • the control unit 12 is an electronic circuit such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit).
  • the control unit 12 has a virtual machine analysis unit 121 (first analysis unit), an instruction set architecture analysis unit 122 (second analysis unit), and a dynamic bytecode instrumentation function addition unit 123 (addition unit).
  • the virtual machine analysis unit 121 analyzes the VM of the script engine.
  • the virtual machine analysis unit 121 obtains a plurality of first execution traces by changing the conditions at the time of execution, analyzes the plurality of first execution traces using differential execution analysis, and obtains the VPC.
  • the virtual machine analysis unit 121 also analyzes the script engine binary to obtain the boundaries and dispatchers of VM instructions.
  • the virtual machine analysis unit 121 has a first execution trace acquisition unit 1211 (first acquisition unit), a VM instruction boundary detection unit 1212 (first detection unit), a virtual program counter detection unit 1213 (second detection unit), and a dispatcher detection unit 1214 (third detection unit).
  • the first execution trace acquisition unit 1211 receives the first test script and the script engine binary as input.
  • the first execution trace acquisition unit 1211 executes the first test script while monitoring the execution of the script engine binary, thereby acquiring the first execution trace.
  • An execution trace is composed of a branch trace and a memory access trace.
  • a branch trace records the type of branch instruction at the time of execution, the branch source address, and the branch destination address.
  • a memory access trace records the type of memory operation at the time of execution (read/write), and the memory address and value of the operation target. It is known that branch traces and memory access traces can be obtained by hooking branch instructions and memory operation instructions, inserting log-output code, and executing the program.
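  • The following Python sketch shows one hypothetical way to hold the two parts of an execution trace emitted by such logging code. The class names, the `on_branch`/`on_memory` callbacks, and the `read_counts` helper are illustrative assumptions; the hooking itself (binary instrumentation of branch and memory instructions) is not shown.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass(frozen=True)
class BranchRecord:
    kind: str     # branch instruction type, e.g. "jmp" or "call" (hypothetical labels)
    source: int   # branch source address
    target: int   # branch destination address

@dataclass(frozen=True)
class MemoryAccessRecord:
    op: str       # "read" or "write"
    address: int  # memory address of the operation target
    value: int    # value read or written

@dataclass
class ExecutionTrace:
    branches: List[BranchRecord] = field(default_factory=list)
    memory: List[MemoryAccessRecord] = field(default_factory=list)

    # Hypothetical callbacks that the inserted logging code would invoke.
    def on_branch(self, kind: str, source: int, target: int) -> None:
        self.branches.append(BranchRecord(kind, source, target))

    def on_memory(self, op: str, address: int, value: int) -> None:
        if op not in ("read", "write"):
            raise ValueError(f"unexpected memory operation: {op}")
        self.memory.append(MemoryAccessRecord(op, address, value))

    def read_counts(self) -> Dict[int, int]:
        """Number of reads per memory address (the quantity used for VPC detection)."""
        return dict(Counter(r.address for r in self.memory if r.op == "read"))
```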
  • the first execution trace obtained by the first execution trace acquisition unit 1211 is stored in the execution trace DB 131.
  • the VM instruction boundary detection unit 1212 clusters the first execution trace to detect the boundaries of each VM instruction.
  • the VM instruction boundary detection unit 1212 clusters the first execution trace and detects clusters whose execution count is equal to or greater than a threshold as VM instructions. Clustering detects consecutive code regions that are executed multiple times. For example, executed instructions that are close to each other in the code may be grouped together, common subsequences of executed code blocks may be searched for, or other methods may be used.
  • the VM instruction boundary detection unit 1212 then detects the start and end points of the consecutive instruction sequence that makes up each detected VM instruction as its boundaries.
  • the VM instruction boundaries detected here are used in VPC detection and dispatcher detection.
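  • A minimal sketch of the simplest clustering option mentioned above (grouping executed instructions that are close together in the code), in Python. It assumes the trace has been reduced to a sequence of executed instruction addresses, that a handler runs as an ascending run of addresses with gaps of at most `max_gap` bytes, and that handlers are not laid out back to back; the function name and parameters are hypothetical. A backward jump inside a handler would split it, so a real implementation would likely also use the branch trace.

```python
from collections import Counter
from typing import List, Sequence, Tuple

def detect_vm_instruction_boundaries(
    executed: Sequence[int], threshold: int, max_gap: int = 8
) -> List[Tuple[int, int]]:
    """Cluster executed instruction addresses into contiguous regions and return
    the (start, end) boundaries of regions executed at least `threshold` times."""
    if not executed:
        return []
    region_counts: Counter = Counter()
    start = prev = executed[0]
    for addr in executed[1:]:
        if 0 < addr - prev <= max_gap:
            prev = addr                          # still inside the same contiguous region
            continue
        region_counts[(start, prev)] += 1        # region ended: record its boundaries
        start = prev = addr
    region_counts[(start, prev)] += 1            # close the final region
    return sorted(r for r, n in region_counts.items() if n >= threshold)
```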
  • the virtual program counter detection unit 1213 extracts and analyzes the first execution trace for the first test script stored in the execution trace DB 131, and detects the VPC.
  • the virtual program counter detection unit 1213 analyzes the multiple first execution traces using differential execution analysis focusing on the number of memory reads and the boundaries of each VM instruction detected by the VM instruction boundary detection unit 1212, and detects the VPC.
  • the virtual program counter detection unit 1213 relies on the fact that a read from the memory holding the VPC always occurs after each VM instruction is executed, and detects the VPC by identifying the target of this read.
  • the virtual program counter detection unit 1213 uses differential execution analysis focusing on the number of memory reads to detect VPCs.
  • the virtual program counter detection unit 1213 compares the first execution traces acquired with multiple first test scripts, and finds memory locations whose read counts change in proportion to increases or decreases in both the number of repetitions and the number of repeated statements.
  • the virtual program counter detection unit 1213 then refers to the boundaries of each VM instruction detected by the VM instruction boundary detection unit 1212, and narrows these locations down to those whose read values always point to the start point of a VM instruction.
  • the virtual program counter detection unit 1213 detects this memory as a VPC.
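  • The two filtering steps can be sketched in Python as follows, under stated assumptions: each first execution trace has already been summarized as (N repetitions, M repeated statements, reads per address); `read_values` collects every value read from each address; and "proportional to both" is taken to mean the read count scales with N·M within a relative `tolerance`. All names and the tolerance are hypothetical. Following the text literally, the second filter checks that every value read equals a detected VM instruction start address.

```python
from typing import Dict, List, Sequence, Set, Tuple

# (N repetitions, M repeated statements, reads per memory address)
TraceSummary = Tuple[int, int, Dict[int, int]]

def detect_vpc_candidates(
    summaries: Sequence[TraceSummary],
    read_values: Dict[int, Set[int]],
    vm_instruction_starts: Set[int],
    tolerance: float = 0.1,
) -> List[int]:
    """Return addresses whose read count scales with N*M and whose read values
    always point to the start of a VM instruction."""
    base_n, base_m, base_counts = summaries[0]
    candidates = []
    for addr, base in base_counts.items():
        if base == 0:
            continue
        # Differential execution analysis: compare against the N*M-scaled expectation.
        scales = True
        for n, m, counts in summaries[1:]:
            expected = base * (n * m) / (base_n * base_m)
            if abs(counts.get(addr, 0) - expected) > tolerance * expected:
                scales = False
                break
        # Narrow down using the VM instruction boundaries.
        values = read_values.get(addr, set())
        if scales and values and values <= vm_instruction_starts:
            candidates.append(addr)
    return sorted(candidates)
```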
  • the dispatcher detection unit 1214 extracts each VM instruction portion from the script engine binary based on the VM instruction boundaries detected by the VM instruction boundary detection unit 1212, and detects the portions that are highly similar across the VM instructions as the dispatcher.
  • the dispatcher is realized by referencing the pointer cache and jumping to the pointer of the next VM instruction handler.
  • Dispatchers are distributed across the tail end of each VM instruction handler, and their code is generally nearly identical.
  • the analysis function imparting device 10 therefore detects the dispatcher by searching, with a predetermined method, for highly similar code located at the tail end of the VM instruction handlers. To detect the highly similar portions, a sequence alignment algorithm or another method may be used, for example.
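  • As a simplified stand-in for the similarity search, the Python sketch below takes each handler as a list of disassembled instruction strings and returns their longest common suffix, i.e. code shared exactly by the tail of every handler. Exact suffix matching is a deliberate simplification swapped in for the sequence alignment the text mentions; the function name, `min_len`, and the x86-style sample instructions are hypothetical.

```python
from typing import List, Sequence

def detect_dispatcher(handlers: Sequence[Sequence[str]], min_len: int = 2) -> List[str]:
    """Return the instruction sequence shared by the tail of every handler,
    or [] if the shared tail is shorter than min_len."""
    if not handlers:
        return []
    shortest = min(len(h) for h in handlers)
    suffix_len = 0
    while suffix_len < shortest:
        probe = handlers[0][-1 - suffix_len]
        if all(h[-1 - suffix_len] == probe for h in handlers):
            suffix_len += 1          # this position still matches in every handler
        else:
            break
    if suffix_len < min_len:
        return []
    first = handlers[0]
    return list(first[len(first) - suffix_len:])
```

For example, three handlers that each end with `mov rdx, [r12]`, `add r12, 8`, `jmp rdx` (fetch the next handler pointer, advance, jump) yield those three instructions as the dispatcher.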
  • the instruction set architecture analysis unit 122 analyzes the instruction set architecture, which is the system of VM instructions.
  • the instruction set architecture analysis unit 122 collects VM instructions and judges the operation of the collected VM instructions.
  • the instruction set architecture analysis unit 122 uses a second test script to obtain a memory access trace that targets only the VM instruction to be judged as a second execution trace, and detects the input and output of the VM instruction by analyzing the second execution trace.
  • the instruction set architecture analysis unit 122 has a VM execution trace acquisition unit 1221 (second acquisition unit), a VM instruction collection unit 1222 (first collection unit), a VM instruction operation determination unit 1223 (first determination unit), a second execution trace acquisition unit 1224 (third acquisition unit), and a VM instruction input/output detection unit 1225 (fourth detection unit).
  • the VM execution trace acquisition unit 1221 receives as input a second test script (described later) that uses values characteristic of the operation target and a script engine binary.
  • the VM execution trace acquisition unit 1221 acquires a VM execution trace by monitoring the VPC and the pointer of the VM instruction handler dispatched by the dispatcher.
  • the VM execution trace acquisition unit 1221 acquires a VM execution trace, which is an execution trace executed on the VM, by executing the second test script while monitoring the execution of the script engine binary.
  • the VM execution trace acquisition unit 1221 executes a large number of test scripts to acquire a VM execution trace.
  • the VM execution trace acquisition unit 1221 links each VM instruction with the pointer to its handler, and virtually assigns a VM opcode to each linked pair as an identifier.
  • a VM execution trace is an execution trace executed in a VM, in which a VM opcode is virtually assigned as an identifier, and in which a pointer to the executed VM handler and a VPC are recorded.
  • a VM execution trace is a record of a pointer to an executed VM instruction handler and a VPC.
  • a VM execution trace is composed of a VPC and a VM opcode for each executed VM instruction.
  • the recording of a VPC can be achieved by monitoring the memory of the VPC detected by the virtual program counter detection unit 1213.
  • a VM opcode is an identifier virtually assigned to each linked pair of a VM instruction and the pointer to its handler.
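  • A minimal Python sketch of how a VM execution trace could be recorded, assuming a hook that fires at each dispatch with the current VPC value and the dispatched handler pointer. Virtual opcodes are numbered in order of first appearance; the class name, method name, and numbering scheme are hypothetical.

```python
from typing import Dict, List, Tuple

class VMExecutionTraceRecorder:
    """Records (VPC, virtual VM opcode) pairs each time the dispatcher fires."""

    def __init__(self) -> None:
        self.opcode_of: Dict[int, int] = {}     # handler pointer -> virtual VM opcode
        self.trace: List[Tuple[int, int]] = []  # one (VPC, VM opcode) per executed VM instruction

    def on_dispatch(self, vpc: int, handler_ptr: int) -> int:
        # The first time a handler pointer is seen, give it the next free opcode number.
        opcode = self.opcode_of.setdefault(handler_ptr, len(self.opcode_of))
        self.trace.append((vpc, opcode))
        return opcode
```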
  • the VM execution trace acquired by the VM execution trace acquisition unit 1221 is stored in the VM execution trace DB 133.
  • the VM instruction collection unit 1222 receives the VPC and dispatcher as input, executes the second test script while monitoring the VPC and dispatcher, and obtains a VM execution trace.
  • the VM instruction collection unit 1222 collects VM instructions from the VM execution trace.
  • the VM instruction operation determination unit 1223 uses a predetermined algorithm to determine the operation of a VM instruction.
  • the VM instruction operation determination unit 1223 analyzes the VM execution trace acquired using a test script (second test script) for a specific operation, and determines the VM opcode that carries out that operation.
  • the VM opcode indicates which operation the VM instruction is.
  • the VM instruction operation determination unit 1223 creates a list of unknown VM opcodes for each VM execution trace.
  • the VM instruction operation determination unit 1223 assigns each list a priority based on the number of unknown VM opcodes it contains, and uses this priority and the number of unknown VM opcodes to determine which VM opcode is responsible for the operation being examined. For example, the VM instruction operation determination unit 1223 uses the reciprocal of the number of unknown VM opcodes in a list as its priority.
  • the second execution trace acquisition unit 1224 uses the second test script to acquire a second execution trace that targets only the VM instruction to be judged.
  • the second execution trace acquisition unit 1224 executes the second test script while monitoring the execution of the script engine binary, thereby acquiring a new memory access trace as the second execution trace.
  • the VM instruction I/O detection unit 1225 analyzes the second execution trace to detect the input and output of VM instructions, thereby obtaining information on the instruction set architecture (ISA). Using the memory access trace that forms the second execution trace, the VM instruction I/O detection unit 1225 compares the values read and written with the characteristic values used as operation targets in the second test script, and detects the code that reads or writes the matching memory as the location where the input and output appear.
  • the dynamic bytecode instrumentation function adding unit 123 monitors the execution of VM instructions based on the analysis results of the VM by the virtual machine analysis unit 121 and the analysis results of the ISA by the instruction set architecture analysis unit 122, and adds an analysis function to the script engine binary that performs analysis processing according to the input and output values of the VM instructions.
  • the dynamic bytecode instrumentation function adding unit 123 outputs a script engine with dynamic bytecode instrumentation function by inserting analysis code that realizes the design of Figure 4 into the script engine binary based on the VM architecture obtained by the virtual machine analysis unit 121 and the ISA information obtained by the instruction set architecture analysis unit 122.
  • the dynamic bytecode instrumentation function adding unit 123 monitors the execution of each VM instruction as an event based on the VM architecture and ISA information, and adds the dynamic bytecode instrumentation function to the script engine binary by calling back the analysis function with the input and output values.
  • the dynamic bytecode instrumentation function adding unit 123 has an analysis function generation unit 1231 and an analysis function adding unit 1232.
  • the analysis function generation unit 1231 compiles the input analysis function source code and obtains the corresponding binary, and stores the generated analysis function in the analysis function information DB 134.
  • the analysis function adding unit 1232 adds code that realizes analysis processing to the script engine binary.
  • the analysis function adding unit 1232 monitors the execution of VM instructions based on the analysis results for the VM and the analysis results for the instruction set architecture, and adds an analysis function to the script engine binary that performs analysis processing according to the input and output values of the VM instructions.
  • the analysis function adding unit 1232 has a first code assignment unit 1233 and a second code assignment unit 1234.
  • the first code assignment unit 1233 adds code that implements a VM instruction input/output acquisition process to the script engine binary.
  • the first code assignment unit 1233 monitors the execution of each VM instruction as an event based on the VM architecture and ISA information, and adds to the script engine binary code that implements the VM instruction input/output acquisition process, which acquires the input and output values.
  • the second code assignment unit 1234 adds to the script engine binary code that executes the analysis function corresponding to each VM instruction operation, thereby implementing the analysis process.
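  • The actual device patches the script engine binary, but the resulting control flow can be simulated in Python to show what the two kinds of added code do: acquire a VM instruction's input and output values, and call the analysis function registered for that operation at entry and exit. Everything here (class name, `register`, `dispatch`, passing the handler as a Python callable) is a hypothetical model, not the patent's implementation.

```python
from typing import Callable, Dict, List, Optional, Tuple

AnalysisFn = Callable[[str, List[int]], None]

class InstrumentedDispatcher:
    """Simulates a dispatcher patched to call analysis functions around each VM instruction."""

    def __init__(self, opcode_names: Dict[int, str]) -> None:
        self.opcode_names = opcode_names  # VM opcode -> operation name (from the ISA analysis)
        self.callbacks: Dict[Tuple[str, str], AnalysisFn] = {}

    def register(self, operation: str, timing: str, fn: AnalysisFn) -> None:
        self.callbacks[(operation, timing)] = fn

    def dispatch(self, opcode: int, inputs: List[int],
                 handler: Callable[..., List[int]]) -> List[int]:
        name = self.opcode_names.get(opcode, f"unknown_{opcode}")
        on_entry: Optional[AnalysisFn] = self.callbacks.get((name, "entry"))
        if on_entry is not None:
            on_entry(name, inputs)   # entry callback receives the acquired input values
        outputs = handler(*inputs)   # the original VM instruction handler runs unchanged
        on_exit = self.callbacks.get((name, "exit"))
        if on_exit is not None:
            on_exit(name, outputs)   # exit callback receives the acquired output values
        return outputs
```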
  • the output unit 14 is, for example, a liquid crystal display or a printer, and outputs various information including information related to the analysis function imparting device 10.
  • the output unit 14 may also be an interface that handles the input and output of various data to and from an external device, and may output various information to the external device.
  • the VM instruction boundary detection unit 1212 detects the boundaries of each VM instruction. In particular, it detects VM instructions and their boundaries for threaded-code VMs, which have no interpreter loop and whose VM instruction boundaries are therefore difficult to determine. Specifically, the VM instruction boundary detection unit 1212 extracts a first execution trace from the execution trace DB 131. Then, as shown in FIG. 15, the VM instruction boundary detection unit 1212 clusters the first execution trace using a predetermined method, and detects clusters whose execution count is equal to or greater than a threshold as VM instructions (e.g., VM instruction handlers 1 to 3). The VM instruction boundary detection unit 1212 detects the start and end points of the consecutive instruction sequence that constitutes each VM instruction as its boundaries.
  • the virtual program counter detection unit 1213 detects the VPC and the pointer cache.
  • the detection of the virtual program counter is realized by analyzing the log of the memory access trace of the acquired first execution trace.
  • the virtual program counter detection unit 1213 uses differential execution analysis focusing on the number of times memory is read.
  • FIG. 16 is a diagram for explaining the processing of the virtual program counter detection unit 1213.
  • the virtual program counter detection unit 1213 extracts one execution trace produced by the first test script from the execution trace DB 131.
  • the number of VPC reads is proportional to both the number of repetitions in the test script and the number of statements in the repeated block. If the number of repetitions is N and the number of repeated statements is M, approximately MN VPC reads occur. The virtual program counter detection unit 1213 therefore extracts memory whose read count rises to roughly 4MN and 9MN in the execution traces for first test scripts in which N and M are scaled to 2N and 2M, and to 3N and 3M, respectively. Specifically, as shown in FIG. 16, the virtual program counter detection unit 1213 extracts memory areas whose read/write counts increase monotonically with each VM instruction execution ((1) in FIG. 16).
  • the virtual program counter detection unit 1213 detects as the VPC a memory location whose read values always point to the start point of a VM instruction. Specifically, the virtual program counter detection unit 1213 compares the address each candidate points to with the addresses of the VM instruction handlers, and narrows the candidates down to the matching memory areas ((2) in FIG. 16).
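  • The scaling used above can be written out as follows, assuming each repeated statement compiles to a constant number c of VM instructions (c is introduced here for illustration; c = 1 gives the MN, 4MN and 9MN figures in the text). R(N, M) denotes the read count of the VPC memory.

$$
R(N, M) \approx cMN,\qquad
R(2N, 2M) \approx c(2M)(2N) = 4cMN,\qquad
R(3N, 3M) \approx c(3M)(3N) = 9cMN
$$

By contrast, memory read once per loop iteration scales only with N, giving factors of 2 and 3, so the 4x and 9x pattern separates the VPC from such locations.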
  • the dispatcher detection unit 1214 detects a dispatcher by analyzing the binary of the script engine using a predetermined method.
  • FIG. 17 is a diagram for explaining the process of the dispatcher detection unit 1214.
  • the dispatcher detection unit 1214 detects the dispatcher. Based on the VM instruction boundaries detected by the VM instruction boundary detection unit 1212, the dispatcher detection unit 1214 extracts each VM instruction portion from the script engine binary. Then, on the premise that dispatcher code is highly similar across handlers (FIG. 17 (1)), the dispatcher detection unit 1214 calculates the similarity between the code of each VM instruction and detects the portion that is highly similar across all VM instructions as the dispatcher. In this way, the dispatcher detection unit 1214 can detect the code commonly executed in the latter half of the VM instructions as the dispatcher (FIG. 17 (1)).
  • the VM instruction operation determination unit 1223 analyzes the VM execution trace obtained with the second test script for a specific operation and determines the VM opcode that carries out that operation.
  • the second test script only performs a single operation to be judged, so it is expected that only the VM instruction responsible for that operation will be executed. In this case, it can be easily judged because only that one VM opcode appears in the VM execution trace.
  • in practice, however, multiple VM instructions may be executed even for a test script with a single operation.
  • for example, in a test script for addition, VM instructions that read constants and variables and store results in variables are also executed. Because the VM instructions that make up the addition expression depend on one another, the determination becomes more difficult.
  • the VM instruction operation determination unit 1223 therefore uses an algorithm that determines the easiest cases first. Even when such dependencies exist, a VM instruction can be determined once the VM instructions it depends on have been identified, so the VM instruction operation determination unit 1223 advances the determination step by step while resolving the dependencies.
  • the algorithm used by the VM instruction operation determination unit 1223 will be outlined with reference to FIG. 18.
  • the VM instruction operation determination unit 1223 creates a list of unknown VM opcodes for each VM execution trace (see FIG. 18).
  • the VM instruction operation determination unit 1223 sets the reciprocal of the number of unknown VM opcodes in the list as the priority, and makes a determination by repeating the following procedure.
  • in a first procedure, when a list contains exactly one unknown VM opcode, the VM instruction operation determination unit 1223 determines that this VM opcode is responsible for the operation being examined.
  • in a second procedure, the VM instruction operation determination unit 1223 detects an unknown VM opcode unique to a list by taking the difference with the other lists, and determines that this unknown VM opcode is responsible for the operation being examined.
  • in a third procedure, the VM instruction operation determination unit 1223 reflects the results of the first and second procedures in all lists and updates the priorities.
  • the VM instruction operation determination unit 1223 determines that the unknown opcode in the list for operation 1 is the VM instruction responsible for operation 1 ((1) in FIG. 18). It then reflects this result in all lists and updates the priorities ((2) in FIG. 18). As a result, every occurrence of the opcode "VMOP1_Unknown" in the lists is updated to "VMOP1_known".
  • the VM instruction operation determination unit 1223 next determines that the unknown opcode in the list for operation 2 is the VM instruction responsible for operation 2 ((3) in FIG. 18). The result is then reflected in all lists, and the priorities are updated ((4) in FIG. 18). As a result, every occurrence of the opcode "VMOP2_Unknown" in the lists is updated to "VMOP2_known".
  • the VM instruction operation determination unit 1223 then detects the unknown VM opcode specific to each list from its differences with the other lists.
  • the VM instruction operation determination unit 1223 detects the VM opcode "VMOP3_Unknown", which does not appear in the lists for operations 4 and 5, and determines that "VMOP3_Unknown" is the VM instruction responsible for operation 3 ((5) in FIG. 18).
  • the VM instruction operation determination unit 1223 detects the VM opcode "VMOP6_Unknown", which does not appear in the lists for operations 3 and 5, and determines that "VMOP6_Unknown" is the VM instruction responsible for operation 4 ((5) in FIG. 18).
  • the VM instruction operation determination unit 1223 detects the VM opcode "VMOP7_Unknown", which does not appear in the lists for operations 3 and 4, and determines that "VMOP7_Unknown" is the VM instruction responsible for operation 5 ((5) in FIG. 18).
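  • The three procedures can be sketched in Python as below, assuming each VM execution trace has been reduced to the set of VM opcodes it contains, keyed by the operation its test script performs. The function name and opcode strings are hypothetical. As in step (5) of FIG. 18, where operations 3 to 5 are determined together, the second procedure here compares all pending lists against the same snapshot before the results are reflected.

```python
from typing import Dict, Set

def determine_operations(traces: Dict[str, Set[str]]) -> Dict[str, str]:
    """Map each operation to the VM opcode responsible for it, where resolvable."""
    resolved: Dict[str, str] = {}  # operation -> VM opcode
    while True:
        known = set(resolved.values())
        # Lists of still-unknown opcodes for operations not yet determined.
        pending = {op: opcodes - known for op, opcodes in traces.items()
                   if op not in resolved and opcodes - known}
        if not pending:
            return resolved
        # Priority = reciprocal of the number of unknown opcodes in each list.
        ordered = sorted(pending, key=lambda op: 1 / len(pending[op]), reverse=True)
        top = ordered[0]
        if len(pending[top]) == 1:                 # procedure 1: single unknown opcode
            resolved[top] = next(iter(pending[top]))
            continue                               # procedure 3: rebuild lists and priorities
        found: Dict[str, str] = {}
        for op in ordered:                         # procedure 2: opcode unique to this list
            others = set().union(*(pending[o] for o in pending if o != op))
            unique = pending[op] - others
            if len(unique) == 1:
                found[op] = next(iter(unique))
        if not found:
            return resolved                        # no further progress possible
        resolved.update(found)                     # procedure 3
```

In the usage below, an opcode such as `VMOPX` that appears in several lists (for instance a constant load) is never attributed to any single operation, which matches the intent of only resolving opcodes that the differences pin down.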
  • the VM instruction input/output detection unit 1225 analyzes the memory access trace (second execution trace) acquired by the second execution trace acquisition unit 1224 using the second test script, and detects input/output of a VM instruction.
  • VM instruction operands are generally passed on a virtual stack or virtual registers, and variables and constants are managed in a symbol table.
  • because the operand of a VM instruction holds the index of a record in the symbol table, the actual value does not appear in the operand.
  • the VM instruction I/O detection unit 1225 identifies code that accesses the actual value by comparing values using memory access traces.
  • the VM instruction input/output detection unit 1225 treats the memory accesses in the memory access trace that occur between the dispatch of a VM instruction and the next dispatch as accesses made by that VM instruction.
  • the VM instruction I/O detection unit 1225 compares each value read or written with the value used as the operation target in the second test script. For this comparison, the second test script uses values that are characteristic of the operation target.
  • the VM instruction I/O detection unit 1225 detects the code that reads or writes the matching memory as a location where VM instruction I/O appears.
  • the VM instruction I/O detection unit 1225 obtains VM instruction I/O by monitoring the reads and writes at this location.
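  • A minimal Python sketch of this matching, assuming a hypothetical event format that interleaves dispatch events with memory accesses tagged by the code address that performed them. It further assumes that reads of the script's characteristic input values mark input sites and writes of the expected result mark output sites; the function name and event layout are illustrative only.

```python
from typing import Iterable, List, Optional, Set, Tuple, Union

Dispatch = Tuple[str, int]            # ("dispatch", vm_opcode)
Access = Tuple[str, int, int, int]    # ("read" | "write", code_addr, mem_addr, value)

def detect_io_sites(events: Iterable[Union[Dispatch, Access]], target_opcode: int,
                    input_values: Set[int], output_values: Set[int]
                    ) -> Tuple[List[int], List[int]]:
    """Return (input_sites, output_sites): code addresses that read an input value
    or write an output value while target_opcode is executing."""
    current: Optional[int] = None
    input_sites: Set[int] = set()
    output_sites: Set[int] = set()
    for event in events:
        if event[0] == "dispatch":
            current = event[1]  # accesses until the next dispatch belong to this VM instruction
            continue
        kind, code_addr, _mem_addr, value = event
        if current != target_opcode:
            continue
        if kind == "read" and value in input_values:
            input_sites.add(code_addr)
        elif kind == "write" and value in output_values:
            output_sites.add(code_addr)
    return sorted(input_sites), sorted(output_sites)
```

With a test script such as an addition of 1111 and 2222 (hypothetical characteristic values), only the reads of 1111 and 2222 and the write of 3333 inside the target VM instruction are reported, while accesses belonging to other dispatched instructions are ignored.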
  • Fig. 19 is a flowchart showing the procedure of the analysis process according to the embodiment.
  • the input unit 11 receives a test script and a script engine binary as input (step S1).
  • the test script includes a first test script and a second test script.
  • the first execution trace acquisition unit 1211 performs a first execution trace acquisition process in which the first test script is executed while monitoring the binary of the script engine to acquire a branch trace and a memory access trace (step S2).
  • the VM instruction boundary detection unit 1212 detects VM instructions and performs VM instruction boundary detection processing to detect VM instruction boundaries (step S3).
  • the virtual program counter detection unit 1213 extracts and analyzes the execution trace for the first test script stored in the execution trace DB 131, and performs virtual program counter detection processing to discover the VPC (step S4).
  • the dispatcher detection unit 1214 performs dispatcher detection processing to extract each VM instruction portion from the script engine binary and detect the portion that is highly similar across the VM instructions as the dispatcher (step S5).
  • the VM execution trace acquisition unit 1221 receives the test script and the script engine binary as input, and executes the test script while monitoring the execution of the script engine binary, thereby performing a VM execution trace acquisition process to acquire a VM execution trace (step S6).
  • the VM instruction collection unit 1222 performs a VM instruction collection process to collect VM instructions from the VM execution trace (step S7).
  • the VM instruction operation determination unit 1223 performs a VM instruction operation determination process that determines the operation of each VM instruction using a predetermined algorithm (step S8).
  • the second execution trace acquisition unit 1224 performs a second execution trace acquisition process using the second test script to acquire a second execution trace that targets only the VM instruction to be judged (step S9).
  • the VM instruction input/output detection unit 1225 analyzes the second execution trace and performs a VM instruction input/output detection process to detect the input/output of VM instructions (step S10).
  • the analysis function generation unit 1231 compiles the input analysis function source code and performs analysis function generation processing to generate an analysis function (step S11).
  • the analysis function adding unit 1232 performs an analysis function adding process to add an analysis function to the script engine binary that monitors the execution of VM instructions and performs analysis processing according to the input/output values of the VM instructions based on the VM architecture and ISA information (step S12).
  • the analysis function adding unit 1232 adds code that implements the VM instruction input/output acquisition process to the script engine binary, and also adds code that implements the analysis processing.
  • the output unit 14 outputs the script engine with dynamic bytecode instrumentation function (step S13).
  • Fig. 20 is a flowchart showing the processing procedure of the first execution trace acquisition process shown in Fig. 19.
  • the first execution trace acquisition unit 1211 receives the first test script and the script engine binary as input (step S21). Then, the first execution trace acquisition unit 1211 hooks the received script engine to acquire a branch trace (step S22). The first execution trace acquisition unit 1211 also hooks the received script engine to acquire a memory access trace (step S23).
  • the first execution trace acquisition unit 1211 inputs the first test script received in this state into the script engine and executes it (step S24), and stores the first execution trace acquired thereby in the execution trace DB 131 (step S25).
  • the first execution trace acquisition unit 1211 determines whether or not all of the input first test scripts have been executed (step S26). If all of the input first test scripts have been executed (step S26: Yes), the first execution trace acquisition unit 1211 ends the process. On the other hand, if all of the input first test scripts have not been executed (step S26: No), the first execution trace acquisition unit 1211 returns to the execution of the first test script in step S24 and continues the process.
  • Fig. 21 is a flowchart showing the processing procedure of the VM instruction boundary detection process shown in Fig. 19.
  • the VM instruction boundary detection unit 1212 extracts a first execution trace from the execution trace DB 131 (step S31).
  • the VM instruction boundary detection unit 1212 clusters the first execution trace using a predetermined method (step S32). Any method may be used for the clustering.
  • the VM instruction boundary detection unit 1212 detects clusters whose execution count is equal to or exceeds a threshold as VM instructions (step S33). Then, the VM instruction boundary detection unit 1212 determines the start and end points of the continuous instruction sequence that constitutes the VM instruction as boundaries (step S34). The VM instruction boundary detection unit 1212 outputs the VM instruction boundary as a return value (step S35), and ends the VM instruction boundary detection process.
  • Fig. 22 is a flowchart showing the processing procedure of the virtual program counter detection process shown in Fig. 19.
  • the virtual program counter detection unit 1213 extracts from the execution trace DB 131 one first execution trace produced by the first test script (step S41). Next, the virtual program counter detection unit 1213 focuses on the memory access trace in that first execution trace, and counts the number of reads for each memory read destination (step S42).
  • the virtual program counter detection unit 1213 receives as input the first test script used to obtain the first execution trace (step S43), and analyzes the first test script to obtain the number of repetitions and the number of repeated statements (step S44).
  • the virtual program counter detection unit 1213 extracts from the execution trace DB 131 another first execution trace produced by a first test script with a different number of repetitions and number of repeated statements (step S45). Then, the virtual program counter detection unit 1213 focuses on the memory access trace and counts the number of reads for each memory read destination (step S46). The virtual program counter detection unit 1213 also receives as input the first test script used to obtain that first execution trace (step S47), and analyzes it to obtain the number of repetitions and the number of repeated statements (step S48).
  • the virtual program counter detection unit 1213 narrows down the memory read destinations to only those whose read counts change in proportion to increases or decreases in the number of repetitions or the number of repeated statements (step S49). Furthermore, the virtual program counter detection unit 1213 narrows the destinations remaining after step S49 down to those whose read values always point to the start point of a VM instruction (step S50).
  • the virtual program counter detection unit 1213 determines whether the memory read destinations have been narrowed down to only one (step S51). If the virtual program counter detection unit 1213 has not narrowed down the memory read destinations to only one (step S51: No), the process returns to step S45, where the virtual program counter detection unit 1213 extracts the next first execution trace and continues processing. On the other hand, if the virtual program counter detection unit 1213 has narrowed down the memory read destinations to only one (step S51: Yes), the virtual program counter detection unit 1213 stores the narrowed down memory read destination in the architecture information DB 132 as a virtual program counter (step S52), and ends processing.
  • Fig. 23 is a flowchart showing the processing procedure of the dispatcher detection process shown in Fig. 19.
  • the dispatcher detection unit 1214 receives the script engine binary as input (step S61).
  • the dispatcher detection unit 1214 receives the boundaries of VM instructions from the VM instruction boundary detection unit 1212 (step S62).
  • the dispatcher detection unit 1214 extracts each VM instruction portion from the script engine binary based on the boundaries of the VM instructions received from the VM instruction boundary detection unit 1212 (step S63).
  • the dispatcher detection unit 1214 calculates the similarity between the codes of each VM instruction using a predetermined method (step S64). Any method for calculating the similarity may be used as long as it is a method that can calculate the similarity between codes.
  • the dispatcher detection unit 1214 extracts the part that is highly similar among all VM instructions based on the similarity calculated in step S64 (step S65). The dispatcher detection unit 1214 then determines whether the extracted part is at the end of the VM instruction (step S66).
  • if it is not at the end of the VM instruction (step S66: No), the dispatcher detection unit 1214 returns to step S65 and continues processing. If it is at the end of the VM instruction (step S66: Yes), the dispatcher detection unit 1214 outputs the extracted part as the dispatcher (step S67) and ends processing.
  • Fig. 24 is a flowchart showing the procedure of the VM execution trace acquisition process shown in Fig. 19.
  • the VM execution trace acquisition unit 1221 receives the second test script and the script engine binary as input (step S71). Then, the VM execution trace acquisition unit 1221 applies a hook to the received script engine to record the VPC and VM opcode (step S72).
  • the VM execution trace acquisition unit 1221 inputs the second test script received in this state into the script engine and executes it (step S73), and stores the VM execution trace acquired thereby in the VM execution trace DB 133 (step S74).
  • the VM execution trace acquisition unit 1221 determines whether or not all of the input second test scripts have been executed (step S75). If all of the input second test scripts have been executed (step S75: Yes), the VM execution trace acquisition unit 1221 ends the process. If all of the input second test scripts have not been executed (step S75: No), the VM execution trace acquisition unit 1221 returns to the execution of the second test script in step S73 and continues the process.
  • Fig. 25 is a flowchart showing the processing procedure of the VM instruction collection process shown in Fig. 19.
  • The VM instruction collection unit 1222 receives the VPC and the dispatcher as input (step S81) and acquires various scripts from the Internet (step S82). The VM instruction collection unit 1222 executes the scripts while monitoring the VPC and the dispatcher, and acquires VM execution traces (step S83).
  • The VM instruction collection unit 1222 acquires VM instructions from the VM execution traces (step S84) and adds them to a list of VM instructions (step S85). If a VM instruction not yet in the list was found (step S86: No), the VM instruction collection unit 1222 returns to step S82. If no VM instruction outside the list was found (step S86: Yes), it returns the list of VM instructions (step S87) and ends the VM instruction collection process.
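The collection loop of steps S81 to S87 keeps pulling in new scripts until a round yields no opcode that is not already in the list. The sketch assumes scripts arrive in batches. It also assumes a hypothetical callable `trace_of` that returns the (VPC, opcode) trace for a script. The description does not say how scripts are gathered from the Internet or how large a round is, so the per-batch stopping rule here is only one possible reading.

```python
from typing import Callable, Iterable, List, Set, Tuple

Trace = List[Tuple[int, int]]  # (VPC, VM opcode) pairs


def collect_vm_instructions(
    script_batches: Iterable[List[str]],
    trace_of: Callable[[str], Trace],
) -> List[int]:
    """Accumulate observed VM opcodes until a batch adds nothing new."""
    known: Set[int] = set()
    for batch in script_batches:                   # cf. S82: obtain more scripts
        found_new = False
        for script in batch:
            for _vpc, opcode in trace_of(script):  # cf. S83-S84: trace, extract
                if opcode not in known:
                    known.add(opcode)              # cf. S85: add to the list
                    found_new = True
        if not found_new:                          # cf. S86: Yes, nothing unseen
            break
    return sorted(known)                           # cf. S87: return the list
```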
  • Fig. 26 and Fig. 27 are flowcharts showing the processing procedure of the VM instruction operation determination process shown in Fig. 19.
  • The VM instruction operation determination unit 1223 receives VM execution traces as input (step S91) and, for each VM execution trace, creates a list of VM opcodes marked as known or unknown (step S92).
  • For each list, the VM instruction operation determination unit 1223 calculates the reciprocal of the number of unknown VM opcodes in the list as its priority (step S93).
  • The VM instruction operation determination unit 1223 extracts the list with the highest priority (step S94) and determines whether the number of unknown VM opcodes in it is 1 (step S95).
  • If the number of unknown VM opcodes is 1 (step S95: Yes), the VM instruction operation determination unit 1223 determines that this unknown VM opcode is responsible for the operation to be determined (step S96).
  • If the number of unknown VM opcodes is not 1 (step S95: No), the VM instruction operation determination unit 1223 determines whether it is greater than 1 (step S97).
  • If the number of unknown VM opcodes is less than 1, that is, zero (step S97: No), the VM instruction operation determination unit 1223 determines that the operation to be determined does not exist as a VM instruction (step S98).
  • If the number of unknown VM opcodes is greater than 1 (step S97: Yes), the VM instruction operation determination unit 1223 calculates the difference between this list and the other lists (step S99).
  • The VM instruction operation determination unit 1223 detects the unknown VM opcode that is specific to this list and determines that the detected VM opcode is responsible for the operation to be determined (step S100).
  • The VM instruction operation determination unit 1223 reflects the newly determined VM opcodes in all lists (step S101).
  • The VM instruction operation determination unit 1223 recalculates the priorities and updates the lists (step S102).
  • The VM instruction operation determination unit 1223 determines whether all lists have been processed (step S103).
  • If not all lists have been processed (step S103: No), the VM instruction operation determination unit 1223 retrieves the next list (step S104), returns to step S95, and processes that list.
  • If all lists have been processed (step S103: Yes), the VM instruction operation determination unit 1223 outputs each determined VM opcode and its operation as the determination result (step S105).
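Steps S91 to S105 work like a constraint-propagation loop. Lists with fewer unknown opcodes are resolved first, and each resolved opcode makes the remaining lists easier. The sketch below is one interpretation under stated assumptions, not the embodiment's exact algorithm:

- Each VM execution trace is assumed to come from a test script written to exercise one named operation, so a trace reduces to the set of opcodes it contains.
- "Specific to the list" is read as "absent from every other list".
- An operation stays unassigned when that difference does not isolate exactly one opcode.

The operation names and opcode values are hypothetical.

```python
from typing import Dict, Optional, Set


def determine_operations(
    lists: Dict[str, Set[int]], known: Set[int]
) -> Dict[str, Optional[int]]:
    """Map each operation to the opcode responsible for it.

    None means no unknown opcode remained for that operation, i.e. it has no
    dedicated VM instruction (cf. step S98). Ambiguous operations are omitted.
    """
    determined: Set[int] = set(known)
    result: Dict[str, Optional[int]] = {}
    remaining = dict(lists)

    def priority(name: str) -> float:
        # cf. S93/S102: reciprocal of the number of unknown opcodes (0 -> highest)
        n = len(remaining[name] - determined)
        return float("inf") if n == 0 else 1.0 / n

    while remaining:
        name = max(remaining, key=priority)     # cf. S94/S104: highest first
        unknown = remaining.pop(name) - determined
        if len(unknown) == 1:                   # cf. S95: Yes -> S96
            result[name] = next(iter(unknown))
        elif not unknown:                       # cf. S97: No -> S98
            result[name] = None
        else:                                   # cf. S97: Yes -> S99-S100
            others: Set[int] = set()
            for other, ops in lists.items():
                if other != name:
                    others |= ops
            specific = unknown - others
            if len(specific) == 1:
                result[name] = next(iter(specific))
        opcode = result.get(name)
        if opcode is not None:
            determined.add(opcode)              # cf. S101: reflect in all lists
    return result
```

Recomputing `priority` against the shared `determined` set on every iteration plays the role of steps S101 and S102. It avoids rewriting each list in place.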
  • Fig. 28 is a flowchart showing the processing procedure of the second execution trace acquisition process shown in Fig. 19.
  • The second execution trace acquisition unit 1224 receives the second test script and the script engine binary as input (step S111). Next, it receives as input the VM instruction whose execution trace is to be acquired (step S112).
  • The second execution trace acquisition unit 1224 hooks the dispatcher of the script engine to observe the VM opcode (step S113).
  • The second execution trace acquisition unit 1224 hooks the script engine to acquire a memory access trace (step S114).
  • The second execution trace acquisition unit 1224 extracts one second test script (step S115) and inputs it into the script engine for execution (step S116).
  • Using the observed VM opcodes, the second execution trace acquisition unit 1224 acquires a memory access trace limited to the periods in which the target VM instruction is being executed (step S117). It stores the acquired memory access trace in the execution trace DB 131 as a second execution trace (step S118).
  • The second execution trace acquisition unit 1224 determines whether all of the input second test scripts have been executed (step S119).
  • If not (step S119: No), the second execution trace acquisition unit 1224 receives the next second test script (step S120) and returns to step S115 to continue processing. If all of the input second test scripts have been executed (step S119: Yes), the second execution trace acquisition unit 1224 ends the second execution trace acquisition process.
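Step S117 keeps only the memory accesses made while the target VM instruction is running. Assuming both hooks feed one time-ordered event stream, that filtering reduces to tracking the most recently dispatched opcode, as in the sketch below. The `Dispatch` and `MemAccess` event records are hypothetical. A real tracer would also need to handle nested dispatch (for example, calls back into the interpreter), which this sketch ignores.

```python
from typing import List, NamedTuple, Optional, Union


class Dispatch(NamedTuple):
    opcode: int   # VM opcode observed at the dispatcher hook (cf. S113)


class MemAccess(NamedTuple):
    kind: str     # "read" or "write"
    pc: int       # address of the native code performing the access
    addr: int     # accessed memory address
    value: int    # value read or written


Event = Union[Dispatch, MemAccess]


def second_execution_trace(events: List[Event], target_opcode: int) -> List[MemAccess]:
    """Keep memory accesses that occur while target_opcode is executing (cf. S117)."""
    current: Optional[int] = None
    kept: List[MemAccess] = []
    for event in events:
        if isinstance(event, Dispatch):
            current = event.opcode   # a new VM instruction starts here
        elif current == target_opcode:
            kept.append(event)
    return kept
```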
  • [VM Instruction Input/Output Detection Process] Next, the flow of the VM instruction input/output detection process shown in Fig. 19 is described.
  • Fig. 29 is a flowchart showing the processing procedure of the VM instruction input/output detection process shown in Fig. 19.
  • The VM instruction input/output detection unit 1225 receives the second test script and the second execution trace as input (step S131).
  • The VM instruction input/output detection unit 1225 extracts all characteristic values used in the second test script (step S132). In the example of Fig. 7, these are "0x12345678" and "0x12345679".
  • The VM instruction input/output detection unit 1225 extracts one of these values (step S133).
  • The VM instruction input/output detection unit 1225 searches the second execution trace for a memory read whose value matches the extracted value (step S134).
  • The VM instruction input/output detection unit 1225 extracts, from the execution trace, the address of the code that performed the memory read found (step S135). It identifies this code as code that accesses an input value (step S136).
  • The VM instruction input/output detection unit 1225 determines whether all values have been processed (step S137).
  • If not all values have been processed (step S137: No), the VM instruction input/output detection unit 1225 extracts the next value (step S138), returns to step S134, and continues processing.
  • If all values have been processed (step S137: Yes), the VM instruction input/output detection unit 1225 calculates the value of the operation result of the second test script (step S139). It then searches the second execution trace for a memory write whose value matches the calculated value (step S140).
  • The VM instruction input/output detection unit 1225 extracts, from the second execution trace, the address of the code that performed the memory write found (step S141). It identifies this code as code that accesses an output value (step S142).
  • The VM instruction input/output detection unit 1225 outputs the input/output code extracted in steps S136 and S142 (step S143).
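The matching in steps S132 to S142 searches the second execution trace for two kinds of access: reads that return a known input value, and writes that store the expected result. A minimal sketch follows. It assumes values appear in memory exactly as written in the test script. Real engines often box or tag values (for example, NaN-boxing or small-integer tagging), in which case the comparison would first have to undo that encoding. The code addresses are hypothetical. The assumption that the test script adds the two characteristic values from Fig. 7 is made only for illustration.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class MemAccess(NamedTuple):
    kind: str   # "read" or "write"
    pc: int     # address of the native code performing the access
    addr: int   # accessed memory address
    value: int  # value read or written


def detect_io_code(
    trace: List[MemAccess], inputs: Iterable[int], result: int
) -> Tuple[Set[int], Set[int]]:
    """Return (input-accessing code addresses, output-accessing code addresses)."""
    wanted = set(inputs)                                       # cf. S132
    input_pcs = {a.pc for a in trace
                 if a.kind == "read" and a.value in wanted}    # cf. S134-S136
    output_pcs = {a.pc for a in trace
                  if a.kind == "write" and a.value == result}  # cf. S140-S142
    return input_pcs, output_pcs


# Hypothetical second execution trace for a script computing 0x12345678 + 0x12345679.
TRACE = [
    MemAccess("read", 0x401000, 0x7000, 0x12345678),
    MemAccess("read", 0x401008, 0x7008, 0x12345679),
    MemAccess("read", 0x401010, 0x7010, 42),
    MemAccess("write", 0x401020, 0x7018, 0x12345678 + 0x12345679),
]
```

Distinctive values such as 0x12345678 make accidental matches with unrelated memory traffic unlikely. This is presumably why the test script uses them rather than small integers.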
  • Fig. 30 is a flowchart showing the processing procedure of the analysis function generation process shown in Fig. 19.
  • The analysis function generation unit 1231 receives as input the analysis function source code created for each operation of the VM instructions determined by the VM instruction operation determination unit 1223 (step S151). The analysis function source code input to the analysis function imparting device 10 is written separately for each VM instruction operation to suit that operation.
  • The analysis function generation unit 1231 compiles the input analysis function source code to obtain the corresponding binary (step S152).
  • The analysis function generation unit 1231 stores information about the analysis function for each VM instruction operation in the analysis function information DB 134 (step S153).
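Steps S151 to S153 compile one analysis function per VM instruction operation and record where it lives. As a loose analogy only, the sketch below uses Python's built-in `compile` and `exec` in place of a native compiler. It keeps the resulting callables in a dictionary that stands in for the analysis function information DB 134. The entry-point name `analyze` and the operation keys are assumptions; the description does not specify them.

```python
from typing import Any, Callable, Dict

AnalysisDB = Dict[str, Callable[..., Any]]


def generate_analysis_functions(sources: Dict[str, str]) -> AnalysisDB:
    """Compile per-operation analysis source and register each entry point."""
    db: AnalysisDB = {}
    for operation, source in sources.items():                      # cf. S151
        code = compile(source, f"<analysis:{operation}>", "exec")  # cf. S152
        namespace: Dict[str, Any] = {}
        exec(code, namespace)
        db[operation] = namespace["analyze"]  # cf. S153: hypothetical entry point
    return db
```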
  • Fig. 31 is a flowchart showing the processing procedure of the analysis function adding process shown in Fig. 19.
  • The analysis function adding unit 1232 accepts the script engine binary as input (step S161). The first code adding unit 1233 adds code that implements the VM instruction input/output acquisition process to this script engine binary (step S162).
  • The second code adding unit 1234 adds code that implements the analysis process to this script engine binary (step S163).
  • Fig. 32 is a flowchart showing the processing procedure of the VM instruction input/output acquisition process.
  • The VM instruction input/output acquisition process accepts as input the code that accesses input/output values (step S171).
  • The VM instruction input/output acquisition process then monitors memory reads by the code that accesses the input value (step S172).
  • It also monitors memory writes by the code that accesses the output value (step S173).
  • The processing of steps S171 to S173 uses the detection results of the VM instruction input/output detection unit 1225.
  • It is then determined whether a memory read has been observed (step S174). If a memory read has been observed (step S174: Yes), the read value is output as an input value (step S175).
  • If no memory read has been observed (step S174: No), it is determined whether a memory write has been observed (step S176).
  • If a memory write has been observed (step S176: Yes), the VM instruction input/output acquisition process outputs the written value as an output value (step S177).
  • The VM instruction input/output acquisition process determines whether execution of the bytecode has finished (step S178). If it has not finished (step S178: No), the process continues monitoring (step S179) and returns to the determination of step S174.
  • If execution of the bytecode has finished (step S178: Yes), the VM instruction input/output acquisition process ends.
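At run time, the VM instruction input/output acquisition process needs only the code addresses found by the VM instruction input/output detection unit 1225. Any read by input-accessing code yields an input value, and any write by output-accessing code yields an output value. A minimal sketch over a recorded event list follows. In the embodiment this monitoring is done by code added to the script engine binary. The `MemAccess` record is a hypothetical stand-in for whatever the memory hook reports.

```python
from typing import Iterable, Iterator, NamedTuple, Set, Tuple


class MemAccess(NamedTuple):
    kind: str   # "read" or "write"
    pc: int     # address of the native code performing the access
    addr: int   # accessed memory address
    value: int  # value read or written


def acquire_io(
    events: Iterable[MemAccess], input_pcs: Set[int], output_pcs: Set[int]
) -> Iterator[Tuple[str, int]]:
    """Yield ("input", v) / ("output", v) until the events run out (cf. S174-S179)."""
    for event in events:
        if event.kind == "read" and event.pc in input_pcs:
            yield ("input", event.value)    # cf. S174: Yes -> S175
        elif event.kind == "write" and event.pc in output_pcs:
            yield ("output", event.value)   # cf. S176: Yes -> S177
```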
  • Fig. 33 is a flowchart showing the processing procedure of the analysis process.
  • In the analysis process, the bytecode to be analyzed is first accepted as input (step S181). The analysis function binaries are then loaded into the memory space by referring to the analysis function information DB 134 (step S182), and the address of each loaded entry point is stored in the analysis function information DB 134 (step S183).
  • The analysis process monitors the dispatcher to observe the VM opcode being executed (step S184).
  • For this monitoring, the analysis process uses the dispatcher detected by the dispatcher detection unit 1214. The analysis process then starts executing the bytecode (step S185).
  • It is determined whether a VM instruction registered in the analysis function information DB 134 has been executed (step S186). If so (step S186: Yes), the input of the VM instruction is obtained from the VM instruction input/output acquisition process and set as an argument of the analysis function (step S187).
  • The analysis function information DB 134 is searched for the VM opcode, and execution is transferred to the entry point of the analysis function for the input (step S188).
  • The analysis function is executed to its end (step S189).
  • Execution of the VM instruction then continues (step S190).
  • The output of the VM instruction is obtained from the VM instruction input/output acquisition process and set as an argument of the analysis function (step S191).
  • The analysis function information DB 134 is searched for the VM opcode, and execution is transferred to the entry point of the analysis function for the output (step S192).
  • The analysis function is executed to its end (step S193).
  • If no VM instruction registered in the analysis function information DB 134 has been executed (step S186: No), or after the processing of step S193, execution of the bytecode continues (step S194).
  • It is determined whether execution of the bytecode has finished (step S195). If it has not finished (step S195: No), the process returns to step S186. If it has finished (step S195: Yes), the analysis process ends.
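The control flow of steps S184 to S195 wraps each registered VM instruction with two calls: its input-side analysis function before it runs, and its output-side analysis function after it runs. The sketch below shows that shape on a hypothetical toy stack VM. For brevity it reads inputs and outputs straight off the VM stack rather than through the memory-monitoring acquisition process described above. The opcode table and callback signatures are assumptions.

```python
from typing import Callable, Dict, List, Tuple

Values = Tuple[int, ...]
Callback = Callable[[Values], None]

# Hypothetical toy VM: PUSH/HALT are handled inline, others via this table.
HALT, PUSH, ADD, MUL = 0, 1, 2, 3
ARITH: Dict[int, Tuple[int, Callable[..., int]]] = {
    ADD: (2, lambda a, b: a + b),
    MUL: (2, lambda a, b: a * b),
}


def run_instrumented(
    bytecode: List[int], analysis: Dict[int, Tuple[Callback, Callback]]
) -> List[int]:
    """Execute bytecode, calling (on_input, on_output) hooks for registered opcodes."""
    stack: List[int] = []
    vpc = 0
    while True:
        opcode = bytecode[vpc]                 # cf. S184: observe opcode at dispatch
        if opcode == HALT:                     # cf. S195: bytecode finished
            return stack
        if opcode == PUSH:
            stack.append(bytecode[vpc + 1])
            vpc += 2
            continue
        arity, fn = ARITH[opcode]
        base = len(stack) - arity
        inputs = tuple(stack[base:])
        hooks = analysis.get(opcode)           # cf. S186: registered in the DB?
        if hooks:
            hooks[0](inputs)                   # cf. S187-S189: input-side analysis
        del stack[base:]
        stack.append(fn(*inputs))              # cf. S190: finish the VM instruction
        if hooks:
            hooks[1]((stack[-1],))             # cf. S191-S193: output-side analysis
        vpc += 1                               # cf. S194: continue the bytecode
```

For example, registering only `ADD` and running `[PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]` calls the input hook with `(2, 3)` and the output hook with `(5,)`. The unregistered `MUL` runs without callbacks.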
  • The analysis function imparting device 10 executes the first test script while monitoring the binary of the script engine, and acquires a memory access trace as a first execution trace.
  • The analysis function imparting device 10 analyzes the VM of the script engine based on the first execution trace.
  • The analysis function imparting device 10 acquires architecture information on the VPC and the dispatcher.
  • The analysis function imparting device 10 analyzes the instruction set architecture (ISA), which is the system of VM instructions. It collects VM instructions, determines the operations of the collected VM instructions, and detects the inputs and outputs of VM instructions using the second test script. This yields ISA information on the operations and the inputs and outputs of VM instructions.
  • The analysis function imparting device 10 executes the second test script while monitoring the VPC and the dispatcher to obtain a VM execution trace.
  • The analysis function imparting device 10 analyzes this VM execution trace to collect VM instructions. It creates lists that identify the unknown VM opcodes among the VM opcodes, and determines the operations of the VM instructions using these lists and a predetermined algorithm.
  • Using the second test script, the analysis function imparting device 10 obtains, as a second execution trace, a memory access trace restricted to the VM instruction under examination. It analyzes the second execution trace to detect the inputs and outputs of VM instructions.
  • By analyzing the execution traces and VM execution traces obtained in this way, the analysis function imparting device 10 can detect the various kinds of architecture information even for script engines whose VM internal specifications are unknown. It can also determine the ISA without manual reverse engineering.
  • Based on the obtained VM architecture and ISA information, the analysis function imparting device 10 monitors the execution of VM instructions as events. It realizes dynamic bytecode instrumentation by calling back the analysis functions with the input and output values.
  • By obtaining the VM architecture and ISA information through analysis of the acquired execution traces and VM execution traces, the analysis function imparting device 10 can provide dynamic bytecode instrumentation to script engines whose VM internal specifications are unknown, without manual reverse engineering.
  • As long as test scripts are prepared, the analysis function imparting device 10 can automatically determine VM architecture and ISA information for a variety of script engines. Dynamic bytecode instrumentation can therefore be realized without individual design or implementation for each engine.
  • The analysis function imparting device 10 makes it possible to perform dynamic bytecode instrumentation at the VM instruction level for scripts written in various scripting languages, enabling a more detailed understanding of their behavior.
  • The analysis function imparting device 10 can analyze a script engine and determine its ISA, thereby enabling VM-instruction-level analysis for script engines of a wide variety of scripting languages.
  • The analysis function imparting device 10 is useful for implementing dynamic bytecode instrumentation in a wide variety of script engines. It is suitable even for scripts that are difficult to analyze at the VM instruction level because the engine lacks analysis support functions or its VM ISA is unknown.
  • By using the analysis function imparting device 10, analysis function imparting method, and analysis function imparting program according to this embodiment to provide dynamic bytecode instrumentation to various script engines, it is possible to realize tools that analyze bytecode at the VM instruction level.
  • With the analysis function imparting device 10, analysis function imparting method, and analysis function imparting program according to this embodiment, dynamic bytecode instrumentation can be provided to script engines that lack support functions such as a debugger and whose internal specifications are unknown. No manual individual analysis, design, or implementation is required.
  • The analysis function imparting device 10 makes it possible to create tools that analyze scripts using dynamic bytecode instrumentation. Such tools can be expected to be used, for example, to support debugging through bytecode-level analysis of scripts and to analyze the detailed behavior of malware.
  • Each component of the analysis function imparting device 10 shown in Fig. 5 is a functional concept and does not necessarily have to be physically configured as illustrated.
  • The specific form of distribution and integration of the functions of the analysis function imparting device 10 is not limited to that illustrated. All or part of it can be functionally or physically distributed or integrated in any unit depending on various loads, usage conditions, and the like.
  • Each process performed by the analysis function imparting device 10 may be realized, in whole or in part, by a CPU and a program analyzed and executed by the CPU. Each process may also be realized as hardware using wired logic.
  • [Program] Fig. 34 is a diagram showing an example of a computer on which a program is executed to realize the analysis function imparting device 10.
  • The computer 1000 has, for example, a memory 1010 and a CPU 1020.
  • The computer 1000 also has a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These components are connected by a bus 1080.
  • The memory 1010 includes a ROM 1011 and a RAM 1012.
  • The ROM 1011 stores a boot program such as a BIOS (Basic Input Output System).
  • The hard disk drive interface 1030 is connected to a hard disk drive 1090.
  • The disk drive interface 1040 is connected to a disk drive 1100.
  • A removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100.
  • The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120.
  • The video adapter 1060 is connected to, for example, a display 1130.
  • The hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the program that defines each process of the analysis function imparting device 10 is implemented as a program module 1093 in which code executable by the computer 1000 is written.
  • The program module 1093 is stored, for example, in the hard disk drive 1090.
  • A program module 1093 for executing processes equivalent to the functional configuration of the analysis function imparting device 10 is stored in the hard disk drive 1090.
  • The hard disk drive 1090 may be replaced by an SSD (Solid State Drive).
  • The setting data used in the processing of the embodiment described above is stored as program data 1094, for example, in the memory 1010 or the hard disk drive 1090.
  • The CPU 1020 reads the program module 1093 or program data 1094 stored in the memory 1010 or the hard disk drive 1090 into the RAM 1012 as necessary and executes it.
  • The program module 1093 and program data 1094 need not be stored in the hard disk drive 1090. They may instead be stored in, for example, a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like.
  • Alternatively, the program module 1093 and program data 1094 may be stored in another computer connected via a network (such as a LAN (Local Area Network) or a WAN (Wide Area Network)).
  • The program module 1093 and program data 1094 may then be read by the CPU 1020 from the other computer via the network interface 1070.

PCT/JP2023/015090 2023-04-13 2023-04-13 解析機能付与装置、解析機能付与方法及び解析機能付与プログラム Ceased WO2024214262A1 (ja)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2023/015090 WO2024214262A1 (ja) 2023-04-13 2023-04-13 解析機能付与装置、解析機能付与方法及び解析機能付与プログラム
JP2025513733A JPWO2024214262A1 2023-04-13 2023-04-13

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2023/015090 WO2024214262A1 (ja) 2023-04-13 2023-04-13 解析機能付与装置、解析機能付与方法及び解析機能付与プログラム

Publications (1)

Publication Number Publication Date
WO2024214262A1 true WO2024214262A1 (ja) 2024-10-17

Family

ID=93059157

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/015090 Ceased WO2024214262A1 (ja) 2023-04-13 2023-04-13 解析機能付与装置、解析機能付与方法及び解析機能付与プログラム

Country Status (2)

Country Link
JP (1) JPWO2024214262A1
WO (1) WO2024214262A1

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022079840A1 (ja) * 2020-10-14 2022-04-21 日本電信電話株式会社 解析機能付与装置、解析機能付与方法および解析機能付与プログラム
WO2022180702A1 (ja) * 2021-02-24 2022-09-01 日本電信電話株式会社 解析機能付与装置、解析機能付与プログラム及び解析機能付与方法

Also Published As

Publication number Publication date
JPWO2024214262A1 2024-10-17

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 23933038; country of ref document: EP; kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 2025513733; country of ref document: JP; kind code of ref document: A
WWE WIPO information: entry into national phase. Ref document number: 2025513733; country of ref document: JP
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: PCT application non-entry in European phase. Ref document number: 23933038; country of ref document: EP; kind code of ref document: A1