WO2024079800A1 - Analysis function imparting device, analysis function imparting method, and analysis function imparting program

Info

Publication number
WO2024079800A1
Authority
WO
WIPO (PCT)
Prior art keywords
symbol table
information
instruction
unit
architecture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2022/037936
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
利宣 碓井
裕平 川古谷
誠 岩村
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NTT Inc
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP2024550954A (patent JP7794327B2, ja)
Priority to PCT/JP2022/037936 (patent WO2024079800A1, ja)
Publication of WO2024079800A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Prevention of errors by analysis, debugging or testing of software
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines

Definitions

  • The present invention relates to an analysis function imparting device, an analysis function imparting method, and an analysis function imparting program.
  • One of the important techniques for analyzing software is tracking the values of variables held by a program. This involves obtaining information such as what variables are being used by the program at a given time while the program is running, and what values they hold.
  • Debuggers can track information about variables and their values during execution, and are widely used when developing and debugging programs. Which variables exist and what values they hold are closely related to the behavior of a program, so this is an important technique in software testing and reverse engineering (see Non-Patent Document 1).
  • In scripts, information about variables and the values they hold is managed by a symbol table in the script engine (also called an interpreter), and it is common to access this symbol table and obtain the information through analysis support functions, such as a debugger, provided by the script engine.
  • The present invention has been made in consideration of the above. It aims to make it possible to provide a function for acquiring variable information, without manual per-engine analysis, design, and implementation, even for script engines that have no support functions such as a debugger and whose VMs (virtual machines) have unknown internal specifications.
  • The analysis function imparting device is characterized by having: a first acquisition unit that analyzes the virtual machine of a script engine and acquires information about the architecture of the script engine; a detection unit that detects a symbol table that holds information about variables, based on the acquired information about the architecture; an analysis unit that analyzes the structure of the detected symbol table; a second acquisition unit that acquires information about the instruction set architecture, which is the system of instructions for the virtual machine, based on the acquired information about the architecture; and a determination unit that uses the analyzed symbol table and the information about the instruction set architecture to determine the instruction of the virtual machine that corresponds to the variable held by the symbol table.
  • FIG. 1 is a schematic diagram illustrating a schematic configuration of an analysis function imparting device according to the present embodiment.
  • FIG. 2 is a diagram showing an example of a test script (first test script) used for detecting a virtual program counter.
  • FIG. 3 is a diagram showing an example of a test script (second test script) used for detecting a conditional branch flag.
  • FIG. 4 is a diagram illustrating an example of an execution trace.
  • FIG. 5 illustrates an example of a VM execution trace.
  • FIG. 6 is a flowchart showing the procedure of the analysis function providing process.
  • FIG. 7 is a flowchart showing the procedure of the execution trace acquisition process.
  • FIG. 8 is a flowchart showing the procedure of the virtual program counter detection process.
  • FIG. 9 is a flowchart illustrating a procedure for the VM instruction boundary detection process.
  • FIG. 10 is a flowchart showing the procedure of the dispatcher detection process.
  • FIG. 11 is a flowchart showing the procedure of the conditional branch flag detection process.
  • FIG. 12 is a flowchart showing the procedure of the code cache detection process.
  • FIG. 13 is a flowchart showing the procedure of the variable detection process.
  • FIG. 14 is a flowchart showing the processing procedure of the symbol table detection process.
  • FIG. 15 is a flowchart showing the processing procedure of the symbol table analysis process.
  • FIG. 16 is a flowchart illustrating the procedure of the VM execution trace acquisition process.
  • FIG. 17 is a flowchart illustrating a procedure for the VM command collection process.
  • FIG. 18 is a flowchart illustrating a procedure for the VM command determination process.
  • FIG. 19 is a flowchart showing the procedure of the hook insertion process.
  • FIG. 20 is a diagram
  • The analysis function imparting device of this embodiment is applied to a script engine. It executes a test script while monitoring the binary of the script engine, and acquires a branch trace and a memory access trace as an execution trace.
  • The analysis function imparting device then analyzes the VM based on the execution trace, and acquires the VPC (Virtual Program Counter), the dispatcher, the conditional branch flag, and the code cache, which together make up the architecture information of the script engine.
  • The analysis function imparting device executes the test script while monitoring the VPC and the dispatcher to obtain a VM execution trace. By analyzing this VM execution trace, it collects VM instructions, determines the VM instructions, and obtains information on the instruction set architecture.
  • The analysis function imparting device detects and analyzes the symbol table based on the obtained architecture information, and adds to the script engine a function for acquiring variable information. In this way, even for script engines whose VM internal specifications are unknown, the device detects the various pieces of architecture information by analyzing the execution trace and the VM execution trace, and adds the variable acquisition function without the need for manual reverse engineering.
  • Given only prepared test scripts, the analysis function imparting device can automatically add a variable acquisition function to various script engines, and does not require individual design or implementation. It therefore becomes possible to acquire variables even for scripts written in various script languages, enabling more advanced script analysis.
  • FIG. 1 is a schematic diagram illustrating a schematic configuration of an analysis function imparting device of the present embodiment.
  • The analysis function imparting device 10 of the present embodiment is realized by a general-purpose computer such as a personal computer, and includes an input unit 11, a control unit 12, a storage unit 13, and an output unit 14.
  • The input unit 11 is realized using input devices such as a keyboard and a mouse. It accepts information input from an operator or from outside, and passes it to the control unit 12. For example, the input unit 11 accepts input of a test script or a script engine binary. The input unit 11 also accepts input of information transmitted from an external device via a telecommunications line.
  • A test script is a script that is input when dynamically analyzing a script engine to obtain an execution trace and a VM execution trace.
  • This test script focuses on the number of branch instruction executions and memory reads and writes, and is used to capture differences in the behavior of the script engine that arise when the test script is executed a different number of times.
  • This test script is prepared in advance of the analysis and is created manually. This creation requires knowledge of the specifications of the target script language.
  • The test script used for VPC detection (first test script) is different from the test script used for conditional branch flag detection (second test script).
  • FIG. 2 is a diagram showing an example of a test script (first test script) used for VPC detection.
  • The first test script uses a loop (line 2).
  • The first test script changes the execution conditions and generates differences by increasing or decreasing the number of repetitions (line 2) and the number of repeated statements (lines 3 to 5) in the test script.
  • FIG. 3 is a diagram showing an example of a test script (second test script) used to detect conditional branch flags.
  • The second test script uses multiple conditional branches (lines 4 to 8).
  • The branch conditions are controlled so that the multiple conditional branches are either taken or not taken in a specific order pattern (lines 1 and 5).
  • The number of conditional branches and the order pattern of taken and not-taken branches are changed to generate differences.
  • A script engine binary is an executable file that makes up a script engine.
  • A script engine binary may consist of multiple executable files.
  • The output unit 14 is realized by a display device such as a liquid crystal display, a printing device such as a printer, etc. For example, the output unit 14 displays the results of the analysis function imparting process described below. The output unit 14 may also output various information to an external device.
  • The storage unit 13 is realized by a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk.
  • The storage unit 13 stores in advance the processing program that operates the analysis function imparting device 10 and the data used during the execution of the processing program, or stores them temporarily each time processing is performed.
  • The storage unit 13 stores an execution trace database (DB) 131, an architecture information DB 132, and a VM execution trace DB 133.
  • The execution trace DB 131 and the VM execution trace DB 133 store the execution traces and VM execution traces acquired by the execution trace acquisition unit 1211 and the VM execution trace acquisition unit 1221, respectively.
  • The execution trace DB 131 and the VM execution trace DB 133 are managed by the analysis function imparting device 10.
  • Alternatively, the execution trace DB 131 and the VM execution trace DB 133 may be managed by another device (such as a server). In that case, the execution trace acquisition unit 1211 and the VM execution trace acquisition unit 1221 output the acquired execution traces and VM execution traces, via the communication interface of the output unit 14, to the management server or similar device of the execution trace DB 131 and the VM execution trace DB 133, and store them in those DBs.
  • The control unit 12 is realized using a CPU (Central Processing Unit) or an MPU (Micro Processing Unit), and executes a processing program stored in memory. As a result, the control unit 12 functions as a virtual machine analysis unit 121 (first acquisition unit), an instruction set architecture analysis unit 122 (second acquisition unit), and an analysis function imparting unit 123 (imparting unit), as illustrated in FIG. 1.
  • The virtual machine analysis unit (first acquisition unit) 121 analyzes the VM of the script engine and acquires information about the architecture of the script engine. Specifically, it executes a test script while monitoring the binary of the script engine, and acquires a branch trace and a memory access trace as an execution trace. It then analyzes the VM based on the execution trace and acquires the information about the architecture of the script engine.
  • The architecture information includes any one of a virtual program counter, a dispatcher, a conditional branch flag, or a code cache.
  • The virtual machine analysis unit 121 has an execution trace acquisition unit 1211, a virtual program counter detection unit 1212, a VM instruction boundary detection unit 1213, a dispatcher detection unit 1214, a conditional branch flag detection unit 1215, a code cache detection unit 1216, a variable detection unit 1217, a symbol table detection unit 1218, and a symbol table analysis unit 1219.
  • The execution trace acquisition unit 1211 accepts the test script and the script engine binary as input.
  • The execution trace acquisition unit 1211 acquires an execution trace by executing the test script while monitoring the execution of the script engine binary.
  • An execution trace consists of a branch trace and a memory access trace.
  • A branch trace records the type of each branch instruction executed, the branch source address, and the branch destination address.
  • A memory access trace records the type of memory operation and the memory address of the operation target. It is known that branch traces and memory access traces can be acquired by instruction hooks.
  • The execution trace acquired by the execution trace acquisition unit 1211 is stored in the execution trace DB 131.
  • Figure 4 shows an example of an execution trace.
  • Each log line of an execution trace has an element called trace, which indicates whether the line belongs to the branch trace or the memory access trace.
  • A branch trace log line has, for example, the format shown in lines 1 to 10 of Figure 4, and consists of three elements: type, src, and dst.
  • type indicates whether the executed branch instruction was a call instruction, a jmp instruction, or a ret instruction.
  • src indicates the address of the branch source, and dst indicates the address of the branch destination.
  • A memory access trace log line has, for example, the format shown in lines 11 to 13 of Figure 4, and consists of three elements: type, target, and value.
  • type indicates whether the memory access is a read or a write.
  • target indicates the memory address that is the target of the memory access. value stores the result of the memory access.
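  • As a minimal sketch of how such log lines could be represented, the following Python code models the two record kinds with the elements named above (trace, type, src, dst, target, value). The `key=value` text layout, the hexadecimal encoding, and the names `BranchRecord`, `MemoryRecord`, and `parse_line` are assumptions made for illustration; the actual layout of Figure 4 is not reproduced here.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class BranchRecord:
    type: str   # "call", "jmp", or "ret"
    src: int    # branch source address
    dst: int    # branch destination address


@dataclass(frozen=True)
class MemoryRecord:
    type: str    # "read" or "write"
    target: int  # memory address that was accessed
    value: int   # result of the memory access


TraceRecord = Union[BranchRecord, MemoryRecord]


def parse_line(line: str) -> TraceRecord:
    """Parse one log line in a hypothetical 'key=value key=value' layout."""
    fields = dict(item.split("=", 1) for item in line.split())
    if fields["trace"] == "branch":
        return BranchRecord(fields["type"],
                            int(fields["src"], 16),
                            int(fields["dst"], 16))
    if fields["trace"] == "memory":
        return MemoryRecord(fields["type"],
                            int(fields["target"], 16),
                            int(fields["value"], 16))
    raise ValueError(f"unknown trace kind: {fields['trace']!r}")
```

  • Keeping branch and memory records as separate types lets later analysis steps filter the trace by kind without re-parsing the text.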
  • The virtual program counter detection unit 1212 extracts and analyzes the execution traces for the first test script stored in the execution trace DB 131 to detect the VPC.
  • The virtual program counter detection unit 1212 detects the VPC by analyzing multiple execution traces, using differential execution analysis that focuses on the number of memory reads together with the boundaries of each VM instruction detected by the VM instruction boundary detection unit 1213.
  • The virtual program counter detection unit 1212 makes use of the fact that a read from the memory that holds the VPC always occurs after the execution of each VM instruction, and detects the VPC by discovering the memory that is the target of this read.
  • To do so, the virtual program counter detection unit 1212 uses differential execution analysis that focuses on the number of memory reads.
  • The virtual program counter detection unit 1212 compares the execution traces acquired with multiple test scripts, and finds memory whose read count changes in proportion to both the increase or decrease in the number of repetitions and the number of repeated statements.
  • The virtual program counter detection unit 1212 then refers to the boundaries of each VM instruction detected by the VM instruction boundary detection unit 1213, and narrows the candidates down to memory whose read values always point to the start point of a VM instruction.
  • The virtual program counter detection unit 1212 detects this memory as the VPC.
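  • The differential analysis described above can be sketched as follows. This is an illustrative sketch, not the implementation of the embodiment: it assumes that the VPC read count scales with the product of the number of repetitions and the number of repeated statements, that each trace has already been reduced to a list of `(address, value)` read events, and that the function name `detect_vpc_candidates` and its parameters are hypothetical.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def detect_vpc_candidates(runs, instruction_starts):
    """Return addresses that behave like the VPC across several runs.

    runs: list of (repetitions, statements, reads), where reads is a list of
          (address, value) memory-read events from one execution trace.
    instruction_starts: set of start points of the detected VM instructions.
    """
    base_reps, base_stmts, base_reads = runs[0]
    base_counts = Counter(addr for addr, _ in base_reads)

    # For each address: (change in read count) / (change in workload).
    ratios = defaultdict(set)
    for reps, stmts, reads in runs[1:]:
        work_delta = reps * stmts - base_reps * base_stmts
        if work_delta == 0:
            continue  # this run tells us nothing relative to the baseline
        counts = Counter(addr for addr, _ in reads)
        for addr in base_counts:
            ratios[addr].add(
                Fraction(counts[addr] - base_counts[addr], work_delta))

    # Keep addresses whose read count scales by one consistent positive factor.
    scaling = {a for a, rs in ratios.items() if len(rs) == 1 and min(rs) > 0}

    # Narrow down to addresses whose read values always point at the start
    # of a VM instruction.
    def always_points_to_start(addr):
        return all(value in instruction_starts
                   for _, _, reads in runs
                   for a, value in reads if a == addr)

    return sorted(a for a in scaling if always_points_to_start(a))
```

  • A loop counter, for example, is filtered out because its read count follows only the number of repetitions, while a constant that is read a fixed number of times shows no change at all.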
  • The VM instruction boundary detection unit 1213 clusters the execution trace to detect the boundaries of each VM instruction.
  • Specifically, the VM instruction boundary detection unit 1213 clusters the execution trace and detects clusters whose execution count is at or above a threshold as VM instructions. The clustering detects contiguous code regions that are executed multiple times. For example, executed instructions that are close to each other in the code may be grouped together, common subsequences of executed code blocks may be searched for, or other methods may be used.
  • The VM instruction boundary detection unit 1213 detects the start and end points of the contiguous instruction sequences that make up each detected VM instruction as boundaries.
  • The VM instruction boundaries detected here are used in VPC detection and dispatcher detection.
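  • One of the clustering options mentioned above, grouping executed instructions that are close to each other in the code, could look like the following sketch. The input format (a flat list of executed code addresses), the function name `detect_instruction_boundaries`, and the `threshold` and `max_gap` parameters are assumptions chosen for illustration.

```python
from collections import Counter


def detect_instruction_boundaries(executed, threshold=2, max_gap=16):
    """Group frequently executed addresses into contiguous regions.

    executed: list of code addresses in execution order.
    Returns a list of (start, end) address pairs, one per candidate
    VM instruction.
    """
    counts = Counter(executed)
    # Keep only addresses executed at least `threshold` times.
    hot = sorted(addr for addr, n in counts.items() if n >= threshold)

    clusters = []
    for addr in hot:
        if clusters and addr - clusters[-1][1] <= max_gap:
            clusters[-1][1] = addr        # extend the current region
        else:
            clusters.append([addr, addr])  # start a new region
    return [(start, end) for start, end in clusters]
```

  • The gap limit decides how far apart two addresses may be while still counting as the same region, so it would need tuning per script engine binary.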
  • The dispatcher detection unit 1214 extracts each VM instruction portion from the script engine binary based on the boundaries of the VM instructions detected by the VM instruction boundary detection unit 1213, and detects the portion that is highly similar across the VM instructions as the dispatcher. To detect the highly similar portion, a sequence alignment algorithm may be used, for example, or other methods may be used.
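  • As an illustration of the similarity search, the sketch below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for a sequence alignment algorithm. It assumes each VM instruction has already been disassembled into a list of mnemonic strings; the function name `find_shared_sequence` and the mnemonics in the usage example are hypothetical. The greedy pairwise reduction is a simplification and is not guaranteed to find the longest run common to all handlers.

```python
from difflib import SequenceMatcher


def find_shared_sequence(handlers):
    """Greedily reduce the handlers to one contiguous run they all share.

    handlers: list of VM instruction bodies, each a list of mnemonics.
    """
    common = list(handlers[0])
    for handler in handlers[1:]:
        matcher = SequenceMatcher(None, common, handler, autojunk=False)
        m = matcher.find_longest_match(0, len(common), 0, len(handler))
        common = common[m.a:m.a + m.size]
    return common


handlers = [
    ["add", "push", "load_vpc", "fetch", "jmp_table"],
    ["sub", "load_vpc", "fetch", "jmp_table"],
    ["cmp", "setflag", "load_vpc", "fetch", "jmp_table"],
]
dispatcher = find_shared_sequence(handlers)
```

  • In this toy input the shared tail of every handler is the candidate dispatcher.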
  • The conditional branch flag detection unit 1215 extracts and analyzes the execution traces for the second test script stored in the execution trace DB 131, and discovers the conditional branch flag.
  • A VM instruction that causes a branch in the script is called a branch VM instruction.
  • The conditional branch flag is an area that holds a flag indicating whether or not a branch is taken at the time of a conditional branch.
  • The conditional branch flag detection unit 1215 analyzes multiple execution traces using differential execution analysis that focuses on the number of memory reads, and detects the conditional branch flag.
  • The conditional branch flag detection unit 1215 executes conditional branches in various patterns, and detects the memory that stores the conditional branch flag by comparing the pattern of memory changes at that time with the pattern of conditional branching in the test script.
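  • The pattern comparison can be sketched as follows, assuming each run has been reduced to the taken/not-taken pattern of the test script and, per memory address, the ordered list of values written during the conditional branches. The function names and the input layout are hypothetical, and the sketch ignores noise such as unrelated writes to the same address.

```python
def matches_pattern(values, pattern):
    """True if each written value consistently encodes one branch outcome."""
    if len(values) != len(pattern):
        return False
    outcome_of = {}
    for value, taken in zip(values, pattern):
        # A value seen for "taken" must never also appear for "not taken".
        if outcome_of.setdefault(value, taken) != taken:
            return False
    return True


def detect_branch_flag(runs):
    """Intersect the matching addresses over all runs.

    runs: list of (pattern, writes); pattern is a list of booleans and
          writes maps an address to the ordered values written to it.
    """
    candidates = None
    for pattern, writes in runs:
        ok = {addr for addr, values in writes.items()
              if matches_pattern(values, pattern)}
        candidates = ok if candidates is None else candidates & ok
    return sorted(candidates or [])
```

  • Varying the order pattern between runs is what makes the intersection shrink: an address that only coincidentally tracks one pattern is unlikely to track every pattern.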
  • The code cache detection unit 1216 accepts the execution trace and the VM execution trace as input, and obtains the memory area pointed to by the VPC from the VM execution trace. It then obtains from the execution trace the code location that called the memory allocation function that allocated this memory area, and detects all areas allocated at that code location as code caches. Next, it obtains from the execution trace the code locations that write to the code cache, and detects all writes made at those code locations as updates to the code cache. Finally, it returns the code cache and the update locations.
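  • The steps above can be sketched as follows. The sketch assumes the traces have already been reduced to three simple lists: the observed VPC values, the memory allocations as `(call_site, base, size)` tuples, and the memory writes as `(code_location, target)` tuples. These layouts and the name `detect_code_cache` are assumptions for illustration only.

```python
def detect_code_cache(vpc_values, allocations, writes):
    """Return (code cache regions, code locations that update them).

    vpc_values:  addresses the VPC pointed to during execution.
    allocations: list of (call_site, base, size) for allocated areas.
    writes:      list of (code_location, target) for memory writes.
    """
    # Call sites whose allocations contain an address the VPC pointed to.
    alloc_sites = {site for site, base, size in allocations
                   if any(base <= v < base + size for v in vpc_values)}

    # Every area allocated at those call sites is treated as a code cache.
    caches = [(base, size) for site, base, size in allocations
              if site in alloc_sites]

    def in_cache(addr):
        return any(base <= addr < base + size for base, size in caches)

    # Code locations that write into a code cache are the update locations.
    update_sites = {loc for loc, target in writes if in_cache(target)}
    return caches, update_sites
```

  • Identifying the allocation call site, rather than a single address range, is what allows code caches allocated later at the same site to be recognized as well.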
  • The variable detection unit 1217 accepts a test script as input, and extracts the execution trace corresponding to the test script from the execution trace DB 131. It also extracts the pairs of variable names and values written in the test script. It then searches the writes in the memory access trace for data whose degree of match with the variable name is higher than a predetermined threshold, and likewise for data whose degree of match with the value is higher than a predetermined threshold. Finally, it returns the storage destinations of the variable name and of the corresponding value.
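  • A minimal sketch of this search is shown below. It assumes the written data is available as byte strings and uses the similarity ratio of `difflib.SequenceMatcher` as one possible measure of the degree of match; the measure actually used by the embodiment is not specified in the text. The names `find_storage` and `locate_variable` are hypothetical.

```python
from difflib import SequenceMatcher


def find_storage(writes, needle, threshold=1.0):
    """Return addresses whose written data matches `needle` closely enough.

    writes: list of (address, data) from the memory access trace,
            with data given as bytes.
    """
    return [addr for addr, data in writes
            if SequenceMatcher(None, needle, data,
                               autojunk=False).ratio() >= threshold]


def locate_variable(writes, name, value, threshold=1.0):
    """Find the storage destinations of a variable's name and value."""
    return {"name": find_storage(writes, name, threshold),
            "value": find_storage(writes, value, threshold)}
```

  • Choosing distinctive names and values in the test script reduces accidental matches, which is why lowering the threshold below an exact match should be done with care.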
  • The symbol table detection unit (detection unit) 1218 detects a symbol table based on the obtained information about the architecture. Specifically, the symbol table detection unit 1218 accepts as input the script engine binary and the storage destinations of the variable names and the corresponding values. It extracts the execution trace from the execution trace DB 131, and collects the memory reference relationships around the storage destinations of the values by static analysis of the script engine binary. It also collects the runtime memory references to variable names and values from the memory access trace, and detects arrays that hold references to variable names and values as symbol tables. The symbol table detection unit 1218 then identifies, from the API trace, the code location that allocated the memory area for the symbol table, so that the position of the symbol table can be identified at runtime, and returns the symbol table.
  • The symbol table analysis unit (analysis unit) 1219 analyzes the structure of the detected symbol table. Specifically, the symbol table analysis unit 1219 accepts the detected symbol table, the test script, and the script engine binary as input. It extracts the execution trace from the execution trace DB 131, performs static analysis of the script engine binary, and collects variable dependencies. It infers the type of the structure that constitutes the symbol table by a predetermined method, for example the one disclosed in Non-Patent Document 1. Then, based on the memory access trace, the symbol table analysis unit 1219 identifies the fields that hold type information from the co-occurrence between the values held by the structures that constitute the symbol table and the types of the variables allocated by the test script, and returns the structure of the symbol table.
  • The instruction set architecture analysis unit (second acquisition unit) 122 acquires information on the instruction set architecture, which is the instruction system of the VM, based on the acquired information on the architecture. Specifically, the instruction set architecture analysis unit 122 monitors the VPC and the dispatcher and analyzes the VM execution trace of the execution on the VM in order to collect VM instructions, determine the VM instructions, and acquire information on the instruction set architecture.
  • The instruction set architecture analysis unit 122 has a VM execution trace acquisition unit 1221, a VM instruction collection unit 1222, and a VM instruction determination unit 1223 (determination unit).
  • The VM execution trace acquisition unit 1221 accepts a test script and a script engine binary as input.
  • The VM execution trace acquisition unit 1221 executes the test script while monitoring the execution of the script engine binary, thereby acquiring a VM execution trace, which is a trace of the execution on the VM.
  • The VM execution trace consists of the VPC and the VM opcode for each executed VM instruction.
  • The VPC can be recorded by monitoring the VPC memory detected by the virtual program counter detection unit 1212.
  • Here, the VM opcode is an identifier virtually assigned to each pair of a VM instruction and the pointer to that VM instruction.
  • The VM execution trace acquired by the VM execution trace acquisition unit 1221 is stored in the VM execution trace DB 133.
  • FIG. 5 is a diagram showing an example of a VM execution trace.
  • FIG. 5 shows an excerpt of a VM execution trace.
  • A log line of a VM execution trace has, for example, the format shown in FIG. 5, and consists of two elements, vpc and pointer.
  • vpc indicates the value of the VPC.
  • pointer indicates the value of the pointer, obtained from the pointer cache, that points to the beginning of the VM instruction handler to be executed.
  • The VM instruction collection unit 1222 accepts the VPC and the dispatcher as input.
  • The VM instruction collection unit 1222 also acquires various scripts from the Internet.
  • The VM instruction collection unit 1222 then executes the scripts while monitoring the VPC and the dispatcher to acquire VM execution traces.
  • The VM instruction collection unit 1222 acquires VM instructions from the VM execution traces and adds them to a list of VM instructions. When it no longer finds any VM instruction that is not already on the list, it returns the list of VM instructions.
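  • The collection loop can be sketched as follows. The sketch identifies each VM instruction by the handler pointer in the VM execution trace and stops as soon as one script contributes nothing new; both choices, as well as the name `collect_vm_instructions` and the `run_script` callback that stands in for executing a script under monitoring, are assumptions for illustration.

```python
def collect_vm_instructions(scripts, run_script):
    """Collect distinct VM instructions until a script yields nothing new.

    scripts:    iterable of scripts to execute.
    run_script: callable returning the VM execution trace of one script
                as a list of (vpc, pointer) pairs.
    """
    known = []    # list of VM instructions, in order of discovery
    seen = set()
    for script in scripts:
        found_new = False
        for _vpc, pointer in run_script(script):
            if pointer not in seen:
                seen.add(pointer)
                known.append(pointer)
                found_new = True
        if not found_new:
            break  # nothing outside the list was found
    return known
```

  • A more conservative stopping rule, such as requiring several consecutive scripts without new instructions, may be preferable in practice because a single script may simply not exercise the remaining instructions.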
  • The VM instruction determination unit (determination unit) 1223 uses the symbol table and the information on the instruction set architecture to determine the VM instructions that correspond to the variables held in the symbol table. Specifically, the VM instruction determination unit 1223 receives as input the list of VM instructions, the VM instruction boundaries, and the symbol table. It also extracts the execution trace and the VM execution trace from the execution trace DB 131 and the VM execution trace DB 133, respectively.
  • Using the list of VM instructions, the VM instruction boundaries, the execution trace, and the VM execution trace, the VM instruction determination unit 1223 associates each executed VM instruction with the corresponding portion of the execution trace.
  • Among the reads in the memory access trace, the VM instruction determination unit 1223 searches for a VM instruction that reads the memory area of a value held in the symbol table, and determines that it is a VM instruction that reads the value of a variable held in the symbol table.
  • Likewise, among the writes in the memory access trace, the VM instruction determination unit 1223 searches for a VM instruction that writes to the memory area of a value held in the symbol table, and determines that it is a VM instruction that updates the value of a variable held in the symbol table.
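  • Once each executed VM instruction has been associated with its memory accesses, the determination reduces to a range check, as in the following sketch. The input layout (a list of `(vm_instruction, accesses)` pairs), the half-open `value_region` address range, and the name `classify_symbol_table_instructions` are assumptions for illustration.

```python
def classify_symbol_table_instructions(executed, value_region):
    """Split VM instructions into readers and writers of symbol table values.

    executed:     list of (vm_instruction, accesses); accesses is a list of
                  (kind, target) with kind being "read" or "write".
    value_region: (low, high) half-open address range holding the values
                  managed by the symbol table.
    """
    low, high = value_region
    readers, writers = set(), set()
    for vm_instruction, accesses in executed:
        for kind, target in accesses:
            if low <= target < high:
                if kind == "read":
                    readers.add(vm_instruction)
                elif kind == "write":
                    writers.add(vm_instruction)
    return readers, writers
```

  • The two resulting sets correspond to the VM instructions that read variable values and the VM instructions that update them.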
  • The analysis function imparting unit (imparting unit) 123 adds to the script engine a function that outputs information about the variables of the script engine, using the VM instructions determined to correspond to the symbol table.
  • The hook insertion unit 1231 of the analysis function imparting unit 123 accepts as input the symbol table, the VM instructions that read and write the symbol table, and the script engine binary, and uses a hook to add to the script engine binary a function that outputs the symbol table information.
  • The hook insertion unit 1231 also uses a hook to add to the script engine binary a function that outputs the updated information each time a VM instruction that updates the symbol table is executed.
  • The hook insertion unit 1231 outputs the script engine binary to which the function of acquiring variable information has been added.
  • FIG. 6 is a flowchart showing the procedure of the analysis function imparting process.
  • The input unit 11 receives a test script and a script engine binary as input (step S1).
  • The execution trace acquisition unit 1211 performs an execution trace acquisition process in which the test script is executed while the binary of the script engine is monitored, to acquire branch traces and memory access traces (step S2).
  • The virtual program counter detection unit 1212 performs a virtual program counter detection process to extract and analyze the execution trace for the first test script stored in the execution trace DB 131 and discover the VPC (step S3).
  • The VM instruction boundary detection unit 1213 then performs a VM instruction boundary detection process to detect the VM instructions and discover their boundaries (step S4).
  • The dispatcher detection unit 1214 performs a dispatcher detection process to extract each VM instruction portion from the script engine binary and detect the portion that is highly similar across the VM instructions as the dispatcher (step S5).
  • The conditional branch flag detection unit 1215 performs a conditional branch flag detection process to extract and analyze the execution trace for the second test script stored in the execution trace DB 131 and discover the conditional branch flag (step S6).
  • The code cache detection unit 1216 accepts the execution trace and the VM execution trace as input, and performs a code cache detection process to detect the code cache and the locations that update it (step S7).
  • The variable detection unit 1217 accepts the test script as input, and performs a variable detection process to detect pairs of variable names and values (step S8).
  • The symbol table detection unit 1218 accepts the script engine binary as input, and performs a symbol table detection process to detect arrays that hold references to variable names and values as symbol tables (step S9).
  • The symbol table analysis unit 1219 accepts the symbol table, the test script, and the script engine binary as input, and performs a symbol table analysis process to analyze the structure of the symbol table (step S10).
  • The VM execution trace acquisition unit 1221 receives a test script and a script engine binary as input, and performs a VM execution trace acquisition process in which the test script is executed while the execution of the script engine binary is monitored, to acquire a VM execution trace (step S11).
  • The VM instruction collection unit 1222 receives the VPC and the dispatcher as input, monitors them, and performs a VM instruction collection process to collect a list of VM instructions (step S12).
  • The VM instruction determination unit 1223 receives the list of VM instructions, the VM instruction boundaries, and the symbol table as input, and performs a VM instruction determination process to determine the VM instructions that correspond to the symbol table (step S13).
  • The hook insertion unit 1231 receives the symbol table as input and performs a hook insertion process that adds a function for acquiring variable information to the script engine binary (step S14). The hook insertion unit 1231 also outputs the script engine to which the variable information acquisition function has been added to the output unit 14 (step S15). This completes the series of steps of the analysis function imparting process.
  • FIG. 7 is a flowchart showing the procedure of the execution trace acquisition process shown in FIG.
  • the execution trace acquisition unit 1211 receives a test script and a script engine binary as input (step S21). Then, the execution trace acquisition unit 1211 hooks the received script engine to acquire a branch trace (step S22). The execution trace acquisition unit 1211 also hooks the received script engine to acquire a memory access trace (step S23).
  • the execution trace acquisition unit 1211 inputs the test script received in this state into the script engine and executes it (step S24), and stores the execution trace acquired thereby in the execution trace DB 131 (step S25).
  • the execution trace acquisition unit 1211 determines whether or not all of the input test scripts have been executed (step S26). If all of the input test scripts have been executed (step S26: Yes), the execution trace acquisition unit 1211 ends the process. On the other hand, if not all of the input test scripts have been executed (step S26: No), the execution trace acquisition unit 1211 returns to the execution of the test scripts in step S24 and continues the process.
  • FIG. 8 is a flowchart showing the procedure of the virtual program counter detection process shown in FIG.
  • the virtual program counter detection unit 1212 extracts one execution trace by the first test script from the execution trace DB 131 (step S31). Next, the virtual program counter detection unit 1212 focuses on memory access traces among the execution traces, and counts up the number of reads for each memory read destination (step S32).
  • the virtual program counter detection unit 1212 receives as input the first test script used to obtain the execution trace (step S33), and analyzes the first test script to obtain the number of repetitions and the number of repeated statements (step S34).
  • the virtual program counter detection unit 1212 extracts from the execution trace DB 131 another execution trace by the first test script, which has a different number of repetitions and number of repeated statements (step S35). Then, the virtual program counter detection unit 1212 focuses on the memory access trace, and counts up the number of reads for each memory read destination (step S36). The virtual program counter detection unit 1212 also receives as input the first test script used to obtain the execution trace (step S37), and analyzes the test script to obtain the number of repetitions and the number of repeated statements (step S38).
  • the virtual program counter detection unit 1212 narrows down the memory read destinations to only those whose read counts change in proportion to the increase or decrease in the number of repetitions or the number of repeated statements (step S39). Furthermore, the virtual program counter detection unit 1212 narrows down the memory read destinations remaining after step S39 to those whose read memory values always point to the start point of a VM instruction (step S40).
  • the virtual program counter detection unit 1212 determines whether the memory read destinations have been narrowed down to only one (step S41). If the virtual program counter detection unit 1212 has not narrowed down the memory read destinations to only one (step S41: No), the process returns to step S35, where the virtual program counter detection unit 1212 retrieves the next execution trace and continues processing. On the other hand, if the virtual program counter detection unit 1212 has narrowed down the memory read destinations to only one (step S41: Yes), the virtual program counter detection unit 1212 stores the narrowed down memory read destination in the architecture information DB 132 as a virtual program counter (step S42), and ends processing.
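The narrowing in steps S32 to S40 can be illustrated with a minimal sketch. It is not the implementation described in this embodiment: the trace format (a list of `(address, value)` read records), the function names, and the strict proportionality test are all assumptions made for illustration, and a real trace would also contain reads unrelated to the loop.

```python
from collections import Counter


def count_reads(memory_reads):
    """Count how often each address is read; records are (address, value) pairs."""
    return Counter(address for address, _ in memory_reads)


def vpc_candidates(trace_a, trace_b, iterations_a, iterations_b, instruction_starts):
    """Return read addresses that behave like a virtual program counter.

    Assumed inputs (hypothetical format):
      trace_a, trace_b   -- lists of (address, value) memory reads from two runs
      iterations_a/_b    -- loop iteration counts taken from the two test scripts
      instruction_starts -- set of addresses known to begin a VM instruction
    """
    reads_a, reads_b = count_reads(trace_a), count_reads(trace_b)
    candidates = set()
    for address in reads_a.keys() & reads_b.keys():
        # Proportionality check by cross-multiplication (no float division).
        if reads_a[address] * iterations_b != reads_b[address] * iterations_a:
            continue
        # Every value read from this address must point at a VM instruction start.
        values = {v for a, v in trace_a + trace_b if a == address}
        if values <= instruction_starts:
            candidates.add(address)
    return candidates
```

If more than one candidate remains, the same function can be applied again with a further pair of traces and the results intersected, which mirrors the loop back to step S35.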
  • FIG. 9 is a flowchart showing the procedure of the VM instruction boundary detection process shown in FIG.
  • the VM instruction boundary detection unit 1213 extracts execution traces from the execution trace DB 131 (step S51).
  • the VM instruction boundary detection unit 1213 clusters the execution traces using a predetermined method (step S52). Any method may be used for clustering.
  • the VM instruction boundary detection unit 1213 detects clusters whose execution count is equal to or greater than a threshold as VM instructions (step S53). The VM instruction boundary detection unit 1213 then determines the start and end points of the sequence of consecutive instructions that constitutes each VM instruction as its boundaries (step S54). The VM instruction boundary detection unit 1213 outputs the VM instruction boundaries as a return value (step S55), and ends the VM instruction boundary detection process.
  • FIG. 10 is a flowchart showing the procedure of the dispatcher detection process shown in FIG.
  • the dispatcher detection unit 1214 receives the script engine binary as an input (step S61).
  • the dispatcher detection unit 1214 receives the boundaries of the VM instructions from the VM instruction boundary detection unit 1213 (step S62).
  • the dispatcher detection unit 1214 extracts each VM instruction portion from the script engine binary based on the boundaries of the VM instructions received from the VM instruction boundary detection unit 1213 (step S63).
  • the dispatcher detection unit 1214 calculates the similarity between the codes of the VM instructions using a predetermined method (step S64). Any method may be used as long as it is capable of calculating the similarity between codes.
  • the dispatcher detection unit 1214 extracts the part that has high similarity across all VM instructions based on the similarity calculated in step S64 (step S65). The dispatcher detection unit 1214 then determines whether the extracted part is the end part of the VM instructions (step S66).
  • If the extracted part is not the end part of the VM instructions (step S66: No), the dispatcher detection unit 1214 returns to step S65 and continues processing. If it is the end part of the VM instructions (step S66: Yes), the dispatcher detection unit 1214 outputs the extracted part as a dispatcher (step S67) and ends processing.
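As a rough illustration of steps S63 to S67, the sketch below looks for the code that every VM instruction shares at its end. The embodiment allows any similarity measure; exact matching of normalized instruction strings is substituted here purely for simplicity, and the function name and input format are assumptions.

```python
def common_tail(handlers):
    """Return the longest instruction sequence shared at the end of every handler.

    handlers -- list of instruction sequences (lists of strings), one per VM
                instruction, assumed to be already disassembled and normalized.
    """
    if not handlers:
        return []
    tail = []
    # Walk backwards from the last instruction of every handler in lockstep.
    for column in zip(*(reversed(handler) for handler in handlers)):
        if len(set(column)) != 1:
            break  # the handlers diverge here, so the shared tail has ended
        tail.append(column[0])
    return tail[::-1]


# Usage with made-up handler bodies: the shared tail is the dispatcher candidate.
handlers = [
    ["pop", "add", "push", "load_op", "jmp_table"],
    ["pop", "neg", "push", "load_op", "jmp_table"],
    ["push_const", "load_op", "jmp_table"],
]
dispatcher = common_tail(handlers)
```

A production version would more likely tolerate small differences (register allocation, padding) by using a fuzzy similarity score rather than exact equality.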
  • FIG. 11 is a flowchart showing the procedure of the conditional branch flag detection process shown in FIG.
  • conditional branch flag detection unit 1215 extracts one execution trace by the second test script from the execution trace DB 131 (step S71). Then, the conditional branch flag detection unit 1215 focuses on the memory access trace and counts the number of reads for each memory read destination (step S72).
  • the conditional branch flag detection unit 1215 also receives as input the second test script used to obtain the execution trace (step S73), analyzes this second test script, and obtains the number of conditional branches and the True/False sequence pattern (step S74). The conditional branch flag detection unit 1215 then narrows down the memory read destinations to only those whose read count changes in proportion to the number of conditional branches (step S75). Furthermore, the conditional branch flag detection unit 1215 narrows down the memory read destinations to only those whose read memory value alternates between two values in accordance with the True/False sequence pattern (step S76).
  • the conditional branch flag detection unit 1215 determines whether the memory read destinations have been narrowed down to only one (step S77). If the conditional branch flag detection unit 1215 has not narrowed down the memory read destinations to only one (step S77: No), it returns to step S71, retrieves the next execution trace, and continues processing. On the other hand, if the conditional branch flag detection unit 1215 has narrowed down the memory read destinations to only one (step S77: Yes), it stores the narrowed down read destination in the architecture information DB 132 as a conditional branch flag (step S78), and ends processing.
  • FIG. 12 is a flowchart showing the procedure of the code cache detection process shown in FIG.
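The two filters of steps S75 and S76 can be sketched as follows. This is only an illustration under simplifying assumptions: the read trace is a list of `(address, value)` pairs in execution order, the flag is assumed to be read exactly once per conditional branch, and the function name is hypothetical.

```python
def branch_flag_candidates(memory_reads, branch_outcomes):
    """Return addresses whose read values track the script's True/False pattern.

    memory_reads    -- list of (address, value) reads in execution order
    branch_outcomes -- list of bools expected from analyzing the test script
    """
    values_by_address = {}
    for address, value in memory_reads:
        values_by_address.setdefault(address, []).append(value)

    candidates = set()
    for address, values in values_by_address.items():
        # Assumed proportionality factor of 1: one read per conditional branch.
        if len(values) != len(branch_outcomes):
            continue
        value_for = {}
        # Each outcome must always map to the same observed value.
        consistent = all(
            value_for.setdefault(outcome, value) == value
            for outcome, value in zip(branch_outcomes, values)
        )
        # Exactly two distinct values, one for True and one for False.
        if consistent and len(value_for) == 2 and value_for[True] != value_for[False]:
            candidates.add(address)
    return candidates
```

When several addresses survive, intersecting the results over further execution traces corresponds to the loop back to step S71.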
  • the code cache detection unit 1216 receives the execution trace and VM execution trace as input (step S81), and obtains the memory area indicated by the VPC from the VM execution trace (step S82). The code cache detection unit 1216 also obtains from the execution trace the code location that called the memory allocation function that allocated the memory area (step S83). The code cache detection unit 1216 also detects all areas allocated at the code location as code caches (step S84). The code cache detection unit 1216 then obtains from the execution trace the code locations that are writing to the code cache (step S85). The code cache detection unit 1216 also detects all areas written at the code location as updates to the code cache (step S86), returns the code cache and the updated locations (step S87), and ends the process.
  • FIG. 13 is a flowchart showing the procedure of the variable detection process shown in FIG.
  • the variable detection unit 1217 accepts a test script as input (step S91) and extracts an execution trace corresponding to the test script from the execution trace DB 131 (step S92). The variable detection unit 1217 also extracts a pair of a variable name and a value written in the test script (step S93). The variable detection unit 1217 then searches the writes in the memory access trace for a value whose degree of match with the variable name is higher than a predetermined threshold (step S94). The variable detection unit 1217 also searches the writes in the memory access trace for a value whose degree of match with the value is higher than a predetermined threshold (step S95). The variable detection unit 1217 then returns the storage destinations of the variable name and of the value corresponding to the variable (step S96), and ends the process.
  • FIG. 14 is a flowchart showing the procedure of the symbol table detection process shown in FIG.
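The search in steps S94 and S95 can be pictured with the sketch below. The embodiment does not specify how the degree of match is computed; `difflib.SequenceMatcher` from the Python standard library is used here only as a stand-in, and the write-trace format, the function names, the threshold, and the 8-byte little-endian integer encoding are all assumptions.

```python
from difflib import SequenceMatcher


def find_storage(writes, needle, threshold=0.9):
    """Return addresses whose written bytes match `needle` closely enough.

    writes -- list of (address, data) pairs taken from the memory write trace,
              where data is a bytes object (hypothetical trace format).
    """
    return [
        address
        for address, data in writes
        if SequenceMatcher(None, data, needle).ratio() >= threshold
    ]


def locate_variable(writes, name, value):
    """Locate where a variable's name and integer value were written."""
    name_bytes = name.encode("utf-8")
    value_bytes = value.to_bytes(8, "little")  # assumed in-memory encoding
    return {
        "name_at": find_storage(writes, name_bytes),
        "value_at": find_storage(writes, value_bytes),
    }


# Usage with a made-up write trace for a script containing `counter = 42`.
writes = [
    (0x5000, b"counter"),
    (0x6000, (42).to_bytes(8, "little")),
    (0x7000, b"other"),
]
found = locate_variable(writes, "counter", 42)
```

Script engines often box or tag values, so a real search would need to try several candidate encodings for each value rather than a single fixed one.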
  • the symbol table detection unit (extraction unit) 1218 detects the symbol table based on the obtained information about the architecture. Specifically, the symbol table detection unit 1218 accepts the script engine binary as input (step S101), and accepts the storage destination of the variable and the corresponding value (step S102). The symbol table detection unit 1218 also extracts the execution trace from the execution trace DB 131 (step S103), and collects the memory reference relationships related to the storage destination of the value by static analysis of the script engine binary (step S104). The symbol table detection unit 1218 also collects the memory references to the variable name and value at execution time from the memory access trace (step S105), and detects an array that holds references to the variable name and value as the symbol table (step S106). The symbol table detection unit 1218 then identifies, from the API trace, the code part that allocates the memory area of the symbol table so that the position of the symbol table can be identified at execution time (step S107), returns the symbol table (step S108), and ends the process.
  • FIG. 15 is a flowchart showing the procedure of the symbol table analysis process shown in FIG.
  • the symbol table analysis unit (analysis unit) 1219 also analyzes the structure of the detected symbol table. Specifically, the symbol table analysis unit 1219 accepts the detected symbol table, test script, and script engine binary as input (steps S111 to S113). The symbol table analysis unit 1219 also extracts the execution trace from the execution trace DB 131, performs static analysis of the script engine binary, and collects variable dependencies (steps S114 to S115). The symbol table analysis unit 1219 infers the type of the structure that constitutes the symbol table using a predetermined method, such as that disclosed in Non-Patent Document 1 (step S116).
  • the symbol table analysis unit 1219 identifies the variables that hold type information from the co-occurrence between the values held by the structures that constitute the symbol table and the types of the variables allocated by the test script, returns the structure of the symbol table (steps S117 to S118), and ends the process.
  • FIG. 16 is a flowchart showing the procedure of the VM execution trace acquisition process shown in FIG.
  • the VM execution trace acquisition unit 1221 receives a test script and a script engine binary as input (step S121). Then, the VM execution trace acquisition unit 1221 hooks the received script engine to record the VPC and VM opcode (step S122).
  • the VM execution trace acquisition unit 1221 inputs the received test script in this state into the script engine for execution (step S123), and stores the VM execution trace acquired thereby in the VM execution trace DB 133 (step S124).
  • the VM execution trace acquisition unit 1221 determines whether or not all of the input test scripts have been executed (step S125). If the VM execution trace acquisition unit 1221 has not finished executing all of the input test scripts (step S125: No), it returns to the execution of the test scripts in step S123 and continues processing. On the other hand, if the VM execution trace acquisition unit 1221 has finished executing all of the input test scripts (step S125: Yes), it ends processing.
  • FIG. 17 is a flowchart showing the procedure of the VM instruction collection process shown in FIG.
  • the VM instruction collection unit 1222 receives the VPC and the dispatcher as input (step S131).
  • the VM instruction collection unit 1222 also acquires various scripts from the Internet (step S132).
  • the VM instruction collection unit 1222 then executes the scripts while monitoring the VPC and the dispatcher to acquire a VM execution trace (step S133).
  • the VM instruction collection unit 1222 also acquires VM instructions from the VM execution trace and adds them to a list of VM instructions (steps S134 to S135).
  • the VM instruction collection unit 1222 checks whether all VM instructions are now on the list (step S136). If VM instructions that are not on the list remain (step S136: No), the VM instruction collection unit 1222 returns the process to step S132. On the other hand, if no VM instructions remain that are not on the list (step S136: Yes), the VM instruction collection unit 1222 returns the list of VM instructions (step S137) and ends the process.
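The collection loop of steps S132 to S137 can be sketched as below. The tracer is passed in as a callable because the actual monitoring of the VPC and dispatcher is outside the scope of this illustration; `run_script`, the batch structure, and the stopping rule (stop once a whole batch of scripts yields no new VM instruction) are assumptions, since the embodiment does not state how completeness of the list is judged.

```python
def collect_vm_instructions(script_batches, run_script):
    """Collect the set of VM opcodes observed while executing scripts.

    script_batches -- iterable of lists of scripts (e.g. successive downloads)
    run_script     -- hypothetical tracer: run_script(script) returns the list
                      of VM opcodes seen in that script's VM execution trace
    """
    known = set()
    for batch in script_batches:
        before = len(known)
        for script in batch:
            known.update(run_script(script))
        # Assumed stopping rule: a full batch that adds nothing new ends the search.
        if len(known) == before:
            break
    return sorted(known)


# Usage with a fake tracer backed by a dictionary of pre-recorded opcode lists.
recorded = {"a": [1, 2], "b": [2, 3], "c": [3, 4], "d": [1, 4], "e": [9]}
instructions = collect_vm_instructions(
    [["a", "b"], ["c"], ["d"], ["e"]], recorded.__getitem__
)
```

With this stopping rule a rarely used instruction can be missed, as opcode 9 is in the usage example, so the batch size trades coverage against run time.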
  • FIG. 18 is a flowchart showing the procedure of the VM instruction determination process shown in FIG.
  • the VM instruction determination unit (determination unit) 1223 uses the symbol table and information on the instruction set architecture to determine the VM instruction that corresponds to the variable held in the symbol table. Specifically, the VM instruction determination unit 1223 receives as input a list of VM instructions, VM instruction boundaries, and the symbol table (steps S141 to S142). The VM instruction determination unit 1223 also extracts the execution trace and VM execution trace from the execution trace DB 131 (step S143).
  • the VM instruction determination unit 1223 associates the executed VM instruction with the relevant portion of the execution trace from the list of VM instructions, VM instruction boundaries, execution trace, and VM execution trace (step S144).
  • the VM instruction determination unit 1223 also searches the reads in the memory access trace for a VM instruction that reads the memory area of a value held by the symbol table, and determines that it is a VM instruction that reads the value of a variable held by the symbol table (steps S145 to S146).
  • the VM instruction determination unit 1223 also searches the writes in the memory access trace for a VM instruction that writes to the memory area of a value held by the symbol table, determines that it is a VM instruction that updates the value of a variable held by the symbol table (steps S147 to S148), and ends the process.
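Steps S145 to S148 amount to classifying VM instructions by whether they touch the value slots of the symbol table. A minimal sketch follows, assuming that step S144 has already attributed each memory access to the VM instruction that performed it; the record format, the opcode names, and the function name are hypothetical.

```python
def classify_symbol_accessors(accesses, value_regions):
    """Split VM opcodes into those that read and those that write symbol values.

    accesses      -- list of (vm_opcode, kind, address) records, kind "r" or "w",
                     each already attributed to the executing VM instruction
    value_regions -- list of (start, end) half-open address ranges that hold
                     the values referenced by the symbol table
    """
    def in_symbol_table(address):
        return any(start <= address < end for start, end in value_regions)

    readers, writers = set(), set()
    for opcode, kind, address in accesses:
        if not in_symbol_table(address):
            continue  # access is unrelated to the symbol table
        (readers if kind == "r" else writers).add(opcode)
    return readers, writers


# Usage with made-up opcodes and addresses.
accesses = [
    ("LOAD_VAR", "r", 0x9008),
    ("STORE_VAR", "w", 0x9008),
    ("ADD", "r", 0x1000),
    ("STORE_VAR", "w", 0x9010),
    ("PUSH", "w", 0x2000),
]
readers, writers = classify_symbol_accessors(accesses, [(0x9000, 0x9020)])
```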
  • FIG. 19 is a flowchart showing the procedure of the hook insertion process shown in FIG.
  • the hook insertion unit 1231 accepts as input the symbol table, the VM instructions that read and write the symbol table, and the script engine binary (steps S151 to S153), and uses a hook to add to the script engine binary a function that outputs symbol table information (step S154).
  • the hook insertion unit 1231 also adds to the script engine binary a function that outputs the updated information each time a VM instruction that updates the symbol table is executed (step S155).
  • the hook insertion unit 1231 then outputs the script engine binary to which the function to obtain variable information has been added (step S156), and the process ends.
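The behavior added in steps S154 and S155 can be modeled at a high level as a callback that fires on each VM instruction and emits the symbol table whenever an updating instruction runs. The sketch below models only that logic in Python; it does not perform binary hooking, and `make_variable_hook`, `read_symbol_table`, and the opcode names are hypothetical.

```python
def make_variable_hook(writer_opcodes, read_symbol_table, sink):
    """Build a callback that records variable state after symbol table updates.

    writer_opcodes    -- set of VM opcodes determined to update the symbol table
    read_symbol_table -- hypothetical accessor returning {name: value} for the
                         symbol table at its detected location
    sink              -- list that receives one snapshot per update
    """
    def on_vm_instruction(opcode):
        if opcode in writer_opcodes:
            # Copy the table so later updates do not alter earlier snapshots.
            sink.append(dict(read_symbol_table()))

    return on_vm_instruction


# Usage with an in-memory stand-in for the script engine's symbol table.
table, log = {}, []
hook = make_variable_hook({"STORE_VAR"}, lambda: table, log)
table["x"] = 1
hook("STORE_VAR")   # recorded
hook("ADD")         # ignored: does not update the symbol table
table["x"] = 2
hook("STORE_VAR")   # recorded
```

In the actual device the equivalent of `on_vm_instruction` would be attached to the script engine binary at the locations of the determined VM instructions.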
  • the virtual machine analysis unit (first acquisition unit) 121 analyzes the VM of the script engine and acquires information about the architecture of the script engine.
  • the symbol table detection unit (detection unit) 1218 detects a symbol table that holds information about variables based on the acquired information about the architecture.
  • the symbol table analysis unit (analysis unit) 1219 analyzes the structure of the symbol table.
  • the instruction set architecture analysis unit (second acquisition unit) 122 acquires information about the instruction set architecture, which is the system of instructions for the virtual machine, based on the acquired information about the architecture.
  • the VM instruction determination unit (determination unit) 1223 uses the symbol table and the information about the instruction set architecture to determine the virtual machine instruction that corresponds to the variable held by the symbol table.
  • the architecture information includes any one of a virtual program counter, a dispatcher, a conditional branch flag, or a code cache.
  • the instruction set architecture analysis unit 122 obtains the instruction set architecture information by monitoring the virtual program counter and the dispatcher and analyzing the virtual machine execution trace executed in the virtual machine.
  • the analysis function providing device 10 of this embodiment is able to detect various architectural information by analysis based on the acquisition of execution traces and VM execution traces, even for script engines whose VM internal specifications are unknown. Therefore, it is possible to acquire information on variables of the script engine.
  • the analysis function adding unit 123 also uses the analyzed symbol table and the determined virtual machine instructions to add to the script engine a function for outputting variable information of the script engine. This allows the analysis function adding device 10 to add a variable information acquisition function without requiring manual reverse engineering.
  • the analysis function providing device 10 can automatically provide a variable information acquisition function to various script engines as long as a test script is prepared, so the variable information acquisition function can be provided without the need for individual design or implementation. Therefore, variable information can be acquired even for scripts written in various script languages, making it possible to achieve more advanced script analysis.
  • the analysis function adding device 10 can add a function to analyze the script engine and measure code coverage, so that the function to obtain variable information can be automatically added to script engines of a wide variety of script languages.
  • the analysis function-imparting device 10 of this embodiment is useful for analyzing variable information of scripts written in a wide variety of scripting languages, and is suitable for implementing analysis even for scripts for which it is difficult to obtain variable information due to the absence of support functions such as a debugger or unknown internal specifications of the VM. Therefore, by imparting the function of obtaining variable information to various script engines, it becomes possible to analyze script variables and interpret the functions of the script more deeply.
  • a program in which the process executed by the analysis function-imparting device 10 according to the above embodiment is written in a language executable by a computer can also be created.
  • the analysis function-imparting device 10 can be implemented by installing an analysis function-imparting program that executes the above analysis function-imparting process as package software or online software on a desired computer.
  • an information processing device can function as the analysis function-imparting device 10 by executing the above analysis function-imparting program.
  • the information processing device referred to here includes desktop or notebook personal computers.
  • the information processing device also includes mobile communication terminals such as smartphones, mobile phones, and PHS (Personal Handyphone System), as well as slate terminals such as PDAs (Personal Digital Assistants).
  • the functions of the analysis function-imparting device 10 may be implemented on a cloud server.
  • FIG. 20 is a diagram showing an example of a computer that executes an analysis function adding program.
  • the computer 1000 has, for example, a memory 1010, a CPU 1020, a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These components are connected by a bus 1080.
  • the memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012.
  • the ROM 1011 stores a boot program such as a BIOS (Basic Input Output System).
  • the hard disk drive interface 1030 is connected to a hard disk drive 1031.
  • the disk drive interface 1040 is connected to a disk drive 1041.
  • a removable storage medium such as a magnetic disk or optical disk is inserted into the disk drive 1041.
  • the serial port interface 1050 is connected to a mouse 1051 and a keyboard 1052, for example.
  • the video adapter 1060 is connected to a display 1061, for example.
  • the hard disk drive 1031 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. Each piece of information described in the above embodiment is stored, for example, in the hard disk drive 1031 or memory 1010.
  • the analysis function-imparting program is stored in the hard disk drive 1031, for example, as a program module 1093 in which instructions to be executed by the computer 1000 are written. Specifically, the program module 1093 in which each process executed by the analysis function-imparting device 10 described in the above embodiment is written is stored in the hard disk drive 1031.
  • data used for information processing by the analysis function-imparting program is stored as program data 1094, for example, in the hard disk drive 1031.
  • the CPU 1020 reads the program module 1093 and program data 1094 stored in the hard disk drive 1031 into the RAM 1012 as necessary, and executes each of the above-mentioned procedures.
  • the program module 1093 and program data 1094 related to the analysis function-imparting program are not limited to being stored in the hard disk drive 1031, but may be stored in a removable storage medium, for example, and read by the CPU 1020 via the disk drive 1041 or the like.
  • the program module 1093 and program data 1094 related to the analysis function-imparting program may be stored in another computer connected via a network, such as a LAN or WAN (Wide Area Network), and read by the CPU 1020 via the network interface 1070.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)
  • Stored Programmes (AREA)
PCT/JP2022/037936 2022-10-11 2022-10-11 解析機能付与装置、解析機能付与方法および解析機能付与プログラム Ceased WO2024079800A1 (ja)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2024550954A JP7794327B2 (ja) 2022-10-11 2022-10-11 解析機能付与装置、解析機能付与方法および解析機能付与プログラム
PCT/JP2022/037936 WO2024079800A1 (ja) 2022-10-11 2022-10-11 解析機能付与装置、解析機能付与方法および解析機能付与プログラム

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/037936 WO2024079800A1 (ja) 2022-10-11 2022-10-11 解析機能付与装置、解析機能付与方法および解析機能付与プログラム

Publications (1)

Publication Number Publication Date
WO2024079800A1 (ja) 2024-04-18

Family

ID=90668978

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/037936 Ceased WO2024079800A1 (ja) 2022-10-11 2022-10-11 解析機能付与装置、解析機能付与方法および解析機能付与プログラム

Country Status (2)

Country Link
JP (1) JP7794327B2
WO (1) WO2024079800A1

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022180702A1 (ja) * 2021-02-24 2022-09-01 日本電信電話株式会社 解析機能付与装置、解析機能付与プログラム及び解析機能付与方法

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022180702A1 (ja) * 2021-02-24 2022-09-01 日本電信電話株式会社 解析機能付与装置、解析機能付与プログラム及び解析機能付与方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Lecture Notes/ Software Science 37 Foundation of Software Engineering XVIII", vol. XVII, 30 November 2011, MODERN SCIENCE SOCIETY, JP, ISBN: 978-4-7649-0419-4, article HOSHI, KOICHIRO; YAMAMOTO, TETSUO; SUGIYAMA, YASUHIRO: "Java Virtual Machine to display variable historics at runtime errors", pages: 31 - 40, XP009556090 *

Also Published As

Publication number Publication date
JP7794327B2 (ja) 2026-01-06
JPWO2024079800A1 2024-04-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22962020

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2024550954

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22962020

Country of ref document: EP

Kind code of ref document: A1