WO2024079794A1 - Analysis function providing device, analysis function providing method, and analysis function providing program
- Publication number
- WO2024079794A1 (PCT/JP2022/037925)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- unit
- execution
- information
- bytecode
- architecture
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/57—Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
Definitions
- The present invention relates to an analysis function providing device, an analysis function providing method, and an analysis function providing program.
- Instrumentation is one of the techniques for analyzing software. This involves adding code with analytical functions to the program being analyzed, and obtaining information about the execution state of the target program at run time. For example, by using instrumentation to insert recording code for each instruction, it is possible to count the number of instructions executed, and by inserting similar code for each branch, it is possible to recognize the control flow that was executed.
- This type of instrumentation is an important technique that is widely used for software testing as well as cybersecurity purposes such as analyzing malware and discovering vulnerabilities.
- Instrumentation can be added statically before a program is executed, or dynamically at runtime.
- the targets of instrumentation are diverse, including source code, scripts, executable binaries (hereafter referred to as binaries), and bytecode.
- Instrumentation that is dynamically added to binaries is called dynamic binary instrumentation.
- Scripts are themselves subject to testing, and malicious scripts are also increasingly used in attacks, which makes instrumentation of scripts important.
- The former, static approach has the problem that it can be difficult to comprehensively find the locations where analysis code should be added. Static addition may also be impossible for scripts that are dynamically evaluated at runtime. Furthermore, comprehensive addition is difficult if the script is obfuscated, so the approach is not effective for analyzing malicious scripts. Therefore, the latter approach, dynamically adding analysis code to bytecode at runtime, is generally used. This makes it possible to achieve instrumentation even in the above-mentioned cases.
- Non-Patent Document 1 proposes a method for implementing dynamic instrumentation for Java (registered trademark) bytecode.
- Non-Patent Document 2 proposes a method for implementing dynamic instrumentation for ActionScript3 bytecode.
- dynamic bytecode instrumentation in scripts generally requires the use of support functions such as a debugger provided by the script engine. This is because the internal specifications of the virtual machine (VM) in the script engine that controls the execution of the script are often not made public, making it difficult to monitor the execution state and transition between execution states required for instrumentation without support functions.
- the present invention was made in consideration of the above, and aims to make it possible to provide dynamic bytecode instrumentation functionality to script engines that do not have support functions such as a debugger and whose internal specifications are unknown, without the need for manual individual analysis, design, and implementation.
- the analysis function providing device is characterized by having a first acquisition unit that analyzes the virtual machine of the script engine and acquires information about the architecture of the script engine, a second acquisition unit that acquires information about the instruction set architecture, which is the instruction system of the virtual machine, based on the information about the architecture, an extraction unit that extracts bytecode for instrumentation based on the acquired information about the architecture and the information about the instruction set architecture, and an insertion unit that inserts the extracted bytecode for instrumentation into the bytecode to be analyzed.
- FIG. 1 is a schematic diagram illustrating the configuration of an analysis function providing device according to the present embodiment.
- FIG. 2 is a diagram showing an example of a test script used for detecting a virtual program counter.
- FIG. 3 is a diagram illustrating an example of an execution trace.
- FIG. 4 is a diagram illustrating an example of a VM execution trace.
- FIG. 5 is a flowchart showing the procedure of the analysis function providing process.
- FIG. 6 is a flowchart showing the procedure of the execution trace acquisition process.
- FIG. 7 is a flowchart showing the procedure of the virtual program counter detection process.
- FIG. 8 is a flowchart illustrating a procedure for the VM instruction boundary detection process.
- FIG. 9 is a flowchart showing the procedure of the dispatcher detection process.
- FIG. 10 is a flowchart showing the procedure of the code cache detection process.
- FIG. 11 is a flowchart illustrating a procedure for the VM execution trace acquisition process.
- FIG. 12 is a flowchart illustrating a procedure for the VM instruction collection process.
- FIG. 13 is a flowchart illustrating a procedure for the VM instruction determination process.
- FIG. 14 is a flowchart showing the procedure of the instrumentation bytecode extraction process.
- FIG. 15 is a flowchart showing the procedure of the insertion process.
- FIG. 16 is a flowchart showing the procedure of the execution process.
- FIG. 17 is a diagram illustrating an example of a computer that executes an analysis function providing program.
- The analysis function providing device of this embodiment is applied to a script engine. It executes a test script while monitoring the binary of the script engine, and acquires a branch trace and a memory access trace as an execution trace.
- The analysis function providing device then analyzes the VM based on the execution trace, and acquires a VPC (Virtual Program Counter), a dispatcher, a conditional branch flag, and a code cache, which constitute the architecture information of the script engine.
- The analysis function providing device executes the test script while monitoring the VPC and the dispatcher to obtain a VM execution trace. By analyzing this VM execution trace, it collects VM instructions, determines the VM instructions, and obtains information on the instruction set architecture.
- The analysis function providing device extracts bytecode for instrumentation based on the obtained architecture information, embeds it into the bytecode to be analyzed, and executes it to realize dynamic bytecode instrumentation. In this way, even for script engines whose VM internal specifications are unknown, the analysis function providing device detects the architecture information by analyzing execution traces and VM execution traces, and adds dynamic bytecode instrumentation functionality without manual reverse engineering.
- Once test scripts have been prepared, the analysis function providing device can automatically add dynamic bytecode instrumentation functionality to a variety of script engines, eliminating the need for individual design and implementation. It is therefore possible to perform dynamic bytecode instrumentation and analysis on a wide variety of script engine implementations.
- FIG. 1 is a schematic diagram illustrating the configuration of the analysis function providing device of the present embodiment.
- The analysis function providing device 10 of the present embodiment is realized by a general-purpose computer such as a personal computer, and includes an input unit 11, a control unit 12, a storage unit 13, and an output unit 14.
- the input unit 11 is realized using input devices such as a keyboard and a mouse, and accepts information input from an operator or from outside, and inputs it to the control unit 12. For example, the input unit 11 accepts input of a test script or a virtual machine binary. The input unit 11 also accepts input of information transmitted from an external device via a telecommunications line.
- A test script is a script that is input when dynamically analyzing a script engine to obtain an execution trace and a VM execution trace. It focuses on the number of branch instruction executions and memory reads and writes, and is used to capture differences in the behavior of the script engine that arise when the test script is executed under different conditions, such as a different number of repetitions. Test scripts are created manually in advance of analysis; creating them requires knowledge of the specifications of the target scripting language.
- Figure 2 shows an example of a test script used for VPC detection.
- This test script uses a repetitive process (line 2).
- The execution conditions are changed by increasing or decreasing the number of repetitions (line 2) and the number of repeated statements (lines 3 to 5), thereby generating differences in behavior.
- a script engine binary is an executable file that makes up a script engine.
- a script engine binary may consist of multiple executable files.
- The output unit 14 is realized by a display device such as a liquid crystal display, a printing device such as a printer, etc. For example, the output unit 14 displays the results of the analysis function providing process described below. The output unit 14 may also output various information to an external device.
- The storage unit 13 is realized by a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk.
- The storage unit 13 stores in advance the processing program that operates the analysis function providing device 10 and the data used during execution of the processing program, or stores them temporarily each time processing is performed.
- The storage unit 13 stores an execution trace database (DB) 131, an architecture information DB 132, and a VM execution trace DB 133.
- the execution trace DB 131 and the VM execution trace DB 133 store the execution traces and VM execution traces acquired by the execution trace acquisition unit 1211 and the VM execution trace acquisition unit 1221, respectively.
- the execution trace DB 131 and the VM execution trace DB 133 are managed by the analysis function providing device 10.
- Alternatively, the execution trace DB 131 and the VM execution trace DB 133 may be managed by another device (such as a server). In that case, the execution trace acquisition unit 1211 and the VM execution trace acquisition unit 1221 output the acquired execution traces and VM execution traces to the management server of the execution trace DB 131 and the VM execution trace DB 133 via the communication interface of the output unit 14, where they are stored in the respective DBs.
- the control unit 12 is realized using a CPU (Central Processing Unit) or an MPU (Micro Processing Unit), and executes processing programs stored in memory. As a result, the control unit 12 functions as a virtual machine analysis unit 121 (first acquisition unit), an instruction set architecture analysis unit 122 (second acquisition unit), and an instrumentation unit 123, as illustrated in FIG. 1.
- The virtual machine analysis unit 121 (first acquisition unit) analyzes the VM of the script engine and acquires information about the architecture of the script engine. Specifically, the virtual machine analysis unit 121 executes a test script while monitoring the binary of the script engine, and acquires branch traces and memory access traces as execution traces. It then analyzes the VM based on the execution traces and acquires the architecture information.
- the architecture information includes any one of a virtual program counter, a dispatcher, a conditional branch flag, or a code cache.
- the virtual machine analysis unit 121 has an execution trace acquisition unit 1211, a virtual program counter detection unit 1212, a VM instruction boundary detection unit 1213, a dispatcher detection unit 1214, and a code cache detection unit 1215.
- the execution trace acquisition unit 1211 accepts the test script and the script engine binary as input.
- the execution trace acquisition unit 1211 acquires an execution trace by executing the test script while monitoring the execution of the script engine binary.
- An execution trace consists of a branch trace and a memory access trace.
- a branch trace records the type of branch instruction at the time of execution, the branch source address, and the branch destination address.
- a memory access trace records the type of memory operation and the memory address of the operation target. It is known that branch traces and memory access traces can be acquired by instruction hooks.
- the execution trace acquired by the execution trace acquisition unit 1211 is stored in the execution trace DB 131.
- Figure 3 shows an example of an execution trace.
- An execution trace has an element called trace. Trace indicates whether the log line is a branch trace or a memory access trace.
- a branch trace log line has the format shown, for example, in lines 1 to 10 of Figure 3, and consists of three elements: type, src, and dst.
- type indicates whether the executed branch instruction was a call instruction, a jmp instruction, or a ret instruction.
- src indicates the address of the branch source, and dst indicates the address of the branch destination.
- A log line of a memory access trace has the format shown, for example, in lines 11 to 13 of Figure 3, and consists of three elements: type, target, and value.
- type indicates whether the memory access is a read or a write.
- target indicates the memory address that is the target of the memory access, and value stores the result of the memory access.
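The two kinds of log line described above can be modelled as simple records. The following is a minimal sketch only: the `key=value` text layout, the class names `BranchRecord` and `MemoryRecord`, and the function `parse_line` are hypothetical and are not the layout shown in Figure 3.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class BranchRecord:
    type: str  # "call", "jmp", or "ret"
    src: int   # branch source address
    dst: int   # branch destination address


@dataclass(frozen=True)
class MemoryRecord:
    type: str    # "read" or "write"
    target: int  # memory address that was accessed
    value: int   # result of the memory access


def parse_line(line: str) -> Union[BranchRecord, MemoryRecord]:
    """Parse one log line written in an assumed 'key=value' layout."""
    fields = dict(item.split("=", 1) for item in line.split())
    if fields["trace"] == "branch":
        return BranchRecord(fields["type"],
                            int(fields["src"], 16),
                            int(fields["dst"], 16))
    if fields["trace"] == "memory":
        return MemoryRecord(fields["type"],
                            int(fields["target"], 16),
                            int(fields["value"], 16))
    raise ValueError(f"unknown trace kind: {fields['trace']}")
```

The `trace` field plays the role of the element that distinguishes a branch trace line from a memory access trace line.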
- the virtual program counter detection unit 1212 extracts and analyzes the execution trace for the first test script stored in the execution trace DB 131 to detect the VPC.
- the virtual program counter detection unit 1212 analyzes multiple execution traces using differential execution analysis focusing on the number of memory reads and the boundaries of each VM instruction detected by the VM instruction boundary detection unit 1213 to detect the VPC.
- The virtual program counter detection unit 1212 makes use of the fact that a read of the memory that holds the VPC always occurs after the execution of each VM instruction, and detects the VPC by discovering the location that is read.
- the virtual program counter detection unit 1212 uses differential execution analysis that focuses on the number of memory reads to detect VPCs.
- the virtual program counter detection unit 1212 compares execution traces of multiple test scripts acquired using the test scripts, and finds memories whose memory read counts change in proportion to both the increase or decrease in the number of repetitions and the number of repeated statements.
- the virtual program counter detection unit 1212 then refers to the boundaries of each VM instruction detected by the VM instruction boundary detection unit 1213, and narrows down the memory values that have been read to those that always point to the start point of the VM instruction.
- the virtual program counter detection unit 1212 detects this memory as a VPC.
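The differential execution analysis described above can be sketched as follows. This is an illustrative simplification, not the patented procedure: it assumes each run is summarised as `(repetitions, repeated_statements, reads)`, uses `repetitions * repeated_statements` as the workload, requires every run to have a distinct workload, and only checks that read counts grow with the workload rather than testing strict proportionality. All names (`Run`, `detect_vpc_candidates`, `instruction_starts`) are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

# One run: (repetitions, repeated_statements, reads), where reads is a list of
# (address, value) pairs taken from the memory access trace of that run.
Run = Tuple[int, int, List[Tuple[int, int]]]


def detect_vpc_candidates(runs: List[Run], instruction_starts: Set[int]) -> Set[int]:
    """Return read addresses that remain plausible VPC locations."""
    # Order the runs by workload so consecutive runs can be compared.
    ordered = sorted(runs, key=lambda run: run[0] * run[1])
    counts = [Counter(addr for addr, _ in run[2]) for run in ordered]

    # Keep addresses whose read count grows whenever the workload grows.
    candidates = set(counts[0])
    for previous, current in zip(counts, counts[1:]):
        candidates = {addr for addr in candidates if current[addr] > previous[addr]}

    # Keep addresses whose read values always point at a VM instruction start.
    for _, _, reads in ordered:
        for addr, value in reads:
            if addr in candidates and value not in instruction_starts:
                candidates.discard(addr)
    return candidates
```

If more than one candidate remains, further runs with different repetition and statement counts can be added, which mirrors the narrowing loop in the detection flow.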
- the VM instruction boundary detection unit 1213 clusters the execution trace to detect the boundaries of each VM instruction.
- The VM instruction boundary detection unit 1213 detects clusters whose execution count is equal to or greater than a threshold as VM instructions. Clustering detects consecutive code regions that are executed multiple times. For example, executed instructions that are close to each other in the code may be grouped together, common subsequences of executed code blocks may be searched for, or other methods may be used.
- the analysis function providing device 10 detects the start and end points of consecutive instruction sequences that make up the detected VM instruction as boundaries.
- the VM instruction boundaries detected here are used in VPC detection and dispatcher detection.
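One of the clustering options mentioned above, grouping executed instructions that lie close together in the code, can be sketched like this. The `gap` and `threshold` parameters, the use of the least-executed address as the cluster's execution count, and the function name are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Tuple


def detect_instruction_boundaries(executed: List[int],
                                  gap: int = 16,
                                  threshold: int = 2) -> List[Tuple[int, int]]:
    """Cluster executed code addresses by proximity and return the
    (start, end) addresses of clusters executed at least `threshold` times."""
    counts = Counter(executed)
    addresses = sorted(counts)
    if not addresses:
        return []

    # Split the sorted addresses wherever two neighbours are more than `gap` apart.
    clusters = [[addresses[0]]]
    for addr in addresses[1:]:
        if addr - clusters[-1][-1] <= gap:
            clusters[-1].append(addr)
        else:
            clusters.append([addr])

    boundaries = []
    for cluster in clusters:
        # Treat the least-executed address as the cluster's execution count.
        if min(counts[addr] for addr in cluster) >= threshold:
            boundaries.append((cluster[0], cluster[-1]))
    return boundaries
```

A code region that runs only once, such as engine start-up code, falls below the threshold and is not reported as a VM instruction.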
- the dispatcher detection unit 1214 extracts each VM instruction portion from the script engine binary based on the boundaries of the VM instructions detected by the VM instruction boundary detection unit 1213, and detects the portion with high similarity between each VM instruction as a dispatcher. To detect the portion with high similarity, for example, a sequence alignment algorithm may be used, or other methods may be used.
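As a rough illustration of finding the part shared by all VM instruction handlers, the sketch below takes the longest common suffix of the handlers' instruction sequences. Exact matching is used here in place of a sequence alignment algorithm, and the mnemonic strings and the function name `common_suffix` are hypothetical.

```python
from typing import List, Sequence


def common_suffix(handlers: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest instruction sequence shared at the end of every handler."""
    if not handlers:
        return []
    suffix = []
    # Walk all handlers backwards in lockstep until they disagree.
    for column in zip(*(reversed(handler) for handler in handlers)):
        if len(set(column)) != 1:
            break
        suffix.append(column[0])
    suffix.reverse()
    return suffix
```

Restricting the search to the end of each handler reflects the check that the highly similar part is the end part of the VM instruction.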
- The code cache detection unit 1215 receives the execution trace and the VM execution trace as input, and obtains the memory area pointed to by the VPC from the VM execution trace. It then obtains from the execution trace the code location that called the memory allocation function that allocated that memory area, and detects all areas allocated at that code location as code caches. Next, it obtains from the execution trace the code locations that write to the code cache, detects all areas written to at those code locations as updates to the code cache, and returns the code cache and the update locations.
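The chain of lookups in code cache detection can be sketched as below. The sketch assumes the traces have already been reduced to allocation events and write events; the record types `Allocation` and `Write` and the function `detect_code_cache` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Allocation:
    call_site: int  # code location that called the allocation function
    start: int      # first address of the allocated area
    size: int       # size of the allocated area in bytes


@dataclass(frozen=True)
class Write:
    code_site: int  # code location performing the write
    target: int     # address written to


def detect_code_cache(vpc_values: Iterable[int],
                      allocations: List[Allocation],
                      writes: List[Write]) -> Tuple[List[Allocation], Set[int]]:
    """Return the areas treated as code cache and the code locations updating them."""
    vpc_values = list(vpc_values)

    def contains(area: Allocation, address: int) -> bool:
        return area.start <= address < area.start + area.size

    # Call sites whose allocations contain an address the VPC pointed to.
    seed_sites = {a.call_site for a in allocations
                  if any(contains(a, v) for v in vpc_values)}
    # Every area allocated at those call sites counts as code cache.
    code_cache = [a for a in allocations if a.call_site in seed_sites]
    # Code locations that write into any code cache area.
    update_sites = {w.code_site for w in writes
                    if any(contains(a, w.target) for a in code_cache)}
    return code_cache, update_sites
```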
- The instruction set architecture analysis unit 122 (second acquisition unit) acquires information on the instruction set architecture, which is the instruction system of the VM, based on the acquired information on the architecture. Specifically, the instruction set architecture analysis unit 122 monitors the VPC and the dispatcher, and analyzes the VM execution trace obtained from execution on the VM to collect VM instructions, determine the VM instructions, and acquire information on the instruction set architecture.
- the instruction set architecture analysis unit 122 has a VM execution trace acquisition unit 1221, a VM instruction collection unit 1222, and a VM instruction determination unit 1223.
- the VM execution trace acquisition unit 1221 accepts a test script and a script engine binary as input.
- the VM execution trace acquisition unit 1221 executes a test script while monitoring the execution of the script engine binary, thereby acquiring a VM execution trace, which is an execution trace executed on a VM.
- the VM execution trace consists of the VPC and VM opcode for each executed VM instruction.
- the VPC can be recorded by monitoring the memory of the VPC detected by the virtual program counter detection unit 1212.
- The VM opcode here is an identifier virtually assigned to each VM instruction and to the pointer to that VM instruction, which are linked to each other.
- the VM execution trace acquired by the VM execution trace acquisition unit 1221 is stored in the VM execution trace DB 133.
- FIG. 4 is a diagram showing an example of a VM execution trace.
- FIG. 4 shows an excerpt of a portion of a VM execution trace.
- a log line of a VM execution trace is, for example, in the format shown in FIG. 4, and consists of two elements, vpc and pointer.
- vpc indicates the value of the VPC.
- pointer indicates the value of the pointer obtained from the pointer cache that points to the beginning of the VM instruction handler to be executed.
- the VM instruction collection unit 1222 accepts the VPC and the dispatcher as input.
- the VM instruction collection unit 1222 also acquires various scripts from the Internet.
- the VM instruction collection unit 1222 then executes the scripts while monitoring the VPC and the dispatcher to acquire a VM execution trace.
- the VM instruction collection unit 1222 also acquires VM instructions from the VM execution trace and adds them to a list of VM instructions. When the VM instruction collection unit 1222 finds no VM instructions that are not on the list, it returns the list of VM instructions.
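The collection loop above can be sketched as follows. The tracer is passed in as a callable because the real tracing mechanism is outside the scope of this sketch, and the `patience` parameter (how many consecutive scripts may contribute nothing new before stopping) is an assumption; the text only states that the list is returned once no unseen VM instructions are found.

```python
from typing import Callable, Iterable, List, Set


def collect_vm_instructions(scripts: Iterable[str],
                            trace_opcodes: Callable[[str], List[int]],
                            patience: int = 3) -> Set[int]:
    """Run scripts until `patience` consecutive scripts add no new VM instruction."""
    known: Set[int] = set()
    runs_without_new = 0
    for script in scripts:
        # trace_opcodes stands in for executing the script under VPC and
        # dispatcher monitoring and reading opcodes from the VM execution trace.
        new = set(trace_opcodes(script)) - known
        known |= new
        runs_without_new = 0 if new else runs_without_new + 1
        if runs_without_new >= patience:
            break
    return known
```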
- The VM instruction determination unit 1223 uses the symbol table and information on the instruction set architecture to determine the VM instructions that correspond to the variables held in the symbol table. Specifically, the VM instruction determination unit 1223 receives as input a list of VM instructions, the VM instruction boundaries, and the symbol table. It also extracts the execution trace from the execution trace DB 131 and the VM execution trace from the VM execution trace DB 133.
- the VM instruction determination unit 1223 associates the executed VM instruction with the relevant portion of the execution trace from the list of VM instructions, VM instruction boundaries, execution trace, and VM execution trace.
- The VM instruction determination unit 1223 also searches the memory access trace for a VM instruction that reads the memory area of a value held in the symbol table, and determines it to be a VM instruction that reads the value of a variable held in the symbol table.
- Likewise, the VM instruction determination unit 1223 searches the memory access trace for a VM instruction that writes to the memory area of a value held in the symbol table, and determines it to be a VM instruction that updates the value of a variable held in the symbol table.
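The read and write classification above can be sketched as a single pass over memory accesses that have already been associated with the VM instruction that performed them. The tuple layout `(opcode, kind, address)` and the function name are hypothetical.

```python
from typing import Iterable, Set, Tuple


def classify_symbol_accesses(accesses: Iterable[Tuple[int, str, int]],
                             symbol_addresses: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Return (readers, writers): opcodes of VM instructions that read or
    update memory holding the values of variables in the symbol table."""
    readers: Set[int] = set()
    writers: Set[int] = set()
    for opcode, kind, address in accesses:
        if address not in symbol_addresses:
            continue  # access does not touch a symbol table value
        if kind == "read":
            readers.add(opcode)
        elif kind == "write":
            writers.add(opcode)
    return readers, writers
```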
- the instrumentation unit 123 receives an instrumentation script and a script to be analyzed as input, and inserts the instrumentation bytecode extracted from the instrumentation script into the bytecode to be analyzed, and executes it.
- the instrumentation unit 123 has an instrumentation bytecode extraction unit 1231, an insertion unit 1232, and an execution unit 1233.
- the instrumentation bytecode extraction unit (extraction unit) 1231 extracts bytecode for instrumentation based on the acquired information about the architecture and the information about the instruction set architecture. Specifically, the instrumentation bytecode extraction unit 1231 accepts an instrumentation script as input, executes it, and extracts the bytecode written to the code cache.
- the insertion unit 1232 inserts the extracted instrumentation bytecode into the bytecode to be analyzed. Specifically, the insertion unit 1232 accepts the script to be analyzed and the extracted bytecode as input, temporarily stops the execution of the script to be analyzed, and expands the extracted bytecode into memory space. The insertion unit 1232 then outputs the memory address where the bytecode has been expanded.
- the execution unit 1233 executes the script into which the instrumentation bytecode has been inserted. In other words, the execution unit 1233 executes the script to be analyzed into which the instrumentation bytecode has been inserted, and outputs the input value in which the problem occurred as the analysis result to the output unit 14.
- the execution unit 1233 receives as input the memory address where the insertion unit 1232 expanded the bytecode.
- the execution unit 1233 also receives as input the VPC and the dispatcher and monitors them.
- the execution unit 1233 resumes the execution of the analysis target script that was temporarily stopped by the insertion unit 1232, and if the VM instruction obtained from the dispatcher is a VM instruction to be instrumented, it saves the current VPC and rewrites the VPC to the memory address where the bytecode was expanded, and executes the inserted bytecode. Furthermore, once the execution unit 1233 has executed the bytecode up to the end, it restores the VPC to the saved value. Once the execution of the analysis target script has ended, the execution unit 1233 outputs the analysis result.
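The save, redirect, and restore handling of the VPC can be illustrated with a toy interpreter. Everything in this sketch is hypothetical: the dictionary-based bytecode memory, the opcode names (including the `END_HOOK` marker that ends the inserted bytecode and `HALT`), and the function name. A real script engine would be driven from outside by monitoring the VPC memory and the dispatcher rather than by modifying the interpreter loop.

```python
from typing import Dict, List, Optional, Tuple

Instruction = Tuple[str, int]  # (opcode, operand)


def run_with_instrumentation(memory: Dict[int, Instruction],
                             entry: int,
                             hook_opcode: str,
                             hook_address: int) -> List[str]:
    """Run a toy VM, diverting the VPC to hook_address before each hook_opcode."""
    log: List[str] = []
    vpc = entry
    saved_vpc: Optional[int] = None  # set while inserted bytecode is running
    just_restored = False            # prevents re-hooking the resumed instruction
    while True:
        opcode, operand = memory[vpc]
        if opcode == hook_opcode and saved_vpc is None and not just_restored:
            saved_vpc = vpc     # save the current VPC
            vpc = hook_address  # rewrite the VPC to the inserted bytecode
            continue
        just_restored = False
        if opcode == "END_HOOK":
            vpc = saved_vpc     # restore the VPC to the saved value
            saved_vpc = None
            just_restored = True
            continue
        if opcode == "HALT":
            return log
        log.append(f"{opcode} {operand}")
        vpc += 1
```

In this sketch the inserted bytecode runs before the instrumented instruction, and the original instruction is executed after the VPC has been restored.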
- FIG. 5 is a flowchart showing the procedure of the analysis function providing process.
- the input unit 11 receives a test script and a script engine binary as input (step S1).
- the execution trace acquisition unit 1211 performs an execution trace acquisition process in which the test script is executed while monitoring the binary of the script engine to acquire branch traces and memory access traces (step S2).
- the virtual program counter detection unit 1212 performs a virtual program counter detection process to extract and analyze the execution trace for the first test script stored in the execution trace DB 131 and discover the VPC (step S3).
- the VM instruction boundary detection unit 1213 performs a VM instruction boundary detection process to detect VM instructions and detect VM instruction boundaries (step S4).
- the dispatcher detection unit 1214 performs a dispatcher detection process to extract each VM instruction portion from the script engine binary and detect a portion with high similarity between each VM instruction as a dispatcher (step S5).
- the code cache detection unit 1215 receives the execution trace and the VM execution trace as input and performs a code cache detection process to detect the updated portion of the code cache (step S6).
- the VM execution trace acquisition unit 1221 receives a test script and a script engine binary as input, and executes the test script while monitoring the execution of the script engine binary to perform a VM execution trace acquisition process to acquire a VM execution trace (step S7).
- the VM instruction collection unit 1222 receives and monitors the VPC and dispatcher, and performs a VM instruction collection process to collect a list of VM instructions (step S8).
- the VM instruction determination unit 1223 receives a list of VM instructions, VM instruction boundaries, and a symbol table as input, and performs a VM instruction determination process to determine the VM instruction that corresponds to the symbol table (step S9).
- the instrumentation bytecode extraction unit 1231 receives the instrumentation script (step S10). Then, the instrumentation bytecode extraction unit 1231 performs an instrumentation bytecode extraction process to extract bytecode for instrumentation based on the acquired information about the architecture and the information about the instruction set architecture (step S11).
- the insertion unit 1232 also accepts a script to be analyzed (step S12).
- The insertion unit 1232 then performs an insertion process to insert the instrumentation bytecode into the bytecode of the script to be analyzed (step S13).
- The execution unit 1233 performs an execution process to execute the script to be analyzed into which the instrumentation bytecode has been inserted (step S14).
- The execution unit 1233 then outputs the input value in which the problem occurred to the output unit 14 (step S15). This completes the series of analysis function providing processes.
- FIG. 6 is a flowchart showing the procedure of the execution trace acquisition process shown in FIG.
- the execution trace acquisition unit 1211 receives a test script and a script engine binary as input (step S21). Then, the execution trace acquisition unit 1211 hooks the received script engine to acquire a branch trace (step S22). The execution trace acquisition unit 1211 also hooks the received script engine to acquire a memory access trace (step S23).
- the execution trace acquisition unit 1211 inputs the test script received in this state into the script engine and executes it (step S24), and stores the execution trace acquired thereby in the execution trace DB 131 (step S25).
- The execution trace acquisition unit 1211 determines whether all of the input test scripts have been executed (step S26). If all of the input test scripts have been executed (step S26: Yes), the execution trace acquisition unit 1211 ends the process. If any input test script has not yet been executed (step S26: No), the execution trace acquisition unit 1211 returns to the execution of the test scripts in step S24 and continues the process.
- FIG. 7 is a flowchart showing the procedure of the virtual program counter detection process shown in FIG.
- the virtual program counter detection unit 1212 extracts one execution trace by the first test script from the execution trace DB 131 (step S31). Next, the virtual program counter detection unit 1212 focuses on memory access traces among the execution traces, and counts up the number of reads for each memory read destination (step S32).
- the virtual program counter detection unit 1212 receives as input the first test script used to obtain the execution trace (step S33), and analyzes the first test script to obtain the number of repetitions and the number of repeated statements (step S34).
- the virtual program counter detection unit 1212 extracts from the execution trace DB 131 another execution trace by the first test script, which has a different number of repetitions and number of repeated statements (step S35). Then, the virtual program counter detection unit 1212 focuses on the memory access trace, and counts up the number of reads for each memory read destination (step S36). The virtual program counter detection unit 1212 also receives as input the first test script used to obtain the execution trace (step S37), and analyzes the test script to obtain the number of repetitions and the number of repeated statements (step S38).
- The virtual program counter detection unit 1212 narrows down the memory read destinations to only those whose read counts change in proportion to the increase or decrease in the number of repetitions and the number of repeated statements (step S39). Furthermore, the virtual program counter detection unit 1212 narrows down the memory read destinations obtained in step S39 to those whose read memory values always point to the start point of a VM instruction (step S40).
- the virtual program counter detection unit 1212 determines whether the memory read destinations have been narrowed down to only one (step S41). If the virtual program counter detection unit 1212 has not narrowed down the memory read destinations to only one (step S41: No), the process returns to step S35, where the virtual program counter detection unit 1212 retrieves the next execution trace and continues processing. On the other hand, if the virtual program counter detection unit 1212 has narrowed down the memory read destinations to only one (step S41: Yes), the virtual program counter detection unit 1212 stores the narrowed down memory read destination in the architecture information DB 132 as a virtual program counter (step S42), and ends processing.
- FIG. 8 is a flowchart showing the procedure of the VM instruction boundary detection process shown in FIG.
- the VM instruction boundary detection unit 1213 extracts execution traces from the execution trace DB 131 (step S51).
- the VM instruction boundary detection unit 1213 clusters the execution traces using a predetermined method (step S52). Any method may be used for clustering.
- the VM instruction boundary detection unit 1213 detects clusters whose execution count equals or exceeds a threshold as VM instructions (step S53). Then, the VM instruction boundary detection unit 1213 determines the start and end points of each sequence of consecutive instructions that constitutes a VM instruction as its boundaries (step S54). The VM instruction boundary detection unit 1213 outputs the VM instruction boundaries as a return value (step S55), and ends the VM instruction boundary detection process.
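As a rough sketch of steps S52 to S55, the example below swaps in a deliberately simple clustering: it splits a non-empty trace of native instruction addresses into straight-line runs and groups identical runs together. The specification leaves the clustering method open, so this choice, the `max_gap` parameter, and the function names are assumptions for illustration only.

```python
from collections import Counter

def split_runs(addresses, max_gap=16):
    """Split a non-empty address trace into straight-line runs (start, end).
    A run ends when the next address goes backwards or jumps more than max_gap."""
    runs, start, prev = [], addresses[0], addresses[0]
    for addr in addresses[1:]:
        if not (prev < addr <= prev + max_gap):
            runs.append((start, prev))
            start = addr
        prev = addr
    runs.append((start, prev))
    return runs

def detect_vm_instruction_boundaries(addresses, threshold):
    """S52: cluster identical runs; S53: keep clusters executed at least
    `threshold` times; S54-S55: return their (start, end) boundaries."""
    counts = Counter(split_runs(addresses))
    return sorted(run for run, n in counts.items() if n >= threshold)
```

Runs that execute only once (for example start-up code) fall below the threshold, while handler code that is re-entered for every VM instruction is reported with its start and end addresses.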
- FIG. 9 is a flowchart showing the procedure of the dispatcher detection process shown in FIG.
- the dispatcher detection unit 1214 receives the script engine binary as an input (step S61).
- the dispatcher detection unit 1214 receives the boundaries of the VM instructions from the VM instruction boundary detection unit 1213 (step S62).
- the dispatcher detection unit 1214 extracts each VM instruction portion from the script engine binary based on the boundaries received from the VM instruction boundary detection unit 1213 (step S63).
- the dispatcher detection unit 1214 calculates the similarity between the code of the VM instructions using a predetermined method (step S64). Any method may be used as long as it can calculate the similarity between pieces of code.
- the dispatcher detection unit 1214 extracts the part that is highly similar across all VM instructions based on the similarity calculated in step S64 (step S65). The dispatcher detection unit 1214 then determines whether that part is the end part of the VM instructions (step S66).
- if it is not the end part of the VM instructions (step S66: No), the dispatcher detection unit 1214 returns to step S65 and continues processing. If it is the end part of the VM instructions (step S66: Yes), the dispatcher detection unit 1214 outputs the extracted part as a dispatcher (step S67) and ends processing.
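The idea behind steps S63 to S67 is that every VM instruction handler ends with the same dispatch code. A minimal sketch, assuming the handlers have already been cut out of the binary as `bytes` objects, is to take their longest common suffix. Exact byte equality is used here as a stand-in for the unspecified similarity measure, and the function name is hypothetical.

```python
def common_suffix(handlers):
    """Return the longest byte sequence shared by the end of every handler.
    handlers: non-empty list of bytes objects, one per VM instruction."""
    shortest = min(len(h) for h in handlers)
    n = 0
    # Walk backwards from the end while all handlers agree on the byte.
    while n < shortest and len({h[-1 - n] for h in handlers}) == 1:
        n += 1
    first = handlers[0]
    return first[len(first) - n:]   # empty bytes if nothing is shared
```

A real engine would likely need a tolerant comparison (for example on normalized disassembly), because register allocation and relative offsets can differ between otherwise identical dispatch sequences.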
- FIG. 10 is a flowchart showing the procedure of the code cache detection process shown in FIG.
- the code cache detection unit 1215 receives the execution trace and VM execution trace as input (step S71), and obtains the memory area indicated by the VPC from the VM execution trace (step S72). The code cache detection unit 1215 also obtains from the execution trace the code location that called the memory allocation function that allocated the memory area (step S73). The code cache detection unit 1215 also detects all areas allocated at the code location as code caches (step S74). The code cache detection unit 1215 then obtains the code location that is writing to the code cache from the execution trace (step S75). The code cache detection unit 1215 also detects all areas written at the code location as updates to the code cache (step S76), returns the code cache and the updated location (step S77), and ends the process.
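Steps S72 to S77 can be illustrated with a small sketch. The event encoding is an assumption made for the example: allocation events are `("alloc", call_site, base, size)` tuples, write events are `("write", code_location, address)` tuples, and the VM execution trace is reduced to a list of VPC values.

```python
def detect_code_cache(exec_trace, vpc_values):
    """Return (code_cache_regions, update_addresses) from an execution trace."""
    allocs = [e for e in exec_trace if e[0] == "alloc"]
    writes = [e for e in exec_trace if e[0] == "write"]

    def inside(base, size, addr):
        return base <= addr < base + size

    # S72-S73: call sites that allocated a region the VPC points into
    sites = {site for _, site, base, size in allocs
             if any(inside(base, size, v) for v in vpc_values)}
    # S74: every region allocated at those call sites is treated as code cache
    cache = [(base, size) for _, site, base, size in allocs if site in sites]
    # S75: code locations that write into the code cache
    writers = {loc for _, loc, addr in writes
               if any(inside(b, s, addr) for b, s in cache)}
    # S76: every write performed at those locations is a code cache update
    updates = [addr for _, loc, addr in writes if loc in writers]
    return cache, updates                               # S77
```

Generalizing from one observed region to its allocation call site is what lets the sketch also report code cache regions that the VPC never pointed into during the traced run.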
- FIG. 11 is a flowchart showing the procedure of the VM execution trace acquisition process shown in FIG.
- the VM execution trace acquisition unit 1221 receives a test script and a script engine binary as input (step S81). Then, the VM execution trace acquisition unit 1221 hooks the received script engine to record the VPC and VM opcode (step S82).
- the VM execution trace acquisition unit 1221 inputs the received test script in this state into the script engine and executes it (step S83), and stores the VM execution trace acquired thereby in the VM execution trace DB 133 (step S84).
- the VM execution trace acquisition unit 1221 determines whether all of the input test scripts have been executed (step S85). If all of the input test scripts have been executed (step S85: Yes), the VM execution trace acquisition unit 1221 ends the process. If not all of the input test scripts have been executed (step S85: No), the VM execution trace acquisition unit 1221 returns to the execution of the test scripts in step S83 and continues the process.
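The loop in steps S81 to S85 can be sketched with a toy engine. Everything here is hypothetical: `run_with_hook` stands in for a hooked script engine whose bytecode is simply a list of opcodes, `compile_script` is a caller-supplied placeholder for the engine's compiler, and the dictionary returned plays the role of the VM execution trace DB 133.

```python
def run_with_hook(bytecode, hook):
    """Toy interpreter loop: calls hook(vpc, opcode) before each VM instruction (S82)."""
    vpc = 0
    while vpc < len(bytecode):
        hook(vpc, bytecode[vpc])
        vpc += 1

def acquire_vm_traces(test_scripts, compile_script):
    """test_scripts: dict of name -> source. Returns dict of name -> [(vpc, opcode), ...]."""
    trace_db = {}
    for name, source in test_scripts.items():           # S85: until all scripts are run
        trace = []
        run_with_hook(compile_script(source),           # S83: execute under the hook
                      lambda vpc, op: trace.append((vpc, op)))
        trace_db[name] = trace                          # S84: store the VM execution trace
    return trace_db
```

For instance, passing `str.split` as `compile_script` treats a whitespace-separated string of opcode names as the bytecode.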
- FIG. 12 is a flowchart showing the procedure of the VM instruction collection process shown in FIG.
- the VM instruction collection unit 1222 receives the VPC and the dispatcher as input (step S91).
- the VM instruction collection unit 1222 also acquires various scripts from the Internet (step S92).
- the VM instruction collection unit 1222 then executes the scripts while monitoring the VPC and the dispatcher to acquire a VM execution trace (step S93).
- the VM instruction collection unit 1222 also acquires VM instructions from the VM execution trace and adds them to a list of VM instructions (steps S94-S95).
- the VM instruction collection unit 1222 checks whether any VM instructions remain that are not on the list (step S96). If such VM instructions remain (step S96: No), the VM instruction collection unit 1222 returns the process to step S92. On the other hand, if no VM instructions remain that are not on the list (step S96: Yes), the VM instruction collection unit 1222 returns the list of VM instructions (step S97) and ends the process.
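The collection loop in steps S92 to S97 is essentially a fixed-point iteration over the set of observed opcodes. The sketch below assumes a hypothetical `run_script` callable that returns a VM execution trace as `(vpc, opcode)` pairs, and it uses "a batch of scripts that yields no new opcode" as the stop condition, which is an assumption since the specification does not state how step S96 is decided.

```python
def collect_vm_instructions(script_batches, run_script):
    """script_batches: iterable of lists of scripts; run_script: script -> [(vpc, opcode), ...]."""
    known = set()
    for batch in script_batches:                         # S92: acquire more scripts
        new = set()
        for script in batch:
            trace = run_script(script)                   # S93: run while monitoring VPC/dispatcher
            new |= {opcode for _vpc, opcode in trace} - known   # S94: opcodes not yet listed
        known |= new                                     # S95: add them to the list
        if not new:                                      # S96: nothing new was found
            break
    return sorted(known)                                 # S97: return the list
```

Because the loop stops at the first batch that adds nothing, rarely used VM instructions can be missed; a larger or more varied batch reduces that risk at the cost of more executions.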
- FIG. 13 is a flowchart showing the procedure of the VM instruction determination process shown in FIG.
- the VM instruction determination unit (determination unit) 1223 receives as input a list of VM instructions, VM instruction boundaries, and a symbol table (steps S101 to S102).
- the symbol table manages information about the variables held by the script and the values held therein, and is obtained by a predetermined method.
- the VM instruction determination unit 1223 also extracts the execution trace and VM execution trace from the execution trace DB 131 (step S103).
- the VM instruction determination unit 1223 uses the list of VM instructions, the VM instruction boundaries, the execution trace, and the VM execution trace to associate each executed VM instruction with the relevant portion of the execution trace (step S104).
- the VM instruction determination unit 1223 also searches the read accesses in the memory access trace for a VM instruction that reads the memory area of a value held in the symbol table, and determines that it is a VM instruction that reads the value of a variable held in the symbol table (steps S105 to S106).
- the VM instruction determination unit 1223 also searches the memory access trace for a VM instruction that writes to the memory area of a value held in the symbol table, determines that it is a VM instruction that updates the value of a variable held in the symbol table (steps S107 to S108), and ends the process.
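Once each memory access has been attributed to a VM instruction (step S104), steps S105 to S108 reduce to a membership test against the addresses in the symbol table. In this sketch the access format `(opcode, kind, address)` and the representation of the symbol table as a mapping from variable name to value address are assumptions made for illustration.

```python
def classify_variable_instructions(accesses, symbol_table):
    """accesses: list of (opcode, kind, address) with kind "read" or "write".
    symbol_table: dict of variable name -> address of its value.
    Returns (opcodes that read variables, opcodes that update variables)."""
    value_addresses = set(symbol_table.values())
    readers, writers = set(), set()
    for opcode, kind, address in accesses:
        if address not in value_addresses:
            continue                      # access does not touch a script variable
        if kind == "read":                # S105-S106
            readers.add(opcode)
        elif kind == "write":             # S107-S108
            writers.add(opcode)
    return readers, writers
```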
- FIG. 14 is a flowchart showing the procedure of the instrumentation bytecode extraction process shown in FIG.
- the instrumentation bytecode extraction unit 1231 accepts an instrumentation script as input (step S111), monitors writing to the code cache (step S112), and executes the instrumentation script (step S113). If there is writing to the code cache (step S114, Yes), the instrumentation bytecode extraction unit 1231 extracts the written bytecode (step S115).
- if there is no writing to the code cache (step S114, No), the instrumentation bytecode extraction unit 1231 continues executing the instrumentation script until completion (step S116, No → S117 → S114). When execution of the instrumentation script is completed (step S116, Yes), the instrumentation bytecode extraction unit 1231 outputs the extracted bytecode (step S118) and ends the process.
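The extraction in steps S112 to S118 amounts to filtering the writes observed during execution of the instrumentation script by whether they land in the code cache. The sketch below assumes the writes are available as `(address, data)` pairs in execution order and that the code cache is given as `(base, size)` regions; both formats and the function name are hypothetical.

```python
def extract_instrumentation_bytecode(write_events, code_cache):
    """write_events: iterable of (address, data: bytes) observed while the
    instrumentation script runs. code_cache: list of (base, size) regions."""
    extracted = bytearray()
    for address, data in write_events:                                   # S113 / S117
        if any(base <= address < base + size for base, size in code_cache):  # S114
            extracted += data                                            # S115
    return bytes(extracted)                                              # S118
```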
- FIG. 15 is a flowchart showing the procedure of the insertion process shown in FIG.
- the insertion unit 1232 accepts as input the script to be analyzed and the bytecode extracted by the instrumentation bytecode extraction unit 1231 (steps S121 to S122), and executes the script to be analyzed (step S123).
- the insertion unit 1232 temporarily stops the execution of the script to be analyzed (step S124), expands the extracted bytecode into memory space (step S125), outputs the memory address where the bytecode was expanded (step S126), and ends the process.
- FIG. 16 is a flowchart showing the procedure of the execution process shown in FIG.
- the execution unit 1233 receives as input the memory address into which the bytecode has been expanded, output by the insertion unit 1232 (step S131).
- the execution unit 1233 also receives as input the VPC and the dispatcher (step S132), and monitors the VPC and the dispatcher (steps S133 to S134).
- the execution unit 1233 also resumes the execution of the analysis target script that was temporarily stopped in step S124 by the insertion unit 1232 (step S135).
- the execution unit 1233 acquires a VM instruction from the dispatcher (step S136). If the acquired VM instruction is a VM instruction to be instrumented (step S137, Yes), the execution unit 1233 saves the current VPC (step S138). In addition, the execution unit 1233 rewrites the VPC to the memory address where the bytecode is expanded (step S139), and executes the inserted bytecode (step S140).
- the execution unit 1233 executes the inserted bytecode until the end (step S141, No → S140), and once it has executed it to the end (step S141, Yes), it restores the VPC to the value saved in step S138 (step S142) and continues execution of the script to be analyzed (step S143).
- if the acquired VM instruction is not a VM instruction to be instrumented (step S137, No), the execution unit 1233 continues execution of the script to be analyzed (step S143).
- if the execution of the script to be analyzed has not finished (step S144, No), the execution unit 1233 returns to step S136, and if the execution has finished (step S144, Yes), it outputs the analysis result (step S145) and ends the process.
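The interplay between the insertion process (steps S125 to S126) and the execution process (steps S136 to S145) can be illustrated with a toy stack VM. The opcode names (`PUSH`, `ADD`, `LOG`, `RET`, `HALT`), the memory layout (a list of `(opcode, operand)` pairs indexed by the VPC), and the function names are all assumptions for this sketch; a real script engine exposes none of them in this form.

```python
def expand(memory, bytecode):
    """S125-S126: place the instrumentation bytecode in VM memory and return its address."""
    address = len(memory)
    memory.extend(bytecode)
    return address

def step(memory, vpc, stack, log):
    """Execute one VM instruction and return the next VPC."""
    opcode, operand = memory[vpc]
    if opcode == "PUSH":
        stack.append(operand)
    elif opcode == "ADD":
        stack.append(stack.pop() + stack.pop())
    elif opcode == "LOG":
        log.append(operand)
    return vpc + 1

def run_instrumented(memory, hook_address, targets):
    vpc, stack, log = 0, [], []
    while memory[vpc][0] != "HALT":                  # S144: until the script finishes
        if memory[vpc][0] in targets:                # S136-S137: instruction to instrument?
            saved_vpc = vpc                          # S138: save the current VPC
            vpc = hook_address                       # S139: redirect the VPC
            while memory[vpc][0] != "RET":           # S140-S141: run inserted bytecode
                vpc = step(memory, vpc, stack, log)
            vpc = saved_vpc                          # S142: restore the VPC
        vpc = step(memory, vpc, stack, log)          # S143: continue the target script
    return stack, log                                # S145: analysis result

script = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("HALT", None)]
hook = [("LOG", "ADD executed"), ("RET", None)]
memory = list(script)
hook_address = expand(memory, hook)
result = run_instrumented(memory, hook_address, {"ADD"})
```

Saving and restoring the VPC around the inserted bytecode is what keeps the analyzed script's own control flow unchanged: the instrumented `ADD` still runs exactly once, after the logging code.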
- the virtual machine analysis unit 121 analyzes the VM of the script engine and acquires information about the architecture of the script engine.
- the instruction set architecture analysis unit 122 acquires information about the instruction set architecture, which is the instruction system of the virtual machine, based on the information about the architecture.
- the instrumentation bytecode extraction unit 1231 extracts bytecode for instrumentation based on the acquired information about the architecture and the information about the instruction set architecture.
- the insertion unit 1232 inserts the extracted bytecode for instrumentation into the bytecode to be analyzed.
- the information about the architecture includes any one of a virtual program counter, a dispatcher, a conditional branch flag, or a code cache.
- the second acquisition unit acquires the information about the instruction set architecture by monitoring the virtual program counter and the dispatcher and analyzing a virtual machine execution trace executed in the virtual machine.
- the execution unit 1233 also executes the script into which the instrumentation bytecode has been inserted. This makes it possible to analyze the script executed on the script engine.
- the analysis function providing device 10 can automatically add dynamic bytecode instrumentation functionality to a variety of script engines simply by preparing a test script, so dynamic bytecode instrumentation can be realized without the need for individual design or execution. Dynamic bytecode instrumentation, and the analysis it enables, can therefore be realized even across a wide variety of script engine implementations.
- the analysis function providing device 10 of this embodiment enables dynamic bytecode instrumentation in a wide variety of script engines, making it possible to analyze the execution state of scripts. Therefore, by providing dynamic bytecode instrumentation functionality to various script engines, it becomes possible to analyze scripts executed on those script engines.
- a program in which the process executed by the analysis function providing device 10 according to the above embodiment is written in a language executable by a computer can also be created.
- the analysis function providing device 10 can be implemented by installing an analysis function providing program that executes the above analysis function providing process as package software or online software on a desired computer.
- by executing the above analysis function providing program, an information processing device can function as the analysis function providing device 10.
- the information processing device referred to here includes desktop or notebook personal computers.
- the information processing device also includes mobile communication terminals such as smartphones, mobile phones, and PHS (Personal Handyphone System), as well as slate terminals such as PDAs (Personal Digital Assistants).
- the functions of the analysis function providing device 10 may be implemented on a cloud server.
- FIG. 17 is a diagram showing an example of a computer that executes an analysis function providing program.
- the computer 1000 has, for example, a memory 1010, a CPU 1020, a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These components are connected by a bus 1080.
- the memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012.
- the ROM 1011 stores a boot program such as a BIOS (Basic Input Output System).
- the hard disk drive interface 1030 is connected to a hard disk drive 1031.
- the disk drive interface 1040 is connected to a disk drive 1041.
- a removable storage medium such as a magnetic disk or optical disk is inserted into the disk drive 1041.
- the serial port interface 1050 is connected to a mouse 1051 and a keyboard 1052, for example.
- the video adapter 1060 is connected to a display 1061, for example.
- the hard disk drive 1031 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. Each piece of information described in the above embodiment is stored, for example, in the hard disk drive 1031 or memory 1010.
- the analysis function providing program is stored in the hard disk drive 1031, for example, as a program module 1093 in which instructions to be executed by the computer 1000 are written. Specifically, the program module 1093 in which each process executed by the analysis function providing device 10 described in the above embodiment is written is stored in the hard disk drive 1031.
- data used for information processing by the analysis function providing program is stored as program data 1094, for example, in the hard disk drive 1031.
- the CPU 1020 reads the program module 1093 and program data 1094 stored in the hard disk drive 1031 into the RAM 1012 as necessary, and executes each of the above-mentioned procedures.
- the program module 1093 and program data 1094 related to the analysis function providing program are not limited to being stored in the hard disk drive 1031, but may be stored in a removable storage medium, for example, and read by the CPU 1020 via the disk drive 1041 or the like.
- the program module 1093 and program data 1094 related to the analysis function providing program may be stored in another computer connected via a network, such as a LAN (Local Area Network) or WAN (Wide Area Network), and read by the CPU 1020 via the network interface 1070.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- General Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Stored Programmes (AREA)
- Debugging And Monitoring (AREA)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
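| JP2024550948A JP7800716B2 (ja) | 2022-10-11 | 2022-10-11 | Analysis function providing device, analysis function providing method, and analysis function providing program |
| PCT/JP2022/037925 WO2024079794A1 (ja) | 2022-10-11 | 2022-10-11 | Analysis function providing device, analysis function providing method, and analysis function providing program |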
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2022/037925 WO2024079794A1 (ja) | 2022-10-11 | 2022-10-11 | Analysis function providing device, analysis function providing method, and analysis function providing program |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024079794A1 (ja) | 2024-04-18 |
Family
ID=90668968
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2022/037925 Ceased WO2024079794A1 (ja) | 2022-10-11 | 2022-10-11 | Analysis function providing device, analysis function providing method, and analysis function providing program |
Country Status (2)
| Country | Link |
|---|---|
| JP (1) | JP7800716B2 |
| WO (1) | WO2024079794A1 |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20100153939A1 (en) * | 2008-12-12 | 2010-06-17 | Microsoft Corporation | Remapping debuggable code |
| WO2015122872A1 (en) * | 2014-02-11 | 2015-08-20 | Hewlett Packard Development Company, L.P. | Client application profiling |
| WO2022180702A1 (ja) * | 2021-02-24 | 2022-09-01 | 日本電信電話株式会社 | Analysis function providing device, analysis function providing program, and analysis function providing method |
-
2022
- 2022-10-11 WO PCT/JP2022/037925 patent/WO2024079794A1/ja not_active Ceased
- 2022-10-11 JP JP2024550948A patent/JP7800716B2/ja active Active
Also Published As
| Publication number | Publication date |
|---|---|
| JPWO2024079794A1 | 2024-04-18 |
| JP7800716B2 (ja) | 2026-01-16 |
Similar Documents
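| Publication | Publication Date | Title |
|---|---|---|
| JP7517585B2 (ja) | | Analysis function providing device, analysis function providing program, and analysis function providing method |
| EP3330879B1 (en) | | Vulnerability discovering device, vulnerability discovering method, and vulnerability discovering program |
| CN109101815B (zh) | | Malware detection method and related device |
| JP7115552B2 (ja) | | Analysis function providing device, analysis function providing method, and analysis function providing program |
| EP2881877A1 (en) | | Program execution device and program analysis device |
| CN111291377B (zh) | | Application vulnerability detection method and system |
| JP7568131B2 (ja) | | Analysis function providing method, analysis function providing device, and analysis function providing program |
| CN114741700B (zh) | | Method and device for analyzing the exploitability of public component library vulnerabilities based on symbolic taint analysis |
| CN117828600A (zh) | | Dynamic detection method for illegal collection of personal information on Android |
| CN116340081B (zh) | | RISC-V memory access violation detection method and device based on hardware virtualization |
| CN114386045B (zh) | | Web application vulnerability detection method, device, and storage medium |
| US20220164446A1 (en) | | Process wrapping method for evading anti-analysis of native codes, recording medium and device for performing the method |
| JP7800716B2 (ja) | | Analysis function providing device, analysis function providing method, and analysis function providing program |
| JP7838662B2 (ja) | | Vulnerability discovery device, vulnerability discovery method, and vulnerability discovery program |
| WO2024214263A1 (ja) | | Analysis function providing device, analysis function providing method, and analysis function providing program |
| JP7568129B2 (ja) | | Analysis function providing method, analysis function providing device, and analysis function providing program |
| JP7794327B2 (ja) | | Analysis function providing device, analysis function providing method, and analysis function providing program |
| JP7568128B2 (ja) | | Analysis function providing method, analysis function providing device, and analysis function providing program |
| KR102421394B1 (ko) | | Apparatus and method for detecting malicious code using hardware- and software-based tracing |
| KR102416292B1 (ko) | | Method for dynamic analysis of Android apps, and recording medium and device for performing the same |
| CN118916070A (zh) | | Software dependency detection method and related device |
| Liu et al. | | Detecting exploit primitives automatically for heap vulnerabilities on binary programs |
| WO2024214260A1 (ja) | | Analysis device, analysis method, and analysis program |
| WO2024214261A1 (ja) | | Analysis device, analysis method, and analysis program |
| JP7800718B2 (ja) | | Analysis function providing device, analysis function providing method, and analysis function providing program |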
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22962014; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2024550948; Country of ref document: JP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 22962014; Country of ref document: EP; Kind code of ref document: A1 |