WO2024069772A1 - Analysis device, analysis method, and analysis program - Google Patents
- Publication number
- WO2024069772A1 (PCT/JP2022/036020)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- intermediate code
- program
- analysis
- argument
- code variable
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
Definitions
- The present invention relates to an analysis device, an analysis method, and an analysis program.
- A known approach runs the program to be analyzed on an emulator or virtual machine and either executes its instructions one at a time or hooks the execution of specific parts of it, performing behavior monitoring or behavior analysis that outputs the state of registers, memory, etc. when predetermined conditions are met.
- QEMU is known as an example of an emulator used in such behavior monitoring methods (for example, Non-Patent Document 1).
- QEMU is an emulator that emulates CPUs and devices of various architectures, and can emulate programs compiled for various architectures regardless of the architecture of the host, which is the execution environment in which QEMU runs.
- QEMU reads the instructions of the program to be analyzed (guest code) and converts them into architecture-independent code called intermediate code (for example, the Tiny Code Generator Intermediate Representation, TCG IR). QEMU then converts this intermediate code into an instruction sequence that can be executed on the host (host code) and executes it on the host. The host code runs on the physical CPU.
- TCG I/F: Tiny Code Generator Interface.
- The advantage of performing analysis on the intermediate code is that behavior can be monitored without depending on the architecture of the program being analyzed.
- QEMU can emulate approximately 30 architectures.
- By using the TCG I/F to create an analysis module that operates on the single intermediate code layer, it becomes possible to analyze the behavior of programs built for various architectures without creating a separate analysis module for each of the approximately 30 architectures mentioned above, so modules that can analyze multiple architectures can be created efficiently.
- An analysis module in the intermediate code layer can intervene in the execution of the program being analyzed without depending on its architecture, but dumping (obtaining the values of) the state of registers, memory, etc. may require awareness of the registers and pointers specific to each architecture.
- That is, although such a module can intervene in the execution of each instruction independently of the architecture, obtaining information such as system call arguments (for example, register values and the memory values that registers point to) requires knowing the architecture of the program being analyzed, identifying where that architecture's virtual CPU state is stored, and then extracting the register and memory information, etc.
- The analysis device of the present invention is characterized by having: a creation unit that creates a correspondence table based on arguments (values output by executing a known operation included in a program that is the same as or similar to the program to be analyzed), an identifier of the known operation, and intermediate code variable IDs that identify the intermediate code variables holding the same values as those arguments; a reading unit that obtains the intermediate code variable IDs corresponding to a predetermined operation executed by the program to be analyzed and, based on the correspondence table, reads out the arguments identified by those IDs; and an extraction unit that uses the arguments read by the reading unit to extract information from the registers, the memory, or both, of the virtual CPU on which the program to be analyzed runs.
- The present invention makes it possible to identify the intermediate code variables that correspond to the registers of the virtual CPU for the architecture of the program to be analyzed, which facilitates the extraction of register and memory information, etc.
- FIG. 1 is a diagram illustrating an example of a sample program according to the first embodiment.
- FIG. 2 is a table illustrating an example of information stored in the intermediate code variable storage unit according to the first embodiment.
- FIG. 3 is a diagram illustrating an example of a device configuration of the analysis system according to the first embodiment.
- FIG. 4 is a diagram illustrating an example of a program to be analyzed according to the first embodiment.
- FIG. 5 is a diagram illustrating an example of a flowchart of the analysis method according to the first embodiment.
- FIG. 6 is a diagram illustrating an example of a flowchart of an analysis method according to the second embodiment.
- FIG. 7 is a diagram illustrating an example of a flowchart of an analysis method according to the third embodiment.
- FIG. 8 is a diagram illustrating an example of a computer in which the analysis device and the emulator according to the embodiment are realized.
- The emulator 200 executes a known operation (e.g., a system call or a library call) included in a sample program, prepared in advance, that has the same architecture as the program to be analyzed.
- The analysis device 100 creates a correspondence table by storing, in association with one another, information identifying the known operation obtained by executing it in the sample program (e.g., a system call number or a library call address), the arguments obtained by executing the sample program, and the intermediate code variable IDs that identify the intermediate code variables holding the same values as those arguments.
- When the analysis device 100 captures the occurrence of a specific operation (e.g., a system call or a library call) of the program to be analyzed running on the emulator 200, it monitors the behavior of that program in more detail by reading each register value, memory value, etc. passed as an argument from the corresponding intermediate code variable, based on the created correspondence table.
- Each embodiment is described on the assumption that a general-purpose emulator such as QEMU or Bochs is used.
- However, the embodiments are not limited to implementations such as QEMU or Bochs and can be applied to virtual machines that use intermediate code in general.
- VA-Map: Variable Argument Mapping.
- The procedure performed by the analysis system 1 is not limited to the order described above; other orders or procedures may be used.
- For example, the analysis device 100 may terminate the process once after creating the VA-Map and analyze the program to be analyzed at a different time.
- A correspondence table created in the past may also be used.
- The emulator 200 compiles (e.g., cross-compiles) sample code containing a system call (the known operation) whose identifying system call number is known, and generates a sample program (e.g., a binary program) with the same architecture as the program to be analyzed.
- The emulator 200 generates a sample program 2201 from the sample code 2201a shown in FIG. 1.
- The sample code 2201a in FIG. 1 includes system call 1 (sys_call1) and system call 2 (sys_call2).
- System call 1 (sys_call1) passes three arguments ("0x12345678", "0xabcdefg", "0xdeadbeef") and invokes the system call with system call number 1.
- System call 2 (sys_call2) passes two arguments ("0x8badf00d", "0x87654321") and invokes the system call with system call number 2.
- One or more sample programs that call specific system calls in this way are prepared.
- The emulator 200 executes the sample program 2201. Specifically, the emulator 200 executes the known system calls included in the sample code, passing identifiable values as their arguments.
- The analysis device 100 captures the execution of a system call by the emulator 200.
- In this embodiment, the method of capturing the execution of a system call is not particularly limited.
- For example, the analysis device 100 may capture the execution of a system call based on a virtual address, or when an arbitrary instruction is executed or a value is referenced.
- The acquisition unit 131 of the analysis device 100 acquires the values of the intermediate code variables.
- Each intermediate code variable has an intermediate code variable ID, an identifier that uniquely distinguishes it. For example, when system call 1 (sys_call1) in the sample code 2201a of FIG. 1 is executed, the intermediate code variable storage unit 121 of the analysis device 100 stores the "intermediate code variable ID", "intermediate code variable name", and "value" as shown in FIG. 2.
- The creation unit 132 of the analysis device 100 uses the system call number of the system call in the sample program 2201 and the arguments obtained by executing the sample program 2201 to identify the intermediate code variable IDs that hold the same values as the arguments. Using the identified IDs, the creation unit 132 then creates a correspondence between the system call number and the intermediate code variable IDs corresponding to the n arguments associated with it.
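The matching step can be pictured with a small sketch. It assumes, hypothetically, that a snapshot of the intermediate code variables is available as a Python dict from intermediate code variable ID to value (similar to the table in FIG. 2), and uses the sys_call2 values from FIG. 1; the IDs and the extra variables are invented for illustration.

```python
from typing import Dict, List, Set


def candidate_ids(ir_snapshot: Dict[int, int], known_values: List[int]) -> List[Set[int]]:
    """For each known value (system call number, arg0, arg1, ...), return the set of
    intermediate code variable IDs whose current value equals it."""
    return [
        {var_id for var_id, value in ir_snapshot.items() if value == known}
        for known in known_values
    ]


# Hypothetical snapshot taken when sys_call2 (system call number 2) is captured.
snapshot = {10: 2, 11: 0x8BADF00D, 12: 0x87654321, 13: 0x8BADF00D, 14: 0x0}
print(candidate_ids(snapshot, [2, 0x8BADF00D, 0x87654321]))  # [{10}, {11, 13}, {12}]
```

Here arg0 matches two variables, so a single execution is ambiguous; collecting candidates from several sample calls as constraints is what lets a later allocation step pick one variable per argument.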
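- Each captured system call yields a constraint on which intermediate code variable IDs may hold its number and arguments. The creation unit 132 then performs the allocation by solving the collected set of constraints with a Satisfiability Modulo Theories (SMT) solver, and creates a VA-Map.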
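The text solves these constraints with an SMT solver. As a stand-in that needs no external solver, the sketch below runs a brute-force search over the same kind of constraints. It assumes, as a simplification not stated in the source, that each slot ("nr", "arg0", ...) must map to a variable that matched in every execution using that slot, and that different slots map to different intermediate code variables. An SMT solver such as Z3 could express the same constraints and would scale better.

```python
from itertools import product
from typing import Dict, List, Optional, Set

# One observation per captured sample call: slot name ("nr", "arg0", ...) -> candidate IDs.
Observation = Dict[str, Set[int]]


def allocate(observations: List[Observation]) -> Optional[Dict[str, int]]:
    """Brute-force stand-in for the SMT allocation step (hypothetical helper)."""
    slots = sorted({slot for obs in observations for slot in obs})
    domains = []
    for slot in slots:
        # A slot's variable must have matched in every execution that used that slot.
        seen = [obs[slot] for obs in observations if slot in obs]
        domains.append(sorted(set.intersection(*seen)))
    for choice in product(*domains):
        if len(set(choice)) == len(slots):  # distinct slots -> distinct variables
            return dict(zip(slots, choice))
    return None  # the constraints cannot be satisfied


observations = [
    {"nr": {10}, "arg0": {9, 11}, "arg1": {12}},                       # 2-argument call
    {"nr": {10, 14}, "arg0": {9, 11}, "arg1": {12, 13}, "arg2": {9}},  # 3-argument call
]
print(allocate(observations))  # {'arg0': 11, 'arg1': 12, 'arg2': 9, 'nr': 10}
```

The distinctness constraint is what resolves arg0: variable 9 is the only candidate for arg2, so arg0 must be variable 11.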
- The allocation method using the SMT solver is the same in the first, second, and third embodiments described below, so further explanation is omitted.
- The emulator 200 executes the program to be analyzed. When the emulator 200 captures a system call issued by the program to be analyzed, the reading unit 133 of the analysis device 100 obtains from the VA-Map "the intermediate code variable ID in which the system call number is stored, the intermediate code variable ID in which arg0 is stored, ..., the intermediate code variable ID in which argN is stored," and reads each target argument from the intermediate code variable identified by the corresponding ID.
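At analysis time the lookup reduces to indexing the intermediate code variables with the IDs stored in the VA-Map. In this sketch the VA-Map layout, the per-number argument counts, and the snapshot are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical VA-Map: which IR variable holds the number and each argument slot.
VA_MAP = {"nr": 10, "args": [11, 12, 9]}
# Hypothetical table of how many arguments each system call number takes.
ARG_COUNTS = {1: 3, 2: 2}


def read_syscall(ir_snapshot: Dict[int, int]) -> Tuple[int, List[int]]:
    """Read the system call number and its arguments from the IR variables."""
    nr = ir_snapshot[VA_MAP["nr"]]
    count = ARG_COUNTS.get(nr, len(VA_MAP["args"]))  # unknown number: read every slot
    args = [ir_snapshot[var_id] for var_id in VA_MAP["args"][:count]]
    return nr, args


snapshot = {9: 0x30, 10: 2, 11: 0x1000, 12: 0x7, 13: 0x0}
nr, args = read_syscall(snapshot)
print(nr, [hex(a) for a in args])  # 2 ['0x1000', '0x7']
```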
- The extraction unit 134 of the analysis device 100 uses the arguments read by the reading unit 133 to extract information such as the registers or memory of the virtual CPU on which the program to be analyzed runs.
- The analysis system 1 in the first embodiment includes an analysis device 100 and an emulator 200.
- The functional units of each device are described in detail below.
- The analysis device 100 has a communication unit 110, a storage unit 120, and a control unit 130.
- The analysis device 100 may also include an input unit (e.g., a keyboard or mouse) that accepts various operations and a display unit (e.g., a display) that displays various information.
- The analysis device 100 may be a desktop personal computer, a notebook PC, a virtual PC, a smartphone, a tablet, a PDA (Personal Digital Assistant), etc.; the form of the information processing device is not limited. The detailed functions of each unit are described below.
- The communication unit 110 exchanges various types of information with the emulator 200. In this embodiment, the description assumes that the analysis device 100 exchanges information with the emulator 200 via the communication unit 110, but the information may be exchanged without going through the communication unit 110. Furthermore, the communication unit 110 may, as necessary, control communication over a telecommunication line such as a LAN (Local Area Network) or the Internet, and may transmit and receive information bidirectionally with an external information processing device or the like.
- The storage unit 120 stores the data and programs required for the various processes performed by the control unit 130.
- The storage unit 120 has an intermediate code variable storage unit 121 and a correspondence information storage unit 122.
- The storage unit 120 is realized by a semiconductor memory element such as a RAM (Random Access Memory) or flash memory, or by a storage device such as a hard disk or an optical disk.
- The intermediate code variable storage unit 121 stores information about intermediate code variables. For example, as shown in FIG. 2, it stores information such as the "intermediate code variable ID", "intermediate code variable name", and "value". The intermediate code variable storage unit 121 is not limited to this information and may store other information about intermediate code variables.
- The correspondence information storage unit 122 stores the correspondence tables created by the creation unit 132 of the analysis device 100: specifically, a VA-Map (first and second embodiments) and a VR-Map (third embodiment). It is not limited to these two tables and may store other correspondence tables, and it may store them in formats other than a table.
- The control unit 130 has an acquisition unit 131, a creation unit 132, a reading unit 133, and an extraction unit 134.
- The control unit 130 has an internal memory for temporarily storing programs that define various processing procedures and processing data, and may be realized by an electronic circuit such as a CPU (Central Processing Unit) or MPU (Micro Processing Unit), or by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or FPGA (Field Programmable Gate Array).
- The CPU and internal memory mentioned above are not limited to a physical CPU and physical memory and may be a virtual CPU and virtual memory.
- The acquisition unit 131 executes a system call as the known operation and acquires the output system call arguments together with the system call number, the identifier of that system call. Specifically, the acquisition unit 131 executes a system call whose number is known and acquires that system call number and the system call arguments associated with it.
- The acquisition unit 131 of the analysis device 100 acquires the values of the intermediate code variables when a known operation (such as a system call or a library call) is performed.
- The information acquired by the acquisition unit 131 is not limited to the system call arguments described above; information about other arguments may also be acquired.
- The creation unit 132 creates a correspondence table based on the values of the arguments output by executing a known operation included in a program that is the same as or similar to the program to be analyzed, the identifier of the known operation, and the intermediate code variable IDs that identify the intermediate code variables holding the same values as those arguments. Specifically, the creation unit 132 creates a VA-Map as the correspondence table based on the system call arguments, the system call number, and the intermediate code variable IDs that identify the intermediate code variables holding the same values as the system call arguments. The creation unit 132 may also create a correspondence table in a format other than the VA-Map.
- Based on the acquired system call number and system call arguments, the creation unit 132 identifies the target intermediate code variable IDs (i.e., the IDs of the intermediate code variables that hold the same values as the acquired system call arguments).
- The reading unit 133 acquires the intermediate code variable IDs corresponding to a predetermined operation (a system call in the first embodiment) executed by the program to be analyzed, and reads the arguments identified by those IDs based on the correspondence table (VA-Map).
- The reading unit 133 may read other information as necessary.
- The reading unit 133 may determine, according to the type of each argument, the size of the value to obtain and how many bytes to follow in memory. For example, when the first argument is a string (its type is char*), the reading unit 133 may interpret the value of the intermediate code variable corresponding to the first argument as a pointer and output, as that argument, the byte sequence starting at the address it points to and ending at the first null byte. The reading unit 133 may also write the information obtained in this way to a file, standard output, or standard error as necessary.
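The char* case can be sketched by modelling a region of the virtual memory as a bytes object mapped at a base address. The base address, the region contents, and the read limit are assumptions for illustration.

```python
def read_c_string(region: bytes, base: int, pointer: int, limit: int = 4096) -> bytes:
    """Follow a char* argument: return the bytes from `pointer` up to (not including)
    the first NUL byte, searching at most `limit` bytes."""
    offset = pointer - base
    if not 0 <= offset < len(region):
        raise ValueError("pointer is outside the modelled memory region")
    end = region.find(b"\x00", offset, min(len(region), offset + limit))
    if end == -1:
        raise ValueError("no NUL terminator found within the limit")
    return region[offset:end]


BASE = 0x400000
region = b"\x00" * 16 + b"/etc/passwd\x00trailing"
print(read_c_string(region, BASE, BASE + 16))  # b'/etc/passwd'
```

The limit keeps a corrupted or non-string pointer from making the reader scan the whole region.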
- The extraction unit 134 uses the arguments read by the reading unit 133 to extract the register information, the memory information, or both, of the virtual CPU on which the program to be analyzed runs. The extraction unit 134 may also extract information other than register or memory information.
- The emulator 200 includes a virtual CPU, virtual memory, etc., and in this embodiment has the function of executing the sample program, the program to be analyzed, etc.
- The emulator 200 is realized on a physical PC, and the sample program, the program to be analyzed, etc. run on this emulator.
- The emulator 200 has a communication unit 210, a storage unit 220, and an execution unit 230.
- The communication unit 210 exchanges various types of information with the analysis device 100. In this embodiment, the description assumes that the emulator 200 exchanges information with the analysis device 100 via the communication unit 210, but the emulator 200 may exchange information without going through the communication unit 210.
- The storage unit 220 stores the data and programs required for the various processes performed by the execution unit 230. Specifically, the storage unit 220 stores a sample program 2201 and an analysis target program 2202. The storage unit 220 may also store other programs as necessary.
- The sample program 2201 stored in the storage unit 220 has the same architecture as the analysis target program 2202 and is generated from the sample code 2201a shown in FIG. 1.
- The analysis target program 2202 stored in the storage unit 220 is the program analyzed by the analysis system 1.
- The analysis target program 2202 is written as the guest code 2202a shown in FIG. 4.
- FIG. 4 shows how the execution unit 230 of the emulator 200 converts the guest code 2202a into intermediate code.
- The instruction "mov eax, 0x3f" in the guest code 2202a shown in FIG. 4 is converted into two intermediate code instructions, "mov_i64 tmp0, $0x3f" and "ext32u_i64 rax, tmp0".
- The "eax" in the guest code 2202a shown in FIG. 4 denotes the eax register of the virtual CPU.
- The "rax" and "tmp0" appearing in the intermediate code are names of intermediate code variables.
- The system call number (0x3f in this case) is therefore stored in the intermediate code variable named "rax", which is distinct from the rax register of the virtual CPU.
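The translation in FIG. 4 can be mimicked with a toy interpreter to show why the system call number ends up in an intermediate code variable rather than in the virtual CPU's register. This is a hypothetical model supporting only the two opcodes from the figure; it is not QEMU's TCG implementation.

```python
from typing import Dict, List, Tuple, Union

# A toy IR op: (opcode, destination variable, source variable name or immediate).
IrOp = Tuple[str, str, Union[str, int]]


def translate_mov_eax_imm(imm: int) -> List[IrOp]:
    """Toy translation of the guest instruction 'mov eax, imm' (cf. FIG. 4)."""
    return [
        ("mov_i64", "tmp0", imm),       # load the immediate into a temporary
        ("ext32u_i64", "rax", "tmp0"),  # zero-extend its low 32 bits into IR variable 'rax'
    ]


def format_op(op: IrOp) -> str:
    """Render an op in the textual style used in FIG. 4."""
    opcode, dst, src = op
    operand = f"${src:#x}" if isinstance(src, int) else src
    return f"{opcode} {dst}, {operand}"


def run_ir(ops: List[IrOp]) -> Dict[str, int]:
    """Execute the toy ops and return the final values of the IR variables."""
    env: Dict[str, int] = {}
    for opcode, dst, src in ops:
        if opcode == "mov_i64" and isinstance(src, int):
            env[dst] = src & 0xFFFFFFFFFFFFFFFF
        elif opcode == "ext32u_i64" and isinstance(src, str):
            env[dst] = env[src] & 0xFFFFFFFF
        else:
            raise ValueError(f"unsupported op: {opcode}")
    return env


ops = translate_mov_eax_imm(0x3F)
for op in ops:
    print(format_op(op))
print({name: hex(value) for name, value in run_ir(ops).items()})  # {'tmp0': '0x3f', 'rax': '0x3f'}
```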
- The execution unit 230 executes the programs stored in the storage unit 220.
- The execution unit 230 includes a virtual CPU 231 and a virtual memory 232.
- The virtual CPU 231 provides the function of a CPU (Central Processing Unit) in a virtualized PC that runs an OS (Operating System), application software, etc. As shown in FIG. 3, the virtual CPU 231 has a plurality of registers, including registers 2310, 2311, 2312, and 231n. Although four registers are shown in FIG. 3, the virtual CPU 231 is not limited to four and may have any number of registers.
- The virtual memory 232 is a memory area to which virtual addresses are assigned by the OS; a running program accesses the virtual memory via the OS.
- The emulator 200 cross-compiles the sample code to generate a sample program with the same or similar architecture as the program to be analyzed (step S100).
- The emulator 200 executes the generated sample program (step S101).
- The acquisition unit 131 of the analysis device 100 detects the execution of a known system call (step S102).
- The acquisition unit 131 of the analysis device 100 acquires the system call number and the system call arguments (step S103).
- Based on the system call number and the system call arguments, the creation unit 132 of the analysis device 100 identifies the target intermediate code variable IDs (i.e., those holding the same values as the acquired system call arguments) (step S104). Next, the creation unit 132 creates a constraint using the identified intermediate code variable IDs (step S105). If the analysis device 100 determines that system call executions remain (Yes in step S106), it returns to the earlier steps and continues processing.
- When all system calls have been executed, the analysis device 100 determines that no system calls remain to be executed (No in step S106). In that case, the creation unit 132 of the analysis device 100 performs the allocation with an SMT solver based on the created constraints and creates a VA-Map (step S107).
- The emulator 200 executes the program to be analyzed (step S108). The reading unit 133 of the analysis device 100 then detects the execution of a system call by the program to be analyzed (step S109). Next, the reading unit 133 reads the arguments from the intermediate code variables that the VA-Map associates with the system call number of the executed system call (step S110). The extraction unit 134 of the analysis device 100 uses the read arguments to extract register or memory information, etc. (step S111), and the process ends.
- Second embodiment: creating a correspondence table based on library call arguments. As the second embodiment, the following describes how the analysis system 1 creates a correspondence table based on library call arguments and analyzes the program to be analyzed. The correspondence table in the second embodiment is also called a "VA-Map (Variable Argument Mapping)", as in the first embodiment. Only the differences in the VA-Map creation procedure are described.
- The sample code 2201a used in the second embodiment is similar to the sample code 2201a of the first embodiment, but it implements a library call rather than a system call.
- The emulator 200 executes the sample program 2201 that implements this library call.
- The acquisition unit 131 of the analysis device 100 acquires the address of the library call, which serves as the identifier of the library call, and the library call arguments.
- When the value of an argument held by an intermediate code variable points to a referenceable memory area, the acquisition unit 131 of the analysis device 100 also acquires the memory values within ±m bytes of that memory area, together with their offsets. The creation unit 132 of the analysis device 100 then creates a constraint containing the intermediate code variable ID and the offset, based on the value of each argument of the library call.
- When the reading unit 133 of the analysis device 100 captures the execution of a library call by the program to be analyzed on the emulator 200, it reads the VA-Map. When a VA-Map entry includes an offset, the reading unit 133 calculates an address by adding the offset to the address pointed to by the argument value held by the intermediate code variable identified by the intermediate code variable ID, and outputs the value at that address as the value of that argument.
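Reading an argument through an offset entry can be sketched as follows. The little-endian 4-byte argument encoding, the memory model, and the helper name are assumptions for illustration.

```python
def read_arg_with_offset(region: bytes, base: int, pointer: int, offset: int, size: int = 4) -> int:
    """Return the `size`-byte little-endian value stored at pointer + offset."""
    start = pointer - base + offset
    if start < 0 or start + size > len(region):
        raise ValueError("address is outside the modelled memory region")
    return int.from_bytes(region[start:start + size], "little")


BASE = 0x1000
mem = bytearray(64)
mem[20:24] = (0x8BADF00D).to_bytes(4, "little")  # argument stored 4 bytes past the pointer
# Hypothetical VA-Map entry: the IR variable holds pointer 0x1010 and the offset is +4.
print(hex(read_arg_with_offset(bytes(mem), BASE, 0x1010, 4)))  # 0x8badf00d
```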
- The analysis system 1 in the second embodiment includes an analysis device 100 and an emulator 200. The device configuration in the second embodiment is the same as that in the first embodiment, so only the additional functions of the acquisition unit 131 and the creation unit 132 are described here; detailed descriptions of the rest are omitted.
- The acquisition unit 131 of the second embodiment executes a library call as the known operation and acquires the output library call arguments, the library call address that identifies the library call, and, if a predetermined condition is satisfied, a memory value and its offset.
- The information acquired by the acquisition unit 131 of the second embodiment is not limited to the library call arguments described above; information about other arguments may also be acquired.
- The acquisition unit 131 of the second embodiment also acquires the memory values within ±m bytes of the corresponding memory area.
- The creation unit 132 of the second embodiment creates a correspondence table (the VA-Map in the second embodiment) based on the library call arguments, the library call address, the intermediate code variable IDs that identify the intermediate code variables holding the same values as the library call arguments, and the offsets. The creation unit 132 of the second embodiment may also create a correspondence table in a format other than the VA-Map.
- The emulator 200 cross-compiles the sample code to generate a sample program with the same or similar architecture as the analysis target program (step S200).
- The emulator 200 executes the generated sample program (step S201).
- The acquisition unit 131 of the analysis device 100 detects the execution of a known library call (step S202).
- The acquisition unit 131 of the analysis device 100 acquires the library call arguments and the address of the library call (step S203).
- Based on the library call address and the library call arguments, the creation unit 132 of the analysis device 100 identifies the target intermediate code variable IDs (i.e., those holding the same values as the acquired library call arguments) (step S204). If no intermediate code variable matches an argument value, the analysis device 100 treats the value of each intermediate code variable as a pointer and searches for the argument value within ±m bytes of the address it points to. Next, the creation unit 132 creates a constraint using the identified intermediate code variable IDs and the acquired offsets (step S205). If the analysis device 100 determines that library call executions remain (Yes in step S206), it returns to the earlier steps and continues processing.
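The fallback search in step S204 can be sketched as a scan of a ±m-byte window around the address held by an intermediate code variable. The memory model, the little-endian encoding of the argument, and the window size m are assumptions.

```python
from typing import List


def find_offsets(region: bytes, base: int, pointer: int, target: bytes, m: int) -> List[int]:
    """Return every offset in [-m, m] at which `target` appears relative to `pointer`."""
    hits = []
    for offset in range(-m, m + 1):
        start = pointer - base + offset
        if start < 0 or start + len(target) > len(region):
            continue  # this part of the window lies outside the modelled region
        if region[start:start + len(target)] == target:
            hits.append(offset)
    return hits


BASE = 0x1000
mem = bytearray(64)
mem[20:24] = (0x8BADF00D).to_bytes(4, "little")
arg = (0x8BADF00D).to_bytes(4, "little")
print(find_offsets(bytes(mem), BASE, 0x1010, arg, m=8))  # [4]
```

Each hit pairs the intermediate code variable ID with an offset, which is the kind of constraint that step S205 feeds to the allocation.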
- When all library calls have been executed, the analysis device 100 determines that no library calls remain to be executed (No in step S206). In that case, the creation unit 132 of the analysis device 100 performs the allocation with an SMT solver based on the created constraints and creates a VA-Map (step S207).
- The emulator 200 executes the program to be analyzed (step S208). The reading unit 133 of the analysis device 100 then detects the execution of a library call by the program to be analyzed (step S209). Next, the reading unit 133 calculates a memory address from the memory area indicated by the value of the intermediate code variable and the offset, and reads the argument at that address (step S210). The extraction unit 134 of the analysis device 100 uses the read argument to extract register or memory information, etc. (step S211), and the process ends.
- The sample code 2201a used in the third embodiment is similar to that of the first embodiment, but instead of a system call it contains a function that outputs the information held in the registers (hereinafter, "register values") of the emulator on which the program to be analyzed runs.
- The acquisition unit 131 of the analysis device 100 acquires the register values output when the emulator 200 executes the sample program 2201 compiled from the sample code 2201a of the third embodiment.
- The creation unit 132 of the analysis device 100 then searches the values held by the intermediate code variables for these register values and creates constraints using the IDs of the matching intermediate code variables.
- The creation unit 132 of the analysis device 100 identifies the intermediate code variable IDs that hold the same values as the register values output by the sample program 2201 and creates a VR-Map. The reading unit 133 of the analysis device 100 then uses the VR-Map to obtain the value of a specific register from the corresponding intermediate code variable.
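Building a VR-Map can be sketched by intersecting, across several sample runs, the sets of intermediate code variables whose values equal each dumped register value. The data layout is hypothetical, and simple intersection plus a uniqueness check stands in for the SMT-based allocation described in the text.

```python
from typing import Dict, List, Optional, Set, Tuple

# One sample run: (register values printed by the sample program, IR variable snapshot).
Run = Tuple[Dict[str, int], Dict[int, int]]


def build_vr_map(runs: List[Run]) -> Dict[str, int]:
    """Map each register name to the single IR variable ID that matched it in every run."""
    vr_map: Dict[str, int] = {}
    for reg in runs[0][0]:
        candidates: Optional[Set[int]] = None
        for regs, ir_snapshot in runs:
            ids = {vid for vid, value in ir_snapshot.items() if value == regs[reg]}
            candidates = ids if candidates is None else candidates & ids
        if candidates is not None and len(candidates) == 1:
            vr_map[reg] = next(iter(candidates))  # unambiguous, so record it
    return vr_map


runs = [
    ({"rax": 0x11, "rdi": 0x22}, {3: 0x11, 4: 0x22, 5: 0x22}),
    ({"rax": 0x33, "rdi": 0x44}, {3: 0x33, 4: 0x44, 5: 0x00}),
]
vr_map = build_vr_map(runs)
print(vr_map)  # {'rax': 3, 'rdi': 4}
# Later, a register value is read directly from its IR variable:
print(hex(runs[1][1][vr_map["rdi"]]))  # 0x44
```

With only the first run, rdi would remain ambiguous (variables 4 and 5), which is why several distinct register dumps are useful.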
- the analysis system 1 in the third embodiment is configured to include an analysis device 100 and an emulator 200. Note that the device configuration in the third embodiment is basically the same as that in the first embodiment, and in this section, only the additional functions of the acquisition unit 131 and creation unit 132 will be described as differences, and detailed descriptions of the rest will be omitted.
- the acquiring unit 131 of the third embodiment executes a program that is the same as or similar to the analysis target program, which outputs register values held in the registers of the virtual CPU, as a known operation, and acquires the register values.
- the information acquired by the acquiring unit 131 of the third embodiment is not limited to the above-mentioned register values, and the acquiring unit 131 may acquire information related to other arguments.
- the creation unit 132 of the third embodiment creates a correspondence table (VR-Map in the third embodiment) based on a register value and an intermediate code variable ID that identifies an intermediate code variable that holds the same value as the register value. Note that the creation unit 132 of the third embodiment may create a correspondence table in a format other than the above-mentioned VR-Map.
- the creation unit 132 of the third embodiment identifies an intermediate code variable ID that has the same value as the acquired register value. Furthermore, the creation unit 132 of the analysis device 100 of the third embodiment extracts the register value from the argument held by the intermediate code variable, and creates a constraint.
- the emulator 200 cross-compiles the sample code to generate a sample program with the same or similar architecture as the analysis target program (step S300).
- the emulator 200 executes the generated sample program (step S301).
- the acquisition unit 131 of the analysis device 100 detects the execution of the sample program (step S302).
- the acquisition unit 131 of the analysis device 100 acquires all register values of the virtual CPU obtained by the execution of the sample program (step S303).
- the creation unit 132 of the analysis device 100 identifies the IDs of the intermediate code variables that hold the same values as the acquired register values (step S304). Next, the creation unit 132 of the analysis device 100 searches the values held by the intermediate code variables for the register values and creates constraints (step S305). The creation unit 132 of the analysis device 100 then solves the created constraints with an SMT solver to determine the assignment of registers to intermediate code variables, and creates a VR-Map (step S306).
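Steps S304 to S306 can be approximated as follows. The source solves the constraints with an SMT solver; to keep this sketch dependency-free, it swaps in a plain backtracking search over the same kind of constraints. It assumes, hypothetically, that the sample program is run more than once with different register values, that each run yields a register dump plus an intermediate code variable snapshot, and that distinct registers must map to distinct variables. `build_constraints`, `solve`, and the data layout are illustrative names, not part of QEMU or the source.

```python
from typing import Dict, List, Optional, Set, Tuple

# One observation: (register dump from the sample program, IC variable snapshot).
Run = Tuple[Dict[str, int], Dict[int, int]]


def build_constraints(runs: List[Run]) -> Dict[str, Set[int]]:
    """Allowed variable IDs per register: the variable must hold the
    register's value in every observed run."""
    allowed: Dict[str, Set[int]] = {}
    for regs, ic_values in runs:
        for reg, value in regs.items():
            matching = {vid for vid, v in ic_values.items() if v == value}
            # Intersect with earlier runs; the first run initialises the set.
            allowed[reg] = allowed[reg] & matching if reg in allowed else matching
    return allowed


def solve(allowed: Dict[str, Set[int]]) -> Optional[Dict[str, int]]:
    """Backtracking search (a stand-in for the SMT solver) for an injective
    register -> variable ID assignment that satisfies every constraint."""
    order = sorted(allowed, key=lambda r: len(allowed[r]))  # most constrained first
    assignment: Dict[str, int] = {}

    def backtrack(i: int) -> bool:
        if i == len(order):
            return True
        reg = order[i]
        for vid in sorted(allowed[reg]):
            if vid in assignment.values():
                continue  # distinct registers need distinct variables
            assignment[reg] = vid
            if backtrack(i + 1):
                return True
            del assignment[reg]
        return False

    return dict(assignment) if backtrack(0) else None


# Two hypothetical runs of the sample program with different register values.
runs: List[Run] = [
    ({"rax": 1, "rdi": 5, "rsi": 5}, {10: 5, 11: 1, 12: 5, 13: 0}),
    ({"rax": 2, "rdi": 7, "rsi": 9}, {10: 9, 11: 2, 12: 7, 13: 0}),
]
vr_map = solve(build_constraints(runs))
print(vr_map)  # {'rax': 11, 'rdi': 12, 'rsi': 10}
```

With only the first run, rdi and rsi would both allow variables 10 and 12, and the search would simply pick one of the two; the second run's distinct values remove that ambiguity.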
- the emulator 200 executes the program to be analyzed (step S307). Then, the reading unit 133 of the analysis device 100 detects the execution of a system call or a library call by the program to be analyzed (step S308). The reading unit 133 of the analysis device 100 reads the target register value from the corresponding intermediate code variable (step S309). The extraction unit 134 of the analysis device 100 extracts register or memory information, etc. (step S310), and the process ends.
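Once a VR-Map exists, step S309 reduces to a lookup. The sketch below assumes the VR-Map is a dictionary from register name to intermediate code variable ID, and that when the system call or library call is detected the analysis module can take an ID-to-value snapshot of the intermediate code variables. `read_registers` and this layout are hypothetical.

```python
from typing import Dict, Iterable


def read_registers(vr_map: Dict[str, int],
                   ic_values: Dict[int, int],
                   targets: Iterable[str]) -> Dict[str, int]:
    """Read the requested virtual-CPU registers via the VR-Map.

    vr_map maps a register name to the ID of the intermediate code variable
    that holds it; ic_values is a snapshot (ID -> value) taken at the hook.
    """
    out: Dict[str, int] = {}
    for reg in targets:
        if reg not in vr_map:
            raise KeyError(f"register {reg!r} is not in the VR-Map")
        out[reg] = ic_values[vr_map[reg]]
    return out


# Hypothetical VR-Map and snapshot taken when a system call is detected.
vr_map = {"rax": 11, "rdi": 12, "rsi": 10}
snapshot = {10: 0x7FFE0000, 11: 1, 12: 1, 13: 0}
print(read_registers(vr_map, snapshot, ["rax", "rdi"]))  # {'rax': 1, 'rdi': 1}
```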
- an analysis module in the intermediate code layer can intervene in the execution of a program to be analyzed regardless of architecture, but when dumping the state of registers, memory, etc., it may be necessary to be aware of registers and pointers specific to each architecture.
- x86: a general term for the instruction set architecture of the Intel 8086 and of microprocessors that are backward-compatible with it.
- as an example, consider analyzing the behavior of a program to be analyzed that runs on x86 and issues a system call using "sysenter".
- the system call number indicating the type of system call is stored in the "rax" register.
- the arguments passed to this system call are stored, in order, in "rdi", "rsi", "rdx", "rcx", "r8", and "r9".
- when the analysis module in the intermediate code layer captures the execution of sysenter, it needs to know the value of the rax register in order to determine the type of system call.
- the number of arguments passed also depends on the type of system call, and these values are stored in "rdi", "rsi", "rdx", "rcx", "r8", and "r9". Therefore, to obtain more detailed information about the system call, it is necessary to know the values stored in these registers and the memory contents they point to.
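To make the x86 example concrete, the following sketch decodes a captured system call from register values that have already been read out: the type from rax, as many arguments as that call takes (in the register order given above), and the buffer one of those arguments points to. The small syscall table uses Linux x86-64 numbers (read = 0, write = 1, open = 2, close = 3). Guest memory is stood in for by a dictionary, and all function names are hypothetical. On Linux x86-64 the kernel's syscall convention places the fourth argument in r10 rather than rcx; that does not matter here because no call in the table takes more than three arguments.

```python
from typing import Callable, Dict, List, Tuple

# Argument registers in the order given in the text. (Linux's x86-64 kernel
# syscall convention uses r10 rather than rcx for the 4th argument.)
ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]

# Illustrative subset of Linux x86-64 system calls: number -> (name, argc).
SYSCALLS: Dict[int, Tuple[str, int]] = {
    0: ("read", 3),
    1: ("write", 3),
    2: ("open", 3),
    3: ("close", 1),
}


def decode_syscall(regs: Dict[str, int]) -> Tuple[str, List[int]]:
    """Name the system call from rax and collect as many arguments as it takes."""
    nr = regs["rax"]
    name, argc = SYSCALLS.get(nr, (f"unknown_{nr}", 0))
    return name, [regs[r] for r in ARG_REGS[:argc]]


def read_buffer(read_byte: Callable[[int], int], addr: int, length: int) -> bytes:
    """Follow a pointer argument: fetch `length` bytes starting at `addr`."""
    return bytes(read_byte(addr + i) for i in range(length))


# Hypothetical register values read out at a write(1, buf, 5) call,
# and a dictionary standing in for guest memory.
regs = {"rax": 1, "rdi": 1, "rsi": 0x1000, "rdx": 5}
memory = {0x1000 + i: b for i, b in enumerate(b"hello")}

name, args = decode_syscall(regs)
data = read_buffer(memory.__getitem__, args[1], args[2]) if name == "write" else b""
print(name, args, data)  # write [1, 4096, 5] b'hello'
```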
- when the emulator converts the instructions of the program being analyzed into intermediate code, the value of each register is held in variables (intermediate code variables) that are defined and used during execution of the intermediate code. Therefore, the rax register, rdi register, etc. of the virtual CPU cannot be accessed directly from the intermediate code layer.
- the analysis module in the intermediate code layer can intervene in the execution of each instruction without depending on the architecture of the program being analyzed, but in order to obtain information such as system call arguments (for example, register values or memory values pointed to by registers), it is necessary to be aware of the architecture of the program being analyzed, identify where the information of the virtual CPU of that architecture is stored, and extract the register values.
- the analysis module in the intermediate code layer can access intermediate code variables, but since it does not know the correspondence between the virtual CPU registers of each architecture and the intermediate code variables, it cannot know which intermediate code variable holds the desired register value.
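The architecture dependence described above can be illustrated by where Linux expects the system call number on a few architectures. In this sketch the convention table is the only architecture-specific knowledge, and the VR-Map supplies the intermediate code variable that holds the named register. The per-architecture VR-Maps are hypothetical values of the kind a creation unit might produce, and `syscall_number` is an illustrative name.

```python
from typing import Dict

# Register that carries the system call number in each Linux syscall convention.
SYSCALL_NR_REG: Dict[str, str] = {
    "x86_64": "rax",
    "aarch64": "x8",
    "arm": "r7",  # EABI
    "riscv64": "a7",
}

# Hypothetical VR-Maps (register name -> intermediate code variable ID),
# one per architecture, as a creation unit might produce them.
VR_MAPS: Dict[str, Dict[str, int]] = {
    "x86_64": {"rax": 11, "rdi": 12},
    "aarch64": {"x8": 40, "x0": 32},
}


def syscall_number(arch: str, ic_values: Dict[int, int]) -> int:
    """Read the system call number without architecture-specific code paths:
    convention table -> register name -> VR-Map -> intermediate code variable."""
    reg = SYSCALL_NR_REG[arch]
    return ic_values[VR_MAPS[arch][reg]]


print(syscall_number("x86_64", {11: 1, 12: 1}))    # 1 (write on x86-64 Linux)
print(syscall_number("aarch64", {40: 64, 32: 1}))  # 64 (write on arm64 Linux)
```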
- the analysis device 100 and emulator 200 are characterized in that they (1) create a correspondence table based on an argument, which is a value output by executing a known operation included in a program that is the same as or similar to the program to be analyzed, an identifier for the known operation, and an intermediate code variable ID that identifies an intermediate code variable holding the same value as the argument; (2) obtain the intermediate code variable ID corresponding to a predetermined operation executed by the program to be analyzed and, based on the correspondence table, read out the argument identified by that intermediate code variable ID; and (3) use the argument read by the reading unit 133 to extract register information, memory information, or both, of the virtual CPU on which the program to be analyzed runs. This embodiment therefore provides the following effects.
- the analysis device 100 makes it possible to identify the intermediate code variables that hold the register values of the virtual CPU for the architecture of the program to be analyzed, which facilitates the extraction of register and memory information, etc.
- the analysis device 100 provides the effect of being able to create an analysis module that is independent of the architecture of the program to be analyzed, and to obtain more detailed information such as register and memory information.
- each component of each device shown in the figure is a functional concept, and does not necessarily have to be physically configured as shown in the figure.
- the specific form of distribution and integration of each device is not limited to that shown in the figure, and all or a part of it can be functionally or physically distributed and integrated in any unit according to various loads, usage conditions, etc.
- all or any part of each processing function performed by each device can be realized by a CPU and a program interpreted and executed by the CPU, or can be realized as hardware using wired logic.
- the various devices constituting the analysis device 100 and the emulator 200 can be implemented by installing an analysis program that performs the above-mentioned analysis as package software or online software on a desired computer.
- the above-mentioned analysis program can be executed by an information processing device to function as various devices constituting the analysis device 100 and the emulator 200.
- the information processing device here includes desktop or notebook personal computers.
- the information processing device also includes mobile communication terminals such as smartphones, mobile phones, and PHS (Personal Handyphone System), as well as slate terminals such as PDA (Personal Digital Assistant), etc.
- FIG. 8 is a diagram showing an example of a computer in which the various devices constituting the analysis device 100 and the emulator 200 are realized.
- the computer 1000 has, for example, a memory 1010 and a CPU 1020.
- the computer 1000 also has a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. Each of these components is connected by a bus 1080.
- the memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012.
- the ROM 1011 stores a boot program such as a BIOS (Basic Input Output System).
- the hard disk drive interface 1030 is connected to a hard disk drive 1090.
- the disk drive interface 1040 is connected to a disk drive 1100.
- a removable storage medium such as a magnetic disk or optical disk is inserted into the disk drive 1100.
- the serial port interface 1050 is connected to a mouse 1110 and a keyboard 1120, for example.
- the video adapter 1060 is connected to a display 1130, for example.
- Hard disk drive 1090 stores, for example, OS 1091, application program 1092, program module 1093, and program data 1094. That is, the programs that define the processes of the various devices that constitute analysis device 100 and emulator 200 are implemented as program modules 1093 in which computer-executable code is written. Program modules 1093 are stored, for example, in hard disk drive 1090. For example, program modules 1093 for executing processes similar to the functional configurations of the various devices that constitute analysis device 100 and emulator 200 are stored in hard disk drive 1090. Note that hard disk drive 1090 may be replaced by an SSD (Solid State Drive).
- the setting data used in the processing of the above-mentioned embodiment is stored as program data 1094, for example, in memory 1010 or hard disk drive 1090.
- the CPU 1020 reads out the program module 1093 or program data 1094 stored in memory 1010 or hard disk drive 1090 into RAM 1012 as necessary, and executes the processing of the above-mentioned embodiment.
- the program module 1093 and program data 1094 are not limited to being stored in the hard disk drive 1090, but may be stored in a removable storage medium, for example, and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and program data 1094 may be stored in another computer connected via a network (such as a LAN or a WAN (Wide Area Network)). The program module 1093 and program data 1094 may then be read by the CPU 1020 from the other computer via the network interface 1070.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2022/036020 WO2024069772A1 (ja) | 2022-09-27 | 2022-09-27 | 解析装置、解析方法および解析プログラム (Analysis device, analysis method, and analysis program) |
JP2024548892A JPWO2024069772A1 | 2022-09-27 | 2022-09-27 | |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2022/036020 WO2024069772A1 (ja) | 2022-09-27 | 2022-09-27 | 解析装置、解析方法および解析プログラム (Analysis device, analysis method, and analysis program) |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024069772A1 (ja) | 2024-04-04 |
Family ID: 90476658
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2022/036020 WO2024069772A1 (ja) | 解析装置、解析方法および解析プログラム (Analysis device, analysis method, and analysis program) | 2022-09-27 | 2022-09-27 |
Country Status (2)
Country | Link |
---|---|
JP (1) | JPWO2024069772A1 |
WO (1) | WO2024069772A1 |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2017517821A (ja) * | 2014-06-13 | 2017-06-29 | The Charles Stark Draper Laboratory, Inc. | ソフトウェアアーチファクトのデータベースのためのシステム及び方法 (System and method for a database of software artifacts) |
US20210312048A1 (en) * | 2019-08-28 | 2021-10-07 | Palo Alto Networks, Inc. | Analyzing multiple cpu architecture malware samples |
- 2022-09-27: JP application JP2024548892A (JPWO2024069772A1), status: active, pending
- 2022-09-27: WO application PCT/JP2022/036020 (WO2024069772A1), status: active, application filing
Non-Patent Citations (1)
Title |
---|
DOVGALYUK PAVEL PAVEL.DOVGALUK@GMAIL.COM; FURSOVA NATALIA NATALIA.FURSOVA@ISPRAS.RU; VASILIEV IVAN IVAN.VASILIEV@ISPRAS.RU; MAKARO: "QEMU-based framework for non-intrusive virtual machine instrumentation and introspection", PROCEEDINGS OF THE 18TH ACM/IFIP/USENIX MIDDLEWARE CONFERENCE, ACMPUB27, NEW YORK, NY, USA, 21 August 2017 (2017-08-21) - 24 October 2017 (2017-10-24), New York, NY, USA, pages 944 - 948, XP058698267, ISBN: 978-1-4503-5525-4, DOI: 10.1145/3106237.3122817 * |
Also Published As
Publication number | Publication date |
---|---|
JPWO2024069772A1 | 2024-04-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
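CN110096338B (zh) | | 智能合约执行方法、装置、设备及介质 (Smart contract execution method, apparatus, device, and medium) |
Zheng et al. | | Efficient greybox fuzzing of applications in Linux-based IoT devices via enhanced user-mode emulation |
Gui et al. | | Firmcorn: Vulnerability-oriented fuzzing of IoT firmware via optimized virtual execution |
Fu et al. | | Space traveling across VM: Automatically bridging the semantic gap in virtual machine introspection via online kernel data redirection |
CN111324396A (zh) | | 一种区块链智能合约执行方法、装置及设备 (Blockchain smart contract execution method, apparatus, and device) |
US10275595B2 (en) | | System and method for characterizing malware |
EP2985716B1 (en) | | Information processing device and identifying method |
CN115062309A (zh) | | 一种新型电力系统下基于设备固件仿真的漏洞挖掘方法及存储介质 (Vulnerability mining method based on device firmware emulation for new-type power systems, and storage medium) |
CN114398172A (zh) | | 资源配置方法、装置、电子设备及计算机可读存储介质 (Resource configuration method, apparatus, electronic device, and computer-readable storage medium) |
US10129275B2 (en) | | Information processing system and information processing method |
CN114356779A (zh) | | 编程语言调试方法、装置及终端设备 (Programming language debugging method, apparatus, and terminal device) |
GB2616340A (en) | | Methods, systems, and computer readable media for customizing data plane pipeline processing using Berkeley packet filter (BPF) hook entry points |
US8407787B1 (en) | | Computer apparatus and method for non-intrusive inspection of program behavior |
JP5952218B2 (ja) | | 情報処理装置および情報処理方法 (Information processing device and information processing method) |
CN111176663A (zh) | | 应用程序的数据处理方法、装置、设备及存储介质 (Data processing method, apparatus, device, and storage medium for application programs) |
CN104007956B (zh) | | 一种操作系统进程识别跟踪及信息获取的方法和装置 (Method and apparatus for operating system process identification, tracking, and information acquisition) |
CN113220586A (zh) | | 一种自动化的接口压力测试执行方法、装置和系统 (Automated interface stress test execution method, apparatus, and system) |
WO2024069772A1 (ja) | | 解析装置、解析方法および解析プログラム (Analysis device, analysis method, and analysis program) |
CN104572482B (zh) | | 一种过程变量的存储方法及装置 (Process variable storage method and apparatus) |
CN110597707A (zh) | | 一种内存越界故障检测方法及终端设备 (Memory out-of-bounds fault detection method and terminal device) |
US11886589B2 (en) | | Process wrapping method for evading anti-analysis of native codes, recording medium and device for performing the method |
US11599342B2 (en) | | Pathname independent probing of binaries |
CN116126690A (zh) | | 一种用于轻量级嵌入式系统的调试方法及系统 (Debugging method and system for lightweight embedded systems) |
US9632757B2 (en) | | Custom class library generation method and apparatus |
Josse | | Malware dynamic recompilation |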
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22960836; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: 2024548892; Country of ref document: JP |
| | NENP | Non-entry into the national phase | Ref country code: DE |