WO2023228288A1

WO2023228288A1 - Detection device, detection method, and detection program

Info

Publication number: WO2023228288A1
Application number: PCT/JP2022/021305
Authority: WO
Inventors: 稜久保田; 利宣碓井; 裕平川古谷; 誠岩村
Original assignee: Nippon Telegraph and Telephone Corporation
Priority date: 2022-05-24
Filing date: 2022-05-24
Publication date: 2023-11-30

Abstract

In the present invention, a detection device (10) dynamically analyzes a malware executable file and creates a malware execution log. Next, using the malware execution log, the detection device (10) creates a query graph in which the series of malware execution processes is represented by nodes and edges. The detection device (10) also acquires a monitoring log indicating a series of execution processes on a terminal being monitored and, using the acquired monitoring log, creates a provenance graph in which the series of execution processes of the monitored terminal is represented by nodes and edges. The detection device (10) then matches the provenance graph against one or more malware query graphs to detect malware.

Description

Detection device, detection method, and detection program

The present invention relates to a malware detection device, a detection method, and a detection program.

Endpoint Detection & Response (EDR) has traditionally been widely deployed to detect malware attacks on PC terminals in corporate networks. In EDR, an agent installed on each device constantly monitors the device using Indicators of Compromise (IOCs), which are detection rules describing the traces left when malware is executed.

In general, EDR should neither falsely detect benign programs nor overlook malware. Many false positives increase the cost of human response, while missed detections lead to greater damage. EIGER (see Non-Patent Document 1) is a technique that automatically generates IOCs from malware execution logs. By loading a large number of IOCs generated by EIGER into existing EDR products, a wide range of malware can be detected.

POIROT (see Non-Patent Document 2) is a method that uses graphs to detect malware on a terminal with high accuracy. In POIROT, an analyst manually creates a graph called a query graph that expresses the relationships between the traces left by malware, and graph pattern matching against past logs is then used to detect the same malware (or its variants).

Whereas an IOC can only express conditions on a single trace, or conditions that combine multiple traces with AND or OR, a query graph can also express dependencies between traces. Matching with a query graph therefore allows malware to be detected with high accuracy.
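To make the difference concrete, here is a toy Python sketch. The trace tuples, operation names, and path are hypothetical and not taken from EIGER or POIROT; the sketch only shows that an AND-combined IOC fires whenever the individual traces are present, while a pattern whose two edges share one process node also demands the dependency between them.

```python
from typing import List, Set, Tuple

# A trace is (subject process ID, operation type, target); illustrative only.
Trace = Tuple[str, str, str]


def ioc_and_match(traces: Set[Trace], required: List[Tuple[str, str]]) -> bool:
    """AND-style IOC: each (operation, target) pair must appear somewhere,
    regardless of which process produced it."""
    seen = {(op, target) for _, op, target in traces}
    return all(pair in seen for pair in required)


def dependent_match(traces: Set[Trace], path: str) -> bool:
    """Two-edge, query-graph-style pattern: the SAME process must both write
    `path` and create a child process from the image at `path`."""
    writers = {pid for pid, op, target in traces if op == "write_file" and target == path}
    creators = {pid for pid, op, target in traces if op == "create_process" and target == path}
    return bool(writers & creators)


if __name__ == "__main__":
    exe = r"C:\Temp\dropper.exe"
    linked = {("p1", "write_file", exe), ("p1", "create_process", exe)}
    unrelated = {("p1", "write_file", exe), ("p2", "create_process", exe)}
    ioc = [("write_file", exe), ("create_process", exe)]
    print(ioc_and_match(linked, ioc), dependent_match(linked, exe))        # True True
    print(ioc_and_match(unrelated, ioc), dependent_match(unrelated, exe))  # True False
```

In the "unrelated" case both traces exist but come from different processes, so only the dependency-aware pattern rejects it.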

Although POIROT can detect malware with high accuracy, its query graphs must be created manually by experts. Moreover, new malware appears every day, so creating query graphs for a wide range of malware is extremely costly. It has therefore been difficult to detect the latest malware both with high accuracy and over a wide range. An object of the present invention is to solve this problem, that is, to detect the latest malware with high accuracy and over a wide range.

In order to solve the above problems, the present invention is characterized by comprising: a malware analysis unit that creates an execution log indicating a series of execution processes of malware by analyzing an executable file of the malware; a first graph creation unit that creates, using the execution log of the malware, a first graph in which the subjects and targets of operations in the series of execution processes of the malware are represented as nodes and the operations performed by the subjects on the targets are represented as edges connecting the nodes; a second graph creation unit that creates, using a monitoring log indicating a series of execution processes executed on a monitored terminal, a second graph in which the subjects and targets of operations in the series of execution processes of the terminal are represented as nodes and the operations performed by the subjects on the targets are represented as edges connecting the nodes; a matching unit that matches the second graph with one or more of the first graphs; and a detection result output unit that detects malware based on a match rate between the second graph and the one or more first graphs and outputs a result of the detection.

According to the present invention, the latest malware can be detected with high accuracy and over a wide range.

FIG. 1 is a diagram illustrating an overview of a detection device.
FIG. 2 is a diagram showing an example of the configuration of the detection device.
FIG. 3 is a flowchart illustrating an example of processing executed by the malware analysis unit of FIG. 2.
FIG. 4 is a diagram showing an example of an execution log.
FIG. 5 is a flowchart illustrating an example of processing executed by the first graph creation unit of FIG. 2.
FIG. 6 is a diagram showing an example of a query graph created by the first graph creation unit of FIG. 2.
FIG. 7 is a flowchart illustrating an example of a procedure in which the detection device creates a query graph.
FIG. 8 is a flowchart illustrating an example of a procedure in which the detection device detects malware.
FIG. 9 is a diagram showing an example of the configuration of a computer that executes the detection program.

Hereinafter, modes for carrying out the present invention (embodiments) will be described with reference to the drawings. The present invention is not limited to this embodiment.

[Overview]
First, an overview of the detection device 10 of this embodiment will be described with reference to FIG. 1. The detection device 10 automatically creates a query graph from a malware executable file and uses the created query graph to detect malware on a terminal to be monitored.

The detection device 10 includes a creation unit 131 that creates a query graph from a malware executable file, and a detection unit 134 that performs malware detection using the query graph.

The creation unit 131 includes a malware analysis unit 132 and a first graph creation unit 133. The malware analysis unit 132 dynamically analyzes the executable file of the malware and creates a log (execution log) indicating the behavior of the malware. The first graph creation unit 133 creates a query graph using the malware execution log.

The detection unit 134 includes a second graph creation unit 135 and a matching unit 136. The second graph creation unit 135 acquires the monitoring log of the terminal to be monitored and creates a graph called a provenance graph. The matching unit 136 detects malware on the monitored terminal by matching the provenance graph with the query graph. When the detection unit 134 detects malware on the monitored terminal, it outputs an alert or the like.

In this way, the detection device 10 automatically creates a malware query graph and detects malware, so it can detect the latest malware with high accuracy and over a wide range.

[Configuration example]
A configuration example of the detection device 10 will be described with reference to FIG. 2. The detection device 10 includes, for example, an input/output unit 11, a storage unit 12, and a control unit 13.

The input/output unit 11 is an interface that controls input and output of various data. For example, the input/output unit 11 receives inputs such as executable files of malware and monitoring logs of terminals to be monitored. In addition, the input/output unit 11 outputs the results of malware detection by the control unit 13 and the like.

The storage unit 12 is realized by a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk.

The storage unit 12 stores, for example, processing programs for realizing the functions of the detection device 10, executable files of malware, monitoring logs of terminals to be monitored, and the like.

The control unit 13 is realized using, for example, a CPU (Central Processing Unit). The control unit 13 functions as the creation unit 131 and the detection unit 134 by executing the processing program stored in the storage unit 12.

[Creation unit]
The creation unit 131 includes a malware analysis unit 132 and a first graph creation unit 133.

[Malware analysis unit]
Upon receiving input of an executable file of malware, the malware analysis unit 132 executes the executable file and monitors its behavior, thereby creating an execution log indicating a series of processes executed by the malware. Details of the malware analysis unit 132 will be explained using FIG. 3.

For example, as shown in FIG. 3, when the malware analysis unit 132 acquires a malware sample (malware execution file) from the storage unit 12 (S1), the malware analysis unit 132 executes the malware sample for a predetermined period of time in an isolated environment (S2).
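Step S2 amounts to running the sample for a bounded time. A minimal Python sketch of just the time bound, assuming the sample can be launched as a command, is shown below. The function name is hypothetical, and isolation (for example, a disposable virtual machine reverted after each run) is assumed to be provided outside this code.

```python
import subprocess
from typing import List


def run_for_fixed_period(cmd: List[str], seconds: float) -> bool:
    """Start `cmd`, let it run for at most `seconds`, then kill it.

    Returns True if the process was still running when the period expired.
    WARNING: this only bounds execution time and provides NO isolation.
    Only the direct child is killed; descendants it spawned survive, which is
    one more reason to run inside an isolated environment that is reverted.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    try:
        proc.wait(timeout=seconds)
        return False  # the sample exited on its own within the period
    except subprocess.TimeoutExpired:
        proc.kill()   # period expired: terminate the sample
        proc.wait()   # reap the killed process
        return True
```

The API tracer of S3 would run alongside this bounded execution; tracing itself is not shown.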

The malware analysis unit 132 monitors API calls (for example, Win32 API calls on the Windows (registered trademark) OS (Operating System) or system calls on Linux (registered trademark)) using an API (Application Programming Interface) tracer that can record each call from the malware to an OS API together with its arguments and return value. Each time a monitored API is called, the malware analysis unit 132 outputs information on the call to the execution log file (S3).

The APIs to be monitored are limited to those related to reading/writing files, creating/terminating processes, injecting code into other processes, sending/receiving on sockets, and, in the case of Windows OS, reading/writing the registry.
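The restriction to a fixed set of operation types can be pictured as a lookup table applied to each traced call. In this Python sketch the Win32 and Winsock function names are real, but which functions are traced, the category names, and the dictionary-based call records are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical mapping from traced API names to retained operation types.
MONITORED_APIS: Dict[str, str] = {
    "ReadFile": "read_file",
    "WriteFile": "write_file",
    "CreateProcessW": "create_process",
    "TerminateProcess": "terminate_process",
    "CreateRemoteThread": "inject_code",
    "WriteProcessMemory": "inject_code",
    "send": "send",
    "recv": "receive",
    "RegQueryValueExW": "read_registry",
    "RegSetValueExW": "write_registry",
}


def classify_call(api_name: str) -> Optional[str]:
    """Return the operation type of a monitored API, or None if it is not monitored."""
    return MONITORED_APIS.get(api_name)


def filter_trace(calls: Iterable[dict]) -> List[dict]:
    """Keep only monitored calls, tagging each record with its operation type."""
    kept = []
    for call in calls:
        op = classify_call(call["api"])
        if op is not None:
            kept.append({**call, "op": op})
    return kept
```

A call to an unmonitored API such as GetTickCount is simply dropped.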

After S3, the malware analysis unit 132 returns the execution log file (S4), and the process returns to S1.

Here, an example of an execution log file created by the malware analysis unit 132 will be described using FIG. 4. As shown in FIG. 4, the execution log file includes one or more execution logs.

Each execution log entry indicates the PID (process ID, i.e., identification information of the process that is the subject of the operation) of the caller, the type of operation performed, the target of the operation, and so on.

The types of operations include, for example, reading/writing files, creating/terminating processes, operations related to injecting code into other processes (for example, CreateRemoteThread), and sending/receiving on sockets. In the case of Windows OS, the types of operations further include reading and writing the registry.

The target of the operation is, for example, the file path if the target is a file, the key/value path if it is a registry entry, the process ID if it is a process (for process creation, the parent process ID may also be included), and the IP address of the communication destination if it is a socket. For APIs related to code injection, the target includes the ID of the target process.
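Since FIG. 4 is not reproduced here, the record layout below is an assumption: each execution log entry is taken to be one JSON object per line carrying the caller PID, the operation type, the kind of target, and the target identifier described above. The class and field names are hypothetical.

```python
import json
from dataclasses import dataclass

TARGET_KINDS = {"file", "registry", "process", "socket"}


@dataclass(frozen=True)
class ExecutionLogEntry:
    pid: int          # PID of the calling process (subject of the operation)
    op: str           # operation type, e.g. "write_file" or "create_process"
    target_kind: str  # one of TARGET_KINDS
    target: str       # file path, registry key/value path, PID, or destination IP


def parse_entry(line: str) -> ExecutionLogEntry:
    """Parse one JSON-lines record into an ExecutionLogEntry (hypothetical format)."""
    record = json.loads(line)
    kind = record["target_kind"]
    if kind not in TARGET_KINDS:
        raise ValueError(f"unknown target kind: {kind!r}")
    return ExecutionLogEntry(
        pid=int(record["pid"]),
        op=str(record["op"]),
        target_kind=kind,
        target=str(record["target"]),
    )
```

Storing the target as a string keeps paths, PIDs, and IP addresses in one field; a real log would likely carry richer per-kind detail.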

However, when the target of the operation is a resource accessed through a handle, such as a registry key, the target may not be identifiable from the information in the corresponding call alone. For example, calling RegOpenKey with a handle obtained from an earlier RegOpenKey call as its argument opens a key (subkey) at a deeper path. Therefore, when the malware analysis unit 132 cannot identify the target of an operation from the corresponding call alone, it recursively traces back through past calls to obtain the necessary information and writes it to the execution log file.
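The recursive tracing of handles can be sketched as follows. This Python version assumes the tracer reports each RegOpenKey-style call as (parent handle, subkey, returned handle); the class name is hypothetical, and handle reuse after RegCloseKey is not modeled.

```python
from typing import Dict, Optional, Tuple


class RegistryHandleTracker:
    """Rebuild full registry key paths for handle-based calls by recursively
    tracing back through the earlier calls that produced each handle."""

    def __init__(self) -> None:
        # handle -> (parent handle, or None for a predefined root; key name)
        self._opened: Dict[int, Tuple[Optional[int], str]] = {}

    def add_root(self, handle: int, name: str) -> None:
        """Register a predefined root key such as HKEY_CURRENT_USER."""
        self._opened[handle] = (None, name)

    def on_open_key(self, parent: int, subkey: str, new_handle: int) -> None:
        """Record a RegOpenKey-style call: `subkey` opened under `parent`."""
        self._opened[new_handle] = (parent, subkey)

    def resolve(self, handle: int) -> Optional[str]:
        """Return the full path for `handle`, or None if it cannot be traced."""
        entry = self._opened.get(handle)
        if entry is None:
            return None  # handle obtained before tracing started
        parent, name = entry
        if parent is None:
            return name
        prefix = self.resolve(parent)  # recurse into the call that produced `parent`
        return None if prefix is None else prefix + "\\" + name
```

For example, after registering HKEY_CURRENT_USER (predefined handle value 0x80000001) as a root and recording two nested opens of Software\Microsoft and Windows\CurrentVersion\Run, resolve() on the innermost handle returns the full HKEY_CURRENT_USER\...\Run path.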

[First graph creation unit]
Returning to FIG. 2, the first graph creation unit 133 creates a query graph (first graph) using the malware execution log created by the malware analysis unit 132.

Specifically, the first graph creation unit 133 creates a query graph in which the subjects and targets of operations in the series of malware execution processes shown in the execution log are represented as nodes, and each operation performed by a subject on a target is represented as an edge connecting the corresponding nodes. Details of the first graph creation unit 133 will be described with reference to FIG. 5.

As shown in FIG. 5, the first graph creation unit 133 first acquires the malware execution log created by the malware analysis unit 132 (S11). It then prepares an empty graph G (S12), executes steps S13 to S16 below for each entry of the execution log acquired in S11, and returns graph G (S17). The process then returns to S1.

If nodes corresponding to the subject and target of the operation indicated in an entry of the execution log acquired in S11 do not yet exist in graph G, the first graph creation unit 133 creates them (S13).

Next, the first graph creation unit 133 determines whether the operation in that entry is read access or reception (S14). If so (Yes in S14), the process proceeds to S15, where the first graph creation unit 133 adds to graph G an edge from the node of the operation's target to the node of the operation's subject, provided that no such edge labeled with the operation type exists yet (S15).

If, on the other hand, the operation is neither read access nor reception (No in S14), the process proceeds to S16, where the first graph creation unit 133 adds to graph G an edge from the node of the operation's subject to the node of the operation's target, provided that no such edge labeled with the operation type exists yet (S16).

By doing so, the first graph creation unit 133 can create a query graph corresponding to the execution log of the malware.
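Steps S12 to S17 can be sketched in Python as below, assuming each log entry has already been reduced to a (subject node, operation type, target node) triple. The operation-type names in INBOUND_OPS are assumptions; storing edges as (source, destination, label) in a set gives the "add only if not yet present" behavior of S15 and S16.

```python
from typing import Iterable, Set, Tuple

Node = str
Edge = Tuple[Node, Node, str]  # (source node, destination node, operation label)

# Operation types treated as "read access or reception" in S14 (assumed names).
INBOUND_OPS = {"read_file", "read_registry", "receive"}


def build_graph(log: Iterable[Tuple[Node, str, Node]]) -> Tuple[Set[Node], Set[Edge]]:
    """Build a directed, edge-labelled graph from (subject, operation, target)
    log entries, following steps S12 to S17."""
    nodes: Set[Node] = set()  # S12: start from an empty graph G
    edges: Set[Edge] = set()
    for subject, op, target in log:
        nodes.update((subject, target))   # S13: add missing nodes (sets ignore duplicates)
        if op in INBOUND_OPS:             # S14: read access or reception?
            edge = (target, subject, op)  # S15: data flows target -> subject
        else:
            edge = (subject, target, op)  # S16: subject acts on target
        edges.add(edge)                   # an identical labelled edge is kept only once
    return nodes, edges                   # S17: return graph G
```

Because the provenance graph is created in the same way, the same function could in principle be applied to monitoring-log entries.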

In creating the query graph, the first graph creation unit 133 represents processes, files, and sockets as nodes, as in the provenance graph of POIROT (see Non-Patent Document 2), and represents each specific behavior as an edge between nodes. In the case of Windows OS, the first graph creation unit 133 also represents the registry as nodes.

Processes are distinguished by their PIDs, but the PID of a terminated process may be reused by a new process. The first graph creation unit 133 therefore assigns a separate node to such a new process.

The first graph creation unit 133 distinguishes registry entries and files by path, and sockets by IP address.
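One way to realize these node-identity rules is sketched below. The ID string formats are hypothetical, and the sketch simplifies the rule slightly: rather than tracking termination explicitly, it bumps a per-PID generation counter whenever a process creation is observed, so that a new process reusing a PID always gets a distinct node.

```python
from typing import Dict


class NodeNamer:
    """Derive graph node IDs: processes by PID plus a generation counter,
    files and registry keys by path, and sockets by remote IP address."""

    def __init__(self) -> None:
        self._generation: Dict[int, int] = {}  # pid -> current generation

    def on_process_created(self, pid: int) -> None:
        # A newly created process may reuse the PID of a terminated one,
        # so bump the generation to give it a distinct node.
        self._generation[pid] = self._generation.get(pid, 0) + 1

    def process(self, pid: int) -> str:
        return f"proc:{pid}#{self._generation.get(pid, 0)}"

    @staticmethod
    def file(path: str) -> str:
        return f"file:{path}"

    @staticmethod
    def registry(path: str) -> str:
        return f"reg:{path}"

    @staticmethod
    def socket(ip: str) -> str:
        return f"sock:{ip}"
```

A PID seen before any observed creation gets generation 0, so references to an older process with the same PID stay separate from the newly created one.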

Furthermore, when the type of operation is read access or reception, the first graph creation unit 133 directs the edge from the node of the operation's target to the node of the operation's subject; otherwise, it directs the edge from the subject's node to the target's node.

For example, the first graph creation unit 133 creates the query graph shown in FIG. 6 from the execution log shown in FIG. 4. In FIG. 6, the labels indicating operation types are omitted from the edges.

For example, as shown in FIG. 6, the first graph creation unit 133 creates nodes indicating the subject of an operation in the execution log shown in FIG. 4 (for example, pid:14) and the targets of operations by that subject (for example, path=C:\Temp\logger.exe, path=HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Run, and image path=C:\Temp\logger.exe, pid=20).

Similarly, it creates nodes indicating the subject of an operation in the execution log shown in FIG. 4 (for example, pid:20) and the target of the operation by that subject (for example, path=C:\Temp\logs). The first graph creation unit 133 then connects each node that is the subject of an operation to the node that is the target of that operation by an edge.

In the execution log shown in FIG. 4, the operation of pid:14 whose target is "image path=C:\Temp\logger.exe, pid=20" is the creation of a process with pid=20, so the first graph creation unit 133 connects the node of pid:14 and the node of pid:20 by an edge.

Furthermore, the process nodes in the final query graph created by the first graph creation unit 133 do not include PIDs. This allows the matching unit 136 to match a query-graph process node to any process node when matching the provenance graph with the query graph.
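Dropping PIDs from process nodes can be sketched as renaming them to placeholders. The "?p0", "?p1" naming and the "proc:" prefix are assumptions of this Python sketch, not the patent's notation.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str, str]  # (source node, destination node, operation label)


def anonymize_processes(edges: Set[Edge]) -> Set[Edge]:
    """Replace process node IDs ('proc:...') with placeholders '?p0', '?p1', ...
    so that a matcher may bind them to any process in the provenance graph.
    Non-process nodes (files, registry keys, sockets) are kept as-is."""
    mapping: Dict[str, str] = {}

    def rename(node: str) -> str:
        if not node.startswith("proc:"):
            return node
        if node not in mapping:
            mapping[node] = f"?p{len(mapping)}"
        return mapping[node]

    # Sorting makes the placeholder numbering deterministic.
    return {(rename(src), rename(dst), op) for src, dst, op in sorted(edges)}
```

Distinct processes still get distinct placeholders, so the structure between them (for example, which process created which) is preserved.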

[Detection unit]
Returning to FIG. 2, the detection unit 134 detects malware on a terminal to be monitored. For example, the detection unit 134 acquires a monitoring log indicating a series of execution processes executed on the terminal to be monitored, and creates a provenance graph based on the monitoring log.

Next, the detection unit 134 detects malware on the terminal to be monitored by matching the provenance graph with the malware query graphs created by the creation unit 131. When the detection unit 134 detects malware on the monitored terminal, it outputs an alert or the like.

The detection unit 134 includes a second graph creation unit 135, a matching unit 136, and a detection result output unit 137.

[Second graph creation unit]
Based on the monitoring log of the monitored terminal, the second graph creation unit 135 creates a provenance graph (second graph) in which the subjects and targets of operations in the series of execution processes of the monitored terminal are represented as nodes, and each operation performed by a subject on a target is represented as an edge connecting the corresponding nodes.

The method for creating the provenance graph is the same as the method for creating the query graph described above, so the explanation will be omitted. Note that it is preferable that the monitoring log used to create the provenance graph records all the execution processes of the terminal to be monitored.

[Matching unit]
The matching unit 136 matches the provenance graph created by the second graph creation unit 135 against one or more query graphs created by the first graph creation unit 133.

Any matching method may be used here, such as strict subgraph matching, which determines whether a query graph exists as a subgraph of the provenance graph, or the approximate matching proposed in POIROT (see Non-Patent Document 2).
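Strict subgraph matching can be sketched with a small backtracking search. This Python version assumes a hypothetical encoding: edges are (source, destination, label) triples, placeholder process nodes start with "?" and may bind to any "proc:" node, and all other nodes must match exactly. Edge direction and label are both checked. It is exponential in the worst case and is not POIROT's approximate matching.

```python
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str, str]  # (source node, destination node, operation label)


def is_var(node: str) -> bool:
    """Query-graph process placeholders (e.g. '?p0') may bind to any process."""
    return node.startswith("?")


def subgraph_match(query: Set[Edge], prov: Set[Edge]) -> Optional[Dict[str, str]]:
    """Find an injective binding of query nodes to provenance nodes under which
    every query edge, with its direction and label, exists in the provenance
    graph. Returns None if no such binding exists."""
    q_edges: List[Edge] = sorted(query)
    prov_nodes = {n for s, d, _ in prov for n in (s, d)}

    def candidates(qn: str, binding: Dict[str, str]) -> List[str]:
        if qn in binding:
            return [binding[qn]]
        used = set(binding.values())
        if is_var(qn):
            return sorted(n for n in prov_nodes if n.startswith("proc:") and n not in used)
        # Concrete nodes (files, registry keys, sockets) must match exactly.
        return [qn] if qn in prov_nodes and qn not in used else []

    def extend(i: int, binding: Dict[str, str]) -> Optional[Dict[str, str]]:
        if i == len(q_edges):
            return binding  # every query edge has been matched
        src, dst, label = q_edges[i]
        for ps in candidates(src, binding):
            with_src = {**binding, src: ps}
            for pd in candidates(dst, with_src):
                if (ps, pd, label) in prov:
                    found = extend(i + 1, {**with_src, dst: pd})
                    if found is not None:
                        return found
        return None  # backtrack

    return extend(0, {})
```

A library matcher such as NetworkX's DiGraphMatcher could replace the hand-written search in practice.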

[Detection result output unit]
The detection result output unit 137 detects malware on the monitored terminal based on the matching result of the matching unit 136 and outputs the detection result. For example, the detection result output unit 137 determines that malware whose query graph matches the provenance graph at a match rate equal to or higher than a predetermined value exists on the monitored terminal, and outputs the detection result.
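The text does not define the match rate or the predetermined value. The Python sketch below assumes the match rate is the fraction of a query graph's edges found in the provenance graph, with the per-query edge counts supplied by a matcher; the function names and sample names are hypothetical.

```python
from typing import Dict, List, Tuple


def match_rate(matched_edges: int, total_edges: int) -> float:
    """Fraction of query-graph edges found in the provenance graph (assumed definition)."""
    if total_edges <= 0:
        raise ValueError("query graph must have at least one edge")
    return matched_edges / total_edges


def detect(results: Dict[str, Tuple[int, int]], threshold: float) -> List[Tuple[str, float]]:
    """Report every malware query graph whose match rate is >= threshold.

    `results` maps a malware name to (matched edges, total edges)."""
    hits = []
    for name, (matched, total) in sorted(results.items()):
        rate = match_rate(matched, total)
        if rate >= threshold:
            hits.append((name, rate))
    return hits
```

A threshold of 1.0 reduces to strict matching, while lower values tolerate partially observed behavior at the cost of more false positives.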

[Example of processing procedure]
Next, an example of a processing procedure executed by the detection device 10 will be described with reference to FIGS. 7 and 8. First, an example of a processing procedure in which the detection device 10 creates a malware query graph will be described with reference to FIG. 7.

When the malware analysis unit 132 of the detection device 10 acquires the malware executable file (S21 in FIG. 7), it analyzes the acquired malware executable file and creates a malware execution log (S22). Then, the first graph creation unit 133 creates a query graph based on the execution log created in S22 (S23).

Next, upon acquiring the monitoring log of the terminal to be monitored (S31 in FIG. 8), the second graph creation unit 135 creates a provenance graph corresponding to the acquired monitoring log (S32). After that, the matching unit 136 performs matching between the provenance graph created in S32 and the query graph created in S23 of FIG. 7 (S33). Then, the detection result output unit 137 detects malware based on the matching result in S33 (S34), and outputs the result of malware detection (S35).

In this way, the detection device 10 automatically creates a query graph from the malware executable file, and uses the created query graph to detect malware on the monitored terminal. As a result, the detection device 10 can detect the latest malware with high accuracy and over a wide range.

[System configuration, etc.]
Each component of each device shown in the drawings is functionally conceptual and need not be physically configured as illustrated. That is, the specific form of distribution and integration of the devices is not limited to that shown in the drawings; all or part of them may be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. Furthermore, all or any part of each processing function performed by each device may be realized by a CPU and a program executed by the CPU, or as hardware using wired logic.

Among the processes described in the above embodiment, all or part of a process described as being performed automatically may be performed manually, and all or part of a process described as being performed manually may be performed automatically by a known method. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and drawings may be changed arbitrarily unless otherwise specified.

[Program]
The detection device 10 described above can be implemented by installing a program (detection program) in a desired computer as package software or online software. For example, by causing the information processing device to execute the above program, the information processing device can be made to function as the detection device 10. The information processing device referred to here includes mobile communication terminals such as smartphones, mobile phones, and PHSs (Personal Handyphone Systems), as well as terminals such as PDAs (Personal Digital Assistants).

FIG. 9 is a diagram showing an example of a computer that executes the detection program. Computer 1000 includes, for example, a memory 1010 and a CPU 1020. The computer 1000 also includes a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These parts are connected by a bus 1080.

The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM (Random Access Memory) 1012. The ROM 1011 stores, for example, a boot program such as BIOS (Basic Input Output System). Hard disk drive interface 1030 is connected to hard disk drive 1090. Disk drive interface 1040 is connected to disk drive 1100. For example, a removable storage medium such as a magnetic disk or an optical disk is inserted into disk drive 1100. Serial port interface 1050 is connected to, for example, mouse 1110 and keyboard 1120. Video adapter 1060 is connected to display 1130, for example.

The hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, a program that defines each process executed by the detection device 10 described above is implemented as a program module 1093 in which code executable by a computer is written. Program module 1093 is stored in hard disk drive 1090, for example. For example, a program module 1093 for executing processing similar to the functional configuration of the detection device 10 is stored in the hard disk drive 1090. Note that the hard disk drive 1090 may be replaced by an SSD (Solid State Drive).

Further, the data used in the processing of the embodiment described above is stored as program data 1094 in, for example, the memory 1010 or the hard disk drive 1090. Then, the CPU 1020 reads out the program module 1093 and program data 1094 stored in the memory 1010 and the hard disk drive 1090 to the RAM 1012 as necessary and executes them.

Note that the program module 1093 and the program data 1094 are not limited to being stored in the hard disk drive 1090, but may be stored in a removable storage medium, for example, and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected via a network (LAN (Local Area Network), WAN (Wide Area Network), etc.). The program module 1093 and program data 1094 may then be read by the CPU 1020 from another computer via the network interface 1070.

10 Detection device
11 Input/output unit
12 Storage unit
13 Control unit
131 Creation unit
132 Malware analysis unit
133 First graph creation unit
134 Detection unit
135 Second graph creation unit
136 Matching unit
137 Detection result output unit

Claims

1. A detection device comprising:
a malware analysis unit that creates an execution log indicating a series of execution processes of malware by analyzing an executable file of the malware;
a first graph creation unit that creates, using the execution log of the malware, a first graph in which the subjects and targets of operations in the series of execution processes of the malware are represented as nodes and the operations performed by the subjects of the operations on the targets of the operations are represented as edges connecting the nodes;
a second graph creation unit that creates, using a monitoring log indicating a series of execution processes executed on a monitored terminal, a second graph in which the subjects and targets of operations in the series of execution processes of the terminal are represented as nodes and the operations performed by the subjects of the operations on the targets of the operations are represented as edges connecting the nodes;
a matching unit that matches the second graph with one or more of the first graphs; and
a detection result output unit that detects malware based on a match rate between the second graph and the one or more first graphs, and outputs a result of the detection.
2. The detection device according to claim 1, wherein the detection result output unit detects malware for which the match rate is equal to or higher than a predetermined value as malware existing on the monitored terminal, and outputs a result of the detection.
3. The detection device according to claim 1, wherein the matching unit performs either strict subgraph matching, which determines whether the first graph exists as a subgraph of the second graph, or the ambiguous matching of POIROT.
4. The detection device according to claim 1, wherein the edges in the first graph and the second graph further include a label indicating the type of operation performed by the subject of the operation on the target of the operation.
5. The detection device according to claim 1, wherein, when the type of the operation is read access to or reception from the target of the operation, the edge is directed from the node of the target of the operation to the node of the subject of the operation, and when the type of the operation performed by the subject of the operation is neither read access nor reception, the edge is directed from the node of the subject of the operation to the node of the target of the operation, and
wherein the matching unit matches the second graph with the one or more first graphs taking the direction of the edges into account.
6. A detection method performed by a detection device, comprising:
a step of creating an execution log indicating a series of execution processes of malware by analyzing an executable file of the malware;
a step of creating, using the execution log of the malware, a first graph in which the subjects and targets of operations in the series of execution processes of the malware are represented as nodes and the operations performed by the subjects of the operations on the targets of the operations are represented as edges connecting the nodes;
a step of creating, using a monitoring log indicating a series of execution processes executed on a monitored terminal, a second graph in which the subjects and targets of operations in the series of execution processes of the terminal are represented as nodes and the operations performed by the subjects of the operations on the targets of the operations are represented as edges connecting the nodes;
a step of matching the second graph with one or more of the first graphs; and
a step of outputting a malware detection result based on a match rate between the second graph and the one or more first graphs.
7. A detection program for causing a computer to execute:
a step of creating an execution log indicating a series of execution processes of malware by analyzing an executable file of the malware;
a step of creating, using the execution log of the malware, a first graph in which the subjects and targets of operations in the series of execution processes of the malware are represented as nodes and the operations performed by the subjects of the operations on the targets of the operations are represented as edges connecting the nodes;
a step of creating, using a monitoring log indicating a series of execution processes executed on a monitored terminal, a second graph in which the subjects and targets of operations in the series of execution processes of the terminal are represented as nodes and the operations performed by the subjects of the operations on the targets of the operations are represented as edges connecting the nodes;
a step of matching the second graph with one or more of the first graphs; and
a step of outputting a malware detection result based on a match rate between the second graph and the one or more first graphs.