CN116260627A

CN116260627A - APT detection system based on data tracing graph labels

Info

Publication number: CN116260627A
Application number: CN202310003535.3A
Authority: CN
Inventors: 邹福泰; 黄明义; 胡钺琳; 郑荔文; 吴越
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2023-01-03
Filing date: 2023-01-03
Publication date: 2023-06-13

Abstract

The invention discloses an APT detection system based on data tracing graph labels, which relates to the field of computer network security and comprises a system log collection module, a data tracing graph normalization module, a data tracing graph compression module and an APT attack detection module. The system log collection module takes as input either previously collected raw system logs or system logs collected in real time by the open-source tool SPADE deployed on the victim machine; the data tracing graph normalization module processes the system logs into a data tracing graph in the required format; the data tracing graph compression module compresses the edges and nodes of the data tracing graph; and finally the APT attack detection module detects APT attacks using the data tracing graph labels. The invention designs the categories and specific meanings of the data tracing graph labels, defines the label propagation rules and the APT attack matching rules, and allows users to define their own labels and add the corresponding inference rules and attack discrimination rules.

Description

APT detection system based on data tracing graph labels

Technical Field

The invention relates to the field of computer network security, in particular to an APT detection system based on a data tracing graph label.

Background

An advanced persistent threat (APT), also known as a targeted threat attack, refers to a targeted attack carried out by an attack organization in order to obtain the core secrets of an important entity (a large enterprise, a government agency, etc.).

APT attacks take many forms, but they can be roughly divided into the following five steps:

(1) Initial penetration: the attacker implants a prepared attack program (such as a Trojan horse or remote control software) into the victim machine by exploiting CVE vulnerabilities, using physical media, deception or other means.

(2) Untrusted execution: after successfully invading the target system, the attacker remotely triggers execution of the implanted malicious program, so that the target system establishes a command and control (C&C) connection with a prepared control server, the two begin to communicate, and the attacker gains control of the compromised system.

(3) Lateral movement: in most cases the host on which control is first established does not hold the attacker's target, so the attacker often needs to move laterally between different networks or hosts of the target system in order to locate its core data.

(4) Suspicious behavior: once the core files of the target system are found, the attacker carries out the actual attack; the main means include, but are not limited to, privilege escalation, password stealing and backdoor installation.

(5) Data leakage: after the intended goal is achieved, the attacker transfers the acquired data out through the C&C connection established in step (2), and finally cleans up attack traces such as system logs, completing the whole attack process.

A data tracing graph (provenance graph) is a form of metadata that records the activities of the various system entities involved in producing data. In 2013, the W3C defined a set of relational models and constraints that provide a structural and semantic basis for data tracing graphs. A data tracing graph consists of nodes and directed edges: the nodes represent system entities appearing in system activities, mainly processes, files, sockets and the like; the directed edges represent interactions between nodes, the main types being read, write, execute, create sub-process and so on. The data tracing graph therefore captures rich causal relationships over a large spatio-temporal scale and is well suited to detecting APT attacks.
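By way of illustration only, the node and edge structure described above could be represented as in the following Python sketch. The class names, field names and the edge direction convention (from the acting entity to the entity acted upon) are assumptions made for this example and are not defined by the disclosure.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeType(Enum):
    # Entity kinds named in the text; a real system may have more.
    PROCESS = "process"
    FILE = "file"
    SOCKET = "socket"


@dataclass(frozen=True)
class Node:
    node_id: str
    node_type: NodeType


@dataclass(frozen=True)
class Edge:
    src: Node          # assumed: the entity that performs the action
    dst: Node          # assumed: the entity that is acted upon
    relation: str      # e.g. "read", "write", "execute", "fork"
    timestamp: int


@dataclass
class ProvenanceGraph:
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)

    def add_edge(self, edge: Edge) -> None:
        # Adding an edge implicitly registers both endpoints.
        self.nodes.add(edge.src)
        self.nodes.add(edge.dst)
        self.edges.append(edge)


graph = ProvenanceGraph()
shell = Node("pid:100", NodeType.PROCESS)
passwd = Node("/etc/passwd", NodeType.FILE)
graph.add_edge(Edge(shell, passwd, "read", timestamp=1))
```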

Accordingly, those skilled in the art are working to develop an APT attack detection system.

Disclosure of Invention

In view of the above-mentioned drawbacks of the prior art, the technical problem to be solved by the present invention is how to implement APT attack detection.

In order to achieve the above purpose, the invention provides an APT detection system based on data tracing graph labels, which relates to the field of computer network security and comprises a system log collection module, a data tracing graph normalization module, a data tracing graph compression module and an APT attack detection module. The system log collection module takes as input either previously collected raw system logs or system logs collected in real time by the open-source tool SPADE deployed on the victim machine; the data tracing graph normalization module processes the system logs into a data tracing graph in the required format; the data tracing graph compression module compresses the edges and nodes of the data tracing graph; and finally the APT attack detection module detects APT attacks using the data tracing graph labels.

Further, the system log collecting module uniformly processes the system logs from different sources into a data tracing graph in an edge_list format.
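The disclosure does not specify the layout of the edge_list format. As a minimal sketch, assuming each log record has already been parsed into a dictionary with the hypothetical keys `subject`, `object`, `event` and `time`, the unification step might look like this in Python:

```python
from typing import Iterable, NamedTuple


class EdgeRecord(NamedTuple):
    src: str
    dst: str
    relation: str
    timestamp: int


def to_edge_list(records: Iterable[dict]) -> list[EdgeRecord]:
    """Convert already-parsed log records into a time-ordered edge list.

    The input keys are assumptions for this example, not a real log schema.
    """
    edges = [
        EdgeRecord(r["subject"], r["object"], r["event"], int(r["time"]))
        for r in records
    ]
    return sorted(edges, key=lambda e: e.timestamp)


raw = [
    {"subject": "pid:200", "object": "10.0.0.5:443", "event": "connect", "time": "7"},
    {"subject": "pid:100", "object": "pid:200", "event": "fork", "time": "3"},
]
edge_list = to_edge_list(raw)
```

Sorting by timestamp matters because the later normalization and compression stages process edges in the order in which the events occurred.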

Further, the data tracing graph normalization module uses a node version updating algorithm to convert the collected original tracing graph into an acyclic graph with a cleaner layout.

Further, the acyclic graph satisfies a DAG characteristic.
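One possible reading of the node version updating algorithm is sketched below in Python: every node starts at version 0, and each incoming edge creates a new version of its destination node, to which the edge is attached. The tuple representation `(name, version)` and the function names are assumptions made for this example. The acyclicity check uses the standard-library `graphlib` module (Python 3.9 or later).

```python
from collections import defaultdict
from graphlib import CycleError, TopologicalSorter


def apply_node_versioning(edge_list):
    """Rewrite (src, dst, relation) edges onto versioned nodes (name, version)."""
    latest = defaultdict(int)  # node name -> latest version, 0 is the original
    versioned = []
    for src, dst, relation in edge_list:
        src_v = (src, latest[src])   # the edge leaves the current latest version
        latest[dst] += 1             # an incoming edge creates a new version
        versioned.append((src_v, (dst, latest[dst]), relation))
    return versioned


def is_acyclic(versioned_edges):
    """Return True if the versioned graph admits a topological order."""
    predecessors = defaultdict(set)
    for src_v, dst_v, _ in versioned_edges:
        predecessors[dst_v].add(src_v)
    try:
        list(TopologicalSorter(predecessors).static_order())
        return True
    except CycleError:
        return False


# A two-node cycle in the raw graph: the process writes a file, then reads it back.
raw_edges = [("pid:1", "/tmp/a", "write"), ("/tmp/a", "pid:1", "read")]
versioned = apply_node_versioning(raw_edges)
```

Because every edge ends at a version that did not exist before the edge arrived, no path can lead back to an earlier version, which is why the result has no cycles.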

Further, the data tracing graph compression module implements a compression algorithm for the data tracing graph: without changing the semantics of the data tracing graph, it uses two algorithms, redundant semantic deletion and invalid entity removal, to compress the edges and the nodes of the data tracing graph respectively.
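As a hedged illustration of redundant semantic deletion, the following Python sketch keeps, for each start node u, a set Sem(u) of end nodes already reached since u last changed state. A repeated edge u to v is dropped; a new one is kept, and Sem(v) is cleared because v has just received new information. The function name is hypothetical, and the sketch works on plain entity identifiers, leaving open how it is combined with node versioning.

```python
from collections import defaultdict


def drop_redundant_edges(edge_list):
    """Keep only the earliest edge u -> v while the state of u is unchanged."""
    sem = defaultdict(set)   # Sem(u): end nodes reached from u since u last changed
    kept = []
    for u, v in edge_list:   # edges are assumed to be in timestamp order
        if v in sem[u]:
            continue         # same semantics as an earlier edge: redundant event
        sem[u].add(v)
        sem[v].clear()       # v received new data, so its later outputs are new
        kept.append((u, v))
    return kept


events = [
    ("pid:1", "/tmp/a"),   # kept: first write
    ("pid:1", "/tmp/a"),   # dropped: pid:1 unchanged since the first write
    ("sock:9", "pid:1"),   # kept: pid:1 receives data, Sem(pid:1) is cleared
    ("pid:1", "/tmp/a"),   # kept: pid:1 has changed state since the first write
]
compressed = drop_redundant_edges(events)
```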

Further, the APT attack detection module marks the processed data tracing graph by using the predefined tracing graph label, diffuses the label by using a label deducing rule, and finally judges whether APT attack exists by using an APT attack matching rule.
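The label initialization and diffusion described above can be sketched as a fixed-point iteration. In the Python example below, the label names (`UNTRUSTED_SRC`, `UNTRUSTED_DATA`, `UNTRUSTED_EXEC`), the rule format and the seed predicate are all hypothetical and serve only to show the mechanism; they are not the labels or rules of the disclosure.

```python
def initialize_labels(nodes, seed_rules):
    """Attach initial labels to nodes that satisfy a seed predicate."""
    labels = {node: set() for node in nodes}
    for node in nodes:
        for predicate, label in seed_rules:
            if predicate(node):
                labels[node].add(label)
    return labels


def propagate_labels(edges, labels, inference_rules):
    """Apply (src_label, relation, dst_label) rules until nothing changes."""
    changed = True
    while changed:
        changed = False
        for src, dst, relation in edges:
            for src_label, rule_relation, dst_label in inference_rules:
                if (rule_relation == relation
                        and src_label in labels[src]
                        and dst_label not in labels[dst]):
                    labels[dst].add(dst_label)
                    changed = True
    return labels


nodes = ["sock:203.0.113.9", "pid:1", "/tmp/x", "pid:2"]
edges = [
    ("sock:203.0.113.9", "pid:1", "recv"),
    ("pid:1", "/tmp/x", "write"),
    ("/tmp/x", "pid:2", "execute"),
]
seed_rules = [(lambda n: n.startswith("sock:"), "UNTRUSTED_SRC")]
inference_rules = [
    ("UNTRUSTED_SRC", "recv", "UNTRUSTED_DATA"),
    ("UNTRUSTED_DATA", "write", "UNTRUSTED_DATA"),
    ("UNTRUSTED_DATA", "execute", "UNTRUSTED_EXEC"),
]
labels = propagate_labels(edges, initialize_labels(nodes, seed_rules), inference_rules)
```

The loop terminates because labels are only ever added and the set of possible labels per node is finite.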

Further, the data tracing graph labels are defined based on the ATT&CK model, the definition comprising the types and specific meanings of the labels.

Further, the APT attack detection module designs a set of tag matching rules, traverses the data traceability graph after tag inference, uses the matching rules to match dangerous tags in the data traceability graph, sends a warning to a user, and reports the detected attack type and the attack occurrence position.
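A minimal sketch of such a matching pass is given below in Python, assuming that a matching rule is simply a set of labels that must all be present on one node. The rule format, the label names and the `Alert` type are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    attack_type: str   # which matching rule fired
    node: str          # where in the graph the attack was detected


def match_attacks(labels, matching_rules):
    """Report every node whose label set contains all labels a rule requires."""
    alerts = []
    for node, node_labels in labels.items():
        for attack_type, required in matching_rules.items():
            if required <= node_labels:   # subset test
                alerts.append(Alert(attack_type, node))
    return alerts


node_labels = {
    "pid:2": {"UNTRUSTED_EXEC", "SENSITIVE_READ", "NETWORK_SEND"},
    "pid:9": {"NETWORK_SEND"},
}
matching_rules = {
    "data_exfiltration": frozenset({"SENSITIVE_READ", "NETWORK_SEND"}),
}
alerts = match_attacks(node_labels, matching_rules)
```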

Further, the APT attack detection module allows users to define their own data tracing graph labels and to write their own label inference rules and matching rules, so that novel APT attacks can be detected.
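One way such user extensibility might be organized is a small rule registry that rejects rules referring to undefined labels. The `RuleBook` class below is a hypothetical Python sketch, not an interface described by the disclosure; the label and attack names are likewise invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class RuleBook:
    labels: dict = field(default_factory=dict)            # label -> meaning
    inference_rules: list = field(default_factory=list)   # (src_label, relation, dst_label)
    matching_rules: dict = field(default_factory=dict)    # attack type -> required labels

    def define_label(self, name, meaning):
        self.labels[name] = meaning

    def _check_defined(self, *names):
        unknown = [n for n in names if n not in self.labels]
        if unknown:
            raise ValueError(f"undefined labels: {unknown}")

    def add_inference_rule(self, src_label, relation, dst_label):
        self._check_defined(src_label, dst_label)
        self.inference_rules.append((src_label, relation, dst_label))

    def add_matching_rule(self, attack_type, required_labels):
        self._check_defined(*required_labels)
        self.matching_rules[attack_type] = frozenset(required_labels)


book = RuleBook()
book.define_label("UNTRUSTED_EXEC", "process started from untrusted data")
book.define_label("MASS_FILE_WRITE", "process rewrote many user files")
book.add_inference_rule("UNTRUSTED_EXEC", "fork", "UNTRUSTED_EXEC")
book.add_matching_rule("ransomware_like", ["UNTRUSTED_EXEC", "MASS_FILE_WRITE"])
```

Validating label names when a rule is registered catches typos early, before a misspelled rule silently fails to match anything.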

Further, the workflow of the APT detection system includes the following steps:

step 1, an attacker launches an APT attack on a target system, and the system log collection module collects the system logs containing the attack behavior and processes them into the edge_list format;

step 2, the data tracing graph normalization module first creates an original version for each node; whenever a node u obtains an edge l pointing to it, the current latest version u_{t-1} of u is copied to create a new version u_t of the node, and the newly entered edge is pointed to the new version, so that the processed tracing graph finally becomes a directed acyclic graph satisfying the DAG characteristic;

step 3, the data tracing graph compression module first analyzes each edge in the graph and records the start node and the end node of each edge separately; for each node x, the set of all end nodes reached when x acts as the start node is denoted Sem(x). Whenever an edge from a node u to another node v appears, v is compared with the elements of Sem(u): if v is in Sem(u), the event is deleted directly; if v is not in Sem(u), v is added to Sem(u) and Sem(v) is emptied, thereby completing the compression of the edges of the tracing graph;

step 4, when the data tracing graph compression module inserts an edge between node u and node v, it examines the current latest version of v; if that version has no child nodes, the entity corresponding to the newly generated version of v is treated as an invalid entity and deleted, thereby completing the compression of the nodes of the tracing graph;

step 5, the APT attack detection module marks each node in the traceability graph by using the designed traceability graph label, and the label initialization process is completed;

step 6, the APT attack detection module uses a label deducing rule, and starts from the initialized label, the label is diffused in the traceability graph, so that the coverage rate of the label is increased;

step 7, when the APT attack detection module finds that the labels of a node x in the graph satisfy a designed APT judgment rule, it determines that an APT attack has occurred and reports to the user the corresponding attack type and the position of the node x where the attack occurred.
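The wording of step 4 is open to interpretation. Under one reading, a new version of v is unnecessary whenever the current latest version of v has no child nodes (no outgoing edges), so the incoming edge is attached to the existing version instead. The Python sketch below implements that reading only; the function name and the `(name, version)` tuples are assumptions, and other readings of the step are possible.

```python
from collections import defaultdict


def version_with_node_pruning(edge_list):
    """Node versioning that skips versions which would be invalid entities."""
    latest = defaultdict(int)   # node name -> latest version number
    has_children = set()        # versioned nodes with at least one outgoing edge
    out = []
    for src, dst in edge_list:  # edges are assumed to be in timestamp order
        src_v = (src, latest[src])
        dst_v = (dst, latest[dst])
        # A new version is needed only if the latest one already has children
        # (or for a self-loop, which would otherwise create a cycle).
        if dst_v in has_children or src == dst:
            latest[dst] += 1
            dst_v = (dst, latest[dst])
        has_children.add(src_v)
        out.append((src_v, dst_v))
    return out


events = [("a", "f"), ("b", "f"), ("f", "c"), ("a", "f")]
pruned = version_with_node_pruning(events)
```

In this example the file f needs only two versions rather than one per incoming edge. The graph stays acyclic because an edge is only ever attached to a version that has no outgoing edges at that moment.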

The APT detection system based on data tracing graph labels provided by the invention collects system logs from various sources and processes them into a unified format; it combines two methods, redundant semantic deletion and invalid entity removal, to compress both the edges and the nodes of the tracing graph, effectively reducing the size of the tracing graph under limited load. The system detects APT attacks by means of data tracing graph labels: the categories and specific meanings of the labels are designed, the label propagation rules and APT attack matching rules are defined, and users are allowed to define their own labels and add the corresponding inference rules and attack discrimination rules.

The conception, specific structure, and technical effects of the present invention will be further described with reference to the accompanying drawings to fully understand the objects, features, and effects of the present invention.

Drawings

FIG. 1 is a schematic diagram of an APT detection system based on data tracing graph labels according to a preferred embodiment of the present invention;

FIG. 2 is a schematic diagram of the APT attack detection process of an APT detection system based on data tracing graph labels according to a preferred embodiment of the present invention.

Detailed Description

The following description of the preferred embodiments of the present invention refers to the accompanying drawings, which make the technical contents thereof more clear and easy to understand. The present invention may be embodied in many different forms of embodiments and the scope of the present invention is not limited to only the embodiments described herein.

In the drawings, like structural elements are referred to by like reference numerals and components having similar structure or function are referred to by like reference numerals. The dimensions and thickness of each component shown in the drawings are arbitrarily shown, and the present invention is not limited to the dimensions and thickness of each component. The thickness of the components is exaggerated in some places in the drawings for clarity of illustration.

The invention provides an APT detection system based on data tracing graph labels, which consists of a system log collection module, a data tracing graph normalization module, a data tracing graph compression module and a label-based APT attack detection module. The system takes as input either previously collected raw system logs or system logs collected in real time by the open-source tool SPADE deployed on the victim machine, processes the system logs into a data tracing graph in the required format, initializes and infers the labels of the tracing graph, and finally uses the tracing graph labels to detect APT attacks. As shown in FIG. 1, the system is composed of the following modules:

1) The system log collection module: the open-source tool SPADE is deployed on the victim machine to collect its system logs, or log data supplied manually by the user is used directly; in either case the logs are converted into the required edge_list format.

2) The data tracing graph normalization module: converts the collected original tracing graph into an acyclic graph with a cleaner layout by using a node version updating algorithm, so that the graph satisfies the DAG characteristic.

3) The data tracing graph compression module: takes as input the data tracing graph processed by the normalization module and compresses its edges and nodes using two algorithms, redundant semantic deletion and invalid entity removal, respectively.

4) The APT attack detection module: marks the processed data tracing graph with predefined self-adaptive labels, diffuses the labels using label inference rules, and finally judges whether an APT attack exists using APT attack matching rules.
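The way the four modules feed one another can be pictured as a simple chain of stages. The Python sketch below only illustrates this wiring; each stage is a deliberately trivial placeholder (the detection rule that flags access to `/etc/shadow`, for instance, is invented for the example) and does not implement the algorithms of the corresponding module.

```python
from typing import Callable, Sequence

Stage = Callable[[object], object]


def run_pipeline(data, stages: Sequence[Stage]):
    """Pass the data through each stage in order, as in FIG. 1."""
    for stage in stages:
        data = stage(data)
    return data


# Placeholder stages standing in for the four modules.
def collect(lines):      # log lines -> (src, dst) edge list
    return [tuple(line.split()) for line in lines]


def normalize(edges):    # placeholder: real module applies node versioning
    return edges


def compress(edges):     # placeholder: only drops exact duplicate edges
    return list(dict.fromkeys(edges))


def detect(edges):       # placeholder rule, not a real matching rule
    return [edge for edge in edges if edge[1] == "/etc/shadow"]


log_lines = ["pid:1 /etc/shadow", "pid:1 /etc/shadow", "pid:2 /tmp/a"]
alerts = run_pipeline(log_lines, [collect, normalize, compress, detect])
```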

Fig. 2 shows an APT attack detection process of an APT detection system based on a data trace-source graph tag.

System log data collected in real time, or log data supplied manually by the user, is fed into the system as input. The input first enters the normalization stage, where the system log is uniformly processed into the edge_list format; an original version is then created for each node in the graph, and each time a node obtains an edge pointing to it, the node is copied, that is, a new version of the node is created and the newly entered edge is pointed to the new version. In this way the original tracing graph is converted into a data tracing graph that is directed and acyclic.

The processed data tracing graph then enters the data tracing graph compression module. The module first compresses the edges using the redundant semantic deletion algorithm: as long as the state of a start node is unchanged, only the edge with the earliest timestamp is retained among all edges from that start node to the same end node, and edges added subsequently are deleted as redundant events. It then compresses the nodes using the invalid entity removal algorithm: if a node has the same ancestor node as another node and has no child nodes, it is regarded as an invalid node and deleted. After compression the scale of the data tracing graph is greatly reduced, and the processed graph is input into the APT attack detection module for attack detection.

The APT attack detection module first initializes the labels of the nodes in the data tracing graph, marking some of the nodes with the defined labels. The initialized labels are then diffused between adjacent nodes in the graph according to the label inference rules. Finally, the whole graph is traversed to search for dangerous labels and identify the corresponding APT attacks, and a warning is sent to the user reporting the type of the APT attack and the position at which it occurred.

The foregoing describes in detail preferred embodiments of the present invention. It should be understood that numerous modifications and variations can be made in accordance with the concepts of the invention without requiring creative effort by one of ordinary skill in the art. Therefore, all technical solutions which can be obtained by logic analysis, reasoning or limited experiments based on the prior art by the person skilled in the art according to the inventive concept shall be within the scope of protection defined by the claims.

Claims

1. An APT detection system based on data tracing graph labels, relating to the field of computer network security, characterized by comprising a system log collection module, a data tracing graph normalization module, a data tracing graph compression module and an APT attack detection module; the system log collection module takes as input either previously collected raw system logs or system logs collected in real time by the open-source tool SPADE deployed on the victim machine; the data tracing graph normalization module processes the system logs into a data tracing graph in the required format; the data tracing graph compression module compresses the edges and nodes of the data tracing graph; and finally the APT attack detection module detects APT attacks using the data tracing graph labels.

2. The APT detection system of claim 1, wherein the system log collection module processes system logs of different sources into a data trace-source graph in edge_list format.

3. The APT detection system based on data trace-source graph labels of claim 2, wherein the data trace-source graph normalization module converts the collected original trace-source graph into a loop-free graph with a better layout by using a node version update algorithm.

4. An APT detection system based on data trace-source graph labels according to claim 3, wherein the loop-free graph satisfies DAG characteristics.

5. The APT detection system based on data trace-source graph label according to claim 4, wherein the data trace-source graph compression module designs a compression algorithm of the data trace-source graph, and uses two algorithms of redundant semantic pruning and invalid entity removal to compress edges and nodes of the data trace-source graph on the premise of not changing semantics of the data trace-source graph.

6. The APT detection system of claim 5, wherein the APT attack detection module marks the processed data trace-source graph using the trace-source graph tag defined in advance, diffuses the tag using a tag inference rule, and finally judges whether there is an APT attack using an APT attack matching rule.

7. The APT detection system based on data trace-source graph labels of claim 6, wherein the trace-source graph labels are defined based on the ATT&CK model, the definition including the types and specific meanings of the labels.

8. The APT detection system based on data trace-source graph labels of claim 7, wherein the APT attack detection module designs a set of label matching rules, traverses the data trace-source graph after label inference, matches dangerous labels therein using the matching rules, and gives a warning to a user reporting the type of attack detected and the occurrence position of the attack.

9. The APT detection system of claim 8, wherein the APT attack detection module allows users to define their own trace-source graph labels and to write their own trace-source graph label inference rules and matching rules, so as to detect novel APT attacks.

10. The APT detection system based on the data trace-source graph label of claim 9, wherein the workflow of the APT detection system comprises the steps of:

step 2, the data tracing graph normalization module first creates an original version for each node; when a node u obtains an edge l pointing to it at time t, the current latest version u_{t-1} of node u is copied to create a new version u_t of the node, and the newly entered edge is pointed to the new version, so that the processed tracing graph finally becomes a directed acyclic graph satisfying the DAG characteristic;