CN114900364A

CN114900364A - High-level continuous threat detection method based on tracing graph and heterogeneous graph neural network

Info

Publication number: CN114900364A
Application number: CN202210546970.6A
Authority: CN
Inventors: 黄永忠 (Huang Yongzhong); 欧阳规格 (Ouyang Guige); 高一鹏 (Gao Yipeng)
Original assignee: Guilin University of Electronic Technology
Current assignee: Guilin University of Electronic Technology
Priority date: 2022-05-18
Filing date: 2022-05-18
Publication date: 2022-08-12
Anticipated expiration: 2042-05-18
Also published as: CN114900364B

Abstract

The invention relates to the technical field of network security, and in particular to an advanced persistent threat (APT) detection method based on a tracing graph and a heterogeneous graph neural network. First, a heterogeneous graph representation learning technique learns a good representation of the tracing graph, which prepares it for the subsequent classification task. Next, the output heterogeneous graph vectors are pooled hierarchically so that the original representation information of the heterogeneous graph is progressively aggregated, and this information is used to judge whether the input tracing graph contains attack behavior. Finally, the classification result obtained from the heterogeneous graph representation is checked against the ground-truth label of the tracing graph. The method effectively reduces the excessive dependence of APT detection on expert domain knowledge and is easy to extend to other network attack detection fields. In addition, host activity is modeled with a cross-operating-system tracing graph structure, so the method can be applied in complex enterprise environments and the workload of designing different tracing graphs for different operating systems is reduced.

Description

High-level continuous threat detection method based on tracing graph and heterogeneous graph neural network

Technical Field

The invention relates to the technical field of network security, and in particular to an advanced persistent threat detection method based on a tracing graph and a heterogeneous graph neural network.

Background

With the continuous progress of informatization, cyberspace is becoming ever more deeply integrated with industry, national defense and social life. Advanced Persistent Threat (APT) organizations attack enterprises and organizations in many sectors for economic gain, to steal confidential information, or for political purposes. Accurately detecting APT attacks and responding to them quickly has therefore become a hot research problem in network security.

Existing tracing-graph-based detection methods mainly rely on label propagation algorithms, graph matching and similar techniques. These techniques depend heavily on algorithms and rules designed from expert knowledge and require a large amount of domain knowledge, and such customized algorithms are, to some extent, difficult to adapt to different network and operating system environments. With the development of deep learning techniques, reducing human intervention in the APT detection process is becoming increasingly important.

Disclosure of Invention

The invention aims to provide an advanced persistent threat detection method based on a tracing graph and a heterogeneous graph neural network, so as to solve the problem that existing tracing-graph-based detection methods are difficult to adapt to various network and operating system environments.

To achieve the above purpose, the present invention provides an advanced persistent threat detection method based on a tracing graph and a heterogeneous graph neural network, comprising the following steps:

defining a cross-operating-system tracing graph framework, using the basic definition of a system-level tracing graph as the guiding principle;

under the tracing graph framework, converting the logs generated by the host system into a tracing graph capable of modeling the running state of the system;

learning a representation of the tracing graph with a heterogeneous graph representation learning technique to obtain a new heterogeneous graph representation;

performing hierarchical pooling on the vectors of the new heterogeneous graph, aggregating its original representation information, and using the aggregated information to judge whether the tracing graph contains attack behavior;

and checking the classification result of the heterogeneous graph representation against the ground-truth label of the tracing graph.

The tracing graph is a directed heterogeneous attributed graph, and the attributes it provides are symbolized and vectorized.

The step of progressively aggregating the original representation information of the new heterogeneous graph by performing hierarchical pooling on its vectors comprises:

mapping different types of nodes in the tracing graph to their own type-specific vector spaces;

using the edges between nodes to calculate the multi-head attention between the nodes and the importance of each neighborhood node to the node;

calculating the degree of importance of each node to the target node;

combining all message heads to obtain a message vector;

aggregating the messages to the target node, applying a linear mapping, a nonlinear activation function and a residual connection, and mapping the target node to the specific space of its type;

repeating the above steps to obtain the information of the nodes in the heterogeneous graph;

compressing the information in the heterogeneous graph through hierarchical pooling to obtain, after grouping, a vector representing the information of the heterogeneous graph.

Compared with the prior art, the advanced persistent threat detection method based on a tracing graph and a heterogeneous graph neural network has the following beneficial effects: it effectively reduces the excessive dependence of the APT detection process on expert domain knowledge and is easy to extend to other network attack detection fields; it models host activity with a cross-operating-system tracing graph structure, so it can be applied in complex enterprise environments and reduces the workload of designing different tracing graphs for different operating systems; and its hierarchical pooling model improves detection accuracy and avoids the loss of classification accuracy caused by flattening graph data.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a schematic diagram of the steps of an advanced persistent threat detection method based on a tracing graph and a heterogeneous graph neural network according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on these embodiments without creative effort shall fall within the scope of protection of the present invention.

Referring to fig. 1 and fig. 2, fig. 1 is a schematic diagram of the steps of an advanced persistent threat detection method based on a tracing graph and a heterogeneous graph neural network according to an embodiment of the present invention. Specifically, as shown in fig. 1, the method may include the following steps:

S1, defining a cross-operating-system tracing graph framework, using the basic definition of a system-level tracing graph as the guiding principle;

S2, under the tracing graph framework, converting the logs generated by the host system into a tracing graph that accurately models the running state of the system;

S3, learning a representation of the tracing graph with a heterogeneous graph representation learning technique to obtain a new heterogeneous graph representation;

S4, performing hierarchical pooling on the vectors of the new heterogeneous graph, aggregating its original representation information, and using the aggregated information to judge whether the tracing graph contains attack behavior;

S5, checking the classification result of the heterogeneous graph representation against the ground-truth label of the tracing graph.
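Step S5 only states that the predicted label is checked against the true label. A minimal sketch of that check follows, assuming binary graph labels (1 = the tracing graph contains attack behavior, 0 = benign); the metrics computed here are common detection metrics chosen for illustration, since the text names none.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels; 1 = attack, 0 = benign (assumed)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def evaluate(y_true, y_pred):
    """Compare predicted graph labels with ground-truth labels."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    truth = [1, 0, 1, 1, 0, 0]
    preds = [1, 0, 0, 1, 1, 0]
    print(evaluate(truth, preds))
```

Recall (the share of attack graphs that are caught) is usually the figure to watch in APT detection, because a missed attack graph costs more than a false alarm.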

Specifically, a tracing graph framework applicable to various operating systems is defined, and its nodes are divided into three types: subjects, which can initiate system activities; objects, which are the carriers of system activities; and descriptive nodes. Subjects can be further subdivided by their type attribute into process objects, thread objects, execution unit objects and so on. Objects can be classified as file objects, memory objects, network flow objects and so on. Descriptive nodes include dependency unit objects and executing user objects, as shown in Table 1. The richness of the edge types directly affects the semantic expressiveness of the tracing graph, and its edges cover 56 types, including close events, object creation events, login events, mount events and read events. Adding various attributes to the nodes and edges further enhances their expressiveness, so that the tracing graph can accurately depict the running state and details of the system. This well-defined tracing graph framework guides the subsequent conversion of information such as host-generated logs into a tracing graph.
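The framework above can be sketched as a small typed graph structure. The subtype names, the five edge-type names (standing in for the full set of 56) and the log record fields (`event`, `subject_id`, `object_type`, and so on) are hypothetical placeholders. Drawing every edge from the initiating subject to the object it acts on is also an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeCategory(Enum):
    SUBJECT = "subject"          # initiates system activity
    OBJECT = "object"            # carrier of system activity
    DESCRIPTIVE = "descriptive"  # describes other nodes


# Hypothetical subtype table built from the node types named in the text.
SUBTYPE_CATEGORY = {
    "process": NodeCategory.SUBJECT,
    "thread": NodeCategory.SUBJECT,
    "execution_unit": NodeCategory.SUBJECT,
    "file": NodeCategory.OBJECT,
    "memory": NodeCategory.OBJECT,
    "network_flow": NodeCategory.OBJECT,
    "dependency_unit": NodeCategory.DESCRIPTIVE,
    "user": NodeCategory.DESCRIPTIVE,
}

# Hypothetical names for 5 of the 56 edge (event) types.
EDGE_TYPES = {"EVENT_CLOSE", "EVENT_CREATE_OBJECT", "EVENT_LOGIN", "EVENT_MOUNT", "EVENT_READ"}


@dataclass
class Node:
    uid: str
    subtype: str
    attrs: dict = field(default_factory=dict)

    @property
    def category(self) -> NodeCategory:
        return SUBTYPE_CATEGORY[self.subtype]


@dataclass
class Edge:
    src: str
    dst: str
    etype: str
    attrs: dict = field(default_factory=dict)


@dataclass
class TracingGraph:
    """Directed heterogeneous attributed graph: typed nodes, typed edges, free attributes."""
    nodes: dict = field(default_factory=dict)  # uid -> Node
    edges: list = field(default_factory=list)  # Edge objects, src -> dst

    def add_node(self, uid, subtype, **attrs):
        if subtype not in SUBTYPE_CATEGORY:
            raise ValueError(f"unknown node subtype: {subtype}")
        node = self.nodes.setdefault(uid, Node(uid, subtype))
        node.attrs.update(attrs)
        return node

    def add_event(self, record):
        """Turn one (hypothetical) log record into two nodes and one typed edge
        drawn from the initiating subject to the object it acts on."""
        if record["event"] not in EDGE_TYPES:
            raise ValueError(f"unknown event type: {record['event']}")
        self.add_node(record["subject_id"], record["subject_type"])
        self.add_node(record["object_id"], record["object_type"])
        edge = Edge(record["subject_id"], record["object_id"], record["event"],
                    {"timestamp": record.get("timestamp")})
        self.edges.append(edge)
        return edge
```

For example, a read record with `subject_type` "process" and `object_type` "file" yields a process node, a file node and one EVENT_READ edge between them. Because the node and edge vocabularies are fixed tables rather than per-OS code, logs from different operating systems only need to be mapped onto the same table.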

Through this well-defined system-level tracing graph framework, the running state and details of the system can be described using the rich semantic knowledge and strong abstract expressive power of the tracing graph. Symbolizing and vectorizing the information provided by the tracing graph prepares it for the subsequent heterogeneous graph representation learning.
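The following is a minimal sketch of symbolizing and vectorizing node attributes. It assumes each node carries two attributes, a subtype and a name or path. The subtype is symbolized into an integer id and then one-hot encoded, and the path is vectorized by feature hashing. `Vocabulary`, `node_features` and the hashing scheme are illustrative choices and do not reproduce the patent's own encoding.

```python
import hashlib


class Vocabulary:
    """Symbolization: give every distinct attribute value a stable integer id."""

    def __init__(self):
        self.index = {}

    def symbol(self, value):
        if value not in self.index:
            self.index[value] = len(self.index)
        return self.index[value]


def one_hot(idx, size):
    vec = [0.0] * size
    vec[idx] = 1.0
    return vec


def hashed_tokens(text, dim):
    """Vectorize a path-like string by feature hashing its components.
    Backslashes are normalized so Windows and Linux paths split the same way."""
    vec = [0.0] * dim
    for token in text.replace("\\", "/").split("/"):
        if token:
            h = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16)
            vec[h % dim] += 1.0
    return vec


def node_features(subtype, name, type_vocab, num_types, hash_dim=8):
    """One-hot node-type code concatenated with a hashed name/path vector."""
    type_id = type_vocab.symbol(subtype)
    if type_id >= num_types:
        raise ValueError("more node types than reserved one-hot slots")
    return one_hot(type_id, num_types) + hashed_tokens(name, hash_dim)
```

A deterministic hash (SHA-256) is used instead of Python's built-in `hash`, which is salted per process and would give different vectors on every run.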

Nodes s and t of different types in the tracing graph are each mapped to their own type-specific vector space, giving K^i(s) and Q^i(t). Because the designed tracing graph is a heterogeneous graph, mapping different node types into different vector spaces, rather than mapping all nodes into a single space, better captures the properties of the heterogeneous graph. Using the edge e between the nodes, formula 1 calculates the attention of each of the h heads between nodes s and t, and formula 2 calculates the importance of every neighboring node s of the target node t to node t.
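The type-specific projections and per-head attention of formulas 1 and 2 can be sketched in pure Python. The sketch assumes these formulas follow the Heterogeneous Graph Transformer (HGT) formulation, which the symbols K^i(s), Q^i(t) and μ suggest. Random weights stand in for learned parameters. The default prior μ = 1 for unseen meta-relations and the √(d/h) scaling are also assumptions.

```python
import math
import random


def matvec(mat, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in mat]


def transpose(mat):
    return [list(col) for col in zip(*mat)]


def rand_mat(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


class HGTAttention:
    """Per-head attention of neighbors s on a target t (HGT-style sketch)."""

    def __init__(self, node_types, edge_types, dim, heads, seed=0):
        if dim % heads:
            raise ValueError("dim must be divisible by heads")
        rng = random.Random(seed)
        self.heads, self.dk = heads, dim // heads
        # K-Linear / Q-Linear: one matrix per node type and head, so each
        # node type is projected into its own vector space.
        self.k_lin = {nt: [rand_mat(self.dk, dim, rng) for _ in range(heads)] for nt in node_types}
        self.q_lin = {nt: [rand_mat(self.dk, dim, rng) for _ in range(heads)] for nt in node_types}
        # Edge-based matrix W^ATT_phi(e): one dk x dk matrix per edge type and head.
        self.w_att = {et: [rand_mat(self.dk, self.dk, rng) for _ in range(heads)] for et in edge_types}
        # Prior mu keyed by meta-relation (tau(t), phi(e), tau(s)); missing keys -> 1.0.
        self.mu = {}

    def head_score(self, s_type, e_type, t_type, h_s, h_t, i):
        k = matvec(self.k_lin[s_type][i], h_s)            # K^i(s)
        q = matvec(self.q_lin[t_type][i], h_t)            # Q^i(t)
        kw = matvec(transpose(self.w_att[e_type][i]), k)  # row vector K^i(s) W^ATT
        mu = self.mu.get((t_type, e_type, s_type), 1.0)
        # Formula 1, scaled by sqrt of the per-head dimension.
        return sum(a * b for a, b in zip(kw, q)) * mu / math.sqrt(self.dk)

    def attention(self, t_type, h_t, neighbors):
        """neighbors: list of (s_type, e_type, h_s). Returns weights[n][i]: a softmax
        over all neighbors s in N(t), computed separately for each head (formula 2)."""
        if not neighbors:
            return []
        per_head = []
        for i in range(self.heads):
            scores = [self.head_score(s_type, e_type, t_type, h_s, h_t, i)
                      for s_type, e_type, h_s in neighbors]
            per_head.append(softmax(scores))
        return transpose(per_head)
```

A separate projection per node type, rather than one shared matrix, is what places process, file and other nodes in different vector spaces. Setting a meta-relation's μ to 0 removes that relation's contribution to the scores.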

Here, ⟨τ(v_t), φ(e), τ(v_s)⟩ denotes the meta-relation between the nodes, and an edge-based matrix lets the model capture different semantic relationships between the same pair of node types; h is the total number of attention heads, and i denotes the i-th attention head.

Because different relations contribute to the target node to different degrees, a prior tensor μ is added to scale the attention. Formulas 1 and 2 therefore give the degree of importance of each node to the target node.
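The text refers to formulas 1 and 2 without reproducing them. If they follow the standard Heterogeneous Graph Transformer (HGT) formulation, which the symbols K^i(s), Q^i(t), μ and h match, they would read roughly as below. In this reconstruction, K-Linear and Q-Linear are the type-specific projections, W^ATT_φ(e) is the edge-based matrix, d is the hidden dimension and H^(l-1) holds the node representations from the previous layer. These names follow the HGT paper and are assumed, not confirmed by this text.

```latex
\begin{align}
K^i(s) &= \text{K-Linear}^i_{\tau(s)}\left(H^{(l-1)}[s]\right), \qquad
Q^i(t) = \text{Q-Linear}^i_{\tau(t)}\left(H^{(l-1)}[t]\right) \notag \\
\text{ATT-head}^i(s,e,t) &= \left(K^i(s)\, W^{ATT}_{\phi(e)}\, Q^i(t)^{\top}\right)
  \cdot \frac{\mu_{\langle \tau(v_t),\phi(e),\tau(v_s)\rangle}}{\sqrt{d}} \tag{1} \\
\text{Attention}_{HGT}(s,e,t) &= \operatorname{Softmax}_{\forall s \in N(t)}
  \left( \big\Vert_{i \in [1,h]} \text{ATT-head}^i(s,e,t) \right) \tag{2}
\end{align}
```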

After source nodes of different types are mapped to their own spaces, the multi-head message for nodes s and t and the edge e between them is computed with formula 3, which applies an edge-based matrix.

Finally, formula 4 concatenates all message heads into the message vector, completing message passing.

Here, a type-specific linear mapping projects the source node s into the i-th message head vector.

Message_HGT(s, e, t) combines all the heads of formula 3: the concatenation operator ∥ joins the MSG-head^i(s, e, t) vectors produced by formula 3.
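Under the same HGT assumption, formulas 3 and 4 would read roughly as below. The source-node mapping is written M-Linear^i_τ(s), its name in the HGT paper, and W^MSG_φ(e) stands for the edge-based matrix. Both symbol names are assumed, since the text does not preserve them.

```latex
\begin{align}
\text{MSG-head}^i(s,e,t) &= \text{M-Linear}^i_{\tau(s)}\left(H^{(l-1)}[s]\right)\, W^{MSG}_{\phi(e)} \tag{3} \\
\text{Message}_{HGT}(s,e,t) &= \big\Vert_{i \in [1,h]} \text{MSG-head}^i(s,e,t) \tag{4}
\end{align}
```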

After the messages are aggregated to the target node with formula 5, formula 6 applies the linear mapping A-Linear_τ(t), a nonlinear activation function and a residual connection to map the result back to the type-specific space of the target node t.
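Formulas 3 to 6 can be sketched in pure Python for a single target node, again assuming the HGT formulation. The sketch takes the per-head attention weights as input, for example from formula 2. Random weights stand in for learned ones, and GELU is used as one possible choice for the unspecified nonlinear activation. The order A-Linear(σ(·)) plus residual follows the HGT paper and is also an assumption.

```python
import math
import random


def matvec(mat, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in mat]


def transpose(mat):
    return [list(col) for col in zip(*mat)]


def rand_mat(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


class HGTMessageUpdate:
    """Messages (formulas 3-4), aggregation (5) and residual update (6), HGT-style sketch."""

    def __init__(self, node_types, edge_types, dim, heads, seed=0):
        if dim % heads:
            raise ValueError("dim must be divisible by heads")
        rng = random.Random(seed)
        self.heads, self.dk = heads, dim // heads
        # Source-type linear mapping per head (called M-Linear in the HGT paper).
        self.m_lin = {nt: [rand_mat(self.dk, dim, rng) for _ in range(heads)] for nt in node_types}
        # Edge-based matrix W^MSG_phi(e): one dk x dk matrix per edge type and head.
        self.w_msg = {et: [rand_mat(self.dk, self.dk, rng) for _ in range(heads)] for et in edge_types}
        # A-Linear_tau(t): one dim x dim matrix per target node type.
        self.a_lin = {nt: rand_mat(dim, dim, rng) for nt in node_types}

    def message(self, s_type, e_type, h_s):
        """Formulas 3-4: compute each message head, then concatenate the heads."""
        out = []
        for i in range(self.heads):
            m = matvec(self.m_lin[s_type][i], h_s)
            out.extend(matvec(transpose(self.w_msg[e_type][i]), m))  # row vector m W^MSG
        return out

    def update(self, t_type, h_t, neighbors, weights):
        """neighbors: list of (s_type, e_type, h_s); weights[n][i] is the attention of
        neighbor n in head i. When each head's weights sum to 1, the sum below is the
        attention-weighted average of formula 5."""
        agg = [0.0] * (self.heads * self.dk)
        for (s_type, e_type, h_s), w in zip(neighbors, weights):
            msg = self.message(s_type, e_type, h_s)
            for idx, value in enumerate(msg):
                agg[idx] += w[idx // self.dk] * value  # head index = idx // dk
        activated = [gelu(x) for x in agg]
        out = matvec(self.a_lin[t_type], activated)    # A-Linear_tau(t)
        return [o + h for o, h in zip(out, h_t)]        # residual connection (formula 6)
```

Stacking several such layers is what the text means by repeating the steps. Each extra layer lets a node receive messages from one more hop away.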

In formula 5, the aggregation operator denotes an attention-weighted average of the message heads.
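Under the same HGT assumption, formulas 5 and 6 would read roughly as below. Here ⊕ is the aggregation operator over the neighbors N(t), σ is the nonlinear activation, and H̃^(l)[t] is the aggregated representation before the update.

```latex
\begin{align}
\widetilde{H}^{(l)}[t] &= \bigoplus_{\forall s \in N(t)}
  \left( \text{Attention}_{HGT}(s,e,t) \cdot \text{Message}_{HGT}(s,e,t) \right) \tag{5} \\
H^{(l)}[t] &= \text{A-Linear}_{\tau(t)}\left( \sigma\left(\widetilde{H}^{(l)}[t]\right) \right) + H^{(l-1)}[t] \tag{6}
\end{align}
```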

At this point, each node in the graph has aggregated a contextual representation of its neighborhood nodes' information. By repeating these steps, the nodes extend the distance over which messages propagate and can obtain information from most nodes in the graph. Finally, hierarchical pooling compresses all the information in the graph into a vector that, after grouping, represents the information of the whole graph, and this vector is used to judge whether the graph contains APT attack behavior.
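The text does not name its hierarchical pooling operator. The minimal sketch below assumes TopK-style pooling: nodes are scored with a projection vector, the highest-scoring fraction is kept, and the subgraph they induce is retained. It also assumes a mean-plus-max readout at each level, with the readouts summed across levels, and a logistic classifier on the result. The GNN layers that would normally run between pooling levels are omitted, and all "learned" parameters are fixed by hand.

```python
import math


def topk_pool(feats, adj, p, ratio=0.5):
    """One TopK-style pooling step. feats: {node: [float]}, adj: {node: set of nodes},
    p: projection vector (learned in practice, fixed here)."""
    norm = math.sqrt(sum(v * v for v in p)) or 1.0
    scores = {n: sum(a * b for a, b in zip(x, p)) / norm for n, x in feats.items()}
    k = max(1, math.ceil(ratio * len(feats)))
    kept = sorted(feats, key=lambda n: scores[n], reverse=True)[:k]
    kept_set = set(kept)
    # Gate kept features by their score, so in training the projection gets gradients.
    new_feats = {n: [v * math.tanh(scores[n]) for v in feats[n]] for n in kept}
    new_adj = {n: adj.get(n, set()) & kept_set for n in kept}  # induced subgraph
    return new_feats, new_adj


def readout(feats):
    """Summary of one level: mean-pooled features followed by max-pooled features."""
    cols = list(zip(*feats.values()))
    return [sum(c) / len(c) for c in cols] + [max(c) for c in cols]


def hierarchical_embedding(feats, adj, projections, ratio=0.5):
    """Pool repeatedly and sum the readouts of all levels into one graph vector."""
    if not feats:
        raise ValueError("graph has no nodes")
    total = None
    for p in projections:
        feats, adj = topk_pool(feats, adj, p, ratio)
        level = readout(feats)
        total = level if total is None else [a + b for a, b in zip(total, level)]
    return total


def predict_attack(graph_vec, weights, bias, threshold=0.5):
    """Logistic classifier; label 1 = graph contains APT behavior (assumed convention)."""
    z = sum(w * x for w, x in zip(weights, graph_vec)) + bias
    if z >= 0:
        prob = 1.0 / (1.0 + math.exp(-z))
    else:  # numerically stable branch for large negative z
        e = math.exp(z)
        prob = e / (1.0 + e)
    return prob, int(prob >= threshold)
```

Summing per-level readouts, instead of flattening all node vectors into one long vector, keeps the graph vector's size independent of the number of nodes. This is one way to address the flattening problem the text mentions.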

The advanced persistent threat detection method based on a tracing graph and a heterogeneous graph neural network effectively reduces the excessive dependence of the APT detection process on expert domain knowledge and is easy to extend to other network attack detection fields. Meanwhile, host activity is modeled with a cross-operating-system tracing graph structure, so the method can be applied in complex enterprise environments and the workload of designing different tracing graphs for different operating systems is reduced. In addition, the hierarchical pooling model improves the detection accuracy of the model and avoids the loss of classification accuracy caused by flattening graph data.

While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. An advanced persistent threat detection method based on a tracing graph and a heterogeneous graph neural network, characterized by comprising the following steps:

under the tracing graph framework, converting logs generated by a host system into a tracing graph capable of modeling the running state of the system;

learning a representation for the tracing graph by using a heterogeneous graph representation learning technology to obtain a new heterogeneous graph representation;

2. The advanced persistent threat detection method based on a tracing graph and a heterogeneous graph neural network according to claim 1,

3. The advanced persistent threat detection method based on a tracing graph and a heterogeneous graph neural network according to claim 2, wherein the step of progressively aggregating the original representation information of the new heterogeneous graph through hierarchical pooling of the vectors of the new heterogeneous graph comprises:

using the edge vectors between nodes to calculate the multi-head attention between the nodes and the importance of each neighborhood node to the node;

calculating the degree of importance of each node to the target node;

combining all message heads to obtain a message vector;

aggregating the messages to the target node, applying a linear mapping, a nonlinear activation function and a residual connection, and mapping the target node back to the specific space of its type;

because the above steps are repeated, mapping every node back to the specific space of its node type;

compressing the information in the heterogeneous graph through hierarchical pooling to finally obtain a vector representing the information of the heterogeneous graph.