CN116846612A - Attack chain completion method and device, electronic equipment and storage medium

Info

Publication number: CN116846612A
Application number: CN202310737889.0A
Authority: CN
Inventors: 高浩浩; 蔡挺; 马奇辰; 焦伟; 张凯强; 王景丽; 赵佳祥; 赵曦滨; 孙逸伦; 万海
Original assignee: China Bond Jinke Information Technology Co ltd; Tsinghua University
Current assignee: China Bond Jinke Information Technology Co ltd; Tsinghua University
Priority date: 2023-06-20
Filing date: 2023-06-20
Publication date: 2023-10-03

Abstract

The application relates to an attack chain completion method and device, an electronic device, and a storage medium, applied in the technical field of network security. The attack chain completion method comprises the following steps: acquiring a security knowledge graph and abnormal edges; determining a plurality of dependency paths in the security knowledge graph, and clustering the dependency paths to obtain a plurality of class clusters and corresponding host behavior categories; selecting a target dependency path from the dependency paths to which each abnormal edge belongs, and, among the target dependency paths corresponding to all the abnormal edges, merging those that share an entity and have the same host behavior category to obtain a plurality of local attack chains; establishing connections between local attack chains within each class cluster according to the connection probability between the entities in each local attack chain and the entities in other local attack chains of the same class cluster; and establishing connections between local attack chains across class clusters according to the connection probability between the entities in each class cluster and the entities in other class clusters. The application can improve the completeness of the attack chain.

Description

Attack chain completion method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of network security technologies, and in particular, to an attack chain completion method, an attack chain completion device, an electronic device, and a storage medium.

Background

Network security technology aims to ensure the normal operation of computer systems and networks, and bears on national security and people's livelihood. Advanced persistent threats, a new means of attack that combines a variety of advanced attack techniques to control target assets continuously over long periods, have begun to cause security incidents worldwide.

In the related art, network security defense can be performed through attack tracing, which can effectively counter advanced persistent threats. Attack tracing records system events by collecting information such as host system audit logs, and can detect and reconstruct the process by which an advanced persistent threat intrudes into a host system.

However, owing to incomplete data, the attack tracing strategy, and similar reasons, the initial reconstruction result suffers from missing attack chain segments: the attack flow cannot be fully reproduced, which hinders review by security operators and affects the subsequent attack handling process.

Disclosure of Invention

In order to solve the technical problems, the application provides an attack chain completion method, an attack chain completion device, electronic equipment and a storage medium.

According to a first aspect of the present application, there is provided an attack chain completion method, comprising:

acquiring a security knowledge graph pre-constructed based on a host system audit log, and abnormal edges in the security knowledge graph, wherein the nodes and edges in the security knowledge graph are, respectively, the entities in the host system audit log and the interaction relations between those entities;

determining a plurality of dependent paths in the safety knowledge graph according to the safety knowledge graph, and clustering the plurality of dependent paths to obtain a plurality of class clusters and host behavior categories corresponding to each class cluster;

selecting a target dependency path from the dependency paths to which each abnormal edge belongs, and merging target dependency paths with the same entity and host behavior categories in the target dependency paths corresponding to all the abnormal edges to obtain a plurality of local attack chains;

aiming at each local attack chain in each class cluster, carrying out link prediction on the entity in the local attack chain and the entity in other local attack chains before the timestamp corresponding to the local attack chain in the class cluster to obtain the connection probability between the entity in the local attack chain and the entity in other local attack chains;

selecting a first target entity with the largest connection probability and located in the local attack chain and a second target entity located in the other local attack chains, and establishing connection between the second target entity and the first target entity;

acquiring the earliest of the timestamps corresponding to the local attack chains in each class cluster, and taking the earliest timestamp as the timestamp corresponding to the class cluster;

carrying out link prediction on the entities in each class cluster and the entities in other class clusters before the timestamp corresponding to the class cluster to obtain the connection probability between the entities in each class cluster and the entities in the other class clusters;

and selecting a third target entity with the maximum connection probability and located in the class cluster and a fourth target entity located in the other class clusters, and establishing connection between the fourth target entity and the third target entity.

Optionally, the determining a plurality of dependent paths in the security knowledge-graph according to the security knowledge-graph includes:

selecting an entry vertex from the security knowledge graph, wherein the out-degree of the entry vertex is greater than 0, or when the entry vertex is a file entity or a communication entity, the sum of the in-degree and the out-degree of the entry vertex is less than or equal to a first threshold, or the sum of the in-degree and the out-degree of the entry vertex is less than or equal to a second threshold; the first threshold is less than the second threshold;

traversing from the entry vertex according to a graph traversal algorithm based on depth-first search, and determining a plurality of candidate paths in the security knowledge graph, wherein the timestamps of the vertices in a candidate path increase along the direction of the edges;

and determining a plurality of dependency paths according to the plurality of candidate paths.

Optionally, the determining a plurality of dependent paths according to the plurality of candidate paths includes:

taking each candidate path as a dependency path; or,

if any one of the candidate paths is a sub-path of other candidate paths and the number of those other candidate paths is smaller than a preset number, taking the candidate paths other than that candidate path as dependency paths.

Optionally, the clustering the multiple dependent paths to obtain multiple class clusters and host behavior categories corresponding to each class cluster includes:

mapping the entities and interaction relations in each dependency path into embedding vectors through the embedding model TransR, and obtaining the embedding vector of each dependency path according to the embedding vectors of the entities and interaction relations;

clustering the embedding vectors of the plurality of dependency paths through a clustering algorithm to obtain a plurality of class clusters and the host behavior category corresponding to each class cluster.

Optionally, performing link prediction on the entity in the local attack chain and the entity in the other local attack chain to obtain a connection probability between the entity in the local attack chain and the entity in the other local attack chain, including:

Inputting the entity in the local attack chain and the entity in other local attack chains into a pre-established link prediction model to obtain the connection probability between the entity in the local attack chain and the entity in other local attack chains;

carrying out link prediction on the entity in each class cluster and the entity in the other class clusters to obtain the connection probability between the entity in each class cluster and the entity in the other class clusters, wherein the link prediction comprises the following steps:

and inputting the entity in each class cluster and the entity in the other class clusters into the link prediction model to obtain the connection probability between the entity in each class cluster and the entity in the other class clusters.

Optionally, the selecting a target dependency path from the dependency paths to which each abnormal edge belongs includes:

according to the following formula:

calculating the score Score(P_i) of the i-th dependency path P_i to which the abnormal edge belongs, and selecting the dependency path with the highest score as the target dependency path;

where ε represents the number of abnormal edges contained in the dependency path P_i, e represents the number of edges contained in the dependency path, v represents the number of entities involved in the dependency path P_i, and τ represents the rarity ranking of the host behavior category of the dependency path P_i.

Optionally, before the obtaining the pre-built security knowledge graph based on the audit log of the host system, the method further includes:

obtaining a host system audit log, and extracting entities and the call relationships between the entities from the host system audit log;

generating a link according to the calling relation between the entities, and constructing an indirect interaction relation between non-adjacent entities in the link;

and taking the entity as a node of the security knowledge graph, and taking the calling relationship between the entities and the indirect interaction relationship between non-adjacent entities in the link as edges of the security knowledge graph.

According to a second aspect of the present application, there is provided an attack chain completion device comprising:

the knowledge graph acquisition module is used for acquiring a security knowledge graph pre-constructed based on a host system audit log, and abnormal edges in the security knowledge graph, wherein the nodes and edges in the security knowledge graph are, respectively, the entities in the host system audit log and the interaction relations between those entities;

the dependence path determining module is used for determining a plurality of dependence paths in the safety knowledge graph according to the safety knowledge graph;

The dependence path clustering module is used for clustering the dependence paths to obtain a plurality of class clusters and host behavior categories corresponding to each class cluster;

the target dependency path determining module is used for selecting a target dependency path from the dependency paths to which each abnormal edge belongs;

the local attack chain determining module is used for merging target dependency paths with the same entity and host behavior category in the target dependency paths corresponding to all abnormal edges to obtain a plurality of local attack chains;

the first connection probability determining module is used for predicting links of the entity in the local attack chain and the entities in other local attack chains before the corresponding time stamp of the local attack chain in each class cluster to obtain the connection probability between the entity in the local attack chain and the entity in other local attack chains;

the intra-cluster link establishment module is used for selecting a first target entity with the largest connection probability and located in the local attack chain and a second target entity located in the other local attack chains, and establishing connection between the second target entity and the first target entity;

the timestamp determining module is used for acquiring the earliest of the timestamps corresponding to the local attack chains in each class cluster, and taking the earliest timestamp as the timestamp corresponding to the class cluster;

the second connection probability determining module is used for carrying out link prediction on the entities in each class cluster and the entities in other class clusters before the timestamp corresponding to the class cluster to obtain the connection probability between the entities in each class cluster and the entities in the other class clusters;

and the inter-cluster link establishment module is used for selecting a third target entity with the maximum connection probability and positioned in the cluster and a fourth target entity positioned in the other clusters, and establishing connection between the fourth target entity and the third target entity.

Optionally, the dependency path determining module is specifically configured to select an entry vertex from the security knowledge graph, where the out-degree of the entry vertex is greater than 0, or when the entry vertex is a file entity or a communication entity, the sum of the in-degree and the out-degree of the entry vertex is less than or equal to a first threshold, or the sum of the in-degree and the out-degree of the entry vertex is less than or equal to a second threshold; the first threshold is less than the second threshold; traverse from the entry vertex according to a graph traversal algorithm based on depth-first search, and determine a plurality of candidate paths in the security knowledge graph, wherein the timestamps of the vertices in a candidate path increase along the direction of the edges; and determine a plurality of dependency paths according to the plurality of candidate paths.

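Optionally, the dependency path determining module is specifically configured to determine a plurality of dependency paths according to the plurality of candidate paths by: taking each candidate path as a dependency path; or, if any one of the candidate paths is a sub-path of other candidate paths and the number of those other candidate paths is smaller than a preset number, taking the candidate paths other than that candidate path as dependency paths.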

Optionally, the dependency path clustering module is specifically configured to map the entity and the interaction relationship in each dependency path into an embedded vector through an embedding model TransR, and obtain the embedded vector of each dependency path according to the embedded vector of the entity and the interaction relationship; clustering the embedded vectors of the multiple dependent paths through a clustering algorithm to obtain multiple class clusters and host behavior categories corresponding to each class cluster.

Optionally, the first connection probability determining module is specifically configured to input, for each local attack chain in each class cluster, an entity in the local attack chain and an entity in other local attack chains into a pre-established link prediction model, so as to obtain a connection probability between the entity in the local attack chain and the entity in other local attack chains;

The second connection probability determining module is specifically configured to input the entity in each class cluster and the entity in the other class cluster into the link prediction model, so as to obtain a connection probability between the entity in each class cluster and the entity in the other class cluster.

Optionally, the target dependent path determining module is specifically configured to:

calculating the score Score(P_i) of the i-th dependency path P_i to which the abnormal edge belongs, and selecting the dependency path with the highest score as the target dependency path;

Optionally, the attack chain completion device further includes:

the host system audit log acquisition module is used for acquiring a host system audit log;

the entity and call relationship extraction module is used for extracting entities and the call relationships between the entities from the host system audit log;

the interaction relation construction module is used for generating a link according to the calling relation between the entities and constructing an indirect interaction relation between non-adjacent entities in the link;

The safety knowledge graph construction module is used for taking the entity as a node of the safety knowledge graph, and taking a calling relationship between the entities and an indirect interaction relationship between non-adjacent entities in the link as edges of the safety knowledge graph.

According to a third aspect of the present application, there is provided an electronic device comprising: a processor for executing a computer program stored in a memory, which when executed by the processor implements the method according to the first aspect.

According to a fourth aspect of the present application, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of the first aspect.

According to a fifth aspect of the present application, there is provided a computer program product for, when run on a computer, causing the computer to perform the method of the first aspect.

Compared with the prior art, the technical scheme provided by the embodiment of the application has the following advantages:

A pre-constructed security knowledge graph is acquired, a plurality of dependency paths in the security knowledge graph are determined, and dependency paths with similar semantics are clustered to obtain a plurality of class clusters and the host behavior category corresponding to each class cluster. Abnormal edges in the security knowledge graph are acquired, a target dependency path is selected from the dependency paths to which each abnormal edge belongs, and, among the target dependency paths corresponding to all the abnormal edges, those that share an entity and a host behavior category are merged, which locates a plurality of remaining local attack chains; the connectivity among the remaining local attack chains can be used to determine whether part of the attack chain is missing. The attack chain completion task is converted into a link prediction task on the security knowledge graph, where link prediction predicts the probability that two entities in the security knowledge graph are connected. Within a class cluster, the connection probabilities of entity pairs drawn from two local attack chains (each local attack chain and a local attack chain preceding its timestamp) are predicted to obtain the entity pair most likely to be connected, and a connection is established between that entity pair within the class cluster. Similarly, the connection probabilities of entity pairs drawn from local attack chains in two class clusters are predicted to obtain the entity pair most likely to be connected, and a connection is established between that entity pair across the class clusters. In this way, a completely restored attack chain can be obtained through link completion, improving the completeness of attack chain restoration.
The completely restored attack chain is displayed to security event operators as the tracing analysis result of a host intrusion event, which improves the interpretability and accuracy of the attack tracing result.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.

In order to more clearly illustrate the embodiments of the application or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.

FIG. 1 is a flow chart of an attack chain completion method in an embodiment of the application;

FIG. 2 is a flowchart for constructing a security knowledge graph in an embodiment of the application;

FIG. 3 is a schematic diagram of building entity relationships in an embodiment of the present application;

FIG. 4 is a schematic diagram of a local attack chain merge in an embodiment of the present application;

FIG. 5 is a schematic diagram of predicting connection probability between entity pairs by a link prediction model according to an embodiment of the present application;

FIG. 6 is a schematic diagram of attack chain completion in an embodiment of the present application;

FIG. 7 is a schematic diagram of an attack chain completion device according to an embodiment of the present application;

Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the application.

Detailed Description

In order that the above objects, features and advantages of the application will be more clearly understood, a further description of the application will be made. It should be noted that, without conflict, the embodiments of the present application and features in the embodiments may be combined with each other.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, but the present application may be practiced otherwise than as described herein; it will be apparent that the embodiments in the specification are only some, but not all, embodiments of the application.

Referring to fig. 1, fig. 1 is a flowchart of an attack chain completion method according to an embodiment of the present application, which may include the following steps:

Step S102, acquiring a security knowledge graph pre-constructed based on a host system audit log, and abnormal edges in the security knowledge graph, wherein the nodes and edges in the security knowledge graph are, respectively, the entities in the host system audit log and the interaction relations between those entities.

The host system audit log records the entities and events of the host at run time. Different audit tools may be used to collect host system audit logs for different operating systems. For example, for a Linux operating system, the host system audit log may be collected using the auditd audit tool; for a Windows operating system, it may be collected using the ETW (Event Tracing for Windows) audit tool. The security knowledge graph can be constructed in advance from the host system audit log. Referring to fig. 2, fig. 2 is a flowchart of constructing a security knowledge graph according to an embodiment of the application, which may include the following steps:

Step S202, obtaining a host system audit log, and extracting entities and the call relationships between the entities from the host system audit log.

Entities and the call relationships between them may be extracted from the host system audit log. Common entities include process entities, file entities, and communication entities, where communication entities cover internal pipes, sockets, IP (Internet Protocol) addresses used for external communication, and the like. For example, if the host system audit log contains "process writes file", the call relationship between the process entity and the file entity, namely "write file", can be extracted.

Step S204, generating a link according to the calling relation between the entities, and constructing an indirect interaction relation between non-adjacent entities in the link.

Because the host system audit log records system events while the host is running, it contains a large number of redundant intermediate events. Focusing directly on the starting entity of a host behavior and the entity it finally interacts with avoids redundant information to a certain extent and makes the behavior semantics easier to understand. On this basis, a link containing the entities and call relationships can be generated from the call relationship between every two entities, and the interaction relations between entities can be constructed from the link. An interaction relation is intuitively a higher-order connection from a process entity to another entity reachable on the same directed path.

Referring to fig. 3, fig. 3 is a schematic diagram of building entity relationships in an embodiment of the present application. In the link "browser - fake file - Trojan process - sensitive file", an interaction relation exists between the "browser" entity and the "sensitive file" entity, whereas under normal host behavior the "browser" entity would not be expected to interact with the "sensitive file" entity.

Step S206, taking the entity as a node of the security knowledge graph, and taking the calling relationship between the entity and the indirect interaction relationship between non-adjacent entities in the link as edges of the security knowledge graph.

The security knowledge-graph may be considered as a collection of triples. The triplet is defined as (head, interact, tail), where "head" represents a head entity, "interact" represents an interaction relationship, including call relationships and indirect interaction relationships between entities, and "tail" represents a tail entity. The interactive relationship represents high-order information or multi-hop connection information, and can provide a certain degree of behavior semantics.
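Steps S202 to S206 can be illustrated with a minimal Python sketch. It assumes a link has already been parsed into an ordered list of entities plus the call relationship between each adjacent pair; the function name `build_triples`, the relation labels "write", "execute" and "read", and the label "interact" for indirect edges are hypothetical and are not taken from the application.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head, interact, tail)


def build_triples(link_entities: List[str], link_relations: List[str]) -> List[Triple]:
    """Turn one link (entities joined by call relationships) into graph triples."""
    if len(link_relations) != len(link_entities) - 1:
        raise ValueError("a link with n entities needs n - 1 call relationships")
    triples: List[Triple] = []
    # Direct call relationships between adjacent entities in the link.
    for i, relation in enumerate(link_relations):
        triples.append((link_entities[i], relation, link_entities[i + 1]))
    # Indirect interaction relationships between non-adjacent entities
    # (hypothetical label "interact"; the application does not name it).
    for i in range(len(link_entities)):
        for j in range(i + 2, len(link_entities)):
            triples.append((link_entities[i], "interact", link_entities[j]))
    return triples


# The link from FIG. 3, with made-up call relationship labels.
link = ["browser", "fake file", "trojan process", "sensitive file"]
relations = ["write", "execute", "read"]
triples = build_triples(link, relations)
```

In this sketch every non-adjacent pair on the link receives an indirect edge; an implementation that follows the description more closely might restrict the head of an indirect edge to process entities.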

An abnormal edge in the security knowledge graph refers to a recorded malicious event attributable to the attack. The goal of attack tracing is to find a complete link in the security knowledge graph that contains all the identified abnormal edges, to locate the entry point where the attack was initiated, and to identify the subsequent impact of the attack on the system. The ability to detect abnormal edges affects the attack tracing result: the higher the detection accuracy, the better the result. The embodiment of the application does not limit how abnormal edges are determined. For example, they may be labeled manually by security event operators, or the connection probability between entities may be predicted by a link prediction model, where a smaller connection probability means the edge between the two entities is more likely to be abnormal, and an edge between two entities whose connection probability is smaller than a preset connection probability threshold is determined to be an abnormal edge.
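The threshold-based option can be sketched as follows, assuming the link prediction model has already produced a connection probability for every existing edge. The function name `find_abnormal_edges`, the example probabilities, and the threshold value 0.1 are illustrative assumptions only.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (head entity, tail entity)


def find_abnormal_edges(connection_prob: Dict[Edge, float], threshold: float) -> List[Edge]:
    """Return the edges whose predicted connection probability is below the threshold."""
    return [edge for edge, prob in connection_prob.items() if prob < threshold]


# Made-up probabilities standing in for the output of a link prediction model.
predicted = {
    ("browser", "cache file"): 0.93,
    ("browser", "sensitive file"): 0.04,
    ("trojan process", "sensitive file"): 0.08,
}
abnormal = find_abnormal_edges(predicted, threshold=0.1)
```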

Step S104, determining a plurality of dependent paths in the safety knowledge graph according to the safety knowledge graph, and clustering the plurality of dependent paths to obtain a plurality of class clusters and host behavior categories corresponding to each class cluster.

A dependency path is a path that may exist in the security knowledge graph and can be determined from the structure of the graph; for example, a plurality of dependency paths can be determined through a graph traversal algorithm. If an abnormal edge belongs to a dependency path, that dependency path may be referred to as an abnormal dependency path, i.e., a path on which the abnormal edge depends.

In some embodiments, multiple dependent paths in the secure knowledge-graph may be determined by:

An entry vertex is selected from the security knowledge graph. The entry vertex cannot be a termination vertex of the security knowledge graph, i.e., the out-degree of the entry vertex is greater than 0. Alternatively, when the entry vertex is a file entity or a communication entity, a vertex whose in-degree and out-degree sum is too large is a central vertex, and a central vertex cannot be directly selected as the entry vertex, so the sum of the in-degree and out-degree of the entry vertex is less than or equal to the first threshold. Alternatively, when the entry vertex is a process entity, a file entity, or a communication entity, a vertex whose in-degree and out-degree sum is too large is a dependency-explosion vertex, which cannot be directly used as the entry vertex, so the sum of the in-degree and out-degree of the entry vertex is less than or equal to the second threshold. The first threshold is less than the second threshold.
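The entry vertex conditions above can be sketched as a single predicate. This sketch reads the three conditions as jointly required, which is one possible interpretation of the text; the `Vertex` class, the function name `is_entry_vertex`, and the threshold values 10 and 50 are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Vertex:
    name: str
    kind: str  # "process", "file", or "communication"
    in_degree: int
    out_degree: int


def is_entry_vertex(v: Vertex, first_threshold: int, second_threshold: int) -> bool:
    """Check the entry vertex conditions, assuming all of them must hold."""
    if first_threshold >= second_threshold:
        raise ValueError("the first threshold must be less than the second threshold")
    # A termination vertex (out-degree 0) cannot start a path.
    if v.out_degree <= 0:
        return False
    degree_sum = v.in_degree + v.out_degree
    # File and communication entities with a large degree sum are central vertices.
    if v.kind in ("file", "communication") and degree_sum > first_threshold:
        return False
    # Any entity with a very large degree sum is a dependency-explosion vertex.
    return degree_sum <= second_threshold


# Illustrative thresholds; the application does not give concrete values.
FIRST, SECOND = 10, 50
shell = Vertex("bash", "process", in_degree=3, out_degree=5)
hub_file = Vertex("/etc/passwd", "file", in_degree=40, out_degree=2)
busy_process = Vertex("svchost", "process", in_degree=40, out_degree=30)
leaf_file = Vertex("report.txt", "file", in_degree=1, out_degree=0)
```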

After the entry vertices are selected, traversal may start from them according to a graph traversal algorithm based on depth-first search to determine a plurality of candidate paths in the security knowledge graph. The traversal must ensure that the timestamps recorded on the edges increase along the direction of the edges. Therefore, when the traversal reaches a vertex, the timestamp of that vertex is recorded, and all edges whose timestamps are increasing relative to it are added to the traversal result; each traversal step also records the previous-hop vertex of the current vertex and ensures that the timestamp of the edge from the previous-hop vertex to the current vertex is smaller than the minimum of the timestamps of all the edges of the current vertex. The backtracking condition is reaching a termination vertex or an already visited vertex, and the termination condition is that no new edges are added. The timestamps of the vertices in each resulting candidate path increase along the direction of the edges.
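A simplified Python sketch of the time-ordered depth-first traversal is given below. It assumes the graph is stored as an adjacency dictionary mapping each vertex to (successor, edge timestamp) pairs, and it treats "already visited" as "already on the current path"; both choices, as well as the name `candidate_paths`, are assumptions made for illustration rather than details from the application.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # vertex -> [(successor, edge timestamp)]


def candidate_paths(graph: Graph, entry: str) -> List[List[str]]:
    """Enumerate maximal paths from `entry` whose edge timestamps strictly increase."""
    paths: List[List[str]] = []

    def dfs(vertex: str, last_ts: float, path: List[str]) -> None:
        extended = False
        for successor, ts in graph.get(vertex, []):
            # Follow only edges later than the edge used to arrive here,
            # and never revisit a vertex that is already on the current path.
            if ts > last_ts and successor not in path:
                extended = True
                dfs(successor, ts, path + [successor])
        # Backtrack: record the path once it cannot be extended any further.
        if not extended and len(path) > 1:
            paths.append(path)

    dfs(entry, float("-inf"), [entry])
    return paths


# Toy graph with made-up timestamps; the edge to "log file" predates the
# edge that reaches "trojan process", so it is not followed.
graph: Graph = {
    "browser": [("fake file", 1.0)],
    "fake file": [("trojan process", 2.0)],
    "trojan process": [("sensitive file", 3.0), ("log file", 1.5)],
}
paths = candidate_paths(graph, "browser")
```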

Thereafter, a plurality of dependency paths may be determined from the plurality of candidate paths. Optionally, each candidate path may be taken directly as a dependency path. Alternatively, it may be determined whether inclusion relations exist among the candidate paths: if a candidate path is contained in N other candidate paths, i.e., it is a sub-path of N other candidate paths, where N is a positive integer, N may be used as the subgraph contribution degree of that candidate path. A path that occurs many times within different dependency paths is more likely to be an underlying host behavior. Therefore, if any candidate path is a sub-path of other candidate paths and the number of those other candidate paths is smaller than the preset number, that candidate path is filtered out, i.e., the remaining candidate paths are taken as dependency paths.
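The sub-path filtering rule can be sketched as follows, assuming a sub-path means a strictly shorter contiguous run of vertices inside another candidate path. The function names `is_subpath` and `filter_candidates` and the example paths are hypothetical.

```python
from typing import List

Path = List[str]


def is_subpath(short: Path, long: Path) -> bool:
    """True if `short` is a strictly shorter contiguous run of vertices in `long`."""
    n, m = len(short), len(long)
    return n < m and any(long[i:i + n] == short for i in range(m - n + 1))


def filter_candidates(candidates: List[Path], preset_number: int) -> List[Path]:
    """Drop candidate paths whose subgraph contribution degree N satisfies 0 < N < preset_number."""
    kept: List[Path] = []
    for idx, path in enumerate(candidates):
        # N = number of other candidate paths that contain this path.
        contribution = sum(
            1 for j, other in enumerate(candidates) if j != idx and is_subpath(path, other)
        )
        if 0 < contribution < preset_number:
            continue  # filtered out
        kept.append(path)
    return kept


candidates = [["a", "b"], ["a", "b", "c"], ["x", "y"]]
dependency_paths = filter_candidates(candidates, preset_number=2)
```

Here `["a", "b"]` is contained in exactly one other candidate path, which is below the illustrative preset number of 2, so it is removed while the other two paths are kept.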

The security knowledge graph records the sequences of system events generated while the host is running, which correspond to different host behaviors. The set of dependency paths is equivalent to the set of host behaviors, and the more the dependency paths diverge, the more the corresponding behaviors diverge. Because the number of dependency paths is large, the dependency paths can be semantically aggregated to abstract host behaviors and clustered into a limited number of groups, which reduces the cost of manual investigation. The clustering results in the same group represent a series of similar host behaviors.

In some embodiments, an embedding model TransR may be pre-trained, through which the entities and interaction relations in each dependency path are mapped into embedded vectors. The embedded vector of each dependent path is then obtained from the embedded vectors of its entities and interaction relations. A dependency path includes a plurality of interrelated triplets, and the embedded vector of each triplet has a fixed dimension of 2×d+l, i.e., 2 times the dimension d of the entity vector plus the dimension l of the relation vector. The embedded vector of the dependent path may be calculated by weighted-sum pooling, which uses the embedded vectors of all triplets and thereby makes full use of the system event information in the dependency path. That is, the embedded vector of the dependent path is the weighted sum of the embedded vectors of its triplets, where the weight is the weighting coefficient of the edge e in the summation and can be calculated using a TF-IDF-based system event weight calculation method. The weights reflect that different system events have different impact and importance in host behavior.
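The weighted-sum pooling can be illustrated with a short sketch. It assumes that the entity and relation vectors have already been produced by a trained embedding model (no TransR training is shown), that a triplet vector is the concatenation head, relation, tail (the ordering is an assumption), and that `edge_weight` already holds the per-edge weights; all function and parameter names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def triple_vector(
    triple: Triple,
    entity_vec: Dict[str, Sequence[float]],
    relation_vec: Dict[str, Sequence[float]],
) -> List[float]:
    """Concatenate head, relation and tail vectors: dimension 2*d + l."""
    head, rel, tail = triple
    return list(entity_vec[head]) + list(relation_vec[rel]) + list(entity_vec[tail])


def path_embedding(
    path: List[Triple],
    entity_vec: Dict[str, Sequence[float]],
    relation_vec: Dict[str, Sequence[float]],
    edge_weight: Dict[Triple, float],
) -> List[float]:
    """Weighted sum of the triplet vectors of all edges in the path."""
    pooled = None
    for triple in path:
        vec = triple_vector(triple, entity_vec, relation_vec)
        w = edge_weight[triple]
        if pooled is None:
            pooled = [0.0] * len(vec)
        pooled = [p + w * x for p, x in zip(pooled, vec)]
    return pooled if pooled is not None else []
```

Because every triplet contributes to the sum, no system event in the path is discarded; the weights only change how strongly each event influences the result.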

In the embodiment of the application, the system event weight calculation method based on TF-IDF is as follows:

Using the timestamps of the system events, timestamp windows of equal length (time ranges) serve as the documents in TF-IDF, and the words within a document correspond to the system events recorded within that timestamp window. Thus, the number of timestamp windows corresponds to the number of documents in the TF-IDF weighting algorithm, and the range of values covered by the timestamp windows reflects the collection duration of the audit log. When the inverse document frequency is calculated, a value is added to the denominator to avoid the error condition in which the denominator is 0. Finally, the inverse document frequency is increased by one, so that common system events are not given too little weight. The weight of one system event is calculated as follows:

weight of system event = TF x IDF.
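A minimal sketch of such a window-based TF-IDF weighting is given below. The detailed formula is not reproduced in this text, so the sketch assumes a common smoothed form, IDF = log(N / (1 + df)) + 1, which matches the description (a value added to the denominator, and one added to the result). The term frequency is assumed to be the share of an event within its window, only non-empty windows are counted as documents, and the event encoding and function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Event = Tuple[float, str]  # (timestamp, event key); an assumed encoding


def split_into_windows(events: List[Event], window_length: float) -> Dict[int, List[str]]:
    """Group event keys into equal-length timestamp windows (the 'documents')."""
    windows: Dict[int, List[str]] = {}
    for ts, key in events:
        windows.setdefault(int(ts // window_length), []).append(key)
    return windows


def tfidf_weights(events: List[Event], window_length: float) -> Dict[Tuple[int, str], float]:
    """Weight of each (window, event key) pair as TF x IDF."""
    windows = split_into_windows(events, window_length)
    n_windows = len(windows)  # only non-empty windows are counted (assumption)
    doc_freq: Counter = Counter()
    for keys in windows.values():
        doc_freq.update(set(keys))  # number of windows in which each event appears
    weights: Dict[Tuple[int, str], float] = {}
    for window_id, keys in windows.items():
        counts = Counter(keys)
        for key, count in counts.items():
            tf = count / len(keys)
            # "+ 1" in the denominator avoids division by zero; the trailing
            # "+ 1" keeps common events from receiving too little weight.
            idf = math.log(n_windows / (1 + doc_freq[key])) + 1
            weights[(window_id, key)] = tf * idf
    return weights
```

In this form, an event that appears in every window still receives a positive weight, while an event confined to few windows is weighted more heavily.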

After the embedding vectors of the dependent paths are calculated from the weighted system events in the security knowledge graph, the embedding vectors are clustered by a clustering algorithm to obtain a plurality of class clusters and the host behavior category corresponding to each class cluster. In the embodiment of the application, the dependent paths can be clustered using an agglomerative hierarchical clustering algorithm. An abstraction of host behavior can be understood as a set of approximately similar host behaviors, and the clustering algorithm divides all dependent paths into different host behavior categories. Because the embedding vector of each dependent path is obtained by sum pooling, the details of all its system events are preserved. The agglomerative hierarchical clustering algorithm calculates the Euclidean distance between cluster centers to measure the difference between clusters. The application also designs an adaptive method for determining the number of clusters: since the number of dependent paths differs between scenarios and more dependent paths generally correspond to richer host behavior categories, the square root of the number of dependent paths can be preset as the number of clusters, with a minimum of 10 clusters.
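The adaptive cluster count and the clustering step can be sketched as below. This is a naive, standard-library-only illustration of agglomerative clustering with Euclidean distance between cluster centers; rounding the square root and capping the count at the number of paths are assumptions added for the example, and a production system would more likely rely on an existing clustering library.

```python
import math
from typing import List

Vector = List[float]


def adaptive_cluster_count(num_paths: int, minimum: int = 10) -> int:
    """Square root of the path count, floored at `minimum`, capped at the path count."""
    return min(num_paths, max(minimum, round(math.sqrt(num_paths))))


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def agglomerative_clusters(vectors: List[Vector], k: int) -> List[List[int]]:
    """Merge the two clusters with the closest centers until k clusters remain.

    Returns clusters as lists of indices into `vectors`.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        centers = [centroid([vectors[i] for i in c]) for c in clusters]
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(centers[a], centers[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

For example, 400 dependent paths would give 20 clusters under this sketch, while 25 dependent paths would fall back to the minimum of 10.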

The dependency paths are represented as embedded vectors by the weighted summation method, and the agglomerative hierarchical clustering algorithm clusters the set of dependency paths into different host behavior categories, thereby providing semantic information at the host behavior level for the attack chain completion method.

Step S106, selecting a target dependency path from the dependency paths to which each abnormal edge belongs, and merging target dependency paths with the same entity and host behavior category in the target dependency paths corresponding to all the abnormal edges to obtain a plurality of local attack chains.

Each abnormal edge may belong to multiple dependent paths, and these dependent paths record overlapping graph information. If all the dependent paths to which an abnormal edge belongs were used as local attack chains, a large amount of redundant information would inevitably be introduced. Moreover, a dependency path describes one or more host behaviors, whereas an abnormal edge may actually originate from only one host behavior. Thus, generating the local attack chains amounts to choosing the best-matching dependent path, i.e. the target dependent path, for each marked abnormal edge in the graph. Finally, the target dependent paths selected for all abnormal edges are collected to generate the set of local attack chains.

It will be appreciated that a dependency path with fewer vertices, fewer edges, a rarer host behavior category, and a higher proportion of abnormal edges is more valuable to choose as a local attack chain. In some embodiments, the following formula may be used:

where ε represents the number of abnormal edges contained in the dependency path P_i and is an index reflecting the threat level. By counting the number of abnormal edges in each dependent path, the dependent paths that pose a larger threat and need to be processed preferentially can be found. For example, a dependency path in which a malicious Trojan process exists will have a large number of abnormal edges associated with the Trojan process. Preferring dependency paths that encompass as many abnormal edges as possible is advantageous in reducing the number of local attack chains generated.

The number of edges contained in the dependency path P_i serves as a supplementary index of the readability of the security knowledge graph. In the presence of dependency explosion, the fewer the edges, the less likely excessive redundant information is introduced, and the more clearly the dependency relationships among system events are presented.

v represents the number of entities contained in the dependency path P_i and is the most important feature. The smaller the number of entities in the dependency path, the more concise and readable the resulting local attack chain, and the more likely it describes a single host behavior rather than a series of host behaviors.

τ represents the rarity ranking of the host behavior category of the dependency path P_i. The dependency paths are labeled with abstract host behavior categories, and by counting the number of dependent paths in each host behavior category, the categories can be ordered from fewest to most. A host behavior category containing few dependent paths corresponds to low-frequency operations of the host, and threat events also exhibit low-frequency characteristics. Meanwhile, a small number of dependent paths in a host behavior category also indicates that the category differs markedly from the other categories and deserves more attention during tracing.
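The selection step can be illustrated with the sketch below. The selection formula itself is not reproduced in this text, so `placeholder_score` is purely a hypothetical stand-in that only respects the stated directions (more abnormal edges, fewer edges, fewer entities, rarer category); it should not be read as the formula of the application. The `PathStats` record and all names are likewise assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PathStats:
    path_id: str
    abnormal_edges: int  # epsilon: number of abnormal edges in the path
    edges: int           # total number of edges in the path
    entities: int        # v: number of entities in the path
    rarity_rank: int     # tau: 1 = rarest host behavior category


def placeholder_score(p: PathStats) -> float:
    """Hypothetical score: grows with abnormal edges, shrinks with size and rank."""
    return p.abnormal_edges / (p.edges * p.entities * p.rarity_rank)


def select_target_paths(paths_by_edge: Dict[str, List[PathStats]]) -> Dict[str, PathStats]:
    """Pick, for each abnormal edge, the best-scoring dependent path it belongs to."""
    return {
        edge: max(paths, key=placeholder_score)
        for edge, paths in paths_by_edge.items()
        if paths
    }
```

Any monotone combination of the four indices could be substituted for `placeholder_score` without changing the surrounding selection logic.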

After the most appropriate dependent path has been selected for each abnormal edge, the preliminary attack tracing flow can generate a set of local attack chains. Some local attack chains may share the same entities or even the same relations. Local attack chains that share graph elements do not constitute a broken chain. Considering that the same entity may appear in different local attack chains and contribute to different host behaviors, local attack chains that share an entity and belong to the same host behavior category are merged first: the entity sets and relation sets of the two local attack chains are merged, the host behavior category label is retained, and a new local attack chain is generated to replace the original two. After all such merges, the remaining local attack chains are not connected to one another, and the missing connectivity among them can be located as the places where the attack chain is missing.

Referring to fig. 4, fig. 4 is a schematic diagram of local attack chain merging in an embodiment of the present application. Target dependent path 1 and target dependent path 2 belong to the same host behavior category and share the entity 'configuration file A', so the two target dependent paths are merged, and the merged local attack chain contains both target dependent path 1 and target dependent path 2.
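The merging rule can be sketched as follows, assuming each local attack chain is held as a host behavior category label plus a set of entities and a set of relation triplets. The `LocalChain` record and the function name are hypothetical; the sketch merges repeatedly so that chains connected through a shared entity within the same category end up in one chain.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class LocalChain:
    category: str                          # host behavior category label
    entities: Set[str]
    relations: Set[Tuple[str, str, str]]   # (head, relation, tail)


def merge_local_chains(chains: List[LocalChain]) -> List[LocalChain]:
    """Merge chains that share an entity and have the same host behavior category."""
    merged: List[LocalChain] = []
    for chain in chains:
        current = LocalChain(chain.category, set(chain.entities), set(chain.relations))
        remaining = []
        for other in merged:
            if other.category == current.category and other.entities & current.entities:
                # Union the entity and relation sets; the category label is retained.
                current.entities |= other.entities
                current.relations |= other.relations
            else:
                remaining.append(other)
        merged = remaining + [current]
    return merged
```

Chains that share an entity but carry different host behavior categories are deliberately left separate, since the same entity may contribute to different host behaviors.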

Step S108, for each local attack chain in each class cluster, carrying out link prediction on the entity in the local attack chain and the entity in other local attack chains before the timestamp corresponding to the local attack chain in the class cluster, and obtaining the connection probability between the entity in the local attack chain and the entity in other local attack chains.

Link completion refers to selecting a head entity and a tail entity from the entity sets of two local attack chains, respectively, and predicting whether an edge exists between the entity pair. The same class cluster (i.e., the group of the same host behavior category) contains one or more local attack chains with similar semantics. Link completion within a class cluster can be performed according to the time order of the local attack chains in the group. Since the abnormal edges in a local attack chain cover a time range, the abnormal edge with the earliest timestamp in each local attack chain is selected as the representative of that local attack chain. The local attack chains in the same class cluster are sorted by timestamp; starting from the second local attack chain, link prediction is performed against each preceding local attack chain, and the entity pair with the highest connection probability is selected for link completion.

Because the completed links represent "interactions" in the graph, the head entity of the entity pair must be a process entity. The two entity sets therefore need to be interchanged, so that the process entity sets of both local attack chains are traversed in turn and the direction of the missing link is determined. If there is an entity pair between two local attack chains that has a corresponding edge in the security knowledge graph, but the edge does not belong to either of the two local attack chains, the "interaction" relationship with the process entity of the pair as the head entity can be added directly, without link prediction. To accelerate the link completion process, the link prediction results for each head entity and tail entity pair may be cached to avoid the overhead of repeated computation.

Optionally, the entity in the local attack chain and the entity in other local attack chains before the timestamp corresponding to the local attack chain may be input into a pre-established link prediction model to obtain the connection probability between the entity in the local attack chain and the entity in other local attack chains.

Step S110, selecting a first target entity in the local attack chain and a second target entity in other local attack chains with the maximum connection probability, and establishing connection between the second target entity and the first target entity.
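Steps S108 and S110 can be sketched as follows. The sketch assumes that each local attack chain is given as its earliest abnormal-edge timestamp plus a mapping from entity identifier to entity type, and that `predict` is a caller-supplied stand-in for the link prediction model; these encodings and names are hypothetical. The shortcut of directly adding an edge that already exists in the security knowledge graph is omitted for brevity.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (earliest abnormal-edge timestamp, {entity id: entity type}); assumed encoding
Chain = Tuple[float, Dict[str, str]]
Link = Tuple[str, str, float]  # (head entity, tail entity, connection probability)


def complete_links_in_cluster(
    chains: List[Chain], predict: Callable[[str, str], float]
) -> List[Link]:
    """For each chain after the first (in time order), link it to a preceding chain
    through the entity pair with the highest predicted connection probability."""
    cache: Dict[Tuple[str, str], float] = {}

    def prob(head: str, tail: str) -> float:
        # Cache predictions so that each entity pair is evaluated only once.
        if (head, tail) not in cache:
            cache[(head, tail)] = predict(head, tail)
        return cache[(head, tail)]

    ordered = sorted(chains, key=lambda c: c[0])
    links: List[Link] = []
    for i in range(1, len(ordered)):
        _, current = ordered[i]
        best: Optional[Link] = None
        for _, earlier in ordered[:i]:
            # Swap the two entity sets so both link directions are tried;
            # the head entity must be a process entity.
            for heads, tails in ((earlier, current), (current, earlier)):
                for head, head_type in heads.items():
                    if head_type != "process":
                        continue
                    for tail in tails:
                        if tail == head:
                            continue
                        p = prob(head, tail)
                        if best is None or p > best[2]:
                            best = (head, tail, p)
        if best is not None:
            links.append(best)
    return links
```

Each returned tuple names the head entity, the tail entity and the predicted probability of the link that would be added for one local attack chain.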

Step S112, obtaining the earliest timestamp among the timestamps corresponding to the local attack chains in each class cluster, and taking the earliest timestamp as the timestamp corresponding to the class cluster.

Similarly, since multiple local attack chains in each class cluster will cover a range of time, the local attack chain with the earliest timestamp in each class cluster is chosen as the representative of that class cluster.

Step S114, carrying out link prediction on the entities in each class cluster and the entities in other class clusters before the timestamp corresponding to the class cluster to obtain the connection probability between the entities in each class cluster and the entities in the other class clusters.

Similarly, the entities in each class cluster and the entities in other class clusters may be input into the link prediction model to obtain the connection probability between the entities in each class cluster and the entities in other class clusters.

Referring to fig. 5, fig. 5 is a schematic diagram of predicting the connection probability between a pair of entities through a link prediction model according to an embodiment of the present application. The two entities are input into the link prediction model to obtain the corresponding connection probability. The larger the connection probability, the more likely a connection exists between the two entities; the smaller the connection probability, the less likely a connection exists between them.

Step S116, selecting a third target entity located in the class cluster and a fourth target entity located in the other class clusters that have the largest connection probability, and establishing a connection between the fourth target entity and the third target entity.

Referring to fig. 6, fig. 6 is a schematic diagram of attack chain completion in an embodiment of the present application. In the host behavior category "2" there are 4 local attack chains. The last local attack chain performs link prediction with the first, second and third local attack chains respectively; the link prediction result reaches its maximum value for an entity pair with the second local attack chain, so the last local attack chain is associated with that preceding local attack chain through the entity pair.

Link completion between class clusters (i.e., between host behavior categories) is similar to link completion within a class cluster (i.e., within a host behavior category). The earliest timestamp among the timestamps corresponding to the local attack chains in a class cluster is used as the timestamp corresponding to that class cluster. The class clusters are sorted by their corresponding timestamps; starting from the second class cluster, link prediction is performed against the local attack chains in the preceding class clusters, and the entity pair with the largest connection probability is selected for link completion. In fig. 6, attack chain completion among the class clusters starts from the local attack chain in the earliest host behavior category "3", points to the local attack chain in host behavior category "2", and finally points to the local attack chain in host behavior category "1".
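The inter-cluster procedure of steps S112 to S116 can be sketched in the same spirit. The sketch assumes each class cluster is a list of local attack chains, each given as an earliest timestamp plus a list of entity identifiers, and that `predict` stands in for the link prediction model; these encodings and names are hypothetical. For brevity, only the direction from the earlier cluster to the later cluster is evaluated, and the process-entity constraint on the head entity is not repeated here.

```python
from typing import Callable, Dict, List, Optional, Tuple

Chain = Tuple[float, List[str]]  # (earliest timestamp, entity ids); assumed encoding
# (earlier cluster, fourth entity, current cluster, third entity, probability)
ClusterLink = Tuple[str, str, str, str, float]


def cluster_timestamp(chains: List[Chain]) -> float:
    """Earliest timestamp among the local attack chains of a class cluster."""
    return min(ts for ts, _ in chains)


def complete_links_between_clusters(
    clusters: Dict[str, List[Chain]], predict: Callable[[str, str], float]
) -> List[ClusterLink]:
    """Link each class cluster after the first (in time order) to a preceding one
    through the entity pair with the largest predicted connection probability."""
    ordered = sorted(clusters, key=lambda name: cluster_timestamp(clusters[name]))
    links: List[ClusterLink] = []
    for i in range(1, len(ordered)):
        current_name = ordered[i]
        current_entities = [e for _, ents in clusters[current_name] for e in ents]
        best: Optional[ClusterLink] = None
        for earlier_name in ordered[:i]:
            for _, ents in clusters[earlier_name]:
                for fourth in ents:
                    for third in current_entities:
                        p = predict(fourth, third)
                        if best is None or p > best[4]:
                            best = (earlier_name, fourth, current_name, third, p)
        if best is not None:
            links.append(best)
    return links
```

In the test scenario used to check this sketch, the clusters are ordered "3", "2", "1" by timestamp, which mirrors the direction of completion described for fig. 6.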

According to the attack chain completion method, a pre-constructed security knowledge graph is acquired, a plurality of dependent paths in the security knowledge graph are determined, and dependent paths with similar semantics are clustered to obtain a plurality of class clusters and the host behavior category corresponding to each class cluster. Abnormal edges in the security knowledge graph are acquired, a target dependent path is selected from the dependent paths to which each abnormal edge belongs, and the target dependent paths that share an entity and a host behavior category among the target dependent paths corresponding to all abnormal edges are merged, yielding a plurality of remaining local attack chains; the missing connectivity among these remaining local attack chains can be identified as the places where the attack chain is missing. The task of attack chain completion is converted into a link prediction task on the security knowledge graph, where link prediction predicts the probability that two entities in the security knowledge graph are connected. Within a class cluster, the connection probabilities of entity pairs drawn from two local attack chains (each local attack chain and a local attack chain before its corresponding timestamp) are predicted to obtain the entity pair most likely to be connected, and a connection between that entity pair is established within the class cluster. Similarly, the connection probabilities of entity pairs drawn from local attack chains in two class clusters are predicted to obtain the entity pair most likely to be connected, and a connection between that entity pair is established between the class clusters. In this way, a completely restored attack chain can be obtained through link completion, improving the integrity of the restored attack chain.

The completely restored attack chain is displayed to security event operators as the tracing analysis result of a host intrusion event, which improves the interpretability and accuracy of the attack tracing result.

Corresponding to the above method embodiment, the embodiment of the present application further provides an attack chain completion device, referring to fig. 7, the attack chain completion device 700 includes:

the knowledge graph acquisition module 702 is configured to acquire a security knowledge graph pre-constructed based on an audit log of the host system, and abnormal edges in the security knowledge graph, where the nodes in the security knowledge graph are entities in the audit log of the host system and the edges are interaction relations between the entities;

a dependent path determination module 704, configured to determine a plurality of dependent paths in the security knowledge graph according to the security knowledge graph;

a dependent path clustering module 706, configured to cluster the multiple dependent paths to obtain multiple class clusters and host behavior classes corresponding to each class cluster;

a target dependency path determining module 708, configured to select a target dependency path from the dependency paths to which each abnormal edge belongs;

the local attack chain determining module 710 is configured to combine target dependency paths with the same entity and host behavior category in target dependency paths corresponding to all abnormal edges to obtain a plurality of local attack chains;

a first connection probability determining module 712, configured to perform, for each local attack chain in each class cluster, link prediction on an entity in the local attack chain and an entity in another local attack chain before a timestamp corresponding to the local attack chain in the class cluster, to obtain a connection probability between the entity in the local attack chain and the entity in another local attack chain;

an intra-cluster link establishment module 714, configured to select a first target entity located in the local attack chain and a second target entity located in other local attack chains with the largest connection probability, and establish a connection between the second target entity and the first target entity;

the timestamp determining module 716 is configured to obtain the earliest timestamp among the timestamps corresponding to the local attack chains in each class cluster, and use the earliest timestamp as the timestamp corresponding to the class cluster;

a second connection probability determining module 718, configured to perform link prediction on the entities in each class cluster and the entities in other class clusters before the timestamp corresponding to the class cluster, so as to obtain a connection probability between the entities in each class cluster and the entities in other class clusters;

the inter-cluster link establishment module 720 is configured to select a third target entity located in the cluster and a fourth target entity located in another cluster, where the third target entity and the fourth target entity have the largest connection probability, and establish a connection between the fourth target entity and the third target entity.

Optionally, the dependent path determination module 704 is specifically configured to: select an entrance vertex from the security knowledge graph, where the out-degree of the entrance vertex is greater than 0, or, when the entrance vertex is a file entity or a communication entity, the sum of the in-degree and the out-degree of the entrance vertex is less than or equal to a first threshold, or the sum of the in-degree and the out-degree of the entrance vertex is less than or equal to a second threshold, the first threshold being less than the second threshold; traverse from the entrance vertices according to a graph traversal algorithm based on depth-first search to determine a plurality of candidate paths in the security knowledge graph, where the timestamps of the vertices in each candidate path increase along the direction of the edges; and determine a plurality of dependent paths based on the plurality of candidate paths.
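The depth-first traversal with increasing timestamps can be sketched as follows. For simplicity the sketch attaches a timestamp to each edge, does not revisit a vertex within one path, and records only paths that cannot be extended further; these choices, the adjacency-list encoding and the function name are assumptions, and the selection of entrance vertices by degree thresholds is not shown.

```python
from typing import Dict, List, Tuple

# Assumed encoding: vertex -> list of (neighbor, edge timestamp)
Graph = Dict[str, List[Tuple[str, float]]]


def candidate_paths(graph: Graph, entrance: str) -> List[List[str]]:
    """Depth-first search from `entrance`, following only edges whose timestamps
    increase along the path; returns the paths that cannot be extended further."""
    paths: List[List[str]] = []

    def dfs(vertex: str, last_ts: float, path: List[str]) -> None:
        extended = False
        for nxt, ts in graph.get(vertex, []):
            if ts > last_ts and nxt not in path:
                extended = True
                dfs(nxt, ts, path + [nxt])
        if not extended and len(path) > 1:
            paths.append(path)

    dfs(entrance, float("-inf"), [entrance])
    return paths
```

The timestamp check is what keeps each candidate path consistent with the causal order of the recorded system events.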

Optionally, the dependent path determining module 704 is specifically configured to determine a plurality of dependent paths according to the plurality of candidate paths by:

Optionally, the dependency path clustering module 706 is specifically configured to map the entity and the interaction relationship in each dependency path to an embedded vector through an embedding model TransR, and obtain the embedded vector of each dependency path according to the embedded vector of the entity and the interaction relationship; clustering the embedded vectors of the multiple dependent paths through a clustering algorithm to obtain multiple class clusters and host behavior categories corresponding to each class cluster.

Optionally, the first connection probability determining module 712 is specifically configured to input, for each local attack chain in each class cluster, an entity in the local attack chain and an entity in other local attack chains into a pre-established link prediction model, so as to obtain a connection probability between the entity in the local attack chain and the entity in other local attack chains;

The second connection probability determining module 718 is specifically configured to input the entities in each class cluster and the entities in other class clusters into the link prediction model, so as to obtain a connection probability between the entities in each class cluster and the entities in other class clusters.

Optionally, the attack chain completion device 700 further includes:

the interactive relation construction module is used for generating a link according to the calling relation between the entities and constructing an indirect interactive relation between non-adjacent entities in the link;

the security knowledge graph construction module is used for taking the entity as a node of the security knowledge graph, and taking the calling relationship between the entity and the indirect interaction relationship between non-adjacent entities in the link as edges of the security knowledge graph.

Specific details of each module or unit in the above apparatus have been described in the corresponding method, and thus are not described herein.

It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the application. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.

In an exemplary embodiment of the present application, there is also provided an electronic apparatus including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to perform the attack chain completion method described above in this example embodiment.

Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the application. It should be noted that, the electronic device 800 shown in fig. 8 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.

As shown in fig. 8, the electronic device 800 includes a Central Processing Unit (CPU) 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for system operation are also stored. The central processing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

The following components are connected to the I/O interface 805: an input portion 806 including a keyboard, mouse, etc.; an output portion 807 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk or the like; and a communication section 809 including a network interface card such as a Local Area Network (LAN) card, modem, or the like. The communication section 809 performs communication processing via a network such as the internet. The drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as needed so that a computer program read out therefrom is installed into the storage section 808 as needed.

In particular, according to embodiments of the present application, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809, and/or installed from the removable medium 811. When executed by the central processing unit 801, the computer program performs the various functions defined in the apparatus of the present application.

In an embodiment of the present application, there is also provided a computer-readable storage medium having stored thereon a computer program that, when executed by a processor, implements the attack chain completion method described above.

The computer readable storage medium according to the present application may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory, a read-only memory, an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, radio frequency, and the like, or any suitable combination of the foregoing.

In an embodiment of the present application, there is also provided a computer program product, which when run on a computer, causes the computer to execute the attack chain completion method described above.

It should be noted that in this document, relational terms such as "first" and "second" and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The foregoing is only a specific embodiment of the application to enable those skilled in the art to understand or practice the application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown and described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. An attack chain completion method, comprising:

aiming at each local attack chain in each class cluster, carrying out link prediction on the entity in the local attack chain and the entity in other local attack chains before the timestamp corresponding to the local attack chain in the class cluster to obtain the connection probability between the entity in the local attack chain and the entity in the other local attack chains;

2. The method of claim 1, wherein the determining a plurality of dependent paths in the security knowledge graph from the security knowledge graph comprises:

3. The method of claim 2, wherein the determining a plurality of dependent paths from the plurality of candidate paths comprises:

4. The method of claim 1, wherein clustering the plurality of dependent paths to obtain a plurality of class clusters and host behavior categories corresponding to each class cluster comprises:

5. The method of claim 1, wherein performing link prediction on the entity in the local attack chain and the entity in the other local attack chain to obtain the connection probability between the entity in the local attack chain and the entity in the other local attack chain, comprises:

6. The method of claim 1, wherein selecting the target dependency path from the dependency paths to which each abnormal edge belongs comprises:

according to the following formula:

7. The method of claim 1, wherein prior to the obtaining the pre-constructed security knowledge graph based on the host system audit log, the method further comprises:

8. An attack chain completion apparatus, the apparatus comprising:

9. An electronic device, comprising: a processor for executing a computer program stored in a memory, which when executed by the processor implements the method of any of claims 1-7.

10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of any of claims 1-7.