CN115632888A

CN115632888A - Attack path restoration method and system based on graph algorithm

Info

Publication number: CN115632888A
Application number: CN202211651409.0A
Authority: CN
Inventors: 王冲华; 郝志强; 林晨; 周昊; 李俊; 樊佩茹; 曲海阔; 刘奕彤; 李文婷; 张雪莹; 韦彦
Original assignee: China Industrial Control Systems Cyber Emergency Response Team
Current assignee: China Industrial Control Systems Cyber Emergency Response Team
Priority date: 2022-12-22
Filing date: 2022-12-22
Publication date: 2023-01-20
Anticipated expiration: 2042-12-22
Also published as: CN115632888B

Abstract

The invention relates to an attack path restoration method and system based on a graph algorithm, in the field of security attack and defense. The method comprises: constructing a heterogeneous graph and a directed weighted graph, respectively, according to an original alarm log; determining a homogeneous graph according to the heterogeneous graph and a meta path; calculating the embedded representation of the events using a graph convolution network algorithm according to the homogeneous graph; calculating the embedded representation of the devices using a large-scale information network embedding algorithm according to the directed weighted graph; determining, using a similarity algorithm and according to the embedded representation of the events and the embedded representation of the devices respectively, either the event type node similarity and the device IP node similarity, or the event type node similarity and the platform IP node similarity; and traversing and backtracking the original alarm logs in the same time window to obtain an attack path. The invention can reduce the dependence on domain knowledge and expert knowledge and improve restoration efficiency.

Description

Attack path restoration method and system based on graph algorithm

Technical Field

The invention relates to the field of security attack and defense, in particular to an attack path restoration method and system based on a graph algorithm.

Background

An Industrial Control System (ICS) is a type of industrial production automation system, and mainly comprises devices such as a physical process, a sensor, an actuator, a controller, a human-computer interface, a database and the like.

Network security in the traditional industrial control field focuses on intrusion detection and similar measures, which often struggle to uncover higher-level threats such as long-term advanced persistent threat (APT) attacks. Such attacks are covert, long-lasting and procedurally complex; the attacker has a clear intention, damages the interests of the attack target, and steals important information. As technology advances, attack techniques become more complex, so these attacks deserve attention. Existing multi-step attack tracking methods for industrial control sites include the following: multi-step association analysis based on the causal relationships among attacks, which has demanding design requirements, complex definitions and high implementation difficulty, and is not suitable for unknown attacks or attacks without an obvious causal relationship; and event correlation analysis using known attack scenarios and knowledge mined from existing data sets, such as an expert knowledge base.

Most of the existing attack chain restoration aiming at the information security field of the industrial control system is manually assisted based on rules or expert knowledge. The process consumes manpower and has low efficiency; the requirements on the rules are high, and attack chains which are not in the rules cannot be found.

Disclosure of Invention

The invention aims to provide an attack path restoration method and system based on a graph algorithm, so that the dependence on domain knowledge and expert knowledge is reduced, and the restoration efficiency is improved.

In order to achieve the purpose, the invention provides the following scheme:

an attack path restoration method based on a graph algorithm comprises the following steps:

acquiring an original alarm log of an industrial control site; the original alarm log comprises industrial host data and service application data;

constructing a heterogeneous graph and a directed weighted graph, respectively, according to the original alarm log; the directed weighted graph is a graph structure representing the connection relations between devices and platforms and the number of attacks;

determining a homogeneous graph according to the heterogeneous graph and the meta path; the nodes of the homogeneous graph are events; the homogeneous graph represents the event types and the connection relations between the events;

calculating an embedded representation of the event using a graph convolution network algorithm according to the homogeneous graph;

computing an embedded representation of the device using a large scale information network embedding algorithm according to the directed weighted graph;

determining event type node similarity and equipment IP node similarity or determining event type node similarity and platform IP node similarity according to the embedded representation of the event and the embedded representation of the equipment by utilizing a similarity algorithm;

when the event type node similarity and the equipment IP node similarity are determined, traversing and backtracking the original alarm log in the same time window according to the event type node similarity and the equipment IP node similarity to obtain an attack path; and when the event type node similarity and the platform IP node similarity are determined, traversing and backtracking the original alarm log in the same time window according to the event type node similarity and the platform IP node similarity to obtain an attack path.

Optionally, constructing the heterogeneous graph and the directed weighted graph, respectively, according to the original alarm log specifically includes:

mapping all devices and events in the original alarm log into nodes of a graph structure, and constructing a heterogeneous graph in which an edge of the graph structure indicates that an alarm event in the alarm log occurred on a device;

mapping each device in the original alarm log into a node of a graph structure, and constructing a directed weighted graph based on all the connection relations and weights; the connection relations comprise those between devices, between devices and platforms, and between platforms; the weights comprise those between devices, between devices and platforms, and between platforms.

Optionally, the computing an embedded representation of a device using a large-scale information network embedding algorithm according to the directed weighted graph specifically includes:

computing an embedded representation of the device using second order similarity in a large scale information network embedding algorithm according to the directed weighted graph.

Optionally, when the event type node similarity and the device IP node similarity are determined, traversing and backtracking the original alarm log in the same time window according to the event type node similarity and the device IP node similarity to obtain an attack path, specifically including:

when the event type node similarity and the equipment IP node similarity are determined, traversing the original alarm log in the same time window by using a spark partitioning method according to the event type node similarity and the equipment IP node similarity to obtain an attack path table;

and determining an attack path according to the attack path table.

Optionally, when the event type node similarity and the platform IP node similarity are determined, traversing and backtracking the original alarm log in the same time window according to the event type node similarity and the platform IP node similarity to obtain an attack path, specifically including:

when the event type node similarity and the platform IP node similarity are determined, traversing the original alarm log in the same time window by using a spark partitioning method according to the event type node similarity and the platform IP node similarity to obtain an attack path table;

and determining an attack path according to the attack path table.

An attack path restoration system based on a graph algorithm comprises:

the attack log collection module is used for acquiring an original alarm log of an industrial control field; the original alarm log comprises industrial host data and service application data;

the attack log graph structuring module is used for constructing a heterogeneous graph and a directed weighted graph, respectively, according to the original alarm log; the directed weighted graph is a graph structure representing the connection relations between devices and platforms and the number of attacks;

the homogeneous graph determining module is used for determining a homogeneous graph according to the heterogeneous graph and the meta path; the nodes of the homogeneous graph are events; the homogeneous graph represents the event types and the connection relations between the events;

the graph convolution network algorithm embedded representation calculation module is used for calculating the embedded representation of the event using the graph convolution network algorithm according to the homogeneous graph;

the large-scale information network embedding algorithm embedding expression calculation module is used for calculating the embedding expression of the equipment by utilizing the large-scale information network embedding algorithm according to the directed weighted graph;

the similarity calculation module is used for determining the similarity of event type nodes and the similarity of equipment IP nodes or determining the similarity of event type nodes and the similarity of platform IP nodes by utilizing a similarity algorithm according to the embedded representation of the event and the embedded representation of the equipment respectively;

the attack path restoration module is used for traversing and backtracking the original alarm log in the same time window according to the event type node similarity and the equipment IP node similarity when the event type node similarity and the equipment IP node similarity are determined, so as to obtain an attack path; and when the event type node similarity and the platform IP node similarity are determined, traversing and backtracking the original alarm log in the same time window according to the event type node similarity and the platform IP node similarity to obtain an attack path.

Optionally, the attack log graph structuring module specifically includes:

the heterogeneous graph constructing unit is used for mapping all the devices and events in the original alarm log into nodes of a graph structure, and constructing a heterogeneous graph by using whether the alarm event in the alarm log occurs on the device as an edge of the graph structure;

the directed weighted graph construction unit is used for mapping each device in the original alarm log into a node of a graph structure and constructing a directed weighted graph based on all the connection relations and weights; the connection relations comprise those between devices, between devices and platforms, and between platforms; the weights comprise those between devices, between devices and platforms, and between platforms.

Optionally, the large-scale information network embedding algorithm embedding expression calculation module specifically includes:

and the embedded representation calculation unit of the equipment is used for calculating the embedded representation of the equipment by utilizing the second-order similarity in the large-scale information network embedding algorithm according to the directed weighted graph.

According to the specific embodiment provided by the invention, the invention discloses the following technical effects:

the invention acquires the original alarm log of the industrial control site; the original alarm log comprises industrial host data and service application data; a heterogeneous graph and a directed weighted graph are constructed, respectively, according to the original alarm log; the directed weighted graph is a graph structure representing the connection relations between devices and platforms and the number of attacks; a homogeneous graph is determined according to the heterogeneous graph and the meta path; the nodes of the homogeneous graph are events; the homogeneous graph represents the event types and the connection relations between the events; an embedded representation of the event is calculated using a graph convolution network algorithm according to the homogeneous graph; an embedded representation of the device is calculated using a large-scale information network embedding algorithm according to the directed weighted graph; the event type node similarity and the device IP node similarity, or the event type node similarity and the platform IP node similarity, are determined using a similarity algorithm according to the embedded representation of the event and the embedded representation of the device respectively; when the event type node similarity and the device IP node similarity are determined, the original alarm log in the same time window is traversed and backtracked according to the event type node similarity and the device IP node similarity to obtain an attack path; and when the event type node similarity and the platform IP node similarity are determined, the original alarm log in the same time window is traversed and backtracked according to the event type node similarity and the platform IP node similarity to obtain an attack path.
The invention collects alarm logs of an industrial control site at regular intervals, learns from them, and outputs the restored attack process without prior knowledge or human assistance; the results become more accurate as more knowledge is learned. The invention converts the alarm logs of the industrial control site into graph structures; for massive alarm logs, a graph structure fits the scenario better and is faster to compute on, and the devices and events similar to each device and event can be learned.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a flowchart of an attack path restoration method based on graph algorithm according to the present invention;

FIG. 2 is a schematic diagram of an attack path restoration system based on a graph algorithm according to the present invention;

FIG. 3 is a heterogeneous graph obtained based on an original alarm event provided by the present invention;

FIG. 4 is a schematic diagram of the homogeneous graph obtained from the heterogeneous graph of FIG. 3 and the meta path;

FIG. 5 is an explanatory diagram of the first-order similarity and the second-order similarity provided by the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In order to make the aforementioned objects, features and advantages of the present invention more comprehensible, the present invention is described in detail with reference to the accompanying drawings and the detailed description thereof.

As shown in FIG. 1, the attack path restoration method based on the graph algorithm according to the present invention finds similar devices by computing embedding vectors for the IPs of the devices/platforms, and restores the attack path. The method includes:

step 101: acquiring an original alarm log of an industrial control site; the raw alarm log includes industrial host data and business application data.

Firstly, preprocessing an alarm log in an industrial control field environment, including industrial host data and business application data acquisition.

Industrial host data: and collecting user operation logs, illegal external connection logs, safety baseline logs, mobile storage logs and the like for industrial field hosts such as an operator station, an engineer station, a data server and the like.

Service application data: service traffic and the logs of hosts, servers and industrial control devices/systems (PLC, RTU, WinCC and the like) are collected, and from them are extracted engineering file download/change logs, instruction timing and period logs, service operation frequency logs, key action logs, important configuration change logs, state and alarm logs of control devices, real-time database operation logs and the like.

These data include the originating device (source IP), the attacked device (destination IP), the event type, and the time of occurrence.

Step 102: constructing a heterogeneous graph and a directed weighted graph, respectively, according to the original alarm log; the directed weighted graph is a graph structure representing the connection relations between devices and platforms and the number of attacks.

Step 102 specifically comprises: mapping all devices and events in the original alarm log into nodes of a graph structure, and constructing a heterogeneous graph in which an edge of the graph structure indicates that an alarm event in the alarm log occurred on a device; if a certain alarm event occurs between two devices, two edges are added among the three nodes (the event and the two devices); mapping each device in the original alarm log into a node of a graph structure, and constructing a directed weighted graph based on all the connection relations and weights; the connection relations comprise those between devices, between devices and platforms, and between platforms; the weights comprise those between devices, between devices and platforms, and between platforms.

1) First, a heterogeneous graph is constructed from all log data. The construction principle is as follows: each alarm event is mapped to a node of the graph structure, and the IP (Internet Protocol address) of each device/platform is mapped to a node; if a certain alarm event occurs between two devices/platforms, two edges are added among the three nodes, with no direction specified.

As shown in the collected alarms table 1:

a graph structure of the form shown in fig. 3 can be obtained, with the circular nodes representing IPs and the square nodes representing events:

The meta path is designed as event-IP-event (ignoring the difference between source and destination IP), which yields the homogeneous graph between event nodes shown in FIG. 4, denoted G1 = (V, Edge), where V is the set of nodes, representing event types, and Edge is the set of connection relations between them.
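The construction of the heterogeneous graph and its projection along the event-IP-event meta path can be sketched as follows. This is a minimal illustration only: the alarm records, IP addresses and function names are hypothetical and do not come from the patent, and the graph is held in plain Python containers rather than a graph library.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical alarm records: (source IP, destination IP, event type).
alarms = [
    ("10.0.0.1", "10.0.0.2", "A"),
    ("10.0.0.2", "10.0.0.3", "B"),
    ("10.0.0.4", "10.0.0.5", "C"),
    ("10.0.0.5", "10.0.0.1", "C"),
]

def build_heterogeneous_graph(records):
    """Return the undirected event-IP edges as a map from IP to event types."""
    ip_to_events = defaultdict(set)
    for src, dst, event in records:
        # One alarm adds two undirected edges: event-src and event-dst.
        ip_to_events[src].add(event)
        ip_to_events[dst].add(event)
    return ip_to_events

def project_event_graph(ip_to_events):
    """Apply the event-IP-event meta path: link two events that share an IP."""
    edges = set()
    for events in ip_to_events.values():
        for a, b in combinations(sorted(events), 2):
            edges.add((a, b))
    return edges

hetero = build_heterogeneous_graph(alarms)
g1_edges = project_event_graph(hetero)
print(sorted(g1_edges))
```

In this toy data, event types A and C share the IP 10.0.0.1 and event types A and B share 10.0.0.2, so the projected graph G1 contains exactly those two edges.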

2) Finding out the connection relation and the weight among all the devices/platforms according to the original log data, wherein the weight refers to: if multiple attacks occur between the device 1 and the device 2/platform 2, the attack times are recorded as weights. So here table 2 is obtained in the form of data similar to table 1:

From this information, a directed weighted graph G2 = (V, Edge, w) can be generated, where V is the set of nodes; Edge is the set of edges, which are directed in accordance with the direction of the attacks occurring between the devices and represent the connection relations; and w is the number of attacks, which is used for calculating the second-order similarity of the devices.
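A minimal sketch of building G2, assuming each alarm has already been reduced to a (source IP, destination IP) pair. The sample pairs and the function name are hypothetical; the weight of each directed edge is simply the number of alarms observed in that direction.

```python
from collections import Counter

# Hypothetical (source IP, destination IP) pairs taken from alarm records.
attack_pairs = [
    ("10.0.0.1", "10.0.0.2"),
    ("10.0.0.1", "10.0.0.2"),
    ("10.0.0.2", "10.0.0.3"),
    ("10.0.0.1", "10.0.0.2"),
]

def build_directed_weighted_graph(pairs):
    """Return (V, w) where w maps each directed edge to its attack count."""
    weights = Counter(pairs)  # edge direction follows the attack direction
    nodes = {ip for edge in weights for ip in edge}
    return nodes, dict(weights)

nodes, g2 = build_directed_weighted_graph(attack_pairs)
print(g2)
```

Because the edges are directed, an attack from 10.0.0.1 to 10.0.0.2 does not create an edge in the opposite direction.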

Step 103: determining a homogeneous graph according to the heterogeneous graph and the meta path; the nodes of the homogeneous graph are events; the homogeneous graph represents the event types and the connection relations between the events.

Step 104: calculating an embedded representation of the event using a graph convolution network algorithm according to the homogeneous graph.

Step 105: computing an embedded representation of the device using a large scale information network embedding algorithm according to the directed weighted graph. Step 105, specifically comprising: computing an embedded representation of the device using second order similarity in a large scale information network embedding algorithm according to the directed weighted graph.

The invention combines the graph convolution network (GCN) algorithm with the large-scale information network embedding (LINE) algorithm to calculate the embedded representations of all nodes; LINE can learn the embedding of a network with millions of vertices and billions of edges in a few hours on a typical single machine.

For G1, the embedded representation of each event is calculated using the graph convolution network (GCN) algorithm, from which the similarity between events is obtained. The nodes of graph G1 are events; GCN is an existing algorithm, and its output is an embedded representation of each event, e.g. event a: [0.2, 0.5, 0.8, ...], which can be understood here as a feature vector. The similarity is subsequently calculated with a vector similarity algorithm.
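To illustrate what a GCN computes on G1, the sketch below performs a single forward propagation step using the commonly used symmetric normalization, ReLU(D^-1/2 (A + I) D^-1/2 H W). It is an assumption-laden toy: the three-event graph, the one-hot input features and the fixed weight matrix are hypothetical, and no training is performed, whereas the method described here trains the model until the loss reaches a threshold.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def gcn_layer(adj, features, weight):
    """One GCN propagation step with self-loops and symmetric normalization."""
    n = len(adj)
    # Add self-loops so every node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization: D^-1/2 (A + I) D^-1/2.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    hidden = matmul(matmul(norm, features), weight)
    # ReLU activation.
    return [[max(0.0, x) for x in row] for row in hidden]

# Hypothetical G1 with events A, B, C and edges A-B, A-C.
adj = [[0, 1, 1],
       [1, 0, 0],
       [1, 0, 0]]
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # one-hot inputs
weight = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # illustrative, untrained weights

embeddings = gcn_layer(adj, features, weight)
print(embeddings)
```

With these illustrative weights, events B and C, which occupy the same structural position (each connected only to A), receive identical embeddings.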

For G2, the large-scale information network embedding (LINE) algorithm is used, which defines two kinds of similarity: first-order similarity and second-order similarity. The nodes of graph G2 are devices. Second-order similarity describes the indirect similarity of nodes: as shown in FIG. 5, although there is no direct edge between device 5 and device 6, both are connected to the devices/platforms (1, 2, 3, 4), which indicates that device 5 and device 6 are similar. Formally, let $p_u = (w_{u,1}, \ldots, w_{u,|V|})$ denote the first-order similarity between vertex u and all other vertices; the second-order similarity between u and v is then given by the similarity between $p_u$ and $p_v$. If u and v have no neighbor vertex in common, their second-order similarity is 0. Second-order similarity can therefore capture the relationship between device IPs that share the same attacking/attacked devices.

First-order similarity represents the direct similarity between nodes, such as nodes 6 and 7 in FIG. 5: two nodes are similar when they are directly connected. This relation is already reflected in G1, so first-order similarity is not calculated here. Second-order similarity represents the indirect connection between nodes, such as nodes 5 and 6 in FIG. 5, which have no direct connection but are both connected to nodes 1, 2, 3 and 4; nodes 5 and 6 therefore also have a certain similarity, which is the second-order similarity. For G2, only second-order similarity is calculated.
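The definition of second-order similarity can be illustrated by comparing the neighbor weight vectors p_u and p_v directly. The sketch below is a simplification: LINE itself learns embeddings by optimizing an objective over these neighborhoods, whereas here the raw vectors are compared with cosine similarity. The edge weights and node labels are hypothetical and only loosely follow the situation described for FIG. 5.

```python
import math

# Hypothetical directed weighted edges: devices 5 and 6 both attack 1, 2, 3, 4.
edges = {
    ("5", "1"): 2, ("5", "2"): 1, ("5", "3"): 1, ("5", "4"): 3,
    ("6", "1"): 2, ("6", "2"): 1, ("6", "3"): 1, ("6", "4"): 3,
    ("7", "8"): 1,
}
vertices = sorted({v for edge in edges for v in edge})

def first_order_vector(u):
    """p_u = (w_{u,1}, ..., w_{u,|V|}); missing edges have weight 0."""
    return [edges.get((u, v), 0) for v in vertices]

def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def second_order_similarity(u, v):
    """Similarity between the neighbor weight vectors of u and v."""
    return cosine(first_order_vector(u), first_order_vector(v))

print(second_order_similarity("5", "6"))
print(second_order_similarity("5", "7"))
```

Devices 5 and 6 have no edge between them but identical neighborhoods, so their second-order similarity is maximal, while devices 5 and 7 share no neighbor and score 0.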

Therefore, the embedded representations of the nodes are calculated by combining the two algorithms. The two graphs are input into the GCN and LINE models respectively; the models are trained and the hyper-parameters (embedding_dim, batch_size, learning_rate, num_lots) are adjusted until the loss falls to a set threshold; the models are then saved and output the representation vector of each event and each IP, giving the two kinds of embedded representation.

Step 106: and determining the similarity of event type nodes and the similarity of equipment IP nodes or determining the similarity of event type nodes and the similarity of platform IP nodes by utilizing a similarity algorithm according to the embedded representation of the event and the embedded representation of the equipment respectively.

For the nodes whose embedded representations have been computed as vectors by the two models, a similarity algorithm (cosine similarity is adopted) finds the Top n most similar entries for each IP or event type, where Top n denotes the n event types or IPs with the highest similarity; the results are then filtered again against a set similarity threshold to find all IP/event pairs whose similarity exceeds the threshold. The results are shown in Tables 3 and 4:
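The Top n selection followed by threshold filtering might look like the following sketch. The embedding values, the value of n and the threshold are hypothetical; the same function could be applied to either the event embeddings or the IP embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similar_pairs(embeddings, top_n, threshold):
    """For each node keep its top_n most similar nodes, then apply the threshold."""
    table = {}
    for name, vec in embeddings.items():
        scored = [(other, cosine(vec, other_vec))
                  for other, other_vec in embeddings.items() if other != name]
        scored.sort(key=lambda item: item[1], reverse=True)
        table[name] = [(other, s) for other, s in scored[:top_n] if s > threshold]
    return table

# Hypothetical event-type embeddings.
event_embeddings = {
    "A": [0.2, 0.5, 0.8],
    "B": [0.9, 0.1, 0.0],
    "C": [0.25, 0.45, 0.85],
}
similar_events = similar_pairs(event_embeddings, top_n=2, threshold=0.9)
print(similar_events)
```

Here event types A and C point in nearly the same direction and pass the threshold, while B has no sufficiently similar partner and ends up with an empty list.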

step 107: when the event type node similarity and the equipment IP node similarity are determined, traversing and backtracking the original alarm log in the same time window according to the event type node similarity and the equipment IP node similarity to obtain an attack path; and when the event type node similarity and the platform IP node similarity are determined, traversing and backtracking the original alarm log in the same time window according to the event type node similarity and the platform IP node similarity to obtain an attack path.

When the event type node similarity and the device IP node similarity are determined, traversing and backtracking the original alarm log in the same time window according to the event type node similarity and the device IP node similarity to obtain an attack path, which specifically includes: when the event type node similarity and the equipment IP node similarity are determined, traversing the original alarm log in the same time window by using a spark partitioning method according to the event type node similarity and the equipment IP node similarity to obtain an attack path table; and determining an attack path according to the attack path table.

When the event type node similarity and the platform IP node similarity are determined, traversing and backtracking the original alarm log in the same time window according to the event type node similarity and the platform IP node similarity to obtain an attack path specifically includes: when the event type node similarity and the platform IP node similarity are determined, traversing the original alarm log in the same time window by using a Spark partitioning method according to the event type node similarity and the platform IP node similarity to obtain an attack path table; and determining an attack path according to the attack path table. Spark is a fast, general-purpose computing engine designed for large-scale data processing.

This part runs on a schedule, for example once per hour, each time processing only the alarms of that hour; the resulting attack paths are stored in dictionary form in a big-data read/write component such as Redis, and the next run reads and updates the attack path results of the previous round.

Based on the similar device table and the similar event type table, traversing and backtracking the alarm event containing the characteristics of event type, time, IP and the like in the same time window, wherein the specific process is as follows:

1. First, the alarm stream of the batch (for example, within one hour; the time window can be set as needed) is preprocessed. Each alarm event includes the specific information of the event, such as the source and destination IPs, ports, event type and time, and the events are sorted by occurrence time.

The events are traversed one by one in order of occurrence. Suppose the alarm stream generated by a certain device is recorded as event 1, event 2, event 3, ..., event 11. To find the events related to event 1, the event pairs (1, 2), (1, 3), (1, 4), ..., (1, 11) are traversed; Spark partitioned parallel comparison may be used here to improve computational efficiency. Two compared events are related if both their IPs and their event types are similar. Take the event pair (1, 3): the event type of event 1 is A and its device IP is a, while the event type of event 3 is C and its device IP is f. According to the previously calculated similar device table and similar event type table, i.e. Table 3 and Table 4, device a and device f are similar, and event type A and event type C are similar, so event 3 is called a related event of event 1. The other related events of event 1 are obtained in the same way, and event 1 and its related event 3 are then added to an attack path.

The alarm stream of the batch is first traversed and compared against the alarm events of a historical period, again optionally using Spark partitioned parallel comparison to improve computational efficiency; if two compared events have similar IPs and similar event types, they are added in sequence to an attack chain. The alarm events within the batch are then compared pairwise in the same manner, and the attack path table is updated in the same manner;

2. The attack path table is updated in sequence, according to the following rules:

(1) If no new event has been added to an attack path for n days (the period can be set as needed), the attack path is no longer updated; it is stored in another table as a historical attack path table and is not read in subsequent updates.

(2) If the relevant events of the event a are all in an attack path, the event a is directly put into the attack path, and the time is updated.

(3) If the relevant event of the event a is in a plurality of attack paths, combining the attack paths where the relevant event is located, then putting the a in the attack paths, deleting the original attack paths and updating the time.

(4) If the relevant events of the event a are not in the attack path, putting a and the relevant events into a new attack path.

(5) If event a has no related event, event a is stored in the total attack path table as an attack path of its own, so that it is not missed if related events are added subsequently.
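The related-event check and update rules (2) to (5) can be sketched as a single merge step. This is a simplified, sequential illustration: the event records, the similarity tables and the function names are hypothetical, each event is compared only with earlier events, timestamps are omitted, and plain Python loops stand in for the Spark partitioned comparison.

```python
def is_related(e1, e2, similar_ips, similar_types):
    """Two events are related when both their IPs and their types are similar."""
    ip_ok = e1["ip"] == e2["ip"] or frozenset((e1["ip"], e2["ip"])) in similar_ips
    type_ok = (e1["type"] == e2["type"]
               or frozenset((e1["type"], e2["type"])) in similar_types)
    return ip_ok and type_ok

def update_paths(paths, event_id, related_ids):
    """Merge every path containing a related event, then append the new event.

    Covers rules (2) to (5): one hit path extends it, several hit paths are
    merged, no hit path starts a new path, and no related event yields a
    single-event path.
    """
    hit = [p for p in paths if any(r in p for r in related_ids)]
    rest = [p for p in paths if all(r not in p for r in related_ids)]
    merged = []
    for event in [e for p in hit for e in p] + list(related_ids) + [event_id]:
        if event not in merged:
            merged.append(event)
    return rest + [merged]

# Hypothetical similarity tables (standing in for Tables 3 and 4) and events.
similar_ips = {frozenset(("a", "f"))}
similar_types = {frozenset(("A", "C"))}
events = {
    1: {"ip": "a", "type": "A"},
    2: {"ip": "b", "type": "B"},
    3: {"ip": "f", "type": "C"},
}

paths = []
for eid in sorted(events):  # event ids are assumed to follow occurrence time
    related = [other for other in sorted(events) if other < eid
               and is_related(events[eid], events[other], similar_ips, similar_types)]
    paths = update_paths(paths, eid, related)
print(paths)
```

Event 3 is related to event 1, so the two end up in one attack path, while event 2 remains a single-event path that later related events can still join.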

Attack path result storage sample: {"id": "uuid1", "attecklist": ["event id1", "event id2", "event id3", "event id4"], "updatetime": "2022-03-25T00:…", "starttime": "2022-03-25T00:…"}. The sample is in JSON format and shows the fields that must be stored in an attack path result: id is the unique identifier of the attack path; attecklist is the attack chain, storing the specific id of each event; updatetime is the update time of the attack path; and starttime is the start time of the attack.

Before each scheduled task updates the attack path results, it first determines whether each result has expired, that is, whether its updatetime lies too far from the current time, so that only results within the time window are updated (the window can be set as needed). If no new alarm has been added to an attack path for a long time, the attacker may have finished the attack, so the path no longer needs updating. The method is: calculate the difference between the current time and the updatetime field in the result; if the difference is greater than a set threshold (such as one month), the attack path is considered expired and is stored in another table, and expired attack paths do not participate in the next update of the overall results. Placing attack chains that have not been updated for a long time into other Redis tables reduces the number of attack chains to traverse, improves calculation efficiency, and still allows the content of aged attack chains to be inspected.
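The expiry check can be sketched as a simple partition of the stored results. The records, the complete timestamps and the 30-day threshold below are hypothetical, and in-memory lists stand in for the Redis tables; only the field names follow the storage sample.

```python
from datetime import datetime, timedelta

def split_expired(paths, now, max_age):
    """Separate attack paths into active ones and expired (historical) ones."""
    active, historical = [], []
    for path in paths:
        updated = datetime.fromisoformat(path["updatetime"])
        # A path expires when no event has been added for longer than max_age.
        if now - updated > max_age:
            historical.append(path)
        else:
            active.append(path)
    return active, historical

# Hypothetical stored results using the field names of the storage sample.
stored = [
    {"id": "uuid1", "attecklist": ["event id1", "event id2"],
     "updatetime": "2022-03-25T00:00:00", "starttime": "2022-03-20T00:00:00"},
    {"id": "uuid2", "attecklist": ["event id3"],
     "updatetime": "2022-01-10T00:00:00", "starttime": "2022-01-01T00:00:00"},
]

active, historical = split_expired(stored, datetime(2022, 4, 1), timedelta(days=30))
print([p["id"] for p in active], [p["id"] for p in historical])
```

Only the active list would be read and compared in the next scheduled run; the historical list remains available for inspection.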

Each device/platform has its own IP address, which generally does not change. Alarm logs are collected over a period of time in the form "which alarm occurred between a source IP and a destination IP", and a graph structure is built over all alarms according to the connections between source and destination IPs. From the graph-structured data, an embedded representation of each node can be computed and then used to calculate the similarity between nodes.

The invention provides an attack path restoration method based on a graph algorithm for industrial control scenarios, which can effectively trace attack paths that persist, progress slowly, and span multiple devices. A large number of industrial field security alarm events are collected as the data set; the events are processed into graph structures according to information such as device IP (Internet Protocol) address and event type; the similar nodes of each node are then found by calculating the similarity between nodes; and finally the attack path is restored from the similar nodes.

The raw data is processed into two graph structures. The first is built as follows: alarm events in the industrial control scenario are collected and expressed as a heterogeneous graph, from which a homogeneous graph among events is extracted based on a meta path; the direction and number of attack events are ignored, and the graph is used to calculate the similarity between event types. The second is built as follows: the attack paths between devices/platforms, together with the frequency and direction of attack events, are mapped into a graph structure to obtain a directed weighted graph, which is used to calculate the second-order similarity between nodes. By combining the frequency and direction of attack events with the relationships between event types, the invention is more accurate than either calculation alone, requires no prior knowledge, and can greatly reduce the burden of manual analysis.
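A minimal sketch of the two graph structures is given below. It assumes each alarm is a (source IP, destination IP, event type) tuple, that an event is attached to both endpoints of the alarm, and that the meta path is event-device-event; none of these details is fixed by the text above, and the function name `build_graphs` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_graphs(alarms):
    """alarms: iterable of (src_ip, dst_ip, event_type) tuples (assumed layout)."""
    # Heterogeneous graph: device nodes linked to the event types seen on them.
    device_events = defaultdict(set)
    # Directed weighted graph: (src, dst) -> number of alarms in that direction.
    directed_weights = defaultdict(int)
    for src, dst, event_type in alarms:
        device_events[src].add(event_type)
        device_events[dst].add(event_type)
        directed_weights[(src, dst)] += 1
    # Assumed meta path event-device-event: two event types are connected
    # when they occurred on the same device; direction and counts are ignored.
    event_edges = set()
    for events in device_events.values():
        for a, b in combinations(sorted(events), 2):
            event_edges.add((a, b))
    return dict(device_events), event_edges, dict(directed_weights)


alarms = [
    ("10.0.0.1", "10.0.0.2", "scan"),
    ("10.0.0.1", "10.0.0.2", "scan"),
    ("10.0.0.2", "10.0.0.3", "exploit"),
]
device_events, event_edges, directed_weights = build_graphs(alarms)
```

The event graph keeps only undirected, unweighted edges between event types, whereas the device graph keeps both the direction and the count of alarms, matching the division of roles described above.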

The invention can continuously learn from alarm events in the industrial scenario and can learn new attack patterns. The program is designed as a timing task whose run interval can be set manually according to the hardware of the equipment; on each run it finds the alarms from that period in the industrial control scenario that match an attack chain pattern, which greatly reduces the manual effort required.

As shown in fig. 2, the present invention further provides an attack path restoration system based on a graph algorithm, including:

the attack log collection module is used for acquiring an original alarm log of an industrial control field; the raw alarm log includes industrial host data and business application data.

The attack log graph structuring module is used for respectively constructing a heterogeneous graph and a directed weighted graph according to the original alarm log; the directed weighted graph is a graph structure representing the connection relations between devices and platforms and the number of attacks.

The homogeneous graph determining module is used for determining a homogeneous graph according to the heterogeneous graph and the meta path; the nodes of the homogeneous graph are events; the homogeneous graph represents the event types and the connection relations between events.

The graph convolution network algorithm embedded representation calculation module is used for calculating the embedded representation of the event by using the graph convolution network algorithm according to the homogeneous graph.

The large-scale information network embedding algorithm embedded expression calculation module is used for calculating the embedded representation of the device by using the large-scale information network embedding algorithm according to the directed weighted graph.

The similarity calculation module is used for determining, by a similarity algorithm and according to the embedded representation of the event and the embedded representation of the device respectively, either the event type node similarity and the equipment IP node similarity, or the event type node similarity and the platform IP node similarity.

The attack path restoration module is used for traversing and backtracking the original alarm log in the same time window according to the event type node similarity and the equipment IP node similarity when the event type node similarity and the equipment IP node similarity are determined, so as to obtain an attack path; and when the event type node similarity and the platform IP node similarity are determined, traversing and backtracking the original alarm log in the same time window according to the event type node similarity and the platform IP node similarity to obtain an attack path. Wherein, the attack path restoring module is the attack path tracing module in fig. 2.
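The similarity calculation module can be illustrated with a short sketch. The text above only names "a similarity algorithm", so the use of cosine similarity, the threshold value, and the function names here are assumptions chosen for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero embedding as similar to nothing
    return dot / (norm_u * norm_v)


def similar_nodes(embeddings, threshold=0.9):
    """Map each node to the nodes whose similarity reaches the threshold.

    embeddings: dict of node name -> vector; threshold is a hypothetical cut-off.
    """
    names = sorted(embeddings)
    result = {name: [] for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine_similarity(embeddings[a], embeddings[b]) >= threshold:
                result[a].append(b)
                result[b].append(a)
    return result


# Toy embeddings, made up for the example.
event_embeddings = {"scan": [1.0, 0.0], "probe": [2.0, 0.0], "exploit": [0.0, 1.0]}
neighbours = similar_nodes(event_embeddings)
```

The same function could be applied separately to event type embeddings and to device or platform IP embeddings, and the resulting neighbour lists used when traversing the alarm log within a time window.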

As an optional implementation manner, the attack log graph structuring module specifically includes:

and the heterogeneous graph construction unit is used for mapping all the devices and events in the original alarm log into nodes of a graph structure, and constructing the heterogeneous graph by using whether the alarm events in the alarm log occur on the devices as edges of the graph structure.

The directed weighted graph construction unit is used for mapping each device in the original alarm log into a node of a graph structure and constructing a directed weighted graph based on all the connection relations and weights; the connection relations include the device-to-device, device-to-platform, and platform-to-platform connection relations; the weights include the device-to-device, device-to-platform, and platform-to-platform weights.

As an optional implementation manner, the large-scale information network embedding algorithm embedding expression calculation module specifically includes:

an embedded representation computation unit of the device for computing an embedded representation of the device using second order similarity in a large scale information network embedding algorithm according to the directed weighted graph.
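For orientation, the second-order similarity objective of the large-scale information network embedding (LINE) algorithm, as defined in the original LINE paper, can be sketched as below. Each node has a vertex embedding and a context embedding; the conditional probability of node j given node i is a softmax over inner products, and the loss sums the weighted negative log-probabilities over the directed edges. The function names and the dictionary layout are assumptions, and the sketch evaluates the objective exactly rather than using the negative sampling normally applied in training.

```python
import math


def second_order_prob(vertex_emb, context_emb, i):
    """Return p2(j | i) for every node j: softmax of context_emb[j] . vertex_emb[i]."""
    scores = {
        j: sum(a * b for a, b in zip(ctx, vertex_emb[i]))
        for j, ctx in context_emb.items()
    }
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {j: math.exp(s - peak) for j, s in scores.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}


def second_order_loss(vertex_emb, context_emb, weighted_edges):
    """O2 = - sum over directed edges (i, j) of w_ij * log p2(j | i)."""
    loss = 0.0
    for (i, j), weight in weighted_edges.items():
        loss -= weight * math.log(second_order_prob(vertex_emb, context_emb, i)[j])
    return loss


# Toy example: two devices with all-zero embeddings, one edge of weight 2.
vertex_emb = {"10.0.0.1": [0.0, 0.0], "10.0.0.2": [0.0, 0.0]}
context_emb = {"10.0.0.1": [0.0, 0.0], "10.0.0.2": [0.0, 0.0]}
loss = second_order_loss(vertex_emb, context_emb, {("10.0.0.1", "10.0.0.2"): 2})
```

Minimising this loss pulls together the vertex embeddings of devices that attack, or are attacked by, the same neighbours, which is why the edge direction and the attack count of the directed weighted graph both matter.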

The invention combines the embedded representations of IPs and of events, so that the attack chain tracing result is more accurate and produces fewer false positives. The method periodically collects alarm data from the industrial control site, learns from it, and outputs the reconstructed attack process without prior knowledge or manual assistance; the more it learns, the more accurate the result. The invention converts the alarm logs of the industrial control site into graph structures; for massive alarm logs, a graph structure fits the scenario better, is faster to compute on, and allows the devices and events similar to each device and event to be learned.

The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.

The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims

1. An attack path restoration method based on a graph algorithm is characterized by comprising the following steps:

respectively constructing a heterogeneous graph and a directed weighted graph according to the original alarm log; the directed weighted graph is a graph structure representing the connection relations between devices and platforms and the number of attacks;

determining a homogeneous graph according to the heterogeneous graph and the meta path; the nodes of the homogeneous graph are events; the homogeneous graph represents the event types and the connection relations between events;

calculating the embedded representation of the event by using a graph convolution network algorithm according to the homogeneous graph;

when the event type node similarity and the equipment IP node similarity are determined, traversing and backtracking the original alarm log in the same time window according to the event type node similarity and the equipment IP node similarity to obtain an attack path; and traversing and backtracking the original alarm log in the same time window according to the event type node similarity and the platform IP node similarity to obtain an attack path when the event type node similarity and the platform IP node similarity are determined.

2. The attack path restoration method based on the graph algorithm according to claim 1, wherein respectively constructing a heterogeneous graph and a directed weighted graph according to the original alarm log specifically comprises:

mapping each device in the original alarm log into a node of a graph structure, and constructing a directed weighted graph based on all connection relations and weights; the connection relations include the device-to-device, device-to-platform, and platform-to-platform connection relations; the weights include the device-to-device, device-to-platform, and platform-to-platform weights.

3. The attack path restoration method based on the graph algorithm according to claim 1, wherein the computing of the embedded representation of the device by using the large-scale information network embedding algorithm according to the directed weighted graph specifically comprises:

computing the embedded representation of the device by using the second-order similarity in the large-scale information network embedding algorithm according to the directed weighted graph.

4. The attack path restoration method based on graph algorithm according to claim 1, wherein when determining the event type node similarity and the device IP node similarity, traversing and backtracking the original alarm log within the same time window according to the event type node similarity and the device IP node similarity to obtain an attack path specifically includes:

and determining an attack path according to the attack path table.

5. The attack path restoration method based on the graph algorithm according to claim 1, wherein when the event type node similarity and the platform IP node similarity are determined, traversing and backtracking the original alarm log in the same time window according to the event type node similarity and the platform IP node similarity to obtain an attack path, specifically comprising:

and determining an attack path according to the attack path table.

6. An attack path restoration system based on a graph algorithm, comprising:

the homogeneous graph determining module is used for determining a homogeneous graph according to the heterogeneous graph and the meta path; the nodes of the homogeneous graph are events; the homogeneous graph represents the event types and the connection relations between events;

the large-scale information network embedding algorithm embedded expression calculation module is used for calculating the embedded expression of the equipment by utilizing the large-scale information network embedding algorithm according to the directed weighted graph;

the similarity calculation module is used for determining the similarity of event type nodes and the similarity of equipment IP nodes or determining the similarity of event type nodes and the similarity of platform IP nodes by utilizing a similarity algorithm according to the embedded representation of the event and the embedded representation of the equipment;

the attack path restoration module is used for traversing and backtracking the original alarm log in the same time window according to the event type node similarity and the equipment IP node similarity when the event type node similarity and the equipment IP node similarity are determined, so as to obtain an attack path; and traversing and backtracking the original alarm log in the same time window according to the event type node similarity and the platform IP node similarity to obtain an attack path when the event type node similarity and the platform IP node similarity are determined.

7. The system for restoring attack paths based on graph algorithm according to claim 6, wherein the attack log graph structuring module specifically includes:

the heterogeneous graph constructing unit is used for mapping all the devices and events in the original alarm log into nodes of a graph structure, and constructing a heterogeneous graph by taking whether the alarm events in the alarm log occur on the devices as edges of the graph structure;

8. The system for attack path restoration based on graph algorithm according to claim 6, wherein the large-scale information network embedding algorithm embedding expression calculation module specifically comprises: