CN115632888A - Attack path restoration method and system based on graph algorithm - Google Patents


Info

Publication number
CN115632888A
Authority
CN
China
Prior art keywords
graph
similarity
platform
node similarity
event type
Legal status
Granted
Application number
CN202211651409.0A
Other languages
Chinese (zh)
Other versions
CN115632888B (en)
Inventor
王冲华
郝志强
林晨
周昊
李俊
樊佩茹
曲海阔
刘奕彤
李文婷
张雪莹
韦彦
Current Assignee
China Industrial Control Systems Cyber Emergency Response Team
Original Assignee
China Industrial Control Systems Cyber Emergency Response Team
Application filed by China Industrial Control Systems Cyber Emergency Response Team
Priority to CN202211651409.0A
Publication of CN115632888A
Application granted
Publication of CN115632888B
Legal status: Active
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/12 Applying verification of the received information
    • H04L63/126 Applying verification of the received information the source of the received data
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • H04L41/22 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks comprising specially adapted graphical user interfaces [GUI]
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to an attack path restoration method and system based on a graph algorithm, in the field of security attack and defense. The method comprises: constructing a heterogeneous graph and a directed weighted graph from an original alarm log; determining a homogeneous graph from the heterogeneous graph and a meta-path; calculating an embedded representation of events from the homogeneous graph using a graph convolutional network algorithm; calculating an embedded representation of devices from the directed weighted graph using a large-scale information network embedding algorithm; using a similarity algorithm on the event and device embeddings to determine either event type node similarity and device IP node similarity, or event type node similarity and platform IP node similarity; and traversing and backtracking the original alarm logs within the same time window to obtain an attack path. The invention reduces dependence on domain knowledge and expert knowledge and improves restoration efficiency.

Description

Attack path restoration method and system based on graph algorithm
Technical Field
The invention relates to the field of security attack and defense, in particular to an attack path restoration method and system based on a graph algorithm.
Background
An Industrial Control System (ICS) is a type of industrial production automation system, mainly comprising components such as physical processes, sensors, actuators, controllers, human-machine interfaces, and databases.
Network security in the traditional industrial control field focuses on intrusion detection and similar techniques, which often fail to detect higher-level threats such as long-term APT (advanced persistent threat) attacks. Such attacks are stealthy, long-lasting, and complex; the attacker has a clear intent, harms the interests of the target, and steals important information. As technology advances, attack methods grow more complex, so these attacks require attention. Existing methods for tracking multi-step attacks at industrial control sites include: analyzing the causal relationships among attacks to perform multi-step correlation analysis, which is demanding to design, complex to define, and difficult to implement, and is unsuitable for unknown attacks or attacks without an obvious causal relationship; and performing event correlation analysis using known attack scenarios and knowledge mined from existing data sets, such as an expert knowledge base.
Most existing attack chain restoration in the information security field of industrial control systems relies on manual analysis assisted by rules or expert knowledge. This process is labor-intensive and inefficient, places high demands on the rules, and cannot find attack chains that the rules do not cover.
Disclosure of Invention
The invention aims to provide an attack path restoration method and system based on a graph algorithm, so that the dependence on domain knowledge and expert knowledge is reduced, and the restoration efficiency is improved.
In order to achieve the purpose, the invention provides the following scheme:
An attack path restoration method based on a graph algorithm comprises the following steps:
acquiring an original alarm log of an industrial control site, the original alarm log comprising industrial host data and service application data;
constructing a heterogeneous graph and a directed weighted graph from the original alarm log, the directed weighted graph being a graph structure that represents the connection relations between devices and platforms and the number of attacks;
determining a homogeneous graph from the heterogeneous graph and a meta-path, the nodes of the homogeneous graph being events, and the homogeneous graph representing event types and the connection relations between events;
calculating an embedded representation of the events from the homogeneous graph using a graph convolutional network algorithm;
calculating an embedded representation of the devices from the directed weighted graph using a large-scale information network embedding algorithm;
using a similarity algorithm on the embedded representation of the events and the embedded representation of the devices to determine either event type node similarity and device IP node similarity, or event type node similarity and platform IP node similarity;
when event type node similarity and device IP node similarity are determined, traversing and backtracking the original alarm log within the same time window according to these two similarities to obtain an attack path; and when event type node similarity and platform IP node similarity are determined, traversing and backtracking the original alarm log within the same time window according to these two similarities to obtain an attack path.
Optionally, constructing the heterogeneous graph and the directed weighted graph from the original alarm log specifically includes:
mapping all devices and events in the original alarm log to nodes of a graph structure, and constructing the heterogeneous graph by using whether an alarm event in the alarm log occurs on a device as the edges of the graph structure;
mapping each device in the original alarm log to a node of a graph structure, and constructing the directed weighted graph from all connection relations and weights; the connection relations comprise device-to-device, device-to-platform, and platform-to-platform connections; the weights comprise device-to-device, device-to-platform, and platform-to-platform weights.
Optionally, the computing an embedded representation of a device using a large-scale information network embedding algorithm according to the directed weighted graph specifically includes:
computing an embedded representation of the device using second order similarity in a large scale information network embedding algorithm according to the directed weighted graph.
Optionally, when the event type node similarity and the device IP node similarity are determined, traversing and backtracking the original alarm log in the same time window according to the event type node similarity and the device IP node similarity to obtain an attack path, specifically including:
when the event type node similarity and the device IP node similarity are determined, traversing the original alarm log within the same time window using Spark partitioning according to the event type node similarity and the device IP node similarity to obtain an attack path table;
and determining an attack path according to the attack path table.
Optionally, when the event type node similarity and the platform IP node similarity are determined, traversing and backtracking the original alarm log in the same time window according to the event type node similarity and the platform IP node similarity to obtain an attack path, specifically including:
when the event type node similarity and the platform IP node similarity are determined, traversing the original alarm log within the same time window using Spark partitioning according to the event type node similarity and the platform IP node similarity to obtain an attack path table;
and determining an attack path according to the attack path table.
An attack path restoration system based on a graph algorithm comprises:
the attack log collection module is used for acquiring an original alarm log of an industrial control field; the original alarm log comprises industrial host data and service application data;
the attack log graph structuring module is used for constructing a heterogeneous graph and a directed weighted graph from the original alarm log; the directed weighted graph is a graph structure representing the connection relations and attack counts between devices and platforms;
the homogeneous graph determining module is used for determining a homogeneous graph from the heterogeneous graph and a meta-path; the nodes of the homogeneous graph are events; the homogeneous graph represents event types and the connection relations between events;
the graph convolutional network embedding calculation module is used for calculating the embedded representation of the events from the homogeneous graph using a graph convolutional network algorithm;
the large-scale information network embedding calculation module is used for calculating the embedded representation of the devices from the directed weighted graph using a large-scale information network embedding algorithm;
the similarity calculation module is used for determining the similarity of event type nodes and the similarity of equipment IP nodes or determining the similarity of event type nodes and the similarity of platform IP nodes by utilizing a similarity algorithm according to the embedded representation of the event and the embedded representation of the equipment respectively;
the attack path restoration module is used for traversing and backtracking the original alarm log in the same time window according to the event type node similarity and the equipment IP node similarity when the event type node similarity and the equipment IP node similarity are determined, so as to obtain an attack path; and when the event type node similarity and the platform IP node similarity are determined, traversing and backtracking the original alarm log in the same time window according to the event type node similarity and the platform IP node similarity to obtain an attack path.
Optionally, the attack log graph structuring module specifically includes:
the heterogeneous graph constructing unit is used for mapping all the devices and events in the original alarm log into nodes of a graph structure, and constructing a heterogeneous graph by using whether the alarm event in the alarm log occurs on the device as an edge of the graph structure;
the directed weighted graph construction unit is used for mapping each device in the original alarm log to a node of a graph structure and constructing a directed weighted graph from all connection relations and weights; the connection relations comprise device-to-device, device-to-platform, and platform-to-platform connections; the weights comprise device-to-device, device-to-platform, and platform-to-platform weights.
Optionally, the large-scale information network embedding calculation module specifically includes:
a device embedding calculation unit, used for calculating the embedded representation of the devices from the directed weighted graph using the second-order similarity of the large-scale information network embedding algorithm.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
The invention acquires the original alarm log of the industrial control site, the log comprising industrial host data and service application data; constructs a heterogeneous graph and a directed weighted graph from the original alarm log, the directed weighted graph representing the connection relations between devices and platforms and the number of attacks; determines a homogeneous graph from the heterogeneous graph and a meta-path, whose nodes are events and which represents event types and the connection relations between events; calculates an embedded representation of the events from the homogeneous graph using a graph convolutional network algorithm; calculates an embedded representation of the devices from the directed weighted graph using a large-scale information network embedding algorithm; uses a similarity algorithm on the two embedded representations to determine either event type node similarity and device IP node similarity, or event type node similarity and platform IP node similarity; and, in either case, traverses and backtracks the original alarm log within the same time window according to the determined similarities to obtain an attack path.
The invention periodically collects alarm logs from the industrial control site, learns from them, and outputs the attack process without prior knowledge or human assistance; the results become more accurate as more data is learned. Converting the alarm logs into graph structures better fits the scenario of massive alarm logs, speeds up computation, and allows the method to learn which devices and which events are similar to one another.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flowchart of an attack path restoration method based on graph algorithm according to the present invention;
FIG. 2 is a schematic diagram of an attack path restoration system based on a graph algorithm according to the present invention;
FIG. 3 is a heterogeneous graph obtained from original alarm events, as provided by the present invention;
FIG. 4 is a schematic diagram of the heterogeneous graph of FIG. 3 and its meta-path;
FIG. 5 is an explanatory diagram of the first-order similarity and the second-order similarity provided by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide an attack path restoration method and system based on a graph algorithm, so that the dependence on domain knowledge and expert knowledge is reduced, and the restoration efficiency is improved.
In order to make the aforementioned objects, features and advantages of the present invention more comprehensible, the present invention is described in detail with reference to the accompanying drawings and the detailed description thereof.
As shown in FIG. 1, the attack path restoration method based on the graph algorithm according to the present invention computes embedding vectors for the IPs of devices/platforms, uses them to find similar devices, and restores the attack path. The method includes:
Step 101: acquiring an original alarm log of an industrial control site; the original alarm log includes industrial host data and business application data.
First, the alarm logs of the industrial control site environment are collected and preprocessed, covering industrial host data and business application data.
Industrial host data: user operation logs, unauthorized external connection logs, security baseline logs, removable storage logs, and the like are collected from industrial site hosts such as operator stations, engineer stations, and data servers.
Business application data: by collecting business traffic and the logs of hosts, servers, and industrial control devices/systems (PLC, RTU, WinCC, etc.), the following are extracted: engineering file download/change logs, instruction timing and period logs, business operation frequency logs, key action logs, important configuration change logs, control device status and alarm logs, real-time database operation logs, and the like.
These data include the originating device (source IP), the attacked device (destination IP), the event type, and the time of occurrence.
Step 102: constructing a heterogeneous graph and a directed weighted graph from the original alarm log; the directed weighted graph is a graph structure representing the connection relations between devices and platforms and the number of attacks.
Step 102 specifically comprises: mapping all devices and events in the original alarm log to nodes of a graph structure, and constructing the heterogeneous graph by using whether an alarm event in the alarm log occurs on a device as the edges of the graph structure, so that if an alarm event occurs between two devices, two edges are added among the three nodes (the event and the two devices); and mapping each device in the original alarm log to a node of a graph structure and constructing the directed weighted graph from all connection relations and weights, where the connection relations comprise device-to-device, device-to-platform, and platform-to-platform connections and the weights comprise device-to-device, device-to-platform, and platform-to-platform weights.
1) First, a heterogeneous graph is constructed from all log data according to the following rule: each alarm event is mapped to a node of the graph structure, and the IP (Internet Protocol address) of each device/platform is mapped to a node; if an alarm event occurs between two devices/platforms, two undirected edges are added among the three nodes.
The collected alarms are shown in Table 1:
[Table 1: collected alarms (image not reproduced)]
From these alarms, a graph structure of the form shown in FIG. 3 is obtained, where circular nodes represent IPs and square nodes represent events.
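The construction rule in 1) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the alarm rows are hypothetical (Table 1 is not reproduced), each alarm is assumed to be a (source IP, destination IP, event type) triple, and event nodes are assumed to be keyed by event type, consistent with G1 later treating each node as an event type.

```python
from collections import defaultdict

# Hypothetical alarm rows: (source IP, destination IP, event type).
alarms = [
    ("10.0.0.1", "10.0.0.2", "A"),
    ("10.0.0.3", "10.0.0.2", "B"),
    ("10.0.0.1", "10.0.0.4", "C"),
]

def build_heterogeneous_graph(alarms):
    """Return an undirected adjacency map over IP nodes and event nodes.

    Each alarm adds two edges: event node <-> source IP node and
    event node <-> destination IP node; no direction is kept.
    Nodes are tagged ("ip", ...) or ("event", ...) so types never collide.
    """
    adj = defaultdict(set)
    for src, dst, event_type in alarms:
        event_node = ("event", event_type)
        for ip in (src, dst):
            ip_node = ("ip", ip)
            adj[event_node].add(ip_node)
            adj[ip_node].add(event_node)
    return adj

hetero = build_heterogeneous_graph(alarms)
```

Tagging nodes with their type keeps the graph heterogeneous in an explicit way, which the meta-path projection relies on.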
The meta-path is designed as event-IP-event (ignoring the difference between source and destination IPs), which yields the homogeneous graph among event nodes shown in FIG. 4, denoted G1 = (V, Edge), where V is the node set (each node represents an event type) and Edge is the set of their connection relations.
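The event-IP-event meta-path projection can be sketched as below, assuming the heterogeneous graph is given as (event type, IP) incidences with source and destination IPs not distinguished. Under this sketch two event types are joined in G1 whenever some IP is adjacent to both; the sample incidences are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (event type, IP) incidences from the heterogeneous graph.
incidences = [
    ("A", "10.0.0.1"), ("A", "10.0.0.2"),
    ("B", "10.0.0.2"),
    ("C", "10.0.0.1"), ("C", "10.0.0.4"),
    ("D", "10.0.0.9"),
]

def project_event_ip_event(incidences):
    """Project the event-IP graph onto event nodes via the meta-path
    event-IP-event. Returns (V, Edge), Edge holding sorted pairs."""
    events_by_ip = defaultdict(set)
    for event, ip in incidences:
        events_by_ip[ip].add(event)
    nodes = {event for event, _ in incidences}
    edges = set()
    for events in events_by_ip.values():
        # Every pair of events sharing this IP is one meta-path instance.
        for a, b in combinations(sorted(events), 2):
            edges.add((a, b))
    return nodes, edges

V, Edge = project_event_ip_event(incidences)
```

Event type D shares no IP with any other event, so it remains an isolated node of G1.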
2) The connection relations and weights among all devices/platforms are then extracted from the original log data. The weight is the attack count: if multiple attacks occur from device 1 to device 2/platform 2, the number of attacks is recorded as the weight. This yields Table 2, in a data form similar to Table 1:
[Table 2 (image not reproduced)]
From this information, a directed weighted graph G2 = (V, Edge, w) is generated, where V is the node set; Edge is the set of edges representing the connection relations, each directed consistently with the attack direction between devices; and w is the attack count, which is used to calculate the second-order similarity of devices.
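A minimal sketch of building G2, assuming each attack alarm has already been reduced to a (source IP, destination IP) pair; the pairs below are hypothetical. The weight w of a directed edge is simply the number of times that attack direction occurs.

```python
from collections import Counter

# Hypothetical (source IP, destination IP) pairs, one per attack alarm.
attacks = [
    ("10.0.0.1", "10.0.0.2"),
    ("10.0.0.1", "10.0.0.2"),
    ("10.0.0.3", "10.0.0.2"),
    ("10.0.0.2", "10.0.0.1"),
]

def build_directed_weighted_graph(attacks):
    """Return G2 as {(u, v): w}: one directed edge u -> v per observed
    attack direction, weighted by how many attacks went from u to v."""
    return dict(Counter(attacks))

G2 = build_directed_weighted_graph(attacks)
```

Note that 10.0.0.1 -> 10.0.0.2 and 10.0.0.2 -> 10.0.0.1 stay separate edges with separate weights, because the edges follow the attack direction.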
Step 103: determining a homogeneous graph from the heterogeneous graph and the meta-path; the nodes of the homogeneous graph are events, and the homogeneous graph represents event types and the connection relations between events.
Step 104: an embedded representation of the event is computed from the isomorphic graph using a graph convolution network algorithm.
Step 105: computing an embedded representation of the device using a large scale information network embedding algorithm according to the directed weighted graph. Step 105, specifically comprising: computing an embedded representation of the device using second order similarity in a large scale information network embedding algorithm according to the directed weighted graph.
The invention combines the graph convolutional network (GCN) algorithm with the large-scale information network embedding (LINE) algorithm to calculate the embedded representations of all nodes; LINE can learn embeddings for a network with millions of vertices and billions of edges within a few hours on a typical single machine.
A GCN is applied to G1 to calculate the embedded representation of each event, from which event-to-event similarity is obtained. The nodes of G1 are events; GCN is an existing algorithm, and its output is an embedded representation of each event, e.g. event a: (0.2, 0.5, 0.8, ...), which can be understood here as a feature vector. Similarity is subsequently calculated with a vector similarity algorithm.
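The patent does not specify the GCN architecture. As an assumption, the sketch below uses the common Kipf and Welling propagation rule, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), for a single layer. The one-hot input features and the random untrained weights are also assumptions; they only show how G1's adjacency turns into one vector per event type.

```python
import numpy as np

def gcn_layer(adj, features, weight):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj:      (n, n) symmetric 0/1 adjacency of G1 (event-type nodes)
    features: (n, f) input node features H
    weight:   (f, d) layer weights W (learned in real training)
    """
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    deg = a_hat.sum(axis=1)                     # degrees of A + I
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))    # D^-1/2
    norm = d_inv_sqrt @ a_hat @ d_inv_sqrt      # symmetric normalisation
    return np.maximum(norm @ features @ weight, 0.0)

# Toy G1 with edges A-B and A-C (node order: A, B, C).
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=float)
features = np.eye(3)                            # one-hot: no prior knowledge
rng = np.random.default_rng(0)
weight = rng.normal(size=(3, 2))                # untrained, illustration only
embeddings = gcn_layer(adj, features, weight)   # one 2-d vector per event type
```

In practice several such layers would be stacked and W trained against a loss, as described for the GCN model later in this step.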
For G2, the large-scale information network embedding algorithm is used, which defines two similarities: first-order similarity and second-order similarity. The nodes of G2 are devices. Second-order similarity describes the indirect similarity of nodes: as shown in FIG. 5, although there is no direct edge between device 5 and device 6, both are connected to devices/platforms 1, 2, 3 and 4, which indicates that device 5 and device 6 are similar. Formally, let $p_u = (w_{u,1}, \dots, w_{u,|V|})$ denote the first-order similarity between vertex $u$ and all other vertices; the second-order similarity between $u$ and $v$ is then the similarity between $p_u$ and $p_v$. If $u$ and $v$ share no neighboring vertex, their second-order similarity is 0. Second-order similarity thus learns the relationships among device IPs that attack, or are attacked by, the same devices.
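The formal definition above can be checked directly on a weight matrix. The sketch assumes cosine similarity between $p_u$ and $p_v$ (the text only says "similarity") and a toy weight matrix mimicking FIG. 5, with row and column indices equal to device numbers. LINE itself learns embeddings that preserve this quantity rather than computing it explicitly.

```python
import numpy as np

def second_order_similarity(w, u, v):
    """Cosine similarity between p_u = w[u] and p_v = w[v].

    w[u, k] is the weight of edge u -> k (0 if absent). With non-negative
    weights the result is 0 when u and v share no neighbour, and 0 is also
    returned when either row is all zero."""
    p_u, p_v = w[u], w[v]
    denom = np.linalg.norm(p_u) * np.linalg.norm(p_v)
    return float(p_u @ p_v / denom) if denom else 0.0

# Toy graph after FIG. 5: devices 5 and 6 both link to 1-4; 6 also links to 7.
w = np.zeros((8, 8))
for k in range(1, 5):
    w[5, k] = w[6, k] = 1.0
w[6, 7] = 1.0

sim_5_6 = second_order_similarity(w, 5, 6)   # shared neighbours 1-4
sim_5_7 = second_order_similarity(w, 5, 7)   # 7 has no outgoing edges
```

Devices 5 and 6 come out highly similar despite having no edge between them, which is exactly the indirect relationship second-order similarity is meant to capture.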
First-order similarity represents the direct similarity between nodes, such as nodes 6 and 7 in FIG. 5, which are similar because they are directly connected; this is already reflected in G1, so first-order similarity is not calculated here. Second-order similarity represents the indirect connection between nodes, such as nodes 5 and 6 in FIG. 5, which have no direct connection but are both connected to nodes 1, 2, 3 and 4 and therefore also have a certain similarity. For G2, only second-order similarity is calculated.
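For reference, the second-order objective in the original LINE formulation is given below; the patent does not restate it, so the notation here follows LINE rather than this document. Each vertex $v_i$ has a vertex embedding $\vec{u}_i$ and a separate context embedding $\vec{u}_i^{\prime}$, and training minimizes the weighted negative log-likelihood of observed edges:

```latex
p_2(v_j \mid v_i) = \frac{\exp\left(\vec{u}_j^{\prime\top} \vec{u}_i\right)}
                         {\sum_{k=1}^{|V|} \exp\left(\vec{u}_k^{\prime\top} \vec{u}_i\right)},
\qquad
O_2 = -\sum_{(i,j) \in E} w_{ij} \log p_2\left(v_j \mid v_i\right)
```

Here $w_{ij}$ plays the role of the attack count w in G2. In practice LINE approximates the softmax with negative sampling, so vertices with similar neighbour sets (such as devices 5 and 6) end up with similar $\vec{u}_i$.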
The embedded representations of the nodes are therefore calculated by fusing the two algorithms. The two graphs are input into the GCN and LINE models respectively, and the models are trained, with hyperparameters (embedding_dim, batch_size, learning_rate, num_lots) adjusted until the loss falls below a set threshold. The models are then saved and output a representation vector for each event and each IP, giving the two kinds of embedded representations.
Step 106: determining event type node similarity and device IP node similarity, or event type node similarity and platform IP node similarity, from the embedded representation of the events and the embedded representation of the devices using a similarity algorithm.
Using the node embeddings computed by the two models, a similarity algorithm finds the Top n most similar IPs or event types for each IP or event type, where Top n means the n IPs or event types with the highest similarity. Cosine similarity is used, and the results are filtered again by a set similarity threshold to find all IP/event pairs whose similarity exceeds the threshold, as shown in Tables 3 and 4:
[Table 3: similar device table (image not reproduced)]
[Table 4: similar event type table (image not reproduced)]
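The Top n plus threshold filtering can be sketched with NumPy. The names, vectors, `top_n`, and `threshold` values are hypothetical, and the embeddings are assumed to be non-zero so that normalisation is defined.

```python
import numpy as np

def similar_pairs(names, vectors, top_n=3, threshold=0.8):
    """For each node keep its top_n most cosine-similar other nodes, then
    keep only pairs whose similarity exceeds threshold.
    Returns {(a, b): similarity} with a < b, so each pair appears once."""
    x = np.asarray(vectors, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)   # unit-normalise rows
    sims = x @ x.T                                      # cosine similarity matrix
    np.fill_diagonal(sims, -np.inf)                     # exclude self-matches
    pairs = {}
    for i, name in enumerate(names):
        for j in np.argsort(-sims[i])[:top_n]:          # top_n neighbours of i
            if sims[i, j] > threshold:
                a, b = sorted((name, names[j]))
                pairs[(a, b)] = float(sims[i, j])
    return pairs

pairs = similar_pairs(["a", "b", "c"], [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
```

Only (a, b) survives here: its cosine similarity is about 0.994, while the other pairs fall well below the 0.8 threshold. Applying the same function to IP embeddings and to event-type embeddings would yield the two similarity tables.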
Step 107: when the event type node similarity and the device IP node similarity are determined, traversing and backtracking the original alarm log within the same time window according to them to obtain an attack path; and when the event type node similarity and the platform IP node similarity are determined, traversing and backtracking the original alarm log within the same time window according to them to obtain an attack path.
When the event type node similarity and the device IP node similarity are determined, traversing and backtracking the original alarm log in the same time window according to the event type node similarity and the device IP node similarity to obtain an attack path, which specifically includes: when the event type node similarity and the equipment IP node similarity are determined, traversing the original alarm log in the same time window by using a spark partitioning method according to the event type node similarity and the equipment IP node similarity to obtain an attack path table; and determining an attack path according to the attack path table.
When the event type node similarity and the platform IP node similarity are determined, traversing and backtracking the original alarm log in the same time window according to the event type node similarity and the platform IP node similarity to obtain an attack path, which specifically includes: when the event type node similarity and the platform IP node similarity are determined, traversing the original alarm log in the same time window by using a spark partitioning method according to the event type node similarity and the platform IP node similarity to obtain an attack path table; and determining an attack path according to the attack path table. spark is a fast, general-purpose computing engine designed for large-scale data processing.
This step runs as a scheduled task, for example once per hour, and each run processes only the alarms from that hour. The resulting attack paths are stored as dictionaries in a big-data read/write component such as Redis, and the next hourly run reads and updates the attack paths produced by the previous round.
Based on the similar device table and the similar event type table, the alarm events within the same time window, each carrying features such as event type, time, and IP, are traversed and backtracked as follows:
1. First, the alarm stream of the current batch (for example, one hour; the time window is configurable) is preprocessed. Each alarm event carries its specific information, such as source and destination IPs, ports, event type, and time, and the events are sorted by occurrence time.
The events are then traversed one by one in order of occurrence. Denote the alarm stream generated by a device as event 1, event 2, event 3, ..., event 11. To find the events related to event 1, the event pairs (1, 2), (1, 3), (1, 4), ..., (1, 11) are traversed; Spark partitions can be used to compare the pairs in parallel and improve computational efficiency. Two compared events are related if both their IPs and their event types are similar. For example, for the pair (1, 3), suppose event 1 has event type A and device IP a, and event 3 has event type C and device IP f. From the previously computed similar device table and similar event type table (Tables 3 and 4), device a is similar to device f and event type A is similar to event type C, so event 3 is a related event of event 1. The other related events of event 1 are found in the same way, and event 1 is then added to an attack path together with its related events, such as event 3.
Specifically, the alarm stream of the current batch is first compared against the alarm events of the historical period (Spark partitions can again be used for parallel comparison to improve efficiency); if two compared events have similar IPs and similar event types, they are added to an attack chain in order. The alarms within the current batch are then compared pairwise in the same way, and the attack path table is updated accordingly.
2. The attack path table is then updated in sequence according to the following rules:
(1) If no new event has been added to an attack path for n days (n is configurable), that attack path is no longer updated: it is moved to a separate historical attack path table and is not read in subsequent updates.
(2) If the related events of event a all lie in a single attack path, a is added directly to that path and the path's update time is refreshed.
(3) If the related events of event a lie in several attack paths, those paths are merged, a is added to the merged path, the original paths are deleted, and the update time is refreshed.
(4) If none of the related events of event a lie in any attack path, a and its related events are put into a new attack path.
(5) If event a has no related events, a is stored on its own as an attack chain in the overall attack path table, so that it is not missed if related events arrive later.
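Step 1 and the update rules (2) to (5) can be sketched in Python as follows. This is a minimal single-process sketch under stated assumptions, not the Spark implementation described above: the `Alarm` record, the dictionary-of-sets form of the similar device and similar event type tables, treating a node as similar to itself, using the event time as the update time, and keeping the first matching path's id when paths are merged are all assumptions. When an event's related events are partly in existing paths and partly not (a case the rules leave open), the sketch merges them all into one path. Rule (1), expiry, is a separate check.

```python
from dataclasses import dataclass
import uuid


@dataclass(frozen=True)
class Alarm:
    event_id: str
    event_type: str
    device_ip: str
    time: float  # occurrence time, e.g. epoch seconds


def is_similar(x, y, table):
    # Assumption: a node counts as similar to itself. `table` maps a node to the
    # set of nodes judged similar to it (similar device / similar event type table).
    return x == y or y in table.get(x, set())


def related_events(alarm, candidates, similar_ips, similar_types):
    """Ids of candidates whose device IP AND event type are both similar to `alarm`'s."""
    return [c.event_id for c in candidates
            if c.event_id != alarm.event_id
            and is_similar(alarm.device_ip, c.device_ip, similar_ips)
            and is_similar(alarm.event_type, c.event_type, similar_types)]


def add_to_paths(event_id, related, paths, now):
    """Apply update rules (2)-(5) for one event; `paths` maps path id -> record."""
    members = {event_id, *related}
    hit = [pid for pid, p in paths.items() if members & set(p["attecklist"])]
    if not hit:
        # Rules (4)/(5): no existing path touches this event, so open a new one
        # (a single-event chain when there are no related events).
        pid = str(uuid.uuid4())
        paths[pid] = {"attecklist": [event_id, *related], "starttime": now, "updatetime": now}
        return pid
    # Rules (2)/(3): merge every path that touches the event or its related
    # events into the first one (a single hit is rule (2)) and delete the rest.
    keep = hit[0]
    merged = list(paths[keep]["attecklist"])
    start = paths[keep]["starttime"]
    for pid in hit[1:]:
        old = paths.pop(pid)
        merged.extend(old["attecklist"])
        start = min(start, old["starttime"])
    # dict.fromkeys removes duplicates while keeping first-seen order.
    paths[keep]["attecklist"] = list(dict.fromkeys([*merged, event_id, *related]))
    paths[keep]["starttime"] = start
    paths[keep]["updatetime"] = now
    return keep


def restore_paths(alarms, similar_ips, similar_types, paths=None):
    """One batch: sort by occurrence time, compare each event with the later ones."""
    paths = {} if paths is None else paths
    ordered = sorted(alarms, key=lambda a: a.time)
    for i, alarm in enumerate(ordered):
        rel = related_events(alarm, ordered[i + 1:], similar_ips, similar_types)
        add_to_paths(alarm.event_id, rel, paths, now=alarm.time)
    return paths
```

With the example from step 1 (device a similar to f, event type A similar to C), events e1 (A, a) and e3 (C, f) end up in one path, while an unrelated e2 forms its own single-event chain, as rule (5) requires.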
Attack path result storage sample (JSON): {"id": "uuid1", "attecklist": ["event id1", "event id2", "event id3", "event id4"], "updatetime": "2022-03-25T00:…", "starttime": "2022-03-25T00:…"}. The sample shows the fields stored for each attack path result: id is the unique identifier of the attack path; attecklist is the attack chain, holding the id of each event in the path; updatetime is the time the path was last updated; and starttime is the time the attack started.
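A minimal sketch of building and serializing such a record in Python. The field names are taken from the sample; ISO-8601 timestamps are an assumption (the sample's timestamps are truncated), as is keying the stored dictionary by path id. The helper names `make_path_record` and `dump_paths` are hypothetical.

```python
import json
from datetime import datetime, timezone


def make_path_record(path_id, event_ids, start, update):
    """One attack-path result in the sample's layout."""
    return {
        "id": path_id,                     # unique id of the attack path, e.g. a UUID
        "attecklist": list(event_ids),     # attack chain: ids of its events, in order
        "updatetime": update.isoformat(),  # last time a new event was added
        "starttime": start.isoformat(),    # time the attack started
    }


def dump_paths(records):
    """Serialize all records as one JSON object keyed by path id (assumed layout)."""
    return json.dumps({r["id"]: r for r in records})


rec = make_path_record(
    "uuid1",
    ["event id1", "event id2"],
    start=datetime(2022, 3, 25, tzinfo=timezone.utc),
    update=datetime(2022, 3, 25, 1, tzinfo=timezone.utc),
)
```

The resulting string can be written to whatever key-value store holds the attack path table and read back with `json.loads` at the start of the next scheduled run.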
Before each run of the scheduled task updates the attack path results, it first checks whether each attack path in the results has expired, that is, whether its updatetime lies too far before the current time, so that only results within the time window (configurable) are updated. If no new alarm has been added to an attack path for a long time, the attacker has probably finished the attack and the path need not be updated. Concretely, the difference between the current time and the path's updatetime field is computed; if it exceeds a set threshold (for example, one month), the attack path is considered expired and is moved to another table, and expired paths do not take part in the next full update. Moving long-idle attack chains into separate Redis tables reduces the number of chains to traverse, which improves computational efficiency, while keeping the content of aged attack chains available for inspection.
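The expiry check can be sketched as follows, assuming ISO-8601 updatetime strings and plain dictionaries standing in for the active and historical Redis tables. The 30-day default is an assumed stand-in for "one month", and `split_expired` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def split_expired(paths, now, threshold=timedelta(days=30)):
    """Split attack paths into (active, expired) by the age of their updatetime.

    paths: path id -> record whose "updatetime" is an ISO-8601 string.
    Expired records would be moved to the historical table and skipped
    in subsequent updates; active ones take part in the next update.
    """
    active, expired = {}, {}
    for pid, rec in paths.items():
        age = now - datetime.fromisoformat(rec["updatetime"])
        (expired if age > threshold else active)[pid] = rec
    return active, expired


now = datetime(2022, 5, 1, tzinfo=timezone.utc)
paths = {
    "old": {"updatetime": "2022-03-25T00:00:00+00:00"},  # idle for 37 days
    "new": {"updatetime": "2022-04-25T00:00:00+00:00"},  # idle for 6 days
}
active, expired = split_expired(paths, now)
```

Here "old" lands in `expired` and "new" stays in `active`.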
Each device or platform has its own IP, which generally does not change. Alarm logs over a period of time are collected in the form of which alarm occurred between which source IP and destination IP, and a graph structure covering all alarms is constructed from the source-to-destination IP connections. From this graph, an embedded representation of each node can be computed and then used to compute the similarity between nodes.
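The graph construction just described can be sketched in Python, assuming each alarm is reduced to a `(src_ip, dst_ip, event_type)` tuple and that an edge's weight is simply the number of alarms seen in that direction (the "attack count"). The function names and the sample IPs are illustrative only.

```python
from collections import Counter, defaultdict


def build_directed_weighted_graph(alarms):
    """alarms: iterable of (src_ip, dst_ip, event_type) tuples.

    Returns {(src_ip, dst_ip): weight}, where weight counts the alarms
    observed from src_ip to dst_ip; direction is preserved.
    """
    return dict(Counter((src, dst) for src, dst, _event_type in alarms))


def out_neighbors(edges):
    """Adjacency view of the same graph: src_ip -> {dst_ip: weight}."""
    adj = defaultdict(dict)
    for (src, dst), weight in edges.items():
        adj[src][dst] = weight
    return dict(adj)


alarms = [
    ("10.0.0.1", "10.0.0.2", "scan"),
    ("10.0.0.1", "10.0.0.2", "exploit"),
    ("10.0.0.2", "10.0.0.3", "scan"),
]
graph = build_directed_weighted_graph(alarms)
```

Two alarms from 10.0.0.1 to 10.0.0.2 give that edge weight 2, while the reverse direction has no edge at all.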
For industrial control scenarios, the invention provides an attack path restoration method based on a graph algorithm that can effectively trace attack paths that progress slowly and persistently across multiple devices. Massive volumes of industrial-site security alarm events are collected as the data set and processed into graph structures according to information such as device IP (Internet Protocol) address and event type; the similar nodes of each node are then found by computing the similarity between nodes, and the attack paths are finally restored from these similar nodes.
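The text does not name the similarity algorithm used to turn node embeddings into similar-node tables. A common choice is cosine similarity with a fixed cut-off, sketched below in pure Python; both the cosine measure and the 0.9 threshold are assumptions, and the output has the dictionary-of-sets shape assumed for the similar device and similar event type tables elsewhere.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similar_table(embeddings, threshold=0.9):
    """node -> set of other nodes whose embeddings have cosine >= threshold."""
    nodes = list(embeddings)
    table = {n: set() for n in nodes}
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                table[a].add(b)
                table[b].add(a)
    return table


embeddings = {"a": [1.0, 0.0], "f": [0.99, 0.1], "b": [0.0, 1.0]}
table = similar_table(embeddings)
```

Here devices a and f are marked similar (cosine of about 0.995), while b is similar to neither. A top-k nearest-neighbour rule would be an equally plausible alternative to a fixed threshold.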
The raw data are processed into two graph structures. In the first, alarm events in the industrial control scenario are represented as a heterogeneous graph, from which a homogeneous graph among events is extracted based on a meta-path; the direction and number of attack events are ignored, and this graph is used to compute the similarity between event types. In the second, the attack paths between devices/platforms, together with the number of occurrences and the direction of the attack events, are mapped into a directed weighted graph, which is used to compute the second-order similarity between nodes. Because the invention combines the occurrence count and direction of attack events with the relationships between event types, it is more accurate than a single calculation approach; the process requires no prior knowledge and can greatly reduce the burden of manual analysis.
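The meta-path extraction of the first graph can be sketched as a projection: in the heterogeneous graph, an event type is linked to every device it occurred on, and the meta-path event type, device, event type links two event types whenever they share a device. The sketch assumes the homogeneous graph's nodes are event types and that one shared device is enough for an edge; the exact meta-path used by the invention is not spelled out here, and `project_event_graph` is a hypothetical name.

```python
from itertools import combinations


def project_event_graph(occurrences):
    """Project heterogeneous (event type)-(device) edges onto event types
    along the meta-path event type -> device -> event type.

    occurrences: iterable of (event_type, device_ip) pairs, one per alarm.
    Returns an undirected, unweighted edge set: repeat counts and attack
    direction are discarded, as the text describes.
    """
    types_on_device = {}
    for event_type, device_ip in occurrences:
        types_on_device.setdefault(device_ip, set()).add(event_type)
    edges = set()
    for types in types_on_device.values():
        # sorted() gives each undirected edge one canonical (a, b) order
        for a, b in combinations(sorted(types), 2):
            edges.add((a, b))
    return edges


occurrences = [("A", "a"), ("B", "a"), ("A", "a"), ("C", "f"), ("A", "f")]
event_edges = project_event_graph(occurrences)
```

Event types A and B co-occur on device a and A and C on device f, so the projected graph has the edges (A, B) and (A, C); the repeated (A, a) alarm adds nothing.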
The invention continuously learns from alarm events in the industrial scenario and can therefore learn new attack patterns. The program is designed as a scheduled task whose run interval can be set manually according to the available hardware; each run finds the alarms within that interval that match an attack chain pattern in the industrial control scenario, greatly reducing the manpower required.
As shown in fig. 2, the present invention further provides an attack path restoration system based on a graph algorithm, including:
The attack log collection module is used to acquire the original alarm logs of the industrial control site; the original alarm logs include industrial host data and business application data.
The attack log graph structuring module is used to construct a heterogeneous graph and a directed weighted graph from the original alarm logs; the directed weighted graph is a graph structure representing the connections and attack counts between devices and platforms.
The homogeneous graph determination module is used to determine a homogeneous graph from the heterogeneous graph and a meta-path; the nodes of the homogeneous graph are events, and the graph represents event types and the connections between events.
The graph convolutional network (GCN) embedding calculation module is used to compute the embedded representations of the events from the homogeneous graph using a graph convolutional network algorithm.
The large-scale information network embedding (LINE) calculation module is used to compute the embedded representations of the devices from the directed weighted graph using the large-scale information network embedding algorithm.
The similarity calculation module is used to determine, by applying a similarity algorithm to the embedded representations of the events and of the devices respectively, either the event type node similarity and the device IP node similarity, or the event type node similarity and the platform IP node similarity.
The attack path restoration module is used, when the event type node similarity and the device IP node similarity have been determined, to traverse and backtrack the original alarm logs within the same time window according to these two similarities to obtain an attack path; and, when the event type node similarity and the platform IP node similarity have been determined, to traverse and backtrack the original alarm logs within the same time window according to the event type node similarity and the platform IP node similarity to obtain an attack path. The attack path restoration module corresponds to the attack path tracing module in Fig. 2.
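The patent names a graph convolutional network for the event embeddings but not its architecture or training objective. The sketch below shows only the forward pass of one standard GCN layer, using the symmetric-normalized propagation rule of Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), in pure Python on a small dense adjacency matrix of the homogeneous event graph. The weights are fixed illustrative values; in practice they would be learned, and several layers may be stacked.

```python
import math


def matmul(a, b):
    """Dense matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, features, weight):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 @ H @ W).

    adj: n x n symmetric 0/1 adjacency matrix; features: n x d; weight: d x k.
    """
    n = len(adj)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # >= 1 thanks to the self-loops
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(norm, matmul(features, weight))
    return [[max(0.0, x) for x in row] for row in out]


adj = [[0, 1, 0],
       [1, 0, 0],
       [0, 0, 0]]            # nodes 0 and 1 connected, node 2 isolated
features = [[1.0], [0.0], [2.0]]
weight = [[1.0]]
embedded = gcn_layer(adj, features, weight)
```

Nodes 0 and 1 average their features to 0.5 each, while the isolated node 2 keeps its own value of 2.0.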
As an optional implementation manner, the attack log graph structuring module specifically includes:
The heterogeneous graph construction unit is used to map all devices and events in the original alarm logs to nodes of a graph structure, and to construct the heterogeneous graph using, as edges, whether an alarm event in the alarm logs occurred on a device.
The directed weighted graph construction unit is used to map each device in the original alarm logs to a node of a graph structure and to construct the directed weighted graph from all connection relations and their weights. The connection relations include device-to-device, device-to-platform, and platform-to-platform connections, and the weights are defined on the same three kinds of connections.
As an optional implementation, the large-scale information network embedding (LINE) calculation module specifically includes:
a device embedding calculation unit, used to compute the embedded representations of the devices from the directed weighted graph using the second-order similarity of the large-scale information network embedding algorithm.
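For reference, the second-order proximity objective of LINE as published by Tang et al. (2015) is given below; the patent does not state its training details (embedding dimension, number of negative samples K, and so on), so these remain parameters. Here u_i is the vector of node v_i in its role as a source, u'_j is the "context" vector of v_j, w_ij is the weight of the directed edge (i, j), presumably the attack count from device i to device j in this setting, and u_i is what would serve as the device embedding. The softmax over all |V| nodes is usually replaced by the negative-sampling term shown second, with negatives drawn from a noise distribution proportional to out-degree d_v to the power 3/4.

```latex
p_2(v_j \mid v_i) = \frac{\exp\left(\mathbf{u}'^{\top}_j \mathbf{u}_i\right)}
                         {\sum_{k=1}^{|V|} \exp\left(\mathbf{u}'^{\top}_k \mathbf{u}_i\right)},
\qquad
O_2 = -\sum_{(i,j) \in E} w_{ij} \log p_2(v_j \mid v_i)

\log \sigma\left(\mathbf{u}'^{\top}_j \mathbf{u}_i\right)
  + \sum_{n=1}^{K} \mathbb{E}_{v_n \sim P_n(v)}
    \left[\log \sigma\left(-\mathbf{u}'^{\top}_n \mathbf{u}_i\right)\right],
\qquad
P_n(v) \propto d_v^{3/4}
```

Two devices that attack, or are attacked from, similar sets of neighbours receive similar u vectors even if they never connect directly, which is why second-order proximity suits the "similar device" tables used for path restoration.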
By fusing the embedded representations of IPs and events, the invention makes attack chain tracing more accurate and reduces false positives. It periodically collects alarm data from the industrial control site, learns from it, and outputs the attack process without requiring prior knowledge or manual assistance; the more it has learned, the more accurate its results. Converting the site's alarm logs into graph structures suits the scenario of massive alarm logs, speeds up computation, and allows similar devices and similar events to be learned for each device and event.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (8)

1. An attack path restoration method based on a graph algorithm is characterized by comprising the following steps:
acquiring an original alarm log of an industrial control site; the original alarm log comprises industrial host data and service application data;
constructing a heterogeneous graph and a directed weighted graph, respectively, from the original alarm logs; the directed weighted graph is a graph structure representing the connection relations and attack counts between devices and platforms;
determining a homogeneous graph from the heterogeneous graph and a meta-path; the nodes of the homogeneous graph are events; the homogeneous graph represents event types and the connection relations between events;
calculating the embedded representations of the events from the homogeneous graph using a graph convolutional network algorithm;
calculating the embedded representations of the devices from the directed weighted graph using a large-scale information network embedding algorithm;
determining, by applying a similarity algorithm to the embedded representations of the events and of the devices, either the event type node similarity and the device IP node similarity, or the event type node similarity and the platform IP node similarity;
when the event type node similarity and the device IP node similarity are determined, traversing and backtracking the original alarm logs within the same time window according to the event type node similarity and the device IP node similarity to obtain an attack path; and when the event type node similarity and the platform IP node similarity are determined, traversing and backtracking the original alarm logs within the same time window according to the event type node similarity and the platform IP node similarity to obtain an attack path.
2. The attack path restoration method based on the graph algorithm according to claim 1, wherein constructing a heterogeneous graph and a directed weighted graph, respectively, from the original alarm logs specifically comprises:
mapping all devices and events in the original alarm logs to nodes of a graph structure, and constructing the heterogeneous graph using, as edges of the graph structure, whether an alarm event in the alarm logs occurred on a device;
mapping each device in the original alarm logs to a node of a graph structure, and constructing the directed weighted graph from all connection relations and weights; the connection relations comprise device-to-device, device-to-platform, and platform-to-platform connections; the weights comprise device-to-device, device-to-platform, and platform-to-platform weights.
3. The attack path restoration method based on the graph algorithm according to claim 1, wherein calculating the embedded representations of the devices from the directed weighted graph using the large-scale information network embedding algorithm specifically comprises:
calculating the embedded representations of the devices from the directed weighted graph using the second-order similarity of the large-scale information network embedding algorithm.
4. The attack path restoration method based on the graph algorithm according to claim 1, wherein, when the event type node similarity and the device IP node similarity are determined, traversing and backtracking the original alarm logs within the same time window according to the event type node similarity and the device IP node similarity to obtain an attack path specifically comprises:
when the event type node similarity and the device IP node similarity are determined, traversing the original alarm logs within the same time window using Spark partitioning, according to the event type node similarity and the device IP node similarity, to obtain an attack path table;
and determining the attack path from the attack path table.
5. The attack path restoration method based on the graph algorithm according to claim 1, wherein, when the event type node similarity and the platform IP node similarity are determined, traversing and backtracking the original alarm logs within the same time window according to the event type node similarity and the platform IP node similarity to obtain an attack path specifically comprises:
when the event type node similarity and the platform IP node similarity are determined, traversing the original alarm logs within the same time window using Spark partitioning, according to the event type node similarity and the platform IP node similarity, to obtain an attack path table;
and determining the attack path from the attack path table.
6. An attack path restoration system based on a graph algorithm, comprising:
an attack log collection module, used to acquire the original alarm logs of an industrial control site; the original alarm logs comprise industrial host data and service application data;
an attack log graph structuring module, used to construct a heterogeneous graph and a directed weighted graph, respectively, from the original alarm logs; the directed weighted graph is a graph structure representing the connection relations and attack counts between devices and platforms;
a homogeneous graph determination module, used to determine a homogeneous graph from the heterogeneous graph and a meta-path; the nodes of the homogeneous graph are events; the homogeneous graph represents event types and the connection relations between events;
a graph convolutional network embedding calculation module, used to calculate the embedded representations of the events from the homogeneous graph using a graph convolutional network algorithm;
a large-scale information network embedding calculation module, used to calculate the embedded representations of the devices from the directed weighted graph using a large-scale information network embedding algorithm;
a similarity calculation module, used to determine, by applying a similarity algorithm to the embedded representations of the events and of the devices, either the event type node similarity and the device IP node similarity, or the event type node similarity and the platform IP node similarity;
an attack path restoration module, used, when the event type node similarity and the device IP node similarity are determined, to traverse and backtrack the original alarm logs within the same time window according to the event type node similarity and the device IP node similarity to obtain an attack path; and, when the event type node similarity and the platform IP node similarity are determined, to traverse and backtrack the original alarm logs within the same time window according to the event type node similarity and the platform IP node similarity to obtain an attack path.
7. The attack path restoration system based on the graph algorithm according to claim 6, wherein the attack log graph structuring module specifically comprises:
a heterogeneous graph construction unit, used to map all devices and events in the original alarm logs to nodes of a graph structure, and to construct the heterogeneous graph using, as edges of the graph structure, whether an alarm event in the alarm logs occurred on a device;
a directed weighted graph construction unit, used to map each device in the original alarm logs to a node of a graph structure and to construct the directed weighted graph from all connection relations and weights; the connection relations comprise device-to-device, device-to-platform, and platform-to-platform connections; the weights comprise device-to-device, device-to-platform, and platform-to-platform weights.
8. The attack path restoration system based on the graph algorithm according to claim 6, wherein the large-scale information network embedding calculation module specifically comprises:
a device embedding calculation unit, used to calculate the embedded representations of the devices from the directed weighted graph using the second-order similarity of the large-scale information network embedding algorithm.
CN202211651409.0A 2022-12-22 2022-12-22 Attack path restoration method and system based on graph algorithm Active CN115632888B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211651409.0A CN115632888B (en) 2022-12-22 2022-12-22 Attack path restoration method and system based on graph algorithm

Publications (2)

Publication Number Publication Date
CN115632888A true CN115632888A (en) 2023-01-20
CN115632888B CN115632888B (en) 2023-04-07

Family

ID=84910974

Country Status (1)

Country Link
CN (1) CN115632888B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109308409A (en) * 2018-10-16 2019-02-05 国网湖南省电力有限公司 A kind of attack path reconstructing method based on similarity calculation
WO2019028341A1 (en) * 2017-08-03 2019-02-07 T-Mobile Usa, Inc. Similarity search for discovering multiple vector attacks
US20210258328A1 (en) * 2019-12-17 2021-08-19 Upstream Security, Ltd. Centralized detection techniques for cyber-attacks directed at connected vehicles
CN113676484A (en) * 2021-08-27 2021-11-19 绿盟科技集团股份有限公司 Attack tracing method and device and electronic equipment
US20220150268A1 (en) * 2019-03-27 2022-05-12 British Telecommunications Public Limited Company Pre-emptive computer security
CN114637989A (en) * 2022-03-21 2022-06-17 西安电子科技大学 APT attack tracing method and system based on distributed system and storage medium
CN115277124A (en) * 2022-07-12 2022-11-01 清华大学 Online system and server for searching and matching attack mode based on system tracing graph
CN115378733A (en) * 2022-08-29 2022-11-22 北京航空航天大学 Multi-step attack scene construction method and system based on dynamic graph embedding

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117176416A (en) * 2023-09-01 2023-12-05 中国信息通信研究院 Attack partner discovery method and system based on graph model
CN117176416B (en) * 2023-09-01 2024-05-24 中国信息通信研究院 Attack partner discovery method and system based on graph model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant