CN115296924B - Network attack prediction method and device based on knowledge graph

Info

Publication number: CN115296924B
Application number: CN202211156094.2A
Authority: CN
Inventors: 饶志宏; 刘方; 徐锐; 聂大成; 陈剑锋; 许卡
Original assignee: CETC 30 Research Institute
Current assignee: CETC 30 Research Institute
Priority date: 2022-09-22
Filing date: 2022-09-22
Publication date: 2023-01-31
Anticipated expiration: 2042-09-22
Also published as: CN115296924A

Abstract

The invention discloses a network attack prediction method and device based on a knowledge graph, belonging to the field of network security. The method comprises the following steps: S101, acquiring data; S102, preprocessing the acquired data; S103, constructing a network security ontology oriented to network attacks; S104, extracting data according to the defined knowledge expression model; S105, fusing and correcting the extracted data to construct a network security knowledge graph; and S106, predicting attack events using the constructed network security knowledge graph. The invention improves the prediction accuracy of attack behavior.

Description

Network attack prediction method and device based on knowledge graph

Technical Field

The invention relates to the field of network security, in particular to a network attack prediction method and device based on a knowledge graph.

Background

Networks have penetrated every corner of people's lives, and attack strategies continue to emerge and evolve. Malicious network intrusion has developed from single, simple operations in the early stage (password cracking, file damage, web page tampering and the like) into complex, multi-means attacks (vulnerability exploitation, virus propagation, domain name hijacking, denial of service, APT attacks and the like). A single-step attack is very unlikely to threaten its target; most attackers carry out a plan with a specific goal through a series of steps and coordinated, combined attacks. As a result, network security problems are increasingly serious, and networks are easy to attack and hard to defend. Network attack prediction is a key link in achieving active defense of network security. It studies how to use massive network security data to discover the behavior and patterns of hacker intrusion, and to predict the multi-step attacks a network system may suffer in the future, the final target of the intrusion, and the facilities and equipment that may be threatened, so that effective and targeted defensive and preventive measures can be taken.

Many methods for predicting network attacks exist. Classified by prediction approach, the current mainstream methods are those based on neural networks, on game theory, on attack graphs and on data mining, along with other methods.

The neural-network-based prediction method builds on artificial neural network algorithms and has a clear advantage in learning the nonlinear characteristics of network attack event sequences. It fits target samples well, is self-learning and self-memorizing, and can capture the feature patterns of complex nonlinear data in intelligent attack events; typical work includes Tiresias, BRNN-LSTM and ALEAP. Because it relies on large-scale sample training, this method mines the logical relations and rules among network attack events with high accuracy, but it depends strongly on the quality of the data samples, requires long training time at high cost, easily falls into local minima, and is prone to overfitting, which leads to poor generalization.

The game-theory-based prediction method generally targets adversarial environments with an attack-defense game. Different game models are established according to how completely the attacker and the defender know each other's information; representative work includes the NashSVM algorithm, two-player zero-sum static games, stochastic prediction games and dynamic Bayesian games. By considering payoff-driven strategic reasoning, this method can understand the attacker's intention more deeply, including the attack target, the attack source and the relations among attack behaviors, and can describe the logical relationships among behaviors, so that the defender can play the game against the attacker and make more targeted decisions.

The attack-graph-based prediction method builds its model on a graph network structure, such as a directed attack graph, a Markov chain or a Bayesian network graph; representative work includes botnet dependency graphs, uncertainty-aware attack graphs, and a two-layer attack-defense model combining an attack graph with game theory. Such algorithms usually take entities as nodes and attack means as edges of the graph network to represent the different relations among entities. They perform well in small-scale data scenarios and require some prior knowledge as a basis.

Compared with the previous three prediction methods, data mining has a stronger ability to characterize the hidden features and internal patterns deep in the data, but it is generally used as a technical means within the process; representative work includes sentiment analysis methods, similarity sequence alignment and recommendation system construction. The data-mining-based prediction method applies statistical analysis, rule association, classification and induction to large amounts of prior knowledge such as attack alarms and detection results, mines the rules among the attack information, and classifies and predicts future attacks. Alternatively, it is combined with modeling approaches such as attack graphs and game theory, and it performs well in predicting phishing websites and social network attacks.

Existing network attack prediction methods have the following problems: (1) For some compound attacks, the multiple attack behaviors may have no direct association, or their behavioral features may be difficult to extract, for example encrypted traffic entering and leaving a router or deep data packets. For such attacks, existing prediction methods cannot associate the attack events initiated by the same attacker, so prediction errors occur. (2) The hidden features and internal patterns deep in the data can represent the logical associations between attack behaviors and the complex intentions of attackers, but existing methods cannot reason over many hidden features and implicit relations, so prediction accuracy is low. (3) The alarm information of an intrusion detection system contains false alarms and missed alarms. Alarm information is an important data source for network attack prediction, and erroneous alarm information leads to wrong attack path predictions. Existing attack prediction methods have low fault tolerance, so their prediction accuracy in practical applications is very low.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a network attack prediction method and device based on a knowledge graph, so that the prediction accuracy rate of attack behaviors is improved.

The purpose of the invention is realized by the following scheme:

A network attack prediction method based on a knowledge graph comprises the following steps:

S101, acquiring data;

S102, preprocessing the acquired data;

S103, constructing a network security ontology oriented to network attacks;

S104, extracting data according to the defined knowledge expression model;

S105, fusing and correcting the extracted data to construct a network security knowledge graph;

S106, predicting attack events using the constructed network security knowledge graph.

Further, in step S101, the acquired data includes network asset detection data, vulnerability information data, threat intelligence data, and security device log data.

Further, in step S102, the preprocessing includes data normalization processing, data deduplication and merging processing, data classification processing, and data spatio-temporal registration processing.

Further, step S103 comprises the sub-step of: defining a knowledge expression model, with knowledge expressed as triples.

Further, in step S104, various types of data triples are extracted according to the defined knowledge expression model.

Further, in step S105, the correcting includes aggregating and merging security events, correcting event credibility, correcting mutually exclusive events, removing false-alarm events, and completing missed-alarm events; the network security knowledge graph comprises an attack pattern graph, a threat intelligence graph and a network asset graph; the step specifically comprises the following sub-steps:

1) Merging the data of the same equipment;

2) Merging the consistent event data;

3) Tagging events based on threat intelligence data, analyzing credibility, correcting mutually exclusive events, and eliminating false-alarm events;

4) Establishing an attack pattern library for known attacks using expert knowledge, and completing missed-alarm events according to the attack pattern library: according to the description of the attack chain in the attack pattern library, if an attack step is found to be missing in an associated multi-step attack, judging whether the missing step is a necessary step for the next attack step in the multi-step attack; if so, inferring that the security equipment failed to report that attack event and completing the multi-step attack event; if not, proceeding directly to step 5);

5) Constructing graphs from the corrected data to form the attack pattern graph, the threat intelligence graph and the network asset graph, and using the basic database to access and store the graphs.

Further, the basic database is a Neo4j database.

Further, in step S106, the following sub-steps are included:

1) Representing the structured knowledge in the network security knowledge graph as an undirected graph G = (V, E), where V represents the set of entity nodes in the graph and E represents the set of relation edges between entities; each triple in the network security knowledge graph is represented as (h, r, t), where h and t respectively represent the linked head and tail entity nodes, and r represents the relationship between the two entity nodes; embedding the heterogeneous network of the network security knowledge graph into a low-dimensional vector space to form low-dimensional vectors;

2) On the basis of vectorization, adding constraint rule conditions and converting them into query statements of the basic database to obtain candidate subgraphs; performing similarity calculation on the candidate subgraphs, using a similarity calculation algorithm to measure the similarity between the attack event sequence detected by the security equipment and the attack pattern graph in the constructed knowledge graph; mining the hidden relations and paths of the attack event sequence; and predicting the attack path and target; the constraint rule conditions comprise the vulnerability that must be exploited for the attack event sequence to occur, the asset attributes of the attack target, and the precondition that a given attack event can occur only after a certain other attack has been successfully executed;

3) Computing, based on the dynamic time warping (DTW) algorithm, the shortest path between the vectorized attack event sequence and the attack pattern subgraph filtered by the constraint conditions;

4) Having a domain expert correct the obtained attack pattern subgraph;

5) Predicting the attack path and the attack target according to the obtained attack pattern subgraph.
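As an illustration of sub-step 2), the constraint rule conditions can be turned into a parameterized Cypher query that returns candidate attack pattern subgraphs. The sketch below only builds the query text and its parameters. The node labels (AttackPattern, Vulnerability, Step), relationship types (EXPLOITS, HAS_STEP, REQUIRES) and property names are hypothetical and are not taken from the patent.

```python
def build_candidate_query(vuln_ids, target_os, completed_step):
    """Build a Cypher query and parameters for candidate attack patterns.

    The three arguments mirror the three constraint rule conditions:
    the vulnerabilities exploited, an asset attribute of the target
    (here, its operating system), and an attack step that must already
    have succeeded. The graph schema used here is an assumption.
    """
    cypher = (
        "MATCH (p:AttackPattern)-[:EXPLOITS]->(v:Vulnerability) "
        "WHERE v.id IN $vuln_ids AND p.target_os = $target_os "
        "MATCH (p)-[:HAS_STEP]->(s:Step)-[:REQUIRES]->(pre:Step) "
        "WHERE pre.name = $completed_step "
        "RETURN DISTINCT p.name AS pattern"
    )
    params = {
        "vuln_ids": list(vuln_ids),
        "target_os": target_os,
        "completed_step": completed_step,
    }
    return cypher, params


# Placeholder identifiers, for illustration only.
query, params = build_candidate_query(["VULN-0001"], "Linux", "reconnaissance")
```

Passing the constraints as query parameters rather than formatting them into the string keeps the query text fixed and avoids injection through alarm fields.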

Further, in step 3), the method comprises the sub-steps of:

step (1): vectorizing the attack event sequence and the attack pattern subgraph filtered by the constraint conditions;

step (2): calculating a distance matrix between the vectorized attack event sequence and each attack pattern sequence in the attack pattern subgraph;

step (3): finding a path from the upper-left corner to the lower-right corner of the matrix such that the sum of the elements on the path is minimal; the attack pattern sequence with the minimal path sum is the attack pattern subgraph matched with the attack event sequence.
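Steps (1) to (3) can be sketched with the standard DTW dynamic program. This is a minimal illustration, assuming each event and each pattern step has already been embedded as a numeric vector and that Euclidean distance is the element distance; the patent does not fix the distance measure, and the function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Distance between two embedded events (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(seq_a, seq_b, dist=euclidean):
    """Minimum cumulative distance of a warping path from the
    upper-left to the lower-right corner of the distance matrix."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three admissible predecessors.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def best_pattern(events, patterns):
    """Return the name of the pattern sequence with minimal DTW cost."""
    return min(patterns, key=lambda name: dtw_distance(events, patterns[name]))


# Toy 2-D embeddings; the values are illustrative only.
observed = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
patterns = {
    "pattern_a": [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    "pattern_b": [(5.0, 5.0), (6.0, 5.0)],
}
match = best_pattern(observed, patterns)
```

Because the warping path may repeat or skip alignments, a pattern with a duplicated or missing step can still match at low cost, which is the property the method relies on when alarms are duplicated or missed.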

A network attack prediction device based on a knowledge-graph comprises a program storage unit and a program running unit, and when a program in the program storage unit is loaded by the program running unit, the network attack prediction device based on the knowledge-graph executes the network attack prediction method based on the knowledge-graph.

The beneficial effects of the invention include:

(1) The invention collects network asset detection data, vulnerability data, open-source threat intelligence data, security equipment log data and the like, and performs normalization, deduplication, cleaning, classification and spatio-temporal matching on the multi-source heterogeneous data to form data in a standardized format. A network security ontology is constructed based on knowledge of the network security domain. Combined with the network security knowledge expression model, knowledge is extracted from the network asset detection data, vulnerability data and the like; network security knowledge graphs such as an attack pattern graph, a threat intelligence graph and a network asset graph are constructed; and network attacks are predicted based on the constructed knowledge graphs.

(2) The invention extracts knowledge from network asset data, vulnerability data, threat intelligence data and security equipment log data in combination with the network security domain ontology. The constructed ontology refers to network security knowledge graph description languages at home and abroad, which improves the extensibility and compatibility of the security knowledge graph.

(3) The embodiment of the invention uses the constructed network security knowledge graph, embeds the heterogeneous network of the knowledge graph into a continuous low-dimensional vector space based on the TransE translation model, and introduces constraint conditions when calculating subgraph similarity to make the calculation more efficient. Similarity is calculated with the dynamic time warping (DTW) algorithm, which improves matching accuracy when the alarm event sequence contains false alarms and missed alarms, thereby improving the accuracy of attack path and attack target prediction.
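For readers unfamiliar with TransE, the idea is that a true triple (head, relation, tail) should satisfy head + relation ≈ tail in the embedding space, so the norm of head + relation - tail serves as a plausibility score (lower is better). The sketch below shows that score and the margin-based ranking loss commonly used to train it. The embedding values and the margin are illustrative assumptions, and no training loop is shown.

```python
import math


def transe_score(h, r, t):
    """L2 norm of h + r - t; small for plausible triples."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def margin_loss(pos_score, neg_score, margin=1.0):
    """Ranking loss: a true triple should score at least `margin`
    lower than a corrupted (negative) triple."""
    return max(0.0, margin + pos_score - neg_score)


# Toy 2-D embeddings, chosen so that the true triple fits exactly.
host = [1.0, 0.0]
has_vuln = [0.0, 1.0]
vuln = [1.0, 1.0]
unrelated = [4.0, 5.0]

pos = transe_score(host, has_vuln, vuln)       # true triple
neg = transe_score(host, has_vuln, unrelated)  # corrupted tail
loss = margin_loss(pos, neg)
```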

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a flow chart of a method for predicting cyber-attack based on a knowledge-graph according to an embodiment of the present invention;

FIG. 2 is a flowchart of attack prediction with constraints according to an embodiment of the present invention;

FIG. 3 is an example of attack event prediction.

Detailed Description

All features disclosed in all embodiments of the present specification, or all methods or process steps implicitly disclosed, may be combined and/or expanded, or substituted, in any way, except for mutually exclusive features and/or steps.

To solve the technical problems described in the background, the invention provides a network attack prediction method and device based on a knowledge graph. The technical concept is as follows: network asset detection data, vulnerability data, open-source threat intelligence data, security equipment log data and the like are collected, and the multi-source heterogeneous data is normalized, deduplicated, cleaned, classified and spatio-temporally matched. A network security ontology oriented to network attack behavior is constructed based on network security domain knowledge. Based on the network security ontology expression model, network security knowledge graphs such as an attack pattern graph, a threat intelligence graph and a network asset graph are constructed, and implicit feature mining of targets and implicit relation reasoning are performed on the constructed knowledge graphs to predict network attack behavior. By constructing these knowledge graphs and using similar-target discovery and implicit relation reasoning to mine the logical associations between attack behaviors and the attacker's intention, the method improves the accuracy of attack behavior prediction.

In order to make the objects, technical solutions and advantages of the present invention clearer and more obvious, the present invention is further described in detail below with reference to the accompanying drawings and technical solutions.

Fig. 1 is a flowchart of a method for predicting cyber-attack based on a knowledge-graph according to an embodiment of the present invention, including the following steps:

Step S101: A network port scanning tool is used to probe the target network assets, obtaining network port scan data, certificate data, DNS data, website framework data and the like. Vulnerability information is acquired by web crawler from vulnerability databases, vulnerability forums, personal blogs, Twitter, GitHub and other sources. Security event information, IOC information and the like are acquired by web crawler from security vendors' bulletins, hacker forums, security websites, third-party threat intelligence and the like. Log data of security equipment, such as firewall logs, intrusion detection system logs and sandbox logs, is obtained through cooperation.

Step S102: preprocessing network asset detection data, vulnerability data, threat information data and log data of security equipment, wherein the processing flow is as follows:

1) Data normalization: the multi-source heterogeneous data is homogenized and its field structures are unified, and the data format is then converted, including data type conversion, date and time format conversion, Chinese character encoding conversion, and code-to-name conversion;

2) Data deduplication and merging: the cleaned data is compared with the data in the database on the key fields to judge whether redundant data exists. If no redundant data exists, the data is stored directly in the database. If redundant data exists, it is judged whether the new data is identical to the existing data in every field; if so, the new data is discarded. If not, it is judged whether the fields with different values conflict. If there is no conflict, the new data is merged with the existing data; if fields of the new data conflict with the existing data, the data is merged after the conflict is resolved.

3) Data classification: the multi-source data is classified using a decision tree, producing a hierarchical, multi-level classification of the data.

4) Data spatio-temporal registration: the basic data is associated and matched with time and space (organization and geographic location information) coordinates, and each piece of basic data is labeled with a spatio-temporal coordinate tag.
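The deduplication and merging flow in item 2) can be sketched as follows. This is a minimal in-memory illustration: the store is a plain dictionary rather than a database, a missing value is modeled as None, and the conflict resolution policy (prefer the newer value) is an assumption, since the text does not say how conflicts are resolved.

```python
def prefer_new(field, old, new):
    """Assumed conflict policy: the newer value wins."""
    return new


def merge_record(store, record, key_fields, resolve=prefer_new):
    """Apply the store / discard / merge decision to one record."""
    key = tuple(record[f] for f in key_fields)
    existing = store.get(key)
    if existing is None:
        store[key] = dict(record)      # no redundant data: store directly
        return "stored"
    if existing == record:
        return "discarded"             # identical in every field
    merged = dict(existing)
    for field, value in record.items():
        old = merged.get(field)
        if old is None:
            merged[field] = value      # no conflict: fill the gap
        elif old != value:
            merged[field] = resolve(field, old, value)  # resolve, then merge
    store[key] = merged
    return "merged"


store = {}
first = {"ip": "10.0.0.1", "port": 80, "service": "http", "os": None}
second = {"ip": "10.0.0.1", "port": 80, "service": "nginx", "os": "Linux"}
results = [
    merge_record(store, first, ("ip", "port")),
    merge_record(store, dict(first), ("ip", "port")),
    merge_record(store, second, ("ip", "port")),
]
```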

Step S103: For knowledge representation, the embodiment of the present invention uses triples (entity, relationship, attribute). The ontology for network asset data is modeled with reference to CybOX 2.0: its entities include IP, port, protocol, device, operating system, certificate, domain name, AS number and the like, and its relationships include the has relationship, the belong_to relationship, the owner relationship and the like. The naming rules for entities and relationships follow those of CybOX 2.0, which facilitates compatibility with network security knowledge graph description languages at home and abroad. The triple definitions for vulnerability data, threat intelligence data and security equipment log data refer to the STIX standard. For vulnerability data, operating system, hardware device, software, protocol, vulnerability, exploit code and the like are defined as entity labels, with relationships including has, cause and the like. For threat intelligence data, entities such as device, software, protocol, vulnerability, attack event and attack tool are defined, with relationships including has, cause, belong_to and the like. For security equipment log data, entities such as IP, port and event are defined, with relationships including has, cause, belong_to and the like.
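To make the triple representation concrete, the sketch below stores a few asset and vulnerability facts as (head, relation, tail) triples using the has, belong_to and cause relations named above. The `Triple` type, the `type:value` naming of entities and all sample values are illustrative assumptions; they are not CybOX or STIX objects.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


# Hypothetical network asset facts.
asset_triples = [
    Triple("ip:10.0.0.5", "has", "port:443"),
    Triple("port:443", "has", "protocol:https"),
    Triple("ip:10.0.0.5", "belong_to", "device:web-server-01"),
]

# Hypothetical vulnerability facts.
vuln_triples = [
    Triple("os:ExampleOS-1.0", "has", "vuln:EXAMPLE-0001"),
    Triple("vuln:EXAMPLE-0001", "cause", "event:remote-code-execution"),
]


def neighbors(triples, head, relation=None):
    """Tails reachable from `head`, optionally via one relation type."""
    return [
        t.tail
        for t in triples
        if t.head == head and (relation is None or t.relation == relation)
    ]


ports = neighbors(asset_triples, "ip:10.0.0.5", "has")
```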

Step S104: extracting various data triples according to a defined knowledge expression model, and the specific steps are as follows:

1) Network asset data is structured data, so its knowledge is extracted directly with a D2R tool according to the knowledge expression model defined in step S103.

2) Vulnerability information obtained by a web crawler from web pages is semi-structured data, so entities, relations and attributes are extracted using a rule-based entity recognition algorithm; vulnerability information obtained from a vulnerability database is structured data and can be extracted directly according to the knowledge expression model defined in step S103.

3) For threat intelligence data, knowledge is extracted from the web page data acquired by the web crawler using rule-based entity recognition; structured threat intelligence data from third parties can be extracted directly according to the knowledge expression model defined in step S103.

4) For security equipment log data, an attack pattern library is established using expert knowledge, with reference to the attack mechanism classification, malicious code classification and weakness classification systems of mainstream foreign frameworks such as CAPEC and ATT&CK, and the security equipment log data is classified based on the attack pattern library. Because the attack pattern library is structured data, it can be extracted directly according to the knowledge expression model defined in step S103.
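Rule-based entity recognition on crawled text, as used in items 2) and 3), can be approximated with regular expressions. The sketch below recognizes CVE identifiers and IPv4 addresses and applies one hand-written relation rule of the form "<CVE> affects <product>". The rule, the output shape and the sample sentence are assumptions made for illustration; a real rule set would be much larger.

```python
import re

CVE_RE = re.compile(r"\bCVE-\d{4}-\d{4,}\b")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# Hypothetical relation rule: "<CVE> affects <product>" up to punctuation.
AFFECTS_RE = re.compile(r"(CVE-\d{4}-\d{4,}) affects ([^.,;]+)")


def extract(text):
    """Return recognized entities and (product, 'has', CVE) triples."""
    entities = {
        "vulnerability": sorted(set(CVE_RE.findall(text))),
        "ip": sorted(set(IPV4_RE.findall(text))),
    }
    triples = [
        (product.strip(), "has", cve)
        for cve, product in AFFECTS_RE.findall(text)
    ]
    return entities, triples


# Placeholder CVE number and a documentation-range IP address.
sample = "CVE-2099-12345 affects ExampleServer. Exploit traffic was seen from 203.0.113.7."
entities, triples = extract(sample)
```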

Step S105: The extracted data is fused and corrected, including aggregating and merging security events, correcting event credibility, correcting mutually exclusive events, removing false-alarm events, completing missed-alarm events and the like, and network security knowledge graphs such as an attack pattern graph, a threat intelligence graph and a network asset graph are constructed. The specific steps are as follows:

1) Merging the data of the same equipment;

2) Merging event data consistent with the source IP, the source port, the destination IP and the destination port;

3) Labeling events based on threat intelligence data, analyzing credibility, correcting mutually exclusive events, and rejecting false-alarm events. For example, if threat intelligence shows that an attack event exploits a vulnerability of the Windows operating system, while asset detection data shows that the attacked target runs the Linux operating system, it can be inferred that the attack cannot occur on that asset, so the attack event can be marked as a false alarm;

4) Establishing an attack pattern library aiming at known attacks by utilizing expert knowledge; completing the missed report event according to the attack mode library; according to the description of the attack chain in the attack mode library, in the related multi-step attack, if one attack step is found to be omitted, whether the omitted attack step is a necessary step of the next attack step of the multi-step attack step is judged, if yes, the fact that the security equipment fails to report the attack event is inferred, and the multi-step attack event is completed; if not, directly entering the step 5);

5) Constructing graphs from the corrected data to form the attack pattern graph, the threat intelligence graph and the network asset graph, and using Neo4j as the basic database to access and store the graphs.
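Sub-steps 3) and 4) can be sketched as two small checks. The first marks an event as a false alarm when the operating system required by the exploited vulnerability differs from the operating system of the target asset. The second fills in an unreported step when the attack chain says that step is a necessary predecessor of a step that was observed. The data layout, field names and identifiers are hypothetical.

```python
def is_false_alarm(event, asset_os, vuln_os):
    """True if the exploited vulnerability cannot apply to the target.

    asset_os maps destination IP -> operating system (asset detection
    data); vuln_os maps vulnerability id -> affected operating system
    (threat intelligence). Unknown values are never treated as a mismatch.
    """
    required = vuln_os.get(event["vuln"])
    actual = asset_os.get(event["dst_ip"])
    return required is not None and actual is not None and required != actual


def complete_missed_steps(observed, chain, required_by):
    """Return the observed steps in chain order, plus inferred steps.

    A step missing from `observed` is added only if the next step in
    the chain was observed and lists it as a necessary predecessor.
    """
    seen = set(observed)
    completed = []
    for i, step in enumerate(chain):
        if step in seen:
            completed.append(step)
            continue
        nxt = chain[i + 1] if i + 1 < len(chain) else None
        if nxt in seen and step in required_by.get(nxt, set()):
            completed.append(step)  # inferred missed alarm
    return completed


event = {"vuln": "VULN-WIN-1", "dst_ip": "10.0.0.9"}
false_alarm = is_false_alarm(event, {"10.0.0.9": "Linux"}, {"VULN-WIN-1": "Windows"})

chain = ["scan", "exploit", "implant", "command_control"]
completed = complete_missed_steps(["scan", "implant"], chain, {"implant": {"exploit"}})
```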

Step S106: by utilizing the constructed network security knowledge graph, some network anomalies and attacks can be effectively discovered, hidden relations and paths of security threats are excavated, and the attacks are predicted, as shown in FIG. 2, the specific steps are as follows:

1) The structured knowledge in the network security knowledge graph can be represented as an undirected graph G = (V, E), where V represents the set of entity nodes in the graph and E represents the set of relation edges between entities. Each triple in the security knowledge graph is represented as (h, r, t), where h and t respectively represent the linked head and tail entity nodes, and r represents the relationship between the two entity nodes. The embodiment of the invention adopts a TransE-based translation model to embed the heterogeneous network of the knowledge graph into a continuous low-dimensional vector space to form low-dimensional vectors.

2) On the basis of vectorization, a similarity calculation algorithm is used to measure the similarity between the attack event sequence detected by the security equipment and the attack pattern graph in the constructed knowledge graph, to mine the hidden relations and paths of the attack event sequence, and to predict the attack path and target. Constraint relations exist among the events of an attack sequence, and also between attack events and the environment of the attack target. If the distance between nodes were measured directly from node attributes, relations and subgraph structure, many irrelevant attack patterns would be returned, and the computational cost of the similarity calculation would grow with the scale of the graph. For example, a typical multi-step attack flow comprises target reconnaissance, tool making, tool delivery, attack penetration, installation and implantation, command and control, and malicious activities. When the security equipment has detected only the first few attack stages, similarity calculation may associate many attack patterns from the attack pattern graph, making it difficult to determine the attacker's true intention. Therefore, before the similarity is calculated, corresponding constraint rules are added, such as the vulnerability that must be exploited for the attack event sequence to occur, the asset attributes of the attack target, and the precondition that a given attack event can occur only after a certain other attack has been successfully executed. These constraint conditions are converted into Cypher, the query language of Neo4j, to obtain candidate subgraphs, and the similarity calculation is then performed on the candidate subgraphs.

3) In an attack event sequence, not every attack step is necessarily executed, multiple attack means may be used in each attack stage, and the intrusion detection system may produce false alarms or missed alarms. The similarity calculation for the attack event sequence therefore uses a time warping function W(n) to describe the temporal correspondence between the test template and the reference template, and solves for the warping function that yields the minimum cumulative distance when the two templates are matched. Through inventive effort, the embodiment of the present invention computes, based on the DTW algorithm, the shortest path between the vectorized attack event sequence and the attack pattern subgraph filtered by the constraint conditions. The specific implementation steps are as follows:

Step (1): vectorizing the attack event sequence and the attack pattern subgraph filtered by the constraint conditions;

Step (2): calculating a distance matrix between the vectorized attack event sequence and each attack pattern sequence in the attack pattern subgraph;

Step (3): finding a path from the upper-left corner to the lower-right corner of the matrix such that the sum of the elements on the path is minimal; the attack pattern sequence with the minimal path sum is the attack pattern subgraph matched with the attack event sequence.

4) Correcting the obtained attack mode subgraph by depending on a domain expert;

Fig. 3 gives an example of predicting the attack path and the attack target from the obtained attack pattern subgraph. An attack alarm sequence against a certain asset is received from the security equipment, attack pattern matching is performed using the knowledge graph, and a similar attack pattern subgraph is found. Based on the vulnerabilities and asset attributes associated with that attack pattern, it is found that the asset has the same vulnerabilities and asset attributes. If the attack spreads, the corresponding attack path is recorded in time order, so that early warning can be issued and the asset can start its defenses in advance to prevent further attack. Other assets directly connected to this asset can also be warned in advance if they have the same vulnerabilities and asset attributes, so as to prevent the attack from spreading.
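The prediction described for Fig. 3 can be sketched in two parts: the steps of the matched attack pattern that follow the last observed step are the predicted next steps, and directly connected assets that share a vulnerability associated with the pattern are candidates for early warning. The function, the data layout and all names below are hypothetical simplifications; asset attributes other than vulnerabilities are omitted.

```python
def predict(matched_chain, observed, target, links, asset_vulns, pattern_vulns):
    """Return (predicted next steps, connected assets at risk)."""
    # Index of the furthest observed step within the matched pattern.
    last = max(
        (matched_chain.index(s) for s in observed if s in matched_chain),
        default=-1,
    )
    next_steps = matched_chain[last + 1:]
    # Neighbors of the target sharing a vulnerability used by the pattern.
    at_risk = sorted(
        n for n in links.get(target, ())
        if asset_vulns.get(n, set()) & pattern_vulns
    )
    return next_steps, at_risk


next_steps, at_risk = predict(
    matched_chain=["recon", "delivery", "exploit", "command_control"],
    observed=["recon", "delivery"],
    target="web01",
    links={"web01": ["db01", "mail01"]},
    asset_vulns={"db01": {"VULN-1"}, "mail01": {"VULN-9"}},
    pattern_vulns={"VULN-1", "VULN-2"},
)
```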

Example 1

S101, acquiring data;

S102, preprocessing the acquired data;

S103, constructing a network security ontology oriented to network attacks;

S104, extracting data according to the defined knowledge expression model;

Example 2

On the basis of embodiment 1, in step S101, the acquired data includes network asset detection data, vulnerability information data, threat intelligence data, and security equipment log data.

Example 3

On the basis of embodiment 1, in step S102, the preprocessing includes a data normalization process, a data deduplication and merging process, a data classification process, and a data spatiotemporal registration process.

Example 4

On the basis of embodiment 1, step S103 comprises the sub-step of: defining a knowledge expression model, with knowledge expressed as triples.

Example 5

Based on embodiment 4, in step S104, various types of data triples are extracted according to the defined knowledge expression model.

Example 6

On the basis of embodiment 1, in step S105, the correction includes aggregating and merging security events, correcting event credibility, correcting mutually exclusive events, removing false-alarm events, and completing missed-alarm events; the network security knowledge graph comprises an attack pattern graph, a threat intelligence graph and a network asset graph; the step specifically comprises the following sub-steps:

1) Merging the data of the same equipment;

2) Merging the consistent event data;

4) Establishing an attack pattern library aiming at known attacks by utilizing expert knowledge; completing the missed report event according to the attack mode library; according to the description of the attack chain in the attack pattern library, in the associated multi-step attack, if one attack step is found to be omitted, whether the omitted attack step is a necessary step of the next attack step in the multi-step attack step is judged, if yes, the fact that the security equipment fails to report the attack event is deduced according to the judgment, and the multi-step attack event is completed; if not, directly entering the step 5);

Example 7

On the basis of embodiment 6, the basic database is a Neo4j database.
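One possible way of storing a triple in a Neo4j database is to generate a Cypher `MERGE` statement for it. The sketch below only builds the query text and its parameters and does not contact a database; the node label `Entity`, the property `name` and the function name are assumptions of this sketch, not part of the invention.

```python
import re


def triple_to_cypher(head: str, relation: str, tail: str):
    """Build a parameterised Cypher MERGE statement for one triple.

    Returns the query text and the parameter dictionary.
    """
    # In standard Cypher a relationship type cannot be supplied as a plain
    # $parameter, so it is validated here and interpolated into the text.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", relation):
        raise ValueError(f"unsafe relationship type: {relation!r}")
    rel_type = relation.upper()
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:{rel_type}]->(t)"
    )
    return query, {"head": head, "tail": tail}
```

`MERGE` is used rather than `CREATE` so that loading the same triple twice does not duplicate nodes or relationships; entity names are passed as parameters rather than concatenated into the query text.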

Example 8

On the basis of embodiment 1, in step S106, the following sub-steps are included:

1) Representing the structured knowledge in the network security knowledge graph as an undirected graph $G=(V,E)$, in which each edge links a head entity node and a tail entity node;

2) On the basis of the vectorization, constraint rule conditions are added and converted into query statements of the basic database to obtain candidate subgraphs; similarity calculation is then performed on the candidate subgraphs: a similarity calculation algorithm measures the similarity between the attack event sequence detected by the security equipment and the attack pattern graph in the constructed knowledge graph, hidden relationships and paths of the attack event sequence are mined, and the attack path and target are predicted; the constraint rule conditions comprise the vulnerability that must be exploited for the attack event sequence to occur, the asset attributes of the attack target, and the precondition that an attack event can occur only after a certain other attack has been successfully executed;
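The constraint filtering and similarity matching of sub-step 2) can be illustrated with a simplified sketch. The description does not name a specific similarity calculation algorithm; a longest-common-subsequence ratio is substituted here purely as an example. Attack patterns are modelled as in-memory dictionaries with hypothetical keys (`name`, `requires_vuln`, `steps`), and an in-memory filter stands in for the database query that yields candidate subgraphs.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def similarity(observed, pattern_steps):
    """Fraction of the observed sequence found, in order, in the pattern."""
    if not observed:
        return 0.0
    return lcs_length(observed, pattern_steps) / len(observed)


def predict(observed, patterns, asset_vulns):
    """Pick the best-matching admissible pattern and return its remaining steps."""
    best_score, best_pattern = 0.0, None
    for pattern in patterns:
        # Constraint rule: the vulnerability the pattern exploits must be
        # present on the target asset, otherwise the candidate is discarded.
        if pattern["requires_vuln"] not in asset_vulns:
            continue
        score = similarity(observed, pattern["steps"])
        if score > best_score:
            best_score, best_pattern = score, pattern
    if best_pattern is None:
        return None
    steps = best_pattern["steps"]
    last = observed[-1]
    # The predicted path is whatever follows the last observed step.
    remaining = steps[steps.index(last) + 1:] if last in steps else []
    return {"pattern": best_pattern["name"], "score": best_score, "next_steps": remaining}
```

Filtering by constraint before scoring keeps the similarity calculation restricted to candidates that are actually feasible on the target asset.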

Example 9

On the basis of embodiment 8, in step 3), the method comprises the sub-steps of:

Example 10

A device for predicting a network attack based on a knowledge graph, comprising a program storage unit and a program execution unit, wherein the method for predicting a network attack based on a knowledge graph according to any one of embodiments 1 to 9 is performed when a program in the program storage unit is loaded by the program execution unit.

The units described in the embodiments of the present invention may be implemented in software or in hardware, and may also be disposed in a processor. The names of the units do not in any way limit the units themselves.

According to an aspect of the application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the various alternative implementations described above.

As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the method described in the above embodiments.

Parts not described in the present invention are the same as the prior art or can be implemented using the prior art.

The above-described embodiment is only one embodiment of the present invention. It will be apparent to those skilled in the art that various modifications and variations can readily be made based on the application and principle disclosed herein; the present invention is not limited to the method described in the above embodiment, which is therefore preferred rather than restrictive.

Those skilled in the art may devise embodiments other than the above examples based on the foregoing disclosure, or by adapting knowledge or techniques of the relevant art, and features of the various embodiments may be interchanged or substituted; such modifications and variations, made without departing from the spirit and scope of the present invention, are intended to fall within the scope of the following claims.

Claims

1. A network attack prediction method based on a knowledge graph is characterized by comprising the following steps:

S101, acquiring data;

S102, preprocessing the acquired data;

S103, constructing a network security ontology oriented to network attacks;

S104, extracting data according to the defined knowledge expression model;

S105, fusing and correcting the extracted data of various types to construct a network security knowledge graph; in step S105, the correction includes aggregation and merging of security events, event reliability correction, mutually exclusive event correction, false alarm event removal, and missed alarm event completion; the network security knowledge graph comprises an attack pattern graph, a threat intelligence graph, and a network asset graph; step S105 specifically comprises the following sub-steps:

1) Merging the data of the same equipment;

2) Merging the consistent event data;

4) Establishing an attack pattern library for known attacks by using expert knowledge, and completing missed alarm events according to the attack pattern library: following the description of the attack chain in the attack pattern library, if an attack step is found to be missing from an associated multi-step attack, it is judged whether the missing step is a necessary step for the next attack step in the multi-step attack; if so, it is inferred that the security equipment failed to report the attack event, and the multi-step attack is completed; if not, the method proceeds directly to step 5);

5) Constructing graphs from the corrected data to form the attack pattern graph, the threat intelligence graph, and the network asset graph; and storing and accessing the graphs by using a basic database;

S106, predicting the attack event by using the constructed network security knowledge graph, wherein step S106 comprises the following sub-steps:

wherein each link in the network security knowledge graph is described by its linked head and tail entity nodes and by the relationship between the two entity nodes; a heterogeneous network of the network security knowledge graph is embedded into a low-dimensional vector space to form low-dimensional vectors;

2. The knowledge-graph-based network attack prediction method according to claim 1, wherein in step S101 the acquired data includes network asset detection data, vulnerability information data, threat intelligence data, and security equipment log data.

3. The knowledge-graph-based network attack prediction method according to claim 1, wherein in step S102 the preprocessing comprises a data normalization process, a data deduplication and merging process, a data classification process, and a data spatiotemporal registration process.

4. The knowledge-graph-based network attack prediction method according to claim 1, wherein step S103 comprises the sub-step of defining a knowledge expression model and expressing knowledge as triples.

5. The knowledge-graph-based network attack prediction method according to claim 4, wherein in step S104 triples are extracted from each type of data according to the defined knowledge expression model.

6. The knowledge-graph-based network attack prediction method according to claim 1, wherein the basic database is a Neo4j database.

7. The knowledge-graph-based network attack prediction method according to claim 1, comprising, in the step 3), the sub-steps of:

8. A network attack prediction device based on the knowledge graph is characterized by comprising a program storage unit and a program running unit, wherein the network attack prediction method based on the knowledge graph according to any one of claims 1 to 7 is executed when a program in the program storage unit is loaded by the program running unit.