CN115883218B

CN115883218B - Multi-mode data model-based composite attack chain completion method, system and medium

Info

Publication number: CN115883218B
Application number: CN202211537067.XA
Authority: CN
Inventors: 亓玉璐; 陈磊; 贾焰; 周斌; 李爱平; 江荣; 涂宏魁; 王晔; 罗宇; 喻承
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2022-12-02
Filing date: 2022-12-02
Publication date: 2024-04-12
Anticipated expiration: 2042-12-02
Also published as: CN115883218A

Abstract

The invention provides a multi-mode data model-based composite attack chain completion method, which can obtain an attack chain conforming to an attack rule and comprises the following steps: constructing a network security ontology model based on a quintuple model, and constructing an ontology-instance model; performing network security entity identification based on network security ontology features and on an entity identification optimization model; constructing a network security knowledge graph; representing the acquired network security related data by using a multi-mode data model; matching the acquired network security related data with the network security knowledge graph and returning single-step attacks, and matching the attack chains generated from the single-step attacks that satisfy the time ordering and the IP constraint with attack rules to obtain an attack chain conforming to the attack rules; and judging whether an error state association sequence exists in the obtained attack chain, and completing the missing nodes in the attack chain based on the multi-mode data model.

Description

Multi-mode data model-based composite attack chain completion method, system and medium

Technical Field

The invention relates to the technical field of network security, in particular to a multi-mode data model-based composite attack chain completion method, a multi-mode data model-based composite attack chain completion system and a multi-mode data model-based composite attack chain completion medium.

Background

The key point of the open domain knowledge graph is that the semantic information of the entity is constructed through entity identification, entity link and entity disambiguation, but the accuracy of acquiring the triplet knowledge by the method is not high. The focus of the domain knowledge graph is on the relationship among the entities, and the accuracy of the relationship is ensured by constructing an ontology model, but at present, the ontology model is mostly constructed manually, and automatic modeling cannot be realized. The knowledge graph comprises a data layer and a pattern layer, the pattern layer of the open domain knowledge graph is an entity concept and a relationship between concepts, which are automatically abstracted and extracted from the data layer, and the domain knowledge graph comprises the entity concept but cannot automatically construct the relationship between concepts.

In the field of network security, the knowledge in a knowledge base is accepted as correct, and the relationships among entity concepts in that knowledge are determined. Owing to the particularity of naming rules in the computer field, entities are uniquely identifiable and named by established convention, while the relationships among entities are usually missing or only weakly expressed and involve multiple entities. These characteristics make the accuracy of entity identification high, but triplet knowledge cannot be acquired automatically.

In recent years, new network attacks represented by advanced persistent threat attacks frequently occur, and the detection methods such as anomaly detection technology, intrusion detection technology and the like based on machine learning and deep learning are difficult to be effective due to the characteristics of concealment and persistence. According to the multidimensional security event association analysis method represented by the finite state machine, an attack chain is discovered by setting triggering conditions of initial state, intermediate state, ending state and state transition. The method has poor flexibility, and is difficult to effectively discover the compound attack with long duration and high concealment.

Disclosure of Invention

In view of the above problems, the invention provides a multi-mode data model-based composite attack chain completion method, system and medium, which can obtain accurate network security knowledge from multi-source heterogeneous security knowledge bases and can further obtain attack chains conforming to attack rules from massive data through association analysis, dynamic removal of attack chains having an error state association sequence, attack chain completion and attack chain pruning.

The technical scheme is as follows: the compound attack chain complement method based on the multi-mode data model is characterized by comprising the following steps:

step 1: constructing a network security ontology model based on a quintuple model, wherein the quintuple model comprises a network security ontology, a network security knowledge instance, a relationship among the network security ontology and the instance, an attribute of the network security knowledge instance and a network security knowledge reasoning rule;

step 2: constructing an ontology-instance model, wherein the ontology-instance model is used for guiding an instance to be added to a network security ontology model, and optimizing the constructed ontology-instance model;

step 3: performing network security entity identification based on network security ontology features; performing network security entity identification based on the entity identification optimization model; constructing a network security knowledge graph;

Step 4: representing the acquired network security related data by using a multi-mode data model;

step 5: matching the acquired network security related data with a network security knowledge graph, returning single-step attacks with time, source IP and destination IP, sequencing the single-step attacks according to time sequence based on an analyzed time window, connecting the single-step attacks based on IP constraint of node propagation, matching an attack chain generated by the single-step attacks meeting the time sequencing and the IP constraint with an attack rule, and reserving the attack chain conforming to the attack rule;

step 6: judging whether an error state association sequence exists in the obtained attack chain, storing the error state association sequence into an error state set, discarding the attack chain with the error state association sequence, finding out the IP associated with the error state for the first time according to the transmission condition of the error state association sequence in the error state set, and re-carrying out IP constraint to obtain the attack chain conforming to the attack rule again;

step 7: for an attack chain with a single node missing, completing the missing node in the attack chain based on the multi-mode data model; for an attack chain with a plurality of consecutive nodes missing, searching all reachable paths based on the multi-mode data model, assigning each reachable path a weight according to its likelihood of being an attack path, ranking all reachable paths by their comprehensive weights, determining the number of missing nodes based on the attack rule, selecting from the reachable paths the optimal reachable path that satisfies the conditions, and completing the attack chain with the optimal reachable path.
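The time ordering and IP constraint of step 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the `SingleStepAttack` record, the `link_chains` helper and the sample addresses are hypothetical, the linking is greedy (no branching), and the subsequent matching against attack rules is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SingleStepAttack:
    name: str
    time: float  # detection time, assumed to be a comparable timestamp
    sip: str     # source IP
    dip: str     # destination IP


def link_chains(attacks, window_start, window_end):
    """Sort the attacks inside the time window, then append an attack to a
    chain when the chain's last destination IP equals the attack's source IP."""
    in_window = sorted(
        (a for a in attacks if window_start <= a.time <= window_end),
        key=lambda a: a.time,
    )
    chains = []
    for attack in in_window:
        extended = False
        for chain in chains:
            last = chain[-1]
            if last.time < attack.time and last.dip == attack.sip:
                chain.append(attack)
                extended = True
        if not extended:
            chains.append([attack])  # start a new candidate chain
    return chains


# Hypothetical sample data.
attacks = [
    SingleStepAttack("exploit", 2.0, "10.0.0.2", "10.0.0.3"),
    SingleStepAttack("scan", 1.0, "10.0.0.1", "10.0.0.2"),
    SingleStepAttack("noise", 3.0, "10.0.0.9", "10.0.0.8"),
]
chains = link_chains(attacks, window_start=0.0, window_end=10.0)
chain_names = [[a.name for a in chain] for chain in chains]
```

Only chains that later match an attack rule would be retained; that filtering is outside this sketch.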

Further, the network security ontology comprises a multi-level ontology;

the relationship between the network security ontology and the instance comprises: relationships among multiple levels of ontologies, relationships among different ontologies, and relationships between an ontology and an instance;

the network security knowledge reasoning rules include:

the attribute reasoning is used for reasoning out the missing attribute of the instance according to the ontology;

and the relationship reasoning is used for reasoning out the missing relationship among the examples according to the relationship among the ontologies.

Further, in step 3, network security entity identification is performed based on network security ontology features, wherein the ontology features include: vulnerability names, attack names and negative words in a vulnerability library, and worm names, Trojan names, attack names and negative words in a virus library;

the network security entity identification based on the entity identification optimization model specifically comprises the following steps:

segmenting the labeled words in the corpus through jieba segmentation, converting the segmented words or characters into vectors by adopting a CBOW Multi-Word Context training model of Word2vec, and training Word vectors through the distance between feature words;

extracting character-level features by adopting CNN convolutional neural networks with convolutional kernels of different sizes;

extracting context information in the network security corpus by using a bidirectional LSTM model;

Under the constraint of the position relation between the keywords and the identification words in the body features, the CRF identification model is utilized to obtain the arrangement combination of the identification word feature labels.

Further, the method for obtaining the permutation and combination of the characteristic labels of the recognition words by using the CRF recognition model specifically comprises the following steps:

calculating the label scores of the recognition words, including negative words, output by the BiLSTM hidden layer, as shown in the following formula:

$$\mathrm{score} = W_f \cdot O_t + b_f + F_n$$

wherein $W_f$ is the weight matrix, $b_f$ is the bias, $O_t$ is the hidden layer output, and $F_n$ is the sum of the feature weights of the ontology features;

setting a transition matrix T to represent the transition scores among tags; setting position templates of the negative words and the attack words to determine to which of the preceding and following attack names a negative word belongs, calculating the transition scores of the negative words and the recognition words according to the position templates of the negative words, and summing the tag scores of the negative words and the recognition words through the transition matrix, wherein the calculation is shown in the following formula:

wherein $x_i$ denotes the tag of a negative word, $x_{i+1}$ denotes the tag of the next negative word, and $y$ denotes the attack word;

training the parameters with a cross-entropy loss function and the stochastic gradient descent learning algorithm, and removing the attack names carrying negative words from the corpus according to the ontology features, so that each line of the corpus yields a vulnerability name, a Trojan name, the negative words, all attack names {att_1, att_2, ..., att_n} and their labels {att_1_tag, att_2_tag, ..., att_n_tag}.
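The label score formula above can be illustrated with a small numeric sketch. It assumes that the weight matrix has one row per tag, that the hidden layer output is a plain vector, and that the ontology feature sum is a scalar added to every tag score; the text does not fix these shapes, and the function name `label_scores` and all numbers are hypothetical.

```python
def label_scores(W_f, O_t, b_f, feature_weights):
    """Compute W_f . O_t + b_f + F_n for every tag.

    W_f: list of rows (one per tag), O_t: hidden layer output vector,
    b_f: one bias per tag, feature_weights: ontology feature weights
    whose sum is F_n (assumed here to be shared by all tags).
    """
    F_n = sum(feature_weights)
    return [
        sum(w * o for w, o in zip(row, O_t)) + b + F_n
        for row, b in zip(W_f, b_f)
    ]


# Hypothetical values for two tags and a two-dimensional hidden output.
scores = label_scores(
    W_f=[[1.0, 0.0], [0.5, 0.5]],
    O_t=[2.0, 4.0],
    b_f=[0.5, -1.0],
    feature_weights=[0.25, 0.25],
)
```

In a full CRF layer these per-tag scores would be combined with the transition scores before decoding; that part is not shown.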

Further, the ontology-instance model comprises a one-to-one ontology-instance model and a one-to-many ontology-instance model;

in a one-to-one ontology-instance model, an attack ontology, a security event ontology, a vulnerability ontology, a Trojan horse ontology, a worm ontology and a snort alarm ontology are built, a secondary ontology is built for the security event ontology, the vulnerability ontology, the Trojan horse ontology and the worm ontology, and if all instances in each category of the ontology in the one-to-one ontology-instance model are associated with the same attack, a one-to-one relationship is built for the corresponding secondary ontology and the attack;

in the one-to-many ontology-instance model, an attack ontology, a security event ontology, a vulnerability ontology, a Trojan horse ontology, a worm ontology and a snort alarm ontology are built, a secondary ontology is built for the security event ontology, the vulnerability ontology, the Trojan horse ontology and the worm ontology, and if all instances in each category of the ontology in the one-to-many ontology-instance model are associated with the same plurality of attacks, a one-to-many relationship is built for the corresponding secondary ontology and attacks.
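The rule for choosing between the two models can be sketched as below. This is an illustrative reading of the text, assuming each instance of a secondary ontology carries a set of associated attack names; the function name `ontology_attack_relation` and the sample instance names are hypothetical.

```python
def ontology_attack_relation(instance_attacks):
    """instance_attacks maps each instance of one secondary ontology to the
    set of attacks it is associated with. Returns the relationship type and
    the shared attacks, or None when the instances do not all agree."""
    attack_sets = {frozenset(attacks) for attacks in instance_attacks.values()}
    if len(attack_sets) != 1:
        return None  # instances are associated with different attacks
    shared = next(iter(attack_sets))
    if len(shared) == 1:
        return ("one-to-one", shared)
    if len(shared) > 1:
        return ("one-to-many", shared)
    return None  # no attack associated at all


one_to_one = ontology_attack_relation({
    "vuln_a": {"denial of service"},
    "vuln_b": {"denial of service"},
})
one_to_many = ontology_attack_relation({
    "trojan_a": {"remote control", "information theft"},
    "trojan_b": {"information theft", "remote control"},
})
mixed = ontology_attack_relation({
    "worm_a": {"denial of service"},
    "worm_b": {"remote control"},
})
```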

Further, the optimizing of the ontology-instance model includes:

optimizing the entity category: counting the keywords contained in attack names of the same category, calculating the similarity by matching an identified attack name against the keywords and counting the number of successfully matched characters, and assigning the attack category to the attack name if the similarity is larger than a set threshold;

optimizing the entity correlation: determining the attack names to be removed according to the positions at which negative words occur, judging the spatial correlation between a negative word and an attack name, retaining the attack names with small spatial correlation and deleting the attack names with large spatial correlation, wherein the spatial correlation is obtained by calculating the Euclidean distance between the feature variables of the negative recognition word and the attack recognition word;

category optimization: matching all identified attack names in each statement with the attack category set; if an attack name matches completely, directly assigning it the category label of the attack and storing the label in an attack label cache queue; otherwise, calculating the similarity between the identified attack name and the attack categories, assigning the category label of the attack to an attack name whose similarity is larger than the threshold, and storing the label in the attack label cache queue; then calculating the spatial correlation between the negative words and the attack names and removing from the attack label cache queue the attack name labels having a large spatial correlation with a negative word; the resulting attack label queue serves as the instance category labels of the vulnerabilities, Trojans and worms, and the ontology classification of the vulnerabilities, Trojans and worms is obtained through merging and redundancy removal.
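The keyword-based similarity used for entity category optimization can be sketched as follows. The text only states that matched characters are counted, so the normalization used here (matched characters divided by the length of the attack name) is an assumption, as are the names `keyword_similarity`, `assign_category`, the keyword lists and the threshold.

```python
def keyword_similarity(attack_name, keywords):
    """Fraction of characters in attack_name covered by matched keywords
    (assumed definition of the similarity)."""
    name = attack_name.lower()
    covered = [False] * len(name)
    for keyword in keywords:
        keyword = keyword.lower()
        if not keyword:
            continue
        start = name.find(keyword)
        while start != -1:
            for i in range(start, start + len(keyword)):
                covered[i] = True
            start = name.find(keyword, start + 1)
    return sum(covered) / len(name) if name else 0.0


def assign_category(attack_name, category_keywords, threshold=0.5):
    """Return the best-matching category if its similarity exceeds the
    threshold, otherwise None."""
    best_category, best_score = None, 0.0
    for category, keywords in category_keywords.items():
        score = keyword_similarity(attack_name, keywords)
        if score > best_score:
            best_category, best_score = category, score
    return best_category if best_score > threshold else None


# Hypothetical keyword statistics per attack category.
CATEGORY_KEYWORDS = {
    "privilege escalation": ["privilege", "permission", "escalat"],
    "denial of service": ["denial", "service", "crash"],
}
matched = assign_category("gain privileges", CATEGORY_KEYWORDS)
unmatched = assign_category("read file", CATEGORY_KEYWORDS)
```

Here "gain privileges" has 9 of its 15 characters covered by the keyword "privilege", giving a similarity of 0.6, which exceeds the assumed threshold of 0.5.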

Further, the construction of the network security knowledge graph comprises the following steps:

for structured data, expressing the relationships among different knowledge bases by associating the instances of the different knowledge bases through attribute values;

for unstructured data, after the entities are identified in the network security knowledge base and the relationship classification among the entities is determined, optimizing the network security ontology-instance model, adding the identified entities into the corresponding ontology-instance model, obtaining triplet knowledge through SWRL (Semantic Web Rule Language) reasoning, and associating the vulnerability, Trojan and worm instances with attacks to complete the construction of the network security knowledge graph.
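For the structured case, associating instances of different knowledge bases through a shared attribute value can be sketched as a simple join. The record layout, the relation name `relatedTo`, the helper `associate_by_attribute` and the placeholder identifiers are all hypothetical; the text does not specify them.

```python
def associate_by_attribute(base_a, base_b, key):
    """Emit a triple for every pair of records from two knowledge bases
    that carry the same value for the given attribute."""
    index = {}
    for record in base_b:
        index.setdefault(record.get(key), []).append(record)
    triples = []
    for record in base_a:
        value = record.get(key)
        if value is None:
            continue  # nothing to join on
        for other in index.get(value, []):
            triples.append((record["id"], "relatedTo", other["id"]))
    return triples


# Placeholder records; the identifiers are not real vulnerability numbers.
vulnerability_base = [{"id": "vuln-1", "cve": "CVE-0000-0001"}]
alarm_base = [
    {"id": "alarm-7", "cve": "CVE-0000-0001"},
    {"id": "alarm-8", "cve": "CVE-0000-0002"},
]
structured_triples = associate_by_attribute(vulnerability_base, alarm_base, "cve")
```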

Further, in step 4, the multi-mode data model is constructed based on the quintuple model. In the multi-mode data model, the ontology is divided into a Primary ontology (PE) and a Secondary ontology (SE); the relationship between one primary ontology PE and another primary ontology PE is denoted as a relationship R, and the relationship between a primary ontology PE and a secondary ontology SE is denoted as a relationship P. The relationship R represents the relationship among primary ontologies, and the corresponding space-time characteristics are extended by adding new nodes; the relationship P expresses the attribute values of the primary ontology, and the relevant space-time characteristics are extended directly on the attribute values of the primary ontology PE. The multi-mode data model expresses the time information, space information and space-time fusion characteristics among primary ontologies by adding attribute nodes, and expresses the time information, space information and space-time fusion characteristics between primary and secondary ontologies by adding attribute values.

Further, in step 6, whether an error state association sequence exists is judged according to the time of node propagation of the attack and the source IP and destination IP of node propagation: when the node propagation times satisfy $t_i < t_j$ and $n_i[\mathrm{dip}] = n_j[\mathrm{sip}]$, the error state sequence $(s_i, s_j)$ is output, wherein $n_i$ and $n_j$ denote two nodes, dip denotes the destination IP and sip denotes the source IP; the condition $n_i[\mathrm{dip}] = n_j[\mathrm{sip}]$ means that the destination IP of node $n_i$ is the same as the source IP of node $n_j$, that is, the attack jumps from node $n_i$ to node $n_j$;

judging each state association sequence based on the error state set, counting the error state sequences in the state association sequence, comparing the number of error sequences with the total number of nodes in the sequence to obtain the error state propagation proportion, taking an IP whose proportion is higher than the set threshold as the IP first associated with the error state, re-applying the IP constraint, and re-obtaining the attack chain conforming to the attack rule.
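The proportion test described above can be sketched as follows. This is one possible reading, with several assumptions: a chain is a time-ordered list of `(state, sip, dip)` tuples, the error state set holds `(state_i, state_j)` pairs, and the IP first associated with an error state is taken to be the shared IP of the first erroneous pair. The function names and sample data are hypothetical.

```python
def error_state_proportion(chain, error_state_set):
    """Number of adjacent state pairs found in the error state set, divided
    by the total number of nodes in the chain."""
    if not chain:
        return 0.0
    errors = sum(
        1 for a, b in zip(chain, chain[1:]) if (a[0], b[0]) in error_state_set
    )
    return errors / len(chain)


def first_error_ip(chain, error_state_set, threshold):
    """Return the IP at which the first erroneous transition occurs when the
    chain's error proportion exceeds the threshold, otherwise None."""
    if error_state_proportion(chain, error_state_set) <= threshold:
        return None
    for a, b in zip(chain, chain[1:]):
        if (a[0], b[0]) in error_state_set:
            return a[2]  # destination IP of a, equal to the source IP of b
    return None


# Hypothetical chain of (state, sip, dip) tuples in time order.
chain = [
    ("scan", "10.0.0.1", "10.0.0.2"),
    ("exfiltrate", "10.0.0.2", "10.0.0.3"),
    ("exploit", "10.0.0.3", "10.0.0.4"),
    ("persist", "10.0.0.4", "10.0.0.5"),
]
error_states = {("scan", "exfiltrate")}
proportion = error_state_proportion(chain, error_states)
flagged_ip = first_error_ip(chain, error_states, threshold=0.2)
```

The flagged IP would then be used to re-apply the IP constraint before attack chains are regenerated.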

Further, for an attack chain with a single node missing, the missing node in the attack chain is completed based on the multi-mode data model; for an attack chain with a plurality of consecutive nodes missing, all reachable paths are searched based on the multi-mode data model, each reachable path is assigned a weight according to its likelihood of being an attack path, all reachable paths are ranked by their comprehensive weights, the number of missing nodes is determined based on the attack rule, the optimal reachable path that satisfies the conditions is selected from the reachable paths, and the attack chain is completed with the optimal reachable path.
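The multi-node case can be sketched as a path search followed by ranking. The sketch assumes the multi-mode data model is available as a weighted adjacency dictionary and that the comprehensive weight of a path is the product of its edge weights; the text does not define the weight formula, so this is an assumption, and the graph and function names are hypothetical.

```python
def reachable_paths(graph, start, goal, path=None):
    """Enumerate all simple paths from start to goal by depth-first search."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    found = []
    for nxt in graph.get(start, {}):
        if nxt not in path:  # avoid revisiting nodes
            found.extend(reachable_paths(graph, nxt, goal, path))
    return found


def path_weight(graph, path):
    """Comprehensive weight, assumed to be the product of edge weights."""
    weight = 1.0
    for a, b in zip(path, path[1:]):
        weight *= graph[a][b]
    return weight


def best_completion(graph, start, goal, missing_count):
    """Highest-weight reachable path with exactly missing_count
    intermediate nodes, or None if no such path exists."""
    candidates = [
        p for p in reachable_paths(graph, start, goal)
        if len(p) - 2 == missing_count
    ]
    ranked = sorted(candidates, key=lambda p: path_weight(graph, p), reverse=True)
    return ranked[0] if ranked else None


# Hypothetical attack graph with edge weights.
graph = {
    "scan": {"exploit": 0.9, "bruteforce": 0.5},
    "exploit": {"escalate": 0.8},
    "bruteforce": {"escalate": 0.6, "exfiltrate": 0.1},
    "escalate": {"exfiltrate": 0.9},
}
two_missing = best_completion(graph, "scan", "exfiltrate", missing_count=2)
one_missing = best_completion(graph, "scan", "exfiltrate", missing_count=1)
```

The number of missing nodes passed in would come from the matched attack rule, as the text describes.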

Further, the complement of the attack chain missing a single node specifically includes:

single point complement: for an attack chain of a missing single node, determining IP information of the missing node based on a multi-mode data model according to IP constraints in association constraints, traversing an attack rule base, performing similarity matching and rule constraint, and determining a missing attack name set;

single point topology complement: where a source IP/destination IP is missing, for a node A in the known network attack data, adding a new node PEi-Rk-PEj for the node A and its destination IP based on the multi-mode data model, wherein the new node PEi-Rk-PEj inherits the destination IP of the node A as the source IP of the missing node; if the source IP of a node C in the acquired data is known and the source/destination IP is missing on the node at that IP, adding a new node PEm-Rp-PEn for the node C and its source IP based on the multi-mode data model, wherein the source IP carried by the new node PEm-Rp-PEn is used as the destination IP of the missing node;

single point content complement: and carrying out name matching on known attacks in the attack chain and an attack rule base, selecting rules containing all attack names, filtering non-conforming rules by the positions of the attack names, and listing all attack rules conforming to matching conditions.
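Single point content completion can be sketched as a filter over the rule base. The sketch assumes each attack rule is an ordered list of attack names of the same length as the chain, and that the missing node is marked with `None`; the rule base contents and function names are hypothetical.

```python
def candidate_rules(chain, rule_base):
    """Keep rules that contain every known attack name and agree with the
    chain at every known position."""
    known = [name for name in chain if name is not None]
    matches = []
    for rule in rule_base:
        if len(rule) != len(chain):
            continue
        if not all(name in rule for name in known):
            continue  # rule does not contain all known attack names
        if all(c is None or c == r for c, r in zip(chain, rule)):
            matches.append(rule)  # positions of known names also agree
    return matches


def missing_attack_names(chain, rule_base):
    """Set of attack names that could fill the single missing position."""
    gap = chain.index(None)
    return sorted({rule[gap] for rule in candidate_rules(chain, rule_base)})


# Hypothetical attack rule base.
rule_base = [
    ["scan", "exploit", "escalate"],
    ["scan", "bruteforce", "escalate"],
    ["exploit", "scan", "escalate"],
    ["scan", "exploit"],
]
candidates = missing_attack_names(["scan", None, "escalate"], rule_base)
```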

Further, when false alarm data exist in the data, the false alarm data comprise topology false alarm and content false alarm, whether the topology false alarm exists or not is judged according to a topology result in the multi-mode data model, and if so, pruning is directly carried out; if not, optimizing the attack chain through content pruning.

Further, when false alarm data exists in the data, the topology obtained by association in the data is matched with the topology of the attack scene in the multi-mode data model; if the matching is successful, pruning is carried out according to the comprehensive weight of the reachable paths, and paths with low weight are removed; if no topology false alarm exists, the attack chain is constrained through the error state set: if the attack chain is associated with the error state set, it is pruned directly; otherwise, pruning is carried out through semantic relation closeness, wherein the semantic relation closeness is calculated from the semantic distances of adjacent nodes in the reachable path and is used as the probability that the reachable path is an attack path, and reachable paths whose semantic relation closeness is smaller than a set probability threshold are pruned.
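The order of the pruning decisions can be sketched as below. The sketch only encodes the decision sequence; it assumes the topology check and the semantic relation closeness are computed elsewhere and passed in, since the text does not give their formulas. The function name `should_prune` and its parameters are hypothetical.

```python
def should_prune(path, topology_false_alarm, error_state_set, closeness, threshold):
    """Decide whether a reachable path is pruned.

    path: ordered list of states; topology_false_alarm: result of the
    topology check; closeness: precomputed semantic relation closeness,
    interpreted as the probability of being an attack path.
    """
    if topology_false_alarm:
        return True  # topology false alarm: prune directly
    transitions = set(zip(path, path[1:]))
    if transitions & error_state_set:
        return True  # associated with the error state set: prune directly
    return closeness < threshold  # content pruning by closeness


error_states = {("scan", "exfiltrate")}
pruned_by_topology = should_prune(["scan", "exploit"], True, error_states, 0.9, 0.5)
pruned_by_error = should_prune(["scan", "exfiltrate"], False, error_states, 0.9, 0.5)
pruned_by_closeness = should_prune(["scan", "exploit"], False, error_states, 0.3, 0.5)
kept = should_prune(["scan", "exploit"], False, error_states, 0.8, 0.5)
```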

A computer apparatus, characterized by comprising a processor, a memory and a program;

The program is stored in the memory, and the processor calls the program stored in the memory to execute the compound attack chain completion method based on the multi-mode data model.

A computer-readable storage medium, characterized by: the computer readable storage medium is used for storing a program for executing the compound attack chain complement method based on the multi-mode data model.

In the invention, a network security ontology model and an ontology-instance model are constructed based on the quintuple model. Under the constraint of the network security ontology-instance model, a method combining network security ontology features with entity identification solves the problem that, when constructing a network security knowledge graph, triplets cannot be obtained through methods such as entity identification, relation extraction, entity linking and disambiguation.

Aiming at the characteristics of network security corpora, such as multiple entities, weak relationships, unique naming and strong dependency among entities, the invention provides a network security entity identification and ontology-instance model optimization method based on ontology features. The method makes comprehensive use of the characteristics of the network security corpus and combines them with context to improve the accuracy of relationship classification; the ontology-instance model is optimized based on the relationship classification, the identified entities are added into the model, and network security knowledge is generated through rule reasoning. This construction approach realizes automatic association of network security knowledge bases and ensures the accuracy of the network security knowledge.

Aiming at the problems of response delay and low accuracy that traditional association analysis methods exhibit when processing massive data, the invention constructs an error state set and performs automatic clustering based on the network security knowledge graph and the attack rules, and dynamically prunes the associated attack chains through error state transfer values.

Based on the special structure of the multidimensional data association and threat analysis model, the source/destination IP and the acquisition/detection time of the acquired data are expressed by adding attribute nodes to the nodes, which provides path support for supplementing missing data. Aiming at the missed reports and false alarms of detection and acquisition equipment, an attack chain completion-pruning strategy based on optimal reachable path query is provided. The invention expresses space-time attributes based on the multi-mode data model, completes the topology of a missing attack chain by using the propagation rules among nodes, and completes the content of the attack chain by traversing and matching the attack rule base; erroneous attack chains generated by association are filtered out through reachable path query and content pruning, and the attack chains conforming to the attack rules are finally obtained.

Drawings

FIG. 1 is a schematic diagram of steps of a multi-modal data model-based composite attack chain completion method of the present invention;

FIG. 2 is a schematic diagram of a relationship R that extends the corresponding spatio-temporal characteristics by adding new nodes;

FIG. 3 is a schematic diagram of a relationship P by adding new nodes to derive corresponding spatio-temporal characteristics;

FIG. 4 is an internal structural view of the computer device in one embodiment.

Detailed Description

Referring to fig. 1, the method for completing the composite attack chain based on the multi-mode data model is characterized by comprising the following steps:

In step 1, a network security ontology model is built based on a quintuple model, wherein the quintuple model comprises a network security ontology, a network security knowledge instance, a relationship among the network security ontology and the instance, an attribute of the network security knowledge instance and a network security knowledge reasoning rule;

specifically, the network security ontology consists of the concepts summarized and abstracted from network security knowledge together with the relationships among those concepts, and the concepts are classified and arranged into levels. Determining the relationships between ontologies guides the determination of the relationships between the instances corresponding to those ontologies. The network security ontology comprises a multi-level ontology;

in the network security knowledge graph, a network security knowledge instance is the specific network security knowledge corresponding to a network security ontology. Taking the asset ontology as an example, instances of asset-software-operating system-Windows operating system are: win7, win10, etc. Instances of attack-single step attack-buffer overflow attack vulnerability are: CVE-2019-1010309, CVE-2019-1010306, etc.

The relationship between the network security ontology and the instance comprises: relationships between multi-level ontologies, relationships between different ontologies, relationships between ontologies and instances.

Attributes of the network security knowledge instance, including:

network security data attributes: in the network security knowledge graph, the network security data attribute is an attribute of a network security instance, such as a version number of an operating system, a serial number of a vulnerability, a discovery time, an update time, a hazard level, and the like.

Network security object attributes: in the network security knowledge graph, the object attributes of a network security knowledge instance, namely the relationships between network security knowledge instances, are determined by the relationships between the corresponding ontologies. The relationship between multi-level ontologies is subClassOf, such as (asset, subClassOf, hardware) and (asset, subClassOf, software). The relationships between different ontologies include hasExit, exploit, etc., such as (Windows operating system, hasExit, buffer overflow vulnerability). The relationship between an ontology and an instance is instanceOf, such as (CVE-2019-1010298, instanceOf, buffer overflow vulnerability).

Under the constraint of the network security reasoning rules, new or hidden attributes of network security knowledge instances and new or hidden relationships between network security knowledge instances can be mined. The network security knowledge reasoning rules comprise:

attribute reasoning, which is used for inferring the missing attributes of an instance according to the ontology, wherein an attribute is represented by nodes and key-value pairs;

relationship reasoning, which is used for inferring the missing relationships among instances according to the relationships among the ontologies, and for judging whether a relationship exists between entities by calculating the reachable paths among the instances.
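The two kinds of reasoning can be sketched in a few lines. This is an illustrative simplification rather than the SWRL-based reasoning of the invention: attribute reasoning is modelled as filling missing keys from ontology-level values, relationship reasoning as lifting ontology-level triples to instance pairs, and the reachability test as a breadth-first search. All function names and sample names are hypothetical.

```python
from collections import deque


def infer_attributes(instance_attrs, ontology_attrs):
    """Fill attributes missing on the instance from its ontology."""
    merged = dict(ontology_attrs)
    merged.update(instance_attrs)  # values stated on the instance win
    return merged


def infer_relations(ontology_relations, instance_of):
    """Lift each ontology-level triple to all pairs of instances."""
    triples = set()
    for head, relation, tail in ontology_relations:
        for h in instance_of.get(head, []):
            for t in instance_of.get(tail, []):
                triples.add((h, relation, t))
    return triples


def is_reachable(triples, source, target):
    """Breadth-first search over the directed graph formed by the triples."""
    adjacency = {}
    for head, _, tail in triples:
        adjacency.setdefault(head, set()).add(tail)
    seen, frontier = {source}, deque([source])
    while frontier:
        node = frontier.popleft()
        if node == target:
            return True
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False


# Placeholder ontologies and instances.
inferred = infer_relations(
    [("operating system", "hasExit", "buffer overflow vulnerability")],
    {"operating system": ["os_1", "os_2"],
     "buffer overflow vulnerability": ["vuln_1"]},
)
attrs = infer_attributes({"serial": "vuln_1"}, {"hazard_level": "high"})
```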

In step 2, building an ontology-instance model for guiding an instance to be added to a network security ontology model, and then optimizing the built ontology-instance model;

the ontology model constructed at the concept layer depends on the classification of relationships among instances in the data layer; the ontology model is therefore optimized into an ontology-instance model, which refines the classification of the ontologies and the classification of the relationships among ontologies.

Since the relationships between vulnerability instances, Trojan instances, worm instances, alarm instances and security event instances on the one hand and combinations of attacks on the other are uncertain, the network security ontology-instance model must be constructed by extracting and generalizing all possible relationships from a large number of instances. Here, the ontology-instance model is divided into two cases: a one-to-one ontology-instance model and a one-to-many ontology-instance model.

After the construction is complete, the optimization of the ontology-instance model includes:

optimizing the entity category:

The problem is as follows: the syntactic form of the network security corpus is fixed, and the result of entity identification takes the form {vulnerability name/Trojan name/worm name: attacks that will/will not occur}, typically comprising one vulnerability name/Trojan name/worm name and one or more attack names. After the attacks that will not occur are removed, the filtered result takes the form {vulnerability name/Trojan name/worm name: attack 1, attack 2, … attack n}. The classification of attacks is fixed, and the classification of vulnerabilities, Trojans and worms depends on the combination of attacks; however, the attack names identified from the corpus may be described inconsistently. For example, "authority upgrading", "privilege obtaining" and "authority obtaining and lifting" all represent the same attack. This inconsistency makes the classification of vulnerabilities, Trojans and worms inaccurate.

The method for optimizing the entity category is as follows: a similarity measure is adopted to unify the categories of the attack names; specifically, the keywords contained in attack names of the same category are counted, the similarity is calculated by matching an identified attack name against the keywords and counting the number of successfully matched characters, and the attack category is assigned to the attack name if the similarity is larger than a set threshold;

optimizing the entity relevance:

the classification of vulnerabilities, Trojans and worms depends on the combination of attacks; after the categories of the identified attack names have been unified, the attack names to be removed are determined according to the positions at which negative words occur.

The spatial correlation between a negative word and an attack name is judged; attack names with small spatial correlation are retained and attack names with large spatial correlation are deleted. The spatial correlation is obtained by calculating the Euclidean distance between the feature variables of the negative recognition word and the attack recognition word: the smaller the distance, the larger the spatial correlation between the two recognition words;

For category optimization:

based on the generated vulnerability names, trojan names, worm names, attack names and negatives and attack categories, the invention designs a category optimization method, which specifically comprises the following steps:

Matching all identified attack names in each statement with the attack category set; if an attack name matches completely, directly assigning it the category label of the attack and storing the label in an attack label cache queue; otherwise, calculating the similarity between the identified attack name and the attack categories, assigning the category label of the attack to an attack name whose similarity is larger than the threshold, and storing the label in the attack label cache queue; then calculating the spatial correlation between the negative words and the attack names and removing from the attack label cache queue the attack name labels having a large spatial correlation with a negative word; the resulting attack label queue serves as the instance category labels of the vulnerabilities, Trojans and worms, and the ontology classification of the vulnerabilities, Trojans and worms is obtained through merging and redundancy removal.
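The category optimization procedure can be sketched end to end as follows. The sketch makes several assumptions: each identified attack name comes with a feature vector, negative words are given as feature vectors, a small Euclidean distance stands for a large spatial correlation, and the similarity function is supplied by the caller (a simple word overlap is used here as a stand-in). All names, vectors and thresholds are hypothetical.

```python
import math
from collections import deque


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def word_overlap(a, b):
    """Stand-in similarity: shared words divided by all words."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)


def category_labels(attacks, negations, categories, similarity,
                    sim_threshold, dist_threshold):
    """attacks: list of (name, feature vector); negations: feature vectors
    of negative words. Returns the merged, de-duplicated category labels."""
    queue = deque()  # attack label cache queue
    for name, vector in attacks:
        if name in categories:
            label = name  # complete match
        else:
            label = max(categories, key=lambda c: similarity(name, c))
            if similarity(name, label) <= sim_threshold:
                continue  # no category is similar enough
        queue.append((label, vector))
    # Drop labels that lie close to a negative word (large correlation).
    kept = [
        label for label, vector in queue
        if all(euclidean(vector, n) > dist_threshold for n in negations)
    ]
    return sorted(set(kept))  # merge and remove redundancy


labels = category_labels(
    attacks=[
        ("privilege escalation", (0.0, 0.0)),
        ("remote denial of service", (5.0, 5.0)),
        ("denial of service", (1.0, 1.0)),
    ],
    negations=[(1.0, 1.5)],
    categories={"privilege escalation", "denial of service"},
    similarity=word_overlap,
    sim_threshold=0.5,
    dist_threshold=1.0,
)
```

In this sample, the third attack name lies at distance 0.5 from the negative word and is removed, while the second is mapped to "denial of service" through a similarity of 0.75.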

In step 3, network security entity identification is performed based on network security ontology features, the ontology features including: vulnerability names, attack names and negatives in a vulnerability library, worm names, trojan names, attack names and negatives in a virus library;

The permutation and combination of the feature labels of the recognized words is obtained using the CRF recognition model, specifically:

$$\mathrm{score} = W_f \cdot O_t + b_f + F_n$$

where $x_i$ denotes the label of a negative word, $x_{i+1}$ denotes the label of the next negative word, and $y$ denotes an attack word;

In step 3, constructing a network security knowledge graph;

Knowledge graph construction is a process of generating and storing triples. Triples can be generated directly from structured data; for unstructured data, triple knowledge is obtained through SWRL (Semantic Web Rule Language) reasoning.

Step 4: the acquired network security related data is represented by a multi-mode data model, which is constructed based on a quintuple model.

In the multi-mode data model, the ontology is divided into a Primary ontology (PE) and a Secondary ontology (SE); the relationship between one primary ontology PE and another primary ontology PE is denoted as a relation R, and the relationship between a primary ontology PE and a secondary ontology SE is denoted as a relation P;

The relation R represents the relationship between primary ontologies, and the corresponding spatio-temporal characteristics are attached by adding a new node. For example, as shown in FIG. 2, the relationship (PE1, R1, PE2) is divided into (PE1, null, PE1-R1-PE2) and (PE1-R1-PE2, tail, PE2), and the spatio-temporal characteristics are added to the new node PE1-R1-PE2: the time domain indicates a time interval, the space domain indicates a spatial interval, and T-S indicates the integrated expression of the spatio-temporal characteristics. With this representation, when the spatio-temporal knowledge changes, the spatio-temporal characteristic values can be modified directly without changing the graph structure.
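The splitting of a relation R into two edges around an added node can be illustrated with a small sketch. The edge labels "null" and "tail" are copied from the text as given; the function name `split_relation`, the attribute keys and the sample values are assumptions made for illustration.

```python
def split_relation(head, relation, tail, time_interval, space):
    """Replace (head, relation, tail) by two triples around an added node.

    The added node carries the spatio-temporal attributes, so they can be
    updated later without touching the triples themselves.
    """
    node = f"{head}-{relation}-{tail}"
    triples = [(head, "null", node), (node, "tail", tail)]
    attributes = {
        node: {
            "time": time_interval,           # time interval
            "space": space,                  # spatial information
            "T-S": (time_interval, space),   # combined expression
        }
    }
    return triples, attributes

triples, attributes = split_relation(
    "PE1", "R1", "PE2", ("2022-12-01", "2022-12-02"), "10.0.0.0/24")

# Updating the spatio-temporal knowledge only changes attribute values.
attributes["PE1-R1-PE2"]["space"] = "10.0.1.0/24"
```

The graph structure (the two triples) is unchanged by the final update, which is the property the paragraph above relies on.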

The relation P expresses an attribute value of the primary ontology. Because the primary ontology is clearly distinguished from the secondary ontology, no new node needs to be added between PE and SE, and the related spatio-temporal characteristics are derived directly from the attribute value of PE. As shown in FIG. 3, the relationship between the primary ontology PE1 and the secondary ontology SE2 is P1, and SE2 carries spatio-temporal characteristics: the time domain represents a time interval, mainly used for the collection time and detection time, the space domain represents spatial information, and T-S represents the integrated expression of the spatio-temporal characteristics. As with the relation R, with this representation the spatio-temporal characteristic values can be modified directly when the spatio-temporal knowledge changes, without changing the graph structure.

The multi-mode data model expresses the time information, spatial information and spatio-temporal fusion characteristics between primary ontologies by adding attribute nodes, and expresses the time information, spatial information and spatio-temporal fusion characteristics between primary and secondary ontologies by adding attribute values. Defining PE and SE to distinguish relationships from attributes improves the efficiency of querying and computing over relationships and attributes. In the field of network security, when the multi-mode data model is used for the analysis and judgment of composite attacks, time ordering and IP constraint on the data collected and detected in real time can be completed quickly, which improves the accuracy and efficiency of attack analysis and judgment.

In step 5, association analysis of composite attacks is carried out based on the network security knowledge graph. The input data are the processed collected data and detected data, and the output data are composite attack chains. The collected data mainly refer to the time, IP and security events extracted from data such as terminal collection, honeypot collection, node collection, system log collection, abnormal behavior collection, firewall log collection and IDS log collection; the detected data mainly comprise threat elements such as time, source IP, destination IP, vulnerability, virus and snort alarm extracted from data such as traffic analysis, IDS detection, virus detection and snort rule filtering. Association analysis mines the associated security events and threat elements out of a large number of security events and threat elements according to constraint conditions, and forms them into a composite attack chain. The constraints are as follows.

Time ordering: a composite attack is composed of a permutation and combination of several single-step attacks. The collected data and detected data are first matched with the network security knowledge graph, single-step attacks with time, source IP and destination IP are returned, and the single-step attacks are sorted in chronological order within the analysis time window.

IP constraint: single-step attacks are connected based on the IP constraint of node propagation. A composite attack usually involves several nodes, so the second constraint on the input data is propagation between nodes. The IP in the collected data generally refers to a destination IP, while the detected data generally include a source IP and a destination IP; if the source IP and the destination IP are the same, they point to the same node, and if they are different, they point to different nodes. Provided that the time ordering is satisfied, if the destination IP of the previous node is the same as the source IP of the next node, the two nodes can be judged to be connected.

Attack rule constraint: provided that the time ordering and the IP constraint are satisfied, the attack chains generated from the single-step attacks that satisfy the time ordering and the IP constraint are matched against the attack rules, and the attack chains conforming to the attack rules are retained;
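The three constraints can be sketched together as follows. This is a simplified illustration: the `Step` record, the function names and the representation of attack rules as a set of allowed (previous attack, next attack) pairs are assumptions, and the IP addresses are made up.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    time: int      # timestamp within the analysis window
    sip: str       # source IP
    dip: str       # destination IP
    attack: str    # single-step attack name returned by graph matching

def link_steps(steps):
    """Time ordering plus IP constraint: link a to b when a precedes b
    and the destination IP of a equals the source IP of b."""
    ordered = sorted(steps, key=lambda s: s.time)
    return [
        (a, b)
        for i, a in enumerate(ordered)
        for b in ordered[i + 1:]
        if a.time < b.time and a.dip == b.sip
    ]

def keep_rule_conforming(links, allowed_transitions):
    """Attack rule constraint, modelled here as a set of allowed pairs."""
    return [(a, b) for a, b in links
            if (a.attack, b.attack) in allowed_transitions]

steps = [
    Step(3, "10.0.0.2", "10.0.0.3", "data tampering"),
    Step(1, "10.0.0.9", "10.0.0.1", "port scan"),
    Step(2, "10.0.0.1", "10.0.0.2", "worm propagation"),
]
links = link_steps(steps)
valid = keep_rule_conforming(links, {("port scan", "worm propagation")})
```

Two links satisfy the time and IP conditions here, and only the one that also appears in the assumed rule set survives the third constraint.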

In an analysis window, if only one attack is simulated, two nodes can be associated through the IP constraint according to the association constraint rules as long as they are sorted in chronological order, in preparation for generating a composite attack chain. In practical applications, several attacks may be simulated against the same nodes within one analysis window, and a lot of erroneous data can be generated when two nodes are associated. For example, 2 attacks are simulated against three nodes at the same time. The first attack first probes a port and enters a jump host through the open port, then infects a second jump host and the target machine with a worm virus, and finally modifies file contents by exploiting a vulnerability on the target machine. The second attack first enters the jump host by exploiting a system vulnerability, then probes a port and enters the second jump host through the open port, and finally modifies file contents on the target machine using a trojan. The attack processes are the same but the selected attack tools are different; 6 pieces of collected and detected data are generated, and the results after the input data are matched with the network security knowledge graph and sorted by time are shown in Table 1.

TABLE 1 input data within a time window

According to the IP constraint condition of the association analysis, when the 6 pieces of data are associated under constraint in the same analysis window, the IP association first selects data that are close in time, and the obtained attack chains are: (probe port-security event attack -> modification-trojan attack) and (buffer overflow-vulnerability attack -> propagation-worm attack -> data tampering-vulnerability attack). Obviously, the result conforms neither to the attack rules nor to the expected analysis result. Therefore, setting only the time ordering, IP constraint and attack rule constraint in the association constraints is not sufficient.

To improve the accuracy of the association analysis, an optimization mechanism based on dynamic clustering of error states is added when the IP constraint is applied to single-step attacks. The error state set is initially empty; the association results after the IP constraint are subjected to the attack rule constraint, and the error states generated form the error state set. The error state set is first clustered; if a single-step attack name association produced by the IP constraint exists in the error state set, the result is discarded and the IP constraint is applied again within the analysis window.

In step 6, it is judged whether an error state association sequence exists in the obtained attack chain. The error state association sequence is stored in the error state set, and the attack chain containing it is discarded. According to the transfer of the error state association sequence in the error state set, the IP at which the error state association first occurs is found, and the IP constraint is applied again to obtain an attack chain conforming to the attack rules;
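The role of the error state set can be sketched as below. The sketch deliberately reduces the dynamic clustering step to plain set membership, which is a simplification and not the patent's clustering procedure; the function name and the pair-based rule set are likewise assumptions.

```python
def associate_with_error_states(candidates, allowed_transitions, error_states=None):
    """Filter candidate associations using a growing error state set.

    candidates: (previous attack name, next attack name) pairs produced by
    the IP constraint. Pairs that violate the assumed rule set are recorded
    as error states; pairs already recorded are discarded immediately.
    """
    error_states = set() if error_states is None else error_states
    kept = []
    for pair in candidates:
        if pair in error_states:
            continue                  # known error state: discard at once
        if pair in allowed_transitions:
            kept.append(pair)
        else:
            error_states.add(pair)    # newly generated error state
    return kept, error_states

candidates = [
    ("port scan", "trojan modification"),
    ("port scan", "worm propagation"),
    ("port scan", "trojan modification"),
]
kept, errors = associate_with_error_states(
    candidates, {("port scan", "worm propagation")})
```

The first occurrence of the invalid pair is checked against the rules and recorded; its second occurrence is rejected from the error state set without another rule check.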

Specifically, in step 6, whether an error state association sequence exists is judged according to the propagation time of the attack between nodes and the source IP and destination IP of the propagation. When the propagation times satisfy $t_i < t_j$ and $n_i[dip] = n_j[sip]$, the error state sequence $(s_i, s_j)$ is output, where $n_i$ and $n_j$ denote two nodes, dip denotes the destination IP and sip denotes the source IP; the condition means that the destination IP of node $n_i$ is the same as the source IP of node $n_j$, i.e. the attack jumps from node $n_i$ to node $n_j$;

Error states are transitive: once an error state occurs, the error states that follow it may have been caused by the initial error state, and the more error state associations there are in a chain, the closer the position of the error is to the initial node. The IP position associated with the first error state can be determined by calculating the transfer of the error state. Within an analysis window, the error state transfer of each node is calculated separately; during the analysis, if a new error state association occurs, the state transition combinations that may be erroneous are enumerated and dynamically clustered again.

Within a time window, the input data is a sequence $(I_1, I_2, \ldots, I_n)$. By satisfying $t_i < t_j$ and $n_i[dip] = n_j[sip]$, a plurality of sequences $(I_i, I_j)$ are output. If $(I_1, I_2)$ is an error state association, then $(I_1, I_2, I_3)$, $(I_1, I_2, I_3, I_4)$, $\ldots$, $(I_1, I_2, \ldots, I_n)$ may also be error state associations. Each state association sequence is judged against the error state set, the error state sequences in it are counted, and the number of error sequences is compared with the total number of nodes in the sequence to obtain the error state transfer ratio. The higher the ratio, the earlier in the sequence the state association error lies and the more state associations need to be redone; the lower the ratio, the later in the sequence the state association error lies. The state is re-associated through the IP corresponding to the first occurrence of the error state association, and an attack chain conforming to the attack rules is obtained again.
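The error state transfer ratio can be computed as in the following sketch, which counts adjacent pairs found in the error state set and divides by the number of nodes in the sequence. The function names and the toy sequence are assumptions for illustration.

```python
def error_transfer_ratio(sequence, error_states):
    """Number of erroneous adjacent pairs divided by the number of nodes."""
    if not sequence:
        return 0.0
    errors = sum(1 for pair in zip(sequence, sequence[1:])
                 if pair in error_states)
    return errors / len(sequence)

def first_error_index(sequence, error_states):
    """Position of the first erroneous association, or None if there is none.

    The IP of the node at this position is where re-association would start.
    """
    for i, pair in enumerate(zip(sequence, sequence[1:])):
        if pair in error_states:
            return i
    return None

sequence = ["I1", "I2", "I3", "I4"]
error_states = {("I1", "I2"), ("I2", "I3")}
ratio = error_transfer_ratio(sequence, error_states)
start = first_error_index(sequence, error_states)
```

With two erroneous pairs among four nodes the ratio is 0.5, and the first error sits at the very start of the sequence, so re-association would begin from the first node.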

In step 7, to address missed reports in the data, the attack chain is completed, including topology completion and content completion. For an attack chain missing a single node, the missing node in the attack chain is completed based on the multi-mode data model;

specifically, the complement of the attack chain missing a single node specifically includes:

For an attack chain in which several consecutive nodes are missing, all reachable paths are searched based on the multi-mode data model, each reachable path is given a corresponding weight according to its likelihood of being an attack path, and all reachable paths are sorted by the comprehensive weight of each path. The number of missing nodes is determined based on the attack rules, the optimal reachable path satisfying the conditions is selected from the reachable paths, and the attack chain is completed with the optimal reachable path.
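A small sketch of the multi-node completion follows. It assumes the graph is a dictionary of weighted edges and that the comprehensive weight of a path is the product of its edge weights; the patent does not specify how the comprehensive weight is computed, so this choice and all names below are hypothetical.

```python
def reachable_paths(graph, start, goal, path=None):
    """Yield every simple path from start to goal by depth-first search."""
    path = [start] if path is None else path
    if start == goal:
        yield path
        return
    for nxt in graph.get(start, {}):
        if nxt not in path:               # avoid revisiting nodes
            yield from reachable_paths(graph, nxt, goal, path + [nxt])

def path_weight(graph, path):
    """Assumed comprehensive weight: product of edge weights on the path."""
    weight = 1.0
    for a, b in zip(path, path[1:]):
        weight *= graph[a][b]
    return weight

def best_completion(graph, start, goal, missing):
    """Best path with exactly `missing` intermediate nodes, or None."""
    candidates = [p for p in reachable_paths(graph, start, goal)
                  if len(p) - 2 == missing]
    return max(candidates, key=lambda p: path_weight(graph, p), default=None)

graph = {
    "A": {"B": 0.9, "C": 0.5},
    "B": {"D": 0.8},
    "C": {"D": 0.9},
    "D": {"E": 0.7},
}
best = best_completion(graph, "A", "E", missing=2)
```

Both A-B-D-E and A-C-D-E fill two missing nodes between A and E; the first has the larger assumed weight and is chosen to complete the chain.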

In step 7, when false alarm data exist in the data, the false alarms comprising topology false alarms and content false alarms, whether a topology false alarm exists is judged according to the topology result in the multi-mode data model; if so, pruning is carried out directly; if not, the attack chain is optimized through content pruning.

Specifically, when false alarm data exist in the data, the topology obtained by association in the data is matched with the topology of the attack scenario in the multi-mode data model; if the matching is successful, pruning is carried out according to the comprehensive weight of each reachable path, and low-weight paths are removed. If no topology false alarm exists, the attack chain is constrained by the error state set: if the attack chain is an error state association, it is pruned directly; otherwise pruning is carried out by semantic relation closeness. The semantic relation closeness is determined by calculating the semantic distances between adjacent nodes in a reachable path and is used as the probability that the reachable path is an attack path, and reachable paths whose semantic relation closeness is smaller than a set probability threshold are pruned.
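The content pruning by semantic relation closeness might be sketched as follows. The patent does not give the formula that turns semantic distances into a closeness value; the sketch assumes the mean of 1 / (1 + d) over adjacent node pairs, and the distance table, names and threshold are illustrative only.

```python
def closeness(path, distance):
    """Assumed closeness: mean of 1 / (1 + d) over adjacent node pairs."""
    pairs = list(zip(path, path[1:]))
    if not pairs:
        return 0.0
    return sum(1.0 / (1.0 + distance[p]) for p in pairs) / len(pairs)

def prune_by_closeness(paths, distance, threshold):
    """Drop reachable paths whose closeness is below the threshold."""
    return [p for p in paths if closeness(p, distance) >= threshold]

# Hypothetical semantic distances between adjacent nodes.
distance = {
    ("A", "B"): 0.0, ("B", "C"): 1.0,
    ("A", "D"): 3.0, ("D", "C"): 3.0,
}
paths = [["A", "B", "C"], ["A", "D", "C"]]
kept_paths = prune_by_closeness(paths, distance, threshold=0.5)
```

The path through B has closeness 0.75 and is kept, while the path through D has closeness 0.25 and is pruned.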

This patent draws on the advantage of domain knowledge graphs, which contain entity concepts, and the advantage of open-domain knowledge graphs, which automatically acquire the relationships between concepts, and combines the two construction modes to automatically construct a knowledge graph for the network security domain.

To address the low analysis efficiency caused by erroneous and redundant data in mass data analysis, this patent provides a multidimensional data association analysis method based on a dynamic clustering mechanism. The method takes the spatio-temporal characteristics of an attack and the attack rules as the constraint conditions for associating attack chains, uses a dynamic clustering mechanism to automatically cluster erroneous state associations under the constraint of an attack rule base, and dynamically prunes the associated attack chains according to the error state transfer value.

In a network attack and defense experiment, the network topology is fixed but the connectivity between nodes is uncertain; for example, if any two nodes on an attack path are not connected, the attack path is incomplete. Similarly, even when the attack path is complete, the absence of information on any node from the collected data makes the mined attack path incomplete. A reachable path query algorithm judges whether a path connecting two given nodes can be found in the graph data. A shortest path query algorithm on an uncertain graph finds, between two given nodes, the shortest path among all paths whose probability is greater than a certain threshold. This patent combines the reachable path query with the shortest path query on the uncertain graph, proposes an optimal reachable path query algorithm, and solves the problem of attack chain completion when data are missing.

In an embodiment of the present invention, there is also provided a computer apparatus comprising a processor, a memory and a program;

the program is stored in the memory, and the processor calls the program stored in the memory to execute the composite attack chain complement method based on the multi-mode data model.

The computer device may be a terminal, and its internal structure may be as shown in fig. 4. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a multi-modal data model-based composite attack chain completion method. The display screen of the computer device can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer device, and can also be an external keyboard, a touch pad or a mouse and the like.

The memory may be, but is not limited to, random access memory (Random Access Memory, RAM), read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable Read-Only Memory, PROM), erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), etc. The memory is used for storing a program, and the processor executes the program after receiving an execution instruction.

The processor may be an integrated circuit chip with signal processing capabilities. The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), and the like. The processor may also be a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

It will be appreciated by those skilled in the art that the structure shown in fig. 4 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer apparatus to which the present application may be applied, and that a particular computer apparatus may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.

In an embodiment of the present invention, there is also provided a computer readable storage medium storing a program for executing the above-described multi-modal data model-based composite attack chain completion method.

It will be appreciated by those skilled in the art that embodiments of the invention may be provided as a method, a computer device, or a computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

Embodiments of the present invention are described with reference to flowchart illustrations of methods, computer apparatus, or computer program products according to embodiments of the invention. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart.

The application of the multi-mode data model-based composite attack chain completion method, the computer device and the computer readable storage medium has been described in detail above. Specific examples are used herein to illustrate the principles and implementation of the present invention, and the above examples are only intended to help understand the method and core idea of the present invention. Meanwhile, those skilled in the art may make changes to the specific embodiments and the scope of application in accordance with the ideas of the present invention; in view of the above, this description should not be construed as limiting the present invention.

Claims

1. A composite attack chain completion method based on a multi-mode data model, characterized by comprising the following steps:

2. The multi-modal data model-based composite attack chain completion method of claim 1, wherein: the network security ontology comprises a multi-level ontology;

The network security knowledge reasoning rules include:

3. The multi-modal data model-based composite attack chain completion method of claim 2, wherein: in step 3, network security entity identification is performed based on network security ontology features, the ontology features including: vulnerability names, attack names and negatives in a vulnerability library, worm names, trojan names, attack names and negatives in a virus library;

4. The multi-modal data model-based composite attack chain completion method of claim 3, wherein:

$$\mathrm{score} = W_f \cdot O_t + b_f + F_n$$

5. The multi-modal data model-based composite attack chain completion method of claim 4, wherein:

the ontology-instance model comprises a one-to-one ontology-instance model and a one-to-many ontology-instance model;

6. The multi-modal data model-based composite attack chain completion method of claim 5, wherein: optimization for the ontology-instantiation model includes:

7. The multi-modal data model-based composite attack chain completion method of claim 6, wherein the network security knowledge graph construction comprises:

for unstructured data, after identifying entities in a network security knowledge base and determining relationship classifications among the entities, optimizing a network security ontology-instance model; and adding the identified entities into the corresponding ontology-instance model, obtaining triple knowledge through SWRL (Semantic Web Rule Language) reasoning, and associating the vulnerability, trojan and worm instances with attacks to complete the construction of the network security knowledge graph.

8. The multi-modal data model-based composite attack chain completion method of claim 7, wherein: in step 4, the multi-mode data model is constructed based on a quintuple model; in the multi-mode data model, the ontology is divided into a Primary ontology (PE) and a Secondary ontology (SE), the relationship between one primary ontology PE and another primary ontology PE is denoted as a relation R, and the relationship between a primary ontology PE and a secondary ontology SE is denoted as a relation P; the relation R represents the relationship between primary ontologies, and the corresponding spatio-temporal characteristics are attached by adding a new node; the relation P expresses an attribute value of the primary ontology, and the related spatio-temporal characteristics are attached directly to the attribute value of the primary ontology PE; the multi-mode data model expresses the time information, spatial information and spatio-temporal fusion characteristics between primary ontologies by adding attribute nodes, and expresses the time information, spatial information and spatio-temporal fusion characteristics between primary and secondary ontologies by adding attribute values.

9. The multi-modal data model-based composite attack chain completion method of claim 8, wherein: in step 6, whether an error state association sequence exists is judged according to the propagation time of the attack between nodes and the source IP and destination IP of the propagation; when the propagation times satisfy $t_i < t_j$ and $n_i[dip] = n_j[sip]$, the error state sequence $(s_i, s_j)$ is output, where $n_i$ and $n_j$ denote two nodes, dip denotes the destination IP and sip denotes the source IP, the condition meaning that the destination IP of node $n_i$ is the same as the source IP of node $n_j$, i.e. the attack jumps from node $n_i$ to node $n_j$;

10. The multi-modal data model-based composite attack chain completion method of claim 9, wherein: for an attack chain missing a single node, the missing node in the attack chain is completed based on the multi-mode data model; for an attack chain in which several consecutive nodes are missing, all reachable paths are searched based on the multi-mode data model, each reachable path is given a corresponding weight according to its likelihood of being an attack path, all reachable paths are sorted by the comprehensive weight of each path, the number of missing nodes is determined based on the attack rules, the optimal reachable path satisfying the conditions is selected from the reachable paths, and the attack chain is completed with the optimal reachable path.

11. The multi-modal data model-based composite attack chain completion method of claim 10, wherein: the complement of the attack chain missing a single node specifically includes:

12. The multi-modal data model-based composite attack chain completion method of claim 11, wherein: when false alarm data exist in the data, the false alarms comprising topology false alarms and content false alarms, whether a topology false alarm exists is judged according to the topology result in the multi-mode data model; if so, pruning is carried out directly; if not, the attack chain is optimized through content pruning.

13. The multi-modal data model-based composite attack chain completion method of claim 12, wherein: when false alarm data exist in the data, the topology obtained by association in the data is matched with the topology of the attack scenario in the multi-mode data model; if the matching is successful, pruning is carried out according to the comprehensive weight of each reachable path, and low-weight paths are removed; if no topology false alarm exists, the attack chain is constrained by the error state set; if the attack chain is an error state association, it is pruned directly, otherwise pruning is carried out by semantic relation closeness, the semantic relation closeness being determined by calculating the semantic distances between adjacent nodes in a reachable path and being used as the probability that the reachable path is an attack path, and reachable paths whose semantic relation closeness is smaller than a set probability threshold are pruned.

14. A computer apparatus, comprising a processor, a memory and a program;

the program is stored in the memory, and the processor invokes the memory-stored program to perform the multi-modal data model-based composite attack chain completion method of claim 1.

15. A computer-readable storage medium, characterized by: the computer readable storage medium is for storing a program for executing the multi-modal data model-based composite attack chain completion method of claim 1.