CN116318818A

CN116318818A - Network security intelligent decision automatic arrangement response method and system

Info

Publication number: CN116318818A
Application number: CN202211732878.5A
Authority: CN
Inventors: 姜迎畅; 胡浩; 李飞扬; 谭晶磊; 张玉臣; 刘鹏程; 汪永伟; 周洪伟; 孙怡峰; 张恒巍
Original assignee: Information Engineering University of PLA Strategic Support Force
Current assignee: Information Engineering University of PLA Strategic Support Force
Priority date: 2022-12-30
Filing date: 2022-12-30
Publication date: 2023-06-23

Abstract

The invention belongs to the technical field of network security, and particularly relates to a network security intelligent decision automatic arrangement response method and system. A vulnerability attack and defense knowledge graph is constructed from network vulnerability attack information. Alarm detection is performed on network attack behavior, and the environment state of each alarm detection result, comprising the network environment and attack information, is mapped onto a state node of the vulnerability attack and defense knowledge graph through matching. The alarm is then responded to according to the matched knowledge graph state node, the response security decision set and the attack type, and the firewall is configured according to the resulting response decision. The invention integrates vulnerability attack and defense knowledge graph construction, network attack alarming and policy playbook orchestration into a single whole that meets the requirements of security operation and maintenance. It enables timely and efficient response to attacks and is easy to deploy in a network.

Description

Network security intelligent decision automatic arrangement response method and system

Technical Field

The invention belongs to the technical field of network security, and particularly relates to a network security intelligent decision automatic arrangement response method and system.

Background

Network security operation and maintenance continuously senses, monitors, tests, analyzes, alarms on, responds to, configures and recovers the various elements (assets, vulnerabilities and threats) of the network security space. It keeps optimizing them to ensure the security of cyberspace and to flexibly support the business strategies of enterprises and organizations. Enterprises and organizations have conducted extensive research to achieve secure and efficient operation. However, the network security situation is becoming increasingly severe, attack means are diversifying and the secure operation of networks is gradually deteriorating. As a result, existing security operation and maintenance, built on the concept of the traditional security operations center, faces the following challenges:

1. Heavy workload for threat events and alarm information. The number of threat events keeps growing and network confrontation keeps intensifying, while the continual stacking of security tools keeps increasing the number of alarms, so processing massive alarm information has become one of the main tasks of security operation and maintenance.

2. Insufficient manpower and difficulty in capturing experience. The skill requirements of security operation and maintenance positions are increasingly complex, so security practitioners, and especially skilled operation and maintenance staff, are in short supply; repetitive work is heavy and efficiency is low. Although the security operation and maintenance process embodies practical security combat techniques, most security event handling goes unrecorded. The process is therefore hard to turn into reusable knowledge, the experience and procedures of responding to information security events are not retained in an appropriate and useful form, and security capability does not form a closed loop.

3. Low integration of isolated devices. The large number of security tools in use forces operators to switch manually back and forth between tools, and the lack of coordination between people and tools, and among the tools themselves, limits people's efficiency and initiative.

Because of insufficient security operation and maintenance staff, alarm fatigue, the difficulty of turning operating procedures into knowledge, and the lack of coordination between people and tools, operations work is badly fragmented. It has many break points, and efficiency is hard to improve. In addition, offensive and defensive confrontation in cyberspace is intensifying, and a security strategy that relies solely on prevention and blocking has failed, so more attention must be paid to detection and response. Enterprises and organizations need to build a new security protection system that integrates prediction, prevention, detection and response, on the premise that the network is already under attack. Existing detection-and-response security products focus mainly on novel detection, particularly unknown threat detection. These products and technologies give users a lower mean time to detect (MTTD), so attacks and intrusions can be detected faster and more accurately. However, they do not help users reduce the mean time to respond (MTTR), and they do not form an integrated security protection system.

Disclosure of Invention

Therefore, the invention provides a network security intelligent decision automatic arrangement response method and system. It integrates vulnerability attack and defense knowledge graph construction, network attack alarming and intelligent playbook policy orchestration to meet the requirements of security operation and maintenance, enabling timely and effective response to attacks.

According to the design scheme of the invention, a network security intelligent decision automatic arrangement response method is provided, comprising:

constructing a vulnerability attack and defense knowledge graph according to the network vulnerability attack information;

performing alarm detection on network attack behavior, and mapping the environment state of the alarm detection result onto a state node of the vulnerability attack and defense knowledge graph through matching, wherein the environment state comprises the network environment and attack information;

responding to the alarm information according to the known vulnerability attack and defense knowledge graph state node, the response security decision set and the attack type, and configuring the firewall according to the response decision.

Further, in the network security intelligent decision automatic arrangement response method of the invention, constructing the vulnerability attack and defense knowledge graph according to the network vulnerability attack information comprises:

first, establishing, according to the network vulnerability attack information, an alarm knowledge base describing statistically collected attack pattern data and a defense script library describing statistically collected response security policies;

then, an atomic attack set is obtained through instantiation of the atomic attack, and a vulnerability attack and defense knowledge graph is generated by utilizing the atomic attack set.

Further, the alarm knowledge base describes attack patterns with the triple AttackPatternDB = (Predicate, Type, AttackPattern), and the defense script library describes response security policies with the triple Strategy = (Vul, AttackPatternDB, PayLoad). Here, Type is the set of all argument types used in the attack patterns; Predicate is the set of all predicates needed to describe the preconditions and consequences of the attack patterns; and AttackPattern is the set of attack patterns, each described by a five-tuple (Name, Vul, Var, Pre, Eff). Vul is the set of all vulnerabilities that the attack pattern can exploit; Var is the set of local variables, each element written <v, t>, where v is an argument and t ∈ Type is its type; Pre and Eff are, respectively, the preconditions and the consequences of exploiting the attack pattern, expressed with predicates from the Predicate set; and PayLoad is the set of defense scripts formulated for the different attack patterns.
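To make the two triples concrete, the following Python sketch models AttackPatternDB, the five-tuple attack pattern and Strategy as dataclasses. The field names, the convention that variables start with "?", and the example pattern `office_rtf_exec` with its predicates are illustrative assumptions, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set, Tuple

# An atom such as ("hasFile", "?h", "rtf_doc"): element 0 is the predicate name,
# the rest are arguments. Assumption: variables are written with a leading "?".
Atom = Tuple[str, ...]


@dataclass(frozen=True)
class AttackPattern:
    """Five-tuple (Name, Vul, Var, Pre, Eff)."""
    name: str
    vul: FrozenSet[str]                # vulnerabilities the pattern can exploit
    var: Tuple[Tuple[str, str], ...]   # local variables as <v, t> pairs
    pre: Tuple[Atom, ...]              # preconditions, joined by AND
    eff: Tuple[Atom, ...]              # consequences of exploitation


@dataclass
class AttackPatternDB:
    """Triple (Predicate, Type, AttackPattern) of the alarm knowledge base."""
    predicates: Set[str] = field(default_factory=set)
    types: Set[str] = field(default_factory=set)
    patterns: List[AttackPattern] = field(default_factory=list)

    def add(self, ap: AttackPattern) -> None:
        # Enforce t in Type for every <v, t>, and that Pre/Eff use only known predicates.
        for _, t in ap.var:
            if t not in self.types:
                raise ValueError(f"unknown type {t!r}")
        for atom in ap.pre + ap.eff:
            if atom[0] not in self.predicates:
                raise ValueError(f"unknown predicate {atom[0]!r}")
        self.patterns.append(ap)


@dataclass
class Strategy:
    """Triple (Vul, AttackPatternDB, PayLoad); PayLoad maps pattern name -> script steps."""
    vul: FrozenSet[str]
    db: AttackPatternDB
    payload: Dict[str, List[str]]


db = AttackPatternDB(predicates={"hasFile", "execCode"}, types={"host"})
ap = AttackPattern(
    name="office_rtf_exec",                # hypothetical pattern name
    vul=frozenset({"CVE-2017-0199"}),
    var=(("?h", "host"),),
    pre=(("hasFile", "?h", "rtf_doc"),),
    eff=(("execCode", "?h"),),
)
db.add(ap)
strategy = Strategy(vul=ap.vul, db=db,
                    payload={ap.name: ["disconnect_network", "kill_process"]})
```

Validating types and predicates on insertion keeps every stored pattern consistent with the Type and Predicate sets, which the later instantiation step relies on.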

Further, obtaining the atomic attack set by instantiating atomic attacks comprises the following steps. First, the response security policy attributes of the defense script library are stored in a tree structure, and a dispatch queue of the response security policy attributes of the target environment is established. Then, while the graph is being constructed, the attributes in the dispatch queue are traversed; the attack patterns are instantiated with the target network environment attributes, according to the environment state stored in the data structure, to generate the set of executable atomic attacks related to each attribute; and the tree structure is updated with the resulting executable alarm knowledge base information.

Further, during alarm detection of network attack behavior, the network attack behavior is identified by cross-comparing alarm information from different sources according to preset association rules, wherein the preset association rules comprise: setting an abnormal threshold for data packet traffic, detecting abnormal traffic with the threshold, and supplementing the association rules with an association rule mining algorithm.
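A minimal sketch of the threshold step, assuming packets arrive as (timestamp, source IP) pairs and that traffic is counted per source in fixed-length windows. The window length, the threshold and the function name `detect_abnormal_flow` are hypothetical choices, since the text does not fix them.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def detect_abnormal_flow(packets: Iterable[Tuple[float, str]], window: float,
                         threshold: int) -> List[Tuple[int, str, int]]:
    """Return (window index, source, packet count) for every source whose
    packet count inside one window exceeds the threshold."""
    counts: Counter = Counter()
    for ts, src in packets:
        counts[(int(ts // window), src)] += 1   # bucket packets by window and source
    return sorted((w, src, n) for (w, src), n in counts.items() if n > threshold)


# Ten packets from one source within the first second, one from another source.
packets = [(0.1 * i, "10.0.0.5") for i in range(10)] + [(0.5, "10.0.0.6")]
alarms = detect_abnormal_flow(packets, window=1.0, threshold=5)
```

Each returned tuple is an abnormal-traffic alarm that the association rules can then cross-compare with alarms from other sources.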

Further, supplementing the association rules with an association rule mining algorithm comprises:

first, performing intrusion detection on the network with an intrusion detection tool, storing the intrusion detection log files in a database, and identifying association rules from the log information in the database;

then, converting the association rule into an intrusion detection rule of an intrusion detection tool and carrying out rule configuration in an intrusion detection tool rule base;

and finally, carrying out alarm detection on the network attack by using the configured intrusion detection rule, and storing an alarm detection result into a related log file.
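The conversion step can be illustrated by rendering a mined rule's antecedent (for example {proto=tcp, dst_port=445, flags=S}) as a Suricata signature. The helper `rule_to_suricata`, the attribute keys and the sid numbering are assumptions; only the general rule layout (action, protocol, addresses, ports, and options such as `msg`, `flags`, `sid` and `rev`) follows Suricata's documented syntax.

```python
from typing import Dict


def rule_to_suricata(antecedent: Dict[str, object], label: str, sid: int) -> str:
    """Render one mined rule antecedent as a Suricata signature string."""
    proto = antecedent.get("proto", "tcp")
    src = antecedent.get("src_ip", "any")
    dst = antecedent.get("dst_ip", "any")
    # Port fields are only meaningful for port-based protocols.
    dport = antecedent.get("dst_port", "any") if proto in ("tcp", "udp") else "any"
    options = [f'msg:"mined association rule: {label}"']
    if "flags" in antecedent:                  # TCP flag condition, e.g. "S" for SYN
        options.append(f"flags:{antecedent['flags']}")
    options += [f"sid:{sid}", "rev:1"]
    return f"alert {proto} {src} any -> {dst} {dport} ({'; '.join(options)};)"


signature = rule_to_suricata({"proto": "tcp", "dst_port": 445, "flags": "S"},
                             "smb_syn_scan", 1000001)
```

The resulting line could be appended to a local rules file that is listed under `rule-files` in `suricata.yaml`, after which the configured rule takes part in alarm detection.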

Further, responding to the alarm information according to the known vulnerability attack and defense knowledge graph state node, the response security decision set and the attack type comprises:

first, mapping the vulnerability attack and defense knowledge graph state node to the corresponding attack pattern according to a maximum degree principle, namely, when several attack patterns match equally, the attack pattern with the largest degree is preferentially selected for the mapping;

then, optimizing the state-action estimation function with a pre-trained network attack and defense deep learning model, mapping the attack pattern to a response security decision, adaptively optimizing the response security decision through online learning, and outputting the optimal response decision according to the response benefit.
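The maximum degree principle can be sketched as a tie-break over candidate attack patterns. This assumes the knowledge graph is available as an edge list and that "degree" means the number of edges incident to a pattern node; both the representation and the alphabetical tie-break among equal degrees are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def map_by_max_degree(candidates: List[str], edges: Iterable[Tuple[str, str]]) -> str:
    """Pick the candidate attack pattern with the largest degree in the graph."""
    degree: Counter = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # max() keeps the first maximal item, so sorting makes ties resolve alphabetically.
    return max(sorted(candidates), key=lambda c: degree[c])


edges = [("s1", "apA"), ("s2", "apA"), ("s1", "apB")]
chosen = map_by_max_degree(["apB", "apA"], edges)
```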

Further, in pre-training the network attack and defense deep learning model, the CVE vulnerability library is used as the training set. According to the current input state s and the latest action estimation function $Q_\theta(s, a)$, the action

$$a = \arg\max_{a \in \mathcal{A}} Q_\theta(s, a)$$

is selected with probability 1 − ε, and a random action is selected with probability ε, where ε decreases linearly from large to small within the interval (0, 1] until training of the network attack and defense deep learning model meets expectations. Here θ denotes the network parameters, a denotes a response action, and $\mathcal{A}$ denotes the action space.
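The ε-greedy selection with a linearly decaying ε can be sketched as follows. The start and end values of ε (1.0 and 0.05) and the dictionary representation of Q-values are illustrative assumptions; in the described scheme the Q-values would come from the trained network $Q_\theta$.

```python
import random
from typing import Dict


def linear_epsilon(step: int, total_steps: int, eps_start: float = 1.0,
                   eps_end: float = 0.05) -> float:
    """Decay epsilon linearly from eps_start to eps_end, then hold it (stays in (0, 1])."""
    frac = min(step / total_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)


def select_action(q_values: Dict[str, float], epsilon: float,
                  rng: random.Random) -> str:
    """Explore with probability epsilon, otherwise take argmax_a Q(s, a)."""
    if rng.random() < epsilon:
        return rng.choice(sorted(q_values))    # random response action
    return max(q_values, key=q_values.get)     # greedy response action


rng = random.Random(0)
q = {"block_ip": 0.9, "log_only": 0.1}         # hypothetical Q-values for one state
```

Keeping ε strictly positive at the end of the schedule preserves a small amount of exploration during online learning.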

Further, when the state-action estimation function is optimized with the pre-trained network attack and defense deep learning model, the bootstrap property of the Bellman equation is exploited, and the target value is computed as

$$y = \mathbb{E}_{s,a \sim D}\left[ r(s', a') + \gamma\, Q_\theta(s', a') \right]$$

This converts the state-action update problem into a function-fitting problem, and the estimate is driven toward the optimal Q value by updating with learning rate α. Here $\mathbb{E}_{s,a \sim D}$ denotes the expected defense benefit; D denotes the probability of selecting defense action a in environment state s; (s', a') denotes the new state s' and the new action a' selected in it; $Q_\theta(s', a')$ is the policy benefit function; γ is the benefit discount rate; and r(s', a') is the cumulative reward for selecting the new action a' in the new state s'.
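As a sketch of the bootstrapped update, the code below uses a tabular Q-function in place of the neural network and the common Q-learning target r + γ·max Q(s′, a′). Both simplifications are assumptions made for readability and may differ from the exact formula used in the patent; the default values of the learning rate α and discount γ are illustrative.

```python
from collections import defaultdict
from typing import DefaultDict, Sequence, Tuple

QTable = DefaultDict[Tuple[str, str], float]


def q_update(Q: QTable, s: str, a: str, r: float, s_next: str,
             actions: Sequence[str], alpha: float = 0.1, gamma: float = 0.9,
             done: bool = False) -> float:
    """One tabular Q-learning step; returns the bootstrapped target value."""
    # Bootstrap: the target uses the current estimate of the next state's best action.
    best_next = 0.0 if done else max(Q[(s_next, a2)] for a2 in actions)
    target = r + gamma * best_next
    # Move Q(s, a) a fraction alpha of the way toward the target.
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return target


Q: QTable = defaultdict(float)
Q[("s2", "block_ip")] = 1.0
Q[("s2", "log_only")] = 0.5
t = q_update(Q, "s1", "block_ip", 1.0, "s2", ["block_ip", "log_only"], alpha=0.5, gamma=0.9)
```

Replacing the table with a neural network turns the same target into a regression label, which is the function-fitting view described above.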

Further, the invention also provides a network security intelligent decision automatic arrangement response system, comprising a data management module, an alarm analysis module and an intrusion detection module, wherein:

the data management module is used for constructing a vulnerability attack and defense knowledge graph according to the network vulnerability attack information;

the alarm analysis module is used for performing alarm detection on network attack behavior and mapping the environment state of the alarm detection result onto a state node of the vulnerability attack and defense knowledge graph through matching, wherein the environment state comprises the network environment and attack information;

the intrusion detection module is used for responding to the alarm information according to the known vulnerability attack and defense knowledge graph state node, the response security decision set and the attack type, and configuring the firewall according to the response decision.

The invention has the beneficial effects that:

Through the coordination of vulnerability attack and defense knowledge graph construction, intrusion detection and reinforcement-learning-based automatic response decisions, the method achieves a visual description of the network environment state, accurate detection of abnormal network traffic, orchestration of emergency response policies and targeted defense actions. The vulnerability attack and defense knowledge graph is built from the alarm knowledge base and the defense script library. It formally describes the concepts, entities and attributes of the security domain and their relationships, with concepts and entities interconnected into a mesh graph structure. Responding to an attack is thus converted into an optimal path search and link prediction problem, which provides knowledge support for intelligent decision-making. The scheme can be deployed on a Raspberry Pi, which makes deployment convenient and removes the existing systems' dependence on a particular runtime environment. By combining software and hardware, it can be applied in settings such as smart homes, mine detection and campus networks to perform alarm response and defensive decision-making for networks in various environments, and it offers good portability.
Complex tasks that are closer to real conditions often have a large state space and a continuous action space. For such tasks, the reinforcement learning component uses the network attack and defense deep learning model to optimize the state-action estimation function, converting the update problem into a function-fitting problem and mapping attack patterns to response security decisions. Because a neural network approximates the Q function, Q values no longer need to be stored explicitly; the response security decision is adaptively optimized through online learning, and the optimal response decision is output according to the response benefit. Using the CVE vulnerability database as the training set together with deep learning therefore facilitates deployment in real cyberspace.

Description of the drawings:

FIG. 1 is a schematic diagram of an automatic response arrangement flow of network security intelligent decisions in an embodiment;

FIG. 2 is a schematic diagram of a network security intelligent decision automatic arrangement response system architecture in an embodiment;

FIG. 3 is a schematic diagram of a vulnerability attack and defense knowledge graph construction flow in an embodiment;

FIG. 4 is a flowchart of analysis of attack alarms in an embodiment;

FIG. 5 is a flowchart of the Apriori algorithm in an embodiment;

FIG. 6 is a schematic diagram of a correlation analysis flow in an embodiment;

FIG. 7 is a comparative illustration of reinforcement learning mapping structure in an embodiment;

FIG. 8 is a schematic flow chart of an intrusion automatic response decision-making based on reinforcement learning in an embodiment;

FIG. 9 is a schematic flow chart of strategy arrangement for a multi-stage attack and defense game in an embodiment.

Detailed Description of Embodiments

To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and the technical solutions.

Intelligent decision-making, the product of combining artificial intelligence with decision-making systems, is a computer-implemented technology that introduces intelligent data processing theories and methods to solve problems in quantitative analysis models. Compared with traditional decision-making, intelligent decision technology based on neural networks and deep learning has clear advantages in parallel reasoning, fuzzy data processing, adaptability and fault tolerance, and it provides technical support for optimizing intelligent decision problems in network security scenarios. However, current intelligent decision technology is suitable only for small-scale problems. In the complex environments of network security applications, the perceived state information and the action space are prone to state explosion, which greatly limits the performance of decision methods, so a feasible network security intelligent decision scheme is urgently needed. Referring to FIG. 1, an embodiment of the present disclosure provides a network security intelligent decision automatic arrangement response method, comprising:

S101, constructing a vulnerability attack and defense knowledge graph according to network vulnerability attack information;

S102, performing alarm detection on network attack behavior, and mapping the environment state of the alarm detection result onto a state node of the vulnerability attack and defense knowledge graph through matching, wherein the environment state comprises the network environment and attack information;

S103, responding to the alarm information according to the known vulnerability attack and defense knowledge graph state node, the response security decision set and the attack type, and configuring the firewall according to the response decision.

In this embodiment, the vulnerability attack and defense knowledge graph is constructed, association analysis is performed on network attacks or abnormal behavior to produce attack alarm responses, and finally the detection results and the knowledge graph are passed to the response decision maker for decision-making. This makes it possible to discover various hidden threats, analyze the weaknesses of current security protection, improve security protection policies, and realize collaborative defense between local offline threat intelligence and the cloud. The reinforcement-learning-based intrusion response offers high security and strong portability, can respond effectively to unknown attack traffic, and enhances the defensive capability of the system.

As a preferred embodiment, further, constructing a vulnerability attack and defense knowledge graph according to the network vulnerability attack information includes:

As an efficient way of organizing security knowledge such as entities and concepts, the vulnerability attack and defense knowledge graph exploits the advantages of knowledge integration. It organizes scattered, multi-source, heterogeneous security data, supports data analysis and knowledge reasoning for threat modeling, risk analysis and attack reasoning in cyberspace, and helps move security into the cognitive intelligence stage. In this embodiment, referring to FIG. 3, the target environment is described in a knowledge graph construction language according to the vulnerability attack information. This description and a typical alarm library are the inputs of an automatic vulnerability attack and defense knowledge graph construction engine, whose output is the set of all atomic attacks executable in the target network. To visualize the knowledge graph, the script system component can be generated from the instantiated atomic attacks, and the Neo4j tool is used to automatically generate a visual vulnerability attack and defense knowledge graph.

The alarm knowledge base is built from attack patterns. The grammar describing attack patterns can be formally expressed as the triple AttackPatternDB = (Predicate, Type, AttackPattern), where the Type set contains all types of the arguments in the attack patterns, the Predicate set contains all predicates needed to describe the preconditions and consequences of the attack patterns, and the AttackPattern set contains all attack patterns, each described by (Name, Vul, Var, Pre, Eff). Here Name is the name of the attack pattern; the Vul set contains all vulnerabilities that the attack pattern can exploit; Var is the set of local variables, each element written <v, t>, where v is an argument and t ∈ Type is its type; Pre and Eff describe, respectively, the preconditions and consequences of exploiting the attack pattern in terms of predicates in the Predicate set, and the predicates within each set are joined by AND. Based on the asset weaknesses, vulnerabilities and attack patterns predefined in the alarm library, and with reference to the attack and defense actions of the Internet emergency response center against various network attacks, response security policies and security scripts are formulated to build the defense script library, which is used to respond to network attacks. The grammar describing attack responses can be expressed as the triple Strategy = (Vul, AttackPatternDB, PayLoad), where the PayLoad set contains the scripts formulated for the different attack patterns.

The script system component is generated from the instantiated atomic attacks, and a Python script automatically generates the vulnerability attack and defense knowledge graph from the atomic attack set. The graph visually describes the system's defense deployment state, and the state node the system currently occupies is matched using the system's environment state.
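A hedged sketch of the Python generation step: each atomic attack, assumed here to be a tuple (name, CVE, precondition states, effect states), is turned into parameterized Cypher `MERGE` statements. The node labels `State` and `Vulnerability` and the relationship types `ENABLES` and `LEADS_TO` are hypothetical names chosen for this example.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical graph schema: State nodes, Vulnerability nodes, ENABLES/LEADS_TO edges.
PRE_QUERY = ("MERGE (s:State {name: $state}) "
             "MERGE (v:Vulnerability {name: $attack, cve: $cve}) "
             "MERGE (s)-[:ENABLES]->(v)")
EFF_QUERY = ("MERGE (s:State {name: $state}) "
             "MERGE (v:Vulnerability {name: $attack, cve: $cve}) "
             "MERGE (v)-[:LEADS_TO]->(s)")

AtomicAttack = Tuple[str, str, Sequence[str], Sequence[str]]


def to_cypher(attacks: Iterable[AtomicAttack]) -> List[Tuple[str, Dict[str, str]]]:
    """Return (query, parameters) pairs; parameters avoid string-built Cypher."""
    statements = []
    for name, cve, pre_states, eff_states in attacks:
        for state in pre_states:      # precondition state -> vulnerability node
            statements.append((PRE_QUERY, {"state": state, "attack": name, "cve": cve}))
        for state in eff_states:      # vulnerability node -> consequence state
            statements.append((EFF_QUERY, {"state": state, "attack": name, "cve": cve}))
    return statements


stmts = to_cypher([("exec_h1", "CVE-2017-0199", ["hasFile(h1,rtf_doc)"], ["execCode(h1)"])])
```

Each (query, parameters) pair could then be executed with the official Neo4j Python driver, for example `session.run(query, params)`; using `MERGE` keeps a state shared by several atomic attacks from being duplicated.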

Further, in this embodiment, obtaining the atomic attack set by instantiating atomic attacks comprises the following steps. First, the response security policy attributes of the defense script library are stored in a tree structure, and a dispatch queue of the response security policy attributes of the target environment is established. Then, while the graph is being constructed, the attributes in the dispatch queue are traversed; the attack patterns are instantiated with the target network environment attributes, according to the environment state stored in the data structure, to generate the set of executable atomic attacks related to each attribute; and the tree structure is updated with the resulting executable alarm knowledge base information.
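The tree-plus-dispatch-queue procedure above amounts to a forward-chaining fixpoint. The sketch below is one possible reading of it under stated assumptions: the "tree" is simplified to a dictionary indexed by predicate name, attack patterns are plain dictionaries with `name`, `var`, `pre` and `eff`, variables start with "?", and free variables range over environment objects of their type. Names such as `build_atomic_attacks` are hypothetical.

```python
from collections import deque
from itertools import product
from typing import Deque, Dict, Iterable, List, Set, Tuple

Atom = Tuple[str, ...]


def is_var(term: str) -> bool:
    return term.startswith("?")                # assumed variable convention


def subst(atom: Atom, lam: Dict[str, str]):
    """Instantiate an atom with binding lam; None if a variable is left unbound."""
    out = tuple(lam.get(t, t) for t in atom)
    return None if any(is_var(t) for t in out[1:]) else out


def build_atomic_attacks(initial: Iterable[Atom], patterns: List[dict],
                         objects_by_type: Dict[str, List[str]]):
    tree: Dict[str, Set[Atom]] = {}            # attributes indexed by predicate name
    queue: Deque[Atom] = deque()               # dispatch queue of unprocessed attributes
    attacks = set()                            # (pattern name, binding) pairs

    def insert(f: Atom) -> None:
        bucket = tree.setdefault(f[0], set())
        if f not in bucket:
            bucket.add(f)
            queue.append(f)

    for f in initial:
        insert(f)
    while queue:                               # terminate when the dispatch queue is empty
        f = queue.popleft()
        for ap in patterns:
            for p in ap["pre"]:
                # f matches p: same predicate, same arity, equal constants.
                if p[0] != f[0] or len(p) != len(f):
                    continue
                if any(not is_var(a) and a != b for a, b in zip(p[1:], f[1:])):
                    continue
                fixed = {a: b for a, b in zip(p[1:], f[1:]) if is_var(a)}   # Var_p
                free = [(v, t) for v, t in ap["var"] if v not in fixed]    # Var_p^*
                for combo in product(*(objects_by_type.get(t, []) for _, t in free)):
                    lam = {**fixed, **{v: c for (v, _), c in zip(free, combo)}}
                    pres = [subst(q, lam) for q in ap["pre"]]
                    if any(q is None or q not in tree.get(q[0], set()) for q in pres):
                        continue               # some precondition not yet satisfied
                    effs = [subst(e, lam) for e in ap["eff"]]
                    key = (ap["name"], tuple(sorted(lam.items())))
                    if None in effs or key in attacks:
                        continue
                    attacks.add(key)           # new executable atomic attack
                    for e in effs:
                        insert(e)              # consequences become new attributes
    return attacks, tree


exploit = {
    "name": "smb_exploit",                     # hypothetical attack pattern
    "var": (("?s", "host"), ("?t", "host")),
    "pre": (("execCode", "?s"), ("reachable", "?s", "?t"), ("hasService", "?t", "smb")),
    "eff": (("execCode", "?t"),),
}
initial = [("execCode", "h0"), ("reachable", "h0", "h1"), ("hasService", "h1", "smb"),
           ("reachable", "h1", "h2"), ("hasService", "h2", "smb")]
attacks, tree = build_atomic_attacks(initial, [exploit], {"host": ["h0", "h1", "h2"]})
```

Because every new consequence is pushed back onto the queue, attacks that only become executable after an earlier attack succeeds (here h1 to h2) are still found before the queue empties.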

The alarm library draws on a vulnerability database and reflects the vulnerabilities of assets and the attack threats they face. It is represented by the triple VKM = (S, V, E), where S is the set of system state nodes, representing the current environment state; V is the set of system vulnerability nodes, representing network intrusions or attack behavior; and E is the set of directed edges, representing transitions between state nodes via system vulnerability information.

When the alarm library is constructed, the values of the variables in the attack patterns must be determined: all attack patterns stored in the vulnerability database are retrieved, each is instantiated according to the variable values, and the result is a usable atomic attack set.
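Because E links state nodes through vulnerabilities, the response problem can be treated as path search over VKM = (S, V, E). The following is a minimal breadth-first sketch, assuming edges are stored as (source state, vulnerability, target state) triples and that the path with the fewest transitions counts as the best one; that path metric is a simplification, since the patent does not specify it.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Tuple

Edge = Tuple[str, str, str]   # (source state, vulnerability, target state)


def shortest_attack_path(edges: Iterable[Edge], start: str, goal: str) -> Optional[List[Edge]]:
    """Breadth-first search for the path with the fewest vulnerability transitions."""
    adj: Dict[str, List[Tuple[str, str]]] = {}
    for s, v, d in edges:
        adj.setdefault(s, []).append((v, d))
    prev: Dict[str, Optional[Tuple[str, str]]] = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            path: List[Edge] = []
            while prev[cur] is not None:       # walk back to the start state
                p, v = prev[cur]
                path.append((p, v, cur))
                cur = p
            return path[::-1]
        for v, d in adj.get(cur, []):
            if d not in prev:                  # first visit is the shortest in BFS
                prev[d] = (cur, v)
                queue.append(d)
    return None                                # goal state unreachable


edges = [("s0", "CVE-A", "s1"), ("s1", "CVE-B", "s2"), ("s0", "CVE-C", "s2")]
```

The vulnerabilities on the returned path are natural candidates for the defense scripts to block first.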

For each attribute f in the target environment and an attack pattern AP ∈ AttackPattern, if there is a precondition p ∈ AP.Pre whose predicate name equals that of f, i.e. $f_0 = p_0$, and $p_j = f_j$ holds for every constant parameter $p_j$ of p, then attribute f can be regarded as an instantiation of attack pattern AP.
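The matching condition can be written as a small predicate check. This sketch assumes atoms are tuples whose element 0 is the predicate name and that variables start with "?"; `can_instantiate` is a hypothetical helper name.

```python
from typing import Optional, Sequence, Tuple

Atom = Tuple[str, ...]


def is_variable(term: str) -> bool:
    return term.startswith("?")   # assumed convention for pattern variables


def can_instantiate(f: Atom, pre: Sequence[Atom]) -> Optional[Atom]:
    """Return a premise p in AP.Pre that f instantiates (f_0 = p_0 and
    p_j = f_j for every constant p_j), or None if there is none."""
    for p in pre:
        if p[0] != f[0] or len(p) != len(f):
            continue
        if all(is_variable(pj) or pj == fj for pj, fj in zip(p[1:], f[1:])):
            return p
    return None


pre = (("hasService", "?h", "smb"), ("hasAccount", "?h", "?u"))
```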

If attribute f can instantiate attack pattern AP, with premise p ∈ AP.Pre and $f_0 = p_0$, the variables of the pattern fall into two classes: those that appear among the parameters of p and those that do not. Denote the set of the former by $Var_p$ and the set of the latter by $Var_p^*$, so that $AP.Var = Var_p \cup Var_p^*$. The attribute f essentially fixes the values of the variables in $Var_p$: if $v_j \in Var_p$ and its corresponding parameter in premise p is $p_k$, then $v_j$ takes the value $f_k$, i.e. $\mathrm{domain}(v_j) = \{f_k\}$. The value range of the vector argument v of attack pattern AP with respect to attribute f is therefore

$$\Phi_f = \mathrm{domain}(AP, f) = \mathrm{obj}(v_1^*) \times \mathrm{obj}(v_2^*) \times \cdots \times \mathrm{obj}(v_m^*) \times \mathrm{domain}(v_1) \times \mathrm{domain}(v_2) \times \cdots \times \mathrm{domain}(v_n)$$

where $v_1^*, v_2^*, \ldots, v_m^*, v_1, v_2, \ldots, v_n$ are the components of the vector argument v, $Var_p^* = \{v_1^*, v_2^*, \ldots, v_m^*\}$ and $Var_p = \{v_1, v_2, \ldots, v_n\}$. Let $\Phi_f$ be the value range of the vector argument v of attack pattern AP with respect to attribute f, and let $\lambda \in \Phi_f$. For any premise p ∈ AP.Pre, instantiating the variables in p with λ is written insPre(AP, λ, p); likewise, for any consequence e ∈ AP.Eff, if all variables in e can be instantiated by λ, the process is written insEff(AP, λ, e).
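The construction of $\Phi_f$ and of insPre/insEff can be sketched with `itertools.product`. Here obj(t) is assumed to be the list of environment objects of type t, and the helper names `domain` and `instantiate` are illustrative, not the patent's own code.

```python
from itertools import product
from typing import Dict, Iterator, List, Optional, Sequence, Tuple

Atom = Tuple[str, ...]


def domain(var: Sequence[Tuple[str, str]], p: Atom, f: Atom,
           objects_by_type: Dict[str, List[str]]) -> Iterator[Dict[str, str]]:
    """Enumerate each lambda in Phi_f as a dict from variable to value."""
    # Var_p: variables of p, each fixed to the value at the same position in f.
    fixed = {pj: fj for pj, fj in zip(p[1:], f[1:]) if pj.startswith("?")}
    # Var_p^*: remaining variables, each ranging over obj(t) for its type t.
    free = [(v, t) for v, t in var if v not in fixed]
    for combo in product(*(objects_by_type.get(t, []) for _, t in free)):
        lam = dict(fixed)
        lam.update({v: value for (v, _), value in zip(free, combo)})
        yield lam


def instantiate(atom: Atom, lam: Dict[str, str]) -> Optional[Atom]:
    """insPre / insEff: substitute lambda into an atom; None if a variable stays unbound."""
    out = tuple(lam.get(t, t) for t in atom)
    return None if any(t.startswith("?") for t in out[1:]) else out


var = (("?h", "host"), ("?u", "user"))
p = ("hasService", "?h", "smb")
f = ("hasService", "h1", "smb")
objs = {"host": ["h0", "h1"], "user": ["alice", "bob"]}
```

Since `?h` is fixed by f, only `?u` is enumerated, which mirrors how domain(v_j) = {f_k} collapses one factor of the product to a single value.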

In essence, instantiating an attack pattern means first storing the target environment of the current target network in a tree data structure, then instantiating attack pattern AP with attribute f according to this data, and finally generating the atomic attack set $Ack_f$ whose preconditions are satisfied. This procedure is denoted instancePattern(Tree, AP, f).

The defense script library defines the policies to be executed for different networks and describes network attack solutions; through script orchestration, different attack actions are mapped to different scripts, enabling automatic response to network attacks. For example, following a report issued by the image response center, when an alarm for the CVE-2017-8464 vulnerability is detected, the operations of closing the port, deleting the file and terminating the process are executed; for an alarm on the CVE-2017-0199 vulnerability, the script that disconnects the network and terminates the process is executed. Other scripts cover port management, firewall policy adjustment, account blocking and so on.
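Script dispatch can be sketched as a lookup from CVE identifier to an ordered list of actions. The two playbooks mirror the examples in the text; the action handlers are hypothetical stand-ins that only describe the action instead of calling a real firewall or host agent, and the firewall-blocking fallback for unknown CVEs is an assumption.

```python
from typing import Callable, Dict, List, Optional


# Hypothetical action handlers: each returns a description of what it would do.
def close_port(alert: dict) -> str:
    return f"close port {alert['port']}"


def delete_file(alert: dict) -> str:
    return f"delete file {alert['path']}"


def kill_process(alert: dict) -> str:
    return f"kill pid {alert['pid']}"


def disconnect_network(alert: dict) -> str:
    return f"disconnect host {alert['host']}"


def adjust_firewall(alert: dict) -> str:
    return f"block {alert['src']} at firewall"


ACTIONS: Dict[str, Callable[[dict], str]] = {
    "close_port": close_port,
    "delete_file": delete_file,
    "kill_process": kill_process,
    "disconnect_network": disconnect_network,
    "adjust_firewall": adjust_firewall,
}

# Playbooks keyed by CVE, following the two examples in the text.
PLAYBOOKS: Dict[str, List[str]] = {
    "CVE-2017-8464": ["close_port", "delete_file", "kill_process"],
    "CVE-2017-0199": ["disconnect_network", "kill_process"],
}


def run_playbook(alert: dict, default: Optional[List[str]] = None) -> List[str]:
    """Execute the playbook for the alert's CVE, in order, and return a log."""
    steps = PLAYBOOKS.get(alert.get("cve"), default or ["adjust_firewall"])
    return [ACTIONS[step](alert) for step in steps]
```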

In a large-scale network, the vulnerability attack and defense knowledge graph construction algorithm proceeds in two main steps. First, all existing target environments are preprocessed; preprocessing essentially stores all attributes of the target environment in a tree structure. Then, using the attack pattern instantiation algorithm and the tree-structured target environment of the current target network, attack pattern AP is instantiated with attribute f to generate the atomic attack set $Ack_f$ whose preconditions are satisfied. The construction algorithm therefore consists mainly of a target environment preprocessing algorithm and an attack pattern instantiation algorithm. Its final purpose is to output all alarm library information AlarmInformation present in the target network, from which a Python script automatically generates the vulnerability attack and defense knowledge graph.

The algorithm takes as input the script information of the script library, the obtained alarm library and the scripts written from the script library; it outputs all executable alarm library information AlarmInformation and automatically generates the vulnerability attack and defense knowledge graph. The specific steps are as follows. The algorithm first stores the attributes in the script information in the tree structure T and establishes a dispatch queue of these attributes. During execution it repeatedly takes an attribute f from the dispatch queue and, using the instantiation technique, calls the function instancePattern to generate the executable atomic attack set $Ack_f$ related to f. Finally, it adds the resulting executable alarm library information to the original tree structure. These operations are repeated until the dispatch queue is empty, at which point the algorithm terminates. The specific visual description of the vulnerability attack and defense knowledge graph construction algorithm is as follows:

in a preferred embodiment, in the alarm detection of the network attack behavior, the network attack behavior is identified by cross-comparing alarm information of different sources according to a preset association rule, where the preset association rule includes: setting an abnormal threshold value of the data packet flow, detecting abnormal flow by using the threshold value, and supplementing an association rule by using an association rule mining algorithm.

Rule-based association analysis cross-compares massive alarm information from different sources and automatically identifies the alarms that truly need attention. Network traffic serves as a data source for intrusion detection with a threshold set on it, and an abnormal-traffic alarm is displayed when traffic exceeds the threshold. Association rules can be mined with an improved Apriori algorithm in order to derive new rules and supplement the existing ones. Packet traffic is then checked against the association rules to obtain the specific abnormal attack traffic directly. The attack status of each node is passed to the decision responder, which formulates the corresponding optimal strategy through reinforcement learning.
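The threshold check and the cross-comparison of alarm sources can be illustrated with a short sketch. The record format is assumed: (source IP, bytes) pairs from one time window. The correlation rule is also assumed: keep hosts flagged by at least `min_sources` independent sources. Both function names are hypothetical.

```python
from collections import Counter


def threshold_alarms(flows, threshold):
    """Flag sources whose traffic volume in one window exceeds the threshold.
    flows: iterable of (source_ip, byte_count) records (assumed input format)."""
    volume = Counter()
    for src, nbytes in flows:
        volume[src] += nbytes
    return {src for src, total in volume.items() if total > threshold}


def cross_compare(alarm_sets, min_sources=2):
    """Keep hosts reported by at least min_sources different alarm sources."""
    votes = Counter()
    for alarms in alarm_sets:
        votes.update(set(alarms))  # one vote per source, however many alarms it raised
    return {host for host, n in votes.items() if n >= min_sources}
```

For example, a traffic alarm for 10.0.0.5 is kept only if a second source, such as IDS signature alerts, also reports 10.0.0.5.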

Referring to fig. 4, attack alarm analysis is based on combined Raspberry Pi software and hardware. The protocol parser extracts the protocol information from each captured packet and writes it into a defined data structure, parsing layer by layer from the bottom of the protocol stack upward. The preprocessor implements Suricata's plug-in mechanism: plug-ins built for the preprocessor can be loaded into Suricata, so that data are processed according to the methods in those plug-ins. The detection engine matches the information in the processed data stream against the rules in the rule base one by one. If a match succeeds, an attack or intrusion is considered to have occurred, and the corresponding response is taken as specified by the rule. The alarm output module outputs the detection results in the required format. The output modes mainly include file output (Suricata custom format, tcpdump format and CSV format) and database output (MySQL).
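The bottom-up parse, rule matching and CSV alarm output can be sketched with the standard library only. This is an illustration, not Suricata's implementation. It assumes untagged Ethernet II frames carrying IPv4/TCP, and it uses a hypothetical rule format (protocol number, destination port, payload content). The names `parse_frame`, `detect` and `to_csv_line` are hypothetical.

```python
import csv
import io
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int
    payload: bytes


def _ip(raw: bytes) -> str:
    return ".".join(str(b) for b in raw)


def parse_frame(frame: bytes) -> Optional[Packet]:
    # Link layer: Ethernet II, EtherType in bytes 12-13 (VLAN tags not handled)
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != 0x0800:             # only IPv4 in this sketch
        return None
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4            # network layer: IPv4 header length in bytes
    proto = ip[9]
    src, dst = _ip(ip[12:16]), _ip(ip[16:20])
    if proto != 6:                      # non-TCP: no transport ports parsed
        return Packet(src, dst, proto, 0, 0, ip[ihl:])
    tcp = ip[ihl:]                      # transport layer: TCP
    sport, dport = struct.unpack("!HH", tcp[:4])
    data_offset = (tcp[12] >> 4) * 4
    return Packet(src, dst, proto, sport, dport, tcp[data_offset:])


RULES = [  # hypothetical rule base
    {"sid": 1000001, "msg": "passwd file access", "proto": 6, "dst_port": 80,
     "content": b"/etc/passwd"},
]


def detect(pkt: Packet, rules):
    """Match the parsed packet against each rule in turn."""
    return [r for r in rules
            if r["proto"] == pkt.proto and r["dst_port"] == pkt.dst_port
            and r["content"] in pkt.payload]


def to_csv_line(pkt: Packet, rule) -> str:
    buf = io.StringIO()
    csv.writer(buf, lineterminator="").writerow(
        [rule["sid"], rule["msg"], pkt.src_ip, pkt.dst_ip, pkt.dst_port])
    return buf.getvalue()
```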

In the embodiment of the present disclosure, an association analyzer plug-in may be added to the original Suricata to discover hidden relationships between data items in the system log. The relationships between intrusions and attack data items can then be used to identify intrusion behavior, and the detection results are finally added to the rule base. As a result, Suricata can not only make effective use of massive data but also discover new attacks and intrusions. Security Onion is a network security monitoring platform that can monitor north/south traffic to detect adversaries entering the environment, establishing command and control (C2), or possibly exfiltrating data. It may also be desirable to monitor east/west traffic to detect lateral movement. As more and more network traffic is encrypted, it is important to fill these blind spots with additional visibility in the form of endpoint telemetry. Security Onion can ingest logs from servers and workstations so that all network and host logs can be searched simultaneously.

As a preferred embodiment, further, supplementing the association rule with an association rule mining algorithm includes:

In the Apriori algorithm, one of the data mining methods, an association rule expresses a hidden link between data items in the form "A => B", where A and B are sets of data items. The rule indicates that if A occurs, B also tends to occur.
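A rule A => B is usually judged by two quantities, support and confidence. These are also the quantities that the minimum support and minimum confidence thresholds discussed below act on. The sketch below uses the standard definitions with support as a fraction of transactions. The alert-field strings are made-up examples.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = support(A u B) / support(A)."""
    a = frozenset(antecedent)
    sup_a = support(a, transactions)
    if sup_a == 0:
        return 0.0
    return support(a | frozenset(consequent), transactions) / sup_a
```

For four alert records in which three contain "dport=445" and all three of those also contain "sig=smb_probe", the rule {dport=445} => {sig=smb_probe} has support 0.75 and confidence 1.0.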

The Apriori algorithm used for association rule mining in this embodiment mainly comprises two steps. The first step finds the frequent 1-itemsets by scanning the database directly and counting. The second step is a loop executed mainly by two functions, aprioriGen and initLK, as shown in Table 1 below. The loop terminates when no new frequent itemsets are generated:

Table 1 Main functions of the improved Apriori algorithm

Note that the pruning step of the improved algorithm relies on a property of Apriori: every subset of a frequent itemset must itself be frequent, so a candidate itemset with any infrequent subset is removed. Referring to fig. 5, the support is set to 3, Tid is the transaction identifier, and Items are the specific items contained in a transaction. L1 is obtained by scanning, L2 then removes the elements that occur fewer than 2 times, and finally L3 removes the elements that occur fewer than 3 times.
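The level-wise loop and its pruning property can be sketched as a plain Apriori implementation. `init_l1` and `apriori_gen` are hypothetical Python counterparts of the initial scan and of aprioriGen. The specific improvements of the authors' modified algorithm are not reproduced. Support is an absolute count here, as in the fig. 5 example.

```python
from itertools import combinations


def init_l1(transactions, min_support):
    """First pass: count single items and keep the frequent 1-itemsets."""
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    return {s: c for s, c in counts.items() if c >= min_support}


def apriori_gen(prev_level, k):
    """Join frequent (k-1)-itemsets into k-item candidates, then prune any
    candidate that has an infrequent (k-1)-subset (the Apriori property)."""
    prev = set(prev_level)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) == k and all(
                    frozenset(sub) in prev for sub in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


def apriori(transactions, min_support):
    transactions = [frozenset(t) for t in transactions]
    level = init_l1(transactions, min_support)
    frequent = dict(level)
    k = 2
    while level:  # stop when no new frequent itemsets are produced
        cands = apriori_gen(level, k)
        counts = {c: sum(1 for t in transactions if c <= t) for c in cands}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
    return frequent
```

With transactions {1,3,4}, {2,3,5}, {1,2,3,5}, {2,5} and a minimum support of 2, the candidate {1,3,5} is pruned because {1,5} is infrequent. The only frequent 3-itemset is {2,3,5}, which has count 2.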

The association analyzer discovers previously unknown relationships between attack data items in the Suricata log. It converts them into Suricata rule format and adds them to Suricata's rule base, which strengthens the defensive performance of the system. As shown in the flowchart of fig. 6, the Suricata system log contains many fields, and data preprocessing keeps the valid fields and removes those irrelevant to association analysis. When the improved Apriori algorithm is used for association analysis, the user needs to set the minimum support and the minimum confidence as required. The influence of these two values on the result is shown in Table 2 below:

Table 2 Influence of the minimum support and minimum confidence thresholds

	Minimum support threshold increased	Minimum confidence threshold increased
Number of frequent itemsets mined	Decreases	No effect
Number of association rules mined	Decreases	Decreases
Accuracy of mined association rules	Increases	Increases

In rule conversion, a mined association rule can be used in the Suricata rule base after format conversion. The part of the association rule that corresponds to the rule header is placed in the header, and the part that corresponds to the rule options is placed in the options. The header and options are then combined to complete the conversion.
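The header/option split can be sketched as follows. The sketch assumes the mined rule's items are already field-value pairs (`proto`, `src_ip`, `src_port`, `dst_ip`, `dst_port`, `content`); this representation and the function name are hypothetical. The output follows the standard Suricata rule layout `action proto src port -> dst port (options;)`. A real converter would also need to escape special characters in `content` and `msg`.

```python
def to_suricata_rule(rule, sid, msg, action="alert"):
    """rule: (antecedent, consequent), each a dict of field -> value (hypothetical)."""
    antecedent, consequent = rule
    fields = {**antecedent, **consequent}
    # Header part: protocol, addresses and ports, defaulting to wildcards
    header = "{act} {proto} {src_ip} {src_port} -> {dst_ip} {dst_port}".format(
        act=action,
        proto=fields.get("proto", "ip"),
        src_ip=fields.get("src_ip", "any"),
        src_port=fields.get("src_port", "any"),
        dst_ip=fields.get("dst_ip", "any"),
        dst_port=fields.get("dst_port", "any"),
    )
    # Option part: message, payload content, rule id and revision
    options = [f'msg:"{msg}"']
    if "content" in fields:
        options.append(f'content:"{fields["content"]}"')
    options += [f"sid:{sid}", "rev:1"]
    return f"{header} ({'; '.join(options)};)"
```

For example, the rule ({"proto": "tcp", "dst_port": "445"}, {"content": "SMB"}) becomes `alert tcp any any -> any 445 (msg:"assoc rule"; content:"SMB"; sid:1000001; rev:1;)`.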

As a preferred embodiment, further, responding to the alarm information according to the known vulnerability attack and defense knowledge graph state node, the response security decision set and the attack type includes:

Conventional reinforcement learning is limited to problems with small, generally discrete action and sample spaces. More complex and realistic tasks, however, tend to have large state spaces and continuous action spaces. In reinforcement learning, existing Q values are needed to compute the network's target value. In the embodiment of the present disclosure, a CVE vulnerability library may be used as the training set in combination with deep learning. Introducing a neural network means that Q values no longer need to be stored explicitly. A neural network structure suited to network attack and defense is designed, turning the update problem into a function-fitting problem in which similar states correspond to similar output actions. The Q value is fitted by a function, and the target value function approaches the optimal Q value through updates with learning rate α. The resulting values serve as initial Q values in the decision stage, so convergence is accelerated on the basis of this prior probability.

DQN (Deep Q-Networks) inherits the idea of Q-Learning and uses the bootstrap property of the Bellman equation to compute target values and iteratively optimize the state-action value function according to

$$Q_\theta(s,a) \leftarrow \mathbb{E}_{s,a\sim D}\left[r + \gamma \max_{a'} Q_\theta(s',a')\right]$$

until convergence. Here $Q_\theta(s,a)$ is represented by a neural network with parameter θ. A single forward pass outputs the values of all possible actions ($|A|$ outputs in total, where $|A|$ is the dimension of the action space A), so the optimal action in each state can be selected by comparing their relative sizes.

During training, DQN can use an ε-greedy exploration strategy. Given the current input state s and the latest estimate $Q_\theta(s,a)$, it selects

$$a = \arg\max_{a \in A} Q_\theta(s,a)$$

with probability 1−ε and a random action with probability ε. As training progresses, ε decreases linearly within the interval (0, 1], so DQN gradually shifts from "strong exploration, weak exploitation" to "weak exploration, strong exploitation". The single-step transition samples (s, a, s', r) collected during training are stored in the first-in-first-out structure of the Replay Buffer, and a batch is randomly sampled from it for gradient computation and parameter updates. The Replay Buffer allows historical data to be reused, and batch-based training covers a larger state space and reduces the variance of single-sample gradients. Together, these stabilize DQN training and improve sample efficiency.

To overcome the training instability introduced by bootstrapping, DQN can include a target Q network with the same structure as the main Q network, used only for computing target values. Its parameters are denoted θ⁻. Unlike the main Q network, the target Q network is not updated at every iteration. Instead, the main network parameters are copied into it in full at fixed intervals (θ⁻ ← θ), which effectively improves the stability of DQN training.
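The target-network mechanism can be illustrated with a deliberately tiny linear Q function in place of a neural network. This substitution is an assumption that keeps the example dependency-free. `QNet`, `td_target` and `train` are hypothetical names. The sketch shows the main parameters θ being updated for every sample, while the target parameters θ⁻ are overwritten only every `sync_every` batches.

```python
import copy


class QNet:
    """Hypothetical stand-in for a Q network: linear model Q(s, a) = w[a] . s."""

    def __init__(self, n_features, n_actions):
        self.w = [[0.0] * n_features for _ in range(n_actions)]

    def q_values(self, s):
        return [sum(wi * si for wi, si in zip(row, s)) for row in self.w]

    def sgd_step(self, s, a, target, lr):
        # Gradient step on 0.5 * (target - Q(s, a))^2 with respect to w[a]
        err = target - self.q_values(s)[a]
        self.w[a] = [wi + lr * err * si for wi, si in zip(self.w[a], s)]


def td_target(target_net, r, s_next, done, gamma):
    """Bootstrap target r + gamma * max_a' Q_target(s', a'); just r at episode end."""
    if done:
        return r
    return r + gamma * max(target_net.q_values(s_next))


def train(main, batches, gamma=0.9, lr=0.1, sync_every=2):
    target = copy.deepcopy(main)  # theta^- initialised to theta
    for step, batch in enumerate(batches, start=1):
        for s, a, s_next, r, done in batch:
            y = td_target(target, r, s_next, done, gamma)
            main.sgd_step(s, a, y, lr)
        if step % sync_every == 0:
            target = copy.deepcopy(main)  # periodic full copy: theta^- <- theta
    return main, target
```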

In automatic policy orchestration, intrusion response is a mapping from intrusion alarms to response measures, and the environmental state in intrusion response consists of the network environment and the attacker. In the embodiment of the present disclosure, the environmental state can be formally described with a graph database. When an intrusion alarm is raised, it is matched to a state node in the Neo4j graph database. The state node is then mapped to response measures through reinforcement learning to complete the response decision.

Reinforcement learning learns a mapping from environmental states to actions such that the Agent's actions obtain the greatest cumulative return from the environment. Traditional reinforcement learning uses a two-layer mapping that goes directly from the environmental state to the response measure layer once an intrusion alarm is found. This makes the resulting response decision poorly targeted at specific attacks. In this embodiment, referring to figs. 7 and 8, introducing an attack mode layer turns this into a three-layer mapping, and the intrusion response process becomes as follows. After an intrusion alarm is found, it is matched to the corresponding state node of the vulnerability attack and defense knowledge graph. The state node is mapped to the corresponding attack mode, and the attack mode is then mapped to a response measure to complete the response decision. The mapping from state node to attack mode follows the maximum-degree principle: if several attack modes match, the attack mode with the largest degree is selected first. The mapping from attack mode to response measure is made by a response measure selection mechanism and is then adaptively optimized through online learning.
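The three-layer mapping and the maximum-degree rule can be sketched with an in-memory graph standing in for the Neo4j database. This is an assumption: no Neo4j API is used. Degree is taken as the number of edges incident on an attack-pattern node, and ties are broken alphabetically. The alarm-matching and measure-selection steps are passed in as functions because the text assigns them to other mechanisms. All names are hypothetical.

```python
from collections import defaultdict


class ResponseGraph:
    """Hypothetical in-memory stand-in for the state / attack-mode / response graph."""

    def __init__(self):
        self.state_to_patterns = defaultdict(set)
        self.pattern_to_measures = defaultdict(set)
        self.pattern_edges = defaultdict(set)  # every edge incident on a pattern node

    def link_state(self, state, pattern):
        self.state_to_patterns[state].add(pattern)
        self.pattern_edges[pattern].add(("state", state))

    def link_measure(self, pattern, measure):
        self.pattern_to_measures[pattern].add(measure)
        self.pattern_edges[pattern].add(("measure", measure))

    def degree(self, pattern):
        return len(self.pattern_edges[pattern])


def respond(graph, alarm, match_state, select_measure):
    state = match_state(alarm)                     # layer 1: alarm -> state node
    candidates = graph.state_to_patterns.get(state)
    if not candidates:
        return None
    # Layer 2: maximum-degree principle (max keeps the first of equal-degree patterns)
    pattern = max(sorted(candidates), key=graph.degree)
    measures = sorted(graph.pattern_to_measures[pattern])
    return pattern, select_measure(state, pattern, measures)  # layer 3
```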

Building on the vulnerability attack and defense knowledge graph construction and on Raspberry Pi-based intrusion detection, a Q-Learning algorithm is introduced. To address the dynamic nature of network attacks, a Softmax algorithm is adopted, and a safety threshold, a stable reward factor and a penalty factor are introduced. Together these select defense strategies automatically and intelligently, solving the adaptivity problem in response decisions. States and actions in network attack and defense are formally described based on the generated vulnerability attack and defense knowledge graph. Adding an attack mode layer converts the two-layer mapping of traditional reinforcement learning into a three-layer mapping, and the added layer is used to remember and identify different types of attackers so that targeted response decisions can be made. The cumulative returns of eight quantified response objectives are updated, and a voting mechanism evaluates response strategies against multiple objectives, meeting the security requirements of multi-objective response.

The quantization of the eight response objectives can be described as follows:

Tracking the attack: confirm the attacker's identity and capture evidence for later counteraction against the attacker or pursuit by legal means. The quantification method is shown in Table 3:

Table 3 Immediate rewards for tracking the attack

Analyzing the attack: analyze the attack process to understand the attack mode, path and so on. The quantification method is shown in Table 4:

Table 4 Immediate rewards for analyzing the attack

Shielding the attack: terminate the attack to prevent damage to the service. The quantification method is shown in Table 5:

Table 5 Immediate rewards for shielding the attack

Maximizing confidentiality: prevent information leakage. Confidentiality is defined in Table 6:

Table 6 Confidentiality definition


Immediate reward for maximizing confidentiality: the leakage indicator of information k in this reward takes the value 0 or 1. A value of 0 means that information k has not been leaked, and 1 means that it has been leaked.

Maximizing integrity: prevent files from being tampered with. Integrity is defined in Table 7:

Table 7 Integrity definition

Immediate reward for maximizing integrity: the tampering indicator of asset k in this reward takes the value 0 or 1. A value of 0 means that asset k has not been tampered with, and 1 means that it has been tampered with.

Recovering the system: restore the system from the attack at low cost, without pursuing termination of the attack.

Immediate reward for recovering the system: the recovery indicator of asset k in this reward takes the value 0 or 1. A value of 0 means recovery failed, and 1 means recovery succeeded. $W_k$ is the asset value.

Maintaining service: unlike shielding the attack, this objective does not terminate the attack; keeping the service running is its core. Immediate reward for maintaining service: the service indicator of service k in this reward takes the value 0 or 1. A value of 0 means that the service cannot be used normally, and 1 means that the service has been maintained and can be used normally. $W_k$ is the value of the service.

Minimizing cost: minimize the cost of the response measures. The description method is shown in Table 8:

Table 8 Immediate rewards for minimizing cost


To learn quickly after the response objective changes, the cumulative return and the expected cumulative return are updated independently for each response objective. The cumulative return update process is:

The expected cumulative return update process is:

A suitable response measure should weigh the outcomes of several objectives together rather than follow a single response objective. The Agent therefore uses a voting mechanism when making a response decision. Each response objective has a weight set by the administrator, which sets how much that objective's vote counts. The expected cumulative return of each response measure is obtained by voting over the expected cumulative returns of the individual response objectives, and can be described as:

To accelerate learning, and to make response decisions better match the security needs of real intrusion response, a reward and penalty mechanism is introduced. Each time a response measure is executed, the immediate return for each response objective is evaluated, and the cumulative return and the expected cumulative return are updated. The cumulative return of the response measure is then adjusted as follows:

If $\mathrm{vote}_\gamma(s,a)$ is greater than the safety threshold θ, the response measure is rewarded with a stable reward factor μ:

$$Q^i(s,a) = \mu\, Q^i(s,a), \quad \mu \ge 1$$

If $\mathrm{vote}_\gamma(s,a)$ is less than the safety threshold θ, the response measure is penalized with an instability penalty factor ν.
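The voting and reward/penalty steps can be sketched under two explicit assumptions, since the corresponding formulas are not reproduced in this text. First, the vote is taken to be the administrator-weighted sum of the per-objective expected returns $Q^i(s,a)$. Second, the penalty is taken to mirror the reward rule as $Q^i(s,a) \leftarrow \nu Q^i(s,a)$ with $0 < \nu \le 1$. The function names and default factors are hypothetical.

```python
def vote(q_by_objective, weights, s, a):
    """Assumed form of vote(s, a): weighted sum of per-objective expected returns."""
    return sum(weights[i] * q_by_objective[i][(s, a)] for i in weights)


def reward_or_penalize(q_by_objective, weights, s, a, theta, mu=1.2, nu=0.8):
    """Scale every objective's Q^i(s, a) by mu (vote above theta) or nu (below theta)."""
    if mu < 1 or not 0 < nu <= 1:
        raise ValueError("expected mu >= 1 and 0 < nu <= 1")
    v = vote(q_by_objective, weights, s, a)
    if v > theta:
        factor = mu   # stable reward factor
    elif v < theta:
        factor = nu   # instability penalty factor (assumed multiplicative)
    else:
        return v      # exactly at the threshold: leave Q values unchanged
    for i in q_by_objective:
        q_by_objective[i][(s, a)] *= factor
    return v
```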

Compared with a single-stage game, a multi-stage game must account for how both sides adjust their strategies. An attacker with sufficient attack resources tends to launch several attacks in succession over a short period rather than attack multiple targets simultaneously. Referring to fig. 9, the attacker can therefore adjust the targets of later stages dynamically, according to the results of earlier stages, to maximize the attack return. The game process can be described as follows:

Step 1: treat one pair of attack and defense actions by the attacker and the defender as a stage. The defender estimates the attack resources in advance and sets the total number of stages of the game, denoted D. d denotes the current attack-defense stage, with initial value 1.

Step 2: determine the current operating state of the system at stage d and estimate the attack means and attack targets the attacker may adopt. Taking into account the D−d stages remaining in the game, obtain all of the attacker's attack combinations over those D−d stages.

Step 3: calculate the optimal load-shedding value of each attack combination using an optimal load-shedding algorithm.

Step 4: from the result of Step 3, calculate the potential return value $R_{dn}$ of each target n at stage d. The potential return value of a target is the total return of the potential attack combinations over the remaining stages, assuming that target has been attacked. In other words, the attacker chooses the current attack line so as to maximize the total return over all stages. The continual change of the potential return values from stage to stage is in fact what both sides base their strategy adjustments on.

Here $R_{dn}$ is the potential return value of attack target n at stage d; V is the total number of possible subsequent attack combinations; v is one of these attack combinations; and $f_v$ is the total system load shedding for attack combination v.

Step 5: applying two-player zero-sum game theory, obtain the return of each defense mode against each attack mode, given the defense means available to the defender, to form a return matrix U. Compute offline the Nash equilibrium point $(A^*, D^*)$ of a rational defender and attacker in this situation, and add this attack-defense strategy combination to an offline decision table. The $D^*$ obtained at each stage is the optimal defense strategy against the network attack at that stage.
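For the smallest case, the equilibrium in Step 5 can be computed in closed form. The sketch below is restricted to 2×2 return matrices (two defense modes against two attack modes); larger matrices would normally be solved by linear programming. `U[i][j]` is assumed to be the defender's return when defense i meets attack j, and under the zero-sum assumption the attacker tries to minimize it. The function names and the per-stage input dictionary are hypothetical.

```python
from fractions import Fraction


def solve_2x2_zero_sum(U):
    """Equilibrium of a 2x2 zero-sum game with integer (or Fraction) entries.
    Returns (p, q, value): p = P(defence 0), q = P(attack 0), value = defender's return."""
    (a, b), (c, d) = U
    maximin = max(min(a, b), min(c, d))  # defender's best guaranteed row value
    minimax = min(max(a, c), max(b, d))  # attacker's best guaranteed column value
    if maximin == minimax:               # saddle point: pure-strategy equilibrium
        i = 0 if min(a, b) == maximin else 1
        j = 0 if max(a, c) == minimax else 1
        return Fraction(1 - i), Fraction(1 - j), Fraction(maximin)
    denom = a - b - c + d                # nonzero whenever no saddle point exists
    p = Fraction(d - c, denom)           # makes the attacker indifferent
    q = Fraction(d - b, denom)           # makes the defender indifferent
    return p, q, Fraction(a * d - b * c, denom)


def offline_decision_table(stage_matrices):
    """stage_matrices: {stage d: 2x2 return matrix}. Returns {d: (p, q, value)}."""
    return {d: solve_2x2_zero_sum(U) for d, U in sorted(stage_matrices.items())}
```

For U = [[3, -1], [-2, 1]], which has no saddle point, the defender plays defense 0 with probability 3/7 and the game value is 1/7.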

Step 6: determine whether the attack and defense process has reached the preset total number of stages D. If not, update d to d+1 and return to Step 2; if so, the multi-stage game process ends.

During policy orchestration, the environmental state and parameters are input first. The pattern-matching layer, which distinguishes high-capability from low-capability attackers, is then mapped to the state nodes of the vulnerability attack and defense knowledge graph through the multi-stage attack-defense game, and the script library is invoked. Using the script set in the script library, the engine repeatedly runs response measure selection based on the Softmax algorithm and online learning based on the Q-Learning algorithm, and finally outputs the optimal script policy. The core of the policy orchestration engine is a response measure selection mechanism and an online learning mechanism designed on top of the completed state and action definitions, from which the complete algorithm is built.
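The selection and learning loop described above can be sketched with Softmax (Boltzmann) action selection at temperature τ and the standard tabular Q-Learning update. This is a sketch under stated assumptions: Q is a dict keyed by (state, response measure), the reward is supplied by the caller rather than computed from the voted objectives, and the function names are hypothetical.

```python
import math
import random


def softmax_probs(q_values, tau):
    """P(a) proportional to exp(Q(s, a) / tau); the max is subtracted for numerical stability."""
    m = max(q_values)
    exps = [math.exp((q - m) / tau) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]


def softmax_select(q_values, tau, rng):
    """Sample an action index; higher tau explores more, tau -> 0 approaches greedy."""
    weights = softmax_probs(q_values, tau)
    return rng.choices(range(len(q_values)), weights=weights, k=1)[0]


def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_b Q(s',b) - Q(s,a)); unseen pairs start at 0."""
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q[(s, a)]
```

Unlike ε-greedy, Softmax selection makes response measures with higher estimated returns proportionally more likely. This gives smoother exploration when the attacker's behavior changes.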

Further, based on the above method, the embodiment of the present invention further provides a network security intelligent decision automatic arrangement response system, which includes: the system comprises a data management module, an alarm analysis module and an intrusion detection module, wherein,

To verify the effectiveness of the scheme, the related algorithms were combined with the Raspberry Pi to run attack and defense tests on mine camera monitoring. The test results verify that the scheme can automatically perform intrusion detection on unknown network traffic attacks and generate targeted defense strategies. This addresses the weak adaptability of traditional decision response and the difficulty of resisting unknown attacks that exploit vulnerabilities, and it facilitates network deployment.

The relative steps, numerical expressions and numerical values of the components and steps set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise.

In this specification, the embodiments are described in a progressive manner, each focusing on its differences from the others, and identical or similar parts of the embodiments may be cross-referenced. Because the system disclosed in the embodiment corresponds to the disclosed method, its description is relatively brief; for the relevant details, refer to the description of the method.

The elements and method steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or a combination thereof, and the elements and steps of the examples have been generally described in terms of functionality in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Those of ordinary skill in the art may implement the described functionality using different methods for each particular application, but such implementation is not considered to be beyond the scope of the present invention.

Those of ordinary skill in the art will appreciate that all or a portion of the steps in the above methods may be performed by a program that instructs associated hardware, and that the program may be stored on a computer readable storage medium, such as: read-only memory, magnetic or optical disk, etc. Alternatively, all or part of the steps of the above embodiments may be implemented using one or more integrated circuits, and accordingly, each module/unit in the above embodiments may be implemented in hardware or may be implemented in a software functional module. The present invention is not limited to any specific form of combination of hardware and software.

Finally, it should be noted that the above examples are only specific embodiments of the present invention, used to illustrate rather than limit its technical solutions, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing examples, those skilled in the art should understand the following. Within the technical scope disclosed herein, anyone familiar with the art may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. The automatic arrangement and response method for the network security intelligent decision is characterized by comprising the following steps:

2. The automatic programming response method of network security intelligent decision according to claim 1, wherein constructing a vulnerability attack knowledge graph according to the network vulnerability attack information comprises:

3. The network security intelligent decision automatic orchestration response method according to claim 2, characterized in that the alarm knowledge base uses the triple AttackPatternDB = (Pre, Type, AttackPattern) to describe attack patterns, and the defense script knowledge base uses the triple Strategy = (Vul, AttackPatternDB, PayLoad) to describe response security policies. Type is the set of all types of the arguments in the attack patterns. Pre is the set of all predicates required to describe the preconditions and consequences of the attack patterns. AttackPattern is the set of attack patterns, each of which is described by the five-tuple (Name, Vul, Var, Pre, Eff): Name is the name of the attack pattern; Vul is the set of all vulnerabilities that the attack pattern can exploit; Var is the set of local variables, each element of which is written <v, t>, where v is an argument and t ∈ Type is the type to which the argument belongs; and Pre and Eff are, respectively, the preconditions and consequences of exploiting the attack pattern, expressed with predicates from the predicate set. PayLoad is the set of defense scripts formulated for different attack patterns.
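The triples and five-tuple of claim 3 can be mirrored as Python dataclasses for illustration. This mapping is hypothetical: `predicates`, `types` and `attack_patterns` correspond to Pre, Type and AttackPattern of AttackPatternDB. Predicates are simplified to strings, and PayLoad is modeled as a dict from pattern name to a defense-script identifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackPattern:
    name: str        # Name
    vul: frozenset   # Vul: vulnerabilities the pattern can exploit
    var: tuple       # Var: (argument v, type t) pairs with t in Type
    pre: frozenset   # Pre: precondition predicates
    eff: frozenset   # Eff: consequence predicates


@dataclass
class AttackPatternDB:
    predicates: set  # Pre: predicate set
    types: set       # Type: all argument types
    attack_patterns: list

    def validate(self):
        """Check that every argument type and predicate used is declared."""
        for p in self.attack_patterns:
            for v, t in p.var:
                if t not in self.types:
                    raise ValueError(f"{p.name}: argument {v} has unknown type {t}")
            unknown = (p.pre | p.eff) - self.predicates
            if unknown:
                raise ValueError(f"{p.name}: undefined predicates {sorted(unknown)}")


@dataclass
class Strategy:
    vul: set
    attack_pattern_db: AttackPatternDB
    payload: dict    # PayLoad: attack pattern name -> defense script

    def scripts_for(self, vulnerability):
        """Defense scripts of all patterns that can exploit the given vulnerability."""
        return [self.payload[p.name] for p in self.attack_pattern_db.attack_patterns
                if vulnerability in p.vul and p.name in self.payload]
```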

4. The method for automatically arranging and responding to network security intelligent decision as claimed in claim 3, wherein the step of obtaining the atomic attack set by instantiating the atomic attack comprises the steps of:

firstly, storing response security policy attributes in a defense script library in a tree structure, and establishing a dispatch queue about the response security policy attributes of a target environment;

then, during construction of the graph, traversing the attributes in the dispatch queue; according to the environment state stored in the data structure, instantiating the attack patterns with the target network environment attributes to generate the set of executable atomic attacks related to each attribute, and updating the executable alarm knowledge base information into the tree structure.

5. The method for automatically arranging and responding to network security intelligent decision according to claim 1, wherein in the process of detecting the network attack, the network attack is identified by cross-comparing the alarm information of different sources according to preset association rules, wherein the preset association rules comprise: setting an abnormal threshold value of the data packet flow, detecting abnormal flow by using the threshold value, and supplementing an association rule by using an association rule mining algorithm.

6. The network security intelligent decision automatic orchestration response method according to claim 1, wherein supplementing the association rules with an association rule mining algorithm comprises:

7. The method for automatically arranging and responding to network security intelligent decisions according to claim 1, wherein responding to the alarm information according to the known vulnerability attack and defense knowledge graph state nodes, the response security decision set and the attack type comprises: firstly, mapping vulnerability attack and defense knowledge map state nodes to corresponding attack modes according to a maximum degree principle, wherein the maximum degree principle is that the maximum attack mode degree is preferentially selected for mapping when a plurality of attack modes are consistent;

8. The network security intelligent decision automatic orchestration response method according to claim 7, wherein, in the pre-training of the network attack and defense deep learning model, a CVE vulnerability library is used as the training set, and, according to the current input state s and the latest state-action estimation function $Q_\theta(s,a)$, the action $a = \arg\max_{a \in A} Q_\theta(s,a)$ is selected with probability 1−ε, where A is the action space.

9. The network security intelligent decision automatic orchestration response method according to claim 8, wherein, when the state-action value function is optimized with the pre-trained network attack and defense deep learning decision model, a target value is calculated using the bootstrap property of the Bellman equation, the state-action update problem is converted into a function-fitting problem, and the target value function approaches the optimal Q value by updating the defender's policy learning rate α. Here $\mathbb{E}_{s,a\sim D}$ denotes the expected defense benefit; D denotes the probability of selecting defense action a in environment state s; (s', a') denotes the policy of selecting a new action a' in a new state s'; $Q_\theta(s',a')$ denotes the policy benefit function; γ denotes the benefit discount rate; and r(s', a') denotes the cumulative reward for selecting the new action a' in the new state s'.

10. A network security intelligent decision automatic orchestration response system, comprising: the system comprises a data management module, an alarm analysis module and an intrusion detection module, wherein,