CN110225019B

CN110225019B - Network security processing method and device

Info

Publication number: CN110225019B
Application number: CN201910479765.0A
Authority: CN
Inventors: 毛婷伟; 梁玉; 洪春华
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2019-06-04
Filing date: 2019-06-04
Publication date: 2021-08-31
Anticipated expiration: 2039-06-04
Also published as: CN110225019A

Abstract

The embodiment of the application discloses a network security processing method and a device; the method includes the steps of detecting a network security state of a target network, obtaining an execution probability of executing a preset network security response in the network security state based on a network security mapping relation, wherein the network security mapping relation comprises the mapping relation between the network security state and the probability of executing the preset network security response, obtaining a state reward corresponding to the network security state based on the execution probability, calculating a target probability enabling the current state reward corresponding to the network security state to be the maximum value based on the obtained state reward, and updating the network security mapping relation based on the target probability to obtain an updated network security mapping relation. The scheme can improve the efficiency of network security processing.

Description

Network security processing method and device

Technical Field

The present application relates to the field of computer technologies, and in particular, to a network security processing method and apparatus.

Background

As the network security situation becomes more and more complex, events threatening network security such as malicious activities, abnormal attacks and the like occur, and after the network is attacked, problems such as network data leakage, server paralysis and the like are easily caused, so that timely processing of the network security events is very necessary.

At present, network security is handled mainly by security experts, who detect the network security events occurring in a network and provide a corresponding solution for each event. This approach to network security processing is very inefficient.

Disclosure of Invention

The embodiment of the application provides a network security processing method and device, which can improve the efficiency of network security processing.

The embodiment of the application provides a network security processing method, which comprises the following steps:

detecting a network security state of a target network;

acquiring an execution probability of executing a preset network security response in the network security state based on a network security mapping relation, wherein the network security mapping relation comprises a mapping relation between the network security state and the probability of executing the preset network security response;

acquiring a state reward corresponding to the network security state based on the execution probability;

calculating a target probability that the current state reward corresponding to the network security state is the maximum value based on the acquired state reward;

and updating the network security mapping relation based on the target probability to obtain an updated network security mapping relation.

Correspondingly, an embodiment of the present application further provides a network security processing apparatus, including:

the detection module is used for detecting the network security state of the target network;

a probability obtaining module, configured to obtain, based on a network security mapping relationship, an execution probability of executing a preset network security response in the network security state, where the network security mapping relationship includes a mapping relationship between the network security state and the probability of executing the preset network security response;

the reward obtaining module is used for obtaining the state reward corresponding to the network security state based on the execution probability;

the calculation module is used for calculating the target probability which enables the current state reward corresponding to the network security state to be the maximum value based on the acquired state reward;

and the updating module is used for updating the network security mapping relation based on the target probability to obtain an updated network security mapping relation.

Correspondingly, an embodiment of the present application further provides a storage medium, where the storage medium stores instructions, and the instructions, when executed by a processor, implement the steps of the network security processing method provided in any embodiment of the present application.

Correspondingly, an embodiment of the present application further provides a computer device, where the computer device includes a processor and a memory, where the memory stores a plurality of instructions, and the processor loads the instructions from the memory to execute the steps of the network security processing method provided in any of the embodiments of the present application.

The method includes the steps of detecting a network security state of a target network, obtaining an execution probability of executing a preset network security response in the network security state based on a network security mapping relation, obtaining a state reward corresponding to the network security state based on the execution probability, calculating a target probability enabling the current state reward corresponding to the network security state to be the maximum value based on the obtained state reward, and updating the network security mapping relation based on the target probability to obtain an updated network security mapping relation. The scheme can improve the efficiency of network security processing.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on them without creative effort.

Fig. 1 is a schematic view of a network security processing system provided in an embodiment of the present application;

Fig. 2 is a first flowchart of a network security processing method according to an embodiment of the present application;

Fig. 3 is a second flowchart of a network security processing method according to an embodiment of the present application;

Fig. 4 is a framework diagram of a network security processing method provided in an embodiment of the present application;

Fig. 5 is a schematic technical flow chart of a network security processing method according to an embodiment of the present application;

Fig. 6 is a schematic diagram illustrating a solution flow of a reinforcement learning model according to an embodiment of the present disclosure;

Fig. 7 is a schematic structural diagram of a first network security processing method according to an embodiment of the present application;

Fig. 8 is a schematic structural diagram of a computer device provided in an embodiment of the present application.

Detailed Description

Referring to the drawings, wherein like reference numbers refer to like elements, the principles of the present application are illustrated as being implemented in a suitable computing environment. The following description is based on illustrated embodiments of the application and should not be taken as limiting the application with respect to other embodiments that are not detailed herein.

In the description that follows, specific embodiments of the present application are described with reference to steps and symbolic representations of operations performed by one or more computers, unless otherwise indicated. These steps and operations, which are at times referred to as being computer-executed, include the manipulation by the computer's processing unit of electronic signals representing data in a structured form. This manipulation transforms the data or maintains it at locations in the computer's memory system, which reconfigures or otherwise alters the operation of the computer in a manner well known to those skilled in the art. The data structures in which the data is maintained are physical locations of the memory that have particular properties defined by the data format. However, while the principles of the application are described in the foregoing terms, this is not meant to be limiting, and those of ordinary skill in the art will recognize that various steps and operations described below may also be implemented in hardware.

The term "module" as used herein may be considered a software object executing on the computing system. The different components, modules, engines, and services described herein may be considered as implementation objects on the computing system. The apparatus and method described herein are preferably implemented in software, but may also be implemented in hardware, and are within the scope of the present application.

An execution main body of the network security processing method may be the network security processing apparatus provided in the embodiment of the present application, or a network device integrated with the network security processing apparatus, where the network security processing apparatus may be implemented in a hardware or software manner. The network device may be a smart phone, a tablet computer, a palm computer, a notebook computer, or a desktop computer.

Referring to fig. 1, fig. 1 is a schematic view of an application scenario of a network security processing method according to an embodiment of the present disclosure, taking an example that a network security processing apparatus is integrated in a network device, where the network device may detect a network security state of a target network, and obtain, based on a network security mapping relationship, an execution probability of executing a preset network security response in the network security state, where the network security mapping relationship includes a mapping relationship between the network security state and the probability of executing the preset network security response, obtain, based on the execution probability, a state reward corresponding to the network security state, calculate, based on the obtained state reward, a target probability that a current state reward corresponding to the network security state is a maximum value, and update the network security mapping relationship based on the target probability to obtain an updated network security mapping relationship.

Referring to fig. 2, fig. 2 is a schematic flow chart illustrating a network security processing method according to an embodiment of the present disclosure. The specific process of the network security processing method provided by the embodiment of the application can be as follows:

201. Detect the network security state of the target network.

The network security state may be a network state obtained by performing security detection on a network security scene where the target network is located. For example, the network security state may include a security state, a network scanned state, a network exploit state, a network attacked state, a network compromised state, and so on.

In an embodiment, the network security state set may be denoted by S and may include a plurality of network security states s. For example, the network security state set S may include a security state S1, a network scanned state S2, a network exploit state S3, a network attacked state S4, a network compromised state S5, and so on.

In practical application, the network security state of the target network can be detected. For example, the network security scene where the target network is located can be detected and determined to be a terminal intrusion scene. According to the network security states that may arise in the terminal intrusion scene, the network security state set S is defined to include a security state S1, a network scanned state S2, a network exploit state S3, a network attacked state S4 and a network compromised state S5. The target network is then detected to obtain its network security state.

In an embodiment, in order to improve accuracy of obtaining the network security state, the network security parameters may be detected by a plurality of network security engines, so as to obtain the network security state, and specifically, the step "detecting the network security state of the target network" may include:

acquiring a network security state set, wherein the network security state set comprises a plurality of network security states;

respectively detecting network security results corresponding to the network security engines in a target network based on a plurality of network security engines;

and according to the network security result, determining the network security state of the target network from a plurality of network security states in the network security state set.

In practical application, a network security state set S may be obtained, which may include a plurality of network security states s. Network security results in the target network may then be detected by a plurality of network security engines, each engine producing one network security result. For example, the engines may respectively inspect parts such as a database, a transmission network, and a user terminal to obtain their network security results. The network security state of the target network is then determined from the network security state set according to the plurality of network security results.
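The multi-engine detection step can be sketched as follows. This is a minimal illustration, not the patented implementation: the `SecurityState` enum, the `determine_state` helper, and the "most severe result wins" rule are all assumptions, since the text only says the state is determined according to the engines' results.

```python
from enum import IntEnum
from typing import Iterable


class SecurityState(IntEnum):
    """Network security state set; the numeric order is an assumed severity ranking."""
    SECURE = 1
    SCANNED = 2
    EXPLOITED = 3
    ATTACKED = 4
    COMPROMISED = 5


def determine_state(engine_results: Iterable[SecurityState]) -> SecurityState:
    """Pick one state from the set given one result per security engine.

    Assumption: the most severe reported state wins. The source does not
    specify how the individual results are combined.
    """
    results = list(engine_results)
    if not results:
        # No engine reported anything; treating this as secure is an assumption.
        return SecurityState.SECURE
    return max(results)


# Hypothetical results from three engines (database, transmission network, user terminal).
results = [SecurityState.SECURE, SecurityState.SCANNED, SecurityState.SECURE]
state = determine_state(results)
print(state.name)
```

Other combination rules, such as majority voting or weighting engines by reliability, would fit the same interface.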

In an embodiment, after detecting the network security scene where the target network is located and determining that the network security scene where the target network is located is a terminal intrusion scene, a main body performing network security response according to a network security state may be defined as an agent. For example, in a terminal intrusion scenario, the agent may be defined as a root role of the terminal host.

The root user may be a unique super user in the system, and may have all rights in the system, such as starting or stopping a process, deleting or adding a user, adding or disabling hardware, and so on.

202. Acquire, based on the network security mapping relation, the execution probability of executing a preset network security response in the network security state.

The preset network security response may be preparation of the network for handling occurrence of various network abnormal events and measures taken after the network abnormal event occurs. For example, the preset network security response may include closing an abnormal connection port, blocking traffic from an IP, quarantining running suspicious files, deleting virus files, closing malicious processes, and the like.

In an embodiment, after the network security scene where the target network is located has been detected and determined to be a terminal intrusion scene, a preset network security response set may be denoted by A. According to the conventional processing flow when a host is subjected to an abnormal attack, the preset network security response set A is defined to include a plurality of preset network security responses a. For example, the preset network security response set A may include preset network security responses such as closing a scanned port a1, blocking and deleting an abnormally downloaded file a2, closing an abnormal connection a3, locking an abnormal login account a4, and reporting to an administrator a5.

The execution probability of executing the preset network security response may be the basis on which the target network, given its knowledge of the current network security state, selects one preset network security response to execute from the plurality of preset network security responses. For example, the execution probability may be expressed as the probability that the target network executes each preset network security response a in the preset network security response set A in a given network security state, that is, as a probability distribution over the preset network security response set.

The network security mapping relationship may include a mapping relationship between a network security state and the probability of executing each preset network security response. For example, the network security mapping relationship may include the probability that the target network executes each preset network security response a in the preset network security response set A in a given network security state.

In an embodiment, the network security mapping relationship may be represented by a policy pi, which is the probability distribution over the preset network security response set that the target network follows in a given network security state. For example, the policy may be expressed as follows:

π(a|s) = P[A_t = a | S_t = s]

where S_t denotes the network security state at time t, A_t denotes the preset network security response executed at time t, s is a network security state in the network security state set S, and a is a preset network security response in the preset network security response set A.

The policy pi may represent the probability that the agent takes each possible preset network security response a in a given network security state s. The policy pi depends only on the current network security state and not on historical network security states. The policy pi is also static, that is, independent of time, although the agent may adjust the policy pi in real time.

Because the basis for executing a preset network security response is a probability distribution, the target network may, under the same policy, generate different preset network security responses for different network security states; it may also generate different preset network security responses for the same network security state.
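As a rough illustration of why the same policy can yield different responses in the same state, the policy can be stored as a table of per-state probability distributions and sampled. The labels `s1`..`s5` and `a1`..`a5`, the uniform initial probabilities, and the helper names are assumptions made for this sketch.

```python
import random

STATES = ["s1", "s2", "s3", "s4", "s5"]
RESPONSES = ["a1", "a2", "a3", "a4", "a5"]

# policy[s][a] is the probability of executing response a in state s.
# A uniform starting policy is an assumption for illustration only.
policy = {s: {a: 1.0 / len(RESPONSES) for a in RESPONSES} for s in STATES}


def execution_probability(pi, state):
    """Return the probability distribution over responses for one state."""
    dist = pi[state]
    if abs(sum(dist.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities for a state must sum to 1")
    return dist


def sample_response(pi, state, rng):
    """Draw one response according to the execution probability of the state."""
    dist = execution_probability(pi, state)
    responses = list(dist)
    weights = [dist[a] for a in responses]
    return rng.choices(responses, weights=weights, k=1)[0]


rng = random.Random(0)
# Repeated draws in the same state may differ because the policy is stochastic.
draws = [sample_response(policy, "s2", rng) for _ in range(5)]
print(draws)
```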

In an embodiment, the network security processing apparatus may be integrated in a reinforcement learning model, and the steps of the network security processing method may be performed by the reinforcement learning model.

The reinforcement learning model may use the rewards obtained by interacting with an environment to guide its responses (behaviors). It learns by trial and error, and its goal is to obtain the maximum reward. Reinforcement learning emphasizes how to respond based on the state so as to achieve the maximum expected benefit. For example, the reinforcement learning model may be a Markov decision model or the like.

The Markov property may mean that, given the current state and all past states of a random process, the conditional probability distribution of its future states depends only on the current state. That is, if the stochastic process is conditionally independent of its past states given the present state, it has the Markov property, and a process having the Markov property may be referred to as a Markov process.

A Markov decision process may mean that an agent periodically or continuously observes a stochastic dynamic system that has the Markov property and makes decisions sequentially. At each moment, the agent selects a response (behavior) from the set of available responses according to the observed state and the policy; the future state of the system is random, and the state transition probability has the Markov property. The agent then selects a new response according to the newly observed state, and the process repeats.

In practical application, the execution probability of executing various preset network security responses in a network security state can be acquired based on the reinforcement learning model. For example, a network security state s may be obtained, and a corresponding network security mapping relationship, that is, a policy pi, may be obtained based on a reinforcement learning model, where an expression manner of the policy pi may be as follows:

π(a|s) = P[A_t = a | S_t = s]

In an embodiment, a reinforcement learning system may also be constructed. For example, the reinforcement learning system may be constructed as the tuple <network security state set S, preset network security response set A, reward method R, state transition function P, attenuation factor γ>, and a network security mapping relationship may be defined, which may be represented as a policy pi.

The reward method R may represent the cumulative reward that the agent obtains for a preset network security response a taken when the network is in a network security state s at time t; the reward may be obtained as an empirical expectation. The reward method R may be a reward function of the network security state s and the preset network security response a. The functional formula of the reward method R may be as follows:

The state transition function P may represent the probability that the network, being in a network security state s at time t and having taken a preset network security response a, jumps from the network security state s to the network security state s' at the next time. The state transition function P may be defined as a two-dimensional Gaussian distribution, and the calculation formula of the state transition function may be as follows:

Wherein the decay factor γ may be a factor for adjusting the reward contribution of the response at different points in time. Since the network security status that the network later experiences will be influenced by the current network security status, but the influence will be gradually reduced, the attenuation of this influence can be expressed by an attenuation factor γ, which can be chosen to be a value between 0 and 1. In one embodiment, the reinforcement learning system may not include the attenuation factor γ.
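The formulas for R and P are not reproduced in this text. For orientation only, the conventional Markov decision process definitions that match the verbal descriptions above are given here; the notation (R_s^a, P_{ss'}^a, G_t, R_{t+1}) is an assumption borrowed from the standard literature and is not taken from the source, which additionally states that P may be parameterized as a two-dimensional Gaussian without giving its parameters.

$$
\begin{aligned}
R_s^a &= \mathbb{E}\left[ R_{t+1} \mid S_t = s,\ A_t = a \right] \\
P_{ss'}^a &= P\left[ S_{t+1} = s' \mid S_t = s,\ A_t = a \right] \\
G_t &= \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}, \qquad 0 \le \gamma \le 1
\end{aligned}
$$

Here R_{t+1} is the reward received after the response taken at time t, and G_t is the cumulative reward discounted by the attenuation factor γ, so that later rewards contribute less than earlier ones.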

For the definition of the network security state set S, the preset network security response set a, and the policy pi, reference may be made to the above description, and details are not described here.

203. Acquire a state reward corresponding to the network security state based on the execution probability.

The state reward may be the expected value of the cumulative reward that can be obtained, starting from a given network security state, when responses are chosen with the probabilities determined by the network security mapping relation. For example, when the target network is in the network security state s, the state reward is the expected cumulative reward that the agent obtains after taking a sequence of preset network security responses a with the probabilities determined by the network security mapping relationship.

In practical application, the state reward corresponding to the network security state can be acquired according to the execution probability. For example, the state reward v_π(s) corresponding to the network security state s can be obtained according to the execution probability, that is, through the Bellman equation.
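The Bellman equation itself is not reproduced in this text. A standard form of the Bellman expectation equation for the state reward is sketched below, under the assumption that conventional Markov decision process notation applies: R_s^a is the expected instant reward for response a in state s, P_{ss'}^a is the probability of moving from s to s' after a, and γ is the attenuation factor.

$$
v_\pi(s) = \sum_{a \in A} \pi(a \mid s) \left( R_s^a + \gamma \sum_{s' \in S} P_{ss'}^a \, v_\pi(s') \right)
$$

The first term inside the parentheses is the instant reward and the second is the discounted future reward, both weighted by the execution probability π(a|s).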

In an embodiment, in order to improve accuracy of obtaining the status reward, the method may further obtain the status reward corresponding to the network security status by obtaining an instant reward and a future reward respectively, and specifically, the step "obtaining the status reward corresponding to the network security status based on the execution probability" may include:

acquiring an instant reward and a future reward corresponding to a preset network security response based on the execution probability;

and combining the instant reward and the future reward to obtain the state reward corresponding to the network security state.

Wherein the status rewards may include an instant reward and a future reward.

The instant reward may be a reward for an instant network security response based on the network security status s.

The instant network security response may be to determine that the network security state s corresponds to the current preset network security response a according to a network security mapping relationship (i.e., a policy pi), and the current preset network security response a may be used as the instant network security response.

The future reward may be a reward for future network security responses based on the network security state s. After the instant network security response a is executed in the network security state s, the network security state changes to s', and the future reward may be the reward for the preset network security responses that may subsequently be performed from the network security state s' under the network security mapping relation (namely the policy pi).

The future network security response may be a preset network security response that may be performed after the instant network security response a corresponding to the network security state s has been determined according to the network security mapping relationship (i.e., the policy pi), and it may be used to calculate the future reward. After the instant network security response a is executed in the network security state s, the network security state changes to s', and the future network security responses may be the preset network security response a' corresponding to the network security state s' and the network security responses that follow a' under the policy pi.

In practical application, for example, the instant network security response a corresponding to the network security state s may be obtained according to the execution probability, and the instant reward for executing a in the network security state s may then be obtained. Next, the future reward obtained by continuing to respond according to the execution probability after a has been executed may be obtained. The state reward corresponding to the network security state s is then obtained from the instant reward and the future reward.
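Combining the instant and future rewards into a state reward can be sketched as iterative policy evaluation. The two-state example, the reward and transition tables, the discount of 0.9, and the function names are all hypothetical; the source does not prescribe this particular procedure.

```python
def state_reward(pi, R, P, v, s, gamma=0.9):
    """Expected instant reward plus discounted expected future reward for state s."""
    total = 0.0
    for a, prob in pi[s].items():
        instant = R[s][a]
        future = sum(p * v[s_next] for s_next, p in P[s][a].items())
        total += prob * (instant + gamma * future)
    return total


def evaluate_policy(pi, R, P, gamma=0.9, sweeps=200):
    """Repeatedly apply state_reward until the state rewards settle."""
    v = {s: 0.0 for s in pi}
    for _ in range(sweeps):
        v = {s: state_reward(pi, R, P, v, s, gamma) for s in pi}
    return v


# Hypothetical toy model: s1 = secure, s2 = scanned;
# a1 = close scanned port, a5 = report to administrator.
pi = {"s1": {"a1": 0.5, "a5": 0.5}, "s2": {"a1": 0.8, "a5": 0.2}}
R = {"s1": {"a1": 0.0, "a5": 0.0}, "s2": {"a1": 1.0, "a5": 0.5}}
P = {
    "s1": {"a1": {"s1": 0.9, "s2": 0.1}, "a5": {"s1": 0.9, "s2": 0.1}},
    "s2": {"a1": {"s1": 0.9, "s2": 0.1}, "a5": {"s1": 0.5, "s2": 0.5}},
}

v = evaluate_policy(pi, R, P)
print({s: round(x, 3) for s, x in v.items()})
```

With a discount below 1 the sweeps converge, so the returned values satisfy the instant-plus-future decomposition for every state up to numerical tolerance.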

In an embodiment, in order to improve the accuracy of the network security processing, the instant reward may be obtained through an instant network security response. Specifically, the step of "obtaining the instant reward and the future reward corresponding to the preset network security response based on the execution probability" may include:

based on the execution probability, acquiring an instant network security response corresponding to the network security state from multiple preset network security responses;

acquiring an instant reward corresponding to the instant network security response;

and acquiring future rewards corresponding to preset network security responses based on the execution probability.

In practical application, an instant network security response corresponding to the network security state may be obtained from the multiple preset network security responses based on the execution probability, an instant reward corresponding to the instant network security response may then be obtained, and a future reward corresponding to the preset network security response may be obtained based on the execution probability. For example, in the network security state s, a preset network security response set A may be obtained. According to the conventional processing flow when a host is subjected to an abnormal attack, the preset network security response set A is defined to include a plurality of preset network security responses a, such as closing a scanned port a1, blocking and deleting an abnormally downloaded file a2, closing an abnormal connection a3, locking an abnormal login account a4, and reporting to an administrator a5. Then, in the network security state s, an instant network security response corresponding to the network security state is obtained from the preset network security response set A according to the execution probability, the instant reward corresponding to the instant network security response is obtained, and the future reward corresponding to the preset network security response is obtained based on the execution probability.

In one embodiment, future rewards may be obtained for reward expectations that may be obtained after a preset network security response is performed for the target network. Specifically, the step of "obtaining a future reward corresponding to a preset network security response based on the execution probability" may include:

obtaining a future reward expectation value obtained by performing security response according to the execution probability after the target network executes the instant network security response;

obtaining a future reward based on the future reward expected value.

In practical applications, for example, after the target network obtains the instant network security response according to the execution probability, the network security state of the target network changes, and the network security state s is changed into the network security state s' after the execution of the response. When the target network is in the network security state s', the next preset network security response can be obtained continuously according to the execution probability, the network security state changes after the next response is executed, and the like. As can be seen from the above, the target network obtains several preset network security responses that need to be executed according to the same execution probability, and the future reward expectation value can predict the reward given to all the preset network security responses that may be executed in the future. Therefore, the future reward expectation value obtained by the target network after executing the instant network security response and performing the security response according to the execution probability can be obtained and used as the future reward.
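One way to approximate the future reward expectation value described above is to simulate many response sequences and average their discounted rewards. This Monte Carlo sketch is an assumption about how the expectation could be estimated; the toy tables, the horizon, the episode count, and the helper names are hypothetical.

```python
import random


def sample(dist, rng):
    """Draw one key from a {key: probability} mapping."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys], k=1)[0]


def rollout_return(pi, R, P, start_state, rng, gamma=0.9, horizon=50):
    """Discounted reward collected by following pi from start_state for `horizon` steps."""
    total, discount, s = 0.0, 1.0, start_state
    for _ in range(horizon):
        a = sample(pi[s], rng)       # next response under the same execution probability
        total += discount * R[s][a]
        s = sample(P[s][a], rng)     # the state changes after the response is executed
        discount *= gamma
    return total


def estimate_future_reward(pi, R, P, s, a, rng, gamma=0.9, episodes=2000):
    """Average discounted reward that follows executing instant response a in state s."""
    total = 0.0
    for _ in range(episodes):
        s_next = sample(P[s][a], rng)
        total += gamma * rollout_return(pi, R, P, s_next, rng, gamma)
    return total / episodes


# Hypothetical toy model: s1 = secure, s2 = scanned.
pi = {"s1": {"a1": 0.5, "a5": 0.5}, "s2": {"a1": 0.8, "a5": 0.2}}
R = {"s1": {"a1": 0.0, "a5": 0.0}, "s2": {"a1": 1.0, "a5": 0.5}}
P = {
    "s1": {"a1": {"s1": 0.9, "s2": 0.1}, "a5": {"s1": 0.9, "s2": 0.1}},
    "s2": {"a1": {"s1": 0.9, "s2": 0.1}, "a5": {"s1": 0.5, "s2": 0.5}},
}

est = estimate_future_reward(pi, R, P, "s2", "a1", random.Random(0))
print(round(est, 2))
```

The estimate is noisy and truncated at the chosen horizon; when the transition probabilities are known, solving the Bellman equation directly avoids both issues.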

In an embodiment, the intelligent agent is prohibited from executing the preset dangerous response by auditing the instant network safety response, so that the safety of network safety processing is improved. Specifically, the step "obtaining an instant reward corresponding to the instant network security response" may include:

when the instant network safety response is a preset dangerous response, determining that the instant reward of the instant network safety response is a negative reward;

and when the instant network safety response is not the preset dangerous response, acquiring the instant reward of the instant network safety response as a positive reward.

In practical application, when the immediate network security response is the preset dangerous response, the instant reward of the immediate network security response can be determined to be negative reward; when the instant network safety response is not the preset dangerous response, the instant reward of the instant network safety response can be acquired as the positive reward.

For example, through the experience of experts, a preset dangerous response set may be established, where the preset dangerous response set may include a plurality of preset dangerous responses that the agent prohibits from executing, such as deleting a database, executing a file in batch, downloading a suspicious file autonomously, deleting a sensitive file, closing an important service port, closing some important services, and so on. When the instant network safety response is the preset dangerous response, judging that the instant network safety response has danger, stopping executing the instant network safety response, and giving a negative reward to the instant network safety response; when the instant network security response is not the preset dangerous response, the instant network security response can be judged to have no danger, the instant network security response can be executed, and meanwhile positive rewards are given to the instant network security response.

Auditing the preset network security responses of the agent prevents the agent from executing a preset dangerous response and assigns a negative reward to any such response, so that the network security responses executed by the agent do not introduce more serious security problems.

In one embodiment, the negative reward and the positive reward are not limited to a negative number and a positive number; they may be negative and positive relative to each other. For example, both rewards may be positive numbers, or both may be negative numbers. When the instant network security response is a preset dangerous response, a relatively small reward may be given to the instant network security response; when the instant network security response is not a preset dangerous response, a relatively large reward may be given to the instant network security response.

In one embodiment, positive rewards can be given according to network security events occurring in the target network so as to improve the accuracy of network security processing. Specifically, the step "obtaining the instant reward of the instant network security response as a positive reward" may include:

acquiring a network security event set corresponding to the network security state, wherein the network security event set comprises a plurality of network security sub-events;

detecting the event occurrence probability of each network security sub-event of the target network after the target network executes the instant network security response;

and acquiring the instant reward of the instant network security response as a positive reward based on the event occurrence probability.

In practical application, a network security event set corresponding to a network security state can be obtained, the network security event set comprises a plurality of network security sub-events, the event occurrence probability of each network security sub-event occurring in a target network after an instant network security response is executed in the target network is detected, and an instant reward of the instant network security response is obtained as a positive reward based on the event occurrence probability.

For example, after the network security scene where the target network is located is detected and determined to be a terminal intrusion scene, a network security event set corresponding to the network security state may be obtained, where the network security event set includes a plurality of network security sub-events, such as whether port scanning exists in the network, whether a server is under a DDoS attack, whether the system is infected with a virus file, whether the network exhibits abnormal behavior, and the like. Then, the event occurrence probability of each network security sub-event in the target network after the target network executes the instant network security response a in the network security state s can be detected, and the instant reward of the instant network security response is obtained as the positive reward based on the event occurrence probability.

In an embodiment, the sub-feature detection module may further detect an event occurrence probability of a network security sub-event occurring in the network. The sub-feature detection module may include a plurality of sub-feature detection sub-modules, and each sub-feature detection sub-module may detect an event occurrence probability of a network security sub-event. The sub-feature detection module may be combined with a plurality of security detection engines.

For example, after the network security scene of the target network is detected and determined to be a terminal intrusion scene, the sub-feature detection module may be defined to include sub-feature detection sub-modules such as a port scanning monitoring sub-module, a malicious file downloading monitoring sub-module, a root permission stealing monitoring sub-module, a highly suspicious code execution monitoring sub-module, a sensitive directory access monitoring sub-module, a sensitive file transmission monitoring sub-module, and an abnormal communication monitoring sub-module. Each sub-feature detection sub-module detects the event occurrence probability of one network security sub-event, and the positive reward of the instant network security response is acquired based on the resulting event occurrence probabilities.

R_t may represent the instant reward of the instant network security response a at time t. k may be a constant, for example, k = -10, representing the fixed loss incurred when the instant network security response a is a preset dangerous response. A_u(a) may represent the relation between the instant network security response and the preset dangerous response: when the instant network security response is a preset dangerous response, A_u(a) = 1; when the instant network security response is not a preset dangerous response, A_u(a) = 0. The event occurrence probability may be represented by o(a), which may be embodied as an array. The calculation formula of the instant reward R_t may be as follows:

R_t＝k·A_u(a)+(1-A_u(a))·f(o(a))
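The reward formula above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not define f, so `safety_score` (the mean probability that each sub-event does not occur) is a hypothetical stand-in, the response names are invented for the example, and k = -10 is used as an example fixed loss.

```python
from typing import Callable, Sequence

# Hypothetical identifiers for illustration; the text lists dangerous
# responses only in prose (e.g. deleting a database).
DANGEROUS_RESPONSES = {"delete_database", "close_important_service_port"}
K = -10.0  # example fixed loss for a preset dangerous response


def danger_indicator(response: str) -> int:
    """A_u(a): 1 if the response is a preset dangerous response, else 0."""
    return 1 if response in DANGEROUS_RESPONSES else 0


def safety_score(event_probs: Sequence[float]) -> float:
    """Assumed f(o(a)): mean probability that each sub-event does NOT occur."""
    if not event_probs:
        return 0.0
    return sum(1.0 - p for p in event_probs) / len(event_probs)


def instant_reward(
    response: str,
    event_probs: Sequence[float],
    f: Callable[[Sequence[float]], float] = safety_score,
    k: float = K,
) -> float:
    """R_t = k * A_u(a) + (1 - A_u(a)) * f(o(a))."""
    a_u = danger_indicator(response)
    return k * a_u + (1 - a_u) * f(event_probs)
```

With these assumptions, a dangerous response always receives the fixed loss k regardless of o(a), while any other response is scored only by f(o(a)).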

In one embodiment, the instant reward and the future reward can be calculated through cost functions, so as to improve the accuracy of network security processing. For example, a state cost function and a state behavior cost function may be introduced into the Markov decision model.

The state cost function may be used to evaluate the value of the network security state s. It may be a cost function based on the network security mapping relationship (i.e., policy π), representing the expectation of the cumulative reward received by the agent when it starts from the network security state s and follows the current policy π. The calculation formula of the state cost function v_π(s) may be as follows:

v_π(s)＝E[G_t|S_t＝s]

where S may represent the network security state set, s may represent a network security state in the network security state set S, t may represent the current time, and G_t may represent the harvest.

The harvest may represent the decayed sum of all rewards from a given time onward. The harvest may be denoted by G_t, which may represent the decayed sum of all rewards R from the network security state s until the network security state terminates. The calculation formula of the harvest G_t may be as follows:

G_t＝R_{t+1}+γR_{t+2}+γ^2R_{t+3}+…＝Σ_{k=0}^{∞} γ^k R_{t+k+1}

where γ may represent the attenuation factor, R may represent a reward, and t may represent the current time. The attenuation factor γ represents the proportion of the value of a future reward at the current time t: the reward R_{t+k+1} obtained at time t+k+1 has the value γ^k R_{t+k+1} at time t.
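As a small worked illustration of the harvest, the sketch below computes the decayed sum for a finite sequence of rewards. The function name `harvest` and the convention that `rewards[0]` holds R_{t+1} are assumptions made for this example.

```python
from typing import Sequence


def harvest(rewards: Sequence[float], gamma: float) -> float:
    """Return G_t = sum over k of gamma**k * R_{t+k+1} for a finite sequence.

    rewards[0] is R_{t+1}, rewards[1] is R_{t+2}, and so on.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    g = 0.0
    # Accumulate backwards so each earlier step applies one more factor of gamma.
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, with rewards 1, 1, 1 and γ = 0.5 the harvest is 1 + 0.5 + 0.25 = 1.75; with γ = 0 only the first reward counts.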

The state behavior cost function may be used to evaluate the value of the preset network security response a in the network security state s. It may be denoted by q_π(s, a) and may represent the expectation of the reward that the agent can receive when it executes a preset network security response a in the network security state s while following the policy π. The calculation formula of the state behavior cost function q_π(s, a) may be as follows:

q_π(s,a)＝E[G_t|S_t＝s,A_t＝a]

where A may represent the preset network security response set, a may represent a preset network security response in the preset network security response set A, S may represent the network security state set, s may represent a network security state in the network security state set S, t may represent the current time, and G_t may represent the harvest.

The following conclusions can be drawn from the above formula:

Further, the relation between the state cost function v_π(s) and the state behavior cost function q_π(s, a), also known as the Bellman equation, can be obtained; it is formulated as follows:

The Bellman equation may be a functional equation about an objective function. It expresses the value of a decision problem at a particular time as the reward from an initial choice plus the value of the remaining decision problem that results from that choice, which can simplify a dynamic optimization problem.

In the above formula, one term may represent the instant reward and the other term may represent the future reward; both the instant reward and the future reward are associated with the policy π(a|s).
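For reference, the textbook form of these relations for a Markov decision process is written out below using the symbols defined above. The notation R_s^a (expected instant reward of response a in state s) and P_{ss'}^a (probability of moving from s to s' under response a) is an assumption made for this illustration, and the block shows the standard Bellman expectation equations rather than a reproduction of the original formulas.

```latex
\begin{aligned}
v_\pi(s) &= \sum_{a \in A} \pi(a \mid s)\, q_\pi(s, a), \\
q_\pi(s, a) &= R_s^a + \gamma \sum_{s' \in S} P_{ss'}^a\, v_\pi(s'), \\
v_\pi(s) &= \underbrace{\sum_{a \in A} \pi(a \mid s)\, R_s^a}_{\text{instant reward}}
  + \underbrace{\gamma \sum_{a \in A} \pi(a \mid s) \sum_{s' \in S} P_{ss'}^a\, v_\pi(s')}_{\text{future reward}}.
\end{aligned}
```

The third line follows by substituting the second into the first, and it makes visible that both parts are weighted by the policy π(a|s).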

204. Calculating, based on the acquired state reward, the target probability that maximizes the current state reward corresponding to the network security state.

In practical application, a target probability that the current state reward corresponding to the network security state is the maximum value may be calculated based on the acquired state reward, for example, the current state reward corresponding to the network security state may be calculated by using the acquired state reward as a known quantity, and the formula of the current state reward may be

Solving for the target probability π(a|s) that maximizes the current state reward means that following the target probability in the network security state yields the maximum state reward value, so the target network can execute a more accurate preset network security response according to the target probability.

In one embodiment, the target probability that maximizes the current state reward corresponding to the network security state may be calculated by maximizing the cost functions. For example, a maximized state cost function v_*(s) and a maximized state behavior cost function q_*(s, a) may be calculated; the formula may be as follows:

where the maximized state cost function v_*(s) may be a function that maximizes the value of the network security state s, and the maximized state behavior cost function q_*(s, a) may be a function that maximizes the value of the preset network security response a in the network security state s. By maximizing the state cost function and the state behavior cost function, the target probability that maximizes the current state reward corresponding to the network security state can be solved. For example, if, for every network security state s, the value obtained by following the policy π is not less than the value obtained by following the policy π', then the policy π is considered superior or equal to the policy π'.
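The maximized cost functions and the policy ordering described above are commonly written as follows. This is the standard textbook formulation, given here as an assumption; the greedy policy π_* in the last line is one usual way to read a target probability off q_*.

```latex
\begin{aligned}
v_*(s) &= \max_{\pi} v_\pi(s), \qquad q_*(s, a) = \max_{\pi} q_\pi(s, a), \\
\pi \ge \pi' &\iff v_\pi(s) \ge v_{\pi'}(s) \quad \text{for all } s \in S, \\
\pi_*(a \mid s) &=
\begin{cases}
1, & a = \arg\max_{a' \in A} q_*(s, a'), \\
0, & \text{otherwise}.
\end{cases}
\end{aligned}
```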

205. Updating the network security mapping relation based on the target probability to obtain the updated network security mapping relation.

In practical applications, for example, after the target probability is obtained through calculation, parameters in the network security mapping relationship may be adjusted according to the target probability, for example, the probability of executing various preset network security responses in the network security mapping relationship is adjusted, and then the network security mapping relationship is updated to obtain an updated network security mapping relationship.

In an embodiment, whether the updated network security mapping relationship is obtained through iteration may be determined by detecting the target probability. Specifically, the step "updating the network security mapping relationship based on the target probability to obtain an updated network security mapping relationship" may include:

when the target probability meets a probability adjusting condition, adjusting the execution probability corresponding to the network security state to the target probability;

returning to the step of obtaining the state reward corresponding to the network security state based on the execution probability;

and when the target probability does not meet the probability adjustment condition, updating the network security mapping relation based on the current target probability to obtain an updated network security mapping relation.

Iteration is a process of repeated feedback whose purpose is to approach a required result. Each repetition of the process may be referred to as one iteration, and the result of each iteration may be used as the initial value of the next iteration.

The probability adjustment condition is the condition used to decide whether the execution probability needs to be adjusted to the target probability and the network security mapping relationship updated again. For example, when the target probability differs from the execution probability, the target probability may be considered to satisfy the probability adjustment condition, so the execution probability needs to be adjusted to the target probability and the network security mapping relationship updated. For another example, the target probability may be considered to satisfy the probability adjustment condition when a preset gap exists between the target probability and the execution probability.

In an embodiment, for convenience of implementation, the probability adjustment condition may be that when the number of updates to the network security mapping does not reach a preset number of updates, the probability adjustment condition is considered to be satisfied, the execution probability needs to be adjusted to the target probability, and the network security mapping is updated.

In practical application, for example, the probability adjustment condition may be defined as the target probability being different from the execution probability. When the target probability differs from the execution probability, the execution probability may be adjusted to the target probability, the process returns to the step of obtaining the state reward corresponding to the network security state based on the execution probability, and the target probability is calculated again. This repeats until the target probability is the same as the execution probability; the probability adjustment condition is then no longer satisfied, the network security mapping relationship is no longer updated, and the updated network security mapping relationship is obtained.

In an embodiment, the probability adjustment condition may be defined by the number of times of updating the network security mapping relationship, for example, after the network security mapping relationship is updated by the preset number of times, the probability adjustment condition is not satisfied, the network security mapping relationship is not updated, and the updated network security mapping relationship is obtained.
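The iterative update described above resembles tabular policy iteration, which can be sketched as follows. This is a minimal sketch under assumptions, not the method of the embodiment: the tables `reward[s][a]` and `trans[s][a][s2]`, the greedy choice of the target probability, and all function names are hypothetical. The loop stops either when the target probability equals the execution probability or after a preset number of updates, mirroring the two probability adjustment conditions mentioned in the text.

```python
from typing import Dict

Policy = Dict[str, Dict[str, float]]


def evaluate(policy, states, responses, reward, trans, gamma, sweeps=200):
    """Estimate the state reward v(s) obtained by following the execution probabilities."""
    v = {s: 0.0 for s in states}
    for _ in range(sweeps):
        v = {
            s: sum(
                policy[s][a]
                * (reward[s][a] + gamma * sum(trans[s][a][s2] * v[s2] for s2 in states))
                for a in responses
            )
            for s in states
        }
    return v


def target_probability(v, states, responses, reward, trans, gamma) -> Policy:
    """Greedy target: probability 1 on the response with the largest q(s, a)."""
    target = {}
    for s in states:
        q = {
            a: reward[s][a] + gamma * sum(trans[s][a][s2] * v[s2] for s2 in states)
            for a in responses
        }
        best = max(responses, key=lambda a: q[a])  # ties go to the first response
        target[s] = {a: 1.0 if a == best else 0.0 for a in responses}
    return target


def update_mapping(policy, states, responses, reward, trans, gamma, max_updates=50) -> Policy:
    """Repeat evaluation and adjustment until the target equals the execution probability."""
    for _ in range(max_updates):
        v = evaluate(policy, states, responses, reward, trans, gamma)
        target = target_probability(v, states, responses, reward, trans, gamma)
        if target == policy:  # probability adjustment condition no longer satisfied
            break
        policy = target  # adjust the execution probability to the target probability
    return policy
```

Starting from a uniform mapping over two responses, a model in which one response moves an attacked network to a safe state with a positive reward converges within a few updates to a mapping that always selects that response.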

In an embodiment, the detected network security state of the target network, the target probability of the target network for performing the preset network security response, the state reward corresponding to the network security state, and the event occurrence probability of each network security sub-event occurring in the target network may be obtained and recorded in a log, so as to facilitate searching and recording.

In an embodiment, the reinforcement learning model is not limited to the Markov decision model; other reinforcement learning models may also be used to perform network security processing.

In an embodiment, after obtaining the updated network security mapping relationship, the target network may further execute a corresponding preset network security response according to the updated network security mapping relationship, specifically, after the step "update the network security mapping relationship based on the target probability to obtain the updated network security mapping relationship", the method may further include:

detecting the current network security state of a target network;

when the current network security state is a preset network security state, acquiring a current execution probability for executing a preset network security response in the current network security state based on the updated network security mapping relation;

determining a current network security response corresponding to the current network security state from a plurality of preset network security responses based on the current execution probability;

executing the current network security response for the target network.

The preset network security state can be a network state in which the network is damaged, changed and leaked due to accidental or malicious reasons, the system cannot continuously, reliably and normally operate, the network service is interrupted, and the like.

In practical application, the current network security state of the target network can be detected, and when the current network security state is the preset network security state, the current execution probability for executing various preset network security responses in the current network security state can be acquired based on the updated network security mapping relation. And then determining a current network security response corresponding to the network security state from the multiple preset network security responses based on the current execution probability, and executing the current network security response aiming at the target network.

For example, when the current network security state s of the target network is the preset network security state, the current execution probability of executing various preset network security responses in the current network security state s may be obtained based on the updated network security mapping relationship, that is, the current execution probability may be obtained according to a policy pi, where an expression manner of the policy pi may be as follows:

π(a|s)＝P[A_t＝a|S_t＝s]

After the current execution probability is obtained, a current network security response corresponding to the network security state may be determined from a plurality of preset network security responses, for example, a current network security response a corresponding to the network security state s is determined from a preset network security response set a. The target network may then execute the current network security response a.
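Selecting the current network security response from the policy π(a|s) amounts to sampling from a discrete distribution. The sketch below assumes the mapping is stored as a nested dictionary from state to response probabilities; the function name, state name, and response names are hypothetical.

```python
import random
from typing import Dict, Optional


def select_response(
    mapping: Dict[str, Dict[str, float]],
    state: str,
    rng: Optional[random.Random] = None,
) -> str:
    """Sample a response a with probability pi(a | s) from the mapping."""
    probs = mapping[state]
    if abs(sum(probs.values()) - 1.0) > 1e-9:
        raise ValueError("execution probabilities for a state must sum to 1")
    rng = rng or random.Random()
    responses = list(probs)
    weights = [probs[a] for a in responses]
    return rng.choices(responses, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` makes the selection reproducible, which can help when the decisions are recorded in a log and replayed.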

In an embodiment, the current network security response may also be audited to prevent misoperation of the network, which could cause more serious network security problems. Specifically, the step of "executing the current network security response for the target network" may include:

when the current network safety response is not a preset dangerous response, executing the current network safety response aiming at the target network;

the method further comprises refusing to execute the current network security response when the current network security response is a preset dangerous response.

In practical application, when the current network security response is not the preset dangerous response, the current network security response is executed, and the method further comprises refusing to execute the current network security response when the current network security response is the preset dangerous response. For example, when the current network security response is not the preset dangerous response, the current network security response may be executed; when the current cyber-security response is a preset dangerous response, the current cyber-security response may be refused to be executed.

In an embodiment, after the current network security response is executed, the network security status may be continuously detected, so as to improve the accuracy of the network security processing. Specifically, the network security processing method may further include:

detecting the executed network security state of the target network after executing the current network security response;

when the executed network security state is a preset network security state, updating the current network security state to the executed network security state;

and returning to execute the step of obtaining the current execution probability of executing various preset network security responses in the current network security state based on the updated network security mapping relation until a stop condition is met.

The stop condition may be a condition for stopping the loop over the steps. For example, the loop may be stopped when it is detected that the network security state of the target network is not the preset network security state. Alternatively, the stop condition may be considered satisfied, and the loop stopped, once the loop has run a preset number of times.

In practical application, the executed network security state after the target network executes the current network security response can be detected, when the executed network security state is the preset network security state, the current network security state is updated to the executed network security state, the execution is returned based on the updated network security mapping relation, and the current execution probability of executing various preset network security responses in the current network security state is obtained until the stop condition is met.

For example, after the target network executes the current network security response, the network security state of the target network may be detected, and when the executed network security state is the preset network security state, it indicates that the target network is still abnormal, and the current network security state may be updated to the executed network security state. And then, based on the updated network security mapping relationship, acquiring a current execution probability of executing various preset network security responses aiming at the target network in the current network security state until the network security state after the execution of the target network is detected to be not the preset network security state. When the executed network security state is not the preset network security state, the target network is already secure, and the loop of the execution steps may be stopped.
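The detect, select, audit, execute loop with its two stop conditions can be sketched as below. The callables `detect_state`, `choose_response`, and `execute` are hypothetical hooks standing in for the detection, policy, and execution components; they are assumptions of this example and not names from the embodiment.

```python
from typing import Callable, List, Set


def respond_until_safe(
    detect_state: Callable[[], str],
    choose_response: Callable[[str], str],
    execute: Callable[[str], None],
    abnormal_states: Set[str],
    dangerous_responses: Set[str],
    max_rounds: int = 10,
) -> List[str]:
    """Loop until the network leaves the preset (abnormal) states or max_rounds is hit."""
    executed: List[str] = []
    for _ in range(max_rounds):  # stop condition: preset number of cycles
        state = detect_state()
        if state not in abnormal_states:  # stop condition: network is secure again
            break
        response = choose_response(state)
        if response in dangerous_responses:
            continue  # audit: refuse to execute a preset dangerous response
        execute(response)
        executed.append(response)
    return executed
```

The function returns the responses that were actually executed, so refused dangerous responses never appear in the result.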

As can be seen from the above, in the embodiment of the present application, the network security state of the target network may be detected; the execution probability of executing the preset network security response in the network security state is obtained based on the network security mapping relationship, where the network security mapping relationship includes a mapping relationship between the network security state and the probability of executing the preset network security response; the state reward corresponding to the network security state is obtained based on the execution probability; the target probability that maximizes the current state reward corresponding to the network security state is calculated based on the obtained state reward; and the network security mapping relationship is updated based on the target probability to obtain the updated network security mapping relationship. In this scheme, the execution probability with which the target network executes the preset network security response and the target probability that maximizes the current state reward are obtained through the network security mapping relationship, and the network security mapping relationship is updated according to the target probability and the execution probability, so that an updated network security mapping relationship yielding the maximum current state reward is obtained. Whether a preset network security response is a preset dangerous response can also be judged, so that the target network is not subjected to misoperation and the stability of the network environment is ensured. Meanwhile, the decayed cumulative reward of the preset network security response can be obtained by detecting the event occurrence probability of each network security sub-event in the target network after the target network executes the preset network security response.
Therefore, when a network security event occurs, the network security processing can be carried out in time, so that the dependence on manpower in the network security processing process is reduced, the network security processing efficiency is improved, and the loss is reduced.

The method described in the above embodiments is further illustrated in detail by way of example.

Referring to fig. 3, the specific process of the network security processing method may be as follows:

(I) Constructing a reinforcement learning model.

In practical application, a reinforcement learning model can be constructed.

(1) In one embodiment, as shown in fig. 4, a sub-feature detection module may be constructed, which may detect the event occurrence probability of the network security sub-event occurring in the network. The sub-feature detection module may include a plurality of sub-feature detection sub-modules, and each sub-feature detection sub-module may detect an event occurrence probability of a network security sub-event. The sub-feature detection module may be combined with a plurality of security detection engines.

For example, a sub-feature detection module may be constructed, and the sub-feature detection module may include a plurality of sub-feature detection sub-modules, such as a port scanning monitoring sub-module, a malicious file downloading monitoring sub-module, a root permission stealing monitoring sub-module, a highly suspicious code execution monitoring sub-module, a sensitive directory access monitoring sub-module, a sensitive file transmission monitoring sub-module, and an abnormal communication monitoring sub-module. Each sub-feature detection sub-module detects the event occurrence probability of one network security sub-event, so that the event occurrence probabilities can be obtained.
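One way to picture the sub-feature detection module is as a registry of detectors, each returning the occurrence probability of one sub-event, whose outputs are collected into the array o(a). The class name, the `register` and `detect` methods, and the observation keys below are all hypothetical; real sub-modules would wrap security detection engines.

```python
from typing import Callable, Dict, List

# A detector maps an observation of the network to a probability in [0, 1].
Detector = Callable[[dict], float]


class SubFeatureDetectionModule:
    """Collects one event occurrence probability per registered sub-module."""

    def __init__(self) -> None:
        self._detectors: Dict[str, Detector] = {}

    def register(self, name: str, detector: Detector) -> None:
        self._detectors[name] = detector

    def detect(self, observation: dict) -> List[float]:
        """Return o(a) as a list, in registration order, clamped to [0, 1]."""
        probs = []
        for detector in self._detectors.values():
            p = float(detector(observation))
            probs.append(min(1.0, max(0.0, p)))
        return probs
```

A usage example would register, say, a port scanning detector and a malicious download detector and then call `detect` on the observation gathered after a response has been executed.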

(2) In an embodiment, a preset network security response set may be defined, where A represents the preset network security response set. According to the conventional processing flow when a host is subjected to an abnormal attack, the preset network security response set A is defined to include a plurality of preset network security responses a; for example, the set A may include preset network security responses such as closing the scanned port (a1), blocking and deleting the abnormally downloaded file (a2), closing the abnormal connection (a3), locking the abnormally logged-in account (a4), and notifying the administrator (a5).

(3) In one embodiment, a reward method may be defined that rewards a predetermined network security response. For example, when the preset network security response is the preset dangerous response, a negative reward may be given to the preset network security response; when the preset network safety response is not the preset dangerous response, positive reward can be given to the preset network safety response according to the event occurrence probability detected by the sub-feature detection module.

For example, R_t may represent the reward of the preset network security response a at time t. k may be a constant, for example, k = -10, representing the fixed loss incurred when the preset network security response a is a preset dangerous response. A_u(a) may represent the relation between the preset network security response and the preset dangerous response: when the preset network security response is a preset dangerous response, A_u(a) = 1; when the preset network security response is not a preset dangerous response, A_u(a) = 0. The event occurrence probability detected by the sub-feature detection module may be represented by o(a), which may be embodied as an array. The calculation formula for the reward may be as follows:

R_t＝k·A_u(a)+(1-A_u(a))·f(o(a))

(4) In one embodiment, a reinforcement learning system may be constructed. For example, the reinforcement learning system may be constructed as the tuple <network security state set S, preset network security response set A, reward method R, state transfer function P, attenuation factor γ>, and a network security mapping relationship by which a network security state produces a preset network security response is defined; the network security mapping relationship may be represented as a policy π.

The reward method R may represent the accumulated reward obtained by the agent after a preset network security response a is taken when the network is in a network security state s at time t. The reward method R may be a reward function based on the network security state s and the preset network security response a. The functional formula of the reward method R may be as follows:

The state transfer function P may represent the probability that, when the network is in a network security state s at time t and a preset network security response a is taken, the network jumps from the network security state s to the network security state s' at the next time. It may be defined as a two-dimensional Gaussian distribution. The formula of the state transfer function P may be as follows:

The attenuation factor γ may be a factor for adjusting how much the rewards of responses at different points in time contribute. Because the network security states that the network later experiences are influenced by the current network security state, but that influence gradually decreases, the decrease can be expressed by the attenuation factor γ, which can be chosen as a value between 0 and 1. In an embodiment, the attenuation factor γ may be omitted.

The policy may be the basis on which the network, having identified its current network security state, selects and executes a preset network security response. For example, the policy may be expressed as the probability with which the network executes each preset network security response in the preset network security response set in a given network security state, that is, as a probability distribution over the preset network security response set. The policy formula may be as follows:

π(a|s)＝P[A_t＝a|S_t＝s]
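As an illustration of the policy formula, π(a|s) can be held as a table of probabilities and sampled from. The sketch below is only one possible representation; the labels `S1`..`S5` and `a1`..`a5` are placeholder names, and the uniform initialization is an assumption rather than something the embodiment prescribes.

```python
import random

# Placeholder labels for the state set S and the response set A.
STATES = ["S1", "S2", "S3", "S4", "S5"]
RESPONSES = ["a1", "a2", "a3", "a4", "a5"]

def uniform_policy(states, responses):
    """Build pi(a|s) as a nested dict with equal probability for every response."""
    p = 1.0 / len(responses)
    return {s: {a: p for a in responses} for s in states}

def sample_response(policy, state, rng=random):
    """Draw one response a with probability pi(a|state)."""
    actions = list(policy[state])
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

policy = uniform_policy(STATES, RESPONSES)
```

Each row `policy[s]` is a probability distribution over the response set, so its values sum to 1.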

(5) In an embodiment, a preset dangerous response set may be constructed. For example, the preset dangerous response set may be created from expert experience and may include a plurality of preset dangerous responses that the agent is prohibited from executing, such as deleting a database, executing files in batches, autonomously downloading a suspicious file, deleting a sensitive file, closing an important service port, closing certain important services, and so on.

(6) In an embodiment, a log may be constructed, and the detected network security state, the target probability of the network performing the preset network security response, the state reward corresponding to the network security state, and the event occurrence probability of each network security sub-event occurring in the network are all acquired and recorded in the log, so as to facilitate searching and recording.
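The four logged quantities could, for instance, be written as one structured record per detection. The following is a minimal sketch; the field names, the JSON encoding, and the sample values are all assumptions made for illustration.

```python
import json
import time

def make_log_record(state, target_probs, state_reward, event_probs):
    """Bundle the four logged quantities into one JSON line (field names are hypothetical)."""
    record = {
        "timestamp": time.time(),
        "state": state,
        "target_probabilities": target_probs,
        "state_reward": state_reward,
        "event_probabilities": event_probs,
    }
    return json.dumps(record, sort_keys=True)

# Made-up sample values, for illustration only.
line = make_log_record("S2", {"a1": 0.7, "a5": 0.3}, 1.25, {"port_scan": 0.1})
```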

As shown in fig. 5, the network device may construct a sub-feature detection module, define a preset network security response set a, define a reward method for rewarding network security responses, construct a reinforcement learning system, construct a log, train a reinforcement learning model to obtain a trained reinforcement learning model, and perform network security event response based on the reinforcement learning model.

(II) Training the reinforcement learning model to obtain the trained reinforcement learning model.

301. The network device detects a network security status of the target network.

In practical application, the network device can detect the network security scene in which the target network is located and determine that it is a terminal intrusion scene. According to the network security states that may arise in the terminal intrusion scene, the network security state set S is defined to include a security state S1, a network scanned state S2, a network exploit state S3, a network attacked state S4 and a network attacked state S5. The target network is then detected to obtain its network security state, which may be one of the network security states in the network security state set S.

In an embodiment, after the network security scene in which the target network is located is detected and determined to be a terminal intrusion scene, the entity that performs network security responses according to the network security state may be defined as an agent. For example, the agent may be defined as the root role of the terminal host.

302. The network equipment acquires the execution probability of executing the preset network security response in the network security state based on the network security mapping relation.

In practical application, the network device may obtain a network security state s and, based on the network security mapping relationship, that is, the policy π, obtain the execution probability of executing each preset network security response in that network security state. The expression of the policy π may be as follows:

π(a|s)＝P[A_t＝a|S_t＝s]

303. The network device acquires the state reward corresponding to the network security state based on the execution probability.

In practical application, in a network security state s, the network device may obtain a preset network security response set A. According to the conventional processing flow when a host is subjected to an abnormal attack, the preset network security response set A is defined to include a plurality of preset network security responses a; for example, it may include closing a scanned port a1, blocking and deleting an abnormally downloaded file a2, closing an abnormal connection a3, locking an abnormal login account a4, and reporting to an administrator a5. Based on the execution probability, the network device acquires, from the preset network security response set A, an instant network security response corresponding to the network security state and an instant reward corresponding to that instant network security response, as well as a future reward corresponding to the future network security responses that follow the execution probability after the instant network security response. From the instant reward and the future reward, the state reward corresponding to the network security state can be obtained. The calculation formula of the state reward may be as follows:

In an embodiment, a preset dangerous response set may further be established from expert experience. The preset dangerous response set may include a plurality of preset dangerous responses that the agent is prohibited from executing, such as deleting a database, executing files in batches, autonomously downloading a suspicious file, deleting a sensitive file, closing an important service port, closing certain important services, and the like. When the instant network security response is a preset dangerous response, it is judged to be dangerous, its execution is stopped, and a negative reward is given to it; when the instant network security response is not a preset dangerous response, it can be judged not to be dangerous and can be executed, and a positive reward is given to it.

In an embodiment, after detecting the network security scene in which the target network is located and determining that it is a terminal intrusion scene, the network device may obtain a network security event set corresponding to the network security state. The network security event set includes a plurality of network security sub-events, such as whether port scanning exists in the network, whether a server is under a DDoS attack, whether the system is infected with a virus file, whether abnormal behavior exists in the network, and the like. The event occurrence probability of each network security sub-event of the target network can then be detected after the target network executes the instant network security response a in the network security state s, and the instant reward of the instant network security response is obtained as a positive reward based on the event occurrence probability.

In an embodiment, after detecting the network security scene in which the target network is located and determining that it is a terminal intrusion scene, the network device may further define the sub-feature detection module to include sub-feature detection sub-modules such as a port scanning monitoring sub-module, a malicious file download monitoring sub-module, a root permission theft monitoring sub-module, a highly suspicious code execution monitoring sub-module, a sensitive directory access monitoring sub-module, a sensitive file transmission monitoring sub-module, and an abnormal communication monitoring sub-module. Each sub-feature detection sub-module detects the event occurrence probability of one network security sub-event, and the positive reward of the instant network security response is acquired based on these event occurrence probabilities.

R_t＝k·A_u(a)+(1-A_u(a))·f(o(a))
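The symbols in this formula are not defined in the surrounding text, so the sketch below rests on one reading of them: A_u(a) is taken to be 1 when the response a belongs to the preset dangerous response set and 0 otherwise, k is a negative constant, o(a) is the list of event occurrence probabilities observed after a is executed, and f maps those probabilities to a positive reward. The particular f used here (the average probability that a sub-event does not occur) and the default value of k are purely hypothetical choices.

```python
def immediate_reward(response, dangerous_responses, event_probs, k=-10.0):
    """R_t = k * A_u(a) + (1 - A_u(a)) * f(o(a)), under the assumptions stated above."""
    # Assumed indicator A_u(a): 1 for a preset dangerous response, else 0.
    a_u = 1.0 if response in dangerous_responses else 0.0
    # Assumed f: mean probability that each monitored sub-event does NOT occur.
    if event_probs:
        f = sum(1.0 - p for p in event_probs) / len(event_probs)
    else:
        f = 0.0
    return k * a_u + (1.0 - a_u) * f

# Hypothetical identifiers for two of the dangerous responses listed in the text.
DANGEROUS = {"delete_database", "close_important_service_port"}
```

With this reading, a dangerous response always receives the negative reward k, and a permitted response is rewarded more the less likely the monitored sub-events are afterwards.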

The calculation formula of the state value function v_π(s) may be as follows:

v_π(s)＝E[G_t|S_t＝s]

The calculation formula of the return G_t may be as follows:
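The formula for G_t does not appear in this text. Assuming the standard discounted return from reinforcement learning, G_t = R_{t+1} + γ·R_{t+2} + γ²·R_{t+3} + ..., which matches the role of the attenuation factor γ described earlier, it can be computed as in the following sketch. The function name and the list-of-rewards input are illustrative assumptions.

```python
def discounted_return(rewards, gamma):
    """G_t = R_{t+1} + gamma * R_{t+2} + gamma^2 * R_{t+3} + ... (assumed standard form).

    `rewards` lists R_{t+1}, R_{t+2}, ... in time order.
    """
    g = 0.0
    # Accumulate from the last reward backwards: g <- r + gamma * g.
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, with rewards 1, 1, 1 and γ = 0.5 the return is 1 + 0.5 + 0.25 = 1.75; with γ = 0 only the first reward counts.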

The calculation formula of the state-action value function q_π(s, a) may be as follows:

q_π(s,a)＝E[G_t|S_t＝s,A_t＝a]

The following conclusion can be drawn from the above formulas:

v_π(s) = ∑_{a∈A} π(a|s) q_π(s,a)
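This relation says that the value of a state is the policy-weighted average of the values of the responses available in it. A minimal sketch, with dictionaries keyed by hypothetical response labels:

```python
def state_value(pi_s, q_s):
    """v_pi(s) = sum over a of pi(a|s) * q_pi(s, a), for one fixed state s."""
    return sum(pi_s[a] * q_s[a] for a in pi_s)

# Made-up numbers: pi(a1|s) = 0.25, pi(a5|s) = 0.75, q(s,a1) = 4, q(s,a5) = 0.
v = state_value({"a1": 0.25, "a5": 0.75}, {"a1": 4.0, "a5": 0.0})
```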

Further, the relationship between the state value function v_π(s) and the state-action value function q_π(s, a), also known as the Bellman equation, can be obtained. It is formulated as follows:

The above formula includes a term that may indicate the immediate reward.
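The Bellman equation itself does not appear in this text. For reference, the standard Bellman expectation equations, which are consistent with the relation v_π(s) = ∑_{a∈A} π(a|s) q_π(s,a) given earlier, can be written as below. The symbols R_s^a (expected immediate reward for response a in state s) and P_{ss'}^a (probability of moving from s to s' under a) are notational assumptions, not taken from the source.

```latex
\begin{aligned}
v_\pi(s) &= \sum_{a \in A} \pi(a \mid s)\Big( R_s^a + \gamma \sum_{s' \in S} P_{ss'}^a \, v_\pi(s') \Big) \\
q_\pi(s,a) &= R_s^a + \gamma \sum_{s' \in S} P_{ss'}^a \sum_{a' \in A} \pi(a' \mid s')\, q_\pi(s', a')
\end{aligned}
```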

304. The network device calculates, based on the acquired state reward, the target probability that maximizes the current state reward corresponding to the network security state.

Solve for the target probability π(a|s) that maximizes the current state reward; that is, the maximum state reward is obtained when the target probability is followed in the network security state, and the target network can execute a more appropriate preset network security response according to the target probability.

As shown in fig. 6, a reward method for rewarding network security responses may be defined, a Bellman equation is established, and an iterative algorithm is used to solve the Bellman equation to obtain an optimized probability, thereby completing a security event response process.
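The text does not say which iterative algorithm is used. One common choice is iterative policy evaluation, which repeatedly applies the Bellman backup until the state values stop changing. The sketch below assumes a tabular model given as nested dicts `P[s][a][s']` and `R[s][a]`; the two-state toy model, its state and response names, and all numbers are invented for illustration.

```python
def evaluate_policy(policy, P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Iterate v(s) <- sum_a pi(a|s) * (R[s][a] + gamma * sum_s' P[s][a][s'] * v(s'))."""
    v = {s: 0.0 for s in policy}
    for _ in range(max_iter):
        delta = 0.0
        for s in policy:
            new_v = sum(
                prob * (R[s][a] + gamma * sum(p * v[s2] for s2, p in P[s][a].items()))
                for a, prob in policy[s].items()
            )
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < tol:  # values have converged
            break
    return v

# Toy two-state model; every name and number here is hypothetical.
P = {
    "attacked": {"block": {"safe": 1.0}, "wait": {"attacked": 1.0}},
    "safe": {"block": {"safe": 1.0}, "wait": {"safe": 1.0}},
}
R = {
    "attacked": {"block": 1.0, "wait": -0.5},
    "safe": {"block": 0.0, "wait": 0.0},
}
policy = {s: {"block": 0.5, "wait": 0.5} for s in P}
values = evaluate_policy(policy, P, R)
```

In this toy model the value of the `attacked` state under the uniform policy satisfies v = 0.25 + 0.45·v, so the iteration converges to 0.25 / 0.55.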

In an embodiment, the detected network security state of the target network, the target probability of the target network executing the preset network security response, the state reward corresponding to the network security state, and the event occurrence probability of each network security sub-event occurring in the target network may be obtained and recorded in a log, so as to facilitate searching and recording.

305. When the target probability meets the probability adjustment condition, the execution probability corresponding to the network security state is adjusted to the target probability.

In practical applications, for example, the probability adjustment condition may be defined as that the target probability is different from the execution probability, and when the target probability is different from the execution probability, the execution probability corresponding to the network security state may be updated to the target probability.

306. Return to the step of acquiring the state reward corresponding to the network security state based on the execution probability.

In practical applications, for example, the probability adjustment condition may be defined as the target probability being different from the execution probability; when the two differ, the execution probability corresponding to the network security state may be updated to the target probability. The process then returns to the step of acquiring the state reward corresponding to the network security state based on the execution probability, again calculates, based on the acquired state reward, the target probability that maximizes the current state reward corresponding to the network security state, and, if this new target probability differs from the execution probability, updates the execution probability to it; the loop continues in this way. When the target probability is the same as the execution probability, the loop may be stopped.
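The loop over steps 303 to 306 resembles policy iteration in reinforcement learning: evaluate the current execution probability, derive a target probability, and repeat until the two agree. The sketch below follows that reading. The greedy improvement step (all probability mass on the best response) is an assumption, since the text does not specify how the target probability is computed, and the tabular model `P`, `R` with its names and numbers is hypothetical.

```python
def evaluate(policy, P, R, gamma, sweeps=500):
    """Approximate the state reward v_pi by repeated Bellman backups."""
    v = {s: 0.0 for s in P}
    for _ in range(sweeps):
        for s in P:
            v[s] = sum(
                policy[s][a] * (R[s][a] + gamma * sum(p * v[t] for t, p in P[s][a].items()))
                for a in P[s]
            )
    return v

def greedy(v, P, R, gamma):
    """Assumed target probability: all mass on the response with the largest backed-up value."""
    target = {}
    for s in P:
        q = {a: R[s][a] + gamma * sum(p * v[t] for t, p in P[s][a].items()) for a in P[s]}
        best = max(q, key=q.get)
        target[s] = {a: 1.0 if a == best else 0.0 for a in P[s]}
    return target

def policy_iteration(P, R, gamma=0.9, max_rounds=100):
    """Repeat evaluate -> improve until the target equals the execution probability."""
    policy = {s: {a: 1.0 / len(P[s]) for a in P[s]} for s in P}
    for _ in range(max_rounds):
        target = greedy(evaluate(policy, P, R, gamma), P, R, gamma)
        if target == policy:  # adjustment condition no longer met: stop looping
            break
        policy = target       # adopt the target probability and loop again
    return policy

# Toy two-state model; every name and number here is hypothetical.
P = {
    "attacked": {"block": {"safe": 1.0}, "wait": {"attacked": 1.0}},
    "safe": {"block": {"safe": 1.0}, "wait": {"safe": 1.0}},
}
R = {
    "attacked": {"block": 1.0, "wait": -0.5},
    "safe": {"block": 0.0, "wait": 0.0},
}
best = policy_iteration(P, R)
```

In the toy model the loop settles after two rounds, with the `attacked` state always choosing `block`.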

307. When the target probability does not meet the probability adjustment condition, the network security mapping relationship is updated based on the current target probability to obtain the updated network security mapping relationship.

In practical applications, for example, when the target probability does not satisfy the probability adjustment condition, parameters in the network security mapping relationship may be adjusted according to the target probability, for example, the probability of executing various preset network security responses in the network security mapping relationship is adjusted, and then the network security mapping relationship is updated to obtain an updated network security mapping relationship.

In an embodiment, when the network security mapping relationship is updated and the updated network security mapping relationship is obtained, it is considered that the reinforcement learning model is trained and the reinforcement learning model after training is obtained.

(III) Responding to network security events based on the reinforcement learning model.

308. The network device detects a current network security state of the target network.

In practical applications, the current network security status of the target network may be detected, and the specific steps of the detection have been described above and are not described herein again.

309. When the current network security state is the preset network security state, the network device acquires, based on the updated network security mapping relationship, the current execution probability of executing the preset network security response in the current network security state.

In practical applications, when the current network security state s is the preset network security state, the current execution probability of executing each preset network security response in the current network security state s may be obtained based on the updated network security mapping relationship, that is, according to the policy π. The expression of the policy π may be as follows:

π(a|s)＝P[A_t＝a|S_t＝s]

310. The network device determines, based on the current execution probability, the current network security response corresponding to the current network security state from a plurality of preset network security responses.

In practical applications, after the current execution probability is obtained, the current network security response corresponding to the network security state may be determined from multiple preset network security responses, for example, the current network security response a corresponding to the network security state s is determined from the preset network security response set a.

311. The target network performs the current network security response.

In practical applications, the target network may execute the current network security response a.

In one embodiment, the current network security response is executed only when it is not a preset dangerous response, and the method further includes refusing to execute the current network security response when it is a preset dangerous response.

In practical applications, for example, after the target network executes the current network security response, the network security state of the target network may be detected again. When the network security state after execution is the preset network security state, the target network is still abnormal, and the network security state may be updated to the state after execution. The step of obtaining, based on the updated network security mapping relationship, the current execution probability of executing each preset network security response for the target network in that network security state can then be executed again, until the network security state detected after execution is no longer the preset network security state. When the network security state after execution is not the preset network security state, the target network is already secure, and the loop may be stopped.
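Steps 308 to 311, together with the dangerous-response check, can be sketched as a loop: detect the state, pick a response according to π(a|s), refuse it if it is in the dangerous set, otherwise execute it, then detect again. Everything in the sketch below (function names, the `ToyNetwork` class, the state and response labels, the `max_steps` safeguard) is hypothetical and stands in for real detection and execution logic.

```python
import random

def respond_until_safe(detect, execute, policy, dangerous, abnormal_states,
                       max_steps=20, rng=random):
    """Detect, sample a response from pi(a|s), refuse dangerous ones, execute, re-detect."""
    history = []
    state = detect()
    for _ in range(max_steps):
        if state not in abnormal_states:
            break  # the target network is considered secure; stop the loop
        actions = list(policy[state])
        weights = [policy[state][a] for a in actions]
        response = rng.choices(actions, weights=weights, k=1)[0]
        if response in dangerous:
            history.append((state, response, "refused"))
        else:
            execute(response)
            history.append((state, response, "executed"))
        state = detect()
    return history

class ToyNetwork:
    """Stand-in for the target network: one abnormal state fixed by one response."""
    def __init__(self):
        self.state = "scanned"
    def detect(self):
        return self.state
    def execute(self, response):
        if response == "close_scanned_port":
            self.state = "safe"

net = ToyNetwork()
policy = {"scanned": {"close_scanned_port": 1.0}}
history = respond_until_safe(net.detect, net.execute, policy, set(), {"scanned"})
```

The `max_steps` bound is a design choice of this sketch: it keeps the loop from running forever if the policy only ever proposes responses that are refused.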

As shown in fig. 4, the agent may follow the policy, determine a preset network security response through the network security state, determine whether the preset network security response is a preset dangerous response, give a reward to the preset network security response through the probability detected by the sub-feature detection module, and record the policy, the network security response, and the reward corresponding to the network security state into a log.

As can be seen from the above, in the embodiment of the present application, the network device may detect the network security state of the target network; obtain, based on the network security mapping relationship, the execution probability of executing a preset network security response in the network security state, where the network security mapping relationship includes a mapping relationship between the network security state and the probability of executing the preset network security response; obtain the state reward corresponding to the network security state based on the execution probability; calculate, based on the obtained state reward, the target probability that maximizes the current state reward corresponding to the network security state; and update the network security mapping relationship based on the target probability to obtain the updated network security mapping relationship. In this scheme, the execution probability with which the target network executes each preset network security response and the target probability that maximizes the current state reward are obtained through the network security mapping relationship, and the network security mapping relationship is updated according to the target probability and the execution probability, yielding an updated network security mapping relationship that achieves the maximum current state reward. Whether a preset network security response is a preset dangerous response can also be judged, so that no misoperation is performed on the target network and the stability of the network environment is ensured. Meanwhile, the accumulated reward with attenuation of a preset network security response can be obtained by detecting the event occurrence probability of each network security sub-event of the target network after the target network executes the preset network security response. Therefore, when a network security event occurs, network security processing can be carried out in time, which reduces the dependence on manual work in the network security processing process, improves the efficiency of network security processing, and reduces losses.

In order to better implement the above method, an embodiment of the present application further provides a network security processing apparatus, which may be applied to a network device. As shown in fig. 7, the network security processing apparatus may include a detection module 71, a probability obtaining module 72, a reward obtaining module 73, a calculating module 74, and an updating module 75, as follows:

a detection module 71, configured to detect a network security status of a target network;

a probability obtaining module 72, configured to obtain, based on a network security mapping relationship, an execution probability of executing a preset network security response in the network security state, where the network security mapping relationship includes a mapping relationship between the network security state and the probability of executing the preset network security response;

a reward obtaining module 73, configured to obtain a status reward corresponding to the network security status based on the execution probability;

a calculating module 74, configured to calculate, based on the obtained status reward, a target probability that a current status reward corresponding to the network security status is a maximum value;

and an updating module 75, configured to update the network security mapping relationship based on the target probability to obtain an updated network security mapping relationship.

In one embodiment, the reward obtaining module 73 may include a reward obtaining sub-module 731 and a status reward obtaining sub-module 732, as follows:

the reward obtaining sub-module 731 is configured to obtain an instant reward and a future reward corresponding to a preset network security response based on the execution probability;

the status reward obtaining sub-module 732 is configured to combine the instant reward and the future reward to obtain a status reward corresponding to the network security status.

In an embodiment, the reward obtaining sub-module 731 may specifically include:

a response obtaining sub-module 7311, configured to obtain, based on the execution probability, an instant network security response corresponding to the network security state from multiple preset network security responses;

an instant reward obtaining sub-module 7312, configured to obtain an instant reward corresponding to the instant network security response;

a future reward obtaining sub-module 7313, configured to obtain, based on the execution probability, a future reward corresponding to the preset network security response.

In an embodiment, the instant reward obtaining sub-module 7312 may specifically include:

a negative reward determination sub-module 73121, configured to determine that the instant reward of the instant network security response is a negative reward when the instant network security response is a preset dangerous response;

a positive reward determination sub-module 73122, configured to obtain the instant reward of the instant network security response as a positive reward when the instant network security response is not a preset dangerous response.

In one embodiment, the positive reward determination sub-module 73122 may be specifically configured to:

In an embodiment, the network security processing apparatus may be further specifically configured to:

detecting the current network security state of a target network;

executing the current network security response for the target network.

In an embodiment, the update module 75 may be specifically configured to:

As can be seen from the above, in the embodiment of the present application, the detection module 71 may detect the network security state of the target network; the probability obtaining module 72 obtains, based on the network security mapping relationship, the execution probability of executing a preset network security response in the network security state, where the network security mapping relationship includes the mapping relationship between the network security state and the probability of executing the preset network security response; the reward obtaining module 73 obtains the status reward corresponding to the network security state based on the execution probability; the calculating module 74 calculates, based on the obtained status reward, the target probability that maximizes the current status reward corresponding to the network security state; and the updating module 75 updates the network security mapping relationship based on the target probability to obtain the updated network security mapping relationship. In this scheme, the execution probability with which the target network executes each preset network security response and the target probability that maximizes the current state reward are obtained through the network security mapping relationship, and the network security mapping relationship is updated according to the target probability and the execution probability, yielding an updated network security mapping relationship that achieves the maximum current state reward. Whether a preset network security response is a preset dangerous response can also be judged, so that no misoperation is performed on the target network and the stability of the network environment is ensured. Meanwhile, the accumulated reward with attenuation of a preset network security response can be obtained by detecting the event occurrence probability of each network security sub-event of the target network after the target network executes the preset network security response. Therefore, when a network security event occurs, network security processing can be carried out in time, which reduces the dependence on manual work in the network security processing process, improves the efficiency of network security processing, and reduces losses.

The embodiment of the present application further provides a computer device, which may be a server or a terminal, and integrates any one of the network security processing apparatuses provided in the embodiments of the present application. As shown in fig. 8, fig. 8 is a schematic structural diagram of a computer device provided in an embodiment of the present application, and specifically:

the computer device may include components such as a processor 801 of one or more processing cores, memory 802 of one or more computer-readable storage media, a power supply 803, and an input unit 804. Those skilled in the art will appreciate that the computer device configuration illustrated in FIG. 8 does not constitute a limitation of computer devices, and may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. Wherein:

The processor 801 is the control center of the computer device. It connects the various parts of the entire computer device using various interfaces and lines, and performs the various functions of the computer device and processes data by running or executing software programs and/or modules stored in the memory 802 and calling data stored in the memory 802, thereby monitoring the computer device as a whole. Optionally, the processor 801 may include one or more processing cores; preferably, the processor 801 may integrate an application processor, which mainly handles the operating system, user interfaces, application programs, and the like, and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor may also not be integrated into the processor 801.

The memory 802 may be used to store software programs and modules, and the processor 801 executes various functional applications and data processing by operating the software programs and modules stored in the memory 802. The memory 802 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to use of the computer device, and the like. Further, the memory 802 may include high speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 802 may also include a memory controller to provide the processor 801 access to the memory 802.

The computer device further includes a power supply 803 for supplying power to the various components. Preferably, the power supply 803 is logically connected to the processor 801 via a power management system, so that functions such as managing charging, discharging, and power consumption are performed via the power management system. The power supply 803 may also include one or more DC or AC power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and any other such components.

The computer device may further include an input unit 804, the input unit 804 being operable to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.

Although not shown, the computer device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 801 in the computer device loads the executable file corresponding to the process of one or more application programs into the memory 802 according to the following instructions, and the processor 801 runs the application programs stored in the memory 802, thereby implementing various functions as follows:

the method comprises the steps of detecting the network security state of a target network, obtaining the execution probability of executing preset network security response in the network security state based on the network security mapping relation, obtaining the state reward corresponding to the network security state based on the execution probability, calculating the target probability of enabling the current state reward corresponding to the network security state to be the maximum value based on the obtained state reward, and updating the network security mapping relation based on the target probability to obtain the updated network security mapping relation.

The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.

As can be seen from the above, in the embodiment of the present application, the network security state of the target network may be detected; the execution probability of executing a preset network security response in the network security state is obtained based on the network security mapping relationship, where the network security mapping relationship includes a mapping relationship between the network security state and the probability of executing the preset network security response; the state reward corresponding to the network security state is obtained based on the execution probability; the target probability that maximizes the current state reward corresponding to the network security state is calculated based on the obtained state reward; and the network security mapping relationship is updated based on the target probability to obtain the updated network security mapping relationship. In this scheme, the execution probability with which the target network executes each preset network security response and the target probability that maximizes the current state reward are obtained through the network security mapping relationship, and the network security mapping relationship is updated according to the target probability and the execution probability, yielding an updated network security mapping relationship that achieves the maximum current state reward. Whether a preset network security response is a preset dangerous response can also be judged, so that no misoperation is performed on the target network and the stability of the network environment is ensured. Meanwhile, the accumulated reward with attenuation of a preset network security response can be obtained by detecting the event occurrence probability of each network security sub-event of the target network after the target network executes the preset network security response. Therefore, when a network security event occurs, network security processing can be carried out in time, which reduces the dependence on manual work in the network security processing process, improves the efficiency of network security processing, and reduces losses.

It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.

To this end, embodiments of the present application provide a storage medium, in which a plurality of instructions are stored, where the instructions can be loaded by a processor to execute steps in any network security processing method provided in embodiments of the present application. For example, the instructions may perform the steps of:

The storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.

Because the instructions stored in the storage medium can execute the steps of any network security processing method provided in the embodiments of the present application, they can achieve the beneficial effects of any such method. These effects are detailed in the foregoing embodiments and are not repeated here.

The network security processing method and device provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present application, and the description of the above embodiments is intended only to help in understanding the method and core idea of the present application. Those skilled in the art may, following the idea of the present application, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims

1. A network security processing method, comprising:

detecting a network security state of a target network;

acquiring an execution probability of executing a preset network security response in the network security state based on a network security mapping relation, wherein the network security mapping relation comprises a mapping relation between the network security state and the probability of executing the preset network security response, and the preset network security response is preparation of the target network for responding to various network abnormal events and measures taken after the network abnormal events occur;

updating the network security mapping relation based on the target probability to obtain an updated network security mapping relation;

detecting the current network security state of a target network;

executing the current network security response for the target network.

2. The network security processing method of claim 1, wherein updating the network security mapping relationship based on the target probability to obtain an updated network security mapping relationship comprises:

3. The network security processing method according to claim 1, wherein obtaining the state reward corresponding to the network security state based on the execution probability comprises:

4. The network security processing method of claim 3, wherein obtaining the instant reward and the future reward corresponding to the preset network security response based on the execution probability comprises:

5. The network security processing method of claim 4, wherein obtaining the future reward corresponding to the preset network security response based on the execution probability comprises:

obtaining a future reward based on the future reward expected value.

6. The network security processing method of claim 4, wherein obtaining the instant reward corresponding to the instant network security response comprises:

7. The network security processing method of claim 6, wherein obtaining the instant network security response with an instant reward being a positive reward comprises:

8. The network security processing method of claim 1, wherein executing the current network security response for the target network comprises:

9. A network security processing apparatus, comprising:

a probability obtaining module, configured to obtain, based on a network security mapping relationship, an execution probability of executing a preset network security response in the network security state, where the network security mapping relationship includes a mapping relationship between the network security state and the probability of executing the preset network security response, and the preset network security response is a preparation of the target network for dealing with occurrence of various network abnormal events and a measure taken after the network abnormal event occurs;

an updating module, configured to update the network security mapping relationship based on the target probability to obtain an updated network security mapping relationship;

the network security processing apparatus is further specifically configured to:

detecting the current network security state of a target network;

executing the current network security response for the target network.

10. A storage medium storing a plurality of instructions that can be loaded by a processor to perform the network security processing method of any one of claims 1 to 8.