CN110225019A

CN110225019A - Network security processing method and device

Info

Publication number: CN110225019A
Application number: CN201910479765.0A
Authority: CN
Inventors: 毛婷伟; 梁玉; 洪春华
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2019-06-04
Filing date: 2019-06-04
Publication date: 2019-09-10
Anticipated expiration: 2039-06-04
Also published as: CN110225019B

Abstract

The embodiments of the present application disclose a network security processing method and device. The method detects the network security state of a target network; obtains, based on a network security mapping relationship, the execution probability of executing a preset network security response in that network security state, where the network security mapping relationship comprises the mapping between network security states and the probabilities of executing preset network security responses; obtains the state reward corresponding to the network security state based on the execution probability; calculates, based on the obtained state reward, the target probability that maximizes the current state reward corresponding to the network security state; and updates the network security mapping relationship based on the target probability to obtain an updated network security mapping relationship. The scheme can improve the efficiency of network security processing.

Description

Network security processing method and device

Technical field

This application relates to the field of computer technology, and in particular to a network security processing method and device.

Background

As the network security situation grows increasingly complex, events that threaten network security, such as malicious activity and abnormal attacks, occur from time to time. Once a network is attacked, problems such as network data leakage and server downtime readily follow, so handling network security events promptly is essential.

At present, network security is handled mainly by security experts. For example, a security expert detects the network security events that occur in a network and provides a corresponding solution for each event. This approach to network security processing is very inefficient.

Summary of the invention

The embodiments of the present application provide a network security processing method and device that can improve the efficiency of network security processing.

An embodiment of the present application provides a network security processing method, comprising:

detecting the network security state of a target network;

obtaining, based on a network security mapping relationship, the execution probability of executing a preset network security response in the network security state, the network security mapping relationship comprising the mapping between network security states and the probabilities of executing preset network security responses;

obtaining the state reward corresponding to the network security state based on the execution probability;

calculating, based on the obtained state reward, the target probability that maximizes the current state reward corresponding to the network security state;

updating the network security mapping relationship based on the target probability to obtain an updated network security mapping relationship.

Correspondingly, an embodiment of the present application also provides a network security processing device, comprising:

a detection module, configured to detect the network security state of a target network;

a probability obtaining module, configured to obtain, based on a network security mapping relationship, the execution probability of executing a preset network security response in the network security state, the network security mapping relationship comprising the mapping between network security states and the probabilities of executing preset network security responses;

a reward obtaining module, configured to obtain the state reward corresponding to the network security state based on the execution probability;

a computing module, configured to calculate, based on the obtained state reward, the target probability that maximizes the current state reward corresponding to the network security state;

an update module, configured to update the network security mapping relationship based on the target probability to obtain an updated network security mapping relationship.

Correspondingly, an embodiment of the present application also provides a storage medium storing instructions that, when executed by a processor, implement the steps of any network security processing method provided by the embodiments of the present application.

Correspondingly, an embodiment of the present application also provides a computer device comprising a processor and a memory. The memory stores a plurality of instructions, and the processor loads the instructions from the memory to execute the steps of any network security processing method provided by the embodiments of the present application.

The embodiments of the present application detect the network security state of a target network; obtain, based on a network security mapping relationship, the execution probability of executing a preset network security response in that network security state, where the network security mapping relationship comprises the mapping between network security states and the probabilities of executing preset network security responses; obtain the state reward corresponding to the network security state based on the execution probability; calculate, based on the obtained state reward, the target probability that maximizes the current state reward corresponding to the network security state; and update the network security mapping relationship based on the target probability to obtain an updated network security mapping relationship. The scheme can improve the efficiency of network security processing.

Brief description of the drawings

To explain the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are obviously only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic diagram of a scenario of the network security processing system provided by the embodiments of the present application;

Fig. 2 is a first flow diagram of the network security processing method provided by the embodiments of the present application;

Fig. 3 is a second flow diagram of the network security processing method provided by the embodiments of the present application;

Fig. 4 is a framework diagram of the network security processing method provided by the embodiments of the present application;

Fig. 5 is a technical flow diagram of the network security processing method provided by the embodiments of the present application;

Fig. 6 is a flow diagram of solving the reinforcement learning model provided by the embodiments of the present application;

Fig. 7 is a first structural schematic diagram of the network security processing method provided by the embodiments of the present application;

Fig. 8 is a structural schematic diagram of the computer device provided by the embodiments of the present application.

Detailed description

Referring to the drawings, in which like reference numerals denote like components, the principles of the present application are illustrated as implemented in a suitable computing environment. The following description is based on the illustrated specific embodiments of the application and should not be construed as limiting other specific embodiments not detailed herein.

In the following description, specific embodiments of the application are described with reference to steps and symbols executed by one or more computers, unless otherwise stated. These steps and operations are therefore referred to several times as being executed by a computer. Computer execution as used herein includes operations by a computer processing unit on electronic signals that represent data in a structured form. Such an operation transforms the data or maintains it at a location in the memory system of the computer, which reconfigures or otherwise alters the operation of the computer in a manner well known to those skilled in the art. The data structures in which the data is maintained are physical locations in the memory that have particular properties defined by the data format. However, describing the principles of the application in the above terms is not meant as a limitation; those skilled in the art will appreciate that the various steps and operations described below may also be implemented in hardware.

The term "module" as used herein may be regarded as a software object executed on the computing system. The different components, modules, engines, and services described herein may be regarded as implementation objects on the computing system. The devices and methods described herein are preferably implemented in software, but may of course also be implemented in hardware, both of which fall within the protection scope of the application.

An embodiment of the present application provides a network security processing method. The method may be executed by the network security processing device provided by the embodiments of the present application, or by a network device that integrates the network security processing device, where the network security processing device may be implemented in hardware or software. The network device may be a smartphone, tablet computer, palmtop computer, laptop, desktop computer, or similar device.

Referring to Fig. 1, Fig. 1 is a schematic diagram of an application scenario of the network security processing method provided by the embodiments of the present application. Taking the case where the network security processing device is integrated in a network device as an example, the network device can detect the network security state of a target network; obtain, based on a network security mapping relationship, the execution probability of executing a preset network security response in that network security state, where the network security mapping relationship comprises the mapping between network security states and the probabilities of executing preset network security responses; obtain the state reward corresponding to the network security state based on the execution probability; calculate, based on the obtained state reward, the target probability that maximizes the current state reward corresponding to the network security state; and update the network security mapping relationship based on the target probability to obtain an updated network security mapping relationship.

Referring to Fig. 2, Fig. 2 is a flow diagram of the network security processing method provided by the embodiments of the present application. The specific flow of the method may be as follows:

201. Detect the network security state of the target network.

The network security state may be the network state obtained by performing security detection on the network security scenario in which the target network is located. For example, network security states may include a safe state, a state in which the network is being scanned, a state in which a network vulnerability has been exploited, a state in which the network is under attack, a state in which the network has been compromised, and so on.

In one embodiment, the network security state set may be denoted by S, and the set S may include multiple network security states s. For example, S may include the safe state s1, the network-scanned state s2, the vulnerability-exploited state s3, the under-attack state s4, the compromised state s5, and so on.

In practical applications, the network security state of the target network can be detected as follows. The network security scenario in which the target network is located is detected first and determined to be, for example, a terminal intrusion scenario. Then, according to the network security states that may arise in a terminal intrusion scenario, the network security state set S is defined to include the safe state s1, the network-scanned state s2, the vulnerability-exploited state s3, the under-attack state s4, the compromised state s5, and other network security states. Finally, the target network is detected to obtain its network security state.

In one embodiment, to improve the accuracy with which the network security state is obtained, multiple network security engines can be used to detect network security parameters and thereby obtain the network security state. Specifically, the step "detecting the network security state of the target network" may include:

obtaining a network security state set that includes multiple network security states;

detecting, with multiple network security engines, the network security result corresponding to each network security engine in the target network;

determining, according to the network security results, the network security state of the target network from the multiple network security states in the network security state set.

In practical applications, a network security state set S that includes multiple network security states s can be obtained. The network security result corresponding to each network security engine in the target network can then be detected, with each engine detecting one kind of network security result. For example, multiple network security engines can separately inspect different parts of the network, such as the database, the transmission network, and the user terminal, to obtain network security results. The network security state of the target network is then determined in the network security state set according to the multiple network security results.
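The multi-engine detection described above can be sketched as follows. This is only a minimal illustration: the engine functions, the observation keys, the thresholds, and the rule of reporting the most severe state found by any engine are all assumptions introduced here, since the text does not specify how engine results are combined.

```python
from enum import IntEnum
from typing import Callable, List


class NetState(IntEnum):
    """Network security state set S; a higher value is assumed to be more severe."""
    SAFE = 1            # s1
    SCANNED = 2         # s2
    VULN_EXPLOITED = 3  # s3
    UNDER_ATTACK = 4    # s4
    COMPROMISED = 5     # s5


# A hypothetical engine inspects one part of the network and reports a state.
Engine = Callable[[dict], NetState]


def database_engine(obs: dict) -> NetState:
    # Hypothetical signal: a bulk export of the database was observed.
    return NetState.COMPROMISED if obs.get("db_dump_detected") else NetState.SAFE


def traffic_engine(obs: dict) -> NetState:
    # Hypothetical threshold on port probes seen in the transmission network.
    return NetState.SCANNED if obs.get("port_probes", 0) > 100 else NetState.SAFE


def terminal_engine(obs: dict) -> NetState:
    # Hypothetical threshold on failed logins at the user terminal.
    return NetState.UNDER_ATTACK if obs.get("failed_logins", 0) > 20 else NetState.SAFE


ENGINES: List[Engine] = [database_engine, traffic_engine, terminal_engine]


def detect_state(obs: dict, engines: List[Engine]) -> NetState:
    # Assumed combination rule: take the most severe state reported by any engine.
    return max(engine(obs) for engine in engines)
```

For instance, `detect_state({"port_probes": 500}, ENGINES)` yields `NetState.SCANNED` under these assumed thresholds.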

In one embodiment, after the network security scenario in which the target network is located has been detected and determined to be a terminal intrusion scenario, the subject that carries out network security responses according to the network security state can be defined as the agent. For example, in a terminal intrusion scenario, the agent can be defined as the root role of the terminal host.

The root user may be the only superuser in the system and may hold all permissions in the system, such as starting or stopping a process, deleting or adding users, and adding or disabling hardware.

202. Based on the network security mapping relationship, obtain the execution probability of executing a preset network security response in the network security state.

A preset network security response may be a preparation the network makes to cope with various network anomaly events, or a measure taken after a network anomaly event has occurred. For example, preset network security responses may include closing abnormally connected ports, blocking traffic from certain IP addresses, isolating suspicious running files, deleting virus files, closing malicious processes, and so on.

In one embodiment, after the network security scenario in which the target network is located has been detected and determined to be a terminal intrusion scenario, the preset network security response set may be denoted by A. Following the routine procedure a host uses when it comes under abnormal attack, A is defined to include multiple preset network security responses a. For example, A may include closing the scanned port (a1), blocking and deleting abnormally downloaded files (a2), closing abnormal connections (a3), locking abnormally logged-in accounts (a4), reporting to the administrator (a5), and other preset network security responses.

The execution probability of executing a preset network security response may be the basis on which the target network, in a given network security state and through its knowledge of that state, selects one preset network security response to execute from among the multiple preset network security responses. For example, the execution probabilities can express how likely the target network, in a given network security state, is to execute each preset network security response a in the set A. The execution probability can thus be expressed as a probability distribution over the preset network security response set.

The network security mapping relationship may include the mapping between network security states and the probabilities of executing the various preset network security responses. For example, it may specify how likely the target network, in a given network security state, is to execute each preset network security response a in the set A.

In one embodiment, the network security mapping relationship can be represented by a policy, denoted by π. The policy can express the probability distribution over the preset network security response set when the target network is in a given network security state. For example, the policy can be written as:

π(a | s) = P[A_t = a | S_t = s]

Here A denotes the preset network security response set, a denotes a preset network security response in A, S denotes the network security state set, s denotes a network security state in S, and t denotes the current time.

The policy π gives the probability that the agent takes each possible preset network security response a in a given network security state s. The policy π depends only on the current network security state and not on historical network security states. The policy π is also stationary, that is, independent of time, although the agent can still adjust it in real time.

Because the basis for executing a preset network security response is a probability distribution, the target network may produce different preset network security responses under the same policy for different network security states, and may also produce different preset network security responses under the same policy for the same network security state.
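A policy of this kind can be sketched as a table of per-state probability distributions from which a response is sampled. The state and response names and the probability values below are illustrative assumptions, not values taken from the application.

```python
import random
from typing import Dict

# Hypothetical names for the preset responses a1..a5 described above.
ACTIONS = [
    "a1_close_scanned_port",
    "a2_block_and_delete_download",
    "a3_close_abnormal_connection",
    "a4_lock_account",
    "a5_report_admin",
]

# pi(a | s): one probability distribution over ACTIONS per state (assumed values).
POLICY: Dict[str, Dict[str, float]] = {
    "s2_scanned": dict(zip(ACTIONS, [0.60, 0.05, 0.15, 0.05, 0.15])),
    "s4_under_attack": dict(zip(ACTIONS, [0.10, 0.20, 0.40, 0.20, 0.10])),
}


def execution_probabilities(policy: Dict[str, Dict[str, float]], state: str) -> Dict[str, float]:
    """Look up the execution probabilities for the given network security state."""
    return policy[state]


def sample_response(policy: Dict[str, Dict[str, float]], state: str, rng: random.Random) -> str:
    """Draw one preset response according to pi(. | state)."""
    probs = execution_probabilities(policy, state)
    actions = list(probs)
    weights = [probs[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

Because the response is sampled, two calls with the same state can return different responses, which matches the observation above that the same policy may produce different responses for the same state.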

In one embodiment, the network security processing device can be integrated into a reinforcement learning model, and multiple steps of the network security processing method can be executed by the reinforcement learning model.

A reinforcement learning model learns by trial and error: the rewards it obtains by interacting with the environment guide its responses (behaviors), and its goal is to obtain the maximum reward. Reinforcement learning emphasizes how to respond based on the state so as to maximize the expected return. For example, the reinforcement learning model may be a Markov decision model.

The Markov property means that, for a random process, given the present state and all past states, the conditional probability distribution of the future state depends only on the current state. In other words, given the present state, the future of the random process is conditionally independent of its past states. A random process with this property has the Markov property, and a process with the Markov property is called a Markov process.

A Markov decision refers to an agent periodically or continuously observing a stochastic dynamic system that has the Markov property and making the corresponding decisions in sequence. At each moment, according to the state the agent observes, a response (behavior) is selected from the set of available responses based on the policy and executed. The future state of the system is random, and its state transition probabilities have the Markov property. The agent then makes a new decision according to the newly observed state and executes the corresponding response, and this is repeated.

In practical applications, the execution probabilities of executing the various preset network security responses in a network security state can be obtained based on the reinforcement learning model. For example, the network security state s can be obtained, and the corresponding network security mapping relationship, namely the policy π, can be obtained based on the reinforcement learning model. The policy π can be expressed as:

π(a | s) = P[A_t = a | S_t = s]

In one embodiment, a reinforcement learning system can also be constructed. For example, the reinforcement learning system can be constructed as the tuple <network security state set S, preset network security response set A, reward function R, state transition function P, discount factor γ>, and the network security mapping relationship can be defined and expressed as the policy π.

The reward function R gives the reward the agent obtains for a preset network security response a taken at time t, when the network is in a given network security state s. The reward function can take the empirical expectation. R is a reward function of the network security state s and the preset network security response a. Consistent with this definition, the reward function can be written as:

R_s^a = E[R_{t+1} | S_t = s, A_t = a]

The state transition function P gives the probability that, after a preset network security response a is taken at time t in a given network security state s, the network moves from state s to state s' at the next moment. The state transition function P can be defined as a two-dimensional Gaussian distribution. Consistent with this definition, the state transition function can be written as:

P_{ss'}^a = P[S_{t+1} = s' | S_t = s, A_t = a]

The discount factor γ adjusts how much the rewards of responses at different points in time contribute. The network security states the network passes through later are influenced by the current network security state, but this influence gradually weakens. The discount factor γ expresses this weakening and can be chosen as a value between 0 and 1. In one embodiment, the reinforcement learning system may also omit the discount factor γ.

For the definitions of the network security state set S, the preset network security response set A, and the policy π, refer to the description above; they are not repeated here.
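The five elements of the reinforcement learning system can be held in one small container, as in the sketch below. The class name, the field layout, the default discount factor of 0.9, and the use of lookup tables for R and P are assumptions made for illustration; in particular, the tabular transition function stands in for whatever distribution an implementation actually uses.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SecurityMDP:
    """Hypothetical container for the tuple <S, A, R, P, gamma>."""
    states: List[str]                                   # S
    actions: List[str]                                  # A
    reward: Dict[Tuple[str, str], float]                # R[(s, a)]
    transition: Dict[Tuple[str, str], Dict[str, float]]  # P[(s, a)][s_next]
    gamma: float = 0.9                                  # discount factor (assumed default)

    def validate(self) -> None:
        """Check that gamma lies in [0, 1] and every P[(s, a)] is a distribution."""
        if not 0.0 <= self.gamma <= 1.0:
            raise ValueError("discount factor must lie between 0 and 1")
        for (s, a), dist in self.transition.items():
            if s not in self.states or a not in self.actions:
                raise ValueError(f"unknown state or response: {(s, a)}")
            if abs(sum(dist.values()) - 1.0) > 1e-9:
                raise ValueError(f"transition probabilities for {(s, a)} do not sum to 1")


# Toy instance with two states and one response (all values are made up).
toy = SecurityMDP(
    states=["s1_safe", "s2_scanned"],
    actions=["a1_close_scanned_port"],
    reward={("s1_safe", "a1_close_scanned_port"): 0.0,
            ("s2_scanned", "a1_close_scanned_port"): 1.0},
    transition={("s1_safe", "a1_close_scanned_port"): {"s1_safe": 1.0},
                ("s2_scanned", "a1_close_scanned_port"): {"s1_safe": 0.8, "s2_scanned": 0.2}},
)
toy.validate()
```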

203. Obtain the state reward corresponding to the network security state based on the execution probability.

The state reward may be the expected value of the cumulative reward that can be obtained in a given network security state when following the probabilities determined by the network security mapping relationship. For example, when the target network is in network security state s and several preset network security responses a are taken according to the probabilities determined by the network security mapping relationship, the state reward is the expected value of the total cumulative reward the agent can obtain.

In practical applications, the state reward corresponding to the network security state can be obtained according to the execution probability. For example, the state reward v_π(s) corresponding to the network security state s can be obtained according to the execution probability, namely through the Bellman equation, which with the symbols defined above can be written as:

v_π(s) = E_π[R_{t+1} + γ v_π(S_{t+1}) | S_t = s]

In one embodiment, to improve the accuracy with which the state reward is obtained, the state reward corresponding to the network security state can be obtained from an immediate reward and a future reward obtained separately. Specifically, the step "obtaining the state reward corresponding to the network security state based on the execution probability" may include:

obtaining, based on the execution probability, the immediate reward and the future reward corresponding to the preset network security response;

fusing the immediate reward and the future reward to obtain the state reward corresponding to the network security state.

The state reward may include an immediate reward and a future reward.

The immediate reward may be the reward given, based on the network security state s, for the immediate network security response.

The immediate network security response may be the current preset network security response a determined for the network security state s according to the network security mapping relationship (that is, the policy π).

The future reward may be the reward given, based on the network security state s, for future network security responses. After the immediate network security response a is executed in network security state s, the network security state changes to s'. The future reward may be the reward, under the network security mapping relationship (that is, the policy π), for the preset network security responses that may be carried out in the future starting from network security state s'.

The future network security responses may be the preset network security responses that may be carried out after the immediate network security response a has been determined for the network security state s according to the network security mapping relationship (that is, the policy π). They are used to calculate the future reward. After the immediate network security response a is executed in network security state s, the network security state changes to s'. The future network security responses may then be the preset network security response a' corresponding to s' under the policy π, together with the network security responses that follow a'.

In practical applications, for example, the immediate network security response a corresponding to the network security state s can be obtained according to the execution probability. The immediate reward obtained by executing a in state s is then obtained, along with the future reward obtained by continuing to follow the execution probability after a has been executed. The state reward corresponding to the network security state s is obtained from the immediate reward and the future reward.
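One way to fuse the immediate and future rewards into a state reward is iterative policy evaluation, sketched below. It assumes tabular reward and transition functions and a fixed number of sweeps; the function and parameter names are hypothetical, and this is not presented as the solver the application itself uses.

```python
from typing import Dict, List, Tuple


def state_reward(
    states: List[str],
    actions: List[str],
    policy: Dict[str, Dict[str, float]],                 # pi(a | s)
    reward: Dict[Tuple[str, str], float],                # R[(s, a)]
    transition: Dict[Tuple[str, str], Dict[str, float]],  # P[(s, a)][s_next]
    gamma: float = 0.9,
    sweeps: int = 200,
) -> Dict[str, float]:
    """Estimate v_pi(s) for every state by repeated Bellman expectation backups."""
    v = {s: 0.0 for s in states}
    for _ in range(sweeps):
        new_v = {}
        for s in states:
            total = 0.0
            for a in actions:
                immediate = reward[(s, a)]
                # Expected state reward of the next state s' reached after response a.
                future = sum(p * v[s_next] for s_next, p in transition[(s, a)].items())
                # Fuse immediate and discounted future reward, weighted by pi(a | s).
                total += policy[s][a] * (immediate + gamma * future)
            new_v[s] = total
        v = new_v
    return v
```

In a single-state example with one response, reward 1, a self-loop, and gamma 0.5, the estimate converges to 1 / (1 - 0.5) = 2, which is the immediate reward of 1 plus a discounted future reward of 1.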

In one embodiment, to improve the accuracy of network security processing, the immediate reward can be obtained through the immediate network security response. Specifically, the step "obtaining, based on the execution probability, the immediate reward and the future reward corresponding to the preset network security response" may include:

obtaining, based on the execution probability, the immediate network security response corresponding to the network security state from among multiple preset network security responses;

obtaining the immediate reward corresponding to the immediate network security response;

obtaining, based on the execution probability, the future reward corresponding to the preset network security response.

In practical applications, the immediate network security response corresponding to the network security state can be obtained from among multiple preset network security responses based on the execution probability. The immediate reward corresponding to that response is then obtained, and the future reward corresponding to the preset network security response is obtained based on the execution probability. For example, in network security state s, the preset network security response set A can be obtained and, following the routine procedure a host uses when it comes under abnormal attack, defined to include multiple preset network security responses a, such as closing the scanned port (a1), blocking and deleting abnormally downloaded files (a2), closing abnormal connections (a3), locking abnormally logged-in accounts (a4), and reporting to the administrator (a5). Then, in network security state s, the immediate network security response corresponding to that state is obtained from A according to the execution probability, the immediate reward corresponding to that response is obtained, and the future reward corresponding to the preset network security response is obtained based on the execution probability.

In one embodiment, the future reward can be obtained from the expected value of the reward the target network may receive after executing preset network security responses. Specifically, the step "obtaining, based on the execution probability, the future reward corresponding to the preset network security response" may include:

obtaining the future reward expected value that the target network obtains by carrying out security responses according to the execution probability after executing the immediate network security response;

obtaining the future reward based on the future reward expected value.

In practical applications, for example, after the target network obtains and executes the immediate network security response according to the execution probability, its network security state changes from s to the post-response state s'. When the target network is in state s', it can continue to obtain the next preset network security response according to the execution probability. After that response is executed, the network security state changes again, and so on. By following the same execution probability, the target network can thus obtain a sequence of preset network security responses to execute, and the future reward expected value is the predicted reward for all preset network security responses that may be executed in the future. Therefore, the future reward expected value that the target network obtains by carrying out security responses according to the execution probability after executing the immediate network security response can be obtained and used as the future reward.
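The future reward expected value can be approximated by averaging discounted rewards over sampled continuations, as in the sketch below. Estimating the expectation by averaging rollouts is an assumption made here for illustration; each rollout is taken to be the list of rewards collected after the immediate response, and the function names are hypothetical.

```python
from typing import List, Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum r_0 + gamma * r_1 + gamma^2 * r_2 + ... for one reward sequence."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def future_reward_estimate(rollouts: List[Sequence[float]], gamma: float) -> float:
    """Average the discounted rewards that follow the immediate response.

    Each rollout holds the rewards of the responses executed after the
    immediate one, so the whole sequence is discounted by one extra step.
    """
    if not rollouts:
        raise ValueError("at least one rollout is required")
    returns = [gamma * discounted_return(r, gamma) for r in rollouts]
    return sum(returns) / len(returns)
```

With `gamma = 0.5` and the two rollouts `[1.0, 1.0]` and `[3.0]`, the discounted continuations are 0.75 and 1.5, so the estimate is 1.125.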

In one embodiment, the immediate network security response can also be audited so that the agent is forbidden from executing preset dangerous responses, which improves the safety of network security processing. Specifically, the step "obtaining the immediate reward corresponding to the immediate network security response" may include:

when the immediate network security response is a preset dangerous response, determining that the immediate reward of the immediate network security response is a negative reward;

when the immediate network security response is not a preset dangerous response, determining that the immediate reward of the immediate network security response is a positive reward.

In practical applications, when the response of instant network security responds for default danger, instant network safety can be determined The instant of response rewards the reward that is negative；When the response of instant network security does not respond for default danger, available instant network The instant of security response rewards the reward that is positive.

For example, a preset dangerous response set can be established from expert experience. The set includes the preset dangerous responses that the agent is forbidden to execute, such as deleting a database, batch-executing files, autonomously downloading suspicious files, deleting sensitive files, closing critical service ports, and shutting down certain critical services. When the immediate network security response is a preset dangerous response, it can be judged to carry risk; its execution is stopped and a negative reward is assigned to it. When the immediate network security response is not a preset dangerous response, it can be judged to carry no risk; it can be executed and a positive reward is assigned to it.

Auditing the preset network security responses chosen by the agent prevents the agent from executing preset dangerous responses, and assigning a negative reward to such responses ensures that the network security responses executed by the agent do not introduce larger security problems.

In one embodiment, the negative reward and the positive reward need not be a negative number and a positive number; the terms may be relative. Both rewards may be positive, or both may be negative. For example, when the immediate network security response is a preset dangerous response, a relatively small reward can be assigned to it; when it is not a preset dangerous response, a relatively large reward can be assigned to it.

In one embodiment, the positive reward can be assigned according to the network security events that occur in the target network, which improves the accuracy of network security processing. Specifically, the step "obtaining a positive reward as the immediate reward of the immediate network security response" may include:

Obtaining the network security event set corresponding to the network security state, the network security event set including multiple kinds of network security sub-events;

Detecting, after the target network executes the immediate network security response, the event occurrence probability of each kind of network security sub-event occurring in the target network;

Obtaining, based on the event occurrence probabilities, a positive reward as the immediate reward of the immediate network security response.

In practice, the network security event set corresponding to the network security state can be obtained, where the set includes multiple kinds of network security sub-events. After the target network executes the immediate network security response, the event occurrence probability of each kind of network security sub-event in the target network is detected, and a positive reward is obtained from these probabilities as the immediate reward of the immediate network security response.

For example, the network security scene in which the target network is located is detected. After the scene is determined to be a terminal intrusion scene, the network security event set corresponding to the network security state can be obtained. The set includes multiple kinds of network security sub-events, such as whether a port scan exists in the network, whether a server is under DDoS attack, whether the system contains virus-infected files, and whether the network shows abnormal behavior. The event occurrence probability of each kind of network security sub-event can then be detected after the target network, in network security state s, executes the immediate network security response a, and a positive reward is obtained from these probabilities as the immediate reward of the response.

In one embodiment, the event occurrence probabilities of network security sub-events in the network can also be detected by a sub-feature detection module. The sub-feature detection module may include multiple sub-feature detection submodules, each of which detects the event occurrence probability of one kind of network security sub-event. The sub-feature detection module can be assembled from a variety of security detection engines.

For example, after the network security scene of the target network is detected and determined to be a terminal intrusion scene, the sub-feature detection module can be defined to include submodules such as a port scan monitoring submodule, a malicious file download monitoring submodule, a root privilege theft monitoring submodule, a highly suspicious code execution monitoring submodule, a sensitive directory access monitoring submodule, a sensitive file transfer monitoring submodule, and an abnormal communication monitoring submodule. Each sub-feature detection submodule detects the event occurrence probability of one kind of network security sub-event, and the positive reward of the immediate network security response is obtained from the resulting event occurrence probabilities.

Let R_t denote the immediate reward for executing the immediate network security response a at time t. Let k denote the fixed penalty incurred when a is a preset dangerous response; k can be a constant, such as k=-10. Let A_u(a) indicate whether the immediate network security response is a preset dangerous response: A_u(a)=1 when it is, and A_u(a)=0 when it is not. Let o(a) denote the event occurrence probabilities, which can be represented as an array, and let f(o(a)) denote the positive reward derived from them. The immediate reward R_t can then be calculated as follows:

R_t=kA_u(a)+(1-A_u(a))·f(o(a))
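
As a minimal sketch of this reward rule, the following Python computes R_t for a single response. The text does not give the form of f, so `mean_safety` (one minus the mean event occurrence probability) is an assumption made purely for illustration, and the response names in `DANGEROUS_RESPONSES` are hypothetical.

```python
from typing import Callable, Sequence

K_PENALTY = -10.0  # constant k from the text (k = -10)

# Hypothetical preset dangerous responses; the names are illustrative only.
DANGEROUS_RESPONSES = {"delete_database", "close_critical_port"}


def mean_safety(occurrence: Sequence[float]) -> float:
    """Assumed form of f: 1 minus the mean event occurrence probability."""
    return 1.0 - sum(occurrence) / len(occurrence)


def immediate_reward(
    response: str,
    occurrence: Sequence[float],
    f: Callable[[Sequence[float]], float] = mean_safety,
    k: float = K_PENALTY,
) -> float:
    """Compute R_t = k*A_u(a) + (1 - A_u(a)) * f(o(a))."""
    a_u = 1 if response in DANGEROUS_RESPONSES else 0  # indicator A_u(a)
    return k * a_u + (1 - a_u) * f(occurrence)
```

With this choice of f, a safe response is rewarded more when the monitored sub-events become less likely, while any dangerous response receives the fixed penalty k regardless of o(a).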

In one embodiment, the immediate reward and the future reward can also be calculated through value functions, which improves the accuracy of network security processing. For example, the state value function and the state-action value function of a Markov decision process can be introduced.

The state value function can be used to assess the value of network security state s. It is a value function based on the network security mapping relationship (that is, the policy π) and represents the expected cumulative reward the agent obtains when starting from network security state s and following the current policy π. The state value function v_π(s) can be calculated as follows:

v_π(s)=E [G_t|S_t=s]

Here S denotes the set of network security states, s denotes a network security state in S, t denotes the current time, and G_t denotes the return.

The return is the discounted sum of all rewards from a given time onward. The return G_t is the discounted sum of all rewards R from network security state s until the terminal network security state is reached, and can be calculated as follows:

G_t = R_{t+1} + γR_{t+2} + γ^2 R_{t+3} + ... = Σ_{k=0}^{∞} γ^k R_{t+k+1}

Here γ denotes the discount factor, R denotes the reward, and t denotes the current time. The discount factor γ expresses how much a future reward is worth at the current time t: the reward R_{t+k+1} obtained at time t+k+1 is worth γ^k R_{t+k+1} at time t.
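
The discounted sum described above can be sketched in a few lines of Python. This is an illustrative helper, not part of the described method; it assumes a finite reward sequence in which `rewards[0]` is the reward received at time t+1.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return the sum over k of gamma**k * rewards[k] for a finite sequence.

    rewards[0] is R_{t+1}, rewards[1] is R_{t+2}, and so on.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    g = 0.0
    # Accumulate from the last reward backwards: G = R + gamma * G_next.
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, with rewards 1, 2, 4 and γ = 0.5 the return is 1 + 0.5·2 + 0.25·4 = 3.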

The state-action value function can be used to assess the value of a preset network security response a in network security state s. It is denoted q_π(s, a) and represents the expected reward the agent can obtain when, following policy π, a preset network security response a is executed in network security state s. The state-action value function q_π(s, a) can be calculated as follows:

q_π(s, a)=E [G_t|S_t=s, A_t=a]

Here A denotes the set of preset network security responses, a denotes a preset network security response in A, S denotes the set of network security states, s denotes a network security state in S, t denotes the current time, and G_t denotes the return.

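From the above formulas, the following relations can be obtained:

v_π(s) = Σ_{a∈A} π(a|s) q_π(s, a)

q_π(s, a) = R_s^a + γ Σ_{s′∈S} P_{ss′}^a v_π(s′)

Here R_s^a denotes the expected reward for executing response a in state s, and P_{ss′}^a denotes the probability of moving from state s to state s′ after response a is executed.

Substituting each relation into the other gives the Bellman equations for the state value function v_π(s) and the state-action value function q_π(s, a):

v_π(s) = Σ_{a∈A} π(a|s) (R_s^a + γ Σ_{s′∈S} P_{ss′}^a v_π(s′))

q_π(s, a) = R_s^a + γ Σ_{s′∈S} P_{ss′}^a Σ_{a′∈A} π(a′|s′) q_π(s′, a′)

The Bellman equation is a functional equation in the objective function. It expresses "the value of the decision problem at a given time" in terms of "the payoff from the initial choice plus the value of the decision problem that results from that choice", thereby simplifying a dynamic optimization problem.

In the equation for v_π(s), the term Σ_{a∈A} π(a|s) R_s^a represents the immediate reward and the term γ Σ_{a∈A} π(a|s) Σ_{s′∈S} P_{ss′}^a v_π(s′) represents the future reward; both depend on the policy π(a|s).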
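
To make the split into immediate and future reward concrete, the sketch below evaluates a fixed policy on a toy model by repeatedly applying the Bellman expectation backup. The two states, the response names, and all numbers are hypothetical, and dictionary-based tables are just one convenient representation.

```python
from typing import Dict, Tuple

State = str
Response = str


def evaluate_policy(
    policy: Dict[State, Dict[Response, float]],                    # pi(a|s)
    reward: Dict[Tuple[State, Response], float],                   # expected reward
    transition: Dict[Tuple[State, Response], Dict[State, float]],  # P(s'|s,a)
    gamma: float,
    sweeps: int = 200,
) -> Dict[State, float]:
    """Approximate v_pi(s) by iterating the Bellman expectation backup."""
    v = {s: 0.0 for s in policy}
    for _ in range(sweeps):
        v = {
            s: sum(
                p_a * (
                    reward[(s, a)]  # immediate reward
                    + gamma * sum(p * v[s2] for s2, p in transition[(s, a)].items())
                )  # plus discounted future reward
                for a, p_a in policy[s].items()
            )
            for s in policy
        }
    return v


# Hypothetical toy model: closing the port fixes a scanned network.
POLICY = {"scanned": {"close_port": 0.5, "report": 0.5}, "safe": {"noop": 1.0}}
REWARD = {("scanned", "close_port"): 1.0, ("scanned", "report"): 0.0,
          ("safe", "noop"): 0.0}
TRANSITION = {("scanned", "close_port"): {"safe": 1.0},
              ("scanned", "report"): {"scanned": 1.0},
              ("safe", "noop"): {"safe": 1.0}}

values = evaluate_policy(POLICY, REWARD, TRANSITION, gamma=0.9)
```

In this toy model v("scanned") satisfies v = 0.5 + 0.45·v, so the iteration converges to 0.5/0.55, while v("safe") stays at 0.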

204. Based on the obtained state reward, calculate the target probability that maximizes the current state reward corresponding to the network security state.

In practice, the target probability that maximizes the current state reward corresponding to the network security state can be calculated from the obtained state reward. For example, the obtained state reward can be treated as a known quantity when computing the current state reward of the network security state, which can be expressed as the state value v_π(s). Solving for the target probability π(a|s) that maximizes the current state reward means that, by following the target probability in that network security state, the maximum state reward is obtained and the target network can execute a more appropriate preset network security response.

In one embodiment, the target probability that maximizes the current state reward corresponding to the network security state can be calculated by maximizing the value functions. For example, the optimal state value function v_*(s) and the optimal state-action value function q_*(s, a) can be calculated as follows:

v_*(s) = max_π v_π(s)

q_*(s, a) = max_π q_π(s, a)

Here v_*(s) is the maximum value of network security state s over all policies, and q_*(s, a) is the maximum value of executing the preset network security response a in network security state s. Maximizing the state value function and the state-action value function yields the target probability that maximizes the current state reward corresponding to the network security state. Policies are compared by value: if, for every network security state s, the value obtained by following policy π is not less than the value obtained by following policy π′, then policy π is at least as good as policy π′.

205. Update the network security mapping relationship based on the target probability to obtain the updated network security mapping relationship.

In practice, for example, after the target probability is calculated, the parameters of the network security mapping relationship can be adjusted according to it, such as the probabilities of executing the various preset network security responses, thereby updating the network security mapping relationship and obtaining the updated network security mapping relationship.

In one embodiment, the target probability can also be checked to decide whether the updated network security mapping relationship should be obtained through further iteration. Specifically, the step "updating the network security mapping relationship based on the target probability to obtain the updated network security mapping relationship" may include:

When the target probability meets the probability adjustment condition, adjusting the execution probability corresponding to the network security state to the target probability;

Returning to the step of obtaining the state reward corresponding to the network security state based on the execution probability;

When the target probability does not meet the probability adjustment condition, updating the network security mapping relationship based on the current target probability to obtain the updated network security mapping relationship.

Iteration is the repetition of a feedback process in order to approach a desired result. Each repetition of the process is called an iteration, and the result of each iteration serves as the initial value of the next.

The probability adjustment condition determines whether the execution probability needs to be adjusted to the target probability and the network security mapping relationship updated accordingly. For example, when the target probability differs from the execution probability, the target probability can be considered to meet the probability adjustment condition, so the execution probability is adjusted to the target probability and the network security mapping relationship is updated. As another example, the condition can be defined as met when the difference between the target probability and the execution probability reaches a preset gap.

In one embodiment, for ease of implementation, the probability adjustment condition can also be considered met while the number of updates made to the network security mapping relationship has not reached a preset number of updates; in that case the execution probability is adjusted to the target probability and the network security mapping relationship is updated.

In practice, for example, the probability adjustment condition can be defined as the target probability differing from the execution probability. While they differ, the execution probability is adjusted to the target probability, and the process returns to the step of obtaining the state reward corresponding to the network security state based on the execution probability and recalculates the target probability. Once the target probability equals the execution probability, the probability adjustment condition is no longer met, the network security mapping relationship is no longer updated, and the updated network security mapping relationship is obtained.

In one embodiment, the probability adjustment condition can also be defined by the number of updates made to the network security mapping relationship. For example, once the network security mapping relationship has been updated the preset number of times, the probability adjustment condition is no longer met, no further update is carried out, and the updated network security mapping relationship is obtained.
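
The iterate-until-unchanged procedure described above resembles classical policy iteration. The following Python sketch works under that reading: it assumes the target probability places probability 1 on the highest-valued response (a greedy choice the text does not mandate), and it uses both stop rules, equality of target and execution probability and a cap of `max_updates`. All function names are hypothetical.

```python
from typing import Dict, Tuple

Policy = Dict[str, Dict[str, float]]
Reward = Dict[Tuple[str, str], float]
Transition = Dict[Tuple[str, str], Dict[str, float]]


def q_value(s, a, v, reward: Reward, transition: Transition, gamma: float) -> float:
    """Immediate reward plus discounted expected value of the next state."""
    return reward[(s, a)] + gamma * sum(
        p * v[s2] for s2, p in transition[(s, a)].items())


def state_rewards(policy: Policy, reward, transition, gamma, sweeps=200):
    """State reward of every state under the current execution probabilities."""
    v = {s: 0.0 for s in policy}
    for _ in range(sweeps):
        v = {s: sum(p * q_value(s, a, v, reward, transition, gamma)
                    for a, p in policy[s].items())
             for s in policy}
    return v


def target_probabilities(policy: Policy, v, reward, transition, gamma) -> Policy:
    """Assumed greedy target: probability 1 on the highest-valued response."""
    target = {}
    for s, dist in policy.items():
        best = max(dist, key=lambda a: q_value(s, a, v, reward, transition, gamma))
        target[s] = {a: 1.0 if a == best else 0.0 for a in dist}
    return target


def update_mapping(policy: Policy, reward, transition, gamma, max_updates=50):
    """Adjust the execution probability until it equals the target probability."""
    for updates in range(max_updates):
        v = state_rewards(policy, reward, transition, gamma)
        target = target_probabilities(policy, v, reward, transition, gamma)
        if target == policy:   # adjustment condition no longer met
            return policy, updates
        policy = target        # adjust execution probability to the target
    return policy, max_updates
```

Starting from a uniform policy on a small model, one update is typically enough for the policy to settle on the response with the highest value in each state.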

In one embodiment, the detected network security state of the target network, the target probability with which the target network executes preset network security responses, the state reward corresponding to the network security state, and the event occurrence probability of each kind of network security sub-event in the target network can all be collected and written to a log for later lookup and record keeping.

In one embodiment, the network security processing method is not limited to the Markov decision model; other reinforcement learning models can also be used to carry out network security processing.

In one embodiment, after the updated network security mapping relationship is obtained, the target network can execute the corresponding preset network security response according to it. Specifically, after the step "updating the network security mapping relationship based on the target probability to obtain the updated network security mapping relationship", the method may further include:

Detecting the current network security state of the target network;

When the current network security state is a preset network security state, obtaining, based on the updated network security mapping relationship, the current execution probability of executing each preset network security response in the current network security state;

Determining, based on the current execution probability, the current network security response corresponding to the current network security state from the multiple preset network security responses;

Executing the current network security response for the target network.

A preset network security state can be a network state in which the network has been damaged, altered, or leaked for accidental or malicious reasons, the system cannot run continuously, reliably, and normally, network service is interrupted, and so on.

In practice, the current network security state of the target network can be detected. When it is a preset network security state, the current execution probability of executing each preset network security response in the current network security state can be obtained based on the updated network security mapping relationship. Based on the current execution probability, the current network security response corresponding to the network security state is then determined from the multiple preset network security responses and executed for the target network.

For example, the current network security state s of the target network is detected. When s is a preset network security state, the current execution probability of executing each preset network security response in state s can be obtained based on the updated network security mapping relationship, that is, according to the policy π, which can be expressed as follows:

π (a | s)=P [A_t=a | S_t=s]

After the current execution probability is obtained, the current network security response corresponding to the network security state can be determined from the multiple preset network security responses. For example, the current network security response a corresponding to network security state s is determined from the preset network security response set A, and the target network then executes the current network security response a.
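
One plausible way to determine the response from the execution probability is to sample from the distribution π(·|s). The sketch below assumes sampling (picking the most probable response would be another valid reading), and the function name `choose_response` is hypothetical.

```python
import random
from typing import Dict, Optional


def choose_response(
    probabilities: Dict[str, float], rng: Optional[random.Random] = None
) -> str:
    """Sample one preset response a with probability pi(a|s)."""
    rng = rng or random.Random()
    responses = list(probabilities)
    weights = [probabilities[a] for a in responses]
    # The execution probabilities for one state must form a distribution.
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("pi(.|s) must be a probability distribution")
    return rng.choices(responses, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` makes the choice reproducible, which can help when response decisions need to be replayed from a log.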

In one embodiment, the current network security response can also be audited to ensure that the network does not perform a mistaken operation that causes a more serious network security problem. Specifically, the step "executing the current network security response for the target network" may include:

When the current network security response is not a preset dangerous response, executing the current network security response for the target network;

The method further includes, when the current network security response is a preset dangerous response, refusing to execute the current network security response.

In practice, for example, the current network security response is executed when it is not a preset dangerous response and is refused when it is a preset dangerous response.

In one embodiment, after the current network security response is executed, detection of the network security state can continue, which improves the accuracy of network security processing. Specifically, the network security processing method may further include:

Detecting the post-execution network security state of the target network after the current network security response is executed;

When the post-execution network security state is a preset network security state, updating the current network security state to the post-execution network security state;

Returning to the step of obtaining, based on the updated network security mapping relationship, the current execution probability of executing each preset network security response in the current network security state, until a stop condition is met.

The stop condition is the condition under which the loop over these steps ends. For example, the loop can stop when the detected current network security state is not a preset network security state. The stop condition can also be met, and the loop stopped, after a preset number of iterations, and so on.

In practice, the post-execution network security state of the target network after the current network security response is executed can be detected. When the post-execution network security state is a preset network security state, the current network security state is updated to it, and the process returns to the step of obtaining, based on the updated network security mapping relationship, the current execution probability of executing each preset network security response in the current network security state, until the stop condition is met.

For example, after the target network executes the current network security response, its network security state can be detected again. When the post-execution network security state is a preset network security state, the target network is still abnormal, and the current network security state can be updated to the post-execution network security state. The step of obtaining, based on the updated network security mapping relationship, the current execution probability of executing each preset network security response in the current network security state is then repeated until the detected post-execution network security state is no longer a preset network security state. At that point the target network is safe, and the loop can stop.
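
The detect, choose, audit, execute cycle and its two stop conditions can be sketched as a small loop. The callbacks `detect_state`, `choose`, and `execute` are hypothetical stand-ins for the detection, policy, and execution components, and counting a refused response as one round is an assumption made here so that the loop always terminates.

```python
from typing import Callable, List, Set


def respond_until_safe(
    detect_state: Callable[[], str],
    choose: Callable[[str], str],
    execute: Callable[[str], None],
    abnormal_states: Set[str],
    dangerous: Set[str],
    max_rounds: int = 10,
) -> List[str]:
    """Respond repeatedly until the state is normal or max_rounds is reached."""
    executed: List[str] = []
    state = detect_state()
    for _ in range(max_rounds):
        if state not in abnormal_states:  # stop condition: network is safe
            break
        response = choose(state)
        if response not in dangerous:     # audit: refuse dangerous responses
            execute(response)
            executed.append(response)
        state = detect_state()            # post-execution network security state
    return executed
```

The returned list records only the responses that passed the audit and were actually executed, which makes it straightforward to log what the loop did.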

As can be seen from the above, the embodiment of the present application can detect the network security state of the target network; obtain, based on the network security mapping relationship, the execution probability of executing a preset network security response in that state, where the network security mapping relationship includes the mapping between network security states and the probabilities of executing preset network security responses; obtain the state reward corresponding to the network security state based on the execution probability; calculate, based on the obtained state reward, the target probability that maximizes the current state reward corresponding to the network security state; and update the network security mapping relationship based on the target probability to obtain the updated network security mapping relationship. Through the network security mapping relationship, the scheme obtains the execution probability with which the target network executes preset network security responses as well as the target probability that maximizes the current state reward, and updates the network security mapping relationship according to the target probability and the execution probability, so that the updated mapping relationship yields the maximum current state reward. By judging whether a preset network security response is a preset dangerous response, the scheme also ensures that the target network does not perform mistaken operations, which keeps the network environment stable. In addition, by detecting the event occurrence probability of each kind of network security sub-event after the target network executes a preset network security response, the scheme obtains the discounted cumulative reward of the preset network security response. Therefore, when a network security event occurs, network security processing can be carried out immediately, which reduces reliance on manual work during network security processing, improves the efficiency of network security processing, and reduces losses.

The method described in the above embodiments is explained in further detail below by way of example.

Referring to Fig. 3, the detailed process of the network security processing method can be as follows:

(1) Construct the reinforcement learning model.

In practice, the reinforcement learning model can be constructed as follows.

(1) In one embodiment, as shown in Fig. 4, a sub-feature detection module can be constructed to detect the event occurrence probabilities of network security sub-events in the network. The sub-feature detection module may include multiple sub-feature detection submodules, each of which detects the event occurrence probability of one kind of network security sub-event. The module can be assembled from a variety of security detection engines.

For example, a sub-feature detection module can be constructed that includes submodules such as a port scan monitoring submodule, a malicious file download monitoring submodule, a root privilege theft monitoring submodule, a highly suspicious code execution monitoring submodule, a sensitive directory access monitoring submodule, a sensitive file transfer monitoring submodule, and an abnormal communication monitoring submodule. Each submodule detects the event occurrence probability of one kind of network security sub-event, which yields the event occurrence probabilities.
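
A sub-feature detection module of this kind can be sketched as a registry of independent detectors, each returning one event occurrence probability. The class below is only illustrative: the telemetry keys and the two toy detectors are hypothetical and stand in for real security detection engines.

```python
from typing import Callable, Dict, Mapping

# A detector maps raw telemetry to the probability of one kind of sub-event.
Detector = Callable[[Mapping[str, float]], float]


class SubFeatureDetectionModule:
    """Runs every registered submodule and collects its probability."""

    def __init__(self, detectors: Dict[str, Detector]) -> None:
        self._detectors = dict(detectors)

    def detect(self, telemetry: Mapping[str, float]) -> Dict[str, float]:
        result = {}
        for name, detector in self._detectors.items():
            p = float(detector(telemetry))
            result[name] = min(1.0, max(0.0, p))  # clamp into [0, 1]
        return result


# Hypothetical submodules standing in for real detection engines.
module = SubFeatureDetectionModule({
    "port_scan": lambda t: t.get("distinct_ports_probed", 0) / 100,
    "abnormal_communication": lambda t: 1.0 if t.get("unknown_peers", 0) > 0 else 0.0,
})
```

Calling `module.detect(...)` returns one probability per sub-event, which corresponds to the array o(a) used when computing the reward.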

(2) In one embodiment, the preset network security response set can be defined and denoted A. According to the conventional handling process when a host is subjected to an abnormal attack, the set A is defined to include multiple preset network security responses a. For example, A may include closing the scanned port (a1), blocking and deleting abnormally downloaded files (a2), closing abnormal connections (a3), locking abnormally logged-in accounts (a4), and reporting to the administrator (a5).

(3) In one embodiment, a reward method for rewarding preset network security responses can be defined. For example, when a preset network security response is a preset dangerous response, a negative reward can be given to it; when it is not a preset dangerous response, a positive reward can be given to it according to the event occurrence probabilities detected by the sub-feature detection module.

For example, let R_t denote the reward for executing the preset network security response a at time t. Let k denote the fixed penalty incurred when a is a preset dangerous response; k can be a constant, such as k=-10. Let A_u(a) indicate whether the preset network security response is a preset dangerous response: A_u(a)=1 when it is, and A_u(a)=0 when it is not. Let o(a) denote the event occurrence probabilities detected by the sub-feature detection module, which can be represented as an array. The reward can be calculated as follows:

R_t=kA_u(a)+(1-A_u(a))·f(o(a))

(4) In one embodiment, a reinforcement learning system can be constructed. For example, the reinforcement learning system can be constructed as the tuple <network security state set S, preset network security response set A, reward method R, state transition function P, discount factor γ>, together with a network security mapping relationship that generates a preset network security response from a network security state; this mapping relationship can be expressed as the policy π.

The reward method R represents the reward accumulated by the agent after some preset network security response a is taken at time t, when the network is in some network security state s; its empirical expectation can be used. The reward method R can be a reward function of the network security state s and the preset network security response a, with the following formula:

R_s^a = E[R_{t+1} | S_t = s, A_t = a]

The state transition function P represents the probability that, after some preset network security response a is taken at time t in network security state s, the network moves from state s to network security state s′ at the next time step. It can be defined as a two-dimensional Gaussian distribution. The state transition function P can be expressed as follows:

P_{ss′}^a = P[S_{t+1} = s′ | S_t = s, A_t = a]

The discount factor γ adjusts how much the rewards of responses at different times contribute. The network security states that the network goes through later are influenced by the current network security state, but this influence gradually weakens; the discount factor γ expresses this decay and can be chosen as a value between 0 and 1. In one embodiment, the discount factor γ may also be omitted.

The policy is the basis on which the network, in a given network security state and from its knowledge of the current network security state, chooses which preset network security response to execute. For example, the policy can be expressed as the probability with which each preset network security response in the preset network security response set may be executed in a given network security state, that is, as a probability distribution over the preset network security response set. The policy can be written as follows:

π (a | s)=P [A_t=a | S_t=s]
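
The five-element tuple and the policy can be represented directly as data. The sketch below is one possible layout, assuming tabular rewards and transitions; `SecurityMDP`, `uniform_policy`, and `is_valid_policy` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class SecurityMDP:
    """Tuple <S, A, R, P, gamma> stored as plain tables."""
    states: Tuple[str, ...]                               # S
    responses: Tuple[str, ...]                            # A
    reward: Dict[Tuple[str, str], float]                  # R, keyed by (s, a)
    transition: Dict[Tuple[str, str], Dict[str, float]]   # P, (s, a) -> {s': p}
    gamma: float                                          # discount factor


def uniform_policy(mdp: SecurityMDP) -> Dict[str, Dict[str, float]]:
    """Initial policy: every response equally likely in every state."""
    p = 1.0 / len(mdp.responses)
    return {s: {a: p for a in mdp.responses} for s in mdp.states}


def is_valid_policy(mdp: SecurityMDP, policy, tol: float = 1e-9) -> bool:
    """Check that pi(.|s) is a probability distribution over A for each s."""
    for s in mdp.states:
        dist = policy.get(s, {})
        if set(dist) != set(mdp.responses):
            return False
        if any(p < 0 for p in dist.values()) or abs(sum(dist.values()) - 1.0) > tol:
            return False
    return True
```

A uniform policy is a common neutral starting point before training; the validity check guards against execution probabilities that do not sum to 1 after an update.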

(5) In one embodiment, a preset dangerous response set can be constructed. For example, it can be established from expert experience and may include the preset dangerous responses the agent is forbidden to carry out, such as deleting a database, batch-executing files, autonomously downloading suspicious files, deleting sensitive files, closing critical service ports, and shutting down certain critical services.

(6) In one embodiment, a log can be constructed. The detected network security state, the target probability with which the network executes preset network security responses, the state reward corresponding to the network security state, and the event occurrence probability of each kind of network security sub-event in the network can all be collected and written to the log for later lookup and record keeping.

As shown in Fig. 5, the network device can construct the sub-feature detection module, define the preset network security response set A, define the reward method for rewarding network security responses, construct the reinforcement learning system, and construct the log. The reinforcement learning model is then trained to obtain the trained reinforcement learning model, and network security event response is carried out based on it.

(2) intensified learning model is trained, intensified learning model after being trained.

301, the network safe state of network equipment detection target network.

In practical applications, the network device can detect the network security scenario in which the target network is located and determine that it is a terminal intrusion scenario. According to the network security states that may arise in a terminal intrusion scenario, the network security state set S is defined to include network security states such as a secure state s1, a network-scanned state s2, a vulnerability-exploited state s3, a network-attacked state s4, and a network-compromised state s5. The target network is then detected to obtain its network security state, which can be one of the network security states in the set S.

In one embodiment, after the network security scenario in which the target network is located has been detected and determined to be a terminal intrusion scenario, the subject that carries out network security responses according to the network security state can also be defined as the agent. For example, the agent can be defined as the root role of the terminal host.

302. The network device obtains, based on the network security mapping relationship, the execution probability of executing the preset network security response in the network security state.

In practical applications, the network device can obtain the network security state s and, based on the network security mapping relationship, that is, the policy π, obtain the execution probability of executing each preset network security response in that network security state. The policy π can be expressed as follows:

π(a|s) = P[A_t = a | S_t = s]
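One simple way to hold such a mapping in memory is a table from each network security state to a probability distribution over responses. The sketch below assumes this table layout; the state and response labels and the helper names `execution_probability` and `sample_response` are hypothetical and are not taken from the embodiment.

```python
import random

# Hypothetical policy table: state -> {response: execution probability}.
policy = {
    "scanned": {"close_scanned_port": 0.7, "report_admin": 0.3},
    "attacked": {
        "close_abnormal_connection": 0.5,
        "lock_abnormal_account": 0.3,
        "report_admin": 0.2,
    },
}


def execution_probability(policy, state, response):
    """Return pi(a|s); responses absent from the table have probability 0."""
    return policy[state].get(response, 0.0)


def sample_response(policy, state, rng=random):
    """Draw one response for the state according to its execution probabilities."""
    responses = list(policy[state])
    weights = [policy[state][r] for r in responses]
    return rng.choices(responses, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes the draw reproducible, which is convenient when replaying logged decisions.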

303. The network device obtains the state reward corresponding to the network security state based on the execution probability.

In practical applications, in the network security state s, the network device can obtain the preset network security response set A. Following the conventional procedure used when a host comes under abnormal attack, the preset network security response set A is defined to include multiple preset network security responses a; for example, it may include closing the scanned port a1, blocking and deleting the abnormally downloaded file a2, closing the abnormal connection a3, locking the abnormally logged-in account a4, and reporting to the administrator a5. Based on the execution probability, the network device obtains from the preset network security response set A the instant network security response corresponding to the network security state, and obtains the instant reward corresponding to that instant network security response. It also obtains the future reward corresponding to the several future network security responses that follow the instant network security response in accordance with the execution probability. The state reward corresponding to the network security state can be obtained from the instant reward and the future reward, wherein the state reward can be calculated as follows:

In one embodiment, a preset dangerous response set can also be established from expert experience. The preset dangerous response set may include multiple preset dangerous responses that the agent is forbidden to execute, such as deleting a database, executing files in batches, autonomously downloading suspicious files, deleting sensitive files, closing critical service ports, and closing certain critical services. When the instant network security response is a preset dangerous response, it is judged to be risky, its execution is stopped, and a negative reward is assigned to it. When the instant network security response is not a preset dangerous response, it is judged not to be risky, it can be executed, and a positive reward is assigned to it.

In one embodiment, after the network security scenario in which the target network is located has been detected and determined to be a terminal intrusion scenario, the network device can obtain the network security event set corresponding to the network security state. The network security event set includes multiple network security sub-events, such as whether a port scan is present in the network, whether the server is under a DDoS attack, whether system files are infected by a virus, and whether the network shows abnormal behavior. The network device can then detect the event occurrence probability of each network security sub-event occurring in the target network after the target network, in the network security state s, executes the instant network security response a, and, based on the event occurrence probabilities, obtain a positive reward as the instant reward of the instant network security response.

In one embodiment, after detecting the network security scenario in which the target network is located and determining that it is a terminal intrusion scenario, the network device can also define the sub-feature detection module to include sub-feature detection submodules such as a port scan monitoring submodule, a malicious file download monitoring submodule, a root privilege theft monitoring submodule, a highly suspicious code execution monitoring submodule, a sensitive directory access monitoring submodule, a sensitive file transfer monitoring submodule, and an abnormal communication monitoring submodule. Each sub-feature detection submodule detects the event occurrence probability of one kind of network security sub-event, and the positive reward of the instant network security response is obtained based on the event occurrence probabilities.
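The screening rule in this embodiment can be sketched as a membership test against the preset dangerous response set. The response labels, the constant `NEGATIVE_REWARD = -1.0`, and the function `screen_response` below are illustrative assumptions; the embodiment does not fix concrete reward values.

```python
# Hypothetical labels for the preset dangerous responses listed in the text.
DANGEROUS_RESPONSES = frozenset({
    "delete_database",
    "batch_execute_files",
    "download_suspicious_file",
    "delete_sensitive_file",
    "close_critical_service_port",
    "close_critical_service",
})

NEGATIVE_REWARD = -1.0  # assumed penalty; the source gives no concrete value


def screen_response(response, positive_reward):
    """Return (allowed, reward) for a candidate instant response.

    A dangerous response is blocked and receives the negative reward;
    any other response is allowed and keeps its positive reward.
    """
    if response in DANGEROUS_RESPONSES:
        return False, NEGATIVE_REWARD
    return True, positive_reward
```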

R_t=kA_u(a)+(1-A_u(a))·f(o(a))
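The formula above can be read as a switch between a penalty and a detection-based reward. The sketch below rests on several assumptions that this passage does not spell out: A_u(a) is taken to be 1 when the response a is a preset dangerous response and 0 otherwise, k is taken to be a negative constant, o(a) is taken to be the list of event occurrence probabilities reported by the sub-feature detection submodules, and f is taken to be one minus their mean so that fewer detected events give a larger reward.

```python
def instant_reward(is_dangerous, event_probabilities, k=-1.0):
    """Evaluate R_t = k*A_u(a) + (1 - A_u(a)) * f(o(a)) under the stated assumptions."""
    if not event_probabilities:
        raise ValueError("at least one event occurrence probability is required")
    a_u = 1.0 if is_dangerous else 0.0
    # Assumed form of f: the lower the detected event probabilities, the higher the reward.
    f_value = 1.0 - sum(event_probabilities) / len(event_probabilities)
    return k * a_u + (1.0 - a_u) * f_value
```

Under these assumptions a dangerous response always receives k, regardless of what the detectors report, while a permitted response receives a reward between 0 and 1.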

The state value function v_π(s) can be calculated as follows:

v_π(s) = E[G_t | S_t = s]

The return G_t can be calculated as follows:

The state-action value function q_π(s, a) can be calculated as follows:

q_π(s, a) = E[G_t | S_t = s, A_t = a]

The following conclusion can be drawn from the above formulas:

v_π(s) = ∑_{a∈A} π(a|s) q_π(s, a)
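The last relation is a probability-weighted average and can be checked numerically. In the sketch below the function name `state_value` and the sample numbers are assumptions chosen for illustration.

```python
def state_value(policy_for_state, q_for_state):
    """Compute v_pi(s) as the sum over responses of pi(a|s) * q_pi(s, a)."""
    return sum(prob * q_for_state[a] for a, prob in policy_for_state.items())


# Hypothetical execution probabilities and state-action values for one state.
pi_s = {"a1": 0.25, "a3": 0.75}
q_s = {"a1": 4.0, "a3": 0.0}
v_s = state_value(pi_s, q_s)
```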

304. The network device calculates, based on the obtained state reward, the target probability that maximizes the current state reward corresponding to the network security state.

As shown in Fig. 6, the reward method for rewarding network security responses can be defined and a Bellman equation established; the Bellman equation is solved with an iterative algorithm to obtain the optimized probability, thereby completing the security event response process.

In one embodiment, the detected network security state of the target network, the target probability with which the target network executes the preset network security responses, the state reward corresponding to the network security state, and the event occurrence probability of each network security sub-event occurring in the target network can also all be obtained and recorded in the log for easy lookup and record keeping.
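The text does not name the iterative algorithm. As one possible instance, the sketch below uses value iteration, a standard way of solving the Bellman optimality equation, on a toy two-state model. The transition table, the reward table, the state and response labels, and the function `solve_bellman` are all assumptions for illustration.

```python
def solve_bellman(states, responses, transition, reward, gamma=0.9, tol=1e-9):
    """Value iteration: repeat the Bellman optimality backup until values settle."""
    values = {s: 0.0 for s in states}

    def backup(s, a):
        # Instant reward plus the decayed expected value of the next state.
        return reward[(s, a)] + gamma * sum(
            p * values[s2] for s2, p in transition[(s, a)].items()
        )

    while True:
        delta = 0.0
        for s in states:
            best = max(backup(s, a) for a in responses)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # Greedy result: the response with the largest backup gets probability 1.
    best_response = {s: max(responses, key=lambda a: backup(s, a)) for s in states}
    return values, best_response


# Toy model (assumed): closing the connection moves "attacked" to "safe".
states = ["attacked", "safe"]
responses = ["close_connection", "wait"]
transition = {
    ("attacked", "close_connection"): {"safe": 1.0},
    ("attacked", "wait"): {"attacked": 1.0},
    ("safe", "close_connection"): {"safe": 1.0},
    ("safe", "wait"): {"safe": 1.0},
}
reward = {
    ("attacked", "close_connection"): 1.0,
    ("attacked", "wait"): -1.0,
    ("safe", "close_connection"): 0.0,
    ("safe", "wait"): 0.0,
}
values, best_response = solve_bellman(states, responses, transition, reward)
```

Value iteration needs a transition model; when none is available, a sample-based method would have to take its place, which this sketch does not cover.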

305. When the target probability satisfies the probability adjustment condition, the execution probability corresponding to the network security state is adjusted to the target probability.

In practical applications, for example, the probability adjustment condition can be defined as the target probability differing from the execution probability; when the target probability differs from the execution probability, the execution probability corresponding to the network security state can be updated to the target probability.

306. Return to the step of obtaining the state reward corresponding to the network security state based on the execution probability.

In practical applications, for example, the probability adjustment condition can be defined as the target probability differing from the execution probability; when the two differ, the execution probability corresponding to the network security state can be updated to the target probability. The method can then return to the step of obtaining the state reward corresponding to the network security state based on the execution probability, and again calculate, based on the obtained state reward, the target probability that maximizes the current state reward corresponding to the network security state. If the newly obtained target probability still differs from the execution probability, the execution probability is updated to the target probability and the loop continues. When the target probability is identical to the execution probability, the loop can be stopped.
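The loop over steps 303 to 306 can be sketched as a fixed-point iteration on the execution probability. In the sketch below, `compute_target` stands in for steps 303 and 304 and is a hypothetical callback; the tolerance-based comparison and the `max_rounds` safeguard are assumptions added so that floating-point noise does not keep the loop running.

```python
def same_distribution(p, q, tol=1e-9):
    """Treat two distributions as identical when every probability matches within tol."""
    return p.keys() == q.keys() and all(abs(p[a] - q[a]) <= tol for a in p)


def refine_execution_probability(execution_prob, compute_target, max_rounds=100):
    """Repeat: compute the target probability, adopt it while it still differs."""
    rounds = 0
    for _ in range(max_rounds):
        target = compute_target(execution_prob)
        rounds += 1
        if same_distribution(target, execution_prob):
            break  # adjustment condition no longer met: stop looping
        execution_prob = target
    return execution_prob, rounds


# Toy stand-in for steps 303-304: put all probability on the best-valued response.
q_values = {"a1": 0.2, "a3": 0.9}


def greedy_target(_current):
    best = max(q_values, key=q_values.get)
    return {a: (1.0 if a == best else 0.0) for a in q_values}


final_prob, rounds = refine_execution_probability({"a1": 0.5, "a3": 0.5}, greedy_target)
```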

307. When the target probability does not satisfy the probability adjustment condition, the network security mapping relationship is updated based on the current target probability to obtain the updated network security mapping relationship.

In practical applications, for example, when the target probability does not satisfy the probability adjustment condition, the parameters in the network security mapping relationship can be adjusted according to the target probability, such as the probabilities of executing the various preset network security responses, so that the network security mapping relationship is updated and the updated network security mapping relationship is obtained.

In one embodiment, updating the network security mapping relationship to obtain the updated network security mapping relationship can be regarded as training the reinforcement learning model and obtaining the trained reinforcement learning model.

(3) Carry out network security event response based on the reinforcement learning model.

308. The network device detects the current network security state of the target network.

In practical applications, the current network security state of the target network can be detected; the specific detection steps have been described above and are not repeated here.

309. When the current network security state is a preset network security state, the network device obtains, based on the updated network security mapping relationship, the current execution probability of executing the preset network security response in the current network security state.

In practical applications, when the current network security state s is a preset network security state, the current execution probability of executing each preset network security response in the current network security state s can be obtained based on the updated network security mapping relationship, that is, according to the policy π, which can be expressed as follows:

π(a|s) = P[A_t = a | S_t = s]

310. The network device determines, based on the current execution probability, the current network security response corresponding to the current network security state from the multiple preset network security responses.

In practical applications, after the current execution probability is obtained, the current network security response corresponding to the network security state can be determined from the multiple preset network security responses; for example, the current network security response a corresponding to the network security state s is determined from the preset network security response set A.

311. The target network executes the current network security response.

In practical applications, the target network can execute the current network security response a.

In one embodiment, the current network security response is executed when it is not a preset dangerous response, and the method further includes refusing to execute the current network security response when it is a preset dangerous response.

In practical applications, for example, after the target network executes the current network security response, the network security state of the target network can be detected again. When the post-execution network security state is a preset network security state, the target network is still abnormal, and the network security state can be updated to the post-execution network security state. The method can then repeat, for the target network, the step of obtaining, based on the updated network security mapping relationship, the current execution probability of executing each preset network security response in the network security state, until the detected post-execution network security state of the target network is no longer a preset network security state. When the post-execution network security state is not a preset network security state, the target network is secure and the loop can be stopped.

As shown in Fig. 4, the agent can follow the policy, determine the preset network security response from the network security state, and judge whether that preset network security response is a preset dangerous response. Meanwhile, the preset network security response is rewarded according to the probabilities detected by the sub-feature detection module, and the policy, network security response, and reward corresponding to the network security state can also be recorded in the log.
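Steps 308 to 311 together with the refusal rule form a detect, choose, execute loop. The sketch below shows one way to arrange it. The callbacks `detect_state`, `choose_response`, and `execute`, the `max_steps` safeguard, and all state and response labels are hypothetical.

```python
def respond_until_safe(detect_state, choose_response, execute,
                       abnormal_states, dangerous_responses, max_steps=20):
    """Keep responding while the detected state is one of the abnormal states."""
    history = []
    state = detect_state()
    for _ in range(max_steps):
        if state not in abnormal_states:
            break  # the target network is considered secure: stop looping
        response = choose_response(state)
        if response in dangerous_responses:
            # Preset dangerous responses are refused rather than executed.
            history.append((state, response, "refused"))
        else:
            execute(response)
            history.append((state, response, "executed"))
        state = detect_state()  # re-detect the state after the response
    return state, history
```

The returned `history` lists each state, the chosen response, and whether it was executed, which matches the kind of information the log is meant to keep.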

As can be seen from the above, in the embodiment of the present application the network device can detect the network security state of the target network; obtain, based on the network security mapping relationship, the execution probability of executing the preset network security response in that network security state, the network security mapping relationship including the mapping between network security states and the probabilities of executing preset network security responses; obtain the state reward corresponding to the network security state based on the execution probability; calculate, based on the obtained state reward, the target probability that maximizes the current state reward corresponding to the network security state; and update the network security mapping relationship based on the target probability to obtain the updated network security mapping relationship. Through the network security mapping relationship, this scheme obtains the execution probability with which the target network executes each preset network security response as well as the target probability that maximizes the current state reward, and updates the network security mapping relationship according to the target probability and the execution probability, thereby obtaining an updated network security mapping relationship that yields the maximum current state reward. By judging whether a preset network security response is a preset dangerous response, the scheme can also ensure that the target network does not perform erroneous operations, which keeps the network environment stable. In addition, by detecting the event occurrence probability of each network security sub-event occurring in the target network after the target network executes a preset network security response, the scheme can obtain the decayed cumulative reward of the preset network security response. Therefore, when a network security event occurs, network security processing can be carried out immediately, which reduces the dependence on manual work during network security processing, improves the efficiency of network security processing, and reduces losses.

To better implement the above method, the embodiment of the present application also provides a network security processing apparatus, which can be applied to a network device. As shown in Fig. 7, the network security processing apparatus may include a detection module 71, a probability obtaining module 72, a reward obtaining module 73, a calculation module 74, and an update module 75, as follows:

The detection module 71 is configured to detect the network security state of the target network;

The probability obtaining module 72 is configured to obtain, based on the network security mapping relationship, the execution probability of executing the preset network security response in the network security state, the network security mapping relationship including the mapping between network security states and the probabilities of executing preset network security responses;

The reward obtaining module 73 is configured to obtain the state reward corresponding to the network security state based on the execution probability;

The calculation module 74 is configured to calculate, based on the obtained state reward, the target probability that maximizes the current state reward corresponding to the network security state;

The update module 75 is configured to update the network security mapping relationship based on the target probability to obtain the updated network security mapping relationship.

In one embodiment, the reward obtaining module 73 may include a reward obtaining submodule 731 and a state reward obtaining submodule 732, as follows:

The reward obtaining submodule 731 is configured to obtain, based on the execution probability, the instant reward and the future reward corresponding to the preset network security response;

The state reward obtaining submodule 732 is configured to fuse the instant reward and the future reward to obtain the state reward corresponding to the network security state.

In one embodiment, the reward obtaining submodule 731 may specifically include:

A response obtaining submodule 7311, configured to obtain, based on the execution probability, the instant network security response corresponding to the network security state from the multiple preset network security responses;

An instant reward obtaining submodule 7312, configured to obtain the instant reward corresponding to the instant network security response;

A future reward obtaining submodule 7313, configured to obtain, based on the execution probability, the future reward corresponding to the preset network security response.

In one embodiment, the instant reward obtaining submodule 7312 may specifically include:

A negative reward determining submodule 73121, configured to determine, when the instant network security response is a preset dangerous response, that the instant reward of the instant network security response is a negative reward;

A positive reward determining submodule 73122, configured to obtain, when the instant network security response is not a preset dangerous response, a positive reward as the instant reward of the instant network security response.

In one embodiment, the positive reward determining submodule 73122 can be specifically configured to:

In one embodiment, the network security processing apparatus can also be specifically configured to:

Detect the current network security state of the target network;

Execute the current network security response for the target network.

In one embodiment, the update module 75 can be specifically configured to:

As can be seen from the above, the embodiment of the present application can detect the network security state of the target network through the detection module 71; obtain, through the probability obtaining module 72 and based on the network security mapping relationship, the execution probability of executing the preset network security response in that network security state, the network security mapping relationship including the mapping between network security states and the probabilities of executing preset network security responses; obtain, through the reward obtaining module 73, the state reward corresponding to the network security state based on the execution probability; calculate, through the calculation module 74 and based on the obtained state reward, the target probability that maximizes the current state reward corresponding to the network security state; and update, through the update module 75, the network security mapping relationship based on the target probability to obtain the updated network security mapping relationship. Through the network security mapping relationship, this scheme obtains the execution probability with which the target network executes each preset network security response as well as the target probability that maximizes the current state reward, and updates the network security mapping relationship according to the target probability and the execution probability, thereby obtaining an updated network security mapping relationship that yields the maximum current state reward. By judging whether a preset network security response is a preset dangerous response, the scheme can also ensure that the target network does not perform erroneous operations, which keeps the network environment stable. In addition, by detecting the event occurrence probability of each network security sub-event occurring in the target network after the target network executes a preset network security response, the scheme can obtain the decayed cumulative reward of the preset network security response. Therefore, when a network security event occurs, network security processing can be carried out immediately, which reduces the dependence on manual work during network security processing, improves the efficiency of network security processing, and reduces losses.

The embodiment of the present application also provides a computer device, which may be a device such as a server or a terminal and which integrates any of the network security processing apparatuses provided by the embodiments of the present application. Fig. 8 is a schematic structural diagram of the computer device provided by the embodiment of the present application. Specifically:

The computer device may include components such as a processor 801 with one or more processing cores, a memory 802 with one or more computer-readable storage media, a power supply 803, and an input unit 804. Those skilled in the art will understand that the computer device structure shown in Fig. 8 does not limit the computer device, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently. Wherein:

The processor 801 is the control center of the computer device. It connects the various parts of the entire computer device through various interfaces and lines, and performs the various functions of the computer device and processes data by running or executing the software programs and/or modules stored in the memory 802 and calling the data stored in the memory 802, thereby monitoring the computer device as a whole. Optionally, the processor 801 may include one or more processing cores. Preferably, the processor 801 can integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, applications, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 801.

The memory 802 can be used to store software programs and modules, and the processor 801 executes various functional applications and data processing by running the software programs and modules stored in the memory 802. The memory 802 may mainly include a program storage area and a data storage area, where the program storage area can store the operating system, the application required by at least one function (such as a sound playback function or an image playback function), and the like, and the data storage area can store data created through use of the computer device, and the like. In addition, the memory 802 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device. Correspondingly, the memory 802 may also include a memory controller to provide the processor 801 with access to the memory 802.

The computer device further includes the power supply 803, which supplies power to all components. Preferably, the power supply 803 can be logically connected to the processor 801 through a power management system, so that functions such as charging management, discharging management, and power consumption management are realized through the power management system. The power supply 803 may also include any components such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.

The computer device may also include the input unit 804, which can be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal input related to user settings and function control.

Although not shown, the computer device may also include a display unit and the like, which are not described here. Specifically, in this embodiment, the processor 801 in the computer device loads the executable files corresponding to the processes of one or more applications into the memory 802 according to the following instructions, and the processor 801 runs the applications stored in the memory 802 to realize various functions, as follows:

Detect the network security state of the target network; obtain, based on the network security mapping relationship, the execution probability of executing the preset network security response in the network security state, the network security mapping relationship including the mapping between network security states and the probabilities of executing preset network security responses; obtain the state reward corresponding to the network security state based on the execution probability; calculate, based on the obtained state reward, the target probability that maximizes the current state reward corresponding to the network security state; and update the network security mapping relationship based on the target probability to obtain the updated network security mapping relationship.

For the specific implementation of each of the above operations, see the preceding embodiments; details are not repeated here.

As can be seen from the above, the embodiment of the present application can detect the network security state of the target network; obtain, based on the network security mapping relationship, the execution probability of executing the preset network security response in that network security state, the network security mapping relationship including the mapping between network security states and the probabilities of executing preset network security responses; obtain the state reward corresponding to the network security state based on the execution probability; calculate, based on the obtained state reward, the target probability that maximizes the current state reward corresponding to the network security state; and update the network security mapping relationship based on the target probability to obtain the updated network security mapping relationship. Through the network security mapping relationship, this scheme obtains the execution probability with which the target network executes each preset network security response as well as the target probability that maximizes the current state reward, and updates the network security mapping relationship according to the target probability and the execution probability, thereby obtaining an updated network security mapping relationship that yields the maximum current state reward. By judging whether a preset network security response is a preset dangerous response, the scheme can also ensure that the target network does not perform erroneous operations, which keeps the network environment stable. In addition, by detecting the event occurrence probability of each network security sub-event occurring in the target network after the target network executes a preset network security response, the scheme can obtain the decayed cumulative reward of the preset network security response. Therefore, when a network security event occurs, network security processing can be carried out immediately, which reduces the dependence on manual work during network security processing, improves the efficiency of network security processing, and reduces losses.

Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments can be completed by instructions, or by instructions controlling the relevant hardware; the instructions can be stored in a computer-readable storage medium and loaded and executed by a processor.

To this end, the embodiment of the present application provides a storage medium storing a plurality of instructions that can be loaded by a processor to execute the steps in any of the network security processing methods provided by the embodiments of the present application. For example, the instructions can execute the following steps:

The storage medium may include read-only memory (ROM, Read Only Memory), random access memory (RAM, Random Access Memory), a magnetic disk, an optical disc, or the like.

Since the instructions stored in the storage medium can execute the steps in any of the network security processing methods provided by the embodiments of the present application, they can achieve the beneficial effects achievable by any of those methods; see the preceding embodiments for details, which are not repeated here.

The network security processing method and apparatus provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help in understanding the method of the present application and its core idea. Meanwhile, those skilled in the art may, in accordance with the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims

1. A network security processing method, characterized by comprising:

Detecting the network security state of a target network;

Obtaining, based on a network security mapping relationship, the execution probability of executing a preset network security response in the network security state, the network security mapping relationship including the mapping between network security states and the probabilities of executing preset network security responses;

Obtaining, based on the execution probability, the state reward corresponding to the network security state;

Calculating, based on the obtained state reward, the target probability that maximizes the current state reward corresponding to the network security state;

Updating the network security mapping relationship based on the target probability to obtain an updated network security mapping relationship.

2. The network security processing method according to claim 1, characterized in that updating the network security mapping relationship based on the target probability to obtain the updated network security mapping relationship comprises:

When the target probability satisfies a probability adjustment condition, adjusting the execution probability corresponding to the network security state to the target probability;

When the target probability does not satisfy the probability adjustment condition, updating the network security mapping relationship based on the current target probability to obtain the updated network security mapping relationship.

3. The network security processing method according to claim 1, characterized in that obtaining, based on the execution probability, the state reward corresponding to the network safe state comprises:

merging the instant reward and the following reward to obtain the state reward corresponding to the network safe state.

4. The network security processing method according to claim 3, characterized in that obtaining, based on the execution probability, the instant reward and the following reward corresponding to the default network security response comprises:

obtaining, based on the execution probability, the instant network security response corresponding to the network safe state from among a plurality of default network security responses;

5. The network security processing method according to claim 4, characterized in that obtaining, based on the execution probability, the following reward corresponding to the default network security response comprises:

obtaining the following reward desired value that the target network obtains by carrying out security responses according to the execution probability after executing the instant network security response;

obtaining the following reward based on the following reward desired value.
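The reward computation of claims 3 to 5 can be sketched as follows. The expectation is taken over the execution probabilities, and the merge is modeled as a discounted sum; the discount factor and both function names are assumptions made for illustration, since the claims only state that the two rewards are merged.

```python
from typing import Dict


def following_reward_expectation(execution_prob: Dict[str, float],
                                 response_rewards: Dict[str, float]) -> float:
    """Expected reward when later responses are chosen according to the execution probability."""
    return sum(p * response_rewards[r] for r, p in execution_prob.items())


def state_reward(instant: float, following_expectation: float,
                 discount: float = 0.9) -> float:
    """Merge the instant and following rewards (assumed form: discounted sum)."""
    return instant + discount * following_expectation


# Hypothetical responses and values.
expectation = following_reward_expectation(
    {"block_ip": 0.25, "alert_only": 0.75},
    {"block_ip": 4.0, "alert_only": 0.0},
)
merged = state_reward(instant=1.0, following_expectation=expectation, discount=0.5)
```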

6. The network security processing method according to claim 4, characterized in that obtaining the instant reward corresponding to the instant network security response comprises:

when the instant network security response is a default dangerous response, determining that the instant reward of the instant network security response is a negative reward;

when the instant network security response is not a default dangerous response, obtaining a positive reward as the instant reward of the instant network security response.
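The sign rule of claim 6 can be illustrated with a short sketch. The set of dangerous responses and the reward magnitudes are hypothetical placeholders; the claim only requires a negative reward for a default dangerous response and a positive reward otherwise.

```python
# Hypothetical set of default dangerous responses.
DANGEROUS_RESPONSES = frozenset({"shutdown_server", "wipe_disk"})


def instant_reward(response: str, positive: float = 1.0, penalty: float = -10.0) -> float:
    """Return a negative reward for a dangerous response, a positive one otherwise."""
    if response in DANGEROUS_RESPONSES:
        return penalty
    return positive
```

A large penalty relative to the positive reward steers the learned execution probabilities away from the dangerous responses.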

7. The network security processing method according to claim 6, characterized in that obtaining a positive reward as the instant reward of the instant network security response comprises:

obtaining the network safety event set corresponding to the network safe state, the network safety event set including a plurality of network security subevents;

detecting the event occurrence rate at which each network security subevent occurs in the target network after the target network executes the instant network security response;

8. The network security processing method according to claim 1, characterized in that, after updating the network security mapping relations based on the destination probability to obtain updated network security mapping relations, the method further comprises:

detecting the current network security state of the target network;

when the current network security state is a default network safe state, obtaining, based on the updated network security mapping relations, the current execution probability of executing a default network security response under the current network security state;

determining, based on the current execution probability, the current network security response corresponding to the current network security state from among a plurality of default network security responses;

executing the current network security response for the target network.

9. The network security processing method according to claim 8, characterized in that executing the current network security response for the target network comprises:

when the current network security response is not a default dangerous response, executing the current network security response for the target network;

the method further comprising: when the current network security response is a default dangerous response, refusing to execute the current network security response.
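The run-time use of the updated mapping relations in claims 8 and 9 might look like the sketch below. Sampling the response in proportion to the execution probability is one possible reading of "determining based on the current execution probability"; the function name, the state and response names, and the use of `None` to signal a refusal or an unknown state are assumptions.

```python
import random
from typing import Dict, Optional

# Hypothetical set of default dangerous responses.
DANGEROUS = frozenset({"shutdown_server"})


def choose_response(mapping: Dict[str, Dict[str, float]], state: str,
                    rng: random.Random) -> Optional[str]:
    """Pick a response for `state`; return None if the state is unknown or the pick is dangerous."""
    if state not in mapping:  # not a default network safe state
        return None
    probs = mapping[state]
    responses = list(probs)
    # Sample one response with weights equal to the execution probabilities.
    picked = rng.choices(responses, weights=[probs[r] for r in responses], k=1)[0]
    if picked in DANGEROUS:  # refuse to execute a dangerous response
        return None
    return picked


rng = random.Random(0)
mapping = {
    "port_scan": {"block_ip": 1.0, "shutdown_server": 0.0},
    "ddos": {"shutdown_server": 1.0},
}
safe_choice = choose_response(mapping, "port_scan", rng)
refused = choose_response(mapping, "ddos", rng)
```

Here `safe_choice` is a response that may be executed for the target network, while `refused` shows the refusal path for a dangerous response.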

10. A network security processing device, characterized by comprising:

a detection module, configured to detect the network safe state of a target network;

a probability obtaining module, configured to obtain, based on network security mapping relations, the execution probability of executing a default network security response under the network safe state, the network security mapping relations comprising mapping relations between network safe states and probabilities of executing default network security responses;

a computing module, configured to calculate, based on the state reward obtained, the destination probability that makes the current state reward corresponding to the network safe state a maximum value;

an update module, configured to update the network security mapping relations based on the destination probability to obtain updated network security mapping relations.