CN110166428A - Intelligent defense decision-making method and device based on reinforcement learning and attack-defense game - Google Patents
Intelligent defense decision-making method and device based on reinforcement learning and attack-defense game
- Publication number
- CN110166428A CN110166428A CN201910292304.2A CN201910292304A CN110166428A CN 110166428 A CN110166428 A CN 110166428A CN 201910292304 A CN201910292304 A CN 201910292304A CN 110166428 A CN110166428 A CN 110166428A
- Authority
- CN
- China
- Prior art keywords
- attacking
- defending
- defense
- reinforcement learning
- defender
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/145—Network analysis or design involving simulating, designing, planning or modelling of a network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/20—Network architectures or network communication protocols for network security for managing network security; network security policies in general
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention belongs to the field of network security, and in particular provides an intelligent defense decision-making method and device based on reinforcement learning and attack-defense game. The method comprises: constructing an attack-defense game model under a bounded-rationality constraint, and generating a host-centric attack-defense graph for extracting the network states and attack-defense actions of the game model, where the nodes of the graph yield the network states and the edges describe the attack and defense actions; when the network-state transition probabilities are unknown, the defender obtains defense payoffs through online learning, so that the defender automatically selects the optimal defense strategy when facing different attackers. The invention effectively compresses the game state space, reducing storage and computation overhead; the defender performs reinforcement learning from environmental feedback while confronting the attacker and can adaptively make the optimal choice against different attacks; the defender's learning speed is increased, the defense payoff is improved, reliance on historical data is reduced, and the real-time performance and intelligence of the defender's decisions are effectively enhanced.
Description
Technical field
The invention belongs to the field of network security, and in particular relates to an intelligent defense decision-making method and device based on reinforcement learning and attack-defense game.
Background technique
In recent years, information-security incidents have kept increasing and have caused huge losses to network security. According to statistics, in 2017 Alibaba Cloud alone suffered roughly 1.6 billion attacks every day. For a given attacker, each attack-defense scenario may occur only once, but a defender on the scale of Alibaba Cloud faces a large number of identical attack-defense scenarios every day. Because network hardware resources are limited, the defender must weigh defense costs against benefits, take maximizing the defense payoff as the goal, and reach a balance between risk and investment; this requires online learning and updating of payoffs across a large number of identical attack-defense scenarios. Under such conditions, security administrators face the dilemma that "the optimal strategy is hard to choose". Game theory fits network attack-defense well: both exhibit goal antagonism, non-cooperative relations, and strategy interdependence. Existing game-theoretic defense decision-making methods can be divided into two classes, based respectively on complete-rationality and bounded-rationality assumptions. The first class assumes completely rational attack-defense participants: each participant can rationally select the strategy that maximizes its own payoff and can predict the strategy choices of the other participants. Applied to wireless-sensor security, a non-cooperative game model between the attacker and trusted sensor nodes yields optimal attack strategies from the Nash equilibrium, allowing the efficiency of worm attacks and defense strategies to be analyzed; a repeated game model between an intrusion-detection system and wireless sensor nodes allows the packet-forwarding strategies of nodes to be analyzed. The second class assumes bounded rationality: the two sides do not find the optimal strategy at the outset, but can learn during the attack-defense game, and the side with the more suitable learning mechanism wins the game. Such methods are mainly developed around evolutionary games, which take a population as the research object, use biological evolution mechanisms, and learn by imitating the dominant strategies of other members. In evolutionary games the information exchanged among participants is excessive, and the research mainly concerns the adjustment process, trend, and stability of collective attack-defense strategies, which is not conducive to guiding the real-time strategy selection of an individual member. How to adopt a better learning mechanism to model the attack-defense process and improve the accuracy and timeliness of defense decisions has therefore become an urgent technical problem.
Summary of the invention
To this end, the present invention provides an intelligent defense decision-making method and device based on reinforcement learning and attack-defense game, suitable for real attack-defense network environments, realizing intelligent defense decisions with online learning capability, and possessing strong practicability and operability.
According to the design scheme provided by the present invention, an intelligent defense decision-making method based on reinforcement learning and attack-defense game comprises the following:
A) Construct an attack-defense game model under a bounded-rationality constraint, and generate a host-centric attack-defense graph for extracting the network states and attack-defense actions of the game model, where graph nodes yield the network states and graph edges describe the attack-defense actions;
B) Based on the network states and attack-defense actions, and relying on the attack-defense game model, apply reinforcement learning to the attack-defense game process; according to the system feedback during the confrontation between the two sides, the boundedly rational defender automatically selects the optimal defense strategy when facing different attackers.
In the above, in A), the attack-defense game model is represented by a six-tuple, AD-SGM = (N, S, D, R, Q, π), where N denotes the players participating in the game, S denotes the stochastic-game state set, D denotes the defender's action set, R denotes the defender's immediate reward, Q denotes the defender's state-action payoff function, and π denotes the defender's defense strategy.
In the above, the attack-defense graph is represented by a two-tuple, G = (S, E), where S denotes the set of node security states and E denotes the node-state transitions caused by attack or defense actions.
Preferably, when generating the attack-defense graph, the target network is first scanned to obtain the network security elements; attack instantiation is then performed by combining them with attack templates, and defense instantiation by combining them with defense templates, finally producing the attack-defense graph, where the state set of the attack-defense game model is extracted from the graph nodes and the defense action set from the graph edges.
In the above, in B), reinforcement learning uses the model-free WoLF-PHC (Win or Learn Fast Policy Hill-Climbing) mechanism: rewards and state-transition knowledge are obtained by interacting with the environment and are represented by the payoff; a high policy-learning rate is set for the defender when it is losing, so that it adapts to the attacker's strategy; and the defender's optimal defense strategy is determined by updating the payoff.
Preferably, the payoff is expressed as
Qd(s, d) ← (1 - α)Qd(s, d) + α(Rd(s, d, s') + γ max_d' Qd(s', d'))
where α is the payoff learning rate, γ is the discount factor, and Rd(s, d, s') denotes the defender's immediate reward after executing defense action d in state s and the network transitioning to state s'; the strategy of reinforcement learning raises the selection probability of the action with the highest payoff by the policy-learning rate and lowers those of the other actions accordingly.
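The payoff update just described can be sketched as a plain Q-table update. This is a minimal illustration under assumed parameter values; the state and action encodings are hypothetical, not the patent's:

```python
from collections import defaultdict

alpha, gamma = 0.2, 0.9      # payoff learning rate and discount factor (assumed values)
Q = defaultdict(float)       # Q[(s, d)]: defender's state-action payoff, initialized to 0

def update_payoff(s, d, reward, s_next, next_actions):
    # Qd(s,d) <- (1-alpha)*Qd(s,d) + alpha*(Rd(s,d,s') + gamma * max_d' Qd(s',d'))
    best_next = max(Q[(s_next, d2)] for d2 in next_actions)
    Q[(s, d)] = (1 - alpha) * Q[(s, d)] + alpha * (reward + gamma * best_next)
```

With all payoffs at zero, a single update with reward 10 moves Q(s, d) to alpha times the reward, i.e. 2.0.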
Further, the average strategy π̄d is used as the criterion for winning and losing: the defender is winning when Σd πd(s, d)Qd(s, d) > Σd π̄d(s, d)Qd(s, d), with the average strategy updated as π̄d(s, d) ← π̄d(s, d) + (πd(s, d) - π̄d(s, d))/C(s), where C(s) counts the visits to state s.
Further, in the model-free reinforcement-learning mechanism, a state-action eligibility trace is introduced to track recent visits, and the payoff is updated using the eligibility trace so that the current reward is distributed to recently visited state-action pairs.
Further, in reinforcement learning, the eligibility trace of each state-action pair is defined as e(s, a). If the current network state is s*, the eligibility trace is updated as e(s, a) ← γλe(s, a) + 1 for the state-action pair executed in s*, and e(s, a) ← γλe(s, a) otherwise, so that the current reward is distributed to recently visited state-action pairs, where γ is the discount factor and λ is the trace decay factor.
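A minimal sketch of this trace update follows; the function names and parameter values are assumptions for illustration:

```python
gamma, lam = 0.9, 0.5    # discount factor and trace decay factor (assumed values)
e = {}                   # e[(s, a)]: eligibility of each state-action pair

def update_traces(s_star, a_star):
    # decay every trace by gamma*lambda, then bump the pair just visited in state s*
    for key in list(e):
        e[key] *= gamma * lam
    e[(s_star, a_star)] = e.get((s_star, a_star), 0.0) + 1.0

def distribute_reward(delta, alpha, Q):
    # assign the current update delta to recently visited state-action pairs
    for key, trace in e.items():
        Q[key] = Q.get(key, 0.0) + alpha * delta * trace
```

After two consecutive visits, the older pair keeps a decayed trace of gamma*lambda = 0.45 and therefore receives a proportionally smaller share of the next reward.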
Further, an intelligent defense decision-making device based on reinforcement learning and attack-defense game comprises:
an attack-defense graph generation module, for constructing the attack-defense game model under a bounded-rationality constraint and generating the host-centric attack-defense graph used to extract the network states and attack-defense actions of the game model, where graph nodes yield the network states and graph edges describe the attack-defense actions;
a defense-strategy selection module, which, based on the network states and attack-defense actions and combined with the attack-defense game model, applies reinforcement learning to the attack-defense game process; according to the environmental feedback during the confrontation between the two sides, the boundedly rational defender automatically selects the optimal defense strategy when facing different attackers.
Beneficial effects of the present invention:
The host-centric attack-defense graph model of the present invention extracts the network states and attack-defense actions and effectively compresses the game state space. The defender uses a reinforcement-learning mechanism and learns from environmental feedback while confronting the attacker, so that a boundedly rational defender automatically makes the optimal choice when facing different attackers. Eligibility traces are added to the decision-making device, increasing the defender's learning speed, reducing the dependence on historical data, and effectively improving the real-time performance and intelligence of the defender's decisions.
Detailed description of the drawings:
Fig. 1 is a schematic diagram of the intelligent defense decision process in the embodiment;
Fig. 2 is a schematic diagram of attack-defense state transitions in the embodiment;
Fig. 3 is a schematic diagram of the reinforcement-learning mechanism in the embodiment;
Fig. 4 is the experimental network structure in the embodiment;
Fig. 5 is a schematic diagram of network vulnerability information in the embodiment;
Fig. 6 is the attack graph in the embodiment;
Fig. 7 is the defense-action graph in the embodiment;
Fig. 8 is the defense-action description in the embodiment;
Fig. 9 shows the experimental parameter settings in the embodiment;
Fig. 10 shows the defense decisions in the embodiment;
Fig. 11 shows the defense payoffs in the embodiment.
Specific embodiments:
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and technical solutions. The technical terms involved in the embodiments are as follows:
Reinforcement learning is a classic online learning method in which a participant learns independently from environmental feedback. Compared with learning through biological evolution, its learning speed is fast, matching the rapid and time-critical nature of attack-defense confrontation. The non-cooperative, goal-antagonistic, and strategy-interdependent character of games matches the essential characteristics of network attack-defense. The embodiment of the present invention, as shown in Fig. 1, provides an intelligent defense decision-making method based on reinforcement learning and attack-defense game, comprising:
constructing an attack-defense game model under a bounded-rationality constraint, and generating a host-centric attack-defense graph for extracting the network states and attack-defense actions of the game model, where graph nodes yield the network states and graph edges describe the attack-defense actions;
applying reinforcement learning to the attack-defense game model based on the network states and attack-defense actions; according to the system feedback during the confrontation between the two sides, the boundedly rational defender automatically selects the optimal defense strategy when facing different attackers.
Dynamic threat tracking and analysis based on attribute attack graphs has clear advantages in attack-path inference, threat transition probabilities, forward and backward inference, loop resolution, real-time analysis, multi-path synthesis, privilege escalation, and access relations. The reinforcement-learning mechanism is introduced into the attack-defense game: the attack-defense game model is constructed under a bounded-rationality constraint, the host-centric attack-defense graph is generated for extracting the network states and attack-defense actions of the game model, and real-time, automated online defense decisions are realized through reinforcement learning.
The network attack-defense game model describes the randomness of network-state transitions with probability values. Since the current network state mainly depends on the previous network state, the state-transition relation is represented as a first-order Markov process, as shown in Fig. 2, with transition probability P(st, at, dt, st+1), where s is the network state and (a, d) are the attack and defense actions. Because the two sides have antagonistic goals and do not cooperate, both deliberately hide their key information, so the transition probabilities are treated as information unknown to either side. The game model is constructed on this basis. In another embodiment of the present invention, the attack-defense stochastic game model (AD-SGM) is represented by a six-tuple AD-SGM = (N, S, D, R, Q, π), where N = (attacker, defender) are the two players participating in the game, representing the network attacker and defender respectively; S = (s1, s2, ..., sn) is the stochastic-game state set, composed of network states; D = (D1, D2, ..., Dn) is the defender's action set, where Dk = {d1, d2, ..., dm} is the defender's action set in game state sk; Rd(si, d, sj) is the defender's immediate reward after executing defense action d in state si and the network transitioning to state sj; Qd(si, d) denotes the defender's expected payoff after taking action d in state si; and πd(sk) is the defender's defense strategy in state sk.
A defense strategy and a defense action are two different concepts: a defense strategy is a distribution over defense actions. The defense strategy defines, in the form of a probability vector, what the defender selects in each network state; for example, πd(sk) = (πd(sk, d1), ..., πd(sk, dm)) is the defender's strategy in network state sk, and πd(sk, dm) is the probability of selecting action dm, where the probabilities sum to 1.
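As an illustration, sampling a defense action from such a probability vector can be sketched as follows; this is a minimal example, and the function name and action labels are hypothetical:

```python
import random

def choose_action(prob_vector, actions, rng=random):
    # prob_vector = (pi(s,d1), ..., pi(s,dm)), summing to 1
    r, acc = rng.random(), 0.0
    for d, p in zip(actions, prob_vector):
        acc += p
        if r < acc:
            return d
    return actions[-1]   # guard against floating-point rounding
```

A degenerate strategy such as (1.0, 0.0) always returns the first action, which is useful for checking the sampler.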
By creating the network attack-defense graph G, network states are extracted from the nodes of G and attack-defense actions are analyzed on the edges of G, for extracting the attack-defense strategies. In another embodiment of the present invention, the attack-defense graph is represented as a two-tuple G = (S, E), where S = {s1, s2, ..., sn} is the node security-state set, si = <host, privilege>, host is the unique identifier of the node, and privilege = {none, user, root} indicates no permission, normal-user permission, and administrator permission respectively. E = (Ea, Ed) are the directed edges, indicating that attack or defense actions cause node-state transitions, with ek = (sr, v/d, sd), k = a, d, where sr is the source node and sd is the destination node.
Further, when generating the attack-defense graph, the target network is first scanned to obtain the network security elements; attack instantiation is performed by combining them with attack templates, then defense instantiation by combining them with defense templates, finally producing the attack-defense graph. The state set of the attack-defense stochastic game model is extracted from the graph nodes, and the defense action set from the graph edges. The specific steps can be designed as shown in Algorithm 1:
Algorithm 1. Attack-defense graph generation algorithm
Here, step 1) generates all possible state nodes from the network security elements and initializes the edges; steps 2)-11) perform attack instantiation and generate all attack edges; steps 12)-18) perform defense instantiation and generate all defense edges; steps 19)-23) remove all isolated nodes; step 24) outputs the attack-defense graph.
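The listing of Algorithm 1 itself is not reproduced in this text; the following is a minimal sketch of the step sequence described above. The `Template` fields and the matching rules are assumptions for illustration, not the patent's definitions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:              # node security state <host, privilege>
    host: str
    privilege: str        # "none" | "user" | "root"

@dataclass
class Template:           # hypothetical attack/defense template
    action: str
    pre_privilege: str    # privilege required before the action
    post_privilege: str   # privilege holding after the action

def build_graph(hosts, attack_templates, defense_templates):
    # 1) generate all possible state nodes, initialize the edge sets
    states = {State(h, p) for h in hosts for p in ("none", "user", "root")}
    Ea, Ed = [], []
    # 2)-11) attack instantiation: generate all attack edges
    for t in attack_templates:
        for s in states:
            if s.privilege == t.pre_privilege:
                Ea.append((s, t.action, State(s.host, t.post_privilege)))
    # 12)-18) defense instantiation: defense edges reverse attack transitions
    for t in defense_templates:
        for (_, _, dst) in Ea:
            if dst.privilege == t.pre_privilege:
                Ed.append((dst, t.action, State(dst.host, t.post_privilege)))
    # 19)-23) remove all isolated nodes
    used = {s for e in Ea + Ed for s in (e[0], e[2])}
    # 24) output the attack-defense graph G = (S, E)
    return used, (Ea, Ed)
```

With one host, one attack template, and one defense template, the graph contains exactly one attack edge, one defense edge, and no isolated states.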
In the embodiment of the present invention, the reinforcement-learning mechanism is introduced into the attack-defense game to describe the learning and improvement process of attack-defense strategies. WoLF-PHC is a typical model-free reinforcement-learning algorithm, whose learning mechanism is shown in Fig. 3. In another embodiment of the present invention, in reinforcement learning the agent obtains rewards and state-transition knowledge by interacting with the environment; this knowledge is represented by the payoff Qd, and learning proceeds by updating Qd. Its payoff function Qd is
Qd(s, d) ← (1 - α)Qd(s, d) + α(Rd(s, d, s') + γ max_d' Qd(s', d'))    (1)
In formula (1), α is the payoff learning rate and γ is the discount factor. The strategy of reinforcement learning raises the selection probability of the action with the highest payoff, πd(s, d) being increased by δ for d = argmax_d' Qd(s, d') and decreased by δ/(m - 1) for each of the other m - 1 actions.
Further, WoLF-PHC ("Win or Learn Fast" policy hill-climbing) gives the defender two different policy-learning rates through the WoLF mechanism: a low policy-learning rate δw is used when winning and a high policy-learning rate δl is used when losing, as shown in formula (5). The two learning rates enable the defender to adapt quickly to the attacker's strategy when performing worse than expected, and to learn cautiously when performing better than expected, while ensuring convergence. The WoLF-PHC algorithm uses the average strategy as the criterion for winning and losing, as shown in formulas (6) and (7): the defender is winning when Σd πd(s, d)Qd(s, d) > Σd π̄d(s, d)Qd(s, d), with
π̄d(s, d) ← π̄d(s, d) + (πd(s, d) - π̄d(s, d))/C(s)    (6)
C(s) = C(s) + 1    (7)
where C(s) counts the visits to state s.
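The win/lose test and the choice between the two learning rates can be sketched as follows; this is a minimal illustration, and the δw and δl values are assumptions:

```python
def select_policy_learning_rate(s, pi, avg_pi, Q, actions, dw=0.05, dl=0.2):
    # winning: the current strategy's expected payoff beats the average strategy's
    current = sum(pi[d] * Q[(s, d)] for d in actions)
    average = sum(avg_pi[d] * Q[(s, d)] for d in actions)
    return dw if current > average else dl   # cautious when winning, fast when losing
```

When the current strategy puts more weight than the average strategy on the high-payoff action, the defender counts as winning and learns at the slow rate.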
In order to increase the learning speed of the WoLF-PHC algorithm and reduce its dependence on the amount of data, in another embodiment of the present invention an eligibility trace is introduced into WoLF-PHC. The eligibility trace tracks the particular state-action trajectory of recent visits and then distributes the current reward to the recently visited state-action pairs. Further, the eligibility trace of each state-action pair is defined as e(s, a); if the current network state is s*, the eligibility trace is updated as in formula (8), e(s, a) ← γλe(s, a) + 1 for the pair executed in s* and e(s, a) ← γλe(s, a) otherwise, where λ is the trace decay factor.
For the WoLF-PHC-based defense decision-making method to achieve good results, the four parameters α, δ, λ, and γ must be set reasonably. 1) The payoff learning rate α ranges over 0 < α < 1: the larger α, the more weight the later accumulated rewards carry and the faster the learning; the smaller α, the better the stability of the algorithm. 2) The policy learning rate δ ranges over 0 < δ < 1; experiments show that better results are obtained when the losing rate δl is a suitable multiple of the winning rate δw. 3) The eligibility-trace decay factor λ ranges over 0 < λ < 1 and is responsible for distributing credit among state-action pairs; it can be regarded as a time scale, and the larger λ, the more credit is assigned to historical state-action pairs. 4) The discount factor γ ranges over 0 < γ < 1 and represents the defender's preference between immediate and future rewards: when γ is close to 0, future rewards matter little and immediate rewards are valued more; when γ is close to 1, immediate rewards matter little and future rewards are valued more.
The agent in WoLF-PHC, as shown in Fig. 3, corresponds to the defender in the attack-defense stochastic game model AD-SGM: the state of the agent corresponds to the game state in AD-SGM, the behavior of the agent corresponds to the defense action in AD-SGM, the immediate reward of the agent corresponds to the immediate reward in AD-SGM, and the strategy of the agent corresponds to the defense strategy in AD-SGM. On this basis, the specific defense decision-making algorithm may be designed as shown in Algorithm 2:
Algorithm 2. Defense decision-making algorithm
Step 1) initializes the attack-defense stochastic game model AD-SGM and the relevant parameters, where the network states and attack-defense actions are extracted by Algorithm 1; in step 2) the defender detects the current network state; steps 3)-22) perform defense decisions and online learning, where steps 4)-5) choose a defense action according to the current strategy, steps 6)-14) update the payoff Qd using the eligibility trace, and steps 15)-21) update the defense strategy πd from the new payoff Qd using the hill-climbing algorithm. The space complexity of the algorithm is concentrated in the storage of Rd(s, d, s'), e(s, d), πd(s, d), π̄d(s, d), and Qd(s, d); if |S| is the number of states and |D| the number of defense actions per state, the space complexity is O(4|S|·|D| + |S|²·|D|). The algorithm does not need to solve for a game equilibrium, which greatly reduces the computational complexity compared with existing stochastic game models and enhances the practicality of the algorithm.
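The decision-and-learning step of Algorithm 2 can be sketched end to end as follows. This is a hedged reconstruction under assumed parameter values, with a hypothetical `env` interface (`env.state()`, `env.act(d)`) standing in for network-state detection and defense execution; it is not the patent's literal listing:

```python
import random
from collections import defaultdict

def defense_step(env, D, Q, pi, avg_pi, C, e,
                 alpha=0.2, gamma=0.9, lam=0.5, dw=0.05, dl=0.2):
    s = env.state()                                         # 2) detect current network state
    acts = D[s]
    d = random.choices(acts, weights=[pi[s][a] for a in acts])[0]   # 4)-5) sample from pi
    r, s2 = env.act(d)                                      # execute defense, observe transition
    # 6)-14) update Qd via the eligibility trace
    delta = r + gamma * max(Q[(s2, a)] for a in D[s2]) - Q[(s, d)]
    for k in list(e):
        e[k] *= gamma * lam
    e[(s, d)] = e.get((s, d), 0.0) + 1.0
    for k, tr in e.items():
        Q[k] += alpha * delta * tr
    # 15)-21) update the average strategy, pick the learning rate, hill-climb pi
    C[s] += 1
    for a in acts:
        avg_pi[s][a] += (pi[s][a] - avg_pi[s][a]) / C[s]
    winning = (sum(pi[s][a] * Q[(s, a)] for a in acts)
               > sum(avg_pi[s][a] * Q[(s, a)] for a in acts))
    dp = dw if winning else dl
    if len(acts) > 1:
        best = max(acts, key=lambda a: Q[(s, a)])
        for a in acts:
            if a == best:
                pi[s][a] += dp
            else:
                pi[s][a] -= min(pi[s][a], dp / (len(acts) - 1))
        z = sum(pi[s].values())                             # keep pi a probability vector
        for a in acts:
            pi[s][a] /= z
    return d
```

Run against a toy single-state environment that rewards only one of two defense actions, the strategy concentrates on the rewarded action within a few hundred steps.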
Based on the above intelligent defense decision-making method, the embodiment of the present invention also provides an intelligent defense decision-making device based on reinforcement learning and attack-defense game, comprising:
an attack-defense graph generation module, for constructing the attack-defense game model under a bounded-rationality constraint and generating the host-centric attack-defense graph used to extract the network states and attack-defense actions of the game model, where graph nodes yield the network states and graph edges describe the attack-defense actions;
a defense-strategy selection module, which, based on the network states and attack-defense actions and combined with the attack-defense game model, applies reinforcement learning to the attack-defense game process; according to the environmental feedback during the confrontation between the two sides, the boundedly rational defender automatically selects the optimal defense strategy when facing different attackers.
Defense strategies for the target network are selected intelligently using the above intelligent defense decision-making method based on reinforcement learning and attack-defense game.
To further verify the validity of the technical solution in the embodiment of the present invention, experiments are conducted on a typical enterprise network built as shown in Fig. 4. Attack-defense events occur in the intranet, and the attacker comes from the external network; the network administrator, as the defender, is responsible for the security of the intranet. Owing to the settings of firewall 1 and firewall 2, normal external users can only access the Web server, while the Web server can access the database server, the FTP server, and the e-mail server. The experimental network is scanned with the Nessus tool; its vulnerability information is shown in Fig. 5.
The attack and defense templates are built with reference to the MIT Lincoln Laboratory attack-defense behavior database. A identifies the attacker host, W the Web server, D the database server, F the FTP server, and E the e-mail server. The network attack-defense graph is constructed with the attack-defense graph generation device; for ease of presentation and description, the attack-defense graph is divided into an attack graph and a defense graph, shown in Fig. 6 and Fig. 7 respectively. The descriptions of the defense actions in the defense graph are shown in Fig. 8. The attack-defense game model of the experimental scenario is constructed as follows:
1. N = (attacker, defender) are the players participating in the game, representing the network attacker and defender respectively;
2. the stochastic-game state set is S = (s0, s1, s2, s3, s4, s5, s6); the stochastic-game states are composed of network states, extracted from the nodes in Fig. 6 and Fig. 7;
3. the defender's action sets are D = (D0, D1, D2, D3, D4, D5, D6), where D0 = {NULL}, D1 = {d1, d2}, D2 = {d3, d4}, D3 = {d1, d5, d6}, D4 = {d1, d5, d6}, D5 = {d1, d2, d7}, D6 = {d3, d4}, extracted from the edges of Fig. 7;
4. the defender's immediate rewards Rd(si, d, sj) are quantized as:
(Rd(s0,NULL,s0), Rd(s0,NULL,s1), Rd(s0,NULL,s2)) = (0, -40, -59)
(Rd(s1,d1,s0), Rd(s1,d1,s1), Rd(s1,d1,s2); Rd(s1,d2,s0), Rd(s1,d2,s1), Rd(s1,d2,s2)) = (40, 0, -29; 5, -15, -32)
(Rd(s2,d3,s0), Rd(s2,d3,s1), Rd(s2,d3,s2), Rd(s2,d3,s3), Rd(s2,d3,s4), Rd(s2,d3,s5); Rd(s2,d4,s0), Rd(s2,d4,s1), Rd(s2,d4,s2), Rd(s2,d4,s3), Rd(s2,d4,s4), Rd(s2,d4,s5)) = (24, 9, -15, -55, -49, -65; 19, 5, -21, -61, -72, -68)
(Rd(s3,d1,s2), Rd(s3,d1,s3), Rd(s3,d1,s6); Rd(s3,d5,s2), Rd(s3,d5,s3), Rd(s3,d5,s6); Rd(s3,d6,s2), Rd(s3,d6,s3), Rd(s3,d6,s6)) = (21, -16, -72; 15, -23, -81; -21, -36, -81)
(Rd(s4,d1,s2), Rd(s4,d1,s4), Rd(s4,d1,s6); Rd(s4,d5,s2), Rd(s4,d5,s4), Rd(s4,d5,s6); Rd(s4,d6,s2), Rd(s4,d6,s4), Rd(s4,d6,s6)) = (26, 0, -62; 11, -23, -75; 9, -25, -87)
(Rd(s5,d1,s2), Rd(s5,d1,s5), Rd(s5,d1,s6); Rd(s5,d2,s2), Rd(s5,d2,s5), Rd(s5,d2,s6); Rd(s5,d7,s2), Rd(s5,d7,s5), Rd(s5,d7,s6)) = (29, 0, -63; 11, -21, -76; 2, -27, -88)
(Rd(s6,d3,s3), Rd(s6,d3,s4), Rd(s6,d3,s5), Rd(s6,d3,s6); Rd(s6,d4,s3), Rd(s6,d4,s4), Rd(s6,d4,s5), Rd(s6,d4,s6)) = (-23, -21, -19, -42; -28, -31, -24, -49)
5. to more fully test the learning performance of the algorithm, the defender's state-action payoffs Qd(si, d) are uniformly initialized to 0, introducing no additional prior knowledge;
6. the defender's defense strategy πd is initialized to the uniform strategy, i.e. πd(sk, d1) = πd(sk, d2) = ... = πd(sk, dm) with the probabilities summing to 1, introducing no additional prior knowledge.
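The initialization of items 3, 5, and 6 can be sketched as follows; the action sets are taken from the experimental scenario above, while the variable names are illustrative assumptions:

```python
from collections import defaultdict

# defender action sets Dk of the experimental scenario
D = {"s0": ["NULL"], "s1": ["d1", "d2"], "s2": ["d3", "d4"],
     "s3": ["d1", "d5", "d6"], "s4": ["d1", "d5", "d6"],
     "s5": ["d1", "d2", "d7"], "s6": ["d3", "d4"]}

Q = defaultdict(float)                        # Qd(si, d) uniformly 0: no prior knowledge
pi = {s: {d: 1.0 / len(ds) for d in ds}       # uniform initial defense strategy
      for s, ds in D.items()}
avg_pi = {s: dict(p) for s, p in pi.items()}  # average strategy starts equal to pi
```

Each per-state strategy is a valid probability vector, and every payoff reads as 0 until learning begins.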
The influence of different parameter settings on the algorithm is tested, taking state s2 of Fig. 6 and Fig. 7 as an example; the attacker's initial strategy in the experiment is a random strategy. Since different parameter values affect the speed and effect of learning, six different parameter settings are tested; the specific settings are shown in Fig. 9.
The defender's selection probabilities of defense actions d3 and d4 in state s2 are shown in Fig. 10, from which the learning speed and convergence of the algorithm under the different parameter settings can be observed. Fig. 10 shows that settings 1, 3, and 6 learn fast: under these three settings the algorithm reaches the optimal strategy within 1500 learning episodes, but settings 3 and 6 converge poorly. Although settings 3 and 6 can learn the optimal strategy, they oscillate afterwards and are not as stable as setting 1.
The defense payoff represents the degree to which the algorithm optimizes the strategy. To ensure that a payoff value does not merely reflect a single defense result, the average of 1000 defense payoffs is taken; the variation of each 1000-episode average payoff is shown in Fig. 11. Fig. 11 shows that the payoff of setting 3 is significantly lower than the others, while the relative merits of the remaining settings are hard to distinguish. Therefore, among the six parameter groups, setting 1 is the most suitable for this scenario.
The computational overhead introduced by the eligibility trace is also tested: the time for 100,000 defense decisions is measured 20 times each with and without the eligibility trace, with 20-run averages of 9.51 s with the eligibility trace and 3.74 s without it. Although introducing the eligibility trace increases the decision time by nearly a factor of 2.5, 100,000 decisions still take only 9.51 s after its introduction, which satisfies the real-time requirement.
The above experiments further verify that the present invention constructs the attack-defense stochastic game model under a bounded-rationality constraint and generates the network attack-defense graph used for extracting the network states and attack-defense strategies, effectively compressing the game state space; through learning, the defender can obtain the optimal defense strategy against the current attack, improving the rapid automated defense capability against unknown attacks, with strong practicability and operability.
Each embodiment in this specification is described in a progressive manner; each embodiment highlights its differences from the others, and the same or similar parts of the embodiments may refer to each other. Since the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively simple, and the relevant parts refer to the description of the method.
The units and method steps of the examples described in connection with the embodiments disclosed herein can be realized with electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. Those of ordinary skill in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
Those of ordinary skill in the art will appreciate that all or part of the steps in the above method can be completed by a program instructing the relevant hardware, and the program can be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc. Optionally, all or part of the steps of the above embodiments can also be implemented using one or more integrated circuits. Correspondingly, each module/unit in the above embodiments can be implemented in the form of hardware or in the form of a software functional module. The present invention is not limited to any particular form of combination of hardware and software.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. An intelligent defense decision-making method based on reinforcement learning and the attack-defense game, characterized by comprising:
A) constructing an attack-defense game model under the bounded-rationality constraint, and generating an attack-defense graph for extracting the network states and attack-defense actions in the game model, wherein the attack-defense graph is host-centric, the graph nodes extract the network states, and the graph edges represent the attack-defense actions;
B) performing reinforcement learning on the attack-defense game process based on the network states and attack-defense actions, in combination with the attack-defense game model and according to the environmental feedback in the confrontation between the attacker and the defender, so that under bounded rationality the defender automatically selects the optimal defense strategy when facing different attackers.
2. The intelligent defense decision-making method based on reinforcement learning and the attack-defense game according to claim 1, characterized in that in A), the attack-defense game model is represented by a six-tuple, i.e. AD-SGM = (N, S, D, R, Q, π), where N denotes the players participating in the game, S denotes the stochastic game state set, D denotes the defender's action set, R denotes the defender's immediate return, Q denotes the defender's state-action income function, and π denotes the defender's defense strategy.
3. The intelligent defense decision-making method based on reinforcement learning and the attack-defense game according to claim 1, characterized in that the attack-defense graph is represented by a two-tuple, i.e. G = (S, E), where S denotes the set of network node security states and E denotes the node-state transitions caused by attack or defense actions.
4. The intelligent defense decision-making method based on reinforcement learning and the attack-defense game according to claim 3, characterized in that when generating the attack-defense graph, the target network is first scanned to obtain network security elements; attack instantiation is then performed in combination with the attack template, and defense instantiation is performed in combination with the defense template, finally producing the attack-defense graph, wherein the state set of the attack-defense game model is extracted from the nodes of the attack-defense graph, and the defense action set is extracted from the edges of the attack-defense graph.
5. The intelligent defense decision-making method based on reinforcement learning and the attack-defense game according to claim 1, characterized in that in B), the reinforcement learning uses the model-free mechanism WoLF-PHC (Win or Learn Fast Policy Hill-Climbing), obtaining returns and environment state-transition knowledge through interaction with the environment, where the knowledge is represented by the income; high and low policy learning rates are set for the defender to adapt to different attacker strategies, and the income update process uses the reinforcement learning mechanism to determine the defender's optimal defense strategy.
6. The intelligent defense decision-making method based on reinforcement learning and the attack-defense game according to claim 5, characterized in that the income is updated as Q(s, d) ← (1 − α)Q(s, d) + α(Rd(s, d, s′) + γ max_d′ Q(s′, d′)), and the reinforcement-learning strategy is updated as π(s, d) ← π(s, d) + Δ(s, d), where Δ(s, d) raises the probability of the action maximizing Q(s, d) and lowers the probabilities of the remaining actions; α is the income learning rate; γ is the discount factor; and Rd(s, d, s′) denotes the defender's immediate return after defense action d is executed in state s and the network transitions to state s′.
7. The intelligent defense decision-making method based on reinforcement learning and the attack-defense game according to claim 6, characterized in that the average strategy is used as the criterion for winning and losing, expressed by the formula: the defender is winning if Σ_d π(s, d)Q(s, d) > Σ_d π̄(s, d)Q(s, d) and losing otherwise, where π̄(s, d) denotes the average strategy.
8. The intelligent defense decision-making method based on reinforcement learning and the attack-defense game according to claim 6, characterized in that the model-free reinforcement learning mechanism introduces a state-action eligibility trace for tracking recent visits, distributes the current return to the recently visited state-action pairs, and updates the income using the eligibility trace.
9. The intelligent defense decision-making method based on reinforcement learning and the attack-defense game according to claim 8, characterized in that in the reinforcement learning, the eligibility trace of each state-action pair is defined as e(s, a); if the current network state is s*, the eligibility trace is updated as e(s, a) ← γλe(s, a) + 1 for s = s*, and e(s, a) ← γλe(s, a) otherwise, distributing the current return to the recently visited state-action pairs, where γ is the discount factor and λ is the trace-decay factor.
10. An intelligent defense decision-making device based on reinforcement learning and the attack-defense game, characterized by comprising:
an attack-defense graph generation module, for constructing the attack-defense game model under the bounded-rationality constraint and generating the attack-defense graph for extracting the network states and attack-defense actions in the game model, wherein the attack-defense graph is host-centric, the graph nodes extract the network states, and the graph edges represent the attack-defense actions;
a defense strategy selection module, for performing reinforcement learning on the attack-defense game process based on the network states and attack-defense actions, in combination with the attack-defense game model and according to the environmental feedback in the confrontation between the attacker and the defender, so that under bounded rationality the defender automatically selects the optimal defense strategy when facing different attackers.
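The learning mechanism of claims 5 to 9 (a model-free WoLF-PHC income update with eligibility traces and an average-strategy win/lose test) can be sketched as follows. This is a minimal illustration with assumed parameter values (α, γ, λ, and win/lose policy learning rates with δ_win < δ_lose); the class and method names are hypothetical, not from the patent.

```python
import random
from collections import defaultdict

class WoLFPHCDefender:
    """Minimal WoLF-PHC defender sketch (claims 5-9); names are illustrative."""

    def __init__(self, actions, alpha=0.2, gamma=0.9, lam=0.5,
                 delta_win=0.01, delta_lose=0.04):
        self.actions = list(actions)
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.delta_win, self.delta_lose = delta_win, delta_lose
        self.Q = defaultdict(float)   # state-action income Q(s, d)
        self.e = defaultdict(float)   # eligibility trace e(s, d)
        self.pi = {}                  # current mixed strategy per state
        self.avg_pi = {}              # average strategy (win/lose test)
        self.visits = defaultdict(int)

    def _policies(self, s):
        # Lazily initialize both strategies to uniform for a new state.
        if s not in self.pi:
            n = len(self.actions)
            self.pi[s] = {d: 1.0 / n for d in self.actions}
            self.avg_pi[s] = {d: 1.0 / n for d in self.actions}
        return self.pi[s], self.avg_pi[s]

    def choose(self, s):
        pi, _ = self._policies(s)
        return random.choices(self.actions,
                              weights=[pi[d] for d in self.actions])[0]

    def update(self, s, d, reward, s_next):
        pi, avg_pi = self._policies(s)
        # Income update toward the greedy successor value (claim 6).
        best_next = max(self.Q[(s_next, d2)] for d2 in self.actions)
        delta = reward + self.gamma * best_next - self.Q[(s, d)]
        # Eligibility traces distribute the return over recent visits
        # and decay by gamma * lambda (claims 8-9).
        self.e[(s, d)] += 1.0
        for key in list(self.e):
            self.Q[key] += self.alpha * delta * self.e[key]
            self.e[key] *= self.gamma * self.lam
        # Average strategy tracks the running mean of pi (claim 7).
        self.visits[s] += 1
        for a in self.actions:
            avg_pi[a] += (pi[a] - avg_pi[a]) / self.visits[s]
        # Winning if the current strategy outperforms the average strategy.
        winning = (sum(pi[a] * self.Q[(s, a)] for a in self.actions)
                   > sum(avg_pi[a] * self.Q[(s, a)] for a in self.actions))
        lr = self.delta_win if winning else self.delta_lose  # learn fast when losing
        # Hill-climb toward the greedy action, keeping pi a distribution.
        greedy = max(self.actions, key=lambda a: self.Q[(s, a)])
        for a in self.actions:
            step = lr if a == greedy else -lr / (len(self.actions) - 1)
            pi[a] = min(1.0, max(0.0, pi[a] + step))
        total = sum(pi.values())
        for a in self.actions:
            pi[a] /= total
```

With two defense actions and a positive return for one of them, a single update raises both the income Q of that state-action pair and its probability under the current strategy, while the eligibility trace lets later returns keep crediting it.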
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910292304.2A CN110166428B (en) | 2019-04-12 | 2019-04-12 | Intelligent defense decision-making method and device based on reinforcement learning and attack and defense game |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910292304.2A CN110166428B (en) | 2019-04-12 | 2019-04-12 | Intelligent defense decision-making method and device based on reinforcement learning and attack and defense game |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110166428A true CN110166428A (en) | 2019-08-23 |
CN110166428B CN110166428B (en) | 2021-05-07 |
Family
ID=67639176
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910292304.2A Active CN110166428B (en) | 2019-04-12 | 2019-04-12 | Intelligent defense decision-making method and device based on reinforcement learning and attack and defense game |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110166428B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110659492A (en) * | 2019-09-24 | 2020-01-07 | 北京信息科技大学 | Multi-agent reinforcement learning-based malicious software detection method and device |
CN111988415A (en) * | 2020-08-26 | 2020-11-24 | 绍兴文理学院 | Mobile sensing equipment calculation task safety unloading method based on fuzzy game |
CN112221160A (en) * | 2020-10-22 | 2021-01-15 | 厦门渊亭信息科技有限公司 | Role distribution system based on random game |
CN113132398A (en) * | 2021-04-23 | 2021-07-16 | 中国石油大学(华东) | Array honeypot system defense strategy prediction method based on Q learning |
CN113810406A (en) * | 2021-09-15 | 2021-12-17 | 浙江工业大学 | Network space security defense method based on dynamic defense graph and reinforcement learning |
CN114844668A (en) * | 2022-03-17 | 2022-08-02 | 清华大学 | Defense resource configuration method, device, equipment and readable medium |
CN115296850A (en) * | 2022-07-08 | 2022-11-04 | 中电信数智科技有限公司 | Network attack and defense exercise distributed learning method based on artificial intelligence |
CN115348064A (en) * | 2022-07-28 | 2022-11-15 | 南京邮电大学 | Power distribution network defense strategy design method based on dynamic game under network attack |
CN116032653A (en) * | 2023-02-03 | 2023-04-28 | 中国海洋大学 | Method, device, equipment and storage medium for constructing network security game strategy |
CN116708042A (en) * | 2023-08-08 | 2023-09-05 | 中国科学技术大学 | Strategy space exploration method for network defense game decision |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014100738A1 (en) * | 2012-12-21 | 2014-06-26 | InsideSales.com, Inc. | Instance weighted learning machine learning model |
CN104994569A (en) * | 2015-06-25 | 2015-10-21 | 厦门大学 | Multi-user reinforcement learning-based cognitive wireless network anti-hostile interference method |
CN107135224A (en) * | 2017-05-12 | 2017-09-05 | 中国人民解放军信息工程大学 | Cyber-defence strategy choosing method and its device based on Markov evolutionary Games |
CN108512837A (en) * | 2018-03-16 | 2018-09-07 | 西安电子科技大学 | A kind of method and system of the networks security situation assessment based on attacking and defending evolutionary Game |
CN108809979A (en) * | 2018-06-11 | 2018-11-13 | 中国人民解放军战略支援部队信息工程大学 | Automatic intrusion response decision-making technique based on Q-learning |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110659492A (en) * | 2019-09-24 | 2020-01-07 | 北京信息科技大学 | Multi-agent reinforcement learning-based malicious software detection method and device |
CN110659492B (en) * | 2019-09-24 | 2021-10-15 | 北京信息科技大学 | Multi-agent reinforcement learning-based malicious software detection method and device |
CN111988415A (en) * | 2020-08-26 | 2020-11-24 | 绍兴文理学院 | Mobile sensing equipment calculation task safety unloading method based on fuzzy game |
CN111988415B (en) * | 2020-08-26 | 2021-04-02 | 绍兴文理学院 | Mobile sensing equipment calculation task safety unloading method based on fuzzy game |
CN112221160A (en) * | 2020-10-22 | 2021-01-15 | 厦门渊亭信息科技有限公司 | Role distribution system based on random game |
CN113132398A (en) * | 2021-04-23 | 2021-07-16 | 中国石油大学(华东) | Array honeypot system defense strategy prediction method based on Q learning |
CN113810406A (en) * | 2021-09-15 | 2021-12-17 | 浙江工业大学 | Network space security defense method based on dynamic defense graph and reinforcement learning |
CN114844668A (en) * | 2022-03-17 | 2022-08-02 | 清华大学 | Defense resource configuration method, device, equipment and readable medium |
CN115296850A (en) * | 2022-07-08 | 2022-11-04 | 中电信数智科技有限公司 | Network attack and defense exercise distributed learning method based on artificial intelligence |
CN115348064A (en) * | 2022-07-28 | 2022-11-15 | 南京邮电大学 | Power distribution network defense strategy design method based on dynamic game under network attack |
CN115348064B (en) * | 2022-07-28 | 2023-09-26 | 南京邮电大学 | Dynamic game-based power distribution network defense strategy design method under network attack |
CN116032653A (en) * | 2023-02-03 | 2023-04-28 | 中国海洋大学 | Method, device, equipment and storage medium for constructing network security game strategy |
CN116708042A (en) * | 2023-08-08 | 2023-09-05 | 中国科学技术大学 | Strategy space exploration method for network defense game decision |
CN116708042B (en) * | 2023-08-08 | 2023-11-17 | 中国科学技术大学 | Strategy space exploration method for network defense game decision |
Also Published As
Publication number | Publication date |
---|---|
CN110166428B (en) | 2021-05-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110166428A (en) | Intelligence defence decision-making technique and device based on intensified learning and attacking and defending game | |
CN111966698B (en) | Block chain-based trusted federation learning method, system, device and medium | |
Zhang et al. | Gan enhanced membership inference: A passive local attack in federated learning | |
CN108833401A (en) | Network active defensive strategy choosing method and device based on Bayes's evolutionary Game | |
CN108833402A (en) | A kind of optimal defence policies choosing method of network based on game of bounded rationality theory and device | |
CN107566387B (en) | Network defense action decision method based on attack and defense evolution game analysis | |
CN110191083A (en) | Safety defense method, device and the electronic equipment threatened towards advanced duration | |
CN107135224A (en) | Cyber-defence strategy choosing method and its device based on Markov evolutionary Games | |
CN110300106A (en) | Mobile target based on Markov time game defends decision choosing method, apparatus and system | |
CN110035066B (en) | Attack and defense behavior quantitative evaluation method and system based on game theory | |
CN110460572A (en) | Mobile target defence policies choosing method and equipment based on Markov signaling games | |
CN109327427A (en) | A kind of dynamic network variation decision-making technique and its system in face of unknown threat | |
CN107483486A (en) | Cyber-defence strategy choosing method based on random evolution betting model | |
Guo et al. | Adversarial policy learning in two-player competitive games | |
CN107070956A (en) | APT Attack Prediction methods based on dynamic bayesian game | |
CN110417733B (en) | Attack prediction method, device and system based on QBD attack and defense random evolution game model | |
CN109589607A (en) | A kind of game anti-cheating method and game anti-cheating system based on block chain | |
CN110099045A (en) | Network security threats method for early warning and device based on qualitative differential game and evolutionary Game | |
CN108696534A (en) | Real-time network security threat early warning analysis method and its device | |
Xenopoulos et al. | Graph neural networks to predict sports outcomes | |
Keegan et al. | Sic transit gloria mundi virtuali? Promise and peril in the computational social science of clandestine organizing | |
He et al. | Group password strength meter based on attention mechanism | |
Han et al. | Multiresolution tensor decomposition for multiple spatial passing networks | |
Yang et al. | Designing better strategies against human adversaries in network security games. | |
Moskal et al. | Simulating attack behaviors in enterprise networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |