CN110166428A - Intelligent defense decision-making method and device based on reinforcement learning and attack and defense game - Google Patents


Info

Publication number
CN110166428A
CN110166428A (application CN201910292304.2A)
Authority
CN
China
Prior art keywords
attacking
defending
defence
intensified learning
defender
Prior art date
Legal status
Granted
Application number
CN201910292304.2A
Other languages
Chinese (zh)
Other versions
CN110166428B (en)
Inventor
胡浩
张玉臣
杨峻楠
谢鹏程
刘玉岭
马博文
冷强
张畅
陈周文
林野
Current Assignee
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force
Priority to CN201910292304.2A (granted as CN110166428B)
Publication of CN110166428A
Application granted
Publication of CN110166428B
Legal status: Active
Anticipated expiration


Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 - Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 - Network analysis or design
    • H04L41/145 - Network analysis or design involving simulating, designing, planning or modelling of a network
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 - Network architectures or network communication protocols for network security
    • H04L63/20 - Network architectures or network communication protocols for network security for managing network security; network security policies in general

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention belongs to the technical field of network security, and in particular relates to an intelligent defense decision-making method and device based on reinforcement learning and attack-defense game. The method comprises: constructing an attack-defense game model under a bounded-rationality constraint, and generating an attack-defense graph for extracting the network states and attack-defense actions in the game model, the attack-defense graph being host-centric, its nodes extracting network states and its edges describing attack-defense actions; when the network state transition probabilities are unknown, the defender obtains defense payoffs through online learning, so that the defender automatically selects the optimal defense strategy when facing different attackers. The invention effectively compresses the game state space and reduces storage and computation overhead; the defender performs reinforcement learning from environmental feedback while confronting the attacker and can adaptively make the optimal selection when facing different attacks; the defender's learning speed is improved, the defense payoff is increased, the dependence on historical data is reduced, and the real-time performance and intelligence of the defender's decision-making are effectively improved.

Description

Intelligent defense decision-making method and device based on reinforcement learning and attack-defense game
Technical field
The invention belongs to the technical field of network security, and in particular relates to an intelligent defense decision-making method and device based on reinforcement learning and attack-defense game.
Background technique
In recent years, information security incidents have increased and caused huge losses to network security. According to statistics, Alibaba Cloud alone suffered about 1.6 billion attacks per day in 2017. For different attackers, each attack-defense scenario may occur only once, but a defender such as Alibaba Cloud faces a large number of identical attack-defense scenarios every day. Considering that network hardware resources are limited, the defender must comprehensively weigh defense costs against benefits, take maximizing defense payoff as the goal, and reach a balance between risk and investment; by learning and updating payoffs online over a large number of identical attack-defense scenarios, the security administrator can escape the dilemma that "the optimal strategy is hard to choose" under moderate security conditions. Game theory fits the goal antagonism, non-cooperative relationship and strategy interdependence of network attack and defense very well. Existing game-theoretic defense decision-making methods can be divided into two classes, based respectively on the complete-rationality assumption and the bounded-rationality assumption. The first class assumes completely rational attack-defense participants. Complete rationality presupposes that every participant can rationally select the optimal strategy to maximize its own interest while predicting the strategy selection of the other participants. Applied to wireless sensor security, a non-cooperative game model between the attacker and trusted sensor nodes yields the optimal attack strategy from the Nash equilibrium, with which the efficiency of worm attacks and of defense strategies can be analyzed; a repeated game model between an intrusion detection system and wireless sensor nodes can be used to analyze the packet-forwarding strategies of the nodes. The second class assumes boundedly rational attack-defense participants. Bounded rationality means that neither side finds the optimal strategy at the very start; both sides learn during the attack-defense game, and a suitable learning mechanism wins the game. Such methods are mainly developed around evolutionary games, which take a population as the research object and complete learning by imitating the dominant strategies of other members through biological evolution mechanisms. In evolutionary games, too much information is exchanged among participants, and the research mainly concerns the adjustment process, trend and stability of collective attack-defense strategies, which is not conducive to guiding the real-time strategy selection of an individual member. How to adopt a better learning mechanism to simulate the attack-defense process and improve the accuracy and timeliness of defense decisions has become a technical problem to be solved urgently.
Summary of the invention
For this purpose, the present invention provides an intelligent defense decision-making method and device based on reinforcement learning and attack-defense game, which are suitable for real attack-defense network environments, realize intelligent defense decision-making with online learning capability, and have strong practicability and operability.
According to the design scheme provided by the present invention, an intelligent defense decision-making method based on reinforcement learning and attack-defense game comprises the following contents:
A) constructing an attack-defense game model under a bounded-rationality constraint, and generating an attack-defense graph for extracting the network states and attack-defense actions in the game model, the attack-defense graph being host-centric, its nodes extracting network states and its edges describing attack-defense actions;
B) based on the network states and attack-defense actions and relying on the attack-defense game model, performing reinforcement learning on the attack-defense game process and learning from system feedback during the confrontation between the two sides, so that the boundedly rational defender automatically selects the optimal defense strategy when facing different attackers.
In the above, in A), the attack-defense game model is represented by a six-tuple, i.e. AD-SGM = (N, S, D, R, Q, π), where N denotes the players participating in the game, S the stochastic game state set, D the defender's action set, R the defender's immediate return, Q the defender's state-action payoff function, and π the defender's defense strategy.
In the above, the attack-defense graph is represented by a two-tuple, i.e. G = (S, E), where S denotes the set of node security states and E denotes the edges by which attack or defense actions cause node state transitions.
Preferably, when generating the attack-defense graph, the target network is first scanned to obtain network security elements, which are then combined with attack templates for attack instantiation and with defense templates for defense instantiation, finally producing the attack-defense graph, wherein the state set of the attack-defense game model is extracted from the nodes of the graph and the defense action set is extracted from its edges.
In the above, in B), reinforcement learning adopts the model-free mechanism WoLF-PHC (Win or Learn Fast Policy Hill-Climbing): returns and environment-state-transition knowledge are obtained through interaction with the environment, the knowledge is represented by payoffs, high and low policy learning rates are set for the defender to adapt to the attacker's strategy, and the defender's optimal defense strategy is determined by updating the payoffs.
Preferably, the payoff is updated as Q_d(s,d) ← (1-α)·Q_d(s,d) + α·[R_d(s,d,s') + γ·max_{d'} Q_d(s',d')], and the reinforcement-learning strategy is updated by policy hill-climbing, π_d(s,d) ← π_d(s,d) + Δ_{sd}, where α is the payoff learning rate, γ is the discount factor, and R_d(s,d,s') denotes the defender's immediate return after defense action d is executed in state s and the network transitions to state s'.
Further, the average strategy is used as the criterion for winning and losing, expressed as: the defender is winning when Σ_d π_d(s,d)·Q_d(s,d) > Σ_d π̄_d(s,d)·Q_d(s,d) and losing otherwise, where the average strategy π̄_d is maintained by π̄_d(s,d) ← π̄_d(s,d) + (π_d(s,d) - π̄_d(s,d))/C(s) and C(s) ← C(s) + 1.
Further, the model-free reinforcement learning mechanism introduces an eligibility trace for tracking recently visited state-action trajectories; the current return is distributed to recently visited state-action pairs, and the payoff is updated using the eligibility trace.
Further, in reinforcement learning the eligibility trace of each state-action pair is defined as e(s,a); if the current network state is s*, the eligibility trace is updated as e(s,a) ← γλ·e(s,a) + 1 for the state-action pair just visited and e(s,a) ← γλ·e(s,a) for all other pairs, so that the current return is distributed to recently visited state-action pairs, where γ is the discount factor and λ is the trace decay factor.
Further, an intelligent defense decision-making device based on reinforcement learning and attack-defense game comprises:
an attack-defense graph generation module, for constructing the attack-defense game model under the bounded-rationality constraint and generating the attack-defense graph for extracting the network states and attack-defense actions in the game model, the attack-defense graph being host-centric, its nodes extracting network states and its edges describing attack-defense actions;
a defense strategy selection module, which, based on the network states and attack-defense actions and in combination with the attack-defense game model, performs reinforcement learning on the attack-defense game process and learns from environmental feedback during the confrontation between the two sides, so that the boundedly rational defender automatically selects the optimal defense strategy when facing different attackers.
Beneficial effects of the present invention:
The host-centric attack-defense graph model in the present invention is used to extract network states and attack-defense actions, effectively compressing the game state space. The defender uses a reinforcement learning mechanism and learns from environmental feedback while confronting the attacker, so that the boundedly rational defender can automatically make the optimal selection when facing different attackers. An eligibility trace is added to the decision-making device, which improves the defender's learning speed, reduces the dependence on historical data, and effectively improves the real-time performance and intelligence of the defender's decision-making.
Brief description of the drawings:
Fig. 1 is a schematic diagram of the intelligent defense decision-making process in an embodiment;
Fig. 2 is a schematic diagram of attack-defense state transitions in an embodiment;
Fig. 3 is a schematic diagram of the reinforcement learning mechanism in an embodiment;
Fig. 4 is the experimental network structure in an embodiment;
Fig. 5 is a schematic diagram of the network vulnerability information in an embodiment;
Fig. 6 is the attack graph in an embodiment;
Fig. 7 is the defense action graph in an embodiment;
Fig. 8 shows the defense action descriptions in an embodiment;
Fig. 9 shows the experimental parameter settings in an embodiment;
Fig. 10 shows the defense decision results in an embodiment;
Fig. 11 shows the defense payoff results in an embodiment.
Specific embodiment:
In order to make the object, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and technical solutions. The technical terms involved in the embodiments are as follows:
Reinforcement learning is a classic online learning method in which a participant learns independently from the feedback of the environment. Compared with biological-evolution style learning, its learning speed is fast, which matches the fast-changing and time-critical character of network attack and defense. The non-cooperative nature, goal antagonism and strategy interdependence of games match the essential characteristics of network attack and defense. Referring to Fig. 1, an embodiment of the present invention provides an intelligent defense decision-making method based on reinforcement learning and attack-defense game, comprising:
constructing an attack-defense game model under a bounded-rationality constraint, and generating an attack-defense graph for extracting the network states and attack-defense actions in the game model, the attack-defense graph being host-centric, its nodes extracting network states and its edges describing attack-defense actions;
performing reinforcement learning on the attack-defense game model based on the network states and attack-defense actions, and learning from system feedback during the confrontation between the two sides, so that the boundedly rational defender automatically selects the optimal defense strategy when facing different attackers.
Dynamic threat trace analysis based on attribute attack graphs has clear advantages in attack-path inference, threat transition probabilities, forward and backward inference, loop resolution, real-time analysis, multi-path synthesis, privilege escalation and access relationships.
The reinforcement learning mechanism is introduced into the attack-defense game: the attack-defense game model is constructed under the bounded-rationality constraint, and a host-centric attack-defense graph is generated for extracting the network states and attack-defense actions in the game model; online, real-time automated defense decision-making is then realized through reinforcement learning.
The network attack-defense game model describes the randomness of network state transitions with probability values. Since the current network state mainly depends on the previous network state, the state transition relation is represented as first-order Markov, as shown in Fig. 2; the transition probability is P(s_t, a_t, d_t, s_{t+1}), where s is the network state and (a, d) are the attack and defense actions. Because the two sides of network attack and defense have opposing goals and do not cooperate, both sides deliberately hide their key information, so the transition probabilities are treated as information unknown to both sides. The game model is constructed on this basis. In another embodiment of the present invention, the attack-defense stochastic game model (AD-SGM) is represented by a six-tuple AD-SGM = (N, S, D, R, Q, π), where N = (attacker, defender) are the two players participating in the game, representing the network attacker and defender respectively; S = (s_1, s_2, ..., s_n) is the stochastic game state set, composed of network states; D = (D_1, D_2, ..., D_n) is the defender's action set, where D_k = {d_1, d_2, ..., d_m} is the defender's action set in game state s_k; R_d(s_i, d, s_j) is the defender's immediate return after defense action d is executed in state s_i and the network transitions to state s_j; Q_d(s_i, d) denotes the defender's expected payoff after taking action d in state s_i; and π_d(s_k) is the defender's defense strategy in state s_k.
A defense strategy and a defense action are two different concepts: a defense strategy is a distribution over defense actions. The defense strategy defines, in the form of a probability vector, which action the defender selects in each network state; for example, π_d(s_k) = (π_d(s_k, d_1), ..., π_d(s_k, d_m)) is the defender's strategy in network state s_k, and π_d(s_k, d_m) is the probability of selecting action d_m, where the probabilities sum to 1.
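To make the AD-SGM structure concrete, the following minimal sketch (illustrative only; the class name, field names and helper methods are assumptions and not part of the patent) shows how the six-tuple and a per-state mixed strategy could be held in code:

import random
from dataclasses import dataclass, field

@dataclass
class ADSGM:
    """Minimal container for the attack-defense stochastic game model AD-SGM = (N, S, D, R, Q, pi)."""
    players: tuple = ("attacker", "defender")        # N: the two players
    states: list = field(default_factory=list)       # S: network states s_1..s_n
    actions: dict = field(default_factory=dict)      # D: state -> list of defender actions
    reward: dict = field(default_factory=dict)       # R: (s, d, s') -> immediate return R_d
    q: dict = field(default_factory=dict)            # Q: (s, d) -> expected payoff Q_d
    policy: dict = field(default_factory=dict)       # pi: (s, d) -> selection probability

    def init_uniform_policy(self):
        # Initialize pi_d(s_k) as the average (uniform) strategy and Q_d to 0, with no prior knowledge.
        for s in self.states:
            m = len(self.actions[s])
            for d in self.actions[s]:
                self.policy[(s, d)] = 1.0 / m
                self.q[(s, d)] = 0.0

    def sample_defense(self, s):
        # Draw a defense action from the mixed strategy pi_d(s).
        ds = self.actions[s]
        return random.choices(ds, weights=[self.policy[(s, d)] for d in ds], k=1)[0]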
A network attack-defense graph G is created; network states are extracted from the nodes of G and attack-defense actions are analyzed on its edges, for use in extracting attack and defense strategies. In another embodiment of the present invention, the attack-defense graph is represented as a two-tuple G = (S, E), where S = {s_1, s_2, ..., s_n} is the node security state set, s_i = <host, privilege>, host being the unique identifier of the node and privilege = {none, user, root} denoting, respectively, no privilege, normal user privilege and administrator privilege; E = (E_a, E_d) are the directed edges, indicating that an attack or defense action causes a node state transition, e_k = (s_r, v/d, s_d), k = a, d, where s_r is the source node and s_d is the destination node.
Further, when generating the attack-defense graph, the target network is first scanned to obtain network security elements, which are then combined with attack templates for attack instantiation and with defense templates for defense instantiation, finally producing the attack-defense graph. The state set of the attack-defense stochastic game model is extracted from the nodes of the graph and the defense action set from its edges. The specific steps can be designed as shown in Algorithm 1:
Algorithm 1. Attack-defense graph generation algorithm
Here, step 1) generates all possible state nodes from the network security elements and initializes the edges; steps 2)-11) perform attack instantiation and generate all attack edges; steps 12)-18) perform defense instantiation and generate all defense edges; steps 19)-23) remove all isolated nodes; and step 24) outputs the attack-defense graph. A sketch of this procedure is given below.
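The flow of Algorithm 1 can be sketched as follows; the scanner wrapper scan_network, the template interfaces (precondition, postcondition, mitigates) and the node/edge representation are hypothetical placeholders for illustration, not the patent's implementation:

def generate_attack_defense_graph(target_network, attack_templates, defense_templates):
    """Sketch of Algorithm 1: build the host-centric attack-defense graph G = (S, E)."""
    # Step 1: scan the target network for security elements and create all candidate
    # <host, privilege> state nodes with empty edge sets.
    elements = scan_network(target_network)                     # hypothetical scanner wrapper
    nodes = {(h, p) for h in elements.hosts for p in ("none", "user", "root")}
    attack_edges, defense_edges = set(), set()

    # Steps 2-11: attack instantiation; every template whose precondition holds
    # on a source node adds a directed attack edge to the node state it compromises.
    for tpl in attack_templates:
        for src in nodes:
            if tpl.precondition(src, elements):
                attack_edges.add((src, tpl.action, tpl.postcondition(src)))

    # Steps 12-18: defense instantiation; defense templates add edges that
    # revert or block the state transitions caused by the attack edges they mitigate.
    for tpl in defense_templates:
        for (src, act, dst) in attack_edges:
            if tpl.mitigates(act):
                defense_edges.add((dst, tpl.action, src))

    # Steps 19-23: remove isolated nodes touched by no attack or defense edge.
    touched = {n for e in attack_edges | defense_edges for n in (e[0], e[2])}
    nodes &= touched

    # Step 24: output the attack-defense graph.
    return nodes, (attack_edges, defense_edges)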
In the embodiment of the present invention, the reinforcement learning mechanism is introduced into the attack-defense game to describe the learning and improvement process of attack-defense strategies. WoLF-PHC is a typical model-free reinforcement learning algorithm, and its learning mechanism is shown in Fig. 3. In another embodiment of the present invention, the Agent in reinforcement learning obtains returns and environment-state-transition knowledge by interacting with the environment; the knowledge is represented by the payoff Q_d, and learning is performed by updating Q_d. The payoff function Q_d is:
Q_d(s,d) ← (1-α)·Q_d(s,d) + α·[R_d(s,d,s') + γ·max_{d'} Q_d(s',d')]   (1)
In formula (1), α is the payoff learning rate and γ is the discount factor. The reinforcement-learning strategy is updated by policy hill-climbing:
π_d(s,d) ← π_d(s,d) + Δ_{sd}   (2)
Δ_{sd} = -δ_{sd} if d ≠ argmax_{d'} Q_d(s,d'), and Δ_{sd} = Σ_{d''≠d} δ_{sd''} otherwise   (3)
δ_{sd} = min(π_d(s,d), δ/(|D_s| - 1))   (4)
Further, by introducing the WoLF ("Win or Learn Fast") mechanism, WoLF-PHC gives the defender two different policy learning rates: the low policy learning rate δ_w is used when winning and the high policy learning rate δ_l is used when losing, as shown in formula (5):
δ = δ_w if Σ_d π_d(s,d)·Q_d(s,d) > Σ_d π̄_d(s,d)·Q_d(s,d), and δ = δ_l otherwise   (5)
The two learning rates enable the defender to adapt quickly to the attacker's strategy when performing worse than expected and to learn cautiously when performing better than expected, while guaranteeing convergence. The WoLF-PHC algorithm uses the average strategy π̄_d as the criterion for winning and losing, as shown in formulas (6) and (7):
π̄_d(s,d) ← π̄_d(s,d) + (π_d(s,d) - π̄_d(s,d)) / C(s)   (6)
C(s) ← C(s) + 1   (7)
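A compact sketch of the updates in formulas (1)-(7) follows, reusing the ADSGM container from the earlier sketch; it reflects standard WoLF-PHC behavior and is illustrative rather than a verbatim transcription of the patent's algorithm:

def q_update(model, s, d, r, s_next, alpha, gamma):
    """Formula (1): Q-learning payoff update for the defender."""
    best_next = max(model.q[(s_next, a)] for a in model.actions[s_next])
    model.q[(s, d)] = (1 - alpha) * model.q[(s, d)] + alpha * (r + gamma * best_next)

def wolf_phc_policy_update(model, s, avg_policy, counts, delta_w, delta_l):
    """Formulas (2)-(7): policy hill-climbing with win-or-learn-fast learning rates."""
    acts = model.actions[s]
    if len(acts) < 2:
        return
    # Formulas (6)-(7): update the visit count C(s) and the average strategy.
    counts[s] = counts.get(s, 0) + 1
    for a in acts:
        avg_policy.setdefault((s, a), 1.0 / len(acts))
        avg_policy[(s, a)] += (model.policy[(s, a)] - avg_policy[(s, a)]) / counts[s]
    # Formula (5): winning (current strategy beats the average strategy) -> slow rate delta_w,
    # losing -> fast rate delta_l.
    cur = sum(model.policy[(s, a)] * model.q[(s, a)] for a in acts)
    avg = sum(avg_policy[(s, a)] * model.q[(s, a)] for a in acts)
    delta = delta_w if cur > avg else delta_l
    # Formulas (2)-(4): move probability mass toward the greedy action.
    greedy = max(acts, key=lambda a: model.q[(s, a)])
    for a in acts:
        if a != greedy:
            step = min(model.policy[(s, a)], delta / (len(acts) - 1))
            model.policy[(s, a)] -= step
            model.policy[(s, greedy)] += step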
In order to improve the learning speed of the WoLF-PHC algorithm and reduce its dependence on the amount of data, in another embodiment of the present invention an eligibility trace is introduced into WoLF-PHC. The eligibility trace tracks the particular state-action trajectory visited recently and then distributes the current return to the recently visited state-action pairs. Further, the eligibility trace of each state-action pair is defined as e(s,a); if the current network state is s*, the eligibility trace is updated as shown in formula (8), namely e(s,a) ← γλ·e(s,a) + 1 for the state-action pair visited in the current step and e(s,a) ← γλ·e(s,a) for all other pairs, where λ is the trace decay factor.
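The eligibility-trace bookkeeping of formula (8) can be sketched as follows; an accumulating trace combined with a TD-style payoff update is assumed here, mirroring common Q(λ) practice rather than quoting the patent's listing:

def update_with_eligibility(model, trace, s, d, r, s_next, alpha, gamma, lam):
    """Distribute the current return over recently visited state-action pairs via e(s,a)."""
    best_next = max(model.q[(s_next, a)] for a in model.actions[s_next])
    td_error = r + gamma * best_next - model.q[(s, d)]

    # Formula (8): decay all traces by gamma*lambda, then bump the pair just visited.
    for key in list(trace):
        trace[key] *= gamma * lam
    trace[(s, d)] = trace.get((s, d), 0.0) + 1.0

    # Credit the TD error to every recently visited state-action pair in proportion to its trace.
    for key, e in trace.items():
        model.q[key] += alpha * td_error * e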
For the WoLF-PHC-based defense decision-making method to obtain good results, the four parameters α, δ, λ and γ must be set reasonably. 1) The payoff learning rate α takes values in 0 < α < 1; the larger α is, the more weight later accumulated returns receive and the faster the learning, while a smaller α gives better algorithm stability. 2) The policy learning rate δ takes values in 0 < δ < 1; according to experiment, better results are obtained when δ_w and δ_l are taken in a suitable ratio. 3) The eligibility-trace decay factor λ takes values in 0 < λ < 1; it is responsible for distributing credit over state-action pairs and can be regarded as a time scale: the larger λ is, the more credit is assigned to historical state-action pairs. 4) The discount factor γ takes values in 0 < γ < 1 and represents the defender's preference between immediate and future returns: when γ is close to 0, future returns are unimportant and immediate returns are valued more; when γ is close to 1, immediate returns are unimportant and future returns are valued more.
As shown in Fig. 3, the Agent in WoLF-PHC corresponds to the defender in the attack-defense stochastic game model AD-SGM: the Agent's state corresponds to the game state in AD-SGM, the Agent's action corresponds to the defense action in AD-SGM, the Agent's immediate return corresponds to the immediate return in AD-SGM, and the Agent's policy corresponds to the defense strategy in AD-SGM. On this basis, the specific defense decision-making algorithm can be designed as shown in Algorithm 2:
Algorithm 2. Defense decision-making algorithm
Step 1) initializes the attack-defense stochastic game model AD-SGM and the relevant parameters, the network states and attack-defense actions being extracted by Algorithm 1; in step 2) the defender detects the current network state; steps 3)-22) perform defense decision-making and online learning, where steps 4)-5) select the defense action according to the current strategy, steps 6)-14) update the payoff Q_d using the eligibility trace, and steps 15)-21) update the defense strategy π_d from the new payoff Q_d using the hill-climbing algorithm. The space complexity of the algorithm is concentrated in the storage of R_d(s,d,s'), e(s,d), π_d(s,d), π̄_d(s,d) and Q_d(s,d); if |S| is the number of states and |D| the number of defense actions per state, the space complexity is O(4·|S|·|D| + |S|²·|D|). The algorithm does not need to solve a game equilibrium, which greatly reduces the computational complexity compared with existing stochastic game models and enhances the practicality of the algorithm. A sketch of the overall decision loop is given below.
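The outer loop of Algorithm 2 can then be sketched by combining the helpers above; observe_state and execute are hypothetical environment hooks standing in for network-state detection and defense execution:

def defense_decision_loop(model, env, alpha, gamma, lam, delta_w, delta_l, episodes=100000):
    """Sketch of Algorithm 2: online defense decision-making with WoLF-PHC and eligibility traces."""
    avg_policy, counts, trace = {}, {}, {}
    model.init_uniform_policy()                    # step 1: no prior knowledge, uniform pi_d, Q_d = 0
    for _ in range(episodes):
        s = env.observe_state()                    # step 2: detect the current network state
        d = model.sample_defense(s)                # steps 4-5: choose an action from the current strategy
        r, s_next = env.execute(d)                 # immediate return R_d and next state from feedback
        # Steps 6-14: trace-based payoff update (subsumes the plain q_update of formula (1)).
        update_with_eligibility(model, trace, s, d, r, s_next, alpha, gamma, lam)
        # Steps 15-21: hill-climbing policy update with win/lose learning rates.
        wolf_phc_policy_update(model, s, avg_policy, counts, delta_w, delta_l)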
Based on the above intelligent defense decision-making method, an embodiment of the present invention also provides an intelligent defense decision-making device based on reinforcement learning and attack-defense game, comprising:
an attack-defense graph generation module, for constructing the attack-defense game model under the bounded-rationality constraint and generating the attack-defense graph for extracting the network states and attack-defense actions in the game model, the attack-defense graph being host-centric, its nodes extracting network states and its edges describing attack-defense actions;
a defense strategy selection module, which, based on the network states and attack-defense actions and in combination with the attack-defense game model, performs reinforcement learning on the attack-defense game process and learns from environmental feedback during the confrontation between the two sides, so that the boundedly rational defender automatically selects the optimal defense strategy when facing different attackers.
The above intelligent defense decision-making method based on reinforcement learning and attack-defense game is used to intelligently select defense strategies for the target network.
To further verify the effectiveness of the technical solution in the embodiments of the present invention, an experiment is carried out on the typical enterprise network shown in Fig. 4. The attack-defense events occur in the intranet, and the attacker comes from the external network. The network administrator, as the defender, is responsible for the security of the intranet. Because of the settings of firewall 1 and firewall 2, normal external users can only access the Web server, while the Web server can access the database server, the FTP server and the e-mail server. The experimental network is scanned with the Nessus tool; the vulnerability information of the experimental network is shown in Fig. 5.
Attack and defense templates are constructed with reference to the MIT Lincoln Laboratory attack-defense behavior database. A identifies the attacker host, W the Web server, D the database server, F the FTP server and E the e-mail server. The network attack-defense graph is constructed with the attack-defense graph generation means; for ease of presentation, the attack-defense graph is split into an attack graph and a defense graph, shown in Fig. 6 and Fig. 7 respectively. The defense action descriptions of the defense graph are shown in Fig. 8. The attack-defense game model of the experimental scenario is constructed as follows:
1. N = (attacker, defender) are the players participating in the game, representing the network attacker and defender respectively;
2. The stochastic game state set is S = (s0, s1, s2, s3, s4, s5, s6); the stochastic game states are composed of network states and are extracted from the nodes of the attack-defense graph in Fig. 6 and Fig. 7;
3. The defender's action sets are D = (D0, D1, D2, D3, D4, D5, D6), where D0 = {NULL}, D1 = {d1, d2}, D2 = {d3, d4}, D3 = {d1, d5, d6}, D4 = {d1, d5, d6}, D5 = {d1, d2, d7}, D6 = {d3, d4}; they are extracted from the edges of the defense graph in Fig. 7;
4. The defender's immediate returns R_d(s_i, d, s_j) are quantified as:
(R_d(s0,NULL,s0), R_d(s0,NULL,s1), R_d(s0,NULL,s2)) = (0, -40, -59)
(R_d(s1,d1,s0), R_d(s1,d1,s1), R_d(s1,d1,s2); R_d(s1,d2,s0), R_d(s1,d2,s1), R_d(s1,d2,s2)) = (40, 0, -29; 5, -15, -32)
(R_d(s2,d3,s0), R_d(s2,d3,s1), R_d(s2,d3,s2), R_d(s2,d3,s3), R_d(s2,d3,s4), R_d(s2,d3,s5); R_d(s2,d4,s0), R_d(s2,d4,s1), R_d(s2,d4,s2), R_d(s2,d4,s3), R_d(s2,d4,s4), R_d(s2,d4,s5)) = (24, 9, -15, -55, -49, -65; 19, 5, -21, -61, -72, -68)
(R_d(s3,d1,s2), R_d(s3,d1,s3), R_d(s3,d1,s6); R_d(s3,d5,s2), R_d(s3,d5,s3), R_d(s3,d5,s6); R_d(s3,d6,s2), R_d(s3,d6,s3), R_d(s3,d6,s6)) = (21, -16, -72; 15, -23, -81; -21, -36, -81)
(R_d(s4,d1,s2), R_d(s4,d1,s4), R_d(s4,d1,s6); R_d(s4,d5,s2), R_d(s4,d5,s4), R_d(s4,d5,s6); R_d(s4,d6,s2), R_d(s4,d6,s4), R_d(s4,d6,s6)) = (26, 0, -62; 11, -23, -75; 9, -25, -87)
(R_d(s5,d1,s2), R_d(s5,d1,s5), R_d(s5,d1,s6); R_d(s5,d2,s2), R_d(s5,d2,s5), R_d(s5,d2,s6); R_d(s5,d7,s2), R_d(s5,d7,s5), R_d(s5,d7,s6)) = (29, 0, -63; 11, -21, -76; 2, -27, -88)
(R_d(s6,d3,s3), R_d(s6,d3,s4), R_d(s6,d3,s5), R_d(s6,d3,s6); R_d(s6,d4,s3), R_d(s6,d4,s4), R_d(s6,d4,s5), R_d(s6,d4,s6)) = (-23, -21, -19, -42; -28, -31, -24, -49)
5. To test the learning performance of the algorithm more comprehensively, the defender's state-action payoff Q_d(s_i, d) is uniformly initialized to 0, without introducing additional prior knowledge.
6. The defender's defense strategy π_d is initialized with the average strategy, i.e. π_d(s_k, d_1) = π_d(s_k, d_2) = ... = π_d(s_k, d_m) with the probabilities summing to 1, without introducing additional prior knowledge.
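As a usage illustration, the experimental game model above can be loaded into the ADSGM sketch as follows (the reward table is abbreviated to two entries; the values follow the quantified R_d list above):

states = ["s0", "s1", "s2", "s3", "s4", "s5", "s6"]
actions = {
    "s0": ["NULL"],
    "s1": ["d1", "d2"],
    "s2": ["d3", "d4"],
    "s3": ["d1", "d5", "d6"],
    "s4": ["d1", "d5", "d6"],
    "s5": ["d1", "d2", "d7"],
    "s6": ["d3", "d4"],
}
model = ADSGM(states=states, actions=actions)
model.reward[("s1", "d1", "s0")] = 40     # R_d(s1, d1, s0) from the quantified returns above
model.reward[("s1", "d1", "s2")] = -29    # R_d(s1, d1, s2); remaining entries are filled in the same way
model.init_uniform_policy()               # Q_d(s_i, d) = 0 and pi_d initialized to the average strategy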
The influence of different parameter settings on the algorithm is tested, taking state s2 in Fig. 6 and Fig. 7 as an example; the attacker's initial strategy in the experiment is a randomized strategy. Since different parameter values affect the learning speed and effect, a further test is carried out on six different parameter settings; the specific parameter settings are shown in Fig. 9.
The probabilities with which the defender selects defense actions d3 and d4 in state s2 are shown in Fig. 10, from which the learning speed and convergence of the algorithm under the different parameter settings can be observed. Fig. 10 shows that the learning speed of settings 1, 3 and 6 is faster: under these three settings the algorithm obtains the optimal strategy within 1500 learning rounds, but the convergence of settings 3 and 6 is poorer. Although settings 3 and 6 can learn the optimal strategy, they oscillate afterwards and are not as stable as setting 1.
The defense payoff reflects how well the algorithm optimizes the strategy. To ensure that the payoff value does not merely reflect the result of a single defense, the average of every 1000 defense payoffs is taken; the variation of these averages is shown in Fig. 11. Fig. 11 shows that the payoff of setting 3 is clearly lower than that of the other settings, while the other settings are hard to distinguish from one another. Therefore, setting 1 among the six parameter groups is best suited to this scenario.
The computational overhead introduced by the eligibility trace is also tested: the time taken by the algorithm to make 100,000 defense decisions is measured 20 times with and without the eligibility trace, and the averages over the 20 runs are 9.51 s with the eligibility trace and 3.74 s without it. Although introducing the eligibility trace increases the decision time by nearly 2.5 times, 100,000 decisions still take only 9.51 s after its introduction, which satisfies the real-time requirement.
The above experiments further demonstrate that the present invention constructs an attack-defense stochastic game model under the bounded-rationality constraint and generates a network attack-defense graph for extracting network states and attack-defense strategies, effectively compressing the game state space; through learning, the defender can obtain the optimal defense strategy against the current attack, improving the rapid automated defense capability against unknown attacks, with strong practicability and operability.
Each embodiment in this specification is described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may refer to each other. As the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively simple, and the relevant parts may refer to the description of the method.
The units and method steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. In order to clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are executed in hardware or software depends on the specific application and the design constraints of the technical solution. Those of ordinary skill in the art may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the present invention.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method can be completed by a program instructing related hardware, and the program can be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk or an optical disk. Optionally, all or part of the steps of the above embodiments can also be implemented using one or more integrated circuits; correspondingly, each module/unit in the above embodiments can be implemented in the form of hardware or in the form of a software functional module. The present invention is not limited to any particular form of combination of hardware and software.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An intelligent defense decision-making method based on reinforcement learning and attack-defense game, characterized by comprising the following contents:
A) constructing an attack-defense game model under a bounded-rationality constraint, and generating an attack-defense graph for extracting the network states and attack-defense actions in the game model, the attack-defense graph being host-centric, its nodes extracting network states and its edges describing attack-defense actions;
B) based on the network states and attack-defense actions and in combination with the attack-defense game model, performing reinforcement learning on the attack-defense game process and learning from environmental feedback during the confrontation between the two sides, so that the boundedly rational defender automatically selects the optimal defense strategy when facing different attackers.
2. The intelligent defense decision-making method based on reinforcement learning and attack-defense game according to claim 1, characterized in that in A) the attack-defense game model is represented by a six-tuple, i.e. AD-SGM = (N, S, D, R, Q, π), where N denotes the players participating in the game, S the stochastic game state set, D the defender's action set, R the defender's immediate return, Q the defender's state-action payoff function, and π the defender's defense strategy.
3. The intelligent defense decision-making method based on reinforcement learning and attack-defense game according to claim 1, characterized in that the attack-defense graph is represented by a two-tuple, i.e. G = (S, E), where S denotes the set of network node security states and E denotes the edges by which attack or defense actions cause node state transitions.
4. The intelligent defense decision-making method based on reinforcement learning and attack-defense game according to claim 3, characterized in that, when generating the attack-defense graph, the target network is first scanned to obtain network security elements, which are then combined with attack templates for attack instantiation and with defense templates for defense instantiation, finally producing the attack-defense graph, wherein the state set of the attack-defense game model is extracted from the nodes of the graph and the defense action set is extracted from its edges.
5. The intelligent defense decision-making method based on reinforcement learning and attack-defense game according to claim 1, characterized in that in B) reinforcement learning adopts the model-free mechanism WoLF-PHC (Win or Learn Fast Policy Hill-Climbing), returns and environment-state-transition knowledge are obtained through interaction with the environment, the knowledge is represented by payoffs, high and low policy learning rates are set for the defender to adapt to different attackers' strategies, and the payoff update process uses the reinforcement learning mechanism to determine the defender's optimal defense strategy.
6. The intelligent defense decision-making method based on reinforcement learning and attack-defense game according to claim 5, characterized in that the payoff is updated as Q_d(s,d) ← (1-α)·Q_d(s,d) + α·[R_d(s,d,s') + γ·max_{d'} Q_d(s',d')], and the reinforcement-learning strategy is updated by policy hill-climbing, π_d(s,d) ← π_d(s,d) + Δ_{sd}, where α is the payoff learning rate, γ is the discount factor, and R_d(s,d,s') denotes the defender's immediate return after defense action d is executed in state s and the network transitions to state s'.
7. The intelligent defense decision-making method based on reinforcement learning and attack-defense game according to claim 6, characterized in that the average strategy is used as the criterion for winning and losing, expressed as: the defender is winning when Σ_d π_d(s,d)·Q_d(s,d) > Σ_d π̄_d(s,d)·Q_d(s,d) and losing otherwise, where the average strategy π̄_d is maintained by π̄_d(s,d) ← π̄_d(s,d) + (π_d(s,d) - π̄_d(s,d))/C(s) and C(s) ← C(s) + 1.
8. The intelligent defense decision-making method based on reinforcement learning and attack-defense game according to claim 6, characterized in that the model-free reinforcement learning mechanism introduces an eligibility trace for tracking recently visited state-action trajectories, distributes the current return to recently visited state-action pairs, and updates the payoff using the eligibility trace.
9. The intelligent defense decision-making method based on reinforcement learning and attack-defense game according to claim 8, characterized in that in reinforcement learning the eligibility trace of each state-action pair is defined as e(s,a); if the current network state is s*, the eligibility trace is updated as e(s,a) ← γλ·e(s,a) + 1 for the state-action pair just visited and e(s,a) ← γλ·e(s,a) for all other pairs, so that the current return is distributed to recently visited state-action pairs, where γ is the discount factor and λ is the trace decay factor.
10. An intelligent defense decision-making device based on reinforcement learning and attack-defense game, characterized by comprising:
an attack-defense graph generation module, for constructing an attack-defense game model under a bounded-rationality constraint and generating an attack-defense graph for extracting the network states and attack-defense actions in the game model, the attack-defense graph being host-centric, its nodes extracting network states and its edges describing attack-defense actions;
a defense strategy selection module, which, based on the network states and attack-defense actions and in combination with the attack-defense game model, performs reinforcement learning on the attack-defense game process and learns from environmental feedback during the confrontation between the two sides, so that the boundedly rational defender automatically selects the optimal defense strategy when facing different attackers.
CN201910292304.2A 2019-04-12 2019-04-12 Intelligent defense decision-making method and device based on reinforcement learning and attack and defense game Active CN110166428B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910292304.2A CN110166428B (en) 2019-04-12 2019-04-12 Intelligent defense decision-making method and device based on reinforcement learning and attack and defense game

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910292304.2A CN110166428B (en) 2019-04-12 2019-04-12 Intelligent defense decision-making method and device based on reinforcement learning and attack and defense game

Publications (2)

Publication Number Publication Date
CN110166428A (en) 2019-08-23
CN110166428B CN110166428B (en) 2021-05-07

Family

ID=67639176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910292304.2A Active CN110166428B (en) 2019-04-12 2019-04-12 Intelligent defense decision-making method and device based on reinforcement learning and attack and defense game

Country Status (1)

Country Link
CN (1) CN110166428B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110659492A (en) * 2019-09-24 2020-01-07 北京信息科技大学 Multi-agent reinforcement learning-based malicious software detection method and device
CN111988415A (en) * 2020-08-26 2020-11-24 绍兴文理学院 Mobile sensing equipment calculation task safety unloading method based on fuzzy game
CN112221160A (en) * 2020-10-22 2021-01-15 厦门渊亭信息科技有限公司 Role distribution system based on random game
CN113132398A (en) * 2021-04-23 2021-07-16 中国石油大学(华东) Array honeypot system defense strategy prediction method based on Q learning
CN113810406A (en) * 2021-09-15 2021-12-17 浙江工业大学 Network space security defense method based on dynamic defense graph and reinforcement learning
CN114844668A (en) * 2022-03-17 2022-08-02 清华大学 Defense resource configuration method, device, equipment and readable medium
CN115296850A (en) * 2022-07-08 2022-11-04 中电信数智科技有限公司 Network attack and defense exercise distributed learning method based on artificial intelligence
CN115348064A (en) * 2022-07-28 2022-11-15 南京邮电大学 Power distribution network defense strategy design method based on dynamic game under network attack
CN116032653A (en) * 2023-02-03 2023-04-28 中国海洋大学 Method, device, equipment and storage medium for constructing network security game strategy
CN116708042A (en) * 2023-08-08 2023-09-05 中国科学技术大学 Strategy space exploration method for network defense game decision

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014100738A1 (en) * 2012-12-21 2014-06-26 InsideSales.com, Inc. Instance weighted learning machine learning model
CN104994569A (en) * 2015-06-25 2015-10-21 厦门大学 Multi-user reinforcement learning-based cognitive wireless network anti-hostile interference method
CN107135224A (en) * 2017-05-12 2017-09-05 中国人民解放军信息工程大学 Cyber-defence strategy choosing method and its device based on Markov evolutionary Games
CN108512837A (en) * 2018-03-16 2018-09-07 西安电子科技大学 A kind of method and system of the networks security situation assessment based on attacking and defending evolutionary Game
CN108809979A (en) * 2018-06-11 2018-11-13 中国人民解放军战略支援部队信息工程大学 Automatic intrusion response decision-making technique based on Q-learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014100738A1 (en) * 2012-12-21 2014-06-26 InsideSales.com, Inc. Instance weighted learning machine learning model
CN104994569A (en) * 2015-06-25 2015-10-21 厦门大学 Multi-user reinforcement learning-based cognitive wireless network anti-hostile interference method
CN107135224A (en) * 2017-05-12 2017-09-05 中国人民解放军信息工程大学 Cyber-defence strategy choosing method and its device based on Markov evolutionary Games
CN108512837A (en) * 2018-03-16 2018-09-07 西安电子科技大学 A kind of method and system of the networks security situation assessment based on attacking and defending evolutionary Game
CN108809979A (en) * 2018-06-11 2018-11-13 中国人民解放军战略支援部队信息工程大学 Automatic intrusion response decision-making technique based on Q-learning

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110659492A (en) * 2019-09-24 2020-01-07 北京信息科技大学 Multi-agent reinforcement learning-based malicious software detection method and device
CN110659492B (en) * 2019-09-24 2021-10-15 北京信息科技大学 Multi-agent reinforcement learning-based malicious software detection method and device
CN111988415A (en) * 2020-08-26 2020-11-24 绍兴文理学院 Mobile sensing equipment calculation task safety unloading method based on fuzzy game
CN111988415B (en) * 2020-08-26 2021-04-02 绍兴文理学院 Mobile sensing equipment calculation task safety unloading method based on fuzzy game
CN112221160A (en) * 2020-10-22 2021-01-15 厦门渊亭信息科技有限公司 Role distribution system based on random game
CN113132398A (en) * 2021-04-23 2021-07-16 中国石油大学(华东) Array honeypot system defense strategy prediction method based on Q learning
CN113810406A (en) * 2021-09-15 2021-12-17 浙江工业大学 Network space security defense method based on dynamic defense graph and reinforcement learning
CN114844668A (en) * 2022-03-17 2022-08-02 清华大学 Defense resource configuration method, device, equipment and readable medium
CN115296850A (en) * 2022-07-08 2022-11-04 中电信数智科技有限公司 Network attack and defense exercise distributed learning method based on artificial intelligence
CN115348064A (en) * 2022-07-28 2022-11-15 南京邮电大学 Power distribution network defense strategy design method based on dynamic game under network attack
CN115348064B (en) * 2022-07-28 2023-09-26 南京邮电大学 Dynamic game-based power distribution network defense strategy design method under network attack
CN116032653A (en) * 2023-02-03 2023-04-28 中国海洋大学 Method, device, equipment and storage medium for constructing network security game strategy
CN116708042A (en) * 2023-08-08 2023-09-05 中国科学技术大学 Strategy space exploration method for network defense game decision
CN116708042B (en) * 2023-08-08 2023-11-17 中国科学技术大学 Strategy space exploration method for network defense game decision

Also Published As

Publication number Publication date
CN110166428B (en) 2021-05-07

Similar Documents

Publication Publication Date Title
CN110166428A (en) Intelligent defense decision-making method and device based on reinforcement learning and attack and defense game
CN111966698B (en) Block chain-based trusted federation learning method, system, device and medium
Zhang et al. Gan enhanced membership inference: A passive local attack in federated learning
CN108833401A (en) Network active defensive strategy choosing method and device based on Bayes's evolutionary Game
CN108833402A (en) A kind of optimal defence policies choosing method of network based on game of bounded rationality theory and device
CN107566387B (en) Network defense action decision method based on attack and defense evolution game analysis
CN110191083A (en) Safety defense method, device and the electronic equipment threatened towards advanced duration
CN107135224A (en) Cyber-defence strategy choosing method and its device based on Markov evolutionary Games
CN110300106A (en) Mobile target based on Markov time game defends decision choosing method, apparatus and system
CN110035066B (en) Attack and defense behavior quantitative evaluation method and system based on game theory
CN110460572A (en) Mobile target defence policies choosing method and equipment based on Markov signaling games
CN109327427A (en) A kind of dynamic network variation decision-making technique and its system in face of unknown threat
CN107483486A (en) Cyber-defence strategy choosing method based on random evolution betting model
Guo et al. Adversarial policy learning in two-player competitive games
CN107070956A (en) APT Attack Prediction methods based on dynamic bayesian game
CN110417733B (en) Attack prediction method, device and system based on QBD attack and defense random evolution game model
CN109589607A (en) A kind of game anti-cheating method and game anti-cheating system based on block chain
CN110099045A (en) Network security threats method for early warning and device based on qualitative differential game and evolutionary Game
CN108696534A (en) Real-time network security threat early warning analysis method and its device
Xenopoulos et al. Graph neural networks to predict sports outcomes
Keegan et al. Sic transit gloria mundi virtuali? Promise and peril in the computational social science of clandestine organizing
He et al. Group password strength meter based on attention mechanism
Han et al. Multiresolution tensor decomposition for multiple spatial passing networks
Yang et al. Designing better strategies against human adversaries in network security games.
Moskal et al. Simulating attack behaviors in enterprise networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant