CN110049497A

CN110049497A - A user-oriented intelligent attack defense method in mobile fog computing

Info

Publication number: CN110049497A
Application number: CN201910287756.1A
Authority: CN
Inventors: Tu Shanshan (涂山山); Meng Yuan (孟远); Yu Jinliang (于金亮)
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2019-04-11
Filing date: 2019-04-11
Publication date: 2019-07-23
Anticipated expiration: 2039-04-11
Also published as: CN110049497B

Abstract

The invention is a user-oriented intelligent attack defense method in mobile fog computing. It relates to computer networks and wireless communication, and also to cyberspace security. The invention uses Prospect Theory (PT) and the Double Q-learning (DQL) algorithm to provide user-centric, subjective defense against intelligent attacks in mobile fog computing environments. While a fog node communicates with legitimate end users, a malicious user can exploit an open wireless access platform to launch intelligent attacks in different modes, such as spoofing, jamming and eavesdropping, which threaten secure communication between fog nodes and mobile users in the fog computing network. Defending against intelligent attacks with PT and DQL mitigates the Q-value overestimation problem of the Q-learning algorithm and generates the legitimate user's optimal defense policy. This increases the legitimate user's detection utility in a dynamic environment, reduces the intelligent attacker's subjective attack probability, and strengthens the security protection of mobile fog computing networks.

Description

A user-oriented intelligent attack defense method in mobile fog computing

Technical field

The invention uses Prospect Theory (PT) and the Double Q-learning (DQL) algorithm to provide user-centric, subjective defense against intelligent attacks in mobile fog computing environments. While a fog node communicates with legitimate end users, a malicious user can exploit an open wireless access platform to launch intelligent attacks in different modes, such as spoofing, jamming and eavesdropping, which threaten secure communication between fog nodes and mobile users in the fog computing network. Defending against intelligent attacks with PT and DQL mitigates the Q-value overestimation problem of the Q-learning algorithm and generates the legitimate user's optimal defense policy. This increases the legitimate user's detection utility in a dynamic environment, reduces the intelligent attacker's subjective attack probability, and strengthens the security protection of mobile fog computing networks. Because it counters intelligent attacks with reinforcement learning and game theory, the invention relates to computer networks and wireless communication, and also to cyberspace security.

Background art

With the continued spread of Internet of Things (IoT) technology and mobile smart terminals, the large volumes of interactive data they generate must be processed in real time in wireless network environments, and traditional cloud computing cannot effectively meet requirements such as heterogeneity and low latency. Fog computing extends cloud computing to the network edge. It can use direct device-to-device links to improve system throughput, and it addresses cloud computing's poor mobility support, weak geographic awareness and high latency. Fog computing architectures in mobile fog computing networks fall into two broad classes, cloud-fog-device and fog-device. The fog layer contains devices close to the IoT edge, called fog nodes. Supported by wireless networks, fog nodes and terminal devices in a mobile fog computing network can exchange data.

However, because wireless networks are vulnerable to security threats, many data and communication security problems remain between the fog layer and the user layer. In a mobile fog computing environment, an illegitimate end user can launch intelligent attacks against legitimate users: it obtains radio channel state information and defense policy information, then selects a suitable attack mode to compromise the wireless network security between the fog layer and the user layer. Common attack modes include spoofing, jamming and eavesdropping, and an end user that launches intelligent attacks (an intelligent attacker) can subjectively use these means to pose a serious threat to fog computing networks. Game theory is a powerful tool for handling such threats and safeguarding fog computing network security. In fog computing security research, game participants are usually assumed to be rational: their utilities are computed with Expected Utility Theory (EUT), and each participant chooses every action to maximize its expected utility. In a dynamic wireless network, however, participants know neither the full network state nor the accuracy of the information they receive, so their choice of strategy against intelligent attacks is strongly subjective and deviates from what EUT predicts. PT describes how people take on different risk preferences when facing gains and losses: they are risk-averse when facing gains and risk-seeking when facing losses. PT computes the utilities of game participants with subjective probabilities and can therefore capture the subjectivity of decision makers.

Therefore, based on prospect theory, the invention proposes a DQL-based intelligent attack defense method for mobile fog computing. The method builds a static subjective zero-sum game between intelligent attackers and legitimate users and derives its Nash equilibria. For dynamic environments, it uses the DQL algorithm to suppress the intelligent attacker's subjective attack motivation and to generate the legitimate user's optimal defense policy, letting the legitimate user decide subjectively whether physical-layer security techniques alone suffice against intelligent attacks. The method increases the legitimate user's detection utility and effectively reduces the attack rate. Compared with defenses against intelligent attacks based on the Q-learning algorithm, the Sarsa algorithm and the greedy strategy, it provides higher security protection in mobile fog computing environments.

Summary of the invention

The invention provides a DQL-based intelligent attack defense method for mobile fog computing. It designs a security model of the intelligent attackers involved in mobile fog computing and, based on prospect theory, builds both a static subjective zero-sum game between intelligent attackers and legitimate users and a DQL-based dynamic subjective game. Defending against attacks with this method optimizes the legitimate user's defense policy, increases the legitimate user's detection utility, reduces the attack rate, and strengthens the security and protection performance of mobile fog computing networks.

The invention adopts the following technical solution and implementation steps:

1. Intelligent attack security model in mobile fog computing

The security model of the invention considers communication between the fog layer and the user layer, that is, between fog nodes and end users, as shown in Fig. 1. Any intelligent attacker in the mobile fog computing network is an end user with subjectivity and may launch intelligent attacks against other legitimate end users. The set of intelligent attackers is $M = \{1, 2, \dots, m\}$, $m \in M$; the attack mode of attacker $m$ at time $t$ is denoted $SA_m^t$, and the set of time instants is $T = \{0, 1, 2, \dots, t\}$, $t \in T$. Likewise, the set of legitimate users is $N = \{1, 2, \dots, n\}$, $n \in N$, and the defense mode of user $n$ at time $t$ is denoted $EU_n^t$. Suppose that at some time $t$ intelligent attacker $m$ uses an intelligent programmable wireless device to launch an attack in mode $SA_m^t$ against a legitimate user served by the same fog node. When $SA_m^t = 0$, the attacker stops attacking. When $SA_m^t = 1$, the attacker jams the legitimate user by transmitting interference signals, lowering the SINR of the signal the user receives from the fog node. When $SA_m^t = 2$, the attacker eavesdrops, intercepting the information exchanged between the fog node and the legitimate user. When $SA_m^t = 3$, the attacker impersonates the fog node with a forged media access control address (Media Access Control Address, MAC-A) and sends data to the legitimate user, i.e., it launches a spoofing attack. When $SA_m^t = 4$, the attacker launches a replay attack, resending data packets that the legitimate user has already received in order to deceive it.

The legitimate user $n$ under attack has two defense modes against the different attack types. When $EU_n^t = 1$, the user relies only on physical-layer security (PLS) to resist the intelligent attack; this is called the basic mode. When $EU_n^t = 2$, the user spends more overhead: it first applies channel-parameter-based PLS techniques for preliminary detection, filtering and anti-eavesdropping, and then uses higher-layer security mechanisms (HLSM) to check the data that has passed physical-layer authentication.
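For readers who want to simulate the model, the two strategy spaces can be written down directly. This is a minimal sketch that assumes the numbering implied by the text (0 = stop and 3 = spoofing, with the remaining modes numbered in the order listed); the class and constant names are hypothetical, not taken from the patent.

```python
from enum import IntEnum


class AttackMode(IntEnum):
    """Attack modes SA of an intelligent attacker (numbering inferred from the text)."""
    STOP = 0       # halts the attack
    JAMMING = 1    # sends interference to lower the user's SINR
    EAVESDROP = 2  # intercepts fog-node <-> user traffic
    SPOOFING = 3   # forges a MAC address to impersonate the fog node
    REPLAY = 4     # resends packets the user has already received


class DefenseMode(IntEnum):
    """Defense modes EU of a legitimate user."""
    PLS_ONLY = 1      # basic mode: physical-layer security only
    PLS_AND_HLSM = 2  # PLS screening, then higher-layer checks (more overhead)


# Every (attack, defense) pair the static game has to score: 5 x 2 = 10 profiles.
STRATEGY_PROFILES = [(a, d) for a in AttackMode for d in DefenseMode]
```

Encoding the modes as integers keeps them usable as Q-table indices in a later learning step.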

2. A subjective intelligent attack defense method based on the DQL algorithm

The method includes the following steps:

(1) Establish, based on PT, a subjective static zero-sum game between the intelligent attacker and the legitimate user. The attack mode is denoted $SA_m$, the number of attack modes is denoted Num (Num ≥ 1), and the defense mode is denoted $EU_n$. Under PT, the intelligent attacker and the legitimate user play the game with subjective decisions and reach a Nash equilibrium. The invention computes subjective probabilities with the Prelec probability weighting function:

$$w_{object}(p) = \exp\left[-(-\ln p)^{\sigma_{object}}\right] \qquad (1)$$

where $p \in (0, 1]$ is the objective probability and $\sigma_{object} \in (0, 1]$ is the objective probability weight. The subscript "object" denotes the game participant, i.e., object = attacker or object = user. The Prelec probability weighting function describes how, under the influence of the weight, a participant in the subjective game distorts the objective probability on which its decision is based. Following PT, a subjective decision maker underweights the objective probability of high-probability events and overweights that of low-probability events. In the zero-sum game, the legitimate user obtains a gain when it detects intelligent attack $SA_m$ under defense mode $EU_n$, and suffers a security loss if it fails to detect the attack. Every defense mode has a false-alarm rate and a miss rate: the false-alarm rate is the probability that valid data sent by a legitimate node is classified as illegitimate, and the miss rate is the probability that illegitimate data is classified as valid. These errors are what cause security losses, so combining the two rates gives the legitimate user's error rate in detecting intelligent attack $SA_m$ under defense mode $EU_n$. According to the system model, the intelligent attacker has 5 attack modes and the legitimate user has 2 defense modes. The utilities of the intelligent attacker and the legitimate user are as follows:

where $U_{user}(SA_m, EU_n)$ is the legitimate user's utility. The detection error rate is quantized into C non-zero levels, each level occurs with its own probability, and the probabilities of all quantization levels sum to 1.
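A minimal Python sketch of the weighting step, assuming formula (1) is the one-parameter Prelec function w(p) = exp(-(-ln p)^σ). With σ = 1 the weight equals the objective probability (the EUT case); with σ < 1 small probabilities are overweighted and large ones underweighted, and p = 1/e is the fixed point. The helper `expected_utility` and any per-level utilities passed to it are hypothetical: the patent's utility formula (2) is not reproduced here, so the sketch only shows how the objective probabilities of the C quantized error-rate levels would be replaced by weighted ones. Note that PT weights need not sum to 1.

```python
import math


def prelec_weight(p, sigma):
    """Prelec weighting w(p) = exp(-(-ln p) ** sigma) for p in (0, 1], sigma in (0, 1]."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    if not 0 < sigma <= 1:
        raise ValueError("sigma must lie in (0, 1]")
    return math.exp(-((-math.log(p)) ** sigma))


def expected_utility(probs, utils, sigma=1.0):
    """Sum of w(p_c) * u_c over quantization levels c.

    sigma = 1 reproduces the EUT expectation; sigma < 1 gives a PT-style
    subjective value in which each objective probability is distorted.
    """
    return sum(prelec_weight(p, sigma) * u for p, u in zip(probs, utils))
```

For example, with σ = 0.5 a 1% event is weighted at roughly 0.12, while a 90% event is weighted at roughly 0.72.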

According to formula (2), when both sides of the game compute utility under EUT, the formula is:

where formula (3) gives the legitimate user's utility computed under EUT. When both sides compute utility under PT, they decide on the basis of subjective probabilities rather than the objective average detection error rate. Therefore, according to formulas (1) and (2), the two sides' utilities are, respectively:

During subjective game, game both sides change subjective probability by adjusting objective weight, pursue respective effectiveness Maximum value, reach Nash Equilibrium.When intelligent attacker think by he current time initiate intelligence attack can be legal User detected, then he can select to halt attacks.When think can using the Security mechanism of higher by legitimate user When obtaining more multi-purpose, he can enable EU_n=1.The strategy combination of Nash Equilibrium is represented as hereinThe plan Slightly combination is the combination for making game both sides obtain maximum utility, it should meet following condition:

Therefore, from formulas (4), (5), (6) and (7), this step summarizes the Nash equilibrium conditions of the subjective static zero-sum game with respect to spoofing attacks. They cover the cases in which the intelligent attacker either stops attacking ($SA_m = 0$) or launches a spoofing attack ($SA_m = 3$), and explain why each of the legitimate user's two defense modes can then form a Nash equilibrium.

1. When conditions (8) and (9) hold, the Nash equilibrium strategy profile is (0, 1).

2. When conditions (10) and (11) hold, the Nash equilibrium strategy profile is (0, 2).

3. When conditions (12) and (13) hold, the Nash equilibrium strategy profile is (3, 1).

4. When conditions (14) and (15) hold, the Nash equilibrium strategy profile is (3, 2).
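Since conditions (8) to (15) are not reproduced here, a generic check can still show what they test. The sketch below is a pure-strategy Nash equilibrium search for a two-player game: a profile (a, d) qualifies when the defense d is a best response to a under the user's utility and the attack a is a best response to d under the attacker's utility. It takes both utility tables separately because PT-weighted utilities need not be exact negatives of each other. The function name and the numbers in the example are hypothetical.

```python
def pure_nash_equilibria(u_att, u_user, attack_modes, defense_modes):
    """Return all pure profiles (a, d) where no side gains by deviating alone.

    u_att[a][d]  -- attacker's (subjective) utility for profile (a, d)
    u_user[a][d] -- legitimate user's (subjective) utility for profile (a, d)
    """
    equilibria = []
    for a in attack_modes:
        for d in defense_modes:
            # The user cannot do better by switching defense mode against a.
            user_best = all(u_user[a][d] >= u_user[a][d2] for d2 in defense_modes)
            # The attacker cannot do better by switching attack mode against d.
            att_best = all(u_att[a][d] >= u_att[a2][d] for a2 in attack_modes)
            if user_best and att_best:
                equilibria.append((a, d))
    return equilibria


# Hypothetical zero-sum example restricted to "stop" (0) and "spoofing" (3).
u_user = {0: {1: 0.0, 2: -0.2}, 3: {1: 0.3, 2: 0.6}}
u_att = {a: {d: -v for d, v in row.items()} for a, row in u_user.items()}
print(pure_nash_equilibria(u_att, u_user, (0, 3), (1, 2)))
```

With these made-up numbers spoofing is always detected profitably, so the attacker stops and the user keeps the cheaper basic mode, giving the single equilibrium (0, 1).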

(2) Construct a dynamic subjective game and obtain the optimal defense policy against intelligent attacks with the DQL algorithm. The malicious user and the legitimate user play the static subjective game, using utility as the measure of decision outcomes. In a dynamic mobile fog computing network, however, neither participant understands the overall network environment, and the legitimate user cannot determine the error rate of attack detection. The two sides therefore interact repeatedly, so that the legitimate user learns to select a suitable defense mode against the intelligent attacks launched by the malicious user, increasing its utility and reducing the attack rate. Among reinforcement learning methods, the Q-learning algorithm obtains an optimal policy in dynamic environments where information is insufficient. Rooted in behaviorist learning theory, it uses a Q value to evaluate how good it is for an agent to take a given action in a given state, and it has two key parameters: the learning rate and the discount factor. The larger the learning rate, the less of the previously learned value is retained; the larger the discount factor, the more weight future rewards receive. Because Q-learning uses a max operator when estimating future rewards, it tends to overestimate Q values. This step therefore uses the DQL algorithm to realize the dynamic subjective game between the malicious user and the legitimate user and to obtain the legitimate user's optimal defense policy, where the attack mode is denoted $SA_m^t$ and the defense mode is denoted $EU_n^t$.
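The overestimation caused by the max operator can be seen in a few lines of Python. This sketch assumes ten actions whose true values are all 0 and whose estimates carry unit Gaussian noise. Taking the max of one noisy table, as Q-learning does, is biased upward (the expected maximum of ten standard normals is about 1.5), while selecting the action with one table and evaluating it with an independent second table, the idea behind Double Q-learning, is unbiased. The function name and numbers are illustrative only.

```python
import random
import statistics


def max_bias_demo(n_actions=10, n_trials=2000, seed=1):
    """Compare the single-max estimator with the double estimator (all true values are 0)."""
    rng = random.Random(seed)
    single, double = [], []
    for _ in range(n_trials):
        est_a = [rng.gauss(0.0, 1.0) for _ in range(n_actions)]  # noisy table A
        est_b = [rng.gauss(0.0, 1.0) for _ in range(n_actions)]  # independent table B
        # Q-learning style: select and evaluate with the same noisy table.
        single.append(max(est_a))
        # Double Q-learning style: select with A, evaluate with B.
        best = max(range(n_actions), key=est_a.__getitem__)
        double.append(est_b[best])
    return statistics.mean(single), statistics.mean(double)


single_mean, double_mean = max_bias_demo()
print(f"single estimator: {single_mean:.2f}, double estimator: {double_mean:.2f}")
```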

The DQL algorithm maintains two Q tables, each storing the return of taking every action in every state, and updates them alternately. Here, the state is the attack mode chosen by the intelligent attacker in the time slot preceding time t, and the action is the defense mode chosen by the legitimate user at time t. The two Q tables are updated as follows:

where $s^t$ is the system state at time t, $\mu \in (0, 1]$ is the discount factor, and $\delta \in [0, 1]$ is the learning rate. The entry of table $Q_1$ for state $s^t$ and defense mode $EU_n^t$ is the return of the legitimate user taking that defense mode in that state, and the immediate utility term is the utility the legitimate user obtains by taking defense mode $EU_n^t$ in state $s^t$. The two maximizing defense modes used in the updates are the modes that maximize the Q value in state $s^{t+1}$ in tables $Q_1$ and $Q_2$, respectively; they are computed as follows:

$V(s^t)$ is the maximum, over the defense modes available in the current state, of the mean of the corresponding entries of $Q_1$ and $Q_2$. The entry of table $Q_1$ for state $s^{t+1}$ gives the return of the legitimate user taking a defense mode at time t+1. The optimal defense policy $\lambda^*$ is therefore given by:

In each state, the legitimate user follows the ε-greedy policy when selecting a defense mode and updating the Q tables: with probability ε it selects the suboptimal defense mode, and with probability 1 − ε it selects the defense mode that attains $V(s^t)$, where $\varepsilon \in (0, 1)$.
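Formulas (16) to (21) are not reproduced in this text. For reference, the standard Double Q-learning rules that match the symbols defined above (discount factor $\mu$, learning rate $\delta$, tables $Q_1$ and $Q_2$, and $V(s^t)$ as the maximum of the two tables' mean) are sketched below. The action symbol $a$ (a defense mode $EU_n^t \in \{1, 2\}$), the immediate utility $u(s^t, a^t)$ and the maximizers $a_1^*, a_2^*$ are notation assumed here, and the exact form used in the patent may differ.

$$
\begin{aligned}
Q_1(s^t,a^t) &\leftarrow (1-\delta)\,Q_1(s^t,a^t) + \delta\big[u(s^t,a^t) + \mu\,Q_2(s^{t+1},a_1^*)\big],\\
Q_2(s^t,a^t) &\leftarrow (1-\delta)\,Q_2(s^t,a^t) + \delta\big[u(s^t,a^t) + \mu\,Q_1(s^{t+1},a_2^*)\big],\\
a_1^* &= \arg\max_{a\in\{1,2\}} Q_1(s^{t+1},a), \qquad a_2^* = \arg\max_{a\in\{1,2\}} Q_2(s^{t+1},a),\\
V(s^t) &= \max_{a\in\{1,2\}} \tfrac{1}{2}\big[Q_1(s^t,a)+Q_2(s^t,a)\big],\\
\lambda^*(s) &= \arg\max_{a\in\{1,2\}} \tfrac{1}{2}\big[Q_1(s,a)+Q_2(s,a)\big].
\end{aligned}
$$

Each table picks its maximizing action but evaluates it with the other table; this decoupling is what removes the upward bias of the single max used in Q-learning.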

Based on the above formulas, the DQL algorithm for obtaining the optimal defense policy is summarized as follows:

1. Initialization.

2. For t = 1, 2, 3, ...:

3. Select a defense mode with the ε-greedy policy.

4. Observe the next state $s^{t+1}$.

5. Compute the resulting utility.

6. With probability 0.5, update $Q_1$ by formulas (16) and (18); otherwise, update $Q_2$ by formulas (17) and (19).

7. Update $V(s^t)$ by formula (20).

8. Return to step 2 and repeat until the system reaches its terminal state; then obtain the optimal defense policy $\lambda^*$ from tables $Q_1$ and $Q_2$ and formula (21).
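A compact Python sketch of steps 1 to 8, written under explicit assumptions: the state is the attacker's mode in the previous slot (0 to 4), the action is the defense mode (1 or 2), the attacker draws its next mode uniformly at random, and `toy_utility` is a made-up reward in which the HLSM mode pays off only after an observed attack. None of these values or names come from the patent; they only make the loop runnable. $V(s^t)$ is computed on the fly from $Q_1 + Q_2$ instead of being stored.

```python
import random

ATTACK_MODES = range(5)  # states: attacker's mode in the previous slot
DEFENSE_MODES = (1, 2)   # actions: 1 = PLS only, 2 = PLS + HLSM


def toy_utility(state, action):
    """Hypothetical immediate utility, for illustration only."""
    if state == 0:                       # no attack observed: extra overhead is wasted
        return 1.0 if action == 1 else 0.0
    return 1.0 if action == 2 else -1.0  # attack observed: higher-layer checks pay off


def greedy(q1, q2, s):
    """Defense mode attaining V(s) = max_a (Q1(s, a) + Q2(s, a)) / 2."""
    return max(DEFENSE_MODES, key=lambda a: q1[s][a] + q2[s][a])


def double_q_learning(n_slots=5000, delta=0.1, mu=0.5, eps=0.1, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize both Q tables to zero.
    q1 = {s: {a: 0.0 for a in DEFENSE_MODES} for s in ATTACK_MODES}
    q2 = {s: {a: 0.0 for a in DEFENSE_MODES} for s in ATTACK_MODES}
    s = rng.choice(ATTACK_MODES)
    for _ in range(n_slots):  # step 2
        # Step 3: epsilon-greedy, taking the sub-optimal mode with probability eps.
        a = greedy(q1, q2, s)
        if rng.random() < eps:
            a = 2 if a == 1 else 1
        # Steps 4-5: observe the next state and the utility.
        u = toy_utility(s, a)
        s_next = rng.choice(ATTACK_MODES)  # hypothetical i.i.d. attacker
        # Step 6: with probability 0.5 update Q1, otherwise Q2.
        if rng.random() < 0.5:
            a_star = max(DEFENSE_MODES, key=lambda b: q1[s_next][b])
            q1[s][a] = (1 - delta) * q1[s][a] + delta * (u + mu * q2[s_next][a_star])
        else:
            a_star = max(DEFENSE_MODES, key=lambda b: q2[s_next][b])
            q2[s][a] = (1 - delta) * q2[s][a] + delta * (u + mu * q1[s_next][a_star])
        s = s_next  # steps 7-8: V is implicit in greedy(); move on
    # Optimal policy lambda*: greedy mode on the mean of the two tables.
    policy = {st: greedy(q1, q2, st) for st in ATTACK_MODES}
    return policy, q1, q2


policy, q1, q2 = double_q_learning()
print(policy)
```

With the toy reward, the learned policy keeps the cheap mode 1 after a slot without an attack and switches to mode 2 after any attack mode.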

The innovations of the invention are mainly the following:

(1) In mobile fog computing, malicious users can easily launch intelligent attacks while fog nodes interact with end users. To address this, the invention uses prospect theory to describe the subjectivity of game participants and builds a mobile fog computing security model covering intelligent attack modes and defense modes, together with a static subjective zero-sum game between intelligent attackers and legitimate users. Taking into account the mobility of end users in mobile fog computing, and the ability of reinforcement learning to find optimal policies under incomplete information in dynamic environments, the invention designs a DQL-based scheme for obtaining optimal defense policies in mobile fog computing, which strengthens the security of communication between fog nodes and end users.

(2) The invention shows that a lower objective probability weight suppresses the intelligent attacker's subjective attack motivation. It defines four metrics (the legitimate user's utility, the attack rate, the maximum Q value and the average action value) and uses them to compare the proposed method with defenses against intelligent attacks based on the Q-learning algorithm, the Sarsa algorithm and the greedy strategy. By adjusting Q values, the proposed method optimizes the decision process for the optimal defense policy, increases the legitimate user's utility, reduces the attack rate, and provides good security protection in mobile fog environments.

Brief description of the drawings

Fig. 1 shows the intelligent attack security model in the mobile fog computing environment of the invention.

Fig. 2 compares, under the initial parameter settings, the influence of the objective weight in the static subjective game on the Nash equilibrium and on the legitimate user's utility.

Fig. 3 compares, over time slots 1 to 300 under the initial parameter settings, the legitimate user's utility when intelligent attacks are defended with the invention and with the Sarsa algorithm, the greedy strategy and the Q-learning algorithm.

Fig. 4 compares, over time slots 1 to 300 under the initial parameter settings, the attack rate when intelligent attacks are defended with the invention and with the Sarsa algorithm, the greedy strategy and the Q-learning algorithm.

Fig. 5 compares, over time slots 1 to 300 under the initial parameter settings, the maximum Q value when intelligent attacks are defended with the invention and with the Sarsa algorithm and the Q-learning algorithm.

Fig. 6 compares, over time slots 1 to 300 under the initial parameter settings, the average action value when intelligent attacks are defended with the invention and with the Sarsa algorithm and the Q-learning algorithm.

Detailed embodiments

The invention adopts the following technical solution and implementation steps:

1. Intelligent attack security model in mobile fog computing

The security model of the invention considers communication between the fog layer and the user layer, that is, between fog nodes and end users, as shown in Fig. 1. Any intelligent attacker in the mobile fog computing network is an end user with subjectivity and may launch intelligent attacks against other legitimate end users. The set of intelligent attackers is $M = \{1, 2, \dots, m\}$, $m \in M$; the attack mode of attacker $m$ at time $t$ is denoted $SA_m^t$, and the set of time instants is $T = \{0, 1, 2, \dots, t\}$, $t \in T$. Likewise, the set of legitimate users is $N = \{1, 2, \dots, n\}$, $n \in N$, and the defense mode of user $n$ at time $t$ is denoted $EU_n^t$. Suppose that at some time $t$ intelligent attacker $m$ uses an intelligent programmable wireless device to launch an attack in mode $SA_m^t$ against a legitimate user served by the same fog node. When $SA_m^t = 0$, the attacker stops attacking. When $SA_m^t = 1$, the attacker jams the legitimate user by transmitting interference signals, lowering the SINR of the signal the user receives from the fog node. When $SA_m^t = 2$, the attacker eavesdrops, intercepting the information exchanged between the fog node and the legitimate user. When $SA_m^t = 3$, the attacker impersonates the fog node with a forged media access control address (Media Access Control Address, MAC-A) and sends data to the legitimate user, i.e., it launches a spoofing attack. When $SA_m^t = 4$, the attacker launches a replay attack, resending data packets that the legitimate user has already received in order to deceive it.

The legitimate user $n$ under attack has two defense modes against the different attack types. When $EU_n^t = 1$, the user relies only on physical-layer security (PLS) to resist the intelligent attack; this is called the basic mode. When $EU_n^t = 2$, the user spends more overhead: it first applies channel-parameter-based PLS techniques for preliminary detection, filtering and anti-eavesdropping, and then uses higher-layer security mechanisms (HLSM) to check the data that has passed physical-layer authentication.

The method includes the following steps:

Here $p \in (0, 1]$ is the objective probability and $\sigma_{object} \in (0, 1]$ is the objective probability weight. The subscript "object" denotes the game participant, i.e., object = attacker or object = user. The Prelec probability weighting function describes how, under the influence of the weight, a participant in the subjective game distorts the objective probability on which its decision is based. Following PT, a subjective decision maker underweights the objective probability of high-probability events and overweights that of low-probability events. In the zero-sum game, the legitimate user obtains a gain when it detects intelligent attack $SA_m$ under defense mode $EU_n$, and suffers a security loss if it fails to detect the attack. Every defense mode has a false-alarm rate and a miss rate: the false-alarm rate is the probability that valid data sent by a legitimate node is classified as illegitimate, and the miss rate is the probability that illegitimate data is classified as valid. These errors are what cause security losses, so combining the two rates gives the legitimate user's error rate in detecting intelligent attack $SA_m$ under defense mode $EU_n$. According to the system model, the intelligent attacker has 5 attack modes and the legitimate user has 2 defense modes. The utilities of the intelligent attacker and the legitimate user are as follows:

According to formula (23), when both sides of the game compute utility under EUT, the formula is:

where formula (24) gives the legitimate user's utility computed under EUT. When both sides compute utility under PT, they decide on the basis of subjective probabilities rather than the objective average detection error rate. Therefore, according to formulas (22) and (23), the two sides' utilities are, respectively:

Therefore, from formulas (25), (26), (27) and (28), this step summarizes the Nash equilibrium conditions of the subjective static zero-sum game with respect to spoofing attacks. They cover the cases in which the intelligent attacker either stops attacking ($SA_m = 0$) or launches a spoofing attack ($SA_m = 3$), and explain why each of the legitimate user's two defense modes can then form a Nash equilibrium.

1. When conditions (29) and (30) hold, the Nash equilibrium strategy profile is (0, 1).

2. When conditions (31) and (32) hold, the Nash equilibrium strategy profile is (0, 2).

3. When conditions (33) and (34) hold, the Nash equilibrium strategy profile is (3, 1).

4. When conditions (35) and (36) hold, the Nash equilibrium strategy profile is (3, 2).

$V(s^t)$ is the maximum, over the defense modes available in the current state, of the mean of the corresponding entries of $Q_1$ and $Q_2$. The entry of table $Q_1$ for state $s^{t+1}$ gives the return of the legitimate user taking a defense mode at time t+1. The optimal defense policy $\lambda^*$ is therefore given by:

1. Initialization.

2. For t = 1, 2, 3, ...:

3. Select a defense mode with the ε-greedy policy.

4. Observe the next state $s^{t+1}$.

5. Compute the resulting utility.

6. With probability 0.5, update $Q_1$ by formulas (37) and (39); otherwise, update $Q_2$ by formulas (38) and (40).

7. Update $V(s^t)$ by formula (41).

8. Return to step 2 and repeat until the system reaches its terminal state; then obtain the optimal defense policy $\lambda^*$ from tables $Q_1$ and $Q_2$ and formula (42).

The invention uses 300 time slots, each lasting 12500/32 microseconds (390.625 μs), and the interval between adjacent time instants is measured in microseconds. The four methods for resisting intelligent attacks in the subjective game are evaluated with the following metrics:

Metric (1), the legitimate user's utility: the legitimate user's average PT-based utility in each time slot.

Metric (2), the attack rate: the number of times the intelligent attacker chooses an attack mode in each time slot, as a fraction of all of its mode selections.

Metric (3), the maximum Q value: the largest Q value in the Q tables after the update in each time slot.

Metric (4), the average action value: the mean of the defense modes the legitimate user selects in each time slot while updating the Q tables. The action value can only be 1 or 2, and defense mode 1 costs less overhead; so the smaller the average action value, the more often the legitimate user chooses the defense mode that uses physical-layer security techniques alone, which improves system performance.
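Metrics (2) and (4) reduce to simple ratios over the selections logged in one slot. The sketch below assumes the attack rate is the share of the attacker's selections that are actual attacks (any mode other than 0) and the average action value is the arithmetic mean of the chosen defense modes; the function names are hypothetical.

```python
def attack_rate(attack_modes):
    """Share of the attacker's selections in a slot that are attacks (mode != 0)."""
    if not attack_modes:
        raise ValueError("no selections recorded")
    return sum(1 for m in attack_modes if m != 0) / len(attack_modes)


def average_action_value(defense_modes):
    """Mean chosen defense mode; closer to 1 means PLS-only is chosen more often."""
    if not defense_modes:
        raise ValueError("no selections recorded")
    return sum(defense_modes) / len(defense_modes)


print(attack_rate([0, 3, 3, 0]), average_action_value([1, 2, 2, 1]))
```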

The meanings and values of the initialization parameters used in the invention are shown in the table.

Fig. 2 shows, under the initial parameter settings, how the objective weight in the static subjective game affects the Nash equilibrium and the legitimate user's utility. X axis: the intelligent attacker's objective probability weight (dimensionless); Y axis: the legitimate user's utility (dimensionless). The solid line is the legitimate user's utility when the legitimate user's objective probability weight equals 0.7, and the dashed line is its utility when that weight equals 1.

Fig. 3 compares, over time slots 1 to 300 under the initial parameter settings, the legitimate user's utility when intelligent attacks are defended with the invention and with the Sarsa algorithm, the greedy strategy and the Q-learning algorithm. X axis: time slot (dimensionless); Y axis: the legitimate user's utility (dimensionless). Thick dashed line: DQL-based defense; thin dashed line: Sarsa-based defense; thin solid line: Q-learning-based defense; thick solid line: greedy-strategy-based defense.

Fig. 4 compares, over time slots 1 to 300 under the initial parameter settings, the attack rate when intelligent attacks are defended with the invention and with the Sarsa algorithm, the greedy strategy and the Q-learning algorithm. X axis: time slot (dimensionless); Y axis: attack rate (dimensionless). Thick dashed line: DQL-based defense; thin dashed line: Sarsa-based defense; thin solid line: Q-learning-based defense; thick solid line: greedy-strategy-based defense.

Fig. 5 compares, over time slots 1 to 300 under the initial parameter settings, the maximum Q value when intelligent attacks are defended with the invention and with the Sarsa algorithm and the Q-learning algorithm. X axis: time slot (dimensionless); Y axis: maximum Q value (dimensionless). Thick dashed line: DQL-based defense; thin dashed line: Sarsa-based defense; thin solid line: Q-learning-based defense.

Fig. 6 compares, over time slots 1 to 300 under the initial parameter settings, the average action value when intelligent attacks are defended with the invention and with the Sarsa algorithm and the Q-learning algorithm. X axis: time slot (dimensionless); Y axis: average action value (dimensionless). Thick dashed line: DQL-based defense; thin dashed line: Sarsa-based defense; thin solid line: Q-learning-based defense.

As Figs. 2 to 6 show, under the same initial parameter settings the proposed method obtains a more accurate optimal defense policy, increases the legitimate user's utility, and reduces the attack rate.

Claims

1. A user-oriented intelligent attack defense method in mobile fog computing, characterized in that the security model for intelligent attacks in mobile fog computing is as follows:

The security model covers communication between the fog layer and the user layer, that is, between fog nodes and terminal users. Any intelligent attacker in the mobile fog computing network acts as a terminal user with subjectivity and may launch intelligent attacks on other legitimate terminal users. The set of intelligent attackers is M = {1, 2, ..., m}. The attack mode of attacker m at time t is denoted SA_m^t, and the set of time instants is T = {0, 1, 2, ..., t}. The set of legitimate users is N = {1, 2, ..., n}, and the defense mode of legitimate user n at time t is denoted EU_n^t.

Suppose that at time t intelligent attacker 1 uses an intelligent programmable wireless device to launch an intelligent attack in mode SA_1^t. The target is a legitimate user served by the same fog node. The attack modes are as follows:
- SA_1^t = 0: the attacker stops attacking.
- SA_1^t = 1: the attacker jams the legitimate user by sending interference signals. This lowers the signal-to-interference-plus-noise ratio (SINR) of the signal the legitimate user receives from the fog node.
- SA_1^t = 2: the attacker eavesdrops and intercepts the information transmitted between the fog node and the legitimate user.
- SA_1^t = 3: the attacker launches a spoofing attack. It impersonates the fog node with a false Media Access Control (MAC) address and sends data to the legitimate user.
- SA_1^t = 4: the attacker launches a replay attack. It resends data packets the legitimate user has already received in order to deceive that user.

A legitimate user under attack has two defense modes against the different types of attack:
- Basic mode: the legitimate user defends against intelligent attacks with physical layer security (PLS) only.
- Second mode: the legitimate user accepts more overhead. It first uses channel-parameter-based PLS for preliminary detection, filtering and anti-eavesdropping. It then uses HLSM detection to verify the data that passed the physical layer.

The following steps are included:

(1) Establish, based on PT, a subjective static zero-sum game between the intelligent attacker and the legitimate user. The attack mode is denoted SA_m, the number of attack modes is denoted Num (Num ≥ 1), and the defense mode is denoted EU_n. Based on PT, the intelligent attacker and the legitimate user play the game by making subjective decisions and reach a Nash equilibrium. The present invention computes the subjective probability with the Prelec probability weighting function:

w_object(p) = exp(-(-ln p)^σ_object)    (1)

In formula (1), p ∈ (0, 1] is the objective probability and σ_object ∈ (0, 1] is the objective probability weight. The subscript object ∈ {attack, user} denotes the participant in the game. The Prelec function describes how each participant in the subjective game adjusts the objective probability of a decision outcome, under the influence of its weight. Following PT, a subjective decision maker underestimates the objective probability of a high-probability event. Conversely, it overestimates the objective probability of a low-probability event.

In the zero-sum game, the legitimate user obtains a gain when it detects intelligent attack SA_m under defense mode EU_n. If the attack goes undetected, the user suffers a security loss. Every defense mode has a false alarm rate and a miss rate. The false alarm rate is the probability that valid data sent by a legitimate node is detected as illegal. The miss rate is the probability that illegal data is detected as valid. Combining the two gives the error rate with which the legitimate user detects SA_m under EU_n. According to the system model, the intelligent attacker has 5 attack modes in total and the legitimate user has 2 defense modes. The utilities of the intelligent attacker and the legitimate user are as follows:

Here U_user(SA_m, EU_n) is the legitimate user's utility. The detection error rate is quantized into C non-zero levels. Each quantized level has an associated probability under the error-rate probability distribution, and these probabilities sum to 1:

The expression above gives the legitimate user's utility computed under expected utility theory (EUT). When the two players compute their utilities with PT instead, they decide on the basis of subjective probabilities rather than the objective average detection error rate. Therefore, according to formulas (1) and (2), the utilities of the two players are, respectively:

During subjective game, game both sides change subjective probability by adjusting objective weight, pursue respective effectiveness most Big value, reaches Nash Equilibrium；When intelligent attacker think by he current time initiate intelligence attack can be by legitimate user It detected, then he can select to halt attacks；When legitimate user thinks to obtain using the Security mechanism of higher When more multi-purpose, he can enable EU_n=1；The strategy combination of Nash Equilibrium is represented asThe strategy combination is to make to win The combination that both sides obtain maximum utility is played chess, it should meet following condition:

Therefore, from formulas (4), (5), (6) and (7), this step summarizes the Nash equilibrium conditions of the subjective static zero-sum game for the spoofing attack. These conditions explain why a Nash equilibrium arises in each of the following cases. The intelligent attacker either stops attacking (SA_m = 0) or launches a spoofing attack (SA_m = 3). In each case, the legitimate user adopts either of its two defense modes:

1. When conditions (8) and (9) are satisfied, the Nash equilibrium strategy combination is (0, 1);

2. When conditions (10) and (11) are satisfied, the Nash equilibrium strategy combination is (0, 2);

3. When conditions (12) and (13) are satisfied, the Nash equilibrium strategy combination is (3, 1);

4. When conditions (14) and (15) are satisfied, the Nash equilibrium strategy combination is (3, 2);
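The probability weighting and the equilibrium check of step (1) can be illustrated with the Python sketch below. It uses the standard one-parameter Prelec form, assumed here with σ_object as the parameter. It also assumes that a player's subjective utility is the Prelec-weighted sum of its payoffs over the C quantized error-rate levels. That weighted-sum form is an illustrative assumption, not the patent's formula, and the payoff entries are placeholders supplied by the caller. `pure_nash` keeps every (SA, EU) pair from which neither player can raise its own utility by deviating alone, which is the property the equilibrium conditions express.

```python
import math
from typing import Dict, List, Sequence, Tuple


def prelec(p: float, sigma: float) -> float:
    """Prelec weighting w(p) = exp(-(-ln p)^sigma) for p, sigma in (0, 1]."""
    if not (0.0 < p <= 1.0 and 0.0 < sigma <= 1.0):
        raise ValueError("p and sigma must both lie in (0, 1]")
    return math.exp(-((-math.log(p)) ** sigma))


def subjective_utility(probs: Sequence[float], payoffs: Sequence[float],
                       sigma: float) -> float:
    """ASSUMED PT-style utility: Prelec-weighted payoffs over the C error levels.

    With sigma = 1 the weights equal the objective probabilities (the EUT case).
    """
    return sum(prelec(p, sigma) * x for p, x in zip(probs, payoffs))


Payoff = Dict[Tuple[int, int], float]  # (attack mode SA, defense mode EU) -> utility


def pure_nash(u_att: Payoff, u_user: Payoff,
              attacks: Sequence[int], defenses: Sequence[int]) -> List[Tuple[int, int]]:
    """Return every (SA, EU) pair that is a mutual best response."""
    equilibria = []
    for a in attacks:
        for d in defenses:
            # The attacker cannot gain by switching attack mode against defense d.
            att_best = all(u_att[(a, d)] >= u_att[(a2, d)] for a2 in attacks)
            # The user cannot gain by switching defense mode against attack a.
            user_best = all(u_user[(a, d)] >= u_user[(a, d2)] for d2 in defenses)
            if att_best and user_best:
                equilibria.append((a, d))
    return equilibria
```

With σ = 1 the weights equal the objective probabilities. With σ < 1 the function distorts them: prelec(0.9, 0.5) ≈ 0.72 and prelec(0.1, 0.5) ≈ 0.22. This reproduces the underweighting of likely events and the overweighting of unlikely events described in step (1).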

(2) Construct a dynamic subjective game and obtain, with the DQL algorithm, the optimal defense policy against intelligent attacks;

(3) Realize the dynamic subjective game between the malicious user and the legitimate user with the DQL algorithm, to obtain the legitimate user's optimal defense policy. The attack mode and the defense mode at time t are denoted SA_m^t and EU_n^t, respectively.

The DQL algorithm uses two Q-value tables, updated alternately, to estimate the return of executing each action in each state. The state at a given time is the attack mode chosen by the intelligent attacker in the preceding time slot. The action is the defense mode chosen by the legitimate user at time t. The two Q-value tables are updated as follows:

Here s^t denotes the system state at time t. μ ∈ (0, 1] is the reward discount factor and δ ∈ [0, 1] is the learning rate. Q₁(s^t, EU_n^t) denotes the return stored in Q table 1 when the legitimate user takes defense mode EU_n^t in state s^t. The immediate utility is the utility the legitimate user obtains by taking defense mode EU_n^t in state s^t. The defense modes that maximize the Q value in state s^(t+1), in tables Q₁ and Q₂ respectively, are computed as follows:

V(s^t) is the maximum, over the defense modes available in the current state, of the mean of Q₁ and Q₂. Q₁(s^(t+1), ·) denotes the return stored in Q table 1 when the legitimate user takes a defense mode in state s^(t+1). The optimal defense policy λ^* is therefore given by:

In each state, the legitimate user selects a defense mode and updates the Q-value tables following the ε-greedy policy, where ε ∈ (0, 1). With probability ε it selects a suboptimal defense mode. With probability 1 - ε it selects the defense mode that attains V(s^t).
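The selection rule above can be sketched as follows. The Q tables are assumed to be dictionaries keyed by (state, defense mode). The greedy choice is taken to be the mode that maximizes Q₁ + Q₂, which is the mode attaining V(s^t). Following the text, exploration picks a mode other than the greedy one; the textbook ε-greedy rule would instead sample uniformly from all modes.

```python
import random
from typing import Dict, Sequence, Tuple

QTable = Dict[Tuple[int, int], float]  # (state, defense mode) -> Q value


def select_defense(q1: QTable, q2: QTable, state: int, modes: Sequence[int],
                   eps: float, rng: random.Random) -> int:
    """Epsilon-greedy choice of a defense mode from the two DQL tables."""
    # Greedy mode: maximizes Q1 + Q2, i.e. attains V(s) = max_a (Q1 + Q2) / 2.
    greedy = max(modes, key=lambda a: q1[(state, a)] + q2[(state, a)])
    suboptimal = [a for a in modes if a != greedy]
    # With probability eps explore a suboptimal mode; otherwise exploit.
    if suboptimal and rng.random() < eps:
        return rng.choice(suboptimal)
    return greedy
```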

1. Initialize μ, δ, the Q-value tables, ε, and V(s^t) = 0;

2. For t = 1, 2, 3, ...:

3. Select a defense mode with the ε-greedy policy;

4. Observe the next state;

5. Compute the immediate utility;

6. With probability 0.5, update Q₁ by formulas (16) and (18); otherwise update Q₂ by formulas (17) and (19);

7. Update V(s^t) by formula (20);

8. Return to step 2 and continue until the system reaches its end state. Then obtain the optimal defense policy λ^* from the Q₁ and Q₂ tables and formula (21).
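Steps 1 to 8 can be combined into a minimal tabular Double Q-learning loop, sketched below. The update is the standard Double Q-learning rule, assumed here to correspond to formulas (16) to (21). Everything about the environment is hypothetical:
- `toy_utility` is an invented immediate-utility function; the patent's utility expressions are not used.
- The toy attacker keeps the attack mode given by the state during the current slot, then draws its next mode uniformly at random.
- Defense mode 2 is assumed to be the costlier PLS-plus-HLSM mode.

```python
import random
from typing import Dict, Tuple

ATTACK_MODES = (0, 1, 2, 3, 4)  # state: 0 stop, 1 jamming, 2 eavesdropping, 3 spoofing, 4 replay
DEFENSE_MODES = (1, 2)          # action EU_n; mode 2 ASSUMED to be the costlier PLS+HLSM mode


def toy_utility(attack: int, defense: int) -> float:
    """Hypothetical immediate utility, invented for illustration (not from the patent)."""
    if attack == 0:
        return 1.0 if defense == 1 else 0.5  # no attack: mode 2's extra overhead is wasted
    return -1.0 if defense == 1 else 1.0     # under attack: the stronger mode pays off


def double_q_defense(slots: int, mu: float = 0.5, delta: float = 0.1, eps: float = 0.1,
                     seed: int = 0) -> Tuple[Dict[int, int], Dict[int, float]]:
    rng = random.Random(seed)
    # Step 1: initialize both Q tables and V to zero.
    q1 = {(s, a): 0.0 for s in ATTACK_MODES for a in DEFENSE_MODES}
    q2 = dict(q1)
    v = {s: 0.0 for s in ATTACK_MODES}
    state = rng.choice(ATTACK_MODES)
    for _ in range(slots):  # Step 2: iterate over time slots
        # Step 3: epsilon-greedy on Q1 + Q2 (exploration picks the non-greedy mode).
        greedy = max(DEFENSE_MODES, key=lambda a: q1[(state, a)] + q2[(state, a)])
        others = [a for a in DEFENSE_MODES if a != greedy]
        action = rng.choice(others) if rng.random() < eps else greedy
        # Steps 4-5: the toy attacker's next mode (next state) and the immediate utility.
        next_state = rng.choice(ATTACK_MODES)
        reward = toy_utility(state, action)
        # Step 6: update one table, choosing the next mode with it, evaluating with the other.
        if rng.random() < 0.5:
            best = max(DEFENSE_MODES, key=lambda a: q1[(next_state, a)])
            target = reward + mu * q2[(next_state, best)]
            q1[(state, action)] += delta * (target - q1[(state, action)])
        else:
            best = max(DEFENSE_MODES, key=lambda a: q2[(next_state, a)])
            target = reward + mu * q1[(next_state, best)]
            q2[(state, action)] += delta * (target - q2[(state, action)])
        # Step 7: V(s) = max over defense modes of the mean of Q1 and Q2.
        v[state] = max((q1[(state, a)] + q2[(state, a)]) / 2 for a in DEFENSE_MODES)
        state = next_state
    # Step 8: optimal defense policy lambda*(s) = argmax over defense modes of Q1 + Q2.
    policy = {s: max(DEFENSE_MODES, key=lambda a: q1[(s, a)] + q2[(s, a)])
              for s in ATTACK_MODES}
    return policy, v
```

With these invented utilities, `double_q_defense(5000)` should learn two behaviours. When the previous slot carried no attack, it keeps the cheaper mode 1; otherwise, it switches to mode 2. The learned policy, however, depends entirely on the toy utilities, the seed and the number of slots. Splitting the update across two tables, where one selects the next mode and the other evaluates it, is what counters the Q-value overestimation of single-table Q-learning.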