CN110049497A - A user-oriented intelligent attack defense method in mobile fog computing
- Publication number
- CN110049497A (application CN201910287756.1A)
- Authority
- CN
- China
- Prior art keywords
- attack
- legitimate user
- mode
- defence
- probability
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W12/00—Security arrangements; Authentication; Protecting privacy or anonymity
- H04W12/12—Detection or prevention of fraud
- H04W12/121—Wireless intrusion detection systems [WIDS]; Wireless intrusion prevention systems [WIPS]
- H04W12/122—Counter-measures against attacks; Protection against rogue devices
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W12/00—Security arrangements; Authentication; Protecting privacy or anonymity
- H04W12/009—Security arrangements; Authentication; Protecting privacy or anonymity specially adapted for networks, e.g. wireless sensor networks, ad-hoc networks, RFID networks or cloud networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention, a user-oriented intelligent attack defense method in mobile fog computing, relates to computer networks and wireless communication and also belongs to the field of cyberspace security. It uses prospect theory (PT) and the double Q-learning (DQL) algorithm to realize user-centric, subjective defense against intelligent attacks in a mobile fog computing environment. While a fog node communicates with a legitimate end user, a malicious user exploits the open wireless access platform to launch intelligent attacks in different modes, such as spoofing, jamming, and eavesdropping, which compromises secure communication between fog nodes and mobile users in the fog computing network. Defending against intelligent attacks with PT and DQL mitigates the Q-value overestimation of the Q-learning algorithm and yields the optimal defense policy for the legitimate user. This both increases the legitimate user's detection utility in a dynamic environment and lowers the intelligent attacker's subjective attack probability, while strengthening the security of the mobile fog computing network.
Description
Technical field
The invention uses prospect theory (PT) and the double Q-learning (DQL) algorithm to realize user-centric, subjective defense against intelligent attacks in a mobile fog computing environment. While a fog node communicates with a legitimate end user, a malicious user exploits the open wireless access platform to launch intelligent attacks in different modes, such as spoofing, jamming, and eavesdropping, which compromises secure communication between fog nodes and mobile users in the fog computing network. Defending against intelligent attacks with PT and DQL mitigates the Q-value overestimation of the Q-learning algorithm and yields the optimal defense policy for the legitimate user. This both increases the legitimate user's detection utility in a dynamic environment and lowers the intelligent attacker's subjective attack probability, while strengthening the security of the mobile fog computing network. Because it resists intelligent attacks with reinforcement learning and game theory, the invention relates to computer networks and wireless communication and also belongs to the field of cyberspace security.
Background art
With the spread of Internet of Things technology and mobile smart terminals, large volumes of interactive data must be processed in real time in wireless network environments, and traditional cloud computing cannot effectively meet requirements such as heterogeneity and low latency. Fog computing extends cloud computing to the network edge. It can use direct device-to-device links to raise system throughput, and it addresses cloud computing's poor mobility support, weak geographic awareness, and high latency. In a mobile fog computing network, fog architectures fall into two classes: cloud-fog-device and fog-device. The fog layer contains devices close to the Internet of Things edge, which are called fog nodes. Supported by the wireless network, fog nodes and terminal devices in a mobile fog computing network can exchange data.
However, because wireless networks are exposed to security threats, many data and communication security problems remain between the fog layer and the user layer. In a mobile fog computing environment, an illegitimate end user can launch intelligent attacks on legitimate users: it obtains wireless channel state information and defense policy information, then selects a suitable attack mode to undermine wireless security between the fog layer and the user layer. Common attack modes include spoofing, jamming, and eavesdropping. An end user who launches such attacks (an intelligent attacker) applies them subjectively and poses a serious threat to the fog computing network. Game theory is a powerful tool for handling this threat and securing fog computing networks. In fog computing security research, the participants in the game are usually assumed to be rational: their utilities are computed with expected utility theory (EUT), and each participant chooses every action to maximize expected utility. In a dynamic wireless network, however, no participant knows the state of the whole network or the accuracy of the information it receives. When participants choose a strategy against intelligent attacks, their decisions are strongly subjective and do not agree with the results of EUT. PT describes how people make decisions with different risk preferences when facing gains and losses: they are risk-averse when facing gains and risk-seeking when facing losses. PT computes the players' utilities with subjective probabilities and can therefore reflect the subjectivity of the decision maker.
Therefore, building on prospect theory, the invention proposes a DQL-based intelligent attack defense method for mobile fog computing. The method constructs a static subjective zero-sum game between the intelligent attacker and the legitimate user and derives the Nash equilibrium of that game. For dynamic environments, it uses the DQL algorithm to suppress the intelligent attacker's subjective motivation to attack and to generate the legitimate user's optimal defense policy, so that the legitimate user subjectively decides whether to rely on physical-layer security techniques alone to resist intelligent attacks. The method increases the legitimate user's detection utility and effectively lowers the attack rate. Compared with defenses based on the Q-learning algorithm, the Sarsa algorithm, and the greedy strategy, it provides better security protection in a mobile fog computing environment.
Summary of the invention
The invention provides a DQL-based intelligent attack defense method for mobile fog computing. It designs a security model for the intelligent attackers that arise in mobile fog computing, and, based on prospect theory, constructs a static subjective zero-sum game between the intelligent attacker and the legitimate user together with a DQL-based dynamic subjective game. Defending with this method makes the legitimate user's defense policy optimal, raises the legitimate user's detection utility, lowers the attack rate, and strengthens the security and protection performance of the mobile fog computing network.
The invention adopts the following technical solution and implementation steps:
1. Intelligent attack security model in mobile fog computing
The security model of the invention targets fog nodes and end users and considers communication between the fog layer and the user layer. As shown in Fig. 1, any intelligent attacker in the mobile fog computing network is an end user with subjectivity who may launch intelligent attacks on other legitimate end users. The model indexes the intelligent attackers, the legitimate users, and the time instants t, and at every time t it assigns each attacker an attack mode and each legitimate user a defense mode. Suppose that at some time t an intelligent attacker uses a smart programmable radio device to launch an intelligent attack, in its chosen attack mode, on a legitimate user served by the same fog node. In attack mode 0 the attacker stops attacking. In mode 1 the attacker sends a jamming signal that lowers the SINR of the signal the legitimate user receives from the fog node. In mode 2 the attacker eavesdrops and intercepts the information exchanged between the fog node and the legitimate user. In mode 3 the attacker uses a forged media access control address (Media Access Control Address, MAC-A) to impersonate the fog node and send data to the legitimate user, that is, it mounts a spoofing attack. In mode 4 the attacker mounts a replay attack, resending packets the legitimate user has already received in order to deceive the legitimate user. The attacked legitimate user has two defense modes against these attack types. In defense mode 1, called the basic mode, the legitimate user defends with physical-layer security (PLS) only. In defense mode 2 the legitimate user spends more overhead: it first applies channel-parameter-based PLS for preliminary detection, filtering, and anti-eavesdropping, and then uses the higher-layer security mechanism (HLSM) to check the data that passed physical-layer verification.
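The mode numbering above can be written down as two small enumerations. This is only an illustrative sketch: the names `AttackMode`, `DefenseMode`, and `STRATEGY_PAIRS` are chosen here and do not appear in the patent.

```python
from enum import IntEnum


class AttackMode(IntEnum):
    """Attack modes of the intelligent attacker (names are illustrative)."""
    STOP = 0       # attacker stops attacking
    JAMMING = 1    # jamming signal lowers the user's received SINR
    EAVESDROP = 2  # intercepts fog-node/user traffic
    SPOOFING = 3   # forged MAC address impersonates the fog node
    REPLAY = 4     # resends packets the user has already received


class DefenseMode(IntEnum):
    """Defense modes of the legitimate user (names are illustrative)."""
    PLS_ONLY = 1       # basic mode: physical-layer security only
    PLS_PLUS_HLSM = 2  # PLS screening, then a higher-layer check


# Every (attack, defense) pair over which the game is played.
STRATEGY_PAIRS = [(a, d) for a in AttackMode for d in DefenseMode]
```

Five attack modes and two defense modes give ten strategy pairs, which is the strategy space of the games described next.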
2. A subjective intelligent attack defense method based on the DQL algorithm
The method includes the following steps:
(1) Establish a subjective static zero-sum game between the intelligent attacker and the legitimate user based on PT. The attack mode is denoted SA_m, the number of attack modes is Num, Num ≥ 1, and the defense mode is denoted EU_n. Based on PT, the intelligent attacker and the legitimate user make subjective decisions in the game and reach a Nash equilibrium. The invention computes subjective probability with the Prelec probability weighting function:
$$w_{object}(p) = \exp\left(-(-\ln p)^{\sigma_{object}}\right) \tag{1}$$
where p is the objective probability, p ∈ (0,1], and σ_object ∈ (0,1] is the objective probability weight. The subscript object identifies the party taking part in the game; here object = attac (the attacker) or object = user (the legitimate user). The Prelec probability weighting function describes how a player in the subjective game, under the influence of the weight, adjusts the objective probability on which a decision is based. In line with PT, a subjective decision maker underestimates the objective probability of a high-probability event and overestimates the objective probability of a low-probability event. In the zero-sum game, the legitimate user obtains a gain when it detects intelligent attack SA_m under defense mode EU_n, and suffers a security loss when the intelligent attack goes undetected.
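The weighting behaviour described for formula (1) can be checked numerically. The following is a minimal sketch of the standard Prelec function; the function name `prelec_weight` and the sample probabilities are assumptions made for illustration.

```python
import math


def prelec_weight(p: float, sigma: float) -> float:
    """Prelec probability weighting exp(-(-ln p)^sigma).

    p is the objective probability in (0, 1]; sigma in (0, 1] is the
    objective probability weight. sigma = 1 returns p unchanged.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if not 0.0 < sigma <= 1.0:
        raise ValueError("sigma must lie in (0, 1]")
    # abs(log p) equals -ln p because p <= 1.
    return math.exp(-(abs(math.log(p)) ** sigma))


if __name__ == "__main__":
    for p in (0.01, 0.5, 0.9):
        print(p, prelec_weight(p, sigma=0.5))
```

With sigma below 1, a small objective probability such as 0.01 maps to a larger subjective one, and a large objective probability such as 0.9 maps to a smaller one, matching the overestimation and underestimation described above.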
Every defense mode has a false alarm rate and a miss rate. The false alarm rate is the probability that legitimate data sent by a legitimate node is detected as illegitimate. The miss rate is the probability that illegitimate data is detected as legitimate, which is the cause of security loss. Combining the two, the model defines an error rate for the legitimate user detecting intelligent attack SA_m under defense mode EU_n. According to the system model, the intelligent attacker has 5 attack modes and the legitimate user has 2 defense modes, and formula (2) gives the utilities of the intelligent attacker and the legitimate user.
In formula (2), U_user(SA_m, EU_n) denotes the utility of the legitimate user. The error rate is quantized into C non-zero levels; each level has an associated probability, and these probabilities form a probability distribution over all quantization levels.
According to formula (2), when both players compute utility under EUT, the legitimate user's utility is evaluated with the objective average detection error rate. When both players compute utility under PT, they decide on the basis of subjective probabilities rather than the objective average detection error rate. The PT-based utilities of the two players therefore follow from formulas (1) and (2).
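The difference between the EUT view and the PT view can be illustrated on the quantized error rate alone. This sketch is an assumption-laden illustration, not the patent's utility formulas, which are not reproduced in this text: it only shows replacing each objective probability with its Prelec-weighted value. The levels, probabilities, and function names are hypothetical.

```python
import math


def prelec_weight(p: float, sigma: float) -> float:
    """Prelec weighting exp(-(-ln p)^sigma) for p in (0, 1]."""
    return math.exp(-(abs(math.log(p)) ** sigma))


def objective_mean(levels, probs):
    """EUT view: probability-weighted average of the quantized error rate."""
    return sum(x * p for x, p in zip(levels, probs))


def subjective_mean(levels, probs, sigma):
    """PT view: same sum with each probability passed through the Prelec weight."""
    return sum(x * prelec_weight(p, sigma) for x, p in zip(levels, probs))


# Hypothetical quantization: C = 4 non-zero error-rate levels and their probabilities.
levels = [0.1, 0.2, 0.3, 0.4]
probs = [0.7, 0.2, 0.07, 0.03]

if __name__ == "__main__":
    print("objective :", objective_mean(levels, probs))
    print("subjective:", subjective_mean(levels, probs, sigma=0.5))
```

In this example the rare, high error-rate levels receive more subjective weight than their objective probability, so the subjectively perceived error rate is higher than the objective average; with sigma = 1 the two coincide.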
During the subjective game, both players change their subjective probabilities by adjusting the objective weight, each pursuing its own maximum utility, until a Nash equilibrium is reached. When the intelligent attacker believes that the intelligent attack he launches at the current time will be detected by the legitimate user, he chooses to stop attacking. When the legitimate user believes that using the higher-layer security mechanism will bring more utility, he sets EU_n = 1. The Nash equilibrium strategy combination is the pair of attack mode and defense mode that gives both players their maximum utility, and it must satisfy the equilibrium conditions.
Therefore, according to formulas (4), (5), (6) and (7), this step summarizes, for SA_m = 0, 1, 2, 3, the Nash equilibrium conditions for the spoofing attack in the subjective static zero-sum game. They explain why, when the intelligent attacker either stops attacking or mounts a spoofing attack, each of the legitimate user's two defense modes forms a Nash equilibrium state.
1. When conditions (8) and (9) hold, the Nash equilibrium strategy combination is (0,1).
2. When conditions (10) and (11) hold, the Nash equilibrium strategy combination is (0,2).
3. When conditions (12) and (13) hold, the Nash equilibrium strategy combination is (3,1).
4. When conditions (14) and (15) hold, the Nash equilibrium strategy combination is (3,2).
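The four cases above are pure-strategy equilibria of a zero-sum game, which can be found by checking that neither player gains from a unilateral deviation. The sketch below does this for a made-up payoff table; the utilities are hypothetical numbers, and the function name `pure_nash_zero_sum` is chosen here for illustration.

```python
def pure_nash_zero_sum(user_utility):
    """Return all pure-strategy Nash equilibria of a zero-sum game.

    user_utility maps (attack_mode, defense_mode) to the legitimate user's
    utility; the attacker's utility is its negative.
    """
    attacks = sorted({a for a, _ in user_utility})
    defenses = sorted({d for _, d in user_utility})
    equilibria = []
    for a in attacks:
        for d in defenses:
            u = user_utility[(a, d)]
            # The user cannot do better by switching defense mode.
            user_ok = all(u >= user_utility[(a, d2)] for d2 in defenses)
            # The attacker cannot lower the user's utility by switching attack mode.
            attacker_ok = all(u <= user_utility[(a2, d)] for a2 in attacks)
            if user_ok and attacker_ok:
                equilibria.append((a, d))
    return equilibria


# Hypothetical utilities for attack modes {0: stop, 3: spoofing}
# and defense modes {1: PLS only, 2: PLS plus HLSM}.
example = {
    (0, 1): 0.0,   # no attack, no extra overhead
    (0, 2): -1.0,  # no attack, but the HLSM overhead is paid
    (3, 1): 2.0,   # spoofing detected by PLS
    (3, 2): 3.0,   # spoofing detected more reliably with HLSM
}

if __name__ == "__main__":
    print(pure_nash_zero_sum(example))
```

For this table the only equilibrium is (0,1): detection is good enough that the attacker stops, and the legitimate user then has no reason to pay for the higher-layer check.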
(2) Construct a dynamic subjective game and obtain the optimal defense policy against intelligent attacks with the DQL algorithm. In the static subjective game, the malicious user and the legitimate user use utility as the measure of a decision's outcome. In a dynamic mobile fog computing network, however, both players lack knowledge of the overall network environment, and the legitimate user cannot determine the attack detection error rate. The two therefore interact repeatedly so that the legitimate user can select a suitable defense mode against the malicious user's intelligent attacks, which increases the legitimate user's utility and lowers the attack rate. Among reinforcement learning methods, the Q-learning algorithm obtains an optimal policy in a dynamic environment with incomplete information. It derives from behaviorist learning theory and uses a Q value to rate how good it is for an agent to take a given action in a given state. It has two important parameters: the learning rate and the reward decay coefficient. The larger the learning rate, the less of the earlier training is retained; the larger the reward decay coefficient, the more weight long-term reward receives. When computing long-term reward, Q-learning applies the max function and so tends to overestimate Q values. This step therefore uses the DQL algorithm to realize the dynamic subjective game between the malicious user and the legitimate user and to obtain the legitimate user's optimal defense policy. In this game both the attack mode and the defense mode are indexed by time t.
The DQL algorithm uses two Q-value tables and updates them alternately to estimate the return of taking each action in each state. Here the state is the attack mode the intelligent attacker chose in the previous time slot, and the action is the defense mode the legitimate user selects at time t. Formulas (16) and (17) update the two Q-value tables.
In these formulas, s_t is the system state at time t, μ is the reward decay coefficient, and δ is the learning rate, with μ ∈ (0,1] and δ ∈ [0,1]. The entry of Q-value table 1 for time t is the return value when the legitimate user takes the selected defense mode in the state at time t, and the immediate utility is the utility the legitimate user obtains by taking that defense mode in that state. Formulas (18) and (19) define the defense modes that maximize the Q value in state s_{t+1} in tables Q_1 and Q_2, respectively.
By formula (20), V(s_t) is the maximum, over the defense modes available in the current state, of the mean of Q_1 and Q_2. The entry of Q-value table 1 for time t+1 is the return value when the legitimate user takes the corresponding defense mode in the state at time t+1. The optimal defense policy λ* is then given by formula (21).
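As a reading aid, the relations that formulas (16) to (21) refer to can be written in the standard double Q-learning form. This is a sketch under assumptions rather than the patent's exact notation: the symbols a_t for the defense mode chosen at time t, U(s_t, a_t) for the immediate utility, and a_1^*, a_2^* for the maximizing defense modes are chosen here.

```latex
\begin{aligned}
Q_1(s_t,a_t) &\leftarrow (1-\delta)\,Q_1(s_t,a_t)
  + \delta\bigl[U(s_t,a_t) + \mu\,Q_2(s_{t+1},a_1^*)\bigr], \\
Q_2(s_t,a_t) &\leftarrow (1-\delta)\,Q_2(s_t,a_t)
  + \delta\bigl[U(s_t,a_t) + \mu\,Q_1(s_{t+1},a_2^*)\bigr], \\
a_1^* &= \arg\max_{a} Q_1(s_{t+1},a), \qquad
a_2^* = \arg\max_{a} Q_2(s_{t+1},a), \\
V(s_t) &= \max_{a}\,\tfrac{1}{2}\bigl[Q_1(s_t,a) + Q_2(s_t,a)\bigr], \\
\lambda^* &= \arg\max_{a}\,\tfrac{1}{2}\bigl[Q_1(s_t,a) + Q_2(s_t,a)\bigr].
\end{aligned}
```

Each table selects the maximizing action itself but takes the value of that action from the other table. Decoupling selection from evaluation in this way is what reduces the overestimation caused by the max operator in plain Q-learning.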
In every state, the legitimate user follows an ε-greedy strategy when selecting a defense mode and updating the Q-value tables: with probability ε it selects the suboptimal defense mode, and with probability 1-ε it selects the defense mode that attains V(s_t), where ε ∈ (0,1).
Based on the formulas above, the DQL algorithm for obtaining the optimal defense policy is summarized as follows:
1. Initialize the algorithm and compute its initial quantities.
2. For t = 1, 2, 3, ...:
3. Select a defense mode with the ε-greedy strategy.
4. Observe the next state.
5. Compute the immediate utility.
6. With probability 0.5, update Q_1 by formulas (16) and (18); otherwise update Q_2 by formulas (17) and (19).
7. Update V(s_t) by formula (20).
8. Return to step 2 and continue until the final system state is reached, then obtain the optimal defense policy λ* from tables Q_1 and Q_2 and formula (21).
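The steps above can be sketched as a short tabular double Q-learning loop. Everything about the environment here is a hypothetical stand-in: `toy_environment`, its detection rates, the overhead cost, and the attacker's back-off behaviour are assumptions for illustration, not the patent's utility model, and the loop runs for a fixed number of slots instead of testing for a final system state.

```python
import random

ATTACKS = [0, 1, 2, 3, 4]  # states: attacker's mode in the previous slot
DEFENSES = [1, 2]          # actions: PLS only, or PLS followed by HLSM


def toy_environment(state, defense, rng):
    """Hypothetical fog-network stand-in: returns (utility, next_state)."""
    detect_prob = 0.6 if defense == 1 else 0.9  # assumed detection rates
    cost = 0.0 if defense == 1 else 0.2         # assumed HLSM overhead
    attacked = state != 0
    detected = attacked and rng.random() < detect_prob
    gain = 1.0 if detected else (-1.0 if attacked else 0.0)
    # Assumed attacker behaviour: stop after being detected, else pick any mode.
    next_state = 0 if detected else rng.choice(ATTACKS)
    return gain - cost, next_state


def double_q_learning(slots=5000, delta=0.1, mu=0.8, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q1 = {(s, a): 0.0 for s in ATTACKS for a in DEFENSES}
    q2 = dict(q1)

    def mean_q(s, a):
        return (q1[(s, a)] + q2[(s, a)]) / 2

    state = rng.choice(ATTACKS)
    for _ in range(slots):
        # Step 3: epsilon-greedy choice of defense mode.
        greedy = max(DEFENSES, key=lambda a: mean_q(state, a))
        if rng.random() < epsilon:
            action = rng.choice([a for a in DEFENSES if a != greedy])
        else:
            action = greedy
        # Steps 4-5: observe the next state and the immediate utility.
        utility, nxt = toy_environment(state, action, rng)
        # Step 6: update one table, chosen by a fair coin flip.
        upd, other = (q1, q2) if rng.random() < 0.5 else (q2, q1)
        best = max(DEFENSES, key=lambda a: upd[(nxt, a)])  # selected by this table
        target = utility + mu * other[(nxt, best)]         # valued by the other table
        upd[(state, action)] = (1 - delta) * upd[(state, action)] + delta * target
        state = nxt
    # Steps 7-8: the policy picks the defense maximizing the mean of both tables.
    policy = {s: max(DEFENSES, key=lambda a: mean_q(s, a)) for s in ATTACKS}
    return q1, q2, policy


if __name__ == "__main__":
    _, _, policy = double_q_learning()
    print(policy)
```

The returned `policy` maps each previous attack mode to a defense mode. Which mode wins in each state depends entirely on the assumed detection rates and overhead, so the numbers should be replaced with a real utility model before drawing conclusions.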
The inventiveness of the invention is mainly reflected in the following:
(1) The invention addresses the problem that interactions between fog nodes and end users in mobile fog computing are vulnerable to intelligent attacks launched by malicious users. Using prospect theory's description of the subjectivity of game participants, it constructs a mobile fog computing security model tied to the intelligent attack modes and defense modes, together with a static subjective zero-sum game between the intelligent attacker and the legitimate user. Considering the mobility of end users in a mobile fog computing environment, and because reinforcement learning can obtain the optimal policy under incomplete information in a dynamic environment, it designs a DQL-based scheme for obtaining the optimal defense policy in mobile fog computing, which strengthens the security of communication between fog nodes and end users.
(2) The invention demonstrates that a lower objective probability weight suppresses the intelligent attacker's subjective motivation to attack. It sets four metrics, namely the legitimate user's utility, the attack rate, the maximum Q value, and the average action value, and compares the proposed method with defenses based on the Q-learning algorithm, the Sarsa algorithm, and the greedy strategy. The proposed method optimizes the decision process for the optimal defense policy by adjusting Q values, raises the legitimate user's utility, lowers the attack rate, and provides good security protection in a mobile fog environment.
Brief description of the drawings
Fig. 1 shows the intelligent attack security model in the mobile fog computing environment of the invention.
Fig. 2 compares the influence of the objective weight on the Nash equilibrium and on the legitimate user's utility in the static subjective game under the initial parameter settings.
Fig. 3 compares, over time slots 1-300 under the initial parameter settings, the legitimate user's utility when intelligent attacks are defended against with the invention and with the Sarsa algorithm, the greedy strategy, and the Q-learning algorithm.
Fig. 4 compares, over time slots 1-300 under the initial parameter settings, the attack rate when intelligent attacks are defended against with the invention and with the Sarsa algorithm, the greedy strategy, and the Q-learning algorithm.
Fig. 5 compares, over time slots 1-300 under the initial parameter settings, the maximum Q value when intelligent attacks are defended against with the invention and with the Sarsa algorithm and the Q-learning algorithm.
Fig. 6 compares, over time slots 1-300 under the initial parameter settings, the average action value when intelligent attacks are defended against with the invention and with the Sarsa algorithm and the Q-learning algorithm.
Detailed description of the embodiments
The invention provides a DQL-based intelligent attack defense method for mobile fog computing. It designs a security model for the intelligent attackers that arise in mobile fog computing, and, based on prospect theory, constructs a static subjective zero-sum game between the intelligent attacker and the legitimate user together with a DQL-based dynamic subjective game. Defending with this method makes the legitimate user's defense policy optimal, raises the legitimate user's detection utility, lowers the attack rate, and strengthens the security and protection performance of the mobile fog computing network.
The invention adopts the following technical solution and implementation steps:
1. Intelligent attack security model in mobile fog computing
The security model of the invention targets fog nodes and end users and considers communication between the fog layer and the user layer. As shown in Fig. 1, any intelligent attacker in the mobile fog computing network is an end user with subjectivity who may launch intelligent attacks on other legitimate end users. The model indexes the intelligent attackers, the legitimate users, and the time instants t, and at every time t it assigns each attacker an attack mode and each legitimate user a defense mode. Suppose that at some time t an intelligent attacker uses a smart programmable radio device to launch an intelligent attack, in its chosen attack mode, on a legitimate user served by the same fog node. In attack mode 0 the attacker stops attacking. In mode 1 the attacker sends a jamming signal that lowers the SINR of the signal the legitimate user receives from the fog node. In mode 2 the attacker eavesdrops and intercepts the information exchanged between the fog node and the legitimate user. In mode 3 the attacker uses a forged media access control address (Media Access Control Address, MAC-A) to impersonate the fog node and send data to the legitimate user, that is, it mounts a spoofing attack. In mode 4 the attacker mounts a replay attack, resending packets the legitimate user has already received in order to deceive the legitimate user. The attacked legitimate user has two defense modes against these attack types. In defense mode 1, called the basic mode, the legitimate user defends with physical-layer security (PLS) only. In defense mode 2 the legitimate user spends more overhead: it first applies channel-parameter-based PLS for preliminary detection, filtering, and anti-eavesdropping, and then uses the higher-layer security mechanism (HLSM) to check the data that passed physical-layer verification.
2. A subjective intelligent attack defense method based on the DQL algorithm
The method includes the following steps:
(1) Establish a subjective static zero-sum game between the intelligent attacker and the legitimate user based on PT. The attack mode is denoted SA_m, the number of attack modes is Num, Num ≥ 1, and the defense mode is denoted EU_n. Based on PT, the intelligent attacker and the legitimate user make subjective decisions in the game and reach a Nash equilibrium. The invention computes subjective probability with the Prelec probability weighting function:
$$w_{object}(p) = \exp\left(-(-\ln p)^{\sigma_{object}}\right) \tag{22}$$
where p is the objective probability, p ∈ (0,1], and σ_object ∈ (0,1] is the objective probability weight. The subscript object identifies the party taking part in the game; here object = attac (the attacker) or object = user (the legitimate user). The Prelec probability weighting function describes how a player in the subjective game, under the influence of the weight, adjusts the objective probability on which a decision is based. In line with PT, a subjective decision maker underestimates the objective probability of a high-probability event and overestimates the objective probability of a low-probability event. In the zero-sum game, the legitimate user obtains a gain when it detects intelligent attack SA_m under defense mode EU_n, and suffers a security loss when the intelligent attack goes undetected.
Every defense mode has a false alarm rate and a miss rate. The false alarm rate is the probability that legitimate data sent by a legitimate node is detected as illegitimate. The miss rate is the probability that illegitimate data is detected as legitimate, which is the cause of security loss. Combining the two, the model defines an error rate for the legitimate user detecting intelligent attack SA_m under defense mode EU_n. According to the system model, the intelligent attacker has 5 attack modes and the legitimate user has 2 defense modes, and formula (23) gives the utilities of the intelligent attacker and the legitimate user.
In formula (23), U_user(SA_m, EU_n) denotes the utility of the legitimate user. The error rate is quantized into C non-zero levels; each level has an associated probability, and these probabilities form a probability distribution over all quantization levels.
According to formula (23), when both players compute utility under EUT, the legitimate user's utility is evaluated with the objective average detection error rate. When both players compute utility under PT, they decide on the basis of subjective probabilities rather than the objective average detection error rate. The PT-based utilities of the two players therefore follow from formulas (22) and (23).
During the subjective game, both players change their subjective probabilities by adjusting the objective weight, each pursuing its own maximum utility, until a Nash equilibrium is reached. When the intelligent attacker believes that the intelligent attack he launches at the current time will be detected by the legitimate user, he chooses to stop attacking. When the legitimate user believes that using the higher-layer security mechanism will bring more utility, he sets EU_n = 1. The Nash equilibrium strategy combination is the pair of attack mode and defense mode that gives both players their maximum utility, and it must satisfy the equilibrium conditions.
Therefore, according to formulas (25), (26), (27) and (28), this step summarizes, for SA_m = 0, 1, 2, 3, the Nash equilibrium conditions for the spoofing attack in the subjective static zero-sum game. They explain why, when the intelligent attacker either stops attacking or mounts a spoofing attack, each of the legitimate user's two defense modes forms a Nash equilibrium state.
1. When conditions (29) and (30) hold, the Nash equilibrium strategy combination is (0,1).
2. When conditions (31) and (32) hold, the Nash equilibrium strategy combination is (0,2).
3. When conditions (33) and (34) hold, the Nash equilibrium strategy combination is (3,1).
4. When conditions (35) and (36) hold, the Nash equilibrium strategy combination is (3,2).
(2) Construct a dynamic subjective game method and obtain, based on the DQL algorithm, the optimal defense policy against intelligent attacks. The malicious user and the legitimate user play a static subjective game, with utility as the measure of the decision outcome. In a dynamic mobile fog computing network, however, both players lack knowledge of the overall network environment, and the legitimate user cannot determine the error rate of attack detection. The two sides therefore interact repeatedly, so that the legitimate user can select a suitable defense mode against the intelligent attacks launched by the malicious user, which increases the legitimate user's utility and reduces the attack rate. Among reinforcement learning methods, the Q-learning algorithm obtains an optimal policy in a dynamic environment with insufficient information. It derives from behaviorist learning theory and uses a Q value to evaluate how good it is for an agent to take a given action in a given state. It has two important parameters: the learning rate and the reward decay coefficient. The larger the learning rate, the less of the effect of earlier training is retained; the larger the reward decay coefficient, the more weight is given to long-term return. When computing the long-term return, Q-learning uses the max function, which tends to overestimate the Q value. This step therefore uses the DQL algorithm to realize the dynamic subjective game between the malicious user and the legitimate user and to obtain the legitimate user's optimal defense policy, where the attack mode and the defense mode are each indexed by the time slot t.
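The overestimation caused by the max function can be seen in a small experiment. The sketch below is not part of the patent: it assumes two actions whose true value is 0, adds Gaussian noise to their estimates, and compares a single estimator (select and evaluate with the same estimates) with a double estimator (select with one set of estimates, evaluate with an independent set), which is the idea behind using two Q-value tables. The noise level, sample counts, and function name are arbitrary choices.

```python
import random


def estimate_bias(num_actions=2, samples_per_action=5, trials=20000, seed=0):
    """Return the average value reported by the single and the double estimator.

    Every action has true value 0, so any non-zero average is pure bias.
    """
    rng = random.Random(seed)

    def noisy_mean():
        # Sample mean of a few noisy observations of a true value of 0.
        return sum(rng.gauss(0.0, 1.0) for _ in range(samples_per_action)) / samples_per_action

    single_total = 0.0
    double_total = 0.0
    for _ in range(trials):
        # Two independent sets of estimates, playing the role of two Q tables.
        est_a = [noisy_mean() for _ in range(num_actions)]
        est_b = [noisy_mean() for _ in range(num_actions)]
        # Single estimator: choose and evaluate with the same estimates.
        single_total += max(est_a)
        # Double estimator: choose with set A, evaluate with set B.
        best = max(range(num_actions), key=lambda i: est_a[i])
        double_total += est_b[best]
    return single_total / trials, double_total / trials


single_bias, double_bias = estimate_bias()
print(f"single estimator bias: {single_bias:.3f}")
print(f"double estimator bias: {double_bias:.3f}")
```

The single estimator reports a clearly positive value even though every action is worth 0, while the double estimator stays close to 0.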
The DQL algorithm uses two Q-value tables, updated alternately, to track the return of executing each action in each state. Here, the state at a given time is the attack mode selected by the intelligent attacker in the previous time slot, and the action is the defense mode selected by the legitimate user at time t. The two Q-value tables are updated by the following formulas:
where s_t denotes the system state at time t, μ ∈ (0,1] is the reward decay coefficient, and δ ∈ [0,1] is the learning rate. The Q-table 1 entry at time t is the return (Q value) of the legitimate user taking the selected defense mode in state s_t, and the immediate utility is the utility the legitimate user obtains by taking that defense mode in state s_t. The two maximizing defense modes are the ones that maximize the Q value in state s_{t+1} in tables Q1 and Q2, respectively; they are computed as follows:
V(s_t) denotes the maximum, over the defense modes available in the current state, of the average of Q1 and Q2, and the Q-table 1 entry at time t+1 is the return of the legitimate user taking the corresponding defense mode in state s_{t+1}. The optimal defense policy λ* is therefore given by:
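The formulas referenced as (37) to (42) do not appear in this text. As a hedged reconstruction, assuming DQL is the standard Double Q-learning algorithm, the updates described here would take roughly the following form. The symbols a_t (defense mode chosen at time t), U(s_t, a_t) (immediate utility), a* and b* are notational assumptions, and the numbering is inferred from how the algorithm steps cite the formulas.

```latex
\begin{align}
Q_1(s_t, a_t) &\leftarrow (1-\delta)\,Q_1(s_t, a_t)
  + \delta\left[\,U(s_t, a_t) + \mu\,Q_2(s_{t+1}, a^{*})\right] \tag{37} \\
Q_2(s_t, a_t) &\leftarrow (1-\delta)\,Q_2(s_t, a_t)
  + \delta\left[\,U(s_t, a_t) + \mu\,Q_1(s_{t+1}, b^{*})\right] \tag{38} \\
a^{*} &= \arg\max_{a} Q_1(s_{t+1}, a) \tag{39} \\
b^{*} &= \arg\max_{a} Q_2(s_{t+1}, a) \tag{40} \\
V(s_t) &= \max_{a} \frac{Q_1(s_t, a) + Q_2(s_t, a)}{2} \tag{41} \\
\lambda^{*} &= \arg\max_{a} \frac{Q_1(s_t, a) + Q_2(s_t, a)}{2} \tag{42}
\end{align}
```

In this form a larger δ keeps less of the previous Q value and a larger μ gives more weight to the value of the next state, which matches the roles of the two parameters described in the text.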
In each state, the legitimate user follows an ε-greedy policy when selecting a defense mode and updating the Q-value tables: with probability ε it selects a suboptimal defense mode, and with probability 1−ε it selects the defense mode that attains V(s_t), where ε ∈ (0,1).
Based on the above formulas, the steps of the DQL algorithm for obtaining the optimal defense policy are summarized as follows:
1. Initialize μ, δ, ε, the Q-value tables Q1 and Q2, and V(s_t) = 0.
2. For t = 1, 2, 3, ...:
3. Select a defense mode using the ε-greedy policy.
4. Observe the next state.
5. Compute and obtain the immediate utility.
6. With probability 0.5, update Q1 by formulas (37) and (39); otherwise update Q2 by formulas (38) and (40).
7. Update V(s_t) by formula (41).
8. Return to step 2 and continue until the final system state is reached; then obtain the optimal defense policy λ* from the Q1 and Q2 tables and formula (42).
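The steps above can be sketched in code. This is a minimal illustration, assuming DQL is standard Double Q-learning; it is not the patent's implementation. The functions `toy_utility` and `toy_attacker` and the values chosen for μ, δ, and ε are hypothetical stand-ins, because the PT-based utility and the parameter table are not available in this text.

```python
import random

ATTACK_MODES = [0, 1, 2, 3, 4]  # states: attack mode seen in the previous slot
DEFENSE_MODES = [1, 2]          # actions: defense mode of the legitimate user


def toy_utility(attack, defense):
    """Hypothetical immediate utility (not the PT-based utility of the patent)."""
    overhead = 0.0 if defense == 1 else 0.2
    if attack == 0:
        return -overhead
    detect_prob = 0.6 if defense == 1 else 0.9
    return detect_prob - (1.0 - detect_prob) - overhead


def toy_attacker(defense, rng):
    """Hypothetical attacker that stops more often after a strong defense."""
    stop_prob = 0.7 if defense == 2 else 0.3
    return 0 if rng.random() < stop_prob else rng.choice([1, 2, 3, 4])


def avg_q(q1, q2, state, action):
    return 0.5 * (q1[(state, action)] + q2[(state, action)])


def double_q_learning(slots=300, mu=0.7, delta=0.5, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Step 1: initialise both Q tables and V to zero.
    q1 = {(s, a): 0.0 for s in ATTACK_MODES for a in DEFENSE_MODES}
    q2 = dict(q1)
    v = {s: 0.0 for s in ATTACK_MODES}
    state = 0
    for _ in range(slots):  # Step 2
        # Step 3: epsilon-greedy choice on the average of the two tables.
        best = max(DEFENSE_MODES, key=lambda a: avg_q(q1, q2, state, a))
        if rng.random() < epsilon:
            action = rng.choice([a for a in DEFENSE_MODES if a != best])
        else:
            action = best
        # Steps 4 and 5: observe the next state and the immediate utility.
        next_state = toy_attacker(action, rng)
        utility = toy_utility(next_state, action)
        # Step 6: update one of the two tables, each with probability 0.5.
        if rng.random() < 0.5:
            a_star = max(DEFENSE_MODES, key=lambda a: q1[(next_state, a)])
            target = utility + mu * q2[(next_state, a_star)]
            q1[(state, action)] += delta * (target - q1[(state, action)])
        else:
            b_star = max(DEFENSE_MODES, key=lambda a: q2[(next_state, a)])
            target = utility + mu * q1[(next_state, b_star)]
            q2[(state, action)] += delta * (target - q2[(state, action)])
        # Step 7: refresh V for the state that was just updated.
        v[state] = max(avg_q(q1, q2, state, a) for a in DEFENSE_MODES)
        state = next_state
    # Step 8: per state, pick the defense mode maximising (Q1 + Q2) / 2.
    policy = {s: max(DEFENSE_MODES, key=lambda a: avg_q(q1, q2, s, a))
              for s in ATTACK_MODES}
    return policy, v


policy, v = double_q_learning()
print(policy)
```

The returned `policy` maps each observed attack mode to a defense mode. With only two defense modes, the "suboptimal" choice made with probability ε is simply the other mode.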
The present invention uses 300 time slots, each lasting 12500/32 microseconds (390.625 µs); the interval between adjacent time instants is measured in microseconds. In the subjective game against intelligent attacks, the four methods are evaluated with the following indices:
Index (1), utility of the legitimate user: the average PT-based utility of the legitimate user in each time slot.
Index (2), attack rate: the total number of attack modes selected by the intelligent attacker in each time slot, as a proportion of all modes.
Index (3), maximum Q value: the maximum Q value updated in each time slot during the Q-table update process.
Index (4), average action value: the average of the defense modes selected by the legitimate user in each time slot during the Q-table update process. The action value takes only two possible values, 1 and 2, and the defense mode with action value 1 incurs less system overhead. A smaller average action value therefore indicates that the legitimate user more often selects the defense mode that uses only physical-layer security techniques, which improves system performance.
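As a rough illustration of how indices (1), (2), and (4) could be computed from per-slot logs, consider the sketch below. The function name and the log format are hypothetical, and the attack rate is computed under the assumption that it is the share of slots in which the attack mode is not 0 (taken here to mean "stop attacking"). Index (3) is omitted because it depends on the Q-table internals.

```python
from statistics import mean


def evaluate_slots(utilities, attack_modes, defense_modes):
    """Summarise a run; each list holds one value per time slot."""
    if not (len(utilities) == len(attack_modes) == len(defense_modes)):
        raise ValueError("all logs must cover the same time slots")
    return {
        # Index (1): average utility of the legitimate user.
        "average_utility": mean(utilities),
        # Index (2): assumed to be the share of slots with an active attack.
        "attack_rate": mean(1.0 if a != 0 else 0.0 for a in attack_modes),
        # Index (4): mean of the chosen defense modes (each is 1 or 2).
        "average_action_value": mean(defense_modes),
    }


summary = evaluate_slots(
    utilities=[0.2, 0.8, 0.0, 0.8],
    attack_modes=[3, 0, 1, 0],
    defense_modes=[2, 1, 1, 1],
)
print(summary)
```

An average action value close to 1 means the basic physical-layer mode was chosen in most slots.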
The initialization parameters used in the present invention, with their meanings and values, are shown in the following table.
Fig. 2 compares the influence of the objective weights on the Nash equilibrium and on the legitimate user's utility in the static subjective game under the initial parameters. X-axis: objective probability weight of the intelligent attacker (unit: 1). Y-axis: utility of the legitimate user (unit: 1). The solid line is the legitimate user's utility when the legitimate user's objective probability weight equals 0.7; the dashed line is the utility when that weight equals 1.

Fig. 3 compares, over time slots 1-300 under the initial parameters, the legitimate user's utility when intelligent attacks are defended against with the present invention and with the Sarsa algorithm, the greedy strategy, and the Q-learning algorithm. X-axis: time slot (unit: 1). Y-axis: utility of the legitimate user (unit: 1). Thick dashed line: DQL algorithm. Thin dashed line: Sarsa algorithm. Thin solid line: Q-learning algorithm. Thick solid line: greedy strategy.

Fig. 4 compares, over time slots 1-300 under the initial parameters, the attack rate when intelligent attacks are defended against with the present invention and with the Sarsa algorithm, the greedy strategy, and the Q-learning algorithm. X-axis: time slot (unit: 1). Y-axis: attack rate (unit: 1). Thick dashed line: DQL algorithm. Thin dashed line: Sarsa algorithm. Thin solid line: Q-learning algorithm. Thick solid line: greedy strategy.

Fig. 5 compares, over time slots 1-300 under the initial parameters, the maximum Q value when intelligent attacks are defended against with the present invention and with the Sarsa and Q-learning algorithms. X-axis: time slot (unit: 1). Y-axis: maximum Q value (unit: 1). Thick dashed line: DQL algorithm. Thin dashed line: Sarsa algorithm. Thin solid line: Q-learning algorithm.

Fig. 6 compares, over time slots 1-300 under the initial parameters, the average action value when intelligent attacks are defended against with the present invention and with the Sarsa and Q-learning algorithms. X-axis: time slot (unit: 1). Y-axis: average action value (unit: 1). Thick dashed line: DQL algorithm. Thin dashed line: Sarsa algorithm. Thin solid line: Q-learning algorithm.

As shown in Figs. 2-6, under the same initial parameters the method proposed by the present invention obtains a more accurate optimal defense policy, improves the utility of the legitimate user, and reduces the attack rate.
Claims (1)
1. A user-oriented intelligent attack defense method in mobile fog computing, characterized in that the security model for intelligent attacks in mobile fog computing is as follows:
The security model considers communication between the fog layer and the user layer, that is, between fog nodes and end users. In a mobile fog computing network, any end user may, as a subjective end user, become an intelligent attacker and launch intelligent attacks on other legitimate end users. The set of intelligent attackers is denoted M = {1, 2, ..., m}; the attack mode of each attacker is defined at every time t, where the set of times is T = {0, 1, 2, ..., t}. The set of legitimate users is denoted N = {1, 2, ..., n}; the defense mode of each legitimate user is defined at every time t, with T = {1, 2, ..., t}. Assume that at some time t intelligent attacker 1 uses an intelligent programmable radio device to launch an intelligent attack, in a chosen attack mode, on a legitimate user under the same fog node. When the attack mode is 0, the attacker stops attacking; when it is 1, the attacker attacks the legitimate user by sending jamming signals, which reduces the SINR of the signal the legitimate user receives from the fog node; when it is 2, the attacker adopts the eavesdropping attack mode and intercepts the information transmitted between the fog node and the legitimate user; when it is 3, the attacker uses a false media access control address (Media Access Control Address, MAC-A) to impersonate the fog node and send data to the legitimate user, that is, the attacker adopts the spoofing attack mode; when it is 4, the attacker adopts the replay attack mode and sends data packets that the legitimate user has already received, in order to deceive the legitimate user. An attacked legitimate user has two defense modes against the different types of attack: when the defense mode is 1, the legitimate user uses only PLS to defend against intelligent attacks, and this defense mode is called the basic mode; when the defense mode is 2, the legitimate user spends more system overhead, first using channel-parameter-based PLS techniques for preliminary detection, filtering, and anti-eavesdropping, and then using HLSM to check the data that passed physical-layer verification;
the method comprising the following steps:
(1) Establish, based on PT, a static subjective zero-sum game between the intelligent attacker and the legitimate user; the attack mode is denoted SA_m, the number of attack modes is denoted Num with Num ≥ 1, and the defense mode is denoted EU_n; based on PT, the intelligent attacker and the legitimate user make subjective decisions in the game and reach a Nash equilibrium; the present invention computes the subjective probability with the Prelec probability weighting function, given by:
where p is the objective probability, p ∈ (0,1]; σ_object is the objective probability weight, σ_object ∈ (0,1]; object denotes the player taking part in the game, object=attac or object=user; the Prelec probability weighting function describes how a player in the subjective game, under the influence of the weight, adjusts the objective probability of a decision; according to PT, a subjective decision maker underestimates the objective probability of a high-probability event and, conversely, overestimates the objective probability of a low-probability event; in the zero-sum game, the legitimate user obtains a gain when it detects intelligent attack SA_m under defense mode EU_n, and suffers a security loss when the intelligent attack is not detected; every defense mode has a false alarm rate and a miss rate, where the false alarm rate is the probability that valid data sent by a legitimate node is detected as illegal data and the miss rate is the probability that illegal data is detected as valid data; combining the two rates gives the error rate with which the legitimate user detects intelligent attack SA_m under defense mode EU_n; according to the system model, the intelligent attacker has 5 attack modes and the legitimate user has 2 defense modes, and the utilities of the intelligent attacker and the legitimate user are given by:
where U_user(SA_m, EU_n) denotes the utility of the legitimate user; the detection error rate is quantized into C non-zero levels, each level occurs with its own probability and follows a probability distribution, and the quantized probabilities sum to 1;
according to formula (2), when both players compute utility based on EUT, the calculation formula is:
where the result denotes the utility the legitimate user computes based on EUT; when both players compute utility using PT, they make decisions based on subjective probability rather than on the objective average detection error rate; therefore, according to formulas (1) and (2), the utilities of the two players are respectively:
during the subjective game, both players change their subjective probabilities by adjusting the objective weights, each pursuing its maximum utility, until a Nash equilibrium is reached; when the intelligent attacker believes that the intelligent attack he launches at the current time will be detected by the legitimate user, he chooses to stop attacking; when the legitimate user believes that using the higher-layer security mechanism yields more utility, he sets EU_n = 1; the Nash equilibrium strategy combination is the combination that gives both players their maximum utility, and it should satisfy the following conditions:
therefore, according to formulas (4), (5), (6) and (7), this step summarizes, for SA_m = 0, 1, 2, 3, the Nash equilibrium conditions concerning the spoofing attack in the static subjective zero-sum game; they explain, respectively, why each of the legitimate user's two defense modes forms a Nash equilibrium state when the intelligent attacker adopts the stop-attack mode or the spoofing attack mode;
1. when conditions (8) and (9) are satisfied, the Nash equilibrium strategy combination is (0,1);
2. when conditions (10) and (11) are satisfied, the Nash equilibrium strategy combination is (0,2);
3. when conditions (12) and (13) are satisfied, the Nash equilibrium strategy combination is (3,1);
4. when conditions (14) and (15) are satisfied, the Nash equilibrium strategy combination is (3,2);
(2) Construct a dynamic subjective game method and obtain, based on the DQL algorithm, the optimal defense policy against intelligent attacks;
(3) Use the DQL algorithm to realize the dynamic subjective game between the malicious user and the legitimate user and obtain the legitimate user's optimal defense policy, where the attack mode and the defense mode are each indexed by the time slot t;
the DQL algorithm uses two Q-value tables, updated alternately, to track the return of executing each action in each state; the state at a given time is the attack mode selected by the intelligent attacker in the previous time slot, and the action is the defense mode selected by the legitimate user at time t; the two Q-value tables are updated by the following formulas:
where s_t denotes the system state at time t, μ ∈ (0,1] is the reward decay coefficient, and δ ∈ [0,1] is the learning rate; the Q-table 1 entry at time t is the return (Q value) of the legitimate user taking the selected defense mode in state s_t, and the immediate utility is the utility the legitimate user obtains by taking that defense mode in state s_t; the two maximizing defense modes are the ones that maximize the Q value in state s_{t+1} in tables Q1 and Q2, respectively, and they are computed as follows:
V(s_t) denotes the maximum, over the defense modes available in the current state, of the average of Q1 and Q2, and the Q-table 1 entry at time t+1 is the return of the legitimate user taking the corresponding defense mode in state s_{t+1}; the optimal defense policy λ* is therefore given by:
in each state, the legitimate user follows an ε-greedy policy when selecting a defense mode and updating the Q-value tables: with probability ε it selects a suboptimal defense mode, and with probability 1−ε it selects the defense mode that attains V(s_t), where ε ∈ (0,1);
based on the above formulas, the steps of the DQL algorithm for obtaining the optimal defense policy are summarized as follows:
1. initialize μ, δ, ε, the Q-value tables Q1 and Q2, and V(s_t) = 0;
2. for t = 1, 2, 3, ...:
3. select a defense mode using the ε-greedy policy;
4. observe the next state;
5. compute and obtain the immediate utility;
6. with probability 0.5, update Q1 by formulas (16) and (18); otherwise update Q2 by formulas (17) and (19);
7. update V(s_t) by formula (20);
8. return to step 2 and continue until the final system state is reached; then obtain the optimal defense policy λ* from the Q1 and Q2 tables and formula (21).
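Step (1) of the claim computes subjective probability with the Prelec probability weighting function, but formula (1) itself does not appear in this text. A minimal sketch, assuming the standard one-parameter Prelec form w(p) = exp(−(−ln p)^σ) with the ranges stated in the claim (p ∈ (0,1], σ ∈ (0,1]); the function name `prelec_weight` is hypothetical:

```python
import math


def prelec_weight(p: float, sigma: float) -> float:
    """Subjective probability w(p) = exp(-(-ln p) ** sigma) for p in (0, 1]."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if not 0.0 < sigma <= 1.0:
        raise ValueError("sigma must lie in (0, 1]")
    return math.exp(-((-math.log(p)) ** sigma))


# Compare a subjective player (sigma = 0.5) with an objective one (sigma = 1).
for p in (0.05, 0.5, 0.9):
    print(p, prelec_weight(p, 0.5), prelec_weight(p, 1.0))
```

With σ = 1 the function returns p unchanged, which corresponds to a fully objective player. With σ < 1 it lowers high probabilities and raises low ones, in line with the underestimation and overestimation behaviour described in the claim.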
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910287756.1A CN110049497B (en) | 2019-04-11 | 2019-04-11 | User-oriented intelligent attack defense method in mobile fog calculation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910287756.1A CN110049497B (en) | 2019-04-11 | 2019-04-11 | User-oriented intelligent attack defense method in mobile fog calculation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110049497A (en) | 2019-07-23 |
CN110049497B (en) | 2022-09-09 |
Family
ID=67276801
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910287756.1A Active CN110049497B (en) | 2019-04-11 | 2019-04-11 | User-oriented intelligent attack defense method in mobile fog calculation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110049497B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110401675A (en) * | 2019-08-20 | 2019-11-01 | 绍兴文理学院 | Uncertain ddos attack defence method under a kind of sensing cloud environment |
CN110753383A (en) * | 2019-07-24 | 2020-02-04 | 北京工业大学 | Safe relay node selection method based on reinforcement learning in fog calculation |
CN114448660A (en) * | 2021-12-16 | 2022-05-06 | 国网江苏省电力有限公司电力科学研究院 | Internet of things data access method |
CN114666107A (en) * | 2022-03-04 | 2022-06-24 | 北京工业大学 | Advanced persistent threat defense method in mobile fog computing |
WO2022151579A1 (en) * | 2021-01-13 | 2022-07-21 | 清华大学 | Backdoor attack active defense method and device in edge computing scene |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107147670A (en) * | 2017-06-16 | 2017-09-08 | 福建中信网安信息科技有限公司 | APT defence methods based on game system |
CN107871164A (en) * | 2017-11-17 | 2018-04-03 | 济南浪潮高新科技投资发展有限公司 | A kind of mist computing environment personalization deep learning method |
CN108512837A (en) * | 2018-03-16 | 2018-09-07 | 西安电子科技大学 | A kind of method and system of the networks security situation assessment based on attacking and defending evolutionary Game |
CN108848535A (en) * | 2018-05-31 | 2018-11-20 | 国网浙江省电力有限公司电力科学研究院 | A kind of mist calculating environmental resource distribution method towards shared model |
EP3407194A2 (en) * | 2018-07-19 | 2018-11-28 | Erle Robotics, S.L. | Method for the deployment of distributed fog computing and storage architectures in robotic modular components |
CN109194685A (en) * | 2018-10-12 | 2019-01-11 | 天津大学 | Man-in-the-middle attack defence policies based on safe game theory |
Non-Patent Citations (2)
Title |
---|
CAIXIA XIE 等: "User-centric view of smart attacks in wireless networks", 《2016 IEEE INTERNATIONAL CONFERENCE ON UBIQUITOUS WIRELESS BROADBAND (ICUWB)》 * |
SHANSHAN TU 等: "Security in Fog Computing: A Novel Technique to Tackle an Impersonation Attack", 《IEEE ACCESS》 * |
Also Published As
Publication number | Publication date |
---|---|
CN110049497B (en) | 2022-09-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110049497A (en) | A kind of user oriented intelligent attack defense method in mobile mist calculating | |
CN108512837A (en) | A kind of method and system of the networks security situation assessment based on attacking and defending evolutionary Game | |
Sagduyu et al. | Jamming games in wireless networks with incomplete information | |
Gianvecchio et al. | Battle of botcraft: fighting bots in online games with human observational proofs | |
Min et al. | Defense against advanced persistent threats in dynamic cloud storage: A colonel blotto game approach | |
CN110166428B (en) | Intelligent defense decision-making method and device based on reinforcement learning and attack and defense game | |
CN107147670A (en) | APT defence methods based on game system | |
CN108833402A (en) | A kind of optimal defence policies choosing method of network based on game of bounded rationality theory and device | |
CN108898010A (en) | A method of establishing the attacking and defending Stochastic Game Model towards malicious code defending | |
CN109589607A (en) | A kind of game anti-cheating method and game anti-cheating system based on block chain | |
CN111064702B (en) | Active defense strategy selection method and device based on bidirectional signal game | |
CN110417733A (en) | Attack Prediction method, apparatus and system based on QBD attacking and defending random evolution betting model | |
Abdalzaher et al. | Using Stackelberg game to enhance node protection in WSNs | |
CN110278198A (en) | The safety risk estimating method of assets in network based on game theory | |
CN107517200A (en) | A kind of malice reptile defence policies system of selection of Web server | |
CN105100017A (en) | LDoS attack detection method based on signal cross correlation | |
CN112329009A (en) | Defense method for noise attack in joint learning | |
Estiri et al. | A game-theoretical model for intrusion detection in wireless sensor networks | |
Wu et al. | I-CIFA: An improved collusive interest flooding attack in named data networking | |
CN114666107A (en) | Advanced persistent threat defense method in mobile fog computing | |
CN109787996B (en) | Camouflage attack detection method based on DQL algorithm in fog calculation | |
Seredynski et al. | Evolutionary game theoretical analysis of reputation-based packet forwarding in civilian mobile ad hoc networks | |
Yang et al. | Dishonest behaviors in online rating systems: cyber competition, attack models, and attack generator | |
Zhang et al. | A multi-criteria detection scheme of collusive fraud organization for reputation aggregation in social networks | |
Miller | Distributed virtual environment scalability and security |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||