CN115796300A - Security state game method, system, terminal and medium for CPSs under DoS attack - Google Patents

Publication number
CN115796300A
Authority
CN
China
Prior art keywords
dos attack
under
state
remote
attack
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211347995.XA
Other languages
Chinese (zh)
Inventor
金增旺
李倩
张淑婷
张艳宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Mobile Radio Communication Systems (AREA)

Abstract

The invention relates to the technical field of cyber-physical system security and discloses a security state game method, system, terminal and medium for CPSs under DoS attack. Based on the system measurement sequence and a prior model, the method performs recursive updates with a Kalman filter and updates the estimate in real time according to the attack status of the system, so as to ensure the accuracy and validity of the estimate. At each time step, the Nash equilibrium strategies of the sensor and the attacker are solved by linear programming, so that both can make optimal choices in real time. The invention studies reinforcement learning from the two perspectives of a reliable channel and an unreliable channel, and proposes a reinforcement learning algorithm based on security state estimation for each case. The algorithms yield the Nash equilibrium strategies of the attacker and the defender, which guide the decisions of the sensor and the attacker.

Description

Security state game method, system, terminal and medium for CPSs under DoS attack
Technical Field
The invention relates to the technical field of cyber-physical system security, and in particular to a security state game method, system, terminal and medium for CPSs under DoS attack.
Background
Cyber-physical systems (CPSs) tightly integrate computing resources with physical resources to perform functions such as real-time sensing, remote control and information exchange. Wireless sensors are widely used in critical infrastructure control, aerospace systems, military systems and other fields because they are flexible to deploy, consume little power and are easy to extend. However, transmitting sensor measurements over a wireless network, while improving communication efficiency, also introduces security risks: the data are susceptible to malicious network attacks such as Denial of Service (DoS) attacks, spoofing attacks and injection attacks. DoS attacks mainly prevent the remote estimator from correctly receiving and processing sensor data by jamming the channel. They are among the most common and most easily implemented network attacks and seriously threaten the security of a cyber-physical system.
In the game between the cyber-physical system and the attacker, the goal of the sensor is to maximize the attack cost while minimizing the state estimation error, and the goal of the attacker is the opposite. Because the attacker's gain comes from the sensor's loss, the confrontation between the sensor and the attacker is treated as a two-player zero-sum deterministic game. Because the state of a cyber-physical system must be sensed and updated in real time, the static game methods in the existing literature cannot accurately address the security state estimation problem.
In addition, how the sensor and the attacker in a cyber-physical system can improve the efficiency and accuracy of their decisions is also an important research question. Reinforcement learning, an important branch of artificial intelligence that studies how an agent learns an optimal strategy by interacting with an unknown environment, has attracted extensive attention. Many works use reinforcement learning to solve the game between attackers and defenders in cyber-physical systems. However, in a cyber-physical system, signal degradation, channel fading, channel congestion and other factors can cause packet loss, which makes the channel unreliable.
Disclosure of Invention
To overcome the above shortcomings of the prior art, the invention provides a security state game method, system, terminal and medium for CPSs under DoS attack, so as to solve the technical problems of low security and of low decision efficiency and accuracy of cyber-physical systems in the prior art.
The invention is realized by the following technical scheme:
a security state game method for CPSs under DoS attack comprises the following steps:
step 1, establishing a wireless channel remote security estimation model under DoS attack;
step 2, inputting system parameters into the wireless channel remote security estimation model under DoS attack, and recursively updating the minimum mean square error estimate of the system state vector and the estimation error covariance according to the Kalman filter equations;
step 3, in the wireless channel remote security estimation model under DoS attack, having the sensor and the DoS attacker each select a random action with probability ε and the optimal action with probability 1-ε;
step 4, determining the channel type (reliable or unreliable) in the wireless channel remote security estimation model under DoS attack, and calculating the instant reward for the current state-action combination and the state at the next time step;
step 5, updating the Q-value function in the wireless channel remote security estimation model under DoS attack, and thereby updating the Q-value matrix;
step 6, comparing the updated Q-value function with the Q-value function of the current state; when the difference between the updated Q value and the Q value of the current state is greater than a set threshold η, returning to step 3; otherwise, obtaining the converged Q-value matrix and the optimal strategy based on Nash equilibrium, which completes the security state game for CPSs under DoS attack.
Preferably, in step 1, the wireless channel remote security estimation model under DoS attack comprises a cyber-physical system, a sensor, a channel, a remote estimator and an attacker; the cyber-physical system supplies the measurement vector y(k) to the sensor, and the sensor outputs the minimum mean square error estimate
Figure BDA0003918869460000021
The channel comprises a secure transmission channel and an insecure transmission channel; the channel transmits the sensor data to the remote estimator, and the remote estimator outputs its estimate of the system security state
Figure BDA00039188694600000310
The attacker either launches a DoS attack on the channel or does not attack; the remote estimator sends feedback signals to both the sensor and the attacker.
Preferably, in step 2, system parameters are input into the wireless channel remote security estimation model under DoS attack, where the system parameters include: the initial state, the actions, the Q-value matrix of the system, the packet loss probability ξ_k under the different action combinations, the learning rate α, the discount factor ρ and the exploration rate ε.
Preferably, in step 2, the minimum mean square error estimate of the system state vector
Figure BDA00039188694600000311
and the estimation error covariance P(k) are recursively updated according to the Kalman filter equations as follows:
the system state is estimated by recursive updates of a local Kalman filter, where at each time k the minimum mean square error estimate of the system state vector x(k)
Figure BDA0003918869460000031
is computed by the Kalman filter from the measurements, with the expression:
Figure BDA0003918869460000032
the corresponding estimation error covariance is:
Figure BDA0003918869460000033
to recursively update
Figure BDA0003918869460000034
and P(k), the recursive update equations of the Kalman filter are as follows:
Figure BDA0003918869460000035
P(k|k-1) = h(P(k-1))
K(k) = P(k|k-1) C^T [C P(k|k-1) C^T + R]^{-1}
Figure BDA0003918869460000036
P(k)=g(P(k|k-1))
where
Figure BDA0003918869460000037
is the minimum mean square error estimate of the state vector x(k), P(k) is the corresponding estimation error covariance, A, B and C are coefficient matrices, h is the Lyapunov operator and g is the Riccati operator;
the Lyapunov operator h and the Riccati operator g are calculated as follows:
Figure BDA0003918869460000038
Figure BDA0003918869460000039
where A is a coefficient matrix and Q is the process noise covariance matrix.
Preferably, in step 3, the sensor has two data transmission modes: the first data transmission mode
Figure BDA0003918869460000041
indicates secure transmission, and the second data transmission mode
Figure BDA0003918869460000042
indicates insecure transmission; the DoS attacker's action is either to attack or not to attack, where
Figure BDA0003918869460000043
indicates that a DoS attack is launched on the channel, and
Figure BDA0003918869460000044
indicates that the attacker chooses not to attack; the sensor and the attacker select their actions according to the exploration rate ε: each selects a random action with probability ε and the optimal action with probability 1-ε.
Preferably, in step 4, the channel type is determined in order to calculate the instant reward for the current state-action combination and the state at the next time step, covering data packet transmission over a reliable channel and over an unreliable channel;
over the reliable channel, data packet transmission is expressed as:
Figure BDA0003918869460000045
the state of the remote estimator at the next time step is expressed as:
Figure BDA0003918869460000046
at time k, the instant reward r_k of the system is expressed as:
Figure BDA0003918869460000047
over the unreliable channel, data packet transmission is expressed as:
Figure BDA0003918869460000048
the state of the remote estimator at the next time step is expressed as:
Figure BDA0003918869460000049
at time k, the instant reward r_k of the system is expressed as:
Figure BDA0003918869460000051
where E[P(k)] denotes the expectation of the remote estimation error covariance P(k), with the expression:
Figure BDA0003918869460000052
where ξ_k denotes the packet loss probability,
Figure BDA0003918869460000053
denotes the steady-state error covariance to which the Kalman filter converges, P(k) is the remote estimation error covariance, and h is the Lyapunov operator.
Preferably, in step 5, the Q-value function is updated in the wireless channel remote security estimation model under DoS attack, where the Q-value function update is expressed as:
Figure BDA0003918869460000054
where α_k ∈ (0, 1) is the learning rate, ρ is the discount factor, and r_k is the reward for the current state.
A security state game system for CPSs under DoS attack comprises:
the model establishing module, used for establishing a wireless channel remote security estimation model under DoS attack;
the first processing module, used for inputting system parameters into the wireless channel remote security estimation model under DoS attack and recursively updating the minimum mean square error estimate of the system state vector and the estimation error covariance according to the Kalman filter equations;
the second processing module, used for having the sensor and the DoS attacker each select a random action with probability ε and the optimal action with probability 1-ε in the wireless channel remote security estimation model under DoS attack;
the third processing module, used for determining the channel type in the wireless channel remote security estimation model under DoS attack and calculating the instant reward for the current state-action combination and the state at the next time step;
the fourth processing module, used for updating the Q-value function in the wireless channel remote security estimation model under DoS attack and thereby updating the Q-value matrix;
the comparison module, used for comparing the updated Q-value function with the Q-value function of the current state; when the difference between the updated Q value and the Q value of the current state is greater than a set threshold η, returning to step 3; otherwise, obtaining the converged Q-value matrix and the optimal strategy based on Nash equilibrium, which completes the security state game for CPSs under DoS attack.
A mobile terminal comprises a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the security state game method for CPSs under DoS attack according to any one of claims 1 to 7.
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the security state game method for CPSs under DoS attack according to any one of claims 1 to 7.
Compared with the prior art, the invention has the following beneficial technical effects:
the invention provides a security state game method for CPSs under DoS attack, which finds the best strategies for the attacker and the sensor in a cyber-physical system under DoS attack by adjusting the strategies of both. Based on the system measurement sequence and a prior model, the method performs recursive updates with a Kalman filter and updates the estimate in real time according to the attack status of the system, so as to ensure the accuracy and validity of the estimate. At each time step, the Nash equilibrium strategies of the sensor and the attacker are obtained by linear programming, so that both can make optimal choices in real time. In addition, the invention studies reinforcement learning from the two perspectives of a reliable channel and an unreliable channel, and proposes a reinforcement learning algorithm based on security state estimation for each case. The algorithms yield the Nash equilibrium strategies of the attacker and the defender, which guide the decisions of the sensor and the attacker and effectively improve the security of the cyber-physical system.
Drawings
FIG. 1 is a flowchart of the security state game method for CPSs under DoS attack according to the invention;
FIG. 2 is a schematic diagram of the wireless channel remote security estimation model under DoS attack in the invention;
FIG. 3 is a diagram of the transition process of the Markov chain random state sequence P(k) in the invention;
FIG. 4 is a diagram of the variation of Tr(P(k)) under a reliable channel in the invention;
FIG. 5 is a diagram of the learning process of the Q-value function under a reliable channel in the invention;
FIG. 6 is a diagram of the variation of Tr(P(k)) under an unreliable channel in the invention;
FIG. 7 is a diagram of the learning process of the Q-value function under an unreliable channel in the invention;
FIG. 8 is an implementation flowchart of the reinforcement learning algorithm based on security state estimation for a cyber-physical system under denial-of-service attack in an embodiment of the invention.
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention is described in further detail below with reference to the accompanying drawings:
the invention provides a security state game method, system, terminal and medium for CPSs under DoS attack, so as to solve the technical problems of low security and of low decision efficiency and accuracy of cyber-physical systems in the prior art.
Specifically, the security state game method for CPSs under DoS attack comprises the following steps:
step 1, establishing a wireless channel remote security estimation model under DoS attack;
specifically, the wireless channel remote security estimation model under DoS attack comprises a cyber-physical system, a sensor, a channel, a remote estimator and an attacker; the cyber-physical system supplies the measurement vector y(k) to the sensor, and the sensor outputs the minimum mean square error estimate
Figure BDA0003918869460000071
The channel comprises a secure transmission channel and an insecure transmission channel; the channel transmits the sensor data to the remote estimator, and the remote estimator outputs its estimate of the system security state
Figure BDA0003918869460000072
the attacker either launches a DoS attack on the channel or does not attack; the remote estimator sends feedback signals to both the sensor and the attacker.
The DoS attack may occur on the wireless communication channel between the sensor and the remote estimator. At time k, the linear system model under DoS attack is expressed as
Figure BDA0003918869460000081
where
Figure BDA0003918869460000082
The cyber-physical system is modeled as a linear discrete-time system, w(k) and v(k) denote the zero-mean process noise and measurement noise, and A, B and H are known coefficient matrices of appropriate dimensions.
Step 2, inputting system parameters into the wireless channel remote security estimation model under DoS attack, and recursively updating the minimum mean square error estimate of the system state vector and the estimation error covariance according to the Kalman filter equations;
specifically, system parameters are input into the wireless channel remote security estimation model under DoS attack, where the system parameters include: the initial state, the actions, the Q-value matrix of the system, the packet loss probability ξ_k under the different action combinations, the learning rate α, the discount factor ρ and the exploration rate ε.
The minimum mean square error estimate of the system state vector
Figure BDA0003918869460000083
and the estimation error covariance P(k) are recursively updated according to the Kalman filter equations as follows:
the system state is estimated by recursive updates of a local Kalman filter, where at each time k the minimum mean square error estimate of the system state vector x(k)
Figure BDA0003918869460000084
is computed by the Kalman filter from the measurements, with the expression:
Figure BDA0003918869460000085
the corresponding estimation error covariance is:
Figure BDA0003918869460000086
to recursively update
Figure BDA0003918869460000087
and P(k), the Lyapunov operator h and the Riccati operator g are first defined, for simplicity, as follows:
Figure BDA0003918869460000088
Figure BDA0003918869460000089
the recursive update equation for the kalman filter is as follows:
Figure BDA0003918869460000091
P(k|k-1) = h(P(k-1))    (7)
K(k) = P(k|k-1) C^T [C P(k|k-1) C^T + R]^{-1}    (8)
Figure BDA0003918869460000092
P(k)=g(P(k|k-1)) (10)
where
Figure BDA0003918869460000093
is the minimum mean square error estimate of the state vector x(k), P(k) is the corresponding estimation error covariance, A, B and C are coefficient matrices, and h and g are the Lyapunov operator and the Riccati operator defined below:
Figure BDA0003918869460000094
Figure BDA0003918869460000095
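The recursion in step 2 can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes a scalar system so that no matrix library is needed, it assumes the commonly used operator forms h(X) = A X A^T + Q and g(X) = X - X C^T (C X C^T + R)^{-1} C X (the operator definitions in this text survive only as figures), and it omits any control input. All function names are hypothetical.

```python
def h(p, a, q):
    """Lyapunov operator, scalar form (assumed): time update of the covariance."""
    return a * p * a + q


def g(p, c, r):
    """Riccati operator, scalar form (assumed): measurement update of the covariance."""
    return p - p * c * (1.0 / (c * p * c + r)) * c * p


def kalman_step(x_hat, p, y, a, c, q, r):
    """One recursive update of the estimate x_hat and its error covariance p."""
    x_pred = a * x_hat                         # predicted state (no input term assumed)
    p_pred = h(p, a, q)                        # P(k|k-1) = h(P(k-1))
    gain = p_pred * c / (c * p_pred * c + r)   # K(k), scalar form of equation (8)
    x_new = x_pred + gain * (y - c * x_pred)   # corrected estimate
    p_new = g(p_pred, c, r)                    # P(k) = g(P(k|k-1))
    return x_new, p_new


def steady_state(a, c, q, r, p0=1.0, tol=1e-12, max_iter=10_000):
    """Iterate P <- g(h(P)) until the covariance stops changing."""
    p = p0
    for _ in range(max_iter):
        p_next = g(h(p, a, q), c, r)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    return p
```

Iterating `steady_state` gives the steady-state covariance that the later steps use as the reset value after a successful packet arrival.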
step 3, in the wireless channel remote security estimation model under DoS attack, having the sensor and the DoS attacker each select a random action with probability ε and the optimal action with probability 1-ε;
specifically, the sensor has two data transmission modes: the first data transmission mode
Figure BDA0003918869460000096
indicates secure transmission, and the second data transmission mode
Figure BDA0003918869460000097
indicates insecure transmission; the DoS attacker's action is either to attack or not to attack, where
Figure BDA0003918869460000098
indicates that a DoS attack is launched on the channel, and
Figure BDA0003918869460000099
indicates that the attacker chooses not to attack; the sensor and the attacker select their actions according to the exploration rate ε: each selects a random action with probability ε and the optimal action with probability 1-ε.
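The exploration rule of step 3 can be sketched as follows. This is an illustrative epsilon-greedy selector, assuming each player holds a list of action values for the current state; the function name and the `action_values` argument are hypothetical, and "optimal" is simplified here to the action with the largest value.

```python
import random


def epsilon_greedy(action_values, epsilon, rng=random):
    """Return a random action index with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        # Explore: every action is equally likely.
        return rng.randrange(len(action_values))
    # Exploit: index of the largest action value.
    return max(range(len(action_values)), key=lambda i: action_values[i])
```

For example, the sensor could call `epsilon_greedy` on its two transmission modes and the attacker on its two attack choices, each with the same ε.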
In a cyber-physical system, packet loss may be caused by a DoS attack or by other factors such as signal degradation, channel fading and channel congestion. To indicate whether a data packet has successfully arrived at the remote estimator, we define the arrival indicator δ_k as:
Figure BDA00039188694600000910
The remote estimator's estimate of the system security state
Figure BDA00039188694600000911
is obtained as follows: if the estimator successfully receives the data packet, it adopts the received data as its estimate; otherwise, it predicts the estimate from the best estimate obtained at the previous time step. The formula is as follows:
Figure BDA0003918869460000101
where
Figure BDA0003918869460000102
is the minimum mean square error estimate of the state vector x(k) and A is the coefficient matrix.
The corresponding estimation error covariance is
Figure BDA0003918869460000103
To simplify the expression of the error covariance P(k), the time interval τ_k is defined as
Figure BDA0003918869460000104
The time interval τ_k is the interval from the time i at which a data packet was last received to the current time k, so τ_k can be expressed as
Figure BDA0003918869460000105
Next, assume that the remote estimator successfully receives the data packet at the start of transmission
Figure BDA0003918869460000106
that is, δ_0 = 1. Based on equations (13) and (14) above, the estimation error covariance of the remote estimator is then obtained as
Figure BDA0003918869460000107
From the state transition process of the Markov chain in FIG. 3, we observe that the state can only move from
Figure BDA0003918869460000108
to the next adjacent state
Figure BDA0003918869460000109
or return to the initial state
Figure BDA00039188694600001010
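The Markov chain behaviour described above can be sketched in Python. The sketch assumes that the covariance after τ_k consecutive losses equals the Lyapunov operator applied τ_k times to the steady-state covariance, which is the usual reading of such models but is not confirmed by the figures available here; it also uses a scalar system. The function names are hypothetical.

```python
def next_holding_time(tau, delivered):
    """Reset to 0 when a packet arrives (delta_k = 1), else count one more loss."""
    return 0 if delivered else tau + 1


def remote_covariance(tau, p_bar, a, q):
    """Covariance after tau consecutive losses: h applied tau times to p_bar."""
    p = p_bar
    for _ in range(tau):
        p = a * p * a + q   # scalar Lyapunov operator (assumed form)
    return p
```

With this indexing, each lost packet moves the chain one state to the right, and each successful arrival returns it to the state with covariance `p_bar`.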
Step 4, determining the channel type (reliable or unreliable) in the wireless channel remote security estimation model under DoS attack, and calculating the instant reward for the current state-action combination and the state at the next time step;
specifically, the channel type is determined in order to calculate the instant reward for the current state-action combination and the state at the next time step, covering data packet transmission over a reliable channel and over an unreliable channel;
over the reliable channel, data packet transmission is expressed as:
Figure BDA00039188694600001011
the state of the remote estimator at the next time step is expressed as:
Figure BDA00039188694600001012
at time k, the instant reward r_k of the system is expressed as:
Figure BDA0003918869460000111
over the unreliable channel, data packet transmission is expressed as:
Figure BDA0003918869460000112
When the sensor selects insecure transmission and the attacker selects to attack, both choices increase the probability of packet loss. Because the DoS attack is the main cause of packet loss and affects it more strongly than the other causes, we have p_2 > p_4 > p_1 > p_3.
The state of the remote estimator at the next time step is expressed as:
Figure BDA0003918869460000113
at time k, the instant reward r_k of the system is expressed as:
Figure BDA0003918869460000114
where E[P(k)] denotes the expectation of the remote estimation error covariance P(k), with the expression:
Figure BDA0003918869460000115
where ξ_k denotes the packet loss probability,
Figure BDA0003918869460000116
denotes the steady-state error covariance to which the Kalman filter converges, P(k) is the remote estimation error covariance, and h is the Lyapunov operator defined below:
Figure BDA0003918869460000117
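As a worked illustration of the expectation used for the unreliable channel, the following Python sketch assumes that E[P(k)] weights the two possible outcomes by the packet loss probability: with probability ξ_k the packet is lost and the covariance grows to h(P(k-1)), and otherwise it resets to the steady-state covariance. This form is inferred from the symbol definitions above, not read from the original figure, and the scalar system and function name are hypothetical.

```python
def expected_covariance(p_prev, p_bar, loss_prob, a, q):
    """Assumed form: E[P(k)] = xi * h(P(k-1)) + (1 - xi) * P_bar (scalar case)."""
    if not 0.0 <= loss_prob <= 1.0:
        raise ValueError("loss_prob must lie in [0, 1]")
    grown = a * p_prev * a + q   # h(P(k-1)): covariance if the packet is lost
    return loss_prob * grown + (1.0 - loss_prob) * p_bar
```

Setting `loss_prob` to 0 or 1 recovers the two deterministic outcomes of the reliable channel case.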
step 5, updating the Q-value function in the wireless channel remote security estimation model under DoS attack, and thereby updating the Q-value matrix;
specifically, the Q-value function is updated in the wireless channel remote security estimation model under DoS attack, where the Q-value function update is expressed as:
Figure BDA0003918869460000118
where α_k ∈ (0, 1) is the learning rate, ρ is the discount factor, and r_k is the reward for the current state.
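A minimal Python sketch of one such update is given below. It assumes the standard minimax-Q form Q(s, a, d) ← (1 - α) Q(s, a, d) + α (r + ρ V(s')), where V(s') is the value of the zero-sum matrix game at the next state; the exact update in the original figure is not available here. The text solves the Nash equilibrium by linear programming; this sketch swaps in the closed-form value of a 2×2 zero-sum game to stay dependency-free. The row player is assumed to maximize the reward, and all names are hypothetical.

```python
def matrix_game_value(m):
    """Value of a 2x2 zero-sum game for the row (maximizing) player."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))   # best guaranteed payoff for the row player
    minimax = min(max(a, c), max(b, d))   # best guaranteed cap for the column player
    if maximin == minimax:
        return maximin                    # pure-strategy saddle point
    # No saddle point: mixed-strategy value of a 2x2 game.
    return (a * d - b * c) / (a + d - b - c)


def q_update(q, state, row_action, col_action, reward, next_state, alpha, rho):
    """Apply one update in place and return the absolute change of the entry."""
    target = reward + rho * matrix_game_value(q[next_state])
    old = q[state][row_action][col_action]
    q[state][row_action][col_action] = (1.0 - alpha) * old + alpha * target
    return abs(q[state][row_action][col_action] - old)
```

The returned change can be compared with the threshold η of step 6 to decide whether to continue iterating.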
Step 6, comparing the updated Q-value function with the Q-value function of the current state; when the difference between the updated Q value and the Q value of the current state is greater than a set threshold η, returning to step 3; otherwise, obtaining the converged Q-value matrix and the optimal strategy based on Nash equilibrium, which completes the security state game for CPSs under DoS attack.
In the invention, the parameters of the cyber-physical system model are
Figure BDA0003918869460000121
where A and C are coefficient matrices, Q is the process noise covariance matrix, and R is the measurement noise covariance matrix.
The Kalman filter converges to a steady state whose error covariance is
Figure BDA0003918869460000122
where
Figure BDA0003918869460000123
The cost for the sensor to select secure transmission is set to c_s = 6.7, the cost for the attacker to launch an attack is set to c_a = 8, the learning rate is set to α = 0.9 and the discount factor to ρ = 0.6.
The Kalman filter and the Q-learning algorithm are run for 5000 iterations. In the first 500 iterations, ε is decreased gradually from 1 so that the reward of every action combination is fully explored; in the subsequent iterations, ε is fixed at 0.1, so that the optimal action is selected with probability 0.9 and a random action with probability 0.1.
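The exploration schedule described above can be sketched as follows. The text only states that ε decreases gradually from 1 during the first 500 iterations and is then 0.1; the linear shape of the decay, the function name and the parameter names are assumptions made for illustration.

```python
def exploration_rate(iteration, warmup=500, floor=0.1):
    """Epsilon for a given iteration: linear decay from 1 to floor, then constant."""
    if iteration < warmup:
        # Assumed linear interpolation between 1.0 and the floor value.
        return 1.0 - (1.0 - floor) * iteration / warmup
    return floor
```

In a training loop of 5000 iterations, `exploration_rate(i)` would supply the ε used by the action selection of step 3 at iteration `i`.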
Simulation example of the reinforcement learning algorithm based on security state estimation under a reliable channel
FIG. 4 shows the variation of Tr(P(k)) under the reliable channel, from which the occurrence of packet loss and the state changes during learning can be observed. Tr(P(k)) = 3, Tr(P(k)) = 5.6 and Tr(P(k)) = 9.6 respectively correspond to
Figure BDA0003918869460000124
In the first 500 iterations, actions are selected largely at random, so packet loss is more frequent; in the subsequent iterations, the sensor and the attacker select the optimal action for the current state, and at most 2 consecutive data packets are lost.
According to FIG. 5, the states under the reliable channel are
Figure BDA0003918869460000125
And (4) a learning process of the Q value function. From fig. 5, it can be derived that the Q-value matrix converges to the optimal action-value function through the Q-learning algorithm, and the specific contents are shown in table 1.
Figure BDA0003918869460000131
TABLE 1 optimal action function under reliable channels
According to the nash equilibrium strategy, nash equilibrium points exist in any finite game, and from the Q value matrix, we can obtain nash equilibrium points, see Table 2.
Figure BDA0003918869460000132
TABLE 2 Nash equilibrium strategies of the sensor and the attacker under the reliable channel
Computing the packet loss rate over iterations 500 to 5000 gives a probability of 0.04% for losing two consecutive data packets and a probability of 2.44% for losing a single data packet.
Simulation example of the reinforcement learning algorithm based on security state estimation under an unreliable channel
When data is transmitted over an unreliable channel, data packets may also be lost for reasons other than DoS attacks; the packet loss probability under the different action combinations of the sensor and the attacker is set as shown in the following expression:
Figure BDA0003918869460000133
FIG. 6 shows the variation of Tr(P(k)) under the unreliable channel. It can be observed that the packet loss probability is much larger than under the reliable channel, and that after the first 500 iterations the maximum number of consecutive data packet losses is 2.
FIG. 7 shows, for the state
Figure BDA0003918869460000134
under the unreliable channel, the learning process of the Q-value function. From FIG. 7 it can be seen that the Q-value matrix converges to the optimal action-value function through the Q-learning algorithm; the specific values are shown in Table 3.
Figure BDA0003918869460000135
Figure BDA0003918869460000141
TABLE 3 Optimal action-value function under the unreliable channel
By the Nash equilibrium existence theorem, a Nash equilibrium point exists in any finite game; the Nash equilibrium points obtained from the Q-value matrix are listed in Table 4.
Figure BDA0003918869460000142
TABLE 4 Nash equilibrium strategies of the sensor and the attacker under the unreliable channel
Computing the packet loss rate over iterations 500 to 5000 gives a probability of 0.42% for losing two consecutive data packets and a probability of 3.76% for losing a single data packet.
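Statistics of this kind could be computed from a recorded sequence of loss indicators as in the Python sketch below. The text does not define how the single-loss and two-consecutive-loss probabilities are counted, so the definition used here is an assumption: the number of maximal runs of lost packets of a given length, divided by the number of recorded time steps. The function name `loss_run_rates` is hypothetical.

```python
from itertools import groupby
from typing import Dict, Sequence


def loss_run_rates(lost: Sequence[bool]) -> Dict[int, float]:
    """Map run length -> (number of maximal loss runs of that length) / steps."""
    counts: Dict[int, int] = {}
    for is_lost, run in groupby(lost):
        if is_lost:
            length = sum(1 for _ in run)
            counts[length] = counts.get(length, 0) + 1
    total = len(lost)
    return {length: n / total for length, n in counts.items()} if total else {}
```

For the made-up sequence `[0, 1, 0, 0, 1, 1, 0, 1, 0, 0]` there are two isolated losses and one run of two losses in ten steps, giving `{1: 0.2, 2: 0.1}`.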
Examples
According to FIG. 8, this embodiment provides a flowchart for implementing the reinforcement learning algorithm based on security state estimation for a cyber-physical system under denial-of-service attack; the method is implemented by the following steps:
Step 1: Input the system parameters and initialize the system state, the actions, the Q-value matrix, the packet loss probability ξ_k under different action combinations, the learning rate α, the discount factor ρ, and the exploration rate ε. The initial state of the system is assumed to be
Figure BDA0003918869460000143
and the system time k is set to 1, at which point the security state estimation of the system starts. According to the Kalman filtering equations, the minimum mean square error estimate of the system state vector
Figure BDA0003918869460000148
and the estimation error covariance P(k) are updated recursively.
Step 2: A DoS attack may occur on the channel between the sensor and the remote estimator; whether a data packet is attacked depends on the choices of the sensor and the attacker. The sensor has two options for sending data:
Figure BDA0003918869460000144
indicates that the sensor sends data packets at a cost of 0, but the packets are vulnerable to DoS attack;
Figure BDA0003918869460000145
indicates a secure transmission, for which the sensor pays an additional cost c_s to avoid the attack. The DoS attacker, in turn, can choose to attack or not to attack:
Figure BDA0003918869460000146
indicates that the attacker launches a DoS attack on the channel, which incurs an additional fixed cost c_a;
Figure BDA0003918869460000147
indicates that the attacker chooses not to attack. The sensor and the attacker select actions according to the exploration rate ε: an action is chosen at random with probability ε, and the optimal action is chosen with probability 1-ε.
Step 3: The loss of data packets depends not only on the choices of the sensor and the attacker but also on the reliability of the communication channel. When a reliable channel is selected, packets are lost only because of DoS attacks and not for other reasons; the immediate reward r_k of the system under the current state and action combination is calculated according to formula (18), and the next-time state s_{k+1} according to formula (17). When an unreliable channel is selected, packets are lost not only because of DoS attacks but also for other reasons; the immediate reward r_k of the system under the current state and action combination is calculated according to formula (22), and the next-time state s_{k+1} according to formula (21).
Step 4: The Q-value function, and hence the Q-value matrix, is updated according to formula (19). Since the system has n states and the sensor and the attacker have 4 action combinations in each state, the Q-value matrix has n rows and 4 columns.
Step 5: The updated Q-value function calculated in Step 4 is compared with the Q-value function in the current state to judge whether the loop termination condition is met.
Step 6: When the termination condition is not met, return to Step 2; when the termination condition is met, the converged Q-value matrix and the optimal strategy based on Nash equilibrium are obtained.
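Steps 2 to 6 can be sketched in Python as below for the reliable-channel case. Formulas (17) to (19) are figures that are not reproduced in this text, so every model detail here is an assumption: the state is the number of consecutive packet losses (capped at the last state), a packet is lost only when the sensor sends insecurely (a = 0) and the attacker attacks (d = 1), the sensor-side reward is -Tr(P) - c_s·a + c_a·d, and the bootstrap value is the pure-strategy maximin of the next state's row. The trace values, costs, learning rate and discount factor are those quoted in the simulation; all function names are hypothetical.

```python
import random
from typing import List, Tuple

TRACE = [3.0, 5.6, 9.6]      # Tr(P(k)) per state, the values quoted for FIG. 4
C_S, C_A = 6.7, 8.0          # c_s and c_a from the simulation setup
ALPHA, RHO = 0.9, 0.6        # learning rate and discount factor
# Column j of the Q-value matrix encodes the joint action (a, d) = divmod(j, 2);
# this ordering is an assumption.


def step(state: int, a: int, d: int) -> Tuple[int, float]:
    """Assumed reliable-channel model: the packet is lost only if a=0 and d=1."""
    lost = a == 0 and d == 1
    nxt = min(state + 1, len(TRACE) - 1) if lost else 0
    reward = -TRACE[nxt] - C_S * a + C_A * d   # assumed stand-in for formula (18)
    return nxt, reward


def maximin(row: List[float]) -> float:
    """Assumed bootstrap value: the sensor maximizes, the attacker minimizes."""
    return max(min(row[0], row[1]), min(row[2], row[3]))


def greedy(row: List[float]) -> Tuple[int, int]:
    """Pure-strategy security actions of both players for one Q row."""
    a = max((0, 1), key=lambda i: min(row[2 * i], row[2 * i + 1]))
    d = min((0, 1), key=lambda j: max(row[j], row[2 + j]))
    return a, d


def train(iterations: int = 5000, eta: float = 1e-9,
          seed: int = 0) -> List[List[float]]:
    rng = random.Random(seed)
    q = [[0.0] * 4 for _ in TRACE]             # n rows, 4 columns (Step 4)
    state = 0
    for k in range(iterations):
        eps = max(0.1, 1.0 - 0.9 * k / 500)    # assumed linear decay (Step 2)
        a, d = greedy(q[state])
        if rng.random() < eps:
            a = rng.randrange(2)
        if rng.random() < eps:
            d = rng.randrange(2)
        nxt, reward = step(state, a, d)        # Step 3
        j = 2 * a + d
        new = (1 - ALPHA) * q[state][j] + ALPHA * (reward + RHO * maximin(q[nxt]))
        delta = abs(new - q[state][j])         # Step 5: change in the Q value
        q[state][j] = new
        state = nxt
        if k >= 500 and delta <= eta:          # Step 6: stop once the update is small
            break
    return q
```

Stopping on a single small update mirrors the comparison in Steps 5 and 6 but is a weak convergence test; a practical variant might require the change to stay below η for every entry of the matrix over a window of iterations.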
The invention also provides a security state game system for CPSs under DoS attack, which comprises a model establishing module, a first processing module, a second processing module, a third processing module, a fourth processing module and a comparison module;
the model establishing module is used for establishing a wireless channel remote security estimation model under DoS attack;
the first processing module is used for inputting system parameters into the wireless channel remote security estimation model under DoS attack and recursively updating the minimum mean square error estimate of the system state vector and the estimation error covariance according to the Kalman filtering equations;
the second processing module is used for causing, in the wireless channel remote security estimation model under DoS attack, the sensor and the DoS attacker to select an action at random with probability ε and to select the optimal action with probability 1-ε;
the third processing module is used for determining the channel type in the wireless channel remote security estimation model under DoS attack and calculating the immediate reward and the next-time state under the current state and action combination;
the fourth processing module is used for updating the Q-value function in the wireless channel remote security estimation model under DoS attack and thereby updating the Q-value matrix;
the comparison module is used for calculating the updated Q-value function and comparing it with the Q-value function in the current state; when the difference between the updated Q value and the Q value in the current state is greater than a set threshold η, execution returns to step 3; otherwise, a converged Q-value matrix and an optimal strategy based on Nash equilibrium are obtained, and the security state game for CPSs under DoS attack is completed.
The invention also provides a mobile terminal, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, such as a security state game program for the CPSs under the DoS attack.
When the processor executes the computer program, the steps of the security state game method for CPSs under DoS attack are implemented, for example:
step 1, establishing a wireless channel remote security estimation model under DoS attack;
step 2, inputting system parameters into the wireless channel remote security estimation model under DoS attack, and recursively updating the minimum mean square error estimate of the system state vector and the estimation error covariance according to the Kalman filtering equations;
step 3, in the wireless channel remote security estimation model under DoS attack, causing the sensor and the DoS attacker to select an action at random with probability ε and to select the optimal action with probability 1-ε;
step 4, determining the channel type in the wireless channel remote security estimation model under DoS attack and calculating the immediate reward and the next-time state under the current state and action combination;
step 5, updating the Q-value function in the wireless channel remote security estimation model under DoS attack, and thereby updating the Q-value matrix;
step 6, calculating the updated Q-value function and comparing it with the Q-value function in the current state; when the difference between the updated Q value and the Q value in the current state is greater than a set threshold η, returning to step 3; otherwise, a converged Q-value matrix and an optimal strategy based on Nash equilibrium are obtained, and the security state game for CPSs under DoS attack is completed.
Alternatively, the processor implements the functions of the modules in the system when executing the computer program, for example: the model establishing module is used for establishing a wireless channel remote security estimation model under DoS attack;
the first processing module is used for inputting system parameters into the wireless channel remote security estimation model under DoS attack and recursively updating the minimum mean square error estimate of the system state vector and the estimation error covariance according to the Kalman filtering equations;
the second processing module is used for causing, in the wireless channel remote security estimation model under DoS attack, the sensor and the DoS attacker to select an action at random with probability ε and to select the optimal action with probability 1-ε;
the third processing module is used for determining the channel type in the wireless channel remote security estimation model under DoS attack and calculating the immediate reward and the next-time state under the current state and action combination;
the fourth processing module is used for updating the Q-value function in the wireless channel remote security estimation model under DoS attack and thereby updating the Q-value matrix;
the comparison module is used for calculating the updated Q-value function and comparing it with the Q-value function in the current state; when the difference between the updated Q value and the Q value in the current state is greater than a set threshold η, execution returns to step 3; otherwise, a converged Q-value matrix and an optimal strategy based on Nash equilibrium are obtained, and the security state game for CPSs under DoS attack is completed.
Illustratively, the computer program may be partitioned into one or more modules/units that are stored in the memory and executed by the processor to implement the invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program in the mobile terminal. For example, the computer program may be partitioned into a model building module, a first processing module, a second processing module, a third processing module, a fourth processing module, and a comparison module;
the specific functions of each module are as follows: the model establishing module is used for establishing a wireless channel remote security estimation model under DoS attack;
the first processing module is used for inputting system parameters into the wireless channel remote security estimation model under DoS attack and recursively updating the minimum mean square error estimate of the system state vector and the estimation error covariance according to the Kalman filtering equations;
the second processing module is used for causing, in the wireless channel remote security estimation model under DoS attack, the sensor and the DoS attacker to select an action at random with probability ε and to select the optimal action with probability 1-ε;
the third processing module is used for determining the channel type in the wireless channel remote security estimation model under DoS attack and calculating the immediate reward and the next-time state under the current state and action combination;
the fourth processing module is used for updating the Q-value function in the wireless channel remote security estimation model under DoS attack and thereby updating the Q-value matrix;
the comparison module is used for calculating the updated Q-value function and comparing it with the Q-value function in the current state; when the difference between the updated Q value and the Q value in the current state is greater than a set threshold η, execution returns to step 3; otherwise, a converged Q-value matrix and an optimal strategy based on Nash equilibrium are obtained, and the security state game for CPSs under DoS attack is completed.
The mobile terminal can be a desktop computer, a notebook, a palm computer, a cloud server and other computing equipment. The mobile terminal may include, but is not limited to, a processor, a memory.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor or any conventional processor; the processor is the control center of the mobile terminal and connects the various parts of the entire mobile terminal using various interfaces and lines.
The memory may be used to store the computer programs and/or modules, and the processor may implement various functions of the mobile terminal by running or executing the computer programs and/or modules stored in the memory and calling data stored in the memory.
The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (FlashCard), at least one magnetic disk storage device, a flash memory device, or other volatile solid state storage device.
The invention also provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the steps of the security state game method for CPSs under DoS attack are implemented.
The mobile terminal integrated module/unit, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in a computer-readable storage medium.
Based on such understanding, all or part of the processes in the foregoing method can be implemented by a computer program to instruct related hardware, where the computer program can be stored in a computer readable storage medium, and when the computer program is executed by a processor, the steps of the security state gaming method for CPSs under DoS attack can be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc.
The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, etc.
It should be noted that the computer-readable medium may contain suitable additions or subtractions depending on the requirements of legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer-readable media may not include electrical carrier signals or telecommunication signals in accordance with legislation and patent practice.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (10)

1. A security state game method for CPSs under DoS attack, characterized by comprising the following steps:
step 1, establishing a wireless channel remote security estimation model under DoS attack;
step 2, inputting system parameters into the wireless channel remote security estimation model under DoS attack, and recursively updating the minimum mean square error estimate of the system state vector and the estimation error covariance according to the Kalman filtering equations;
step 3, in the wireless channel remote security estimation model under DoS attack, causing the sensor and the DoS attacker to select an action at random with probability ε and to select the optimal action with probability 1-ε;
step 4, determining the channel type in the wireless channel remote security estimation model under DoS attack and calculating the immediate reward and the next-time state under the current state and action combination;
step 5, updating the Q-value function in the wireless channel remote security estimation model under DoS attack, and thereby updating the Q-value matrix;
step 6, calculating the updated Q-value function and comparing it with the Q-value function in the current state; when the difference between the updated Q value and the Q value in the current state is greater than a set threshold η, returning to step 3; otherwise, a converged Q-value matrix and an optimal strategy based on Nash equilibrium are obtained, and the security state game for CPSs under DoS attack is completed.
2. The security state game method for CPSs under DoS attack as claimed in claim 1, wherein in step 1, the wireless channel remote security estimation model under DoS attack comprises a cyber-physical system, a sensor, a channel, a remote estimator and an attacker; the cyber-physical system inputs a sensor measurement vector y(k) to the sensor, and the sensor outputs the minimum mean square error estimate
Figure FDA0003918869450000011
to the channel, wherein the channel comprises a secure transmission channel and an insecure transmission channel; the channel transmits the data of the sensor to the remote estimator, and the remote estimator outputs the system security state
Figure FDA0003918869450000012
and the attacker acts on the channel by choosing to attack or not to attack; the remote estimator feeds back signals to the sensor and the attacker respectively.
3. The security state game method for CPSs under DoS attack as claimed in claim 1, wherein in step 2, the system parameters input into the wireless channel remote security estimation model under DoS attack include: the initial state of the system, the actions, the Q-value matrix, the packet loss probability ξ_k under different action combinations, the learning rate α, the discount factor ρ, and the exploration rate ε.
4. The security state game method for CPSs under DoS attack as claimed in claim 1, wherein in step 2, the minimum mean square error estimate of the system state vector
Figure FDA0003918869450000021
and the estimation error covariance P(k) are recursively updated according to the Kalman filtering equations in the following way:
the system state is estimated by a local Kalman filter that recursively updates the state estimate; at each time k, the minimum mean square error estimate of the system state vector x(k),
Figure FDA0003918869450000022
is computed from the measurement data available to the Kalman filter, and is expressed as follows:
Figure FDA0003918869450000023
the corresponding estimation error covariance is:
Figure FDA0003918869450000024
according to the Kalman filtering equations,
Figure FDA0003918869450000025
and P(k) are updated recursively; the recursive update equations of the Kalman filter are as follows:
Figure FDA0003918869450000026
P(k|k-1)=h(P(k-1))
K(k)=P(k|k-1)C^T[CP(k|k-1)C^T+R]^{-1}
Figure FDA0003918869450000027
P(k)=g(P(k|k-1))
wherein,
Figure FDA0003918869450000028
is the minimum mean square error estimate of the state vector x(k), P(k) is the corresponding estimation error covariance, A, B and C are coefficient matrices, h is the Lyapunov operator and g is the Riccati operator;
the Lyapunov operator h and the Riccati operator g are calculated as follows:
Figure FDA0003918869450000029
Figure FDA00039188694500000210
wherein A is a coefficient matrix; q is the process error covariance matrix.
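For illustration only, the covariance recursion can be sketched in Python for a scalar system. The operator formulas are figures that are not reproduced in this text, so the forms used here are assumptions chosen to agree with the gain equation above: h(X) = A X A^T + Q and g(X) = X - X C^T [C X C^T + R]^{-1} C X, reduced to scalars a, c, q, r. The function names and the numerical example are hypothetical.

```python
from typing import List


def h(p: float, a: float, q: float) -> float:
    """Assumed Lyapunov operator (prediction step) for a scalar system."""
    return a * p * a + q


def g(p: float, c: float, r: float) -> float:
    """Assumed Riccati operator (measurement update) for a scalar system."""
    gain = p * c / (c * p * c + r)       # scalar form of K(k)
    return p - gain * c * p


def steady_state(a: float, c: float, q: float, r: float,
                 tol: float = 1e-12, max_iter: int = 10000) -> float:
    """Iterate P(k) = g(h(P(k-1))) until the covariance stops changing."""
    p = q
    for _ in range(max_iter):
        nxt = g(h(p, a, q), c, r)
        if abs(nxt - p) < tol:
            return nxt
        p = nxt
    return p


def loss_states(p_bar: float, a: float, q: float, n: int) -> List[float]:
    """Covariances after 0, 1, ..., n-1 consecutive losses: repeated h on p_bar."""
    states = [p_bar]
    for _ in range(n - 1):
        states.append(h(states[-1], a, q))
    return states
```

With a = c = q = r = 1 the fixed point satisfies p² + p - 1 = 0, so `steady_state(1, 1, 1, 1)` is approximately 0.618.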
5. The security state game method for CPSs under DoS attack as claimed in claim 1, wherein in step 3, the sensor has two data transmission modes: the first data transmission mode
Figure FDA00039188694500000211
indicates a secure transmission; the second data transmission mode
Figure FDA00039188694500000212
indicates an insecure transmission; the DoS attacker has two modes, attack or non-attack, wherein the mode
Figure FDA0003918869450000031
represents a DoS attack on the channel, and the mode
Figure FDA0003918869450000032
represents that the attacker chooses not to attack; the sensor and the attacker select actions according to the exploration rate ε, selecting an action at random with probability ε and the optimal action with probability 1-ε.
6. The security state game method for CPSs under DoS attack as claimed in claim 1, wherein in step 4, the channel type is determined in order to calculate the immediate reward and the next-time state under the current state and action combination, covering data packet transmission under a reliable channel and data packet transmission under an unreliable channel;
the data packet transmission expression under the reliable channel is as follows:
Figure FDA0003918869450000033
the state of the remote estimator at the next moment is expressed as follows:
Figure FDA0003918869450000034
at time k, the immediate reward r_k of the system is expressed as follows:
Figure FDA0003918869450000035
The expression of data packet transmission under the unreliable channel is as follows:
Figure FDA0003918869450000036
the next state of the remote estimator is expressed as follows:
Figure FDA0003918869450000037
at time k, the immediate reward r_k of the system is expressed as follows:
Figure FDA0003918869450000038
where E[P(k)] represents the expected value of the remote estimation error covariance P(k), and the specific expression is as follows:
Figure FDA0003918869450000041
wherein ξ_k denotes the probability that a data packet is lost,
Figure FDA0003918869450000042
denotes the steady-state value to which the Kalman filter converges, P(k) is the remote estimation error covariance, and h denotes the Lyapunov operator.
7. The security state game method for CPSs under DoS attack as claimed in claim 1, wherein in step 5, the Q-value function in the wireless channel remote security estimation model under DoS attack is updated as follows:
Figure FDA0003918869450000043
wherein α_k is the learning rate, taking a value in the interval (0, 1), ρ is the discount factor, and r_k is the reward in the current state.
8. A security state game system for CPSs under DoS attack, characterized by comprising:
the model establishing module, used for establishing a wireless channel remote security estimation model under DoS attack;
the first processing module, used for inputting system parameters into the wireless channel remote security estimation model under DoS attack and recursively updating the minimum mean square error estimate of the system state vector and the estimation error covariance according to the Kalman filtering equations;
the second processing module, used for causing, in the wireless channel remote security estimation model under DoS attack, the sensor and the DoS attacker to select an action at random with probability ε and to select the optimal action with probability 1-ε;
the third processing module, used for determining the channel type in the wireless channel remote security estimation model under DoS attack and calculating the immediate reward and the next-time state under the current state and action combination;
the fourth processing module, used for updating the Q-value function in the wireless channel remote security estimation model under DoS attack and thereby updating the Q-value matrix;
the comparison module, used for calculating the updated Q-value function and comparing it with the Q-value function in the current state; when the difference between the updated Q value and the Q value in the current state is greater than a set threshold η, execution returns to step 3; otherwise, a converged Q-value matrix and an optimal strategy based on Nash equilibrium are obtained, and the security state game for CPSs under DoS attack is completed.
9. A mobile terminal comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the security state game method for CPSs under DoS attack as claimed in any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the security state game method for CPSs under DoS attack as claimed in any one of claims 1 to 7.
CN202211347995.XA 2022-10-31 2022-10-31 Security state game method, system, terminal and medium for CPSs under DoS attack Pending CN115796300A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211347995.XA CN115796300A (en) 2022-10-31 2022-10-31 Security state game method, system, terminal and medium for CPSs under DoS attack

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211347995.XA CN115796300A (en) 2022-10-31 2022-10-31 Security state game method, system, terminal and medium for CPSs under DoS attack

Publications (1)

Publication Number Publication Date
CN115796300A true CN115796300A (en) 2023-03-14

Family

ID=85434528

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211347995.XA Pending CN115796300A (en) 2022-10-31 2022-10-31 Security state game method, system, terminal and medium for CPSs under DoS attack

Country Status (1)

Country Link
CN (1) CN115796300A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116893690A (en) * 2023-07-25 2023-10-17 西安爱生技术集团有限公司 Unmanned aerial vehicle evasion attack input data calculation method based on reinforcement learning

Similar Documents

Publication Publication Date Title
CN108701260B (en) System and method for aiding decision making
CN110958135A (en) Method and system for eliminating DDoS (distributed denial of service) attack in feature self-adaptive reinforcement learning
CN111563330B (en) Information physical system security optimization analysis method based on zero and game countermeasures
CN111045334B (en) Active defense elastic sliding mode control method of information physical fusion system
US20230169374A1 (en) Search device, search method, computer program product, search system, and arbitrage system
CN115796300A (en) Security state game method, system, terminal and medium for CPSs under DoS attack
CN115473677A (en) Penetration attack defense method and device based on reinforcement learning and electronic equipment
Thien et al. A transfer games actor–critic learning framework for anti-jamming in multi-channel cognitive radio networks
CN113037662A (en) Mobile equipment radio frequency distribution identification method based on federal learning
Giraud et al. Aggregation of predictors for nonstationary sub-linear processes and online adaptive forecasting of time varying autoregressive processes
Zhou et al. DRL-Based Workload Allocation for Distributed Coded Machine Learning
CN114598517A (en) Mobile equipment radio frequency distribution identification method based on federal learning
Fanti et al. Modeling cyber attacks by stochastic games and Timed Petri Nets
CN117454330A (en) Personalized federal learning method for resisting model poisoning attack
CN113904937B (en) Service function chain migration method and device, electronic equipment and storage medium
Chen et al. Adaptive Privacy Budget Allocation in Federated Learning: A Multi-Agent Reinforcement Learning Approach
CN115174237B (en) Method and device for detecting malicious traffic of Internet of things system and electronic equipment
CN117235742A (en) Intelligent penetration test method and system based on deep reinforcement learning
CN112836381B (en) Multi-source information-based ship residual life prediction method and system
CN113556304B (en) Time-varying frequency offset estimation method, system and medium based on particle filter
CN116150759A (en) CPS-based random event triggering safety state estimation method and related device
CN115842668A (en) Method and system for determining information propagation source, electronic device and storage medium
Picek et al. Tipping the Balance: Imbalanced Classes in Deep Learning Side-channel Analysis
CN117575028B (en) Network security analysis method and system based on Markov chain
Zhang et al. Optimal power control for sensors and DoS attackers over a fading channel network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination