CN115796300A

CN115796300A - Security state game method, system, terminal and medium for CPSs under DoS attack

Info

Publication number: CN115796300A
Application number: CN202211347995.XA
Authority: CN
Inventors: 金增旺; 李倩; 张淑婷; 张艳宁
Original assignee: Northwestern Polytechnical University
Current assignee: Northwestern Polytechnical University
Priority date: 2022-10-31
Filing date: 2022-10-31
Publication date: 2023-03-14

Abstract

The invention relates to the technical field of cyber-physical system security, and discloses a security state game method, system, terminal and medium for CPSs under DoS attack. Based on the system measurement sequence and a prior model, the method performs recursive updating with a Kalman filter and updates the estimate in real time according to the attack condition of the system, so as to ensure the accuracy and effectiveness of the estimate. At each time step, the Nash equilibrium strategies of the sensor and the attacker are solved by linear programming, so that the sensor and the attacker can each make the optimal choice in real time. The invention studies reinforcement learning algorithms from the two perspectives of a reliable channel and an unreliable channel, and proposes a reinforcement learning algorithm based on security state estimation for each. The algorithms yield the Nash equilibrium strategies of both the attacking and defending parties, which guide the decisions of the sensor and the attacker.

Description

Security state game method, system, terminal and medium for CPSs under DoS attack

Technical Field

The invention relates to the technical field of cyber-physical system security, and in particular to a security state game method, system, terminal and medium for CPSs under DoS attack.

Background

Cyber-physical systems (CPSs) closely integrate computing resources with physical resources to perform functions such as real-time sensing, remote control and information interaction. Wireless sensors are widely used in critical infrastructure control, aerospace systems, military systems and other fields because they are flexible to deploy, consume little power and are easy to expand. However, transmitting sensor measurement data over a wireless network, while improving communication efficiency, also introduces security risks: the data are susceptible to malicious network attacks such as Denial of Service (DoS) attacks, spoofing attacks and injection attacks. Among these, DoS attacks mainly jam the channel to prevent the remote estimator from correctly receiving and processing sensor data. They are among the most common and most easily implemented network attacks, and they seriously threaten the security of a cyber-physical system.

In the game between the cyber-physical system and the attacker, the goal of the sensor is to maximize the attack cost while minimizing the state estimation error, and the goal of the attacker is the opposite. Because the attacker's gain comes from the sensor's loss, the confrontation between the sensor and the attacker is treated as a two-player zero-sum deterministic game. Because the state of a cyber-physical system must be sensed and updated in real time, the static game methods in the existing literature cannot accurately address the security state estimation problem.

In addition, how the sensor and the attacker in a cyber-physical system can improve the efficiency and accuracy of their decisions is also an important research question. Reinforcement learning, an important branch of artificial intelligence, has attracted extensive attention; it mainly studies how an agent learns an optimal strategy through interaction with an unknown environment. Many works use reinforcement learning to solve the game between attackers and defenders in cyber-physical systems. However, the channels in a cyber-physical system may be unreliable: data packets can be lost for various reasons, including signal degradation, channel fading and channel congestion.

Disclosure of Invention

In order to overcome the defects in the prior art, the invention aims to provide a security state game method, system, terminal and medium for CPSs under DoS attack, so as to solve the technical problems of low security performance and low decision efficiency and accuracy of cyber-physical systems in the prior art.

The invention is realized by the following technical scheme:

A security state game method for CPSs under DoS attack comprises the following steps:

step 1, establishing a wireless channel remote security estimation model under DoS attack;

step 2, inputting system parameters into the wireless channel remote security estimation model under DoS attack, and recursively updating the minimum mean square error estimator of the system state vector and the estimation error covariance according to the Kalman filter equations;

step 3, in the wireless channel remote security estimation model under DoS attack, having the sensor and the DoS attacker each select a random action with probability ε and the optimal action with probability 1-ε;

step 4, determining the channel type in the wireless channel remote security estimation model under DoS attack, and calculating the instant reward under the current state and action combination and the state at the next moment;

step 5, updating the Q-value function in the wireless channel remote security estimation model under DoS attack, and thereby updating the Q-value matrix;

step 6, comparing the updated Q-value function with the Q-value function in the current state; when the difference between the two is greater than a set threshold η, returning to step 3; otherwise, obtaining the converged Q-value matrix and the optimal strategy based on Nash equilibrium, which completes the security state game for CPSs under DoS attack.

Preferably, in step 1, the wireless channel remote security estimation model under DoS attack comprises a cyber-physical system, a sensor, a channel, a remote estimator and an attacker. The cyber-physical system inputs a sensor measurement vector y(k) to the sensor, and the sensor outputs a minimum mean square error estimate. The channel comprises a secure transmission channel and an unsecured transmission channel; the channel transmits the sensor data to the remote estimator, and the remote estimator outputs its estimate of the system security state. The attacker acts on the channel by either launching or not launching a DoS attack, and the remote estimator feeds signals back to the sensor and the attacker respectively.

Preferably, in step 2, system parameters are input into the wireless channel remote security estimation model under DoS attack, where the system parameters include: the initial state, actions and Q-value matrix of the system, the packet loss probability $\xi_k$ under the different action combinations, the learning rate α, the discount factor ρ, and the exploration rate ε.

Preferably, in step 2, the minimum mean square error estimator of the system state vector and the estimation error covariance P(k) are recursively updated according to the Kalman filter equations in the following way:

the system state is estimated by recursive updating with a local Kalman filter, wherein the minimum mean square error estimator of the system state vector x(k) at each time k is obtained from the measurement data by running the Kalman filter, and is expressed as follows:

the corresponding estimation error covariance is:

the estimator and P(k) are recursively updated according to the Kalman filter equations, and the recursive update equations of the Kalman filter are as follows:

$P(k|k-1) = h(P(k-1))$

$K(k) = P(k|k-1)C^T[CP(k|k-1)C^T + R]^{-1}$

$P(k) = g(P(k|k-1))$

wherein the estimator is the minimum mean square error estimate of the state vector x(k), P(k) is the corresponding estimation error covariance, A, B and C are coefficient matrices, h is the Lyapunov operator and g is the Riccati operator;

the calculation formulas of the Lyapunov operator h and the Riccati operator g are as follows:

wherein A is a coefficient matrix and Q is the process error covariance matrix.
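The operator formulas themselves are not reproduced in the text above. As a minimal sketch, assuming the forms commonly used in the Kalman filtering literature, h(X) = A X A^T + Q and g(X) = X - X C^T [C X C^T + R]^{-1} C X, the covariance recursion can be illustrated in the scalar case. The function names and the parameter values are hypothetical and are not taken from the source.

```python
# Scalar illustration (every matrix is 1x1); parameter values are hypothetical.

def h(p, a, q):
    """Lyapunov operator, assumed form h(X) = A X A^T + Q (scalar case)."""
    return a * p * a + q


def g(p, c, r):
    """Riccati operator, assumed form g(X) = X - X C^T (C X C^T + R)^-1 C X."""
    return p - p * c * (1.0 / (c * p * c + r)) * c * p


def kalman_gain(p_pred, c, r):
    """Gain K(k) = P(k|k-1) C^T [C P(k|k-1) C^T + R]^-1 (scalar case)."""
    return p_pred * c / (c * p_pred * c + r)


def steady_state_covariance(a, c, q, r, p0=1.0, tol=1e-12, max_iter=10_000):
    """Iterate P(k) = g(h(P(k-1))) until the covariance stops changing."""
    p = p0
    for _ in range(max_iter):
        p_next = g(h(p, a, q), c, r)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    return p


p_bar = steady_state_covariance(a=1.2, c=1.0, q=0.5, r=0.5)
print(p_bar)  # steady-state error covariance of this toy system
```

The returned value is a fixed point of the composition g(h(.)), which is the steady state that the later steps use as the starting point of the remote estimator's covariance.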

Preferably, in step 3, the sensor has two data transmission modes: a first data transmission mode indicating secure transmission and a second data transmission mode indicating unsecured transmission. The DoS attacker has two attack modes, attack or no attack: one represents a DoS attack on the channel and the other represents that the attacker chooses not to attack. The sensor and the attacker each select an action according to the exploration rate ε, choosing a random action with probability ε and the optimal action with probability 1-ε.
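The ε-based selection rule can be sketched as follows. This is a minimal illustration, assuming that each player holds a list of Q values for its own actions in which a larger value is better for that player; the function name and this encoding are hypothetical.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick an action index from a list of Q values.

    With probability epsilon a uniformly random action is explored;
    otherwise the action with the largest Q value is exploited.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda i: q_row[i])


rng = random.Random(0)                          # seeded for reproducibility
print(epsilon_greedy([1.0, 5.0, 2.0], 0.0, rng))  # 1: pure exploitation
```

With ε = 0 the rule always exploits, and with ε = 1 it always explores, which matches the two extremes of the exploration rate described above.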

Preferably, in step 4, the channel type is determined in order to calculate the instant reward under the current state and action combination and the state at the next moment, covering data packet transmission under a reliable channel and data packet transmission under an unreliable channel;

the data packet transmission under the reliable channel is expressed as follows:

the state of the remote estimator at the next moment is expressed as follows:

at time k, the instant reward $r_k$ of the system is expressed as follows:

the data packet transmission under the unreliable channel is expressed as follows:

the state of the remote estimator at the next moment is expressed as follows:

at time k, the instant reward $r_k$ of the system is expressed as follows:

where E[P(k)] denotes the expectation of the remote estimation error covariance P(k), and is expressed as follows:

wherein $\xi_k$ denotes the packet loss probability, $\bar{P}$ denotes the steady state to which the Kalman filter converges, P(k) denotes the remote estimation error covariance, and h denotes the Lyapunov operator.
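The expression for the expected covariance is not reproduced in the text above. One form consistent with the quantities it names is sketched below, writing $\bar{P}$ for the steady-state covariance of the Kalman filter. It assumes that a lost packet (probability $\xi_k$) propagates the previous covariance through h and that a received packet resets the covariance to $\bar{P}$; this is an illustration and not necessarily the exact expression of the source.

```latex
\mathbb{E}[P(k)] = \xi_k \, h\bigl(P(k-1)\bigr) + (1 - \xi_k)\,\bar{P}
```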

Preferably, in step 5, the Q-value function is updated in the wireless channel remote security estimation model under DoS attack, where the update expression of the Q-value function is as follows:

wherein $\alpha_k$ is the learning rate, taking values in (0, 1), ρ is the discount factor, and $r_k$ is the reward in the current state.
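The update expression is not reproduced in the text above. As a minimal sketch, assuming the usual temporal-difference form Q ← (1 - α)Q + α(r + ρV'), where V' is the value assigned to the next state (for example its Nash equilibrium value), one table entry could be updated as follows. The function name, the table layout and the numbers are hypothetical.

```python
def q_update(q, state, joint_action, reward, next_value, alpha, rho):
    """Update one Q-table entry in place and return the new value.

    q          : list of rows, indexed as q[state][joint_action]
    next_value : value assigned to the next state (an assumption here;
                 it could be that state's equilibrium value)
    """
    old = q[state][joint_action]
    q[state][joint_action] = (1.0 - alpha) * old + alpha * (reward + rho * next_value)
    return q[state][joint_action]


q = [[0.0] * 4 for _ in range(3)]   # 3 states x 4 joint actions, all zeros
new = q_update(q, state=0, joint_action=2, reward=5.0, next_value=1.0,
               alpha=0.9, rho=0.6)
print(new)  # about 5.04, i.e. 0.9 * (5.0 + 0.6 * 1.0)
```

Passing the next-state value in as an argument keeps the sketch independent of how that value is computed.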

A security state game system for CPSs under DoS attack comprises:

a model establishing module, used for establishing a wireless channel remote security estimation model under DoS attack;

a first processing module, used for inputting system parameters into the wireless channel remote security estimation model under DoS attack, and recursively updating the minimum mean square error estimator of the system state vector and the estimation error covariance according to the Kalman filter equations;

a second processing module, used for having the sensor and the DoS attacker, in the wireless channel remote security estimation model under DoS attack, each select a random action with probability ε and the optimal action with probability 1-ε;

a third processing module, used for determining the channel type in the wireless channel remote security estimation model under DoS attack, and calculating the instant reward under the current state and action combination and the state at the next moment;

a fourth processing module, used for updating the Q-value function in the wireless channel remote security estimation model under DoS attack, and thereby updating the Q-value matrix;

a comparison module, used for comparing the updated Q-value function with the Q-value function in the current state; when the difference between the two is greater than a set threshold η, execution returns to step 3; otherwise, the converged Q-value matrix and the optimal strategy based on Nash equilibrium are obtained, and the security state game for CPSs under DoS attack is completed.

A mobile terminal comprises a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the security state game method for CPSs under DoS attack according to any one of claims 1 to 7.

A computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the security state game method for CPSs under DoS attack according to any one of claims 1 to 7.

Compared with the prior art, the invention has the following beneficial technical effects:

The invention provides a security state game method for CPSs under DoS attack, which finds the optimal strategies of the attacker and the sensor in a cyber-physical system under DoS attack by adjusting the strategies of both. Based on the system measurement sequence and a prior model, the method performs recursive updating with a Kalman filter and updates the estimate in real time according to the attack condition of the system, so as to ensure the accuracy and effectiveness of the estimate. At each time step, the Nash equilibrium strategies of the sensor and the attacker are obtained by linear programming, so that the sensor and the attacker can each make the optimal choice in real time. Furthermore, the invention studies reinforcement learning algorithms from the two perspectives of reliable channels and unreliable channels, and proposes a reinforcement learning algorithm based on security state estimation for each. The algorithms yield the Nash equilibrium strategies of the attacking and defending parties, which guide the decisions of the sensor and the attacker and effectively improve the security performance of the cyber-physical system.

Drawings

FIG. 1 is a flowchart of a security state gaming method for CPSs under DoS attack according to the present invention;

FIG. 2 is a schematic diagram of a wireless channel remote security estimation model under DoS attack in the present invention;

FIG. 3 is a diagram of the transition process of the Markov chain stochastic state sequence P (k) in the present invention;

FIG. 4 is a diagram illustrating the variation of Tr (P (k)) under a reliable channel in the present invention;

FIG. 5 is a diagram illustrating the learning process of the Q-value function under a reliable channel in the present invention;

FIG. 6 is a diagram illustrating the variation of Tr(P(k)) under an unreliable channel in the present invention;

FIG. 7 is a diagram illustrating the learning process of the Q-value function under an unreliable channel in the present invention;

FIG. 8 is an implementation flowchart of the reinforcement learning algorithm based on security state estimation for a cyber-physical system under denial-of-service attack in an embodiment of the present invention.

Detailed Description

In order to make those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The invention is described in further detail below with reference to the accompanying drawings:

The invention aims to provide a security state game method, system, terminal and medium for CPSs under DoS attack, so as to solve the technical problems of low security performance and low decision efficiency and accuracy of cyber-physical systems in the prior art.

Specifically, the security state game method for CPSs under DoS attack comprises the following steps:

Step 1, establishing a wireless channel remote security estimation model under DoS attack;

specifically, the wireless channel remote security estimation model under DoS attack comprises a cyber-physical system, a sensor, a channel, a remote estimator and an attacker. The cyber-physical system inputs a sensor measurement vector y(k) to the sensor, and the sensor outputs a minimum mean square error estimate. The channel comprises a secure transmission channel and an unsecured transmission channel; the channel transmits the sensor data to the remote estimator, and the remote estimator outputs its estimate of the system security state.

A DoS attack may occur on the wireless communication channel between the sensor and the remote estimator. At time k, the linear system model under DoS attack is expressed as

where the cyber-physical system model is a linear discrete-time system, w(k) and v(k) denote the zero-mean process noise and measurement noise, and A, B and H are known coefficient matrices of appropriate dimensions.

Step 2, inputting system parameters into the wireless channel remote security estimation model under DoS attack, and recursively updating the minimum mean square error estimator of the system state vector and the estimation error covariance according to the Kalman filter equations;

specifically, system parameters are input into the wireless channel remote security estimation model under DoS attack, wherein the system parameters comprise: the initial state, actions and Q-value matrix of the system, the packet loss probability $\xi_k$ under the different action combinations, the learning rate α, the discount factor ρ, and the exploration rate ε.

The minimum mean square error estimator of the system state vector and the estimation error covariance P(k) are recursively updated according to the Kalman filter equations in the following way:

the system state is estimated by recursive updating with a local Kalman filter, wherein the minimum mean square error estimator of the system state vector x(k) at each time k is obtained from the measurement data by running the Kalman filter, and is expressed as follows:

the corresponding estimation error covariance is:

The estimator and P(k) are recursively updated according to the Kalman filter equations. For simplicity, the Lyapunov operator h and the Riccati operator g are defined as follows:

the recursive update equations of the Kalman filter are as follows:

$P(k|k-1) = h(P(k-1))$ (7)

$K(k) = P(k|k-1)C^T[CP(k|k-1)C^T + R]^{-1}$ (8)

$P(k) = g(P(k|k-1))$ (10)

wherein the estimator is the minimum mean square error estimate of the state vector x(k), P(k) is the corresponding estimation error covariance, A, B and C are the coefficient matrices, and h and g are the Lyapunov operator and Riccati operator defined below:

Step 3, in the wireless channel remote security estimation model under DoS attack, having the sensor and the DoS attacker each select a random action with probability ε and the optimal action with probability 1-ε;

specifically, the sensor has two data transmission modes, a first data transmission mode indicating secure transmission and a second data transmission mode indicating unsecured transmission; the DoS attacker has two attack modes, one representing a DoS attack on the channel and the other representing that the attacker chooses not to attack.

In a cyber-physical system, a packet loss may be due to a DoS attack or to other causes such as signal degradation, channel fading and channel congestion. To indicate whether a data packet has arrived successfully at the remote estimator, an arrival indicator $\delta_k$ is defined as:

The remote estimator produces its estimate of the system security state as follows: if the estimator successfully receives the data packet, it uses the received data directly as its estimate; otherwise, it predicts the estimate from the best estimate obtained at the previous time. The formula is as follows:

wherein the estimate is the minimum mean square error estimate of the state vector x(k) and A is the coefficient matrix.

The corresponding estimation error covariance is

To simplify the error covariance P(k), the time interval $\tau_k$ is defined by the following formula

The time interval $\tau_k$ represents the time elapsed from the last time i at which a data packet was received to the current time k, so $\tau_k$ can be expressed as

Next, assume that the remote estimator successfully receives the data packet at the beginning of the transmission, that is, $\delta_0 = 1$. Based on equations (13) and (14) above, the estimation error covariance of the remote estimator can then be easily obtained

Based on the state transition process of the Markov chain in FIG. 3, it can be observed that the state can only move from

to the next adjacent state

or back to the original state
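The role of the time interval τ_k can be sketched in the scalar case as follows. This is a minimal illustration, assuming that the remote covariance after τ consecutive losses is obtained by applying the Lyapunov operator τ times to the steady-state covariance, with the assumed form h(X) = A X A^T + Q. The function names and numbers are hypothetical.

```python
def h(p, a, q):
    """Lyapunov operator in the scalar case, assumed form h(X) = A X A^T + Q."""
    return a * p * a + q


def holding_times(arrivals):
    """Return tau_k for each step: steps elapsed since the last received packet.

    arrivals[k] is the arrival indicator delta_k (1 = received, 0 = lost);
    arrivals[0] is assumed to be 1, matching delta_0 = 1 in the text.
    """
    taus, tau = [], 0
    for delta in arrivals:
        tau = 0 if delta == 1 else tau + 1
        taus.append(tau)
    return taus


def remote_covariance(tau, p_bar, a, q):
    """Apply h to the steady-state covariance p_bar exactly tau times."""
    p = p_bar
    for _ in range(tau):
        p = h(p, a, q)
    return p


taus = holding_times([1, 0, 0, 1, 0])
print(taus)  # [0, 1, 2, 0, 1]
```

Each lost packet moves the chain one state further along, and each received packet returns it to the steady state, which is the transition structure described for FIG. 3.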

Step 4, determining the channel type in the wireless channel remote security estimation model under DoS attack, and calculating the instant reward under the current state and action combination and the state at the next moment;

specifically, the channel type is determined in order to calculate the instant reward under the current state and action combination and the state at the next moment, covering data packet transmission under a reliable channel and data packet transmission under an unreliable channel;

under the reliable channel, the state of the remote estimator at the next moment is expressed as follows:

at time k, the instant reward $r_k$ of the system is expressed as follows:

Under the unreliable channel, when the sensor selects unsecured transmission and the attacker selects attack, both choices increase the probability of data packet loss; since the DoS attack is the main cause of packet loss and affects it more than the other causes, it follows that $p_2 > p_4 > p_1 > p_3$.

The state of the remote estimator at the next moment is expressed as follows:

at time k, the instant reward $r_k$ of the system is expressed as follows:

wherein $\xi_k$ denotes the packet loss probability, $\bar{P}$ denotes the steady state to which the Kalman filter converges, P(k) is the remote estimation error covariance, and h is the Lyapunov operator defined below:

Step 5, updating the Q-value function in the wireless channel remote security estimation model under DoS attack, and thereby updating the Q-value matrix;

specifically, the Q-value function is updated in the wireless channel remote security estimation model under DoS attack, wherein the update expression of the Q-value function is as follows:

wherein $\alpha_k$ is the learning rate, taking values in (0, 1), ρ is the discount factor, and $r_k$ is the reward in the current state.

Step 6, calculating an updated Q value function, comparing the updated Q value function with the Q value function in the current state, and returning to execute the step 3 again when the difference between the updated Q value function and the Q value in the current state is greater than a set threshold eta; and otherwise, a converged Q value matrix and an optimal strategy based on Nash equilibrium are obtained, and the CPSs security state game under DoS attack is completed.

In the invention, the relevant parameters of the cyber-physical system model are

where A and C are coefficient matrices, Q is the process error covariance matrix, and R is the measurement noise covariance matrix.

The Kalman filter converges to a steady state with an error covariance of

where

The cost for the sensor to select secure transmission is set to $c_s = 6.7$, the cost for the attacker to launch an attack is set to $c_a = 8$, and the learning rate and the discount factor are set to α = 0.9 and ρ = 0.6, respectively.

The Kalman filter and the Q-learning algorithm are run for 5000 iterations. In the first 500 iterations, ε is set to decrease gradually from 1 so that the reward of every action combination is fully explored; in the subsequent iterations, ε is set to 0.1, so that the optimal action is selected with probability 0.9 and a random action with probability 0.1.
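The exploration schedule can be sketched as follows. The text states only that ε decreases gradually from 1 during the first 500 iterations and equals 0.1 afterwards; the linear shape of the decay and the function name are assumptions made for illustration.

```python
def exploration_rate(iteration, warmup=500, eps_start=1.0, eps_final=0.1):
    """Exploration rate for a given iteration index (starting at 0).

    Decays linearly (an assumed shape) from eps_start to eps_final over the
    first `warmup` iterations, then stays constant at eps_final.
    """
    if iteration >= warmup:
        return eps_final
    fraction = iteration / warmup
    return eps_start + fraction * (eps_final - eps_start)


print(exploration_rate(0), exploration_rate(500), exploration_rate(4999))
# 1.0 0.1 0.1
```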

Simulation example of the reinforcement learning algorithm based on security state estimation under a reliable channel

FIG. 4 shows the variation of Tr(P(k)) under the reliable channel, from which the occurrence of packet loss and the changes of state during the learning process can be observed. Tr(P(k)) = 3, Tr(P(k)) = 5.6 and Tr(P(k)) = 9.6 respectively represent

In the first 500 iterations, the selection of actions is random, so packet loss is more frequent; in the subsequent iterations, the sensor and the attacker select the optimal action for the current state, and the maximum number of consecutive data packet losses is 2.

FIG. 5 shows, under the reliable channel, the learning process of the Q-value function for the states

From FIG. 5, it can be seen that the Q-value matrix converges to the optimal action-value function through the Q-learning algorithm; the specific values are shown in Table 1.

TABLE 1 Optimal action-value function under the reliable channel

Since a Nash equilibrium exists in any finite game, the Nash equilibrium points can be obtained from the Q-value matrix; see Table 2.

TABLE 2 Nash equilibrium strategies of the sensor and the attacker under the reliable channel
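How an equilibrium can be read off from the Q values of one state can be sketched as follows. The source solves the equilibrium by linear programming; because each player has only two actions, this sketch swaps in the closed-form solution of a 2x2 zero-sum matrix game instead. The function name, the row and column roles and the example matrices are hypothetical.

```python
def solve_2x2_zero_sum(m):
    """Equilibrium of a 2x2 zero-sum game.

    m[i][j] is the payoff to the row player (maximiser) when the row player
    picks i and the column player (minimiser) picks j.
    Returns (row_strategy, col_strategy, value).
    """
    (a, b), (c, d) = m
    # Pure-strategy saddle point: an entry that is the minimum of its row
    # and the maximum of its column.
    for i in range(2):
        for j in range(2):
            if m[i][j] == min(m[i]) and m[i][j] == max(m[0][j], m[1][j]):
                row = [1.0 if t == i else 0.0 for t in range(2)]
                col = [1.0 if t == j else 0.0 for t in range(2)]
                return row, col, float(m[i][j])
    # No saddle point: each player mixes so that the opponent is indifferent.
    denom = a - b - c + d
    p = (d - c) / denom              # probability of row 0
    s = (d - b) / denom              # probability of column 0
    value = (a * d - b * c) / denom
    return [p, 1.0 - p], [s, 1.0 - s], value


print(solve_2x2_zero_sum([[3, 1], [4, 2]]))    # ([0.0, 1.0], [0.0, 1.0], 2.0)
print(solve_2x2_zero_sum([[1, -1], [-1, 1]]))  # ([0.5, 0.5], [0.5, 0.5], 0.0)
```

To use it with a Q-value matrix of n rows and 4 columns, the four entries of one row would first be arranged as a 2x2 matrix indexed by the two players' actions; that arrangement is also an assumption of this sketch.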

Over iterations 500 to 5000, the computed packet loss rates show that the probability of losing two consecutive data packets is 0.04% and the probability of losing a single data packet is 2.44%.

Simulation example of the reinforcement learning algorithm based on security state estimation under an unreliable channel

When data are transmitted over an unreliable channel, data packets may also be lost for reasons other than DoS attacks. The packet loss probabilities under the different action combinations of the sensor and the attacker are set as shown in the following expression

FIG. 6 shows the variation of Tr(P(k)) under the unreliable channel. It can be observed that the probability of packet loss is much larger than under the reliable channel, and that after the first 500 iterations the maximum number of consecutive data packet losses is 2.

FIG. 7 shows, under the unreliable channel, the learning process of the Q-value function for the states

From FIG. 7, it can be seen that the Q-value matrix converges to the optimal action-value function through the Q-learning algorithm; the specific values are shown in Table 3.

TABLE 3 Optimal action-value function under the unreliable channel

Since a Nash equilibrium exists in any finite game, the Nash equilibrium points can be obtained from the Q-value matrix; see Table 4.

TABLE 4 Nash equilibrium strategies of the sensor and the attacker under the unreliable channel

Over iterations 500 to 5000, the computed packet loss rates show that the probability of losing two consecutive data packets is 0.42% and the probability of losing a single data packet is 3.76%.

Examples

As shown in FIG. 8, this embodiment provides an implementation flow of the reinforcement learning algorithm based on security state estimation for a cyber-physical system under denial-of-service attack. The method is implemented by the following steps:

Step 1: inputting the system parameters, and initializing the system state, the actions, the Q-value matrix, the packet loss probability $\xi_k$ under the different action combinations, the learning rate α, the discount factor ρ and the exploration rate ε, assuming that the system initialization state is

and setting the system time k to 1, after which the security state estimation of the system begins. The minimum mean square error estimate of the system state vector and the estimation error covariance P(k) are recursively updated according to the Kalman filter equations.

Step 2: A DoS attack may occur on the channel between the sensor and the remote estimator, and whether it occurs depends on the choices of the sensor and the attacker. The sensor has two options for sending data: with unsecured transmission, sending data packets costs the sensor nothing, but the packets are vulnerable to DoS attack; with secure transmission, the sensor must spend an extra cost $c_s$ to avoid the attack. The DoS attacker, in turn, can choose to attack or not to attack: launching a DoS attack on the channel incurs an additional fixed cost $c_a$, while the other option means that the attacker chooses not to attack. The sensor and the attacker select actions according to the exploration rate ε, choosing a random action with probability ε and the optimal action with probability 1-ε.

Step 3: The loss of data packets depends not only on the choices of the sensor and the attacker but also on the reliability of the communication channel. When a reliable channel is selected, data packets are lost only because of DoS attacks and not for other reasons; the instant reward $r_k$ of the system under the current state and action combination can be calculated according to formula (18), and the next state $s_{k+1}$ according to formula (17). When an unreliable channel is selected, data packets are lost not only because of DoS attacks but also for other reasons; the instant reward $r_k$ of the system under the current state and action combination can be calculated according to formula (22), and the next state $s_{k+1}$ according to formula (21).

Step 4: The Q-value function, and hence the Q-value matrix, is updated according to equation (19). Since the system has n states and each state has 4 action combinations of the sensor and the attacker, the Q-value matrix has n rows and 4 columns.

Step 5: The updated Q-value function calculated in step 4 is compared with the Q-value function in the current state to judge whether the loop termination condition is met.

Step 6: When the termination condition is not met, return to step 2; when it is met, the converged Q-value matrix and the optimal strategy based on Nash equilibrium are obtained.
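The six steps can be put together in a compact sketch. It is a simplified illustration under several assumptions that are not taken from the source: a scalar system, a hypothetical packet loss table `LOSS`, a state index capped at the last state, a reward of the form Tr(P) + c_s·a - c_a·d seen from the attacker's side, and a pure-strategy security value in place of the linear-programming Nash solution.

```python
import random

# Hypothetical loss probabilities, keyed by (a, d):
# a = 1 means secure transmission, d = 1 means the attacker attacks.
LOSS = {(0, 0): 0.10, (0, 1): 0.80, (1, 0): 0.05, (1, 1): 0.30}


def h(p, a=1.2, q=0.5):
    """Lyapunov operator, scalar case (assumed form a*p*a + q)."""
    return a * p * a + q


def game_value(q_row):
    """Attacker's pure-strategy security value (a simplification of the
    Nash value). q_row is indexed by 2*a + d."""
    return max(min(q_row[d], q_row[2 + d]) for d in (0, 1))


def run(n_states=3, iters=5000, alpha=0.9, rho=0.6, c_s=6.7, c_a=8.0,
        p_bar=1.0, eta=1e-6, seed=0):
    rng = random.Random(seed)
    covs = [p_bar]                        # covs[i]: covariance after i losses
    for _ in range(n_states - 1):
        covs.append(h(covs[-1]))
    q = [[0.0] * 4 for _ in range(n_states)]   # n rows, 4 columns
    s = 0                                  # start in the steady state
    for k in range(iters):
        eps = max(0.1, 1.0 - 0.9 * k / 500)    # assumed linear decay
        if rng.random() < eps:                 # explore
            a, d = rng.randrange(2), rng.randrange(2)
        else:                                  # exploit (pure strategies)
            d = max((0, 1), key=lambda dd: min(q[s][dd], q[s][2 + dd]))
            a = min((0, 1), key=lambda aa: max(q[s][2 * aa], q[s][2 * aa + 1]))
        lost = rng.random() < LOSS[(a, d)]
        s_next = min(s + 1, n_states - 1) if lost else 0
        r = covs[s_next] + c_s * a - c_a * d   # assumed reward form
        j = 2 * a + d
        old = q[s][j]
        q[s][j] = (1 - alpha) * old + alpha * (r + rho * game_value(q[s_next]))
        if k >= 500 and abs(q[s][j] - old) <= eta:
            break                              # change within threshold eta
        s = s_next
    return q


q_matrix = run()
print(len(q_matrix), len(q_matrix[0]))  # 3 4
```

The returned table has one row per state and one column per action combination, matching the n-by-4 layout described in step 4; the numerical values depend entirely on the hypothetical parameters above.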

The invention also provides a security state game system for the CPSs under the DoS attack, which comprises a model establishing module, a first processing module, a second processing module, a third processing module, a fourth processing module and a comparison module;

the model establishing module is used for establishing a wireless channel remote safety estimation model under DoS attack;

The invention also provides a mobile terminal, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, such as a security state game program for the CPSs under the DoS attack.

When the processor executes the computer program, the steps of the secure state game method for the CPSs under DoS attack are implemented, for example:

step 2, inputting system parameters in a wireless channel remote safety estimation model under DoS attack, and recursively updating the minimum mean square error estimator and the estimation error covariance of a system state vector according to a Kalman filtering equation;

step 3, in the wireless channel remote safety estimation model under DoS attack, the sensor and the DoS attacker select an action at random with probability ε and select the optimal action with probability 1-ε;

step 5, updating a Q value function in the wireless channel remote safety estimation model under the DoS attack, and further updating a Q value matrix;

step 6, calculating the updated Q-value function and comparing it with the Q-value function in the current state; when the difference between the updated Q value and the Q value in the current state is greater than a set threshold η, returning to step 3; otherwise, a converged Q-value matrix and an optimal strategy based on Nash equilibrium are obtained, and the security state game of the CPSs under the DoS attack is completed.
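For illustration, the Nash-equilibrium strategy referred to in step 6 can be computed for a single state by treating the four Q values of that state as a 2x2 zero-sum matrix game between the sensor (row player, maximizer) and the attacker (column player, minimizer). The abstract states that linear programming is used for this purpose; the sketch below instead uses the closed-form solution that exists for 2x2 games. The function name `solve_2x2_zero_sum` and the row/column ordering are assumptions of this sketch.

```python
def solve_2x2_zero_sum(q):
    """q = [[q00, q01], [q10, q11]] is the payoff to the row player (sensor).
    Returns (p, y, value): p is the probability that the sensor plays row 0,
    y is the probability that the attacker plays column 0."""
    (a, b), (c, d) = q
    row_min = [min(a, b), min(c, d)]   # worst case for each sensor action
    col_max = [max(a, c), max(b, d)]   # worst case for each attacker action
    lower, upper = max(row_min), min(col_max)
    if lower == upper:
        # A pure-strategy saddle point exists.
        i = row_min.index(lower)
        j = col_max.index(upper)
        return (1.0 if i == 0 else 0.0), (1.0 if j == 0 else 0.0), float(lower)
    # No saddle point: both players mix so that the opponent is indifferent.
    denom = a - b - c + d
    p = (d - c) / denom
    y = (d - b) / denom
    value = (a * d - b * c) / denom
    return p, y, value


# Matching-pennies style payoffs: both players mix evenly and the value is 0.
print(solve_2x2_zero_sum([[1, -1], [-1, 1]]))
```

For games with more than two actions per player the closed form no longer applies, which is where a linear-programming formulation becomes the natural choice.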

Alternatively, the processor implements the functions of the modules in the system when executing the computer program, for example: the model establishing module is used for establishing a wireless channel remote safety estimation model under DoS attack;

the first processing module is used for inputting system parameters in the wireless channel remote safety estimation model under the DoS attack and recursively updating the minimum mean square error estimator and the estimation error covariance of the system state vector according to a Kalman filtering equation;

the comparison module is used for calculating and comparing the updated Q value function with the Q value function in the current state, and returning to execute the step 3 again when the difference between the updated Q value function and the Q value in the current state is greater than a set threshold eta; and otherwise, a converged Q value matrix and an optimal strategy based on Nash equilibrium are obtained, and the CPSs security state game under DoS attack is completed.

Illustratively, the computer program may be partitioned into one or more modules/units that are stored in the memory and executed by the processor to implement the invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program in the mobile terminal. For example, the computer program may be partitioned into a model building module, a first processing module, a second processing module, a third processing module, a fourth processing module, and a comparison module;

the specific functions of each module are as follows: the model establishing module is used for establishing a wireless channel remote security estimation model under the DoS attack;

The mobile terminal can be a desktop computer, a notebook, a palm computer, a cloud server and other computing equipment. The mobile terminal may include, but is not limited to, a processor, a memory.

The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. The general-purpose processor may be a microprocessor or any conventional processor; the processor is the control center of the mobile terminal and connects the various parts of the entire mobile terminal using various interfaces and lines.

The memory may be used to store the computer programs and/or modules, and the processor may implement various functions of the mobile terminal by running or executing the computer programs and/or modules stored in the memory and calling data stored in the memory.

The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the mobile terminal. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, or other volatile solid state storage device.

The invention also provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the steps of the security state game method for CPSs under DoS attack are implemented.

The mobile terminal integrated module/unit, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in a computer-readable storage medium.

Based on such understanding, all or part of the processes in the foregoing method can be implemented by a computer program to instruct related hardware, where the computer program can be stored in a computer readable storage medium, and when the computer program is executed by a processor, the steps of the security state gaming method for CPSs under DoS attack can be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc.

The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, etc.

It should be noted that the computer-readable medium may contain suitable additions or subtractions depending on the requirements of legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer-readable media may not include electrical carrier signals or telecommunication signals in accordance with legislation and patent practice.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims

1. A security state game method for CPSs under DoS attack is characterized by comprising the following steps:

step 6, calculating the updated Q-value function and comparing it with the Q-value function in the current state; when the difference between the updated Q value and the Q value in the current state is greater than a set threshold η, returning to step 3; otherwise, a converged Q-value matrix and an optimal strategy based on Nash equilibrium are obtained, and the security state game of the CPSs under the DoS attack is completed.

2. The security state game method for CPSs under DoS attack as claimed in claim 1, wherein in step 1, the wireless channel remote safety estimation model under DoS attack comprises a cyber-physical system, a sensor, a channel, a remote estimator and an attacker; the cyber-physical system inputs a sensor measurement vector y(k) to the sensor, and the sensor outputs a minimum mean square error estimate to the channel; the channel comprises a secure transmission channel and an insecure transmission channel; the channel transmits the data of the sensor to the remote estimator, and the remote estimator outputs the system security state of the remote estimator.

3. The security state game method for CPSs under DoS attack as claimed in claim 1, wherein in step 2, system parameters are input into the wireless channel remote safety estimation model under DoS attack, the system parameters including: the initial state, the actions, the Q-value matrix, the packet loss probability ξ_k of the system under different action combinations, the learning rate α, the discount factor ρ, and the exploration rate ε.

4. The security state game method for CPSs under DoS attack as claimed in claim 1, wherein in step 2, the minimum mean square error estimate of the system state vector and the estimation error covariance P(k) are recursively updated according to the Kalman filter equations, the minimum mean square error estimate being defined as follows:

the corresponding estimation error covariance is:

the minimum mean square error estimate and P(k) are recursively updated according to the Kalman filter equations, the recursive update equations of the Kalman filter being as follows:

P(k|k-1) = h(P(k-1))

K(k) = P(k|k-1) C^T [C P(k|k-1) C^T + R]^{-1}

P(k) = g(P(k|k-1))

wherein h is the Lyapunov operator and g is the Riccati operator, calculated as follows:

wherein A is the coefficient matrix and Q is the process error covariance matrix.
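The covariance part of this recursion can be illustrated with a minimal Python sketch for a scalar system. The operator definitions used here, h(X) = A X A^T + Q and g(X) = X - X C^T [C X C^T + R]^{-1} C X, are the forms commonly used in the remote state estimation literature and are an assumption of this sketch, since the operator formulas are not reproduced in the text above; the function names and numeric values are hypothetical.

```python
def h(p, a, q):
    """Lyapunov operator in the scalar case (assumed form): h(p) = a*p*a + q."""
    return a * p * a + q


def g(p, c, r):
    """Riccati operator in the scalar case (assumed form):
    g(p) = p - p*c*(c*p*c + r)^-1*c*p."""
    return p - p * c * (1.0 / (c * p * c + r)) * c * p


def kalman_covariance(p0, a, c, q, r, steps):
    """Iterate the covariance recursion and record (P(k|k-1), K(k), P(k))."""
    p = p0
    history = []
    for _ in range(steps):
        p_pred = h(p, a, q)                        # P(k|k-1) = h(P(k-1))
        gain = p_pred * c / (c * p_pred * c + r)   # K(k) = P(k|k-1) C^T [...]^-1
        p = g(p_pred, c, r)                        # P(k) = g(P(k|k-1))
        history.append((p_pred, gain, p))
    return history


# With a = c = q = r = 1 the recursion settles at the root of p^2 + p - 1 = 0.
hist = kalman_covariance(p0=1.0, a=1.0, c=1.0, q=1.0, r=1.0, steps=60)
print(hist[-1])
```

The value that P(k) settles to in such an iteration is the steady-state error covariance used later when the remote estimator state is defined.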

5. The security state game method for CPSs under DoS attack as claimed in claim 1, wherein in step 3, the sensor data transmission comprises two data transmission modes: a first data transmission mode indicating secure transmission; a second data transmission mode representing a DoS attack on the channel; and the first data transmission mode representing that the attacker chooses not to attack; wherein the sensor and the attacker select actions according to the exploration rate ε, selecting an action at random with probability ε and selecting the optimal action with probability 1-ε.

6. The security state game method for CPSs under DoS attack as claimed in claim 1, wherein in step 4, the channel type is determined in order to calculate the instant reward and the next state under the combination of the current state and action, the channel types comprising data packet transmission over a reliable channel and data packet transmission over an unreliable channel;

the next state of the remote estimator is expressed as follows:

at time k, the instant reward r_k of the system is expressed as follows:

the next state of the remote estimator is expressed as follows:

at time k, the instant reward r_k of the system is expressed as follows:

wherein ξ_k denotes the probability that a data packet is lost, the Kalman filter converges to a steady-state error covariance, P(k) denotes the remote estimation error covariance, and h denotes the Lyapunov operator.
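As a hedged illustration of how the packet loss probability, the steady-state covariance and the Lyapunov operator can interact, the sketch below assumes a transition rule that is common in the remote estimation literature: when a packet arrives, the remote error covariance resets to the steady-state value, and when it is lost, the previous covariance is propagated through h. This rule, the scalar form of h, and all function and parameter names are assumptions of the sketch; the state and reward expressions of the claim are not reproduced in the text and are not reconstructed here.

```python
def h(p, a, q):
    """Lyapunov operator in the scalar case (assumed form): h(p) = a*p*a + q."""
    return a * p * a + q


def next_covariance(p_prev, p_bar, lost, a, q):
    """Assumed transition: reset to the steady state p_bar on arrival,
    propagate the previous covariance through h on loss."""
    return h(p_prev, a, q) if lost else p_bar


def expected_next_covariance(p_prev, p_bar, xi, a, q):
    """Expectation of the assumed transition when a packet is lost with probability xi."""
    return xi * h(p_prev, a, q) + (1.0 - xi) * p_bar


def covariance_after_losses(p_bar, m, a, q):
    """Covariance after m consecutive losses starting from the steady state."""
    p = p_bar
    for _ in range(m):
        p = h(p, a, q)
    return p


print(next_covariance(0.5, 0.25, True, 1.0, 1.0))           # loss: h(0.5) = 1.5
print(expected_next_covariance(2.0, 1.0, 0.25, 1.0, 1.0))   # 0.25*3.0 + 0.75*1.0
print(covariance_after_losses(0.5, 3, 1.0, 1.0))            # three losses in a row
```

Under this assumed rule the remote estimator state can be indexed by the number of consecutive losses, which is one way to arrive at a finite set of n states for the Q-value matrix.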

7. The security state game method for CPSs under DoS attack as claimed in claim 1, wherein in step 5, the Q-value function is updated in the wireless channel remote safety estimation model under DoS attack, the Q-value function being updated as follows:

wherein α_k is the learning rate, taking a value in the interval (0, 1), ρ is the discount factor, and r_k is the reward in the current state.

8. A security state gaming system for CPSs under DoS attack is characterized by comprising

the third processing module is used for judging different channels in the wireless channel remote safety estimation model under the DoS attack to calculate the instant reward and the next moment state under the current state and action combination;

9. A mobile terminal comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the security state game method for CPSs under DoS attack as claimed in any one of claims 1 to 7.

10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the security state game method for CPSs under DoS attack as claimed in any one of claims 1 to 7.