CN115118477B - Smart grid state recovery method and system based on deep reinforcement learning - Google Patents


Info

Publication number
CN115118477B
CN115118477B · CN202210709649.5A
Authority
CN
China
Prior art keywords
state
power system
interaction
reinforcement learning
deep reinforcement
Prior art date
Legal status
Active
Application number
CN202210709649.5A
Other languages
Chinese (zh)
Other versions
CN115118477A (en)
Inventor
安豆
张斐烨
Current Assignee
Sichuan Digital Economy Industry Development Research Institute
Original Assignee
Sichuan Digital Economy Industry Development Research Institute
Priority date
Filing date
Publication date
Application filed by Sichuan Digital Economy Industry Development Research Institute filed Critical Sichuan Digital Economy Industry Development Research Institute
Priority to CN202210709649.5A
Publication of CN115118477A
Application granted
Publication of CN115118477B
Legal status: Active
Anticipated expiration

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04L — TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 — Network architectures or network communication protocols for network security
    • H04L63/14 — Detecting or protecting against malicious traffic
    • H04L63/1441 — Countermeasures against malicious traffic
    • H04L63/1466 — Active attacks involving interception, injection, modification, spoofing of data unit addresses, e.g. hijacking, packet injection or TCP sequence number attacks
    • H04L63/20 — Managing network security; network security policies in general
    • H04L67/00 — Network arrangements or protocols for supporting network services or applications
    • H04L67/01 — Protocols
    • H04L67/12 — Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention discloses a smart grid state recovery method and system based on deep reinforcement learning, comprising: constructing an attack model and a power grid state estimation system; injecting the attack model into the power grid state estimation system; constructing a Markov decision process model based on the process of injecting the attack model into the power grid state estimation system; and performing strategy optimization on the power system by a deep reinforcement learning method based on the Markov decision process model to obtain a recovery strategy.

Description

Smart grid state recovery method and system based on deep reinforcement learning
Technical Field
The invention relates to the technical field of power system optimization scheduling, in particular to a smart grid state recovery method and system based on deep reinforcement learning.
Background
As a typical cyber-physical system, the smart grid integrates advanced sensors, efficient measurement techniques and advanced control methods to achieve economical, efficient, and environmentally friendly operation of the grid.
However, due to the diversity and openness of the smart grid network environment, the state estimation process of the power system is easily intruded upon by malicious attackers, bringing unpredictable and significant losses to grid operation.
A reinforcement learning algorithm explores the environment by repeated trial and error and obtains the optimal strategy for a sequential decision problem through training. It can formulate an effective policy for an agent without explicitly constructing a complete decision model, which makes it very attractive for smart grid security policy research. However, when reinforcement learning is applied to the power system state recovery process, the following difficulties remain:
1) Existing reinforcement-learning-based research on power system security is concentrated on attack detection and lacks research on state recovery strategies after the power system is attacked. 2) Existing reinforcement-learning-based state recovery methods generally discretize the system, completely ignoring the fact that the state recovery action space of the power system is continuous. For these reasons, it is both challenging and desirable to propose a smart grid state recovery strategy based on deep reinforcement learning.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a smart grid state recovery method and a smart grid state recovery system based on deep reinforcement learning.
In one aspect, to achieve the above technical objective, the present invention provides a smart grid state recovery method based on deep reinforcement learning, comprising:
Constructing an attack model and a power grid state estimation system, injecting the attack model into the power grid state estimation system, and constructing a Markov decision process model based on the process of injecting the attack model into the power grid state estimation system;
and carrying out strategy optimization on the power system by a deep reinforcement learning method based on the Markov decision process model to obtain a recovery strategy.
Optionally, the Markov decision process model includes: a time-step state, a time-step action, a state transition equation, and an instantaneous reward; the time-step state is computed from a state estimate and a measurement vector of the power grid state estimation system, the time-step action is computed from the measurement vector of the power grid state estimation system, the state transition equation is computed from the time-step state, and the instantaneous reward is computed from the time-step state and the time-step action.
Optionally, the process of performing policy optimization on the power system includes:
Based on a Markov decision process model, constructing an interaction process of the power system and an external environment, and acquiring an interaction state, an interaction action, an interaction reward and a state at the next moment based on the interaction process;
And constructing a deep reinforcement learning model, wherein the deep reinforcement learning model follows an execution-evaluation (actor-critic) framework; constructing a training set based on the interaction state, the interaction action, the interaction reward and the next-time state; training the deep reinforcement learning model with the training set; and performing strategy optimization on the power system with the trained deep reinforcement learning model to obtain a recovery strategy.
Optionally, the process of constructing the training set includes:
Sampling the interaction state, the interaction action and the interaction reward by an experience replay method, and normalizing the sampling result to obtain a training set; wherein the sampling probability in the experience replay method is determined by the temporal-difference error.
Optionally, training the deep reinforcement learning model includes:
Calculating the gradient of the execution network and the error of the evaluation network through the training set, wherein the deep reinforcement learning model comprises the execution network and the evaluation network;
And updating parameters of the execution network and the evaluation network based on the calculation results, and updating the target execution network and the target evaluation network with the updated parameters, to obtain a trained deep reinforcement learning model.
In another aspect, in order to achieve the above technical objective, the present invention provides a smart grid state recovery system based on deep reinforcement learning, comprising:
The construction module is used for constructing an attack model and a power grid state estimation system, injecting the attack model into the power grid state estimation system, and constructing a Markov decision process model based on the process of injecting the attack model into the power grid state estimation system;
the optimization module is used for performing strategy optimization on the power system through a deep reinforcement learning method based on the Markov decision process model to obtain a recovery strategy.
Optionally, the Markov decision process model constructed in the construction module includes: a time-step state, a time-step action, a state transition equation, and an instantaneous reward; the time-step state is computed from a state estimate and a measurement vector of the power grid state estimation system, the time-step action is computed from the measurement vector of the power grid state estimation system, the state transition equation is computed from the time-step state, and the instantaneous reward is computed from the time-step state and the time-step action.
Optionally, the optimization module comprises a first optimization module, wherein the first optimization module constructs an interaction process between the power system and the external environment based on the Markov decision process model, and acquires an interaction state, an interaction action, an interaction reward and a next-time state based on the interaction process; and constructs a deep reinforcement learning model following an execution-evaluation (actor-critic) framework, constructs a training set based on the interaction state, the interaction action, the interaction reward and the next-time state, trains the deep reinforcement learning model with the training set, and performs strategy optimization on the power system with the trained deep reinforcement learning model to obtain a recovery strategy.
Optionally, the optimization module comprises a second optimization module, and the second optimization module is used for sampling the interaction state, the interaction action and the interaction reward by an experience replay method and normalizing the sampling result to obtain a training set; wherein the sampling probability in the experience replay method is determined by the temporal-difference error.
Optionally, the optimization module includes a third optimization module, and the third optimization module is configured to calculate, from the training set, the gradient of the execution network and the error of the evaluation network, where the deep reinforcement learning model comprises the execution network and the evaluation network; and to update parameters of the execution network and the evaluation network based on the calculation results, and update the target execution network and the target evaluation network with the updated parameters, to obtain a trained deep reinforcement learning model.
The invention has the following technical effects:
The method and the system can effectively improve the ability of the power system to cope with false data injection attacks, enhance the security of the power system state estimation process, and ensure the efficient operation of the smart grid.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a method according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
As shown in fig. 1, the invention discloses a smart grid state recovery method based on deep reinforcement learning, which has a remarkable effect in reducing the influence of false data injection attacks against grid state estimation on the power system. First, the invention constructs a Markov decision process model of the state recovery process of the power system after a false data injection attack. Second, a deep-reinforcement-learning-based grid state recovery strategy adaptively learns the state recovery process of the power system after an attack. The invention can formulate an optimal state recovery strategy for the power system without explicitly constructing complex mathematical models such as state transition probabilities and optimization functions, and is simple to implement and highly practical.
The invention discloses a smart grid state recovery method based on deep reinforcement learning. In the power system, there exists a false data injection attack model targeting the power grid state estimation system: a carefully crafted attack vector y is injected into the measurement vector z of the power system so as to bypass the abnormal information detection mechanism of the smart grid, which is expressed as follows:
z_bad = z + y

where z_bad represents the actual measurement value after the power system is attacked.
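For illustration only, and not as part of the claimed method, the following toy Python sketch shows why such an injection can bypass residual-based anomaly detection: if the attack vector y lies in the column space of the measurement matrix (here an assumed matrix H with z = Hx, which the patent does not specify), a least-squares state estimator absorbs the attack into the state estimate, and the measurement residual stays near zero. All numeric values are assumptions.

```python
import numpy as np

H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])                 # assumed measurement matrix, z = H x
x = np.array([0.9, 1.1])                   # true system state (toy values)
z = H @ x                                  # clean measurement vector
c = np.array([0.2, -0.1])                  # attacker's intended state deviation
y = H @ c                                  # attack vector in the column space of H
z_bad = z + y                              # injected measurements, z_bad = z + y

# Least-squares estimation absorbs the attack as x + c, so the measurement
# residual -- the quantity checked by anomaly detection -- stays ~0:
x_hat = np.linalg.lstsq(H, z_bad, rcond=None)[0]
print(np.linalg.norm(z_bad - H @ x_hat))   # ~0: the attack bypasses detection
```

This is the classical false data injection construction; the state recovery method described below is aimed at undoing exactly this kind of corruption.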
In order to cope with false data injection attacks against the power grid state estimation process, the invention describes the state recovery process after the power grid is attacked as a Markov decision process with sequential decision characteristics, which mainly comprises the following four modules:
Module 1: s_t ∈ S represents the state at time t in the Markov decision process model, and is computed from the state estimate and the measurement vector of the power system, specifically as follows:

s_t = ω·‖z_t − x_t‖

where x_t represents the power system state estimate, z_t represents the measurement vector, and ω is an adjustment parameter for the state. As can be seen from the above equation, when the system operates normally, the state estimate x_t is close to the measurement vector z_t, so the state value s_t is small; when the system is attacked, there is a significant difference between the state estimate x_t and the measurement vector z_t, and the observed state value s_t is large. The power system can therefore judge from the state value s_t whether it has suffered a false data injection attack.
Module 2: a_t ∈ A represents the action of the power system at time t, namely the state recovery strategy to be executed by the power system according to state s_t. The invention adopts a continuous action space to represent the grid state recovery strategy, i.e., a_t ∈ [-1, 1] represents the correction applied by the power system to the measured value z_t, and the measured value of the power system corrected by the state recovery strategy can be represented as:

z'_t = z_t + a_t
Module 3: P represents the state transition equation of the power system, which is a mapping from the state and action spaces to the state space, expressed as:

P: s_t × a_t → s_t+1
Module 4: r_t ∈ R is the instantaneous reward obtained by the power system at time t. The instantaneous reward received by the power system after transitioning from state s_t to s_t+1 by taking action a_t expresses the effect of state restoration, as follows:

r_t = −s'_t

where s'_t represents the state value of the power system after correction by the state restoration action a_t at time t.
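To make the four modules concrete, a minimal environment sketch is given below. It is illustrative only: the class name GridRecoveryEnv, the attack probability, the noise levels, the additive correction z'_t = z_t + a_t, and the reward r_t = −s'_t are assumptions consistent with the description above rather than details fixed by the invention.

```python
import numpy as np

class GridRecoveryEnv:
    """Toy environment assembling Modules 1-4 above (illustrative only)."""
    def __init__(self, n_meas=3, omega=1.0, attack_prob=0.05):
        self.n_meas = n_meas              # dimension of the measurement vector z_t
        self.omega = omega                # state adjustment parameter (assumed value)
        self.attack_prob = attack_prob    # chance of an FDI attack per step (assumed)
        self.reset()

    def reset(self):
        self.x = np.random.randn(self.n_meas)                  # state estimate x_t
        self.z = self.x + 0.01 * np.random.randn(self.n_meas)  # clean measurement z_t
        return self._state()

    def _state(self):
        # Module 1: s_t = omega * ||z_t - x_t||
        return self.omega * np.linalg.norm(self.z - self.x)

    def step(self, a):
        # Module 2: a_t in [-1, 1]^m applied as an additive correction (assumed)
        z_corr = self.z + a
        s_corr = self.omega * np.linalg.norm(z_corr - self.x)  # corrected state s'_t
        r = -s_corr                       # Module 4: reward as the restoration effect
        # Module 3: the environment produces the next measurement; with small
        # probability an attack vector y is injected, z_bad = z + y
        self.z = self.x + 0.01 * np.random.randn(self.n_meas)
        if np.random.rand() < self.attack_prob:
            self.z += np.random.uniform(0.5, 1.5, self.n_meas)
        return self._state(), r
```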
Based on the smart grid state recovery Markov decision process model, the invention designs a deep reinforcement learning method that optimizes the state recovery strategy of the power system through continuous interaction between the power system and the environment. The method comprises the following steps: interaction, prioritized experience replay, and policy training. The specific steps of the algorithm are described as follows:
Step 1: Interaction. In the designed deep-reinforcement-learning-based smart grid state recovery algorithm, the power system is regarded as an agent that learns the optimal state recovery strategy by continuously interacting with the environment. At time t, the power system first obtains the state of the environment at that time, s_t. Due to the continuity of the action space, the invention adopts a deep reinforcement learning framework based on an execution-evaluation (actor-critic) architecture; specifically, the power system selects a state recovery action through the execution network π_η parameterized by η, expressed as follows:

a_t = π_η(s_t)
In order for the agent to be able to explore the environment space and to prevent the strategy from falling into a local optimum, the invention adopts an ε-greedy exploration scheme: at time t, the power system randomly selects an action from the action space with probability ε. ε is set to 1 at the initial time and gradually decreases as training proceeds; in addition, a minimum value of ε of 0.05 is set, so that the power system maintains a certain exploration capability in the final stage of the training process. After the agent selects the state recovery action, the environment transitions from state s_t to s_t+1 and generates a reward r_t for the power system.
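A minimal sketch of this ε-greedy exploration over the continuous action space follows. The patent fixes only the initial value 1 and the floor 0.05; the multiplicative decay rate 0.995, the action dimension, and the stand-in execution network are illustrative assumptions.

```python
import numpy as np

def epsilon_greedy_action(actor, s, eps, action_dim):
    # With probability eps, explore: sample uniformly from the continuous
    # action space [-1, 1]; otherwise exploit the execution network pi_eta.
    if np.random.rand() < eps:
        return np.random.uniform(-1.0, 1.0, size=action_dim)
    return actor(s)

# eps starts at 1 and decays toward the 0.05 floor described above;
# the multiplicative decay rate 0.995 is an assumed schedule.
eps, EPS_MIN, DECAY = 1.0, 0.05, 0.995
dummy_actor = lambda s: np.zeros(3)       # stand-in for the trained network
for step in range(1000):
    a = epsilon_greedy_action(dummy_actor, s=None, eps=eps, action_dim=3)
    eps = max(EPS_MIN, eps * DECAY)
```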
Step 2: Prioritized experience replay. After the power system (agent) completes an interaction, the resulting state s_t, action a_t, reward r_t, and next-time state s_t+1 constitute a training experience, which is stored in an experience buffer for training the strategy. Because attacks are covert and abrupt, in the state recovery environment studied by the invention the experiences containing attacked states occupy only a small portion of the total as the number of experiences in the buffer grows; if sampling follows the conventional uniform method, the more valuable experiences containing attacked states are difficult to collect. To solve this problem, the invention employs a prioritized experience replay method to ensure that the agent samples experiences containing attack states with higher probability. The sampling priority is measured by the temporal-difference (TD) error δ, expressed as follows:

δ = Q_eva − Q_tar

where Q_eva and Q_tar represent the estimated value function and the target value function, respectively (described in detail in Step 3). The probability that a certain experience i is sampled is determined by normalizing over the δ of all experiences stored in the experience buffer, expressed as:

P(i) = |δ_i| / Σ_k |δ_k|

where the sum over k runs over all experiences stored in the experience buffer.
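One possible implementation of this prioritized buffer is sketched below, sampling with P(i) = |δ_i| / Σ_k |δ_k| as above. The capacity, the small constant added to priorities, and the default priority given to new experiences are illustrative assumptions.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Sketch of the prioritized replay above: P(i) = |delta_i| / sum_k |delta_k|."""
    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self.data, self.prios = [], []

    def add(self, s, a, r, s_next, delta=1.0):
        # New experiences receive a default priority so they are sampled at least once
        if len(self.data) >= self.capacity:
            self.data.pop(0); self.prios.pop(0)
        self.data.append((s, a, r, s_next))
        self.prios.append(abs(delta) + 1e-6)

    def sample(self, n):
        p = np.asarray(self.prios)
        p = p / p.sum()                   # normalize |delta| over all stored experiences
        idx = np.random.choice(len(self.data), size=n, p=p)
        return idx, [self.data[i] for i in idx]

    def update_priorities(self, idx, deltas):
        # Refresh priorities with the latest TD errors after a training step
        for i, d in zip(idx, deltas):
            self.prios[i] = abs(d) + 1e-6
```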
Step 3: Policy training. After sufficient training experience has been collected in the experience buffer, a training batch of size N, denoted (S_t, A_t, R_t, S_t+1), is first sampled from the buffer. The agent then updates the parameters of the execution network π_η and the evaluation network Q_φ with the training samples; the gradient of the execution network and the error of the evaluation network are expressed as follows:

∇_η J = (1/N) Σ ∇_a Q_φ(S_t, a)|a=π_η(S_t) · ∇_η π_η(S_t)

L(φ) = (1/N) Σ (Q_tar − Q_eva)²

Q_eva = Q_φ(S_t, A_t)

Q_tar = R_t + γ·Q'_φ'(S_t+1, π'_η'(S_t+1))

where γ is the discount factor, and Q'_φ' and π'_η' represent the target evaluation network and the target execution network, respectively. Q'_φ' and π'_η' copy their parameters from the evaluation network Q_φ and the execution network π_η at the initialization of training, expressed as:

η' = η
φ' = φ

and the target network parameters are softly updated during training according to:

η' ← τη + (1−τ)η'
φ' ← τφ + (1−τ)φ'

where τ is the soft update coefficient.
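A PyTorch sketch of one such training step is given below, assuming a DDPG-style execution-evaluation pair consistent with the formulas above. The network sizes, optimizers, and the values of γ and τ are illustrative assumptions, not values fixed by the invention.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    # Execution network pi_eta: maps a state to an action in [-1, 1]
    def __init__(self, s_dim, a_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim, 64), nn.ReLU(),
                                 nn.Linear(64, a_dim), nn.Tanh())
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    # Evaluation network Q_phi: maps (state, action) to a scalar value
    def __init__(self, s_dim, a_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim + a_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def soft_update(target, online, tau):
    # eta' <- tau*eta + (1 - tau)*eta'  (and likewise for phi')
    for tp, p in zip(target.parameters(), online.parameters()):
        tp.data.copy_(tau * p.data + (1.0 - tau) * tp.data)

def train_step(actor, critic, actor_t, critic_t, batch, opt_a, opt_c,
               gamma=0.99, tau=0.01):
    S, A, R, S1 = batch                                 # sampled (S_t, A_t, R_t, S_t+1)
    with torch.no_grad():
        Q_tar = R + gamma * critic_t(S1, actor_t(S1))   # target value function
    Q_eva = critic(S, A)                                # estimated value function
    critic_loss = nn.functional.mse_loss(Q_eva, Q_tar)  # evaluation-network error
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    actor_loss = -critic(S, actor(S)).mean()            # ascend the policy gradient
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

    soft_update(actor_t, actor, tau)                    # soft target updates
    soft_update(critic_t, critic, tau)
    return (Q_eva - Q_tar).detach()                     # TD errors for replay priorities
```

The returned TD errors can be fed back to the prioritized buffer of Step 2 to refresh the sampling priorities.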
The core of the deep-reinforcement-learning-based smart grid state recovery method is a deep reinforcement learning algorithm built on an execution-evaluation network. The specific implementation is as follows:
Constructing the Markov decision process model. The state recovery process after the power grid is attacked is described as a Markov decision process with sequential decision characteristics, mainly comprising the following four modules: state, action, state transition function, and reward, as detailed above.
Interaction process. The invention first initializes the parameters of the execution network π_η, the evaluation network Q_φ, the target execution network π'_η', and the target evaluation network Q'_φ' of the power system. The power system then interacts with the environment for E_1 episodes, where each episode proceeds as follows: at time t, the power system first obtains the environment state s_t, and selects a state recovery action through the execution network π_η parameterized by η, expressed as follows:

a_t = π_η(s_t)

After the agent selects the state recovery action, the environment transitions from state s_t to s_t+1 and generates a reward r_t for the power system.
Prioritized experience replay process. After the power system (agent) completes an interaction, the resulting state s_t, action a_t, reward r_t, and next-time state s_t+1 constitute a training experience, which is stored in an experience buffer for training the strategy. The invention adopts the prioritized experience replay method to ensure that the agent samples experiences containing attack states with higher probability; the sampling priority is measured by the temporal-difference (TD) error δ, and the probability that a certain experience i is sampled is determined by normalizing over the δ of all experiences stored in the experience buffer, expressed as:

P(i) = |δ_i| / Σ_k |δ_k|
Policy training process. After sufficient training experience has been collected in the experience buffer, a training batch of size N, denoted (S_t, A_t, R_t, S_t+1), is first sampled from the buffer. The agent updates the parameters of the execution network π_η and the evaluation network Q_φ with the training samples; the gradient of the execution network and the error of the evaluation network are computed as in Step 3, with

Q_eva = Q_φ(S_t, A_t)

Q_tar = R_t + γ·Q'_φ'(S_t+1, π'_η'(S_t+1))

where Q'_φ' and π'_η' represent the target evaluation network and the target execution network, respectively, whose parameters are updated during training according to:

η' ← τη + (1−τ)η'
φ' ← τφ + (1−τ)φ'
After the strategy updates converge, the algorithm uses the trained execution network π_η of the power system to interact with the environment and output the optimal state recovery strategy. The method can effectively improve the ability of the power system to cope with false data injection attacks, enhance the security of the power system state estimation process, and ensure the efficient operation of the smart grid.
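Tying the steps together, an assumed end-to-end training loop over E_1 episodes might look as follows, reusing the illustrative GridRecoveryEnv, epsilon_greedy_action, PrioritizedReplayBuffer, Actor/Critic, and train_step sketches above. The episode count, horizon, batch size, and learning rates are all assumptions.

```python
import numpy as np
import torch

E1, T, N = 200, 100, 64                        # episodes, horizon, batch size (assumed)
env = GridRecoveryEnv(n_meas=3)
buf = PrioritizedReplayBuffer()
actor, critic = Actor(1, 3), Critic(1, 3)      # scalar state s_t, 3-dim action
actor_t, critic_t = Actor(1, 3), Critic(1, 3)
actor_t.load_state_dict(actor.state_dict())    # eta' = eta at initialization
critic_t.load_state_dict(critic.state_dict())  # phi' = phi at initialization
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)
eps = 1.0

for ep in range(E1):
    s = env.reset()
    for t in range(T):
        st = torch.tensor([[s]], dtype=torch.float32)
        greedy = lambda _: actor(st).detach().numpy()[0]
        a = epsilon_greedy_action(greedy, st, eps, action_dim=3)
        s1, r = env.step(a)
        buf.add(s, a, r, s1)
        if len(buf.data) >= N:
            idx, batch = buf.sample(N)
            S  = torch.tensor([[b[0]] for b in batch], dtype=torch.float32)
            A  = torch.as_tensor(np.stack([b[1] for b in batch]), dtype=torch.float32)
            R  = torch.tensor([[b[2]] for b in batch], dtype=torch.float32)
            S1 = torch.tensor([[b[3]] for b in batch], dtype=torch.float32)
            delta = train_step(actor, critic, actor_t, critic_t,
                               (S, A, R, S1), opt_a, opt_c)
            buf.update_priorities(idx, delta.squeeze(-1).tolist())
        s = s1
        eps = max(0.05, eps * 0.995)           # epsilon decay with the 0.05 floor
```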
Example two
In order to achieve the above technical object, the present invention provides a smart grid state recovery system based on deep reinforcement learning, comprising:
The construction module is used for constructing an attack model and a power grid state estimation system, injecting the attack model into the power grid state estimation system, and constructing a Markov decision process model based on the process of injecting the attack model into the power grid state estimation system;
The optimization module is used for performing strategy optimization on the power system through a deep reinforcement learning method based on the Markov decision process model to obtain a recovery strategy. The system corresponds to the method described above and is not described again here.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (8)

1. A smart grid state recovery method based on deep reinforcement learning, characterized by comprising the following steps:
constructing an attack model and a power grid state estimation system, injecting the attack model into the power grid state estimation system, and constructing a Markov decision process model based on the process of injecting the attack model into the power grid state estimation system;
based on a Markov decision process model, carrying out strategy optimization on the power system by a deep reinforcement learning method to obtain a recovery strategy;
The Markov decision process model comprises: a time-step state, a time-step action, a state transition equation, and an instantaneous reward; the time-step state is computed from a state estimate and a measurement vector of the power grid state estimation system, the time-step action is computed from the measurement vector of the power grid state estimation system, the state transition equation is computed from the time-step state, and the instantaneous reward is computed from the time-step state and the time-step action;
By injecting a carefully crafted attack vector y into the measurement vector z of the power system, the abnormal information detection mechanism of the smart grid is bypassed, expressed as follows:

z_bad = z + y

where z_bad represents the actual measurement value of the power system after being attacked;
In order to cope with false data injection attacks against the power grid state estimation process, the state recovery process after the power grid is attacked comprises the following four modules:
Module 1: s_t ∈ S represents the state at time t in the Markov decision process model, and is computed from the state estimate and the measurement vector of the power system, specifically as follows:

s_t = ω·‖z_t − x_t‖

where x_t represents the power system state estimate, z_t represents the measurement vector, and ω is an adjustment parameter for the state; as can be seen from the above equation, when the system operates normally, the state estimate x_t is close to the measurement vector z_t, so the state value s_t becomes small; when the system is attacked, there is a significant difference between the state estimate x_t and the measurement vector z_t, and the observed state value s_t becomes large; the power system judges from the state value s_t whether the system has suffered a false data injection attack;
Module 2: a_t ∈ A represents the action of the power system at time t, namely the state recovery strategy to be executed by the power system according to state s_t; a continuous action space is adopted to represent the grid state recovery strategy, i.e., a_t ∈ [-1, 1] represents the correction applied by the power system to the measured value z_t, and the measured value of the power system corrected by the state recovery strategy is represented as:

z'_t = z_t + a_t;
Module 3: P represents the state transition equation of the power system, which is a mapping from the state and action spaces to the state space, expressed as:

P: s_t × a_t → s_t+1;
Module 4: r_t ∈ R is the instantaneous reward obtained by the power system at time t; the instantaneous reward received by the power system after transitioning from state s_t to s_t+1 by taking an action represents the effect of state restoration.
2. The method according to claim 1, wherein:
the process of policy optimization for the power system includes:
Based on a Markov decision process model, constructing an interaction process of the power system and an external environment, and acquiring an interaction state, an interaction action, an interaction reward and a state at the next moment based on the interaction process;
And constructing a deep reinforcement learning model, wherein the deep reinforcement learning model follows an execution-evaluation (actor-critic) framework; constructing a training set based on the interaction state, the interaction action, the interaction reward and the next-time state; training the deep reinforcement learning model with the training set; and performing strategy optimization on the power system with the trained deep reinforcement learning model to obtain a recovery strategy.
3. The method according to claim 2, characterized in that:
The process of constructing the training set comprises the following steps:
Sampling the interaction state, the interaction action and the interaction reward by an experience replay method, and normalizing the sampling result to obtain a training set; wherein the sampling probability in the experience replay method is determined by the temporal-difference error.
4. The method according to claim 2, characterized in that:
the training process of the deep reinforcement learning model comprises the following steps:
Calculating the gradient of the execution network and the error of the evaluation network through the training set, wherein the deep reinforcement learning model comprises the execution network and the evaluation network;
And updating parameters of the execution network and the evaluation network based on the calculation results, and updating the target execution network and the target evaluation network with the updated parameters, to obtain a trained deep reinforcement learning model.
5. A smart grid state recovery system based on deep reinforcement learning, comprising:
The construction module is used for constructing an attack model and a power grid state estimation system, injecting the attack model into the power grid state estimation system, and constructing a Markov decision process model based on the process of injecting the attack model into the power grid state estimation system;
The optimization module is used for performing strategy optimization on the power system through a deep reinforcement learning method based on the Markov decision process model to obtain a recovery strategy;
The Markov decision process model comprises: a time-step state, a time-step action, a state transition equation, and an instantaneous reward; the time-step state is computed from a state estimate and a measurement vector of the power grid state estimation system, the time-step action is computed from the measurement vector of the power grid state estimation system, the state transition equation is computed from the time-step state, and the instantaneous reward is computed from the time-step state and the time-step action;
By injecting a carefully crafted attack vector y into the measurement vector z of the power system, the abnormal information detection mechanism of the smart grid is bypassed, expressed as follows:

z_bad = z + y

where z_bad represents the actual measurement value of the power system after being attacked;
In order to cope with false data injection attacks against the power grid state estimation process, the state recovery process after the power grid is attacked comprises the following four modules:
Module 1: s_t ∈ S represents the state at time t in the Markov decision process model, and is computed from the state estimate and the measurement vector of the power system, specifically as follows:

s_t = ω·‖z_t − x_t‖

where x_t represents the power system state estimate, z_t represents the measurement vector, and ω is an adjustment parameter for the state; as can be seen from the above equation, when the system operates normally, the state estimate x_t is close to the measurement vector z_t, so the state value s_t becomes small; when the system is attacked, there is a significant difference between the state estimate x_t and the measurement vector z_t, and the observed state value s_t becomes large; the power system judges from the state value s_t whether the system has suffered a false data injection attack;
Module 2: a_t ∈ A represents the action of the power system at time t, namely the state recovery strategy to be executed by the power system according to state s_t; a continuous action space is adopted to represent the grid state recovery strategy, i.e., a_t ∈ [-1, 1] represents the correction applied by the power system to the measured value z_t, and the measured value of the power system corrected by the state recovery strategy is represented as:

z'_t = z_t + a_t;
Module 3: P represents the state transition equation of the power system, which is a mapping from the state and action spaces to the state space, expressed as:

P: s_t × a_t → s_t+1;
Module 4: r_t ∈ R is the instantaneous reward obtained by the power system at time t; the instantaneous reward received by the power system after transitioning from state s_t to s_t+1 by taking an action represents the effect of state restoration.
6. The system according to claim 5, wherein:
The optimization module comprises a first optimization module, wherein the first optimization module constructs an interaction process between the power system and the external environment based on the Markov decision process model, and acquires an interaction state, an interaction action, an interaction reward and a next-time state based on the interaction process; and constructs a deep reinforcement learning model following an execution-evaluation (actor-critic) framework, constructs a training set based on the interaction state, the interaction action, the interaction reward and the next-time state, trains the deep reinforcement learning model with the training set, and performs strategy optimization on the power system with the trained deep reinforcement learning model to obtain a recovery strategy.
7. The system according to claim 6, wherein:
The optimization module comprises a second optimization module, and the second optimization module is used for sampling the interaction state, the interaction action and the interaction reward by an experience replay method and normalizing the sampling result to obtain a training set; wherein the sampling probability in the experience replay method is determined by the temporal-difference error.
8. The system according to claim 6, wherein:
the optimization module comprises a third optimization module, wherein the third optimization module is used for calculating, from the training set, the gradient of the execution network and the error of the evaluation network, and the deep reinforcement learning model comprises the execution network and the evaluation network; and for updating parameters of the execution network and the evaluation network based on the calculation results, and updating the target execution network and the target evaluation network with the updated parameters, to obtain a trained deep reinforcement learning model.
CN202210709649.5A 2022-06-22 2022-06-22 Smart grid state recovery method and system based on deep reinforcement learning Active CN115118477B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210709649.5A CN115118477B (en) 2022-06-22 2022-06-22 Smart grid state recovery method and system based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN115118477A CN115118477A (en) 2022-09-27
CN115118477B (en) 2024-05-24

Family

ID=83329484

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210709649.5A Active CN115118477B (en) 2022-06-22 2022-06-22 Smart grid state recovery method and system based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN115118477B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118485282A (en) * 2024-07-15 2024-08-13 华北电力大学 Electric automobile charging scheduling method and system based on robust reinforcement learning

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101382978A (en) * 2008-10-30 2009-03-11 中国人民解放军国防科学技术大学 Method for early alarming by-path attack in safety chip
US10193902B1 (en) * 2015-11-02 2019-01-29 Deep Instinct Ltd. Methods and systems for malware detection
CN110334507A (en) * 2019-06-18 2019-10-15 北京中科物联安全科技有限公司 A kind of method, apparatus and electronic equipment detecting network system safety
CN111376954A (en) * 2020-06-01 2020-07-07 北京全路通信信号研究设计院集团有限公司 Train autonomous scheduling method and system
CN112202736A (en) * 2020-09-15 2021-01-08 浙江大学 Industrial control system communication network abnormity classification method based on statistical learning and deep learning
CN112381359A (en) * 2020-10-27 2021-02-19 惠州蓄能发电有限公司 Multi-critic reinforcement learning power economy scheduling method based on data mining
CN112491818A (en) * 2020-11-12 2021-03-12 南京邮电大学 Power grid transmission line defense method based on multi-agent deep reinforcement learning
CN112800420A (en) * 2020-12-30 2021-05-14 南京理工大学 False data injection attack strategy evaluation method for alternating current-direct current hybrid system
CN113361132A (en) * 2021-06-28 2021-09-07 浩鲸云计算科技股份有限公司 Air-cooled data center energy-saving method based on deep Q learning block network
CN113706197A (en) * 2021-08-26 2021-11-26 西安交通大学 Multi-microgrid electric energy transaction pricing strategy and system based on reinforcement and simulation learning
CN113992350A (en) * 2021-09-24 2022-01-28 杭州意能电力技术有限公司 Smart grid false data injection attack detection system based on deep learning
CN114036506A (en) * 2021-11-05 2022-02-11 东南大学 Method for detecting and defending false data injection attack based on LM-BP neural network
CN114048989A (en) * 2021-11-05 2022-02-15 浙江工业大学 Deep reinforcement learning-based power system sequence recovery method and device
CN114089627A (en) * 2021-10-08 2022-02-25 北京师范大学 Non-complete information game strategy optimization method based on double-depth Q network learning
US11275646B1 (en) * 2019-03-11 2022-03-15 Marvell Asia Pte, Ltd. Solid-state drive error recovery based on machine learning
CN114243799A (en) * 2022-01-05 2022-03-25 国网浙江省电力有限公司宁波供电公司 Deep reinforcement learning power distribution network fault recovery method based on distributed power supply

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108429259B (en) * 2018-03-29 2019-10-18 山东大学 A kind of online dynamic decision method and system of unit recovery


Also Published As

Publication number Publication date
CN115118477A (en) 2022-09-27


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant