CN114048989B - Power system sequence recovery method and device based on deep reinforcement learning - Google Patents

Power system sequence recovery method and device based on deep reinforcement learning

Info

Publication number
CN114048989B
CN114048989B
Authority
CN
China
Prior art keywords
power system
network
recovery
reinforcement learning
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111305997.8A
Other languages
Chinese (zh)
Other versions
CN114048989A (en)
Inventor
高宇馨
黄伟
张添益
程威
黄泽真
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202111305997.8A priority Critical patent/CN114048989B/en
Publication of CN114048989A publication Critical patent/CN114048989A/en
Application granted granted Critical
Publication of CN114048989B publication Critical patent/CN114048989B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06316Sequencing of tasks or work
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Health & Medical Sciences (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Educational Administration (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Primary Health Care (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention discloses a power system sequential recovery method and device based on deep reinforcement learning. Starting from the power network left after a cascading failure, the recovery capability of the power network during the restoration process is evaluated through the bus recovery sequence obtained by deep reinforcement learning. Reinforcement learning is combined with the power network so that the recovery problem is considered from the defender's perspective, and combining reinforcement learning with a neural network extends the approach to large networks, so that an optimal recovery strategy for a large-scale power grid can be found.

Description

Power system sequence recovery method and device based on deep reinforcement learning
Technical Field
The application belongs to the technical field of power system cascade failure recovery, and particularly relates to a power system sequence recovery method and device based on deep reinforcement learning.
Background
The power grid is an important piece of infrastructure for modern society. Large-scale interconnection of power grids has become an inevitable trend in the development of power systems worldwide, and safe grid operation is an effective guarantee of efficient social and economic activity. However, cascading failures and blackout incidents challenge the safe operation of the grid. In complex grids, the evolution from an initial local fault to an avalanche-like cascading fault often leads to the catastrophic consequence of a large-area grid breakdown. Because the fault process is random and unpredictable, recovery from cascading faults is a foundation of, and key to, building complex power networks.
Most existing research takes the attacker's point of view and says little about the defender. Considering the problem of recovering from cascading failures from the defender's point of view is therefore more practical for the highly developed modern power network.
Disclosure of Invention
The application aims to provide a deep reinforcement learning-based power system sequence recovery method and device for smoothly recovering from cascading failures.
In order to achieve the above purpose, the technical scheme of the application is as follows:
a power system sequence recovery method based on deep reinforcement learning comprises the following steps:
constructing a power system recovery model comprising a deep reinforcement learning Q value estimation network and a Target Q network, and initializing the Q value estimation network, the Target Q network and an experience playback pool;
acquiring a power system data set for training, randomly selecting and deleting a preset number of buses from the data set to obtain the initial bus states, randomly selecting one bus state as the current state and inputting it into the Q value estimation network, selecting an action according to an ε-greedy strategy, executing the action to generate the corresponding reward and the state at the next moment, and putting the current bus state, the action, the reward and the next-moment state into the experience playback pool as a training sample;
sampling training samples from the experience playback pool according to the sample selection interval, training the Q value estimation network with the acquired training samples, and updating the network parameters of the Target Q network with the network parameters of the Q value estimation network, until the preset number of cycles is reached;
inputting the bus state of the power system after the cascading failure into the trained power system recovery model, acquiring recovery actions, and recovering the power system after the cascading failure.
Further, training the Q value estimation network with the acquired training samples uses the following loss function:
L(θ) = [r_j + γ·max_{a'} Q(s_{j+1}, a'; θ') − Q(s_j, a_j; θ)]²
where γ is the attenuation factor, max_{a'} Q(s_{j+1}, a'; θ') is the cumulative reward after the Target Q network performs the optimal action when state s_{j+1} is input, Q(s_j, a_j; θ) is the cumulative reward after the Q value estimation network performs action a_j when state s_j is input, a_j is the action selected for execution at time j, and r_j is the immediate reward generated after the action is performed at time j. a' represents one of all possible actions, and the optimal action is the action executed when Q(s_{j+1}, a'; θ') is at its maximum.
Further, after inputting the bus state of the power system after the cascading failure into the trained power system recovery model, obtaining a recovery action, and recovering the power system after the cascading failure, the method further includes:
performing island detection, and deleting islands and their transmission lines from the power system.
Further, after inputting the bus state of the power system after the cascading failure into the trained power system recovery model, obtaining a recovery action, and recovering the power system after the cascading failure, the method further includes:
performing power rescheduling to achieve load balancing.
Further, after inputting the bus state of the power system after the cascading failure into the trained power system recovery model, obtaining a recovery action, and recovering the power system after the cascading failure, the method further includes:
recalculating the power flow of each transmission line based on the DC power flow model;
monitoring each transmission line, defining any line whose power flow exceeds its capacity as an overloaded line, and, if any line is overloaded, selecting the line with the largest overload and tripping it.
The application also provides a power system sequence recovery device based on the deep reinforcement learning, which comprises a processor and a memory storing a plurality of computer instructions, wherein the computer instructions realize the steps of the power system sequence recovery method based on the deep reinforcement learning when being executed by the processor.
According to the deep reinforcement learning-based power system sequence recovery method and device, starting from the power network left after a cascading failure, the recovery capability of the power network with respect to cascading failures during the system recovery process is evaluated through the bus recovery sequence obtained by deep reinforcement learning. Reinforcement learning is combined with the power network so that the recovery problem is considered from the defender's perspective, and combining reinforcement learning with a neural network extends the approach to larger networks, so that an optimal recovery strategy for a large-scale power grid can be found.
Drawings
FIG. 1 is a flow chart of a method for sequentially recovering an electric power system based on deep reinforcement learning;
FIG. 2 is a schematic diagram of the operation of the recovery model of the power system of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The application approximates the power flow on each network component with a DC power flow model in order to evaluate the recovery capability of the power system during cascading failures. In a power network, overload of a transmission line causes that line to be cut off, and imbalance between generation and demand causes node tripping; both lead to node failure, and a cascading failure mechanism based on the DC power flow model is constructed from them. A sequential topology restoration process is considered against the background of a power system cascading failure: among the several steps of sequential restoration, the next restoration step is taken only after the cascading process triggered by the previous action has settled. In sequential topology restoration, the budget is limited by the number of components repaired, assuming the cost of restoring each bus is equal.
Taking cascading failure of a large-scale power network as its background, the application evaluates the recovery capability of a smart grid. It aims to restore the whole power network, considers the network architecture and the power balance constraint, and finds the optimal node recovery sequence by using the topology information of the power network under a Deep Q-Learning framework.
As shown in fig. 1, there is provided a deep reinforcement learning-based power system sequential recovery method, including:
Step S1, constructing a power system recovery model comprising a deep reinforcement learning Q value estimation network and a Target Q network, and initializing the Q value estimation network, the Target Q network and the experience playback pool.
As shown in fig. 2, the power system recovery model of the present application includes a Q value estimation network and a Target Q network. The deep reinforcement learning Q value estimation network comprises 1 input layer, 3 convolutional layers, 2 fully connected layers and 1 output layer, with network parameters θ. The deep reinforcement learning Target Q network has exactly the same structure as the Q value estimation network, with network parameters θ', where θ' = θ initially. Q-learning is a relatively mature technique in deep learning: after a state s_t is input and an action a_t is executed, a new state s_{t+1} and a reward r_t(s_t, a_t), abbreviated r_t, are returned, and the optimal action can be found through continued learning.
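For concreteness, the following is a minimal sketch of such a pair of networks in PyTorch. The patent does not specify channel counts, kernel sizes or activation functions, so those (and the choice of 1-D convolutions over the bus-state vector) are assumptions made only for illustration.

```python
import torch
import torch.nn as nn

N_BUS = 2383  # number of buses in the IEEE 2383-bus test case

class QNetwork(nn.Module):
    def __init__(self, n_bus: int = N_BUS):
        super().__init__()
        # treat the bus-state vector as a 1-channel 1-D signal (assumed encoding)
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
        )
        with torch.no_grad():  # infer the flattened feature size
            n_feat = self.conv(torch.zeros(1, 1, n_bus)).numel()
        self.fc = nn.Sequential(
            nn.Linear(n_feat, 512), nn.ReLU(),
            nn.Linear(512, n_bus),          # one Q value per bus-recovery action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # state: (batch, n_bus) vector of 0/1 bus states
        x = self.conv(state.unsqueeze(1))
        return self.fc(x.flatten(1))

q_net = QNetwork()
target_net = QNetwork()
target_net.load_state_dict(q_net.state_dict())  # θ' = θ at initialisation
```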
The application also initializes an experience playback pool D, which can hold M data entries, and initializes the training parameters: greedy strategy factor ε, learning rate α, attenuation factor γ, sample selection interval T, number of samples N, maximum number of iterations K, and neural network weight assignment interval C.
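A minimal sketch of the experience playback pool D and the training parameters listed above might look as follows; the numeric values are placeholders for illustration, not values taken from the patent.

```python
import random
from collections import deque

M = 10000        # capacity of the experience playback pool D
EPSILON = 0.1    # greedy strategy factor ε
ALPHA = 1e-3     # learning rate α
GAMMA = 0.95     # attenuation factor γ
T_SAMPLE = 4     # sample selection interval T
N_SAMPLE = 32    # number of samples N drawn per training step
K_MAX = 500      # maximum number of iterations K
C_UPDATE = 100   # neural network weight assignment interval C

replay_pool = deque(maxlen=M)   # oldest entries are popped automatically

def store(transition):
    """transition = (s_t, a_t, r_t, s_next)"""
    replay_pool.append(transition)

def sample(n=N_SAMPLE):
    """Draw n transitions uniformly at random from the pool."""
    return random.sample(list(replay_pool), min(n, len(replay_pool)))
```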
Variables, actions, and targets in the power system network are defined as states, actions, and rewards in the deep reinforcement learning network:
s_t denotes the set of bus states, s_t = {S_t(1), S_t(2), …, S_t(N)}, where S_t(b) denotes one state in the set and b ∈ {1, …, N};
S_t(b) ∈ {0, 1}; that is, the state of a bus is 1 when it is in online service and 0 when it is offline;
a_t denotes the set of actions at time t, a_t = {a_1, a_2, …, a_j}, where a_i denotes switching the state of bus i from 0 to 1 and adding the bus back to the system by re-establishing its connection. r_t(s_t, a_t) represents the immediate reward for performing action a_t in state s_t, defined as the relative number of branches in online service remaining in the system after the recovery action is performed.
A bus is a collection of bus bars; when a transmission line or bus bar is physically damaged, its state changes from 1 to 0. When a bus is disconnected or restored, there is a probability that other lines will collapse due to overload; when this happens and the collapsed lines reach a certain proportion of all lines, the event is defined as a cascading failure.
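The state, action and reward defined above can be encoded roughly as follows. The cascade simulation invoked after each recovery action (described later in steps S4.2 to S4.5) is passed in as an assumed helper function rather than implemented here.

```python
import numpy as np

def step(bus_state, branch_online, action, n_branch_total, cascade_after_recovery):
    """Restore bus `action`, let the triggered cascade settle, return (s', r)."""
    next_state = np.array(bus_state).copy()
    next_state[action] = 1                       # switch the bus state from 0 to 1
    # assumed helper: propagates the DC-power-flow cascade described in S4.2-S4.5
    next_state, branch_online = cascade_after_recovery(next_state, branch_online)
    # reward: relative number of branches still in online service
    reward = branch_online.sum() / n_branch_total
    return next_state, reward
```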
Step S2, acquiring a power system data set for training, randomly selecting and deleting a preset number of buses from the data set to obtain the initial bus states, randomly selecting one bus state as the current state and inputting it into the Q value estimation network, selecting an action according to the ε-greedy strategy, executing the action to generate the corresponding reward and the state at the next moment, and putting the current bus state, the action, the reward and the next-moment state into the experience playback pool as a training sample.
Specifically, a training data set is prepared first and preprocessed. The application selects the IEEE 2383-bus case from the MATPOWER dataset as the data sample; it comprises 2383 buses, 2896 branches and 327 generators, and stores the topology information of the power system in matrix form, including a 2383x22 double-type bus matrix, a 2896x23 double-type branch matrix, a 1826x12 double-type branch matrix and a 2383x2383 double-type sparse bus adjacency matrix. 10% of the buses in the data sample are randomly selected and deleted to obtain the nodes remaining after a cascading failure, and these remaining nodes are taken as the initial state.
The application randomly selects a bus state s_t and inputs it into the Q value estimation network to obtain the cumulative reward Q = Σ_{t=1}^{n} γ^{t-1}·r_t(s_t, a_t), which is used to compute the cumulative reward over n actions. Here r_t(s_t, a_t) represents the immediate reward for performing action a_t in state s_t, i.e., the relative number of online-service branches remaining in the system after the recovery action is performed. γ is an adjustable constant: if γ = 1, the reward of every action is weighted equally in the cumulative reward; if γ = 0, only the reward of the first action is considered. To allow the cumulative reward Q to converge during reinforcement learning, γ is set to a constant slightly less than 1; the factor γ^{t-1} means that, when computing the cumulative reward, the contribution of later actions decreases over time. An action a_t is selected according to the ε-greedy strategy, a reward r_t is generated, the system transitions to the next state s_{t+1}, and the tuple (s_t, a_t, r_t(s_t, a_t), s_{t+1}) is put into the experience playback pool D as a training sample.
Training samples are continuously generated and added to the experience playback pool; when the capacity of the pool is exceeded, the oldest data is popped out and the new data is added.
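Building on the sketches above, ε-greedy action selection over the offline buses and storage of the resulting transition could look like this; q_net and EPSILON are reused from the earlier sketches, and restricting candidate actions to offline buses is an assumption for illustration.

```python
import random
import torch

def select_action(q_net, state, epsilon=EPSILON):
    """ε-greedy choice of the next bus to restore (only offline buses are valid)."""
    offline = [i for i, s in enumerate(state) if s == 0]
    if random.random() < epsilon:
        return random.choice(offline)                       # explore
    with torch.no_grad():
        q_values = q_net(torch.tensor([list(state)], dtype=torch.float32))[0]
    return max(offline, key=lambda i: q_values[i].item())   # exploit
```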
Step S3, sampling training samples from the experience playback pool according to the sample selection interval, training the Q value estimation network with the acquired training samples, and updating the network parameters of the Target Q network with the network parameters of the Q value estimation network, until the preset number of cycles is reached.
Specifically, sample data is drawn uniformly at random from the experience playback pool D, for example N training samples at a time, and input into the Q value estimation network for training. For a training sample (s_j, a_j, r_j(s_j, a_j), s_{j+1}), after it is input into the Q value estimation network, the loss function is calculated as
L(θ) = [r_j + γ·max_{a'} Q(s_{j+1}, a'; θ') − Q(s_j, a_j; θ)]²
where γ is the attenuation factor, max_{a'} Q(s_{j+1}, a'; θ') is the cumulative reward after the Target Q network performs the optimal action when state s_{j+1} is input, Q(s_j, a_j; θ) is the cumulative reward after the Q value estimation network performs action a_j when state s_j is input, a_j is the action selected for execution at time j, and r_j is the immediate reward generated after the action is performed at time j. a' represents one of all possible actions, and the optimal action is the action executed when Q(s_{j+1}, a'; θ') is at its maximum.
To minimize the loss function, gradient descent is performed and the network parameters θ of the Q value estimation network are updated: Δθ = α·[r_j + γ·max_{a'} Q(s_{j+1}, a'; θ') − Q(s_j, a_j; θ)]·∇_θ Q(s_j, a_j; θ), and the parameters of the Q function approximation are updated accordingly as θ = θ + Δθ.
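One training step combining the loss function and the gradient-descent update above might be sketched as follows; q_net, target_net, GAMMA and ALPHA come from the earlier sketches, and plain SGD stands in for whichever optimizer is actually used.

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.SGD(q_net.parameters(), lr=ALPHA)

def train_step(batch):
    """One gradient-descent update of θ on a batch of (s_j, a_j, r_j, s_{j+1})."""
    s, a, r, s_next = zip(*batch)
    s = torch.tensor(s, dtype=torch.float32)
    a = torch.tensor(a, dtype=torch.long)
    r = torch.tensor(r, dtype=torch.float32)
    s_next = torch.tensor(s_next, dtype=torch.float32)

    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)     # Q(s_j, a_j; θ)
    with torch.no_grad():                                    # r_j + γ·max_a' Q(s_{j+1}, a'; θ')
        target = r + GAMMA * target_net(s_next).max(dim=1).values
    loss = F.mse_loss(q_sa, target)                          # squared TD error

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                         # θ ← θ + Δθ
    return loss.item()
```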
The application acquires training samples from the experience playback pool D for training once every sample selection interval T. During the training process, the network parameters of the Q value estimation network are continuously updated.
The network parameters of the Target Q network are updated every C steps, i.e., θ' = θ is assigned after every C consecutive training steps.
Training is performed in this way until the end of K cycles is reached.
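Tying the pieces together, an outer training loop with the sample selection interval T, the weight assignment interval C and the K cycles could be sketched as below; random_initial_state and env_step are assumed helpers for the data preparation and cascade simulation described elsewhere in this application.

```python
step_count = 0
for episode in range(K_MAX):
    state = random_initial_state()        # assumed helper: data sample with 10% of buses deleted
    while 0 in state:                     # some buses still offline
        action = select_action(q_net, state)
        next_state, reward = env_step(state, action)   # assumed cascade-aware environment step
        store((state, action, reward, next_state))
        state = next_state
        step_count += 1
        if step_count % T_SAMPLE == 0 and len(replay_pool) >= N_SAMPLE:
            train_step(sample())                             # train every T steps
        if step_count % C_UPDATE == 0:
            target_net.load_state_dict(q_net.state_dict())   # θ' = θ every C steps
```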
Step S4, inputting the bus state of the power system after the cascading failure into the trained power system recovery model, acquiring recovery actions, and recovering the power system after the cascading failure.
After the Q value estimation network has been trained and the network parameters of the Target Q network updated, the trained power system recovery model is used: the bus state of the power system after the cascading failure is input into the trained model, the model outputs the optimal recovery action, and the recovery action is executed to restore the power system after the cascading failure.
When the network model finds the optimal strategy, that is, the optimal bus recovery sequence, restoring a bus changes the topology of the power network and alters the load flow, which may cause problems such as line overload and grid islanding (transmission lines connected to a bus cannot run normally until the bus is restored), so the recovery capability of the power network with respect to cascading faults during the system recovery process needs to be evaluated.
The application relates to a deep reinforcement learning-based power system sequence recovery method, which further comprises the following steps:
Step S4.1, initializing the power system: initializing the initial load of the power system and setting the upper and lower power output limits of each generating line according to the actual, maximum and minimum power output of the generating node, where P_i^max, P_i^min and P_i^0 denote the maximum, minimum and original actual power output of generating bus i, respectively, and α is the generator power ramp parameter.
Step S4.2, performing island detection and deleting islands and their transmission lines from the power system.
During the cascading failure process, the power system may exhibit an islanding effect, that is, when part of the power network loses power, a generator acts as an isolated power source supplying its local load; in this case, the island and its transmission lines are deleted from the system.
Step S4.3, performing power rescheduling to achieve load balancing.
First, within the limits of its minimum and maximum power, each generator is allowed to increase or decrease its output so that supply matches demand as closely as possible. These limits are given by the unit's upper and lower generation limits, or by the unit's ramp rate multiplied by the time elapsed since the last power flow calculation. Thus, when the time interval between power flow calculations is longer, a generator is allowed to increase or decrease its output over a larger range of values. If, after rescheduling the generators, the remaining generation still exceeds the load, generators are tripped in turn, starting from the smallest unit, until the load is balanced.
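A rough sketch of this rescheduling step is given below; the equal-share adjustment rule and the dictionary representation of generators are assumptions made only for illustration.

```python
def reschedule(generators, total_load):
    """Ramp generators towards the load within their limits, then trip the
    smallest units if generation still exceeds the load."""
    total_gen = sum(g["p"] for g in generators)
    gap = (total_load - total_gen) / max(len(generators), 1)   # equal-share adjustment (assumed rule)
    for g in generators:
        g["p"] = min(g["p_max"], max(g["p_min"], g["p"] + gap))
    total_gen = sum(g["p"] for g in generators)
    # if the remaining generation is still more than the load, trip units
    # one by one starting from the smallest, until the load is balanced
    for g in sorted(generators, key=lambda g: g["p"]):
        if total_gen <= total_load:
            break
        total_gen -= g["p"]
        g["p"] = 0.0
    return generators
```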
Step S4.4, recalculating the power flow of each transmission line based on the DC power flow model.
Step S4.5, monitoring each transmission line, defining any line whose power flow exceeds its capacity as an overloaded line, and, if any line is overloaded, selecting the line with the largest overload and tripping it.
Each transmission line is monitored, and a line whose power flow exceeds its capacity is defined as an overloaded line; that is, with F_l denoting the power flow of line l and C_l its capacity, any line with F_l − C_l > 0 is overloaded. In each iteration, if any line is overloaded, the line with the largest overload is selected and tripped, and the process returns to the initial step; otherwise, the cascading failure process stops.
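The overload-monitoring loop can be sketched as follows; dc_power_flow, network.line_capacity and network.trip_line stand for an assumed DC power flow solver and network interface (for example one built on the MATPOWER case data) rather than anything specified in this application.

```python
def cascade_check(network, dc_power_flow):
    """Trip the most overloaded line repeatedly until no line flow exceeds its capacity."""
    while True:
        flows = dc_power_flow(network)                     # F_l for every line l
        overloads = {l: flows[l] - c for l, c in network.line_capacity.items()
                     if flows[l] - c > 0}                  # overloaded lines: F_l - C_l > 0
        if not overloads:
            break                                          # cascading failure process stops
        worst = max(overloads, key=overloads.get)          # line with the largest overload
        network.trip_line(worst)                           # trip it and re-run the power flow
    return network
```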
In another embodiment, the application also provides a power system sequence restoration device based on deep reinforcement learning, which comprises a processor and a memory storing a plurality of computer instructions, wherein the computer instructions realize the steps of the power system sequence restoration method based on the deep reinforcement learning when being executed by the processor.
For specific limitations on the deep reinforcement learning-based power system sequence recovery device, reference may be made to the above limitations on the deep reinforcement learning-based power system sequence recovery method, which are not repeated here. The deep reinforcement learning-based power system sequence recovery device can be implemented fully or partially by software, hardware, or a combination thereof. It may be embedded in hardware in, or be independent of, the processor of the computer device, or it may be stored as software in the memory of the computer device so that the processor can invoke and execute the corresponding operations.
The memory and the processor are electrically connected to each other, directly or indirectly, for data transmission or interaction. For example, the components may be electrically connected to each other through one or more communication buses or signal lines. The memory stores a computer program that can run on the processor, and the processor implements the deep reinforcement learning-based power system sequence recovery method in the embodiment of the present invention by executing the computer program stored in the memory.
The memory may be, but is not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), etc. The memory is used for storing a program, and the processor executes the program after receiving an execution instruction.
The processor may be an integrated circuit chip having data processing capabilities. The processor may be a general-purpose processor including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), and the like. The methods, steps and logic blocks disclosed in the embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (5)

1. A power system sequence recovery method based on deep reinforcement learning, characterized by comprising the following steps:
constructing a power system recovery model comprising a deep reinforcement learning Q value estimation network and a Target Q network, and initializing the Q value estimation network, the Target Q network and an experience playback pool;
acquiring a power system data set for training, randomly selecting and deleting a preset number of buses from the data set to obtain the initial bus states, randomly selecting one bus state as the current state and inputting it into the Q value estimation network, selecting an action according to an ε-greedy strategy, executing the action to generate the corresponding reward and the state at the next moment, and putting the current bus state, the action, the reward and the next-moment state into the experience playback pool as a training sample;
sampling training samples from the experience playback pool according to the sample selection interval, training the Q value estimation network with the acquired training samples, and updating the network parameters of the Target Q network with the network parameters of the Q value estimation network, until the preset number of cycles is reached;
inputting the bus state of the power system after the cascading failure into the trained power system recovery model, acquiring recovery actions, and recovering the power system after the cascading failure;
wherein, after inputting the bus state of the power system after the cascading failure into the trained power system recovery model, acquiring recovery actions, and recovering the power system after the cascading failure, the method further comprises:
recalculating the power flow of each transmission line based on the DC power flow model;
monitoring each transmission line, defining any line whose power flow exceeds its capacity as an overloaded line, and, if any line is overloaded, selecting the line with the largest overload and tripping it.
2. The deep reinforcement learning-based power system sequence recovery method according to claim 1, wherein training the Q value estimation network with the acquired training samples uses the following loss function:
L(θ) = [r_j + γ·max_{a'} Q(s_{j+1}, a'; θ') − Q(s_j, a_j; θ)]²
where γ is the attenuation factor, max_{a'} Q(s_{j+1}, a'; θ') is the cumulative reward after the Target Q network performs the optimal action when state s_{j+1} is input, Q(s_j, a_j; θ) is the cumulative reward after the Q value estimation network performs action a_j when state s_j is input, a_j is the action selected for execution at time j, r_j is the immediate reward generated after the action is performed at time j, a' represents one of all possible actions, and the optimal action is the action executed when Q(s_{j+1}, a'; θ') is at its maximum.
3. The deep reinforcement learning-based power system sequence recovery method according to claim 1, wherein, after inputting the bus state of the power system after the cascading failure into the trained power system recovery model, obtaining a recovery action, and recovering the power system after the cascading failure, the method further comprises:
performing island detection, and deleting islands and their transmission lines from the power system.
4. The deep reinforcement learning-based power system sequence recovery method according to claim 1, wherein, after inputting the bus state of the power system after the cascading failure into the trained power system recovery model, obtaining a recovery action, and recovering the power system after the cascading failure, the method further comprises:
performing power rescheduling to achieve load balancing.
5. A deep reinforcement learning based power system sequence restoration device comprising a processor and a memory storing a number of computer instructions, wherein the computer instructions when executed by the processor implement the steps of the method of any one of claims 1 to 4.
CN202111305997.8A 2021-11-05 2021-11-05 Power system sequence recovery method and device based on deep reinforcement learning Active CN114048989B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111305997.8A CN114048989B (en) 2021-11-05 2021-11-05 Power system sequence recovery method and device based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111305997.8A CN114048989B (en) 2021-11-05 2021-11-05 Power system sequence recovery method and device based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN114048989A (en) 2022-02-15
CN114048989B (en) 2024-04-30

Family

ID=80207309

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111305997.8A Active CN114048989B (en) 2021-11-05 2021-11-05 Power system sequence recovery method and device based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN114048989B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115118477B (en) * 2022-06-22 2024-05-24 四川数字经济产业发展研究院 Smart grid state recovery method and system based on deep reinforcement learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109193646A (en) * 2018-10-22 2019-01-11 西南交通大学 Distribution network failure recovery scheme objective evaluation method based on induced ordered weighted averaging operator
CN111709672A (en) * 2020-07-20 2020-09-25 国网黑龙江省电力有限公司 Virtual power plant economic dispatching method based on scene and deep reinforcement learning
CN112636357A (en) * 2020-12-10 2021-04-09 南京理工大学 Power grid vulnerability analysis method based on reinforcement learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107257167B (en) * 2014-05-27 2020-01-21 松下知识产权经营株式会社 Power transmission device and wireless power transmission system
US11900031B2 (en) * 2019-08-15 2024-02-13 State Grid Smart Research Institute Co., Ltd. Systems and methods of composite load modeling for electric power systems

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109193646A (en) * 2018-10-22 2019-01-11 西南交通大学 Distribution network failure recovery scheme objective evaluation method based on induced ordered weighted averaging operator
CN111709672A (en) * 2020-07-20 2020-09-25 国网黑龙江省电力有限公司 Virtual power plant economic dispatching method based on scene and deep reinforcement learning
CN112636357A (en) * 2020-12-10 2021-04-09 南京理工大学 Power grid vulnerability analysis method based on reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Reinforcement learning-based fault recovery method for electric power communication networks; Jia Huibin et al.; Electric Power (中国电力); 2020-06-30; Vol. 53, No. 06; pp. 33-40 *

Also Published As

Publication number Publication date
CN114048989A (en) 2022-02-15

Similar Documents

Publication Publication Date Title
CN112287504B (en) Offline/online integrated simulation system and method for power distribution network
Ni et al. A reinforcement learning approach for sequential decision-making process of attacks in smart grid
Levitin et al. Optimal mission aborting in multistate systems with storage
Mohammadi et al. Machine learning assisted stochastic unit commitment during hurricanes with predictable line outages
Li et al. Integrating reinforcement learning and optimal power dispatch to enhance power grid resilience
Silva et al. An effective algorithm for computing all‐terminal reliability bounds
CN114048989B (en) Power system sequence recovery method and device based on deep reinforcement learning
Wang et al. Data-driven prediction method for characteristics of voltage sag based on fuzzy time series
CN113239534A (en) Fault and service life prediction method and device of wind generating set
Lin et al. A new approach to power system fault diagnosis based on fuzzy temporal order Petri nets
Hassani et al. Real-time out-of-step prediction control to prevent emerging blackouts in power systems: A reinforcement learning approach
Gautam et al. A deep reinforcement learning-based approach to post-disaster routing of movable energy resources
Stanly Jayaprakash et al. Deep q-network with reinforcement learning for fault detection in cyber-physical systems
Bi et al. Efficient multiway graph partitioning method for fault section estimation in large-scale power networks
CN116991615A (en) Cloud primary system fault self-healing method and device based on online learning
Gautam et al. Reconfiguration of distribution networks for resilience enhancement: A deep reinforcement learning-based approach
CN116683431A (en) Rapid power distribution system restoring force assessment index and assessment method and system
Mohanta et al. Importance and uncertainty analysis in software reliability assessment of computer relay
CN115986729A (en) Solving method, device and equipment based on source network load storage data driving model
Ma et al. A reliability allocation method based on Bayesian networks and Analytic Hierarchy Process
Li et al. Power distribution network reconfiguration for bounded transient power loss
Shouman et al. Hybrid mean variance mapping optimization for dynamic economic dispatch with valve point effects
CN110717079B (en) Electricity price partitioning method and device, computer equipment and storage medium
CN108629417B (en) A kind of couple of DUCG carries out the high efficiency method of Layering memory reasoning
CN112070200B (en) Harmonic group optimization method and application thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant