CN113872213B - Autonomous optimization control method and device for power distribution network voltage - Google Patents

Autonomous optimization control method and device for power distribution network voltage

Info

Publication number
CN113872213B
Authority
CN
China
Prior art keywords
time
network
voltage
agent
ith
Prior art date
Legal status
Active
Application number
CN202111054034.5A
Other languages
Chinese (zh)
Other versions
CN113872213A (en
Inventor
周俊
赵景涛
丁孝华
蔡月明
刘明祥
张强
何钊睿
王文轩
陈琛
周强
孙建东
封士永
陈亚楼
樊轶
刘遐龄
张世栋
宋祺鹏
张林利
Current Assignee
China Online Shanghai Energy Internet Research Institute Co ltd
Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
NARI Nanjing Control System Co Ltd
State Grid Electric Power Research Institute
Original Assignee
China Online Shanghai Energy Internet Research Institute Co ltd
Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
NARI Nanjing Control System Co Ltd
State Grid Electric Power Research Institute
Priority date
Filing date
Publication date
Application filed by China Online Shanghai Energy Internet Research Institute Co ltd, Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd, NARI Nanjing Control System Co Ltd, State Grid Electric Power Research Institute filed Critical China Online Shanghai Energy Internet Research Institute Co ltd
Priority to CN202111054034.5A
Publication of CN113872213A
Application granted
Publication of CN113872213B

Classifications

    • H02J3/12 — Circuit arrangements for AC mains or AC distribution networks for adjusting voltage in AC networks by changing a characteristic of the network load
    • H02J3/06 — Controlling transfer of power between connected networks; controlling sharing of load between connected networks
    • H02J3/18 — Arrangements for adjusting, eliminating or compensating reactive power in networks
    • H02J3/32 — Arrangements for balancing of the load in a network by storage of energy using batteries with converting means
    • H02J7/345 — Parallel operation in networks using both storage and other DC sources, e.g. providing buffering, using capacitors as storage or buffering devices
    • H02J2203/10 — Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
    • H02J2203/20 — Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
    • H02J2207/50 — Charging of capacitors, supercapacitors, ultra-capacitors or double layer capacitors
    • Y02E40/30 — Reactive power compensation
    • Y02E70/30 — Systems combining energy storage with energy generation of non-fossil origin

Landscapes

  • Engineering & Computer Science (AREA)
  • Power Engineering (AREA)
  • Supply And Distribution Of Alternating Current (AREA)
  • Control Of Electrical Variables (AREA)

Abstract

The invention discloses an autonomous optimization control method and device for power distribution network voltage. The method comprises: constructing a power distribution network double-time-scale optimal voltage control model based on multiple types of voltage regulating devices; converting the voltage control problem into a Markov decision process; constructing a deep Q network agent based on the discrete control variables in the Markov decision process and a deep deterministic policy gradient agent based on the continuous variables; constructing a training set and a testing set based on historical state data of the voltage regulating devices and training the agents; and inputting the state data of the voltage regulating devices obtained in real time into the trained deep Q network agent and deep deterministic policy gradient agent to obtain the output actions and autonomously control the voltage. The method establishes a double-time-scale autonomous voltage optimization control scheme, combines active and reactive power adjustment, and realizes the optimal control of multiple types of voltage regulating devices.

Description

Autonomous optimization control method and device for power distribution network voltage
Technical Field
The invention relates to the technical field of electrical engineering, in particular to an autonomous optimal control method and device for power distribution network voltage.
Background
The growing penetration of distributed photovoltaics confronts power distribution networks with voltage regulation problems. Typically, the voltage distribution in a power distribution network is regulated by controlling slow regulation devices, such as on-load tap changers (OLTC) and shunt capacitors, and fast regulation devices, such as photovoltaic inverters and static var compensators (SVC), but these devices only regulate the distribution of reactive power in the network. However, the active power flow also affects the node voltages in the distribution network. Thus, active and reactive power control of different devices should be considered to mitigate potential voltage violations.
The lack of a measurement system in the conventional power distribution network results in insufficient observability, so voltage control has generally adopted model-based methods, which depend heavily on an accurate physical model. Essentially, voltage control through active and reactive power optimization is a highly nonlinear programming problem with many variables and constraints. Solving such problems with classical analytical optimization methods (e.g., second-order cone relaxation and duality theory) is often limited by the number of variables and may even fail when the power distribution network is too complex. Heuristic algorithms such as particle swarm optimization and genetic algorithms have therefore been applied to such problems. However, these algorithms suffer from high randomness, long search times and a tendency to fall into local optima, and they cannot meet the real-time requirements of fast-time-scale voltage control. In addition, in both the classical analytical methods and the heuristic algorithms, successive optimization runs are independent of one another; if the actual operating conditions (such as the output of a distributed power supply) change slightly, the previous optimization result cannot be reused to obtain a fast solution.
In recent years, the continuous development of artificial intelligence, particularly deep reinforcement learning (DRL), and its success in different fields have attracted researchers to explore its application in power systems. As a branch of reinforcement learning, DRL uses a trial-and-error mechanism to interact with a dynamic environment and find the best policy for the agent, which gives it great advantages in solving complex multivariable problems. Meanwhile, the expanding coverage of supervisory control and data acquisition (SCADA) systems and phasor measurement units (PMU), together with the construction of Internet-of-Things infrastructure, provides an effective basis for DRL-based voltage control.
Q-learning has been studied for the reactive power optimization problem, but Q-learning suffers from the curse of dimensionality and is only suitable for problems whose state and action spaces are discrete. Inspired by the strong ability of neural networks (NN) to explore high-dimensional search spaces, the deep Q network (DQN) uses an NN to approximate the action-value function and can therefore handle problems with a continuous state space and a discrete action space. Furthermore, to handle continuous state and action spaces, the deep deterministic policy gradient (DDPG) algorithm uses two neural networks to approximate the policy function and the action-value function. However, existing DRL voltage control methods focus only on reactive power control, cannot handle discrete and continuous control variables simultaneously, and cannot control different devices on different time scales.
Disclosure of Invention
The invention aims to provide an autonomous voltage optimization control method and device for a power distribution network, which establish a double-time-scale autonomous voltage optimization control scheme in order to combine active and reactive power adjustment and realize the optimal control of multiple types of voltage regulating devices.
In order to achieve the above purpose, the invention adopts the following technical scheme:
the invention provides an autonomous optimization control method for power distribution network voltage, which comprises the following steps:
converting the voltage optimization control problem into a Markov decision process based on a pre-established power distribution network double-time-scale optimal voltage control model;
constructing a deep Q network agent based on the discrete control variables in the Markov decision process, and constructing a deep deterministic policy gradient agent based on the continuous variables in the Markov decision process;
constructing a training set and a testing set based on historical state data of the voltage regulating devices, and training the deep Q network agent and the deep deterministic policy gradient agent; the voltage regulating devices comprise capacitor banks, photovoltaic inverters and energy storage devices;
and inputting the state data of the voltage regulating devices obtained in real time into the trained deep Q network agent and deep deterministic policy gradient agent to obtain the output actions, and optimally controlling the voltage of the power distribution network.
Further, constructing the power distribution network double-time-scale optimal voltage control model based on multiple voltage regulating devices comprises the following steps:
dividing the whole time period into a plurality of intervals, defined as the long time scale, dividing each interval into a plurality of time slots, defined as the short time scale, and establishing the power distribution network double-time-scale optimal voltage control model, whose objective is to minimize the average deviation of all bus-node voltages U_i(T,t) from 1 p.u. over all long and short time scales;
the constraint conditions to be satisfied include the device operating constraints and the nodal power balance constraints:
P_j(T,t) = P_L,j(T,t) + P_Batt,j(T,t) - P_PV,j(T,t);
Q_j(T,t) = Q_L,j(T,t) - Q_Cap,j(T,t) - Q_PV,j(T,t);
wherein U_i(T,t) and U_j(T,t) denote the voltage magnitudes at bus nodes i and j, t is the short time scale, T is the long time scale, E[·] denotes the mean value operation, N_T is the number of long time scales, N_t is the number of short time scales into which one long time scale is divided, and N is the number of bus nodes; a_cap,i(T) is the i-th capacitor control variable and represents the on/off state of the capacitor, a_batt,i(T,t) is the control variable of the i-th energy storage device at time t, and a_pv,i(T,t) is the control variable of the i-th photovoltaic inverter at time t; Q_Cap,i(T,t) is the reactive power of the i-th capacitor at time t and Q_Cap,i^rate is its reactive power nameplate value; SOC_i(T,t) is the state of charge of the i-th energy storage device at time t, P_Batt,i(T,t) is its charge/discharge power at time t, P_Batt,i^max is its maximum charge/discharge power, and SOC_i,min and SOC_i,max are its minimum and maximum safe capacities; S_PV,i^rate is the rated capacity of the i-th photovoltaic inverter, Q_PV,i(T,t) is its reactive output at time t, Q_PV,i^max(T,t) is its maximum reactive output at time t, and P_PV,i(T,t) is its active output at time t; P_L,j(T,t) and Q_L,j(T,t) denote the active and reactive loads of node j at time t; I_ij(T,t) is the current magnitude on branch (i,j) at time t, r_ij and x_ij are the resistance and reactance of branch (i,j), and P_ij(T,t) and Q_ij(T,t) are the active and reactive power flowing from node i to node j at time t; ψ(j) and φ(j) are the parent and child bus sets of node j, respectively; {a_cap(T)}, {a_batt(T,t)} and {a_pv(T,t)} denote the sets of all capacitor, energy storage device and photovoltaic inverter control variables, respectively.
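The rendered constraint equations do not survive in this text; the following minimal Python sketch illustrates one plausible reading of the capacitor, photovoltaic inverter and energy storage device models implied by the quantities defined above (the function names, the linear SOC update and the circular inverter capability limit are assumptions, not the original formulas).

```python
import math

def capacitor_reactive_power(a_cap_i: int, q_rate_i: float) -> float:
    """Reactive power injected by capacitor i: nameplate value when switched on, zero when off."""
    return a_cap_i * q_rate_i          # a_cap_i in {0, 1}

def pv_reactive_limit(s_rate_i: float, p_pv_i: float) -> float:
    """Maximum reactive output of inverter i given its rated capacity and current active output."""
    return math.sqrt(max(s_rate_i ** 2 - p_pv_i ** 2, 0.0))

def storage_step(soc_i: float, a_batt_i: float, p_max_i: float,
                 e_rate_i: float, dt_hours: float):
    """Charge/discharge power and next state of charge of storage device i.

    The linear SOC update (SOC += P * dt / E_rate) is an assumed dynamic; the patent
    only states that the SOC must stay within [SOC_min, SOC_max].
    """
    p_batt_i = a_batt_i * p_max_i      # a_batt_i in [-1, 1]
    soc_next = soc_i + p_batt_i * dt_hours / e_rate_i
    return p_batt_i, soc_next
```

Under this reading, a_cap,i switches the full nameplate reactive power in or out, while the continuous actions scale the inverter and storage outputs within their physical limits.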
Further, converting the voltage optimization control problem into a Markov decision process includes:
defining capacitor control variables, energy storage device control variables and photovoltaic inverter control variables as actions in a Markov decision process, defining the running state of the power distribution network as the state in the Markov decision process, taking a voltage control target as rewards in the Markov decision process, and converting a voltage optimization control problem into the Markov decision process;
Wherein the expected discount rewards after action a is taken at state s under policy μ in the markov decision process are represented by a Q function expressed as:
wherein s is t And a t State and operation at time t, Q μ (s t ,a t ) Represented in state s t Take action a down t Q value at the time, r k The prize value representing the kth transfer, gamma being the discount factor, T all For the length of each epoode in the training phase, s 0 And a 0 Is an initial value;
using the bellman equation, the Q function is expressed as:
solving the optimal strategy equivalent is solving an optimal Q function:
further, the discrete control variables in the markov decision process are capacitor control variables, and the deep Q network agent is constructed based on the discrete control variables in the markov decision process, including:
the state vector of the deep Q network agent is expressed as:
wherein s is cap (T) represents a state vector of deep Q network agents within a long time scale T,a vector representing the average active power composition of each busbar node over a long time scale T; a, a cap (T-1) represents a vector of individual capacitor control variables within the long time scale T-1;
the motion vector of the deep Q network agent is expressed as:
a cap (T)=[a cap,1 (T),a cap,2 (T),…,a cap,Ncap (T)] T
wherein a is cap (T) represents the capacitor control variable vector, a, over a long time scale T cap,i (T) represents the ith capacitor control variable, a, over a long time scale T cap,i (T) ∈ {0,1}, i=1, 2, …, ncap being the total number of capacitors;
the rewards of the deep Q network agent are as follows:
wherein r is cap (T) represents rewards of deep Q network agents within a long time scale T,
the act of selecting a capacitor using an epsilon-greedy strategy is expressed as:
wherein a is T For the currently selected capacitor to operate, A is the operation space, Q μ (s T ,a T ;θ Q ) As Q function, θ Q For Q network parameters, subscript T denotes the current time interval, ε [0,1 ]]Is constant, beta.epsilon.0, 1]Randomly generated.
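A minimal Python sketch of this ε-greedy selection rule (the function name, the dictionary representation of the Q values and the value ε = 0.1 are illustrative assumptions):

```python
import random

def epsilon_greedy_capacitor_action(q_values, action_space, epsilon=0.1):
    """epsilon-greedy selection over the discrete capacitor configurations.

    q_values: dict mapping each configuration in action_space to its Q value
    (as produced by the Q network for the current state s_T).
    """
    beta = random.random()                       # beta uniformly drawn from [0, 1]
    if beta < epsilon:
        return random.choice(action_space)       # explore: random configuration
    return max(action_space, key=lambda a: q_values[a])   # exploit: argmax Q

# Example: 2 capacitors -> 4 on/off configurations
configs = [(0, 0), (0, 1), (1, 0), (1, 1)]
q = {(0, 0): -0.8, (0, 1): -0.3, (1, 0): -0.5, (1, 1): -0.6}
print(epsilon_greedy_capacitor_action(q, configs, epsilon=0.1))
```

With β drawn uniformly from [0, 1], the agent explores a random configuration with probability ε and otherwise exploits the configuration with the largest Q value, as described above.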
Further, during training, stochastic gradient descent is adopted to update the Q network parameters of the deep Q network agent based on a mini-batch and a loss function, wherein the loss function is expressed as:
wherein L(θ_Q) denotes the loss function, M is the number of experiences randomly sampled in the mini-batch, the subscript i labels each experience in the mini-batch with i ∈ [1, M], a′ denotes an action of the deep Q network agent in state s_{i+1}, r_i denotes the reward value, and θ′_Q are the parameters of the introduced target Q network.
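A sketch of this update step in PyTorch (the framework the embodiment reports using); the mean-squared temporal-difference error over the M sampled experiences is the standard DQN loss and is assumed here to correspond to the un-rendered formula:

```python
import torch
import torch.nn.functional as F

def dqn_update(q_net, target_q_net, optimizer, batch, gamma=0.99):
    """One stochastic-gradient step on the mini-batch loss described above.

    batch: tensors (states, actions, rewards, next_states) sampled from the replay
    buffer; actions is a LongTensor of configuration indices.
    """
    states, actions, rewards, next_states = batch
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)   # Q(s_i, a_i; theta_Q)
    with torch.no_grad():
        max_next_q = target_q_net(next_states).max(dim=1).values      # max_a' Q'(s_{i+1}, a'; theta'_Q)
        targets = rewards + gamma * max_next_q
    loss = F.mse_loss(q_sa, targets)      # L(theta_Q) averaged over the M sampled experiences
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```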
Further, the continuous control variables in the Markov decision process are the photovoltaic inverter control variables and the energy storage device control variables, and constructing the deep deterministic policy gradient agent based on the continuous control variables in the Markov decision process includes:
the deep deterministic policy gradient agent adopts an actor network μ(s; θ_μ) and a critic network Q^μ(s, a; θ_Q) to approximate the policy function and the Q function, respectively;
the state vector of the deep deterministic policy gradient agent is expressed as:
s_PVbatt(t) = [U^T(t), SOC^T(t)]^T;
wherein s_PVbatt(t) denotes the state vector of the deep deterministic policy gradient agent at time t, U(t) denotes the vector composed of the voltage magnitudes of all bus nodes, and SOC(t) denotes the vector composed of the states of charge of all energy storage devices at time t;
the action vector of the deep deterministic policy gradient agent is expressed as:
wherein a_PVbatt(t) denotes the action vector of the deep deterministic policy gradient agent at time t, composed of the vector of the control variables of each photovoltaic inverter at time t and the vector of the control variables of each energy storage device at time t; a_PV,i(t) denotes the i-th photovoltaic inverter control variable at time t, a_batt,i(t) denotes the i-th energy storage device control variable at time t, N_PV and N_batt are the total numbers of photovoltaic inverters and energy storage devices, respectively, and a_PV,i(t) ∈ [-1,1], a_batt,i(t) ∈ [-1,1];
the reward of the deep deterministic policy gradient agent is as follows:
wherein r_PVbatt(t) is the reward of the deep deterministic policy gradient agent at time t, and U_i(t) denotes the voltage magnitude of bus node i at time t;
the following policy is employed to select actions:
a_t = μ(s_t; θ_μ) + ξ_t;
wherein a_t is the action selected at time t, θ_μ are the actor network parameters, and ξ_t is random noise.
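A minimal Python sketch of this action-selection rule (Gaussian noise and the final clipping to [-1, 1] are assumptions; the patent only states that ξ_t is random noise and that the action components lie in [-1, 1]):

```python
import numpy as np

def ddpg_explore_action(actor, state, noise_std=0.1, rng=np.random.default_rng()):
    """Select a continuous PV/storage action a_t = mu(s_t; theta_mu) + xi_t.

    `actor` is any callable mapping a state vector to an action vector in [-1, 1].
    """
    action = np.asarray(actor(state), dtype=float)
    xi = rng.normal(0.0, noise_std, size=action.shape)    # exploration noise xi_t
    return np.clip(action + xi, -1.0, 1.0)                # keep a_PV,i and a_batt,i in [-1, 1]

# Example with a trivial placeholder policy
print(ddpg_explore_action(lambda s: np.zeros(4), state=np.zeros(8)))
```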
Further, during training, the critic network is updated by minimizing the loss function shown below:
wherein L(θ_Q) denotes the loss function, γ is the discount factor, θ_Q are the critic network parameters, M is the number of experiences randomly sampled in the mini-batch, the subscript i labels each experience in the mini-batch with i ∈ [1, M], r_i denotes the reward value, μ′(s; θ_μ′) is the introduced target actor network with parameters θ_μ′, and Q′^μ′(s, a; θ_Q′) is the introduced target critic network with parameters θ_Q′;
the actor network is updated using the policy gradient, expressed as:
the target networks are updated using:
wherein λ < 1.
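A PyTorch sketch of one such update on a sampled mini-batch; the mean-squared critic loss, the deterministic policy gradient taken through the critic, and the λ-weighted soft target update are the standard DDPG forms and are assumed to correspond to the un-rendered expressions above:

```python
import torch
import torch.nn.functional as F

def soft_update(target_net, net, lam=0.005):
    """Soft target update theta' <- lam * theta + (1 - lam) * theta', with lam < 1 (assumed form)."""
    for tp, p in zip(target_net.parameters(), net.parameters()):
        tp.data.mul_(1.0 - lam).add_(lam * p.data)

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma=0.99):
    """One DDPG update on a mini-batch of (states, actions, rewards, next_states) tensors."""
    states, actions, rewards, next_states = batch
    with torch.no_grad():
        next_q = target_critic(next_states, target_actor(next_states)).squeeze(-1)
        targets = rewards + gamma * next_q                 # r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}))
    critic_loss = F.mse_loss(critic(states, actions).squeeze(-1), targets)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    actor_loss = -critic(states, actor(states)).mean()     # deterministic policy gradient
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    soft_update(target_critic, critic)                     # lambda-weighted soft updates
    soft_update(target_actor, actor)
```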
The invention also provides an autonomous power distribution network voltage optimization control device, which comprises:
the conversion module is used for converting the voltage optimization control problem into a Markov decision process based on a pre-established power distribution network double-time-scale optimal voltage control model;
the agent module is used for constructing a deep Q network agent based on the discrete control variables in the Markov decision process and constructing a deep deterministic policy gradient agent based on the continuous variables in the Markov decision process;
the training module is used for constructing a training set and a testing set based on historical state data of the voltage regulating devices and training the deep Q network agent and the deep deterministic policy gradient agent; the voltage regulating devices comprise capacitor banks, photovoltaic inverters and energy storage devices;
and,
the optimization module is used for inputting the state data of the voltage regulating devices obtained in real time into the trained deep Q network agent and deep deterministic policy gradient agent to obtain the output actions and optimally control the voltage of the power distribution network.
Further, the device also comprises a construction module, which is specifically configured to:
divide the whole time period into a plurality of intervals, defined as the long time scale, divide each interval into a plurality of time slots, defined as the short time scale, and establish the power distribution network double-time-scale optimal voltage control model, whose objective is to minimize the average deviation of all bus-node voltages U_i(T,t) from 1 p.u. over all long and short time scales;
the constraint conditions to be satisfied include the device operating constraints and the nodal power balance constraints:
P_j(T,t) = P_L,j(T,t) + P_Batt,j(T,t) - P_PV,j(T,t);
Q_j(T,t) = Q_L,j(T,t) - Q_Cap,j(T,t) - Q_PV,j(T,t);
wherein U_i(T,t) and U_j(T,t) denote the voltage magnitudes at bus nodes i and j, t is the short time scale, T is the long time scale, E[·] denotes the mean value operation, N_T is the number of long time scales, N_t is the number of short time scales into which one long time scale is divided, and N is the number of bus nodes; a_cap,i(T) is the i-th capacitor control variable and represents the on/off state of the capacitor, a_batt,i(T,t) is the control variable of the i-th energy storage device at time t, and a_pv,i(T,t) is the control variable of the i-th photovoltaic inverter at time t; Q_Cap,i(T,t) is the reactive power of the i-th capacitor at time t and Q_Cap,i^rate is its reactive power nameplate value; SOC_i(T,t) is the state of charge of the i-th energy storage device at time t, P_Batt,i(T,t) is its charge/discharge power at time t, P_Batt,i^max is its maximum charge/discharge power, and SOC_i,min and SOC_i,max are its minimum and maximum safe capacities; S_PV,i^rate is the rated capacity of the i-th photovoltaic inverter, Q_PV,i(T,t) is its reactive output at time t, Q_PV,i^max(T,t) is its maximum reactive output at time t, and P_PV,i(T,t) is its active output at time t; P_L,j(T,t) and Q_L,j(T,t) denote the active and reactive loads of node j at time t; I_ij(T,t) is the current magnitude on branch (i,j) at time t, r_ij and x_ij are the resistance and reactance of branch (i,j), and P_ij(T,t) and Q_ij(T,t) are the active and reactive power flowing from node i to node j at time t; ψ(j) and φ(j) are the parent and child bus sets of node j, respectively; {a_cap(T)}, {a_batt(T,t)} and {a_pv(T,t)} denote the sets of all capacitor, energy storage device and photovoltaic inverter control variables, respectively.
Further, the conversion module is specifically used for,
defining capacitor control variables, energy storage device control variables and photovoltaic inverter control variables as actions in a Markov decision process, defining the running state of the power distribution network as the state in the Markov decision process, taking a voltage control target as rewards in the Markov decision process, and converting a voltage optimization control problem into the Markov decision process;
wherein the expected discounted reward after action a is taken in state s under policy μ in the Markov decision process is represented by a Q function, expressed as:
wherein s_t and a_t are the state and action at time t, Q^μ(s_t, a_t) denotes the Q value of taking action a_t in state s_t, r_k is the reward value of the k-th transition, γ is the discount factor, T_all is the length of each episode in the training phase, and s_0 and a_0 are the initial values;
using the Bellman equation, the Q function is expressed as:
solving for the optimal policy is then equivalent to solving for the optimal Q function:
further, the agent module is specifically configured to,
the deep Q network agent is built based on capacitor control variables in a markov decision process as follows:
The state vector of the deep Q network agent is expressed as:
wherein s is cap (T) represents a state vector of deep Q network agents within a long time scale T,a vector representing the average active power composition of each busbar node over a long time scale T; a, a cap (T-1) represents a vector of individual capacitor control variables within the long time scale T-1;
the motion vector of the deep Q network agent is expressed as:
a cap (T)=[a cap,1 (T),a cap,2 (T),…,a cap,Ncap (T)]T,
wherein a is cap (T) represents the capacitor control variable vector, a, over a long time scale T cap,i (T) represents the ith capacitor control variable, a, over a long time scale T cap,i (T) ∈ {0,1}, i=1, 2, …, ncap being the total number of capacitors;
the rewards of the deep Q network agent are as follows:
wherein r is cap (T) represents rewards of deep Q network agents within a long time scale T,
the act of selecting a capacitor using an epsilon-greedy strategy is expressed as:
wherein a is T For the currently selected capacitor to operate, A is the operation space, Q μ (s T ,a T ;θ Q ) As Q function, θ Q For Q network parameters, subscript T denotes the current time interval, ε [0,1 ]]Is constant, beta.epsilon.0, 1]Randomly generating;
the deep deterministic policy gradient agent is constructed based on the photovoltaic inverter control variables and the energy storage device control variables in the Markov decision process, as follows:
the deep deterministic policy gradient agent adopts an actor network μ(s; θ_μ) and a critic network Q^μ(s, a; θ_Q) to approximate the policy function and the Q function, respectively;
the state vector of the deep deterministic policy gradient agent is expressed as:
s_PVbatt(t) = [U^T(t), SOC^T(t)]^T;
wherein s_PVbatt(t) denotes the state vector of the deep deterministic policy gradient agent at time t, U(t) denotes the vector composed of the voltage magnitudes of all bus nodes, and SOC(t) denotes the vector composed of the states of charge of all energy storage devices at time t;
the action vector of the deep deterministic policy gradient agent is expressed as:
wherein a_PVbatt(t) denotes the action vector of the deep deterministic policy gradient agent at time t, composed of the vector of the control variables of each photovoltaic inverter at time t and the vector of the control variables of each energy storage device at time t; a_PV,i(t) denotes the i-th photovoltaic inverter control variable at time t, a_batt,i(t) denotes the i-th energy storage device control variable at time t, N_PV and N_batt are the total numbers of photovoltaic inverters and energy storage devices, respectively, and a_PV,i(t) ∈ [-1,1], a_batt,i(t) ∈ [-1,1];
the reward of the deep deterministic policy gradient agent is as follows:
wherein r_PVbatt(t) is the reward of the deep deterministic policy gradient agent at time t, and U_i(t) denotes the voltage magnitude of bus node i at time t;
the following policy is employed to select actions:
a_t = μ(s_t; θ_μ) + ξ_t;
wherein a_t is the action selected at time t, θ_μ are the actor network parameters, and ξ_t is random noise.
The invention has the beneficial effects that:
the invention provides a power distribution network voltage autonomous optimization control method based on deep reinforcement learning, which is based on the network access of a large number of distributed and controllable elements from the viewpoint of optimizing power distribution network voltage control, establishes a double-time scale voltage control model for various control devices, and provides a DRL algorithm based on DQN and DDPG to simultaneously process continuous and discrete voltage regulating devices so as to control voltage.
Drawings
FIG. 1 is a flow chart of a method for autonomous optimal control of power distribution network voltage based on deep reinforcement learning.
Fig. 2 is a system configuration diagram of an IEEE-123 node power distribution network in an embodiment of the present invention.
Fig. 3 is a graph of load and PV active output in a training set in an embodiment of the invention.
Fig. 4 is a graph of load and PV active output in a test set in an embodiment of the invention.
FIG. 5 is a graph of voltage amplitude under various voltage control methods in an embodiment of the invention; fig. 5 (a) is a voltage control curve of the bus bar 1, and fig. 5 (b) is a voltage control curve of the bus bar 24.
FIG. 6 is a graph of the episode reward in an embodiment of the present invention.
Detailed Description
The invention is further described below. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
The embodiment of the invention provides a power distribution network voltage autonomous optimization control method, which comprises the following steps:
constructing a power distribution network double-time-scale optimal voltage control model based on multiple types of voltage regulating devices, including capacitor banks, photovoltaic inverters and energy storage devices;
converting the voltage control problem into a Markov Decision Process (MDP) based on the established voltage control model;
based on the obtained MDP process, assigning the discrete control variables to a deep Q network (DQN) agent and the continuous variables to a deep deterministic policy gradient (DDPG) agent, and solving the MDP process by using a deep reinforcement learning algorithm;
training DQN agents and DDPG agents simultaneously through interactions with the environment;
and inputting the states obtained from the environment into the trained DQN agent and the DDPG agent to obtain corresponding actions, thereby realizing voltage control.
In the embodiment of the invention, the power distribution network double-time-scale optimal voltage control model is constructed based on multiple types of voltage regulating devices, namely capacitor banks, photovoltaic inverters and energy storage devices, by the following steps:
Active power adjustment is realized by adjusting the charge and discharge power of the energy storage, and reactive power adjustment is realized by adjusting the on/off state of the capacitors and the output of the photovoltaic inverters. Considering the response times and control costs of the different devices, the overall control process can be divided into long-time-scale control and short-time-scale control. Specifically, the entire time period may be divided into N_T intervals, and each interval can be further divided into N_t time slots. On the long time scale T, the capacitor configuration is performed at the beginning of each time interval, while on the short time scale t the outputs of the photovoltaic inverters and the energy storage are adjusted at the beginning of each time slot.
On this basis, the corresponding control device models are constructed as follows:
wherein a_cap,i(T) is the i-th capacitor control variable and represents the on/off state of the capacitor; a_batt,i(T,t) is the i-th energy storage control variable; a_pv,i(T,t) is the i-th inverter control variable; Q_Cap,i(T,t) is the reactive power of the i-th capacitor and Q_Cap,i^rate is its reactive power nameplate value; SOC_i(T,t) is the state of charge of the i-th energy storage, P_Batt,i(T,t) is its charge/discharge power, P_Batt,i^max is its maximum charge/discharge power, and SOC_i,min and SOC_i,max are its minimum and maximum safe capacities; S_PV,i^rate is the rated capacity of the i-th inverter, Q_PV,i(T,t) is the i-th photovoltaic reactive output, P_PV,i(T,t) is the i-th photovoltaic active output, and Q_PV,i^max(T,t) is the maximum reactive output of the i-th inverter at that time.
Finally, according to the branch flow model, the power distribution network double-time-scale optimal voltage control model is constructed, wherein the objective function minimizes the average deviation of all bus-node voltages from 1 p.u. over both time scales:
The constraint conditions are as follows:
the device model constraints (1)-(3); (4b)
P_j(T,t) = P_L,j(T,t) + P_Batt,j(T,t) - P_PV,j(T,t) (4g)
Q_j(T,t) = Q_L,j(T,t) - Q_Cap,j(T,t) - Q_PV,j(T,t) (4h)
wherein {a_cap(T)} denotes the set of all capacitor control variables, {a_batt(T,t)} denotes the set of all energy storage device control variables, {a_pv(T,t)} denotes the set of all photovoltaic inverter control variables, E[·] denotes the mean value operation, N_T denotes the number of long time intervals into which the whole period is divided, N_t denotes the number of short time intervals into which one long time interval is divided, N denotes the number of nodes, U_i(T,t) and U_j(T,t) denote the voltage magnitudes at nodes i and j, I_ij(T,t) is the current magnitude on branch (i,j), r_ij and x_ij are the resistance and reactance of branch (i,j), P_ij(T,t) and Q_ij(T,t) are the active and reactive power flowing from node i to node j, P_L,j(T,t) and Q_L,j(T,t) denote the active and reactive loads of node j at time t, and ψ(j) and φ(j) are the parent and child bus sets of node j, respectively. In the above description, a node refers to a bus.
In the embodiment of the invention, based on the established voltage control model, the voltage control problem is converted into a Markov Decision Process (MDP), and the method specifically comprises the following steps:
The MDP is defined as a tuple (S, A, P, R, γ) describing the interaction between an agent (i.e., the different controllers) and an environment (i.e., the power-flow environment of the power distribution network), where S is the state space, A is the action space, P is the state transition probability (usually unknown), R is the reward of each transition, expressed as r_t = R(s_t, a_t), and γ ∈ [0,1] is the discount factor. The goal of the voltage control problem is to solve the MDP, i.e., to learn the optimal policy of each agent so as to maximize the reward associated with the long-term average voltage deviation.
When solving the MDP using a deep reinforcement learning algorithm, the following definitions are typically used: the policy μ is defined as the mapping from the state s to the action a taken by the agent, denoted μ(a|s). In this embodiment, the state s is the operating condition of the distribution network observed by the agents from the environment, and the components of each agent's state vector need to be set according to its objective.
During the training phase, the expected discounted reward after action a is taken in state s under policy μ is represented by the Q function, expressed as:
wherein s_t and a_t are the state and action at time t, Q^μ(s_t, a_t) denotes the Q value of taking action a_t in state s_t, r_k is the reward value of the k-th transition, T_all is the length of each episode in the training phase, and s_0 and a_0 are the initial state and action.
In the above expression, r_k is based on the voltage control model, i.e., r_k = -(U_i(T,t) - 1)^2. Through training, Q is gradually increased, so r_k gradually increases, (U_i(T,t) - 1)^2 gradually decreases, and the control target of minimizing the voltage deviation is realized.
Using the bellman equation, the Q function can be further expressed as:
/>
then solve the optimal strategy mu * Equivalent is solving the optimal Q function:
in the embodiment of the invention, based on the obtained MDP process, the control variable is distributed to different agents according to the type of the control variable of the control equipment, namely, discrete control variable is distributed to DQN agents, and continuous control variable is distributed to DDPG agents, specifically comprising the following steps:
for discrete variables of the capacitor, a DQN agent is established. DQN uses a deep neural network (i.e., Q network) to model the Q function, denoted Q μ (s,a;θ Q ) The input is a state vector and the output is the Q value of all possible actions. Experience replay buffer D for storing experience e T =(s T ,a T ,r T ,s T+1 ) The minimum pool is used to store the experience of M random samples.
In order to update the parameters of the Q network, a target Q network is introduced, wherein the parameters are theta' Q . The parameters of the Q network are then updated based on the minimum pool and a loss function using random gradient descent (SGD), which can be expressed as:
where M is the number of experiences randomly sampled in the minimum pool, subscript i is the label of each experience in the minimum pool, and i.epsilon.1, M]A' represents that the agent is in state s i+1 Possible actions of r i Representing the prize value, parameter θ 'of the target Q network' Q By duplicating the Q network parameters θ for each B period Q To update, B is a self-set super parameter.
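A minimal Python sketch of the experience replay buffer, the mini-batch sampling and the periodic copy of the Q network parameters into the target Q network (the buffer capacity, the class name and the PyTorch state_dict copy are illustrative assumptions):

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience replay buffer D storing transitions e_T = (s_T, a_T, r_T, s_{T+1})."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, m):
        """Draw the mini-batch of M randomly sampled experiences used for one SGD step."""
        return random.sample(self.buffer, m)

def maybe_sync_target(step, B, q_net, target_q_net):
    """Copy theta_Q into theta'_Q every B updates (B is the user-set hyper-parameter)."""
    if step % B == 0:
        target_q_net.load_state_dict(q_net.state_dict())
```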
For the capacitors, the state vector consists of the average active power of each bus during T and the capacitor actions during T-1, i.e.:
The action vector is defined as the configuration of the capacitors, expressed as:
wherein a_cap,i(T) ∈ {0,1}, and N_cap is the total number of capacitors.
When the state is fed into the input layer and passed through the hidden layers, the output layer generates the Q values of all specific capacitor configurations; the output layer is composed of 2^N_cap neurons, one per configuration.
To meet the control objective, the reward is designed as the negative of the sum of all bus voltage deviations, which can be expressed as:
The action of the capacitors is selected using an ε-greedy strategy, which can be expressed as:
wherein ε ∈ [0,1] is a constant and β ∈ [0,1] is randomly generated by the computer. When β < ε, the agent randomly selects an action from the action space; otherwise, the agent selects the action with the maximum Q value in the current state.
A DDPG agent is established for the continuous control variables of the inverters and the energy storage. DDPG uses two deep neural networks (i.e., an actor network μ(s; θ_μ) and a critic network Q^μ(s, a; θ_Q)) to approximate the policy function and the Q function, respectively. Likewise, two target networks (i.e., a target actor network μ′(s; θ_μ′) and a target critic network Q′^μ′(s, a; θ_Q′)) are introduced to update the actor network and the critic network.
During the training phase, the following formula is used to select the continuous action:
a_t = μ(s_t; θ_μ) + ξ_t (7)
where ξ_t is random noise used to explore the action space. The neural networks are likewise updated using the experience replay buffer and mini-batches. The critic network is updated by minimizing the loss function shown below:
The actor network is updated using the policy gradient, expressed as:
The target networks are soft-updated using the following formula:
wherein λ < 1.
For the inverters and the energy storage, the state vector consists of the voltage magnitudes of all buses at time t and the states of charge of the energy storage batteries, namely:
s_PVbatt(t) = [U^T(t), SOC^T(t)]^T;
The action vector includes the action variables of the inverters and of the energy storage, namely:
wherein N_PV and N_batt are the total numbers of inverters and energy storage devices, respectively, and a_PV,i(t) ∈ [-1,1], a_batt,i(t) ∈ [-1,1].
The reward value is as follows:
In addition, considering the capacity limits of the energy storage, the action is adjusted using a forced output constraint, namely:
when the requested charge (discharge) would drive the state of charge beyond its upper (lower) limit,
only SOC_i,max - SOC_i(t) (respectively SOC_i(t) - SOC_i,min) of capacity is charged (discharged).
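A minimal Python sketch of this forced output constraint (positive power taken as charging, and a linear conversion between power over one slot and state-of-charge margin, are assumptions):

```python
def clamp_storage_power(p_req, soc, soc_min, soc_max, e_rate, dt_hours):
    """Forced output constraint on storage device i.

    If the requested charge (discharge) power would push the state of charge past
    SOC_max (SOC_min), only the remaining margin SOC_max - SOC(t) (SOC(t) - SOC_min)
    is charged (discharged).
    """
    if p_req > 0:                                        # charging
        max_energy = (soc_max - soc) * e_rate            # admissible charge energy
        return min(p_req, max_energy / dt_hours)
    else:                                                # discharging
        max_energy = (soc - soc_min) * e_rate            # admissible discharge energy
        return max(p_req, -max_energy / dt_hours)

# Example: a full battery cannot accept further charging power
print(clamp_storage_power(p_req=100.0, soc=1.0, soc_min=0.1, soc_max=1.0,
                          e_rate=600.0, dt_hours=5 / 60))
```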
In the embodiment of the invention, based on the established DQN and DDPG agents, the corresponding hyper-parameters are designed and the DQN and DDPG agents are trained simultaneously through interaction with the environment: the capacitors are configured at the beginning of each long time scale T, and the outputs of the energy storage and the inverters are controlled on the short time scale t. Each agent is trained on the training data set until a good reward is obtained and the reward curve gradually converges and flattens. The specific training process is as follows:
In the embodiment of the invention, based on the trained DQN and DDPG agents, the trained agents in the execution stage obtain the corresponding actions from the states observed from the environment so as to realize double-time-scale voltage control. At the beginning of each T, the DQN agent selects the action with the largest Q value, given the state obtained from the environment, as the capacitor configuration; at the beginning of each t, the DDPG agent obtains its action directly from the observed state, i.e., a_PVbatt(t) = μ(s_PVbatt(t); θ_μ), from which the corresponding inverter and energy storage outputs are obtained.
Examples
The present embodiment uses a modified IEEE-123 bus system to analyze the validity and feasibility of the scheme. The deep-reinforcement-learning-based autonomous power distribution network voltage optimization control method of this embodiment, referring to FIG. 1, comprises the following steps:
Step 10), constructing a power distribution network double-time-scale optimal voltage control model based on multiple types of voltage regulating devices, namely capacitor banks, photovoltaic inverters and energy storage devices;
step 20) converting the voltage control problem into a Markov Decision Process (MDP) based on the voltage control model established in step 10);
step 30) assigning the discrete control variables to a deep Q network (DQN) agent and the continuous variables to a deep deterministic policy gradient (DDPG) agent, and solving the MDP process using a deep reinforcement learning algorithm based on the MDP process obtained in step 20);
step 40) training the DQN and DDPG agents simultaneously by interaction with the environment based on the DQN agents and DDPG agents established in step 30);
step 50) based on the DQN and DDPG agents trained in step 40), the trained agents in the execution phase get corresponding actions from the states obtained from the environment to achieve voltage control.
In this embodiment, the IEEE-123 bus system is modified into a balanced system and the bus numbers are rearranged, as shown in FIG. 2. The rated voltage of the test feeder is 4.16 kV and the power base value is 100 MVA. Furthermore, 12 photovoltaic units are mounted on buses 24, 31, 39, 50, 63, 70, 79, 87, 92, 100, 106 and 113, with capacities of 300 kVA, 200 kVA, 400 kVA, 300 kVA, 400 kVA, 200 kVA, 400 kVA, 300 kVA and 200 kVA, respectively. Each photovoltaic unit is equipped with a smart inverter. Four capacitors, each with a capacity of 40 kvar, are mounted on buses 20, 59, 66 and 114. Meanwhile, 4 energy storage devices, each with a maximum capacity of 600 kWh and a rated charge/discharge power of 100 kW, are installed on buses 56, 83, 96 and 116, respectively. The load data are obtained by modifying a standardized load curve according to the actual load of a certain area of Jiangsu, with the standardized load curve multiplied by different constants so that the load distributions of the buses differ from one another. 2880 sets of data, including load data and PV output, are used as training data, as shown in FIG. 3. Another 288 sets of data are used as test data, as shown in FIG. 4. All parameters in the system have been converted to per-unit values.
This embodiment is implemented based on the PyTorch framework, and the training process is executed on a CPU. For the DQN agent, the 4 capacitors yield 16 possible capacitor configurations, so the Q network consists of 3 fully connected layers, including one input layer, two hidden layers of 95 and 22 neurons respectively, and one output layer of 16 neurons. A sigmoid function is used at the end of the output layer to keep the Q values within [0, 1]. For the DDPG agent, the actor and critic networks also consist of three fully connected layers; their hidden layers have 90 and 30 neurons and 46 and 14 neurons, respectively, while the output layer of the actor network has 16 neurons and the output layer of the critic network has 1 neuron. The tanh function is applied at the end of the actor network to keep the action variables within [-1, 1]. All hidden layers use ReLU as the activation function. The other hyper-parameters are set in detail in the following table:
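As an illustration of the network architectures just described, a PyTorch sketch is given below; the input dimensions and the concatenation of state and action at the critic input are assumptions not stated in this text:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Q network for the DQN agent: hidden layers (95, 22), 16-way output with sigmoid."""
    def __init__(self, state_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 95), nn.ReLU(),
            nn.Linear(95, 22), nn.ReLU(),
            nn.Linear(22, 16), nn.Sigmoid(),   # one Q value per capacitor configuration, in [0, 1]
        )

    def forward(self, s):
        return self.net(s)

class Actor(nn.Module):
    """Actor for the DDPG agent: hidden layers (90, 30), 16 actions bounded by tanh."""
    def __init__(self, state_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 90), nn.ReLU(),
            nn.Linear(90, 30), nn.ReLU(),
            nn.Linear(30, 16), nn.Tanh(),      # a_PV and a_batt kept within [-1, 1]
        )

    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Critic for the DDPG agent: hidden layers (46, 14), scalar Q-value output."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 46), nn.ReLU(),
            nn.Linear(46, 14), nn.ReLU(),
            nn.Linear(14, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))
```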
First, T is set to 30 minutes and t is set to 5 minutes, with the PV output based on FIG. 4. Based on the optimal power flow, the bus voltage distribution without any voltage control is analyzed. The most problematic voltages appear on bus 1, bus 2, bus 7 and bus 123, violating the usual maximum voltage limit of 1.05. The voltage magnitudes of bus 1 and bus 24 (on which a photovoltaic device is mounted) are taken as examples, as shown by the black straight lines in FIG. 5(a) and FIG. 5(b).
Secondly, the deep-reinforcement-learning-based autonomous power distribution network voltage optimization control method is applied to learn the control strategies of the different voltage control devices. The DQN and DDPG agents are trained according to the procedure shown in Algorithm 1. During the training process, combinations of daily photovoltaic generation and load are randomly selected from the training set to represent different grid operating conditions. Training is performed for 300 episodes, each of which ends after the 288 samples of one day have been trained. The episode rewards are shown in FIG. 6, in which the horizontal axis is the number of episodes and the vertical axis is the reward value. It can be seen that after about 60 episodes the reward curve flattens out, indicating that the DQN and DDPG agents have been trained and have the ability to control the voltage.
During the test phase, the configuration of the capacitors, the inverter and the energy storage output are controlled using the trained DQN and DDPG agents according to the test data in fig. 4. The control effect is shown as gray lines in fig. 5 (a) and 5 (b). It can be seen that these trained DRL agents demonstrate effective voltage control performance compared to the case without voltage control means, and all bus amplitudes remain within safety limits, especially for buses with voltage problems. Algorithm 1 enables the controller to explore the relationship between its configuration and the inherent uncertainties and variability of photovoltaic output and load power, and can take corresponding strategies in the face of new operating conditions.
Meanwhile, in order to check the effectiveness of the voltage control method of the invention, it is compared with a two-stage optimal control scheme, as shown by the gray straight lines and black broken lines in FIG. 5(a) and FIG. 5(b). It can be seen that the control effect of the voltage control method of the invention is similar to that of the two-stage optimal control scheme.
The computation times of the two schemes are compared in the following table; the proposed method consumes less time and can meet the requirement of real-time control.
The invention also provides an autonomous power distribution network voltage optimization control device, which comprises:
the construction module is used for constructing a power distribution network double-time-scale optimal voltage control model based on various voltage regulating devices; the voltage regulating device comprises a capacitor bank, a photovoltaic inverter and an energy storage device;
the conversion module is used for converting the voltage optimization control problem into a Markov decision process based on the established power distribution network double-time-scale optimal voltage control model;
the agent module is used for constructing a deep Q network agent based on the discrete control variables in the Markov decision process and constructing a deep deterministic policy gradient agent based on the continuous variables in the Markov decision process;
the training module is used for constructing a training set and a testing set based on historical state data of the voltage regulating devices and training the deep Q network agent and the deep deterministic policy gradient agent;
and,
the optimization module is used for inputting the state data of the voltage regulating devices obtained in real time into the trained deep Q network agent and deep deterministic policy gradient agent to obtain the output actions and optimally control the voltage of the power distribution network.
In the embodiment of the invention, the construction module is specifically used for,
dividing the whole time period into a plurality of intervals, defined as the long time scale, dividing each interval into a plurality of time slots, defined as the short time scale, and establishing the power distribution network double-time-scale optimal voltage control model, whose objective is to minimize the average deviation of all bus-node voltages U_i(T,t) from 1 p.u. over all long and short time scales;
the constraint conditions to be satisfied include the device operating constraints and the nodal power balance constraints:
P_j(T,t) = P_L,j(T,t) + P_Batt,j(T,t) - P_PV,j(T,t);
Q_j(T,t) = Q_L,j(T,t) - Q_Cap,j(T,t) - Q_PV,j(T,t);
wherein U_i(T,t) and U_j(T,t) denote the voltage magnitudes at bus nodes i and j, t is the short time scale, T is the long time scale, E[·] denotes the mean value operation, N_T is the number of long time scales, N_t is the number of short time scales into which one long time scale is divided, and N is the number of bus nodes; a_cap,i(T) is the i-th capacitor control variable and represents the on/off state of the capacitor, a_batt,i(T,t) is the control variable of the i-th energy storage device at time t, and a_pv,i(T,t) is the control variable of the i-th photovoltaic inverter at time t; Q_Cap,i(T,t) is the reactive power of the i-th capacitor at time t and Q_Cap,i^rate is its reactive power nameplate value; SOC_i(T,t) is the state of charge of the i-th energy storage device at time t, P_Batt,i(T,t) is its charge/discharge power at time t, P_Batt,i^max is its maximum charge/discharge power, and SOC_i,min and SOC_i,max are its minimum and maximum safe capacities; S_PV,i^rate is the rated capacity of the i-th photovoltaic inverter, Q_PV,i(T,t) is its reactive output at time t, Q_PV,i^max(T,t) is its maximum reactive output at time t, and P_PV,i(T,t) is its active output at time t; P_L,j(T,t) and Q_L,j(T,t) denote the active and reactive loads of node j at time t; I_ij(T,t) is the current magnitude on branch (i,j) at time t, r_ij and x_ij are the resistance and reactance of branch (i,j), and P_ij(T,t) and Q_ij(T,t) are the active and reactive power flowing from node i to node j at time t; ψ(j) and φ(j) are the parent and child bus sets of node j, respectively; {a_cap(T)}, {a_batt(T,t)} and {a_pv(T,t)} denote the sets of all capacitor, energy storage device and photovoltaic inverter control variables, respectively.
In the embodiment of the invention, the conversion module is specifically used for,
defining capacitor control variables, energy storage device control variables and photovoltaic inverter control variables as actions in a Markov decision process, defining the running state of the power distribution network as the state in the Markov decision process, taking a voltage control target as rewards in the Markov decision process, and converting a voltage optimization control problem into the Markov decision process;
Wherein the expected discounted reward obtained after taking action a in state s under policy μ in the Markov decision process is represented by a Q function, which is expressed as:
wherein s_t and a_t are the state and action at time t, Q_μ(s_t, a_t) represents the Q value when action a_t is taken in state s_t, r_k represents the reward value of the k-th transition, γ is the discount factor, T_all is the length of each episode in the training phase, and s_0 and a_0 are initial values;
Using the Bellman equation, the Q function is expressed as:
Solving for the optimal policy is equivalent to solving for the optimal Q function:
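For orientation, a brief numerical sketch of the standard quantities referred to above, the expected discounted return that the Q function estimates and the one-step Bellman backup, is given here; the helper names and the toy numbers are illustrative assumptions and not part of the claimed method.

import numpy as np

def discounted_return(rewards, gamma):
    """Discounted sum of rewards r_t, r_{t+1}, ..., r_{T_all}; the Q function
    estimates the expectation of this quantity for a given state-action pair."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def bellman_target(reward, gamma, q_next, done):
    """One-step Bellman backup r + gamma * max_a' Q(s', a'); the bootstrap
    term is dropped when the episode terminates."""
    return reward + (0.0 if done else gamma * float(np.max(q_next)))

# Toy check with gamma = 0.9: 1 + 0.9 + 0.81 = 2.71, and 1 + 0.9 * 0.5 = 1.45.
print(discounted_return([1.0, 1.0, 1.0], 0.9))
print(bellman_target(1.0, 0.9, np.array([0.2, 0.5]), False))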
In the embodiment of the invention, the intelligent agent module is specifically used for,
the deep Q network agent is built based on capacitor control variables in the Markov decision process as follows:
the state vector of the deep Q network agent is expressed as:
wherein s_cap(T) represents the state vector of the deep Q network agent within the long time scale T, and consists of the vector of the average active power of each bus node within the long time scale T together with a_cap(T-1), the vector of the capacitor control variables within the long time scale T-1;
The action vector of the deep Q network agent is expressed as:
a_cap(T) = [a_cap,1(T), a_cap,2(T), …, a_cap,Ncap(T)]^T
wherein a_cap(T) represents the capacitor control variable vector within the long time scale T, a_cap,i(T) represents the i-th capacitor control variable within the long time scale T, a_cap,i(T) ∈ {0, 1}, i = 1, 2, …, Ncap, and Ncap is the total number of capacitors;
the rewards of the deep Q network agent are as follows:
wherein r_cap(T) represents the reward of the deep Q network agent within the long time scale T;
The capacitor action is selected using an ε-greedy strategy, expressed as:
wherein a_T is the currently selected capacitor action, A is the action space, Q_μ(s_T, a_T; θ_Q) is the Q function, θ_Q are the Q network parameters, the subscript T denotes the current time interval, ε ∈ [0, 1] is a constant, and β ∈ [0, 1] is randomly generated;
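A minimal PyTorch sketch of the ε-greedy selection described above follows. The network architecture, the hidden-layer width and the convention that exploration occurs when β < ε are assumptions of the sketch; the patent fixes only the state, action and reward definitions.

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small fully connected Q network over the long-time-scale state
    s_cap(T); each output corresponds to one discrete capacitor action."""
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, s):
        return self.net(s)

def epsilon_greedy(q_net, s_T, epsilon, n_actions):
    """Pick a capacitor action index: explore uniformly over the action
    space when the random draw beta falls below epsilon, otherwise take
    argmax_a Q(s_T, a; theta_Q)."""
    beta = torch.rand(1).item()
    if beta < epsilon:
        return int(torch.randint(n_actions, (1,)).item())
    with torch.no_grad():
        return int(q_net(s_T.unsqueeze(0)).argmax(dim=1).item())

# Example: 6-dimensional state, 4 joint on/off combinations of two capacitors.
q_net = QNetwork(state_dim=6, n_actions=4)
a_T = epsilon_greedy(q_net, torch.zeros(6), epsilon=0.1, n_actions=4)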
the depth deterministic strategy gradient agent is constructed based on the photovoltaic inverter control variable and the energy storage device control variable in the Markov decision process, and is as follows:
The depth deterministic strategy gradient agent adopts an actor network μ(s; θ_μ) and a critic network Q_μ(s, a; θ_Q) to approximate the policy function and the Q function, respectively;
the state vector of the depth deterministic strategy gradient agent is expressed as:
s_PVbatt(t) = [U^T(t), SOC^T(t)]^T
wherein s_PVbatt(t) represents the state vector of the depth deterministic strategy gradient agent at time t, U(t) represents the vector composed of the voltage amplitudes of all bus nodes, and SOC(t) represents the vector composed of the states of charge of all energy storage devices at time t;
The action vector of the depth deterministic strategy gradient agent is expressed as:
wherein a_PVbatt(t) represents the action vector of the depth deterministic strategy gradient agent at time t, composed of the vector of photovoltaic inverter control variables at time t and the vector of energy storage device control variables at time t, a_PV,i(t) represents the i-th photovoltaic inverter control variable at time t, a_batt,i(t) represents the i-th energy storage device control variable at time t, N_PV and N_batt are the total numbers of photovoltaic inverters and energy storage devices, respectively, a_PV,i(t) ∈ [-1, 1], and a_batt,i(t) ∈ [-1, 1];
The rewards of the depth deterministic strategy gradient agent are as follows:
wherein r_PVbatt(t) is the reward of the depth deterministic strategy gradient agent at time t, and U_i(t) represents the voltage amplitude of bus node i at time t;
The following policy is employed to select actions:
a_t = μ(s_t; θ_μ) + ξ_t
wherein a_t is the action selected at time t, θ_μ are the actor network parameters, and ξ_t is random noise.
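A compact PyTorch sketch of the actor and critic networks and of the noisy action selection a_t = μ(s_t; θ_μ) + ξ_t is given below. The layer sizes, the tanh output that squashes the actions into [-1, 1] and the Gaussian form of the exploration noise ξ_t are assumptions of the sketch; the patent itself does not fix the network structure or the noise distribution.

import torch
import torch.nn as nn

class Actor(nn.Module):
    """Actor mu(s; theta_mu): maps the state [U(t), SOC(t)] to the
    continuous PV-inverter and energy-storage actions in [-1, 1]."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Critic Q_mu(s, a; theta_Q): scores a state-action pair."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def select_action(actor, s_t, noise_std=0.05):
    """a_t = mu(s_t; theta_mu) + xi_t, with the noisy action clipped back
    into the feasible range [-1, 1]."""
    with torch.no_grad():
        a = actor(s_t)
        a_t = a + noise_std * torch.randn_like(a)
    return a_t.clamp(-1.0, 1.0)

# Example: 5 node voltages + 2 storage SOCs as state; 3 PV + 2 storage actions.
actor = Actor(state_dim=7, action_dim=5)
a_t = select_action(actor, torch.zeros(7))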
It should be noted that the embodiment of the apparatus corresponds to the embodiment of the method, and the implementation manner of the embodiment of the method is applicable to the embodiment of the apparatus and can achieve the same or similar technical effects, so that the description thereof is omitted herein.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.

Claims (9)

1. The autonomous optimal control method for the voltage of the power distribution network is characterized by comprising the following steps of:
converting the voltage optimization control problem into a Markov decision process based on a pre-established power distribution network double-time-scale optimal voltage control model; the power distribution network double-time scale optimal voltage control model is as follows:
Dividing the whole time period into a plurality of intervals, defining the intervals as long time scales, dividing each interval into a plurality of time slots, defining the time slots as short time scales, and establishing a power distribution network double-time-scale optimal voltage control model as follows:
the constraint conditions to be satisfied are as follows:
P_j(T,t) = P_L,j(T,t) + P_Batt,j(T,t) - P_PV,j(T,t);
Q_j(T,t) = Q_L,j(T,t) - Q_Cap,j(T,t) - Q_PV,j(T,t);
wherein U_i(T,t) represents the voltage amplitude at bus node i, U_j(T,t) represents the voltage amplitude at bus node j, t is the short time scale, T is the long time scale, E[·] denotes the mean-value operation, N_T represents the number of long time scales, N_t represents the number of short time scales into which each long time scale is divided, N represents the number of bus nodes, a_cap,i(T) is the i-th capacitor control variable, representing the on/off state of the capacitor, a_batt,i(T,t) is the control variable of the i-th energy storage device at time t, a_pv,i(T,t) is the control variable of the i-th photovoltaic inverter at time t, Q_Cap,i(T,t) is the reactive power of the i-th capacitor at time t, bounded by the capacitor's nameplate reactive power value, SOC_i(T,t) is the state of charge of the i-th energy storage device at time t, P_Batt,i(T,t) is the charge/discharge power of the i-th energy storage device at time t, bounded by the device's maximum charge/discharge power, SOC_i,min and SOC_i,max are the minimum and maximum safe capacities of the i-th energy storage device, Q_PV,i(T,t) is the reactive output of the i-th photovoltaic inverter at time t, bounded by the maximum reactive output available at time t under the inverter's rated capacity, P_PV,i(T,t) is the active output of the i-th photovoltaic inverter at time t, P_L,j(T,t) and Q_L,j(T,t) represent the active load and reactive load of node j at time t, I_ij(T,t) is the current magnitude on branch (i,j) at time t, r_ij and x_ij are the resistance and reactance of branch (i,j), P_ij(T,t) and Q_ij(T,t) are the active and reactive power flowing from node i to node j at time t, ψ(j) and φ(j) are the parent and child bus sets, respectively, {a_cap(T)} represents the set of all capacitor control variables, {a_batt(T,t)} represents the set of all energy storage device control variables, and {a_pv(T,t)} represents the set of all photovoltaic inverter control variables;
constructing a depth Q network agent based on discrete control variables in the Markov decision process, and constructing a depth deterministic strategy gradient agent based on continuous variables in the Markov decision process; the discrete control variable in the Markov decision process is a capacitor control variable, and the depth Q network intelligent agent is constructed based on the discrete control variable in the Markov decision process, and comprises the following components:
The state vector of the deep Q network agent is expressed as:
wherein s_cap(T) represents the state vector of the deep Q network agent within the long time scale T, and consists of the vector of the average active power of each bus node within the long time scale T together with a_cap(T-1), the vector of the capacitor control variables within the long time scale T-1;
The action vector of the deep Q network agent is expressed as:
a_cap(T) = [a_cap,1(T), a_cap,2(T), …, a_cap,Ncap(T)]^T
wherein a_cap(T) represents the capacitor control variable vector within the long time scale T, a_cap,i(T) represents the i-th capacitor control variable within the long time scale T, a_cap,i(T) ∈ {0, 1}, i = 1, 2, …, Ncap, and Ncap is the total number of capacitors;
the rewards of the deep Q network agent are as follows:
wherein r_cap(T) represents the reward of the deep Q network agent within the long time scale T;
The capacitor action is selected using an ε-greedy strategy, expressed as:
wherein a_T is the currently selected capacitor action, A is the action space, Q_μ(s_T, a_T; θ_Q) is the Q function, s_T is the current capacitor state, θ_Q are the Q network parameters, the subscript T denotes the current time interval, ε ∈ [0, 1] is a constant, and β ∈ [0, 1] is randomly generated;
constructing a training set and a testing set based on historical state data of a voltage regulating device, and training the deep Q network intelligent agent and the depth deterministic strategy gradient intelligent agent, wherein the voltage regulating device comprises a capacitor bank, a photovoltaic inverter and an energy storage device;
And inputting the state data of the voltage regulating device obtained in real time into a trained deep Q network intelligent agent and a trained deep deterministic strategy gradient intelligent agent to obtain an output action, and optimally controlling the voltage of the power distribution network.
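To illustrate the last two steps of claim 1, training offline and then dispatching with the trained agents in real time, a hedged sketch of the two-time-scale run-time loop is given below; the grid interface and its method names are hypothetical placeholders, and the greedy capacitor choice assumes exploration is switched off at run time.

import torch

def run_control_horizon(dqn_q_net, ddpg_actor, grid, N_T, N_t):
    """Run-time dispatch over one horizon: at the start of every long
    interval T the deep Q network agent picks the capacitor switching, and
    at every short slot t the depth deterministic strategy gradient actor
    re-dispatches the photovoltaic inverters and energy storage devices.
    `grid` is a hypothetical measurement/actuation interface, not part of
    the patented method."""
    for T in range(N_T):
        s_cap = grid.long_scale_state(T)              # mean nodal P plus a_cap(T-1)
        with torch.no_grad():
            a_cap = int(dqn_q_net(s_cap.unsqueeze(0)).argmax(dim=1).item())
        grid.apply_capacitor_setting(a_cap)
        for t in range(N_t):
            s_pvbatt = grid.short_scale_state(T, t)   # [U(t), SOC(t)]
            with torch.no_grad():
                a_pvbatt = ddpg_actor(s_pvbatt)       # continuous actions in [-1, 1]
            grid.apply_pv_and_storage_setpoints(a_pvbatt)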
2. The method for autonomous optimal control of the voltage of a power distribution network according to claim 1, wherein converting the voltage optimization control problem into a Markov decision process comprises:
defining capacitor control variables, energy storage device control variables and photovoltaic inverter control variables as actions in a Markov decision process, defining the running state of the power distribution network as the state in the Markov decision process, taking a voltage control target as rewards in the Markov decision process, and converting a voltage optimization control problem into the Markov decision process;
wherein the expected discounted reward obtained after taking action a in state s under policy μ in the Markov decision process is represented by a Q function, which is expressed as:
wherein s_t and a_t are the state and action at time t, Q_μ(s_t, a_t) represents the Q value when action a_t is taken in state s_t, r_k represents the reward value of the k-th transition, γ is the discount factor, T_all is the length of each episode in the training phase, and s_0 and a_0 are initial values;
Using the Bellman equation, the Q function is expressed as:
Solving for the optimal policy is equivalent to solving for the optimal Q function:
3. The autonomous optimization control method for power distribution network voltage according to claim 1, wherein a stochastic gradient descent method is adopted in the training process, and the Q network parameters of the deep Q network agent are updated based on randomly sampled mini-batches and a loss function, wherein the loss function is expressed as:
wherein L(θ_Q) represents the loss function, M is the number of experiences in the randomly sampled mini-batch, the subscript i labels each experience in the mini-batch, i ∈ [1, M], a' represents the action of the deep Q network agent in state s_{i+1}, r_i represents the reward value, and θ'_Q are the parameters of the introduced target Q network.
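As an illustration of the mini-batch update recited in claim 3, a short PyTorch sketch of the temporal-difference loss with a separate target network is given below; the batch layout, the absence of terminal-state handling and the commented optimizer usage are assumptions of the sketch.

import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_q_net, batch, gamma):
    """Mean squared TD error over a randomly sampled mini-batch: the target
    y_i = r_i + gamma * max_a' Q(s_{i+1}, a'; theta'_Q) is evaluated with
    the target network, and the online network Q(s_i, a_i; theta_Q) is
    regressed towards it.  `batch` is assumed to hold tensors `states`
    (M, state_dim), `actions` (M,) as integer indices, `rewards` (M,),
    and `next_states` (M, state_dim)."""
    q_sa = q_net(batch["states"]).gather(1, batch["actions"].unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        y = batch["rewards"] + gamma * target_q_net(batch["next_states"]).max(dim=1).values
    return F.mse_loss(q_sa, y)

# One stochastic-gradient-descent update of the online Q network parameters:
# optimizer = torch.optim.SGD(q_net.parameters(), lr=1e-3)
# loss = dqn_loss(q_net, target_q_net, sampled_minibatch, gamma=0.99)
# optimizer.zero_grad(); loss.backward(); optimizer.step()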
4. The method for autonomous optimization control of power distribution network voltage according to claim 2, wherein the continuous control variables in the Markov decision process are the photovoltaic inverter control variables and the energy storage device control variables, and the depth deterministic strategy gradient agent is constructed based on the continuous control variables in the Markov decision process, comprising:
The depth deterministic strategy gradient agent adopts an actor network μ(s; θ_μ) and a critic network Q_μ(s, a; θ_Q) to approximate the policy function and the Q function, respectively;
the state vector of the depth deterministic strategy gradient agent is expressed as:
s_PVbatt(t) = [U^T(t), SOC^T(t)]^T
wherein s_PVbatt(t) represents the state vector of the depth deterministic strategy gradient agent at time t, U(t) represents the vector composed of the voltage amplitudes of all bus nodes, and SOC(t) represents the vector composed of the states of charge of all energy storage devices at time t;
The action vector of the depth deterministic strategy gradient agent is expressed as:
wherein a_PVbatt(t) represents the action vector of the depth deterministic strategy gradient agent at time t, composed of the vector of photovoltaic inverter control variables at time t and the vector of energy storage device control variables at time t, a_PV,i(t) represents the i-th photovoltaic inverter control variable at time t, a_batt,i(t) represents the i-th energy storage device control variable at time t, N_PV and N_batt are the total numbers of photovoltaic inverters and energy storage devices, respectively, a_PV,i(t) ∈ [-1, 1], and a_batt,i(t) ∈ [-1, 1];
The rewards of the depth deterministic strategy gradient agent are as follows:
wherein r_PVbatt(t) is the reward of the depth deterministic strategy gradient agent at time t, and U_i(t) represents the voltage amplitude of bus node i at time t;
The following policy is employed to select actions:
a_t = μ(s_t; θ_μ) + ξ_t
wherein a_t is the action selected at time t, θ_μ are the actor network parameters, and ξ_t is random noise.
5. The method of autonomous optimization control of distribution network voltage according to claim 4, wherein the critic network is updated during training by minimizing the following loss function:
wherein L(θ_Q) represents the loss function, γ is the discount factor, θ_Q are the critic network parameters, M is the number of experiences in the randomly sampled mini-batch, the subscript i labels each experience in the mini-batch, i ∈ [1, M], r_i represents the reward value, μ'(s; θ_μ') is the introduced target actor network, θ_μ' are the target actor network parameters, Q'_μ'(s, a; θ_Q') is the introduced target critic network, and θ_Q' are the target critic network parameters;
The actor network is updated using the policy gradient, expressed as:
The target network is updated using:
wherein λ < 1.
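A combined PyTorch sketch of the critic regression, the actor policy-gradient step and the soft target-network update described in claim 5 follows; it reuses the Actor and Critic classes sketched earlier, assumes the usual convex-combination form θ' ← λθ + (1 - λ)θ' for the target update, and the value of λ as well as the optimizers are illustrative choices rather than values fixed by the patent.

import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma, lam=0.005):
    """One update on a sampled mini-batch: the critic is regressed towards
    r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})), the actor follows the
    deterministic policy gradient by maximizing Q(s, mu(s)), and both
    target networks are softly updated with a factor lam < 1."""
    s, a, r, s2 = batch["states"], batch["actions"], batch["rewards"], batch["next_states"]

    # Critic: mean squared error against the target-network estimate.
    with torch.no_grad():
        y = r + gamma * target_critic(s2, target_actor(s2)).squeeze(1)
    critic_loss = F.mse_loss(critic(s, a).squeeze(1), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: ascend Q(s, mu(s)) by minimizing its negative mean.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft update theta' <- lam * theta + (1 - lam) * theta' for both targets.
    for net, target in ((actor, target_actor), (critic, target_critic)):
        for p, p_t in zip(net.parameters(), target.parameters()):
            p_t.data.mul_(1.0 - lam).add_(lam * p.data)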
6. An autonomous optimization control device for power distribution network voltage, characterized in that it is used for implementing the autonomous optimization control method for power distribution network voltage according to any one of claims 1 to 5, and the device comprises:
the conversion module is used for converting the voltage optimization control problem into a Markov decision process based on a pre-established power distribution network double-time-scale optimal voltage control model;
the intelligent agent module is used for constructing a depth Q network intelligent agent based on discrete control variables in the Markov decision process and constructing a depth deterministic strategy gradient intelligent agent based on continuous variables in the Markov decision process;
The training module is used for constructing a training set and a testing set based on historical state data of the voltage regulating device and training the depth Q network agent and the depth deterministic strategy gradient agent; the voltage regulating device comprises a capacitor bank, a photovoltaic inverter and an energy storage device;
and the optimization module is used for inputting the state data of the voltage regulating device obtained in real time into the trained deep Q network intelligent agent and the depth deterministic strategy gradient intelligent agent to obtain an output action and optimally controlling the voltage of the power distribution network.
7. The power distribution network voltage autonomous optimization control device in accordance with claim 6, further comprising a construction module,
the construction module is specifically used for,
dividing the whole time period into a plurality of intervals, defining the intervals as long time scales, dividing each interval into a plurality of time slots, defining the time slots as short time scales, and establishing a power distribution network double-time-scale optimal voltage control model as follows:
the constraint conditions to be satisfied are as follows:
P_j(T,t) = P_L,j(T,t) + P_Batt,j(T,t) - P_PV,j(T,t);
Q_j(T,t) = Q_L,j(T,t) - Q_Cap,j(T,t) - Q_PV,j(T,t);
wherein U_i(T,t) represents the voltage amplitude at bus node i, U_j(T,t) represents the voltage amplitude at bus node j, t is the short time scale, T is the long time scale, E[·] denotes the mean-value operation, N_T represents the number of long time scales, N_t represents the number of short time scales into which each long time scale is divided, N represents the number of bus nodes, a_cap,i(T) is the i-th capacitor control variable, representing the on/off state of the capacitor, a_batt,i(T,t) is the control variable of the i-th energy storage device at time t, a_pv,i(T,t) is the control variable of the i-th photovoltaic inverter at time t, Q_Cap,i(T,t) is the reactive power of the i-th capacitor at time t, bounded by the capacitor's nameplate reactive power value, SOC_i(T,t) is the state of charge of the i-th energy storage device at time t, P_Batt,i(T,t) is the charge/discharge power of the i-th energy storage device at time t, bounded by the device's maximum charge/discharge power, SOC_i,min and SOC_i,max are the minimum and maximum safe capacities of the i-th energy storage device, Q_PV,i(T,t) is the reactive output of the i-th photovoltaic inverter at time t, bounded by the maximum reactive output available at time t under the inverter's rated capacity, P_PV,i(T,t) is the active output of the i-th photovoltaic inverter at time t, P_L,j(T,t) and Q_L,j(T,t) represent the active load and reactive load of node j at time t, I_ij(T,t) is the current magnitude on branch (i,j) at time t, r_ij and x_ij are the resistance and reactance of branch (i,j), P_ij(T,t) and Q_ij(T,t) are the active and reactive power flowing from node i to node j at time t, ψ(j) and φ(j) are the parent and child bus sets, respectively, {a_cap(T)} represents the set of all capacitor control variables, {a_batt(T,t)} represents the set of all energy storage device control variables, and {a_pv(T,t)} represents the set of all photovoltaic inverter control variables.
8. The power distribution network voltage autonomous optimization control device in accordance with claim 7, wherein the conversion module is specifically configured to,
defining capacitor control variables, energy storage device control variables and photovoltaic inverter control variables as actions in a Markov decision process, defining the running state of the power distribution network as the state in the Markov decision process, taking a voltage control target as rewards in the Markov decision process, and converting a voltage optimization control problem into the Markov decision process;
wherein the expected discounted reward obtained after taking action a in state s under policy μ in the Markov decision process is represented by a Q function, which is expressed as:
wherein s_t and a_t are the state and action at time t, Q_μ(s_t, a_t) represents the Q value when action a_t is taken in state s_t, r_k represents the reward value of the k-th transition, γ is the discount factor, T_all is the length of each episode in the training phase, and s_0 and a_0 are initial values;
Using the Bellman equation, the Q function is expressed as:
Solving for the optimal policy is equivalent to solving for the optimal Q function:
9. The power distribution network voltage autonomous optimization control device in accordance with claim 8, wherein the intelligent agent module is specifically configured to,
the deep Q network agent is built based on capacitor control variables in the Markov decision process as follows:
the state vector of the deep Q network agent is expressed as:
wherein s_cap(T) represents the state vector of the deep Q network agent within the long time scale T, and consists of the vector of the average active power of each bus node within the long time scale T together with a_cap(T-1), the vector of the capacitor control variables within the long time scale T-1;
The action vector of the deep Q network agent is expressed as:
a_cap(T) = [a_cap,1(T), a_cap,2(T), …, a_cap,Ncap(T)]^T
wherein a_cap(T) represents the capacitor control variable vector within the long time scale T, a_cap,i(T) represents the i-th capacitor control variable within the long time scale T, a_cap,i(T) ∈ {0, 1}, i = 1, 2, …, Ncap, and Ncap is the total number of capacitors;
the rewards of the deep Q network agent are as follows:
wherein r_cap(T) represents the reward of the deep Q network agent within the long time scale T;
The capacitor action is selected using an ε-greedy strategy, expressed as:
wherein a_T is the currently selected capacitor action, A is the action space, Q_μ(s_T, a_T; θ_Q) is the Q function, s_T is the current capacitor state, θ_Q are the Q network parameters, the subscript T denotes the current time interval, ε ∈ [0, 1] is a constant, and β ∈ [0, 1] is randomly generated;
the depth deterministic strategy gradient agent is constructed based on the photovoltaic inverter control variable and the energy storage device control variable in the Markov decision process, and is as follows:
the depth deterministic strategy gradient agent adopts an actor network μ(s; θ_μ) and a critic network Q_μ(s, a; θ_Q) to approximate the policy function and the Q function, respectively;
the state vector of the depth deterministic strategy gradient agent is expressed as:
s_PVbatt(t) = [U^T(t), SOC^T(t)]^T
wherein s_PVbatt(t) represents the state vector of the depth deterministic strategy gradient agent at time t, U(t) represents the vector composed of the voltage amplitudes of all bus nodes, and SOC(t) represents the vector composed of the states of charge of all energy storage devices at time t;
The action vector of the depth deterministic strategy gradient agent is expressed as:
wherein a_PVbatt(t) represents the action vector of the depth deterministic strategy gradient agent at time t, composed of the vector of photovoltaic inverter control variables at time t and the vector of energy storage device control variables at time t, a_PV,i(t) represents the i-th photovoltaic inverter control variable at time t, a_batt,i(t) represents the i-th energy storage device control variable at time t, N_PV and N_batt are the total numbers of photovoltaic inverters and energy storage devices, respectively, a_PV,i(t) ∈ [-1, 1], and a_batt,i(t) ∈ [-1, 1];
The rewards of the depth deterministic strategy gradient agent are as follows:
wherein r_PVbatt(t) is the reward of the depth deterministic strategy gradient agent at time t, and U_i(t) represents the voltage amplitude of bus node i at time t;
The following policy is employed to select actions:
a_t = μ(s_t; θ_μ) + ξ_t
wherein a_t is the action selected at time t, θ_μ are the actor network parameters, and ξ_t is random noise.
CN202111054034.5A 2021-09-09 2021-09-09 Autonomous optimization control method and device for power distribution network voltage Active CN113872213B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111054034.5A CN113872213B (en) 2021-09-09 2021-09-09 Autonomous optimization control method and device for power distribution network voltage

Publications (2)

Publication Number Publication Date
CN113872213A CN113872213A (en) 2021-12-31
CN113872213B (en) 2023-08-29

Family

ID=78994982

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111054034.5A Active CN113872213B (en) 2021-09-09 2021-09-09 Autonomous optimization control method and device for power distribution network voltage

Country Status (1)

Country Link
CN (1) CN113872213B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114725936B (en) * 2022-04-21 2023-04-18 电子科技大学 Power distribution network optimization method based on multi-agent deep reinforcement learning
CN116388280A (en) * 2023-06-02 2023-07-04 电力规划总院有限公司 Comprehensive energy system voltage control method and system based on deep reinforcement learning algorithm

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200119556A1 (en) * 2018-10-11 2020-04-16 Di Shi Autonomous Voltage Control for Power System Using Deep Reinforcement Learning Considering N-1 Contingency
US20210143639A1 (en) * 2019-11-08 2021-05-13 Global Energy Interconnection Research Institute Co. Ltd Systems and methods of autonomous voltage control in electric power systems

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112994022A (en) * 2021-03-16 2021-06-18 南京邮电大学 Source-storage-load distributed cooperative voltage control method and system thereof
CN113363997A (en) * 2021-05-28 2021-09-07 浙江大学 Reactive voltage control method based on multi-time scale and multi-agent deep reinforcement learning
CN113363998A (en) * 2021-06-21 2021-09-07 东南大学 Power distribution network voltage control method based on multi-agent deep reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Coordinated optimization of active distribution network based on the deep deterministic policy gradient algorithm; Gong Jinxia; Liu Yanmin; Automation of Electric Power Systems (Issue 06); full text *

Also Published As

Publication number Publication date
CN113872213A (en) 2021-12-31

Similar Documents

Publication Publication Date Title
Li et al. Coordinated load frequency control of multi-area integrated energy system using multi-agent deep reinforcement learning
CN113363997B (en) Reactive voltage control method based on multi-time scale and multi-agent deep reinforcement learning
Gorostiza et al. Deep reinforcement learning-based controller for SOC management of multi-electrical energy storage system
Sun et al. A customized voltage control strategy for electric vehicles in distribution networks with reinforcement learning method
Li et al. Learning to operate distribution networks with safe deep reinforcement learning
CN113363998B (en) Power distribution network voltage control method based on multi-agent deep reinforcement learning
US20210143639A1 (en) Systems and methods of autonomous voltage control in electric power systems
CN112615379A (en) Power grid multi-section power automatic control method based on distributed multi-agent reinforcement learning
Erlich et al. Optimal dispatch of reactive sources in wind farms
CN112507614B (en) Comprehensive optimization method for power grid in distributed power supply high-permeability area
CN114362196B (en) Multi-time-scale active power distribution network voltage control method
CN113872213B (en) Autonomous optimization control method and device for power distribution network voltage
CN107516892A (en) The method that the quality of power supply is improved based on processing active optimization constraints
Li et al. Day-ahead optimal dispatch strategy for active distribution network based on improved deep reinforcement learning
CN110165714A (en) Micro-capacitance sensor integration scheduling and control method, computer readable storage medium based on limit dynamic programming algorithm
Yin et al. Expandable deep width learning for voltage control of three-state energy model based smart grids containing flexible energy sources
Yin et al. Sequential reconfiguration of unbalanced distribution network with soft open points based on deep reinforcement learning
CN115313403A (en) Real-time voltage regulation and control method based on deep reinforcement learning algorithm
CN115588998A (en) Graph reinforcement learning-based power distribution network voltage reactive power optimization method
CN113097994A (en) Power grid operation mode adjusting method and device based on multiple reinforcement learning agents
Liu et al. An AGC dynamic optimization method based on proximal policy optimization
Liu et al. Deep reinforcement learning-based voltage control method for distribution network with high penetration of renewable energy
Yang et al. Genetic Algorithm for PI Controller Design of Grid-connected Inverter based on Multilayer Perceptron Model
CN114298429A (en) Power distribution network scheme aided decision-making method, system, device and storage medium
Wang et al. Real-time excitation control-based voltage regulation using ddpg considering system dynamic performance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant