CN111799808A - Power grid reactive voltage distributed control method and system - Google Patents

Power grid reactive voltage distributed control method and system

Info

Publication number
CN111799808A
Authority
CN
China
Prior art keywords
reactive voltage
reactive
neural network
region
grid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010581959.4A
Other languages
Chinese (zh)
Other versions
CN111799808B (en)
Inventor
吴文传
刘昊天
孙宏斌
王彬
郭庆来
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202010581959.4A
Publication of CN111799808A
Application granted
Publication of CN111799808B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/12 Circuit arrangements for ac mains or ac distribution networks for adjusting voltage in ac networks by changing a characteristic of the network load
    • H02J3/16 Circuit arrangements for ac mains or ac distribution networks for adjusting voltage in ac networks by changing a characteristic of the network load by adjustment of reactive power
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J13/00 Circuit arrangements for providing remote indication of network conditions, e.g. an instantaneous record of the open or closed condition of each circuitbreaker in the network; Circuit arrangements for providing remote control of switching means in a power distribution network, e.g. switching in and out of current consumers by using a pulse code signal carried by the network
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00 Details relating to the application field
    • G06F2113/04 Power grid distribution networks
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00 Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/10 Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00 Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20 Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00 Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20 The dispersed energy generation being of renewable origin
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E40/00 Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E40/30 Reactive power compensation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E40/00 Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E40/70 Smart grids as climate change mitigation technology in the energy generation sector
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention provides a distributed reactive voltage control method for a power grid, comprising: formulating a reactive voltage control objective for each controlled region and establishing a reactive voltage optimization model, based on the overall reactive voltage control objective and optimization model of the controlled power grid; constructing a multi-agent interactive training framework based on a Markov game, combining the optimization model with the actual configuration of the power grid; initializing each neural network and the related control process variables and sending them to each controlled region; the local controllers of all regions executing the control step in parallel according to the received policy neural networks; the local controllers of all regions executing the sample upload step in parallel, uploading measurement samples to a cloud server; and the cloud server learning the policies of all controllers in parallel and sending the updated policies to the regional controllers. The invention achieves flexible reactive voltage control and optimal control when the grid model is incomplete.

Description

Power grid reactive voltage distributed control method and system
Technical Field
The invention belongs to the technical field of operation and control of power systems, and particularly relates to a power grid reactive voltage distributed control method and system.
Background
Driven by energy and environmental concerns, the share of clean, distributed renewable generation (distributed generation, DG for short) in power grids is growing steadily, and grid integration of large-scale, high-penetration DG has become a frontier and focus of the energy and power field. Because DG is highly dispersed and strongly fluctuating, it has a series of negative effects on the voltage quality and dispatch operation of distribution networks and even transmission networks. DG is usually connected to the grid through power electronic inverters, which offer flexible, fast regulation capability. To control DG efficiently and improve the voltage quality of high-penetration grids, reactive voltage control has become an important topic in grid regulation and operation. In a traditional power grid, reactive voltage control is usually implemented by centralized optimization based on a grid model, which eliminates voltage limit violations while reducing the network loss of the controlled grid.
However, centralized optimization control often suffers from key problems such as a single point of failure, heavy communication and computation burdens, and sensitivity to communication delay. In high-penetration grids in particular, the controlled DG units are numerous and the network structure is complex, so centralized control is severely limited and cannot make proper use of fast-responding resources. A series of distributed reactive voltage control methods have therefore been developed; compared with centralized methods, distributed methods typically place weaker requirements on communication and control faster.
However, existing distributed control usually relies on model-based optimization. Because an ideal model of the power grid is difficult to obtain, model-based optimization cannot guarantee the control effect: control instructions are often far from the optimum and the grid operates in a suboptimal state, and the requirements of efficient, safe control are even harder to meet under continuous online operation.
Providing a power grid reactive voltage control method with high safety, high efficiency, and high flexibility is therefore an urgent technical problem in the art.
Disclosure of Invention
To solve the above problems, the present invention provides a distributed reactive voltage control method for a power grid, comprising:
Step 1: according to the overall reactive voltage control objective and optimization model of the controlled power grid, formulate the reactive voltage control objective of each controlled region and establish a reactive voltage optimization model;
Step 2: construct a multi-agent interactive training framework based on a Markov game, combining the optimization model with the actual configuration of the power grid;
Step 3: initialize each neural network and the related control process variables and send them to each controlled region;
Step 4: the local controllers of all regions execute the control step in parallel according to the received policy neural networks;
Step 5: the local controllers of all regions execute the sample upload step in parallel and upload measurement samples to the cloud server;
Step 6: the cloud server learns the policies of all controllers in parallel and sends the updated policies to the regional controllers;
Step 7: repeat steps 4, 5, and 6.
Further, step 1 comprises:
Step 1-1: establish the overall reactive voltage control objective and optimization model of the controlled power grid:
Figure BDA0002552639940000021
where
Figure BDA0002552639940000022
is the set of all nodes of the power grid; $V_j$ is the voltage magnitude of node j; $P_j$ is the active power output of node j; $Q_{Gj}$ is the DG reactive power output of node j; $Q_{Cj}$ is the SVC reactive power output of node j;
Figure BDA0002552639940000023
are the lower and upper voltage limits of node j, respectively;
Figure BDA0002552639940000024
are the lower and upper limits of the SVC reactive power output of node j, respectively; $S_{Gj}$ and $P_{Gj}$ are the DG installed capacity and active power output of node j, respectively;
Step 1-2: split the reactive voltage control objective and optimization model into the reactive voltage control objective and optimization model of each controlled region:
Figure BDA0002552639940000031
where
Figure BDA0002552639940000032
is the complete set of nodes of the i-th region and
Figure BDA0002552639940000033
is the network output power of the i-th region.
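The bounds named in step 1-1 can be illustrated with a minimal sketch. The constraint equations themselves are not recoverable from this text, so the sketch assumes the common inverter capability bound $|Q_{Gj}| \le \sqrt{S_{Gj}^2 - P_{Gj}^2}$ for DG and a simple box constraint for the SVC; the function names `dg_reactive_limit` and `clip_setpoints` are hypothetical.

```python
import math


def dg_reactive_limit(s_g: float, p_g: float) -> float:
    """Largest |Q_G| available from a DG inverter of capacity s_g producing p_g.

    Assumption: the usual apparent-power bound sqrt(S^2 - P^2).
    """
    if abs(p_g) > s_g:
        raise ValueError("active power exceeds installed capacity")
    return math.sqrt(s_g ** 2 - p_g ** 2)


def clip_setpoints(q_g, q_c, s_g, p_g, q_c_min, q_c_max):
    """Project a requested (Q_G, Q_C) pair onto the assumed feasible ranges."""
    q_g_max = dg_reactive_limit(s_g, p_g)
    q_g = max(-q_g_max, min(q_g_max, q_g))   # DG: symmetric capability limit
    q_c = max(q_c_min, min(q_c_max, q_c))    # SVC: lower and upper limits
    return q_g, q_c
```

For instance, an inverter with installed capacity 5 and active output 3 (same units) leaves a reactive headroom of 4 under this assumption.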
Further, step 2 comprises:
Step 2-1: corresponding to the system measurements of each region, construct the observation variable $o_{i,t}$ of each region:
Figure BDA0002552639940000034
where $P_i$ and $Q_i$ are the vectors of active and reactive power injections at the nodes of the i-th region; $V_i$ is the vector of node voltages of the i-th region;
Figure BDA0002552639940000035
are the network output active and reactive power of the i-th region; t is the discrete time variable of the control process;
Step 2-2: corresponding to the reactive voltage optimization objective of each region, construct the unified feedback variable $r_t$ of all regions:
Figure BDA0002552639940000036
where $P_j$ is the active power output of node j and
Figure BDA0002552639940000037
is the network output active power of region i;
Step 2-3: corresponding to the reactive voltage optimization constraints of each region, construct the constraint feedback variable of each region
Figure BDA0002552639940000038
as follows:
Figure BDA0002552639940000039
where $[x]_+ = \max(0, x)$; $\beta_i$ is the cooperation coefficient of the i-th region; $V_j(t)$ is the voltage of node j at time t;
Figure BDA00025526399400000310
is the upper voltage limit and $\underline{V}$ is the lower voltage limit;
Step 2-4: corresponding to the reactive power of the controllable flexible resources, construct the action variable $a_{i,t}$ of each region:
$a_{i,t} = (Q_{Gi}, Q_{Ci})_t$ (0.25)
where $Q_{Gi}$ and $Q_{Ci}$ are the vectors of DG and SVC reactive power output of the i-th region, respectively.
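The feedback variables of steps 2-2 and 2-3 can be sketched as follows. The exact expressions are not recoverable from this text, so everything beyond $[x]_+ = \max(0, x)$ is an assumption: the unified feedback is taken as the negative sum of nodal active power injections (a proxy for network loss), and the constraint feedback is treated as a cost that adds a region's own voltage violation to $\beta_i$ times the violation in the other regions. The per-unit limits 0.95 and 1.05 and all function names are hypothetical.

```python
def pos(x: float) -> float:
    """[x]_+ = max(0, x), as defined in the text."""
    return max(0.0, x)


def voltage_violation(voltages, v_min, v_max):
    """Total amount by which node voltages leave the band [v_min, v_max]."""
    return sum(pos(v - v_max) + pos(v_min - v) for v in voltages)


def constraint_feedback(region_voltages, i, beta_i, v_min=0.95, v_max=1.05):
    """Assumed form: own violation plus beta_i times the violation elsewhere."""
    own = voltage_violation(region_voltages[i], v_min, v_max)
    others = sum(voltage_violation(v, v_min, v_max)
                 for k, v in enumerate(region_voltages) if k != i)
    return own + beta_i * others


def unified_feedback(node_active_power):
    """Assumed form of r_t: negative total of nodal active power injections."""
    return -sum(node_active_power)
```

Under this reading, a cooperation coefficient of 0 makes a region care only about its own voltages, while a value of 1 weights every region's violations equally.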
Further, step 3 comprises:
Step 3-1: initialize each neural network and the related control process variables and send them to each controlled region;
Step 3-2: initialize the Lagrange multiplier $\lambda_i$ of each region, which is a scalar;
Step 3-3: send the initial policy neural networks
Figure BDA0002552639940000041
and
Figure BDA0002552639940000042
to the controller of region i through the communication network;
Step 3-4: initialize the discrete time variable $t = 0$; the actual time interval between two steps is $\Delta t$;
Step 3-5: initialize the policy update period $T_u$, so that a policy update is performed once every $T_u \Delta t$;
Step 3-6: initialize the sample upload period $T_s$ and the sample upload number $m \in [1, T_s]$, so that every $T_s \Delta t$ each controller performs one upload and uploads m samples from the preceding upload period;
Step 3-7: initialize the cloud server experience base
Figure BDA0002552639940000043
and the local cache experience base of each controller
Figure BDA0002552639940000044
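The timing variables of steps 3-4 to 3-6 can be illustrated with a short sketch that lists which periodic tasks fall due at a given discrete step. It only shows the periodicities; in the method itself control, upload, and learning run in parallel. The function names `due_tasks` and `check_upload_count` are hypothetical.

```python
def due_tasks(t: int, t_u: int, t_s: int) -> list:
    """Tasks due at discrete step t, where one step lasts delta_t."""
    tasks = ["control"]                  # a control step runs at every t
    if t > 0 and t % t_s == 0:
        tasks.append("upload")           # once every T_s * delta_t
    if t > 0 and t % t_u == 0:
        tasks.append("update")           # once every T_u * delta_t
    return tasks


def check_upload_count(m: int, t_s: int) -> int:
    """m must lie in [1, T_s], as stated in step 3-6."""
    if not 1 <= m <= t_s:
        raise ValueError("m must satisfy 1 <= m <= T_s")
    return m
```

With $T_u = 3$ and $T_s = 2$, for example, step 6 is the first step at which both an upload and a policy update fall due.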
Further, step 3-1 comprises:
Step 3-1-1: define the neural network
Figure BDA0002552639940000045
as a neural network that takes $(o_{i,t}, a_{i,t})$ as input and outputs a single scalar value, with ReLU as the activation function; denote the network parameters of
Figure BDA0002552639940000046
as $\phi_i$ and the corresponding frozen parameters as
Figure BDA0002552639940000047
and randomly initialize $\phi_i$ and
Figure BDA0002552639940000048
Step 3-1-2: define the neural network
Figure BDA0002552639940000049
as a neural network that takes $(o_{i,t}, a_{i,t})$ as input and outputs a single scalar value, with ReLU as the activation function; denote the network parameters of
Figure BDA00025526399400000410
as
Figure BDA00025526399400000411
and the corresponding frozen parameters as
Figure BDA00025526399400000412
and randomly initialize
Figure BDA00025526399400000413
and
Figure BDA00025526399400000414
Step 3-1-3: define
Figure BDA00025526399400000415
and
Figure BDA00025526399400000416
as two neural networks that take $o_{i,t}$ as input and output vectors of the same shape as the action $a_{i,t}$;
Figure BDA00025526399400000417
and
Figure BDA00025526399400000418
have separate output layers but share the same input layer and hidden layers, with ReLU as the activation function; denote all network parameters of
Figure BDA00025526399400000419
and
Figure BDA00025526399400000420
as $\theta_i$ and randomly initialize $\theta_i$.
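The network shapes described in step 3-1 can be sketched in plain Python: a value network mapping $(o_{i,t}, a_{i,t})$ to a scalar, and a policy network with one shared ReLU trunk and two separate output layers, each shaped like the action. Layer sizes, the uniform initialization range, the single hidden layer, and all function names are assumptions made for illustration.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def init_layer(n_in, n_out, rng):
    """One dense layer with small random weights and zero biases."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return {"w": w, "b": [0.0] * n_out}


def dense(layer, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) + b
            for row, b in zip(layer["w"], layer["b"])]


def init_critic(obs_dim, act_dim, hidden, rng):
    """Scalar-output network over the concatenated (observation, action)."""
    return [init_layer(obs_dim + act_dim, hidden, rng),
            init_layer(hidden, 1, rng)]


def critic_forward(params, obs, act):
    h = relu(dense(params[0], list(obs) + list(act)))
    return dense(params[1], h)[0]


def init_policy(obs_dim, act_dim, hidden, rng):
    """Shared input/hidden layer with two independent output layers."""
    return {"trunk": init_layer(obs_dim, hidden, rng),
            "head_a": init_layer(hidden, act_dim, rng),
            "head_b": init_layer(hidden, act_dim, rng)}


def policy_forward(params, obs):
    h = relu(dense(params["trunk"], obs))
    return dense(params["head_a"], h), dense(params["head_b"], h)
```

Sharing the trunk means both policy outputs are computed from the same learned features, which is why a single parameter set $\theta_i$ covers both networks.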
Further, step 4 comprises:
Step 4-1: obtain measurement signals from the measurement devices of the regional power grid and form the corresponding observation variable $o_{i,t}$;
Step 4-2: according to the local policy neural networks
Figure BDA00025526399400000421
and
Figure BDA00025526399400000422
generate the action $a_{i,t}$ for the current time step:
Figure BDA0002552639940000051
Step 4-3: the controller sends $a_{i,t}$ to the locally controlled flexible resources, such as DG nodes and SVC nodes;
Step 4-4: store $(o_{i,t}, a_{i,t})$ in
Figure BDA0002552639940000052
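The local control step can be sketched as below. The action formula is not recoverable from this text, so the sketch assumes that the two policy outputs act as the mean and log standard deviation of a Gaussian, that the sample is squashed with tanh, and that the result is scaled to each device's reactive power range. The names `sample_action`, `control_step`, `q_min`, and `q_max` are hypothetical, and `policy` is any callable returning the two output vectors.

```python
import math
import random


def sample_action(mu, log_sigma, q_min, q_max, rng):
    """Assumed squashed-Gaussian sample, scaled per component to [q_min, q_max]."""
    action = []
    for m, ls, lo, hi in zip(mu, log_sigma, q_min, q_max):
        xi = rng.gauss(0.0, 1.0)                  # exploration noise
        u = math.tanh(m + math.exp(ls) * xi)      # lies in [-1, 1]
        action.append(lo + 0.5 * (u + 1.0) * (hi - lo))
    return action


def control_step(obs, policy, q_min, q_max, local_cache, rng):
    """Steps 4-2 to 4-4: act on the local observation and cache the sample."""
    mu, log_sigma = policy(obs)
    act = sample_action(mu, log_sigma, q_min, q_max, rng)
    local_cache.append((list(obs), act))          # stored for a later upload
    return act
```

Nothing in this step touches the network link, which matches the statement that a controller needs no communication while executing control.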
Further, step 5 comprises:
Step 5-1: upload m + 1 samples from
Figure BDA0002552639940000053
to the experience base $D_i$ of the cloud server;
Step 5-2: empty
Figure BDA0002552639940000054
Step 5-3: on the cloud server, compute $r_t$ and
Figure BDA0002552639940000055
for the first m groups of data uploaded in the current round;
Step 5-4: if a communication fault prevents a region from uploading its samples, that upload can simply be skipped.
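The upload step can be sketched as follows. Two points are assumptions rather than statements of the source: the m + 1 uploaded samples are taken to be the most recent ones, and the extra sample is presumed to supply the next observation for the m-th transition. On a failed link the sketch leaves the local cache untouched. The function names are hypothetical.

```python
def upload_samples(local_cache: list, cloud_base: list, m: int,
                   link_ok: bool = True) -> int:
    """Steps 5-1, 5-2, 5-4: returns the number of samples uploaded."""
    if not link_ok:
        return 0                          # step 5-4: skip this upload
    batch = local_cache[-(m + 1):]        # assumed: the most recent m + 1 samples
    cloud_base.extend(batch)
    local_cache.clear()                   # step 5-2: older samples are discarded
    return len(batch)


def to_transitions(batch: list) -> list:
    """Pair each of the first len(batch) - 1 samples with the next observation."""
    return [(o, a, batch[k + 1][0]) for k, (o, a) in enumerate(batch[:-1])]
```

Because a skipped upload changes nothing on the cloud side, learning simply continues from the samples already held there.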
Further, step 6 comprises:
Step 6-1: sample a batch of experiences
Figure BDA0002552639940000056
of size B from the experience base $D_i$;
Step 6-2: compute the loss function of the parameters $\phi_i$:
Figure BDA0002552639940000057
where $x = (o_1, \ldots, o_N)$ is the observation of all regions; $x'$ is the observation at the next time step corresponding to $x$; $a_1, \ldots, a_N$ are the action vectors of regions 1 to N, respectively;
Figure BDA0002552639940000058
is obtained from
Figure BDA0002552639940000059
and $y_i$ is:
Figure BDA00025526399400000510
where $\gamma$ is the discount factor; $\alpha_i$ is the entropy maximization factor of region i;
Figure BDA00025526399400000511
is the probability of taking the value
Figure BDA00025526399400000512
and
Figure BDA00025526399400000513
is:
Figure BDA00025526399400000514
where $\odot$ denotes element-wise multiplication and $o'_i$ is the observation of region i at the next time step;
Step 6-3: update the parameters $\phi_i$:
Figure BDA00025526399400000515
where $\rho_i$ is the learning step size and
Figure BDA0002552639940000061
denotes the gradient with respect to $\phi_i$;
Step 6-4: compute the loss function of the parameters
Figure BDA0002552639940000062
as follows:
Figure BDA0002552639940000063
where
Figure BDA0002552639940000064
is:
Figure BDA0002552639940000065
Step 6-5: update the parameters
Figure BDA0002552639940000066
as follows:
Figure BDA0002552639940000067
Step 6-6: compute the Lagrangian function:
Figure BDA0002552639940000068
where
Figure BDA0002552639940000069
is the limit on the degree of voltage violation and
Figure BDA00025526399400000610
is:
Figure BDA00025526399400000611
Step 6-7: update the parameters $\theta_i$:
Figure BDA00025526399400000612
Step 6-8: update the parameter $\lambda_i$:
Figure BDA00025526399400000613
Step 6-9: update the frozen parameters
Figure BDA00025526399400000614
and
Figure BDA00025526399400000615
as follows:
Figure BDA00025526399400000616
where $\eta$ is the freezing coefficient;
Step 6-10: send the updated policy neural networks
Figure BDA00025526399400000617
and
Figure BDA00025526399400000618
to region i.
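Three of the scalar updates in step 6 can be illustrated with a minimal sketch. The formulas themselves are not recoverable from this text, so the sketch assumes the forms that are standard in entropy-regularized actor-critic and Lagrangian methods: a discounted target with an entropy bonus, projected dual ascent for $\lambda_i$, and a weighted average for the frozen parameters in which $\eta$ weights the live parameters. The direction of the $\eta$ weighting and all function names are assumptions.

```python
def td_target(r, q_next_frozen, log_prob_next, gamma, alpha):
    """Assumed y = r + gamma * (Q_frozen(x', a') - alpha * log pi(a' | o'))."""
    return r + gamma * (q_next_frozen - alpha * log_prob_next)


def critic_loss(q_values, targets):
    """Mean squared error between value estimates and their targets."""
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(q_values)


def dual_ascent(lmbda, constraint_value, limit, step):
    """Raise lambda when the constraint estimate exceeds its limit; keep it >= 0."""
    return max(0.0, lmbda + step * (constraint_value - limit))


def soft_update(frozen, live, eta):
    """Move each frozen parameter a fraction eta toward its live counterpart."""
    return [eta * p + (1.0 - eta) * f for f, p in zip(frozen, live)]
```

Under these assumptions a small $\eta$ makes the frozen parameters change slowly, which keeps the learning targets stable between updates.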
Further, the step 4, the step 5 and the step 6 are executed in parallel.
The invention also provides a distributed reactive voltage control system for a power grid, comprising:
a model building module, configured to formulate the reactive voltage control objective of each controlled region and establish a reactive voltage optimization model according to the overall reactive voltage control objective and optimization model of the controlled power grid;
a training framework construction module, configured to construct a multi-agent interactive training framework based on a Markov game, combining the optimization model with the actual configuration of the power grid;
an initialization module, configured to initialize each neural network and the related control process variables and send them to each controlled region;
a controller module, deployed locally in each region and configured to execute the control step in parallel according to the received policy neural networks;
a sample upload module, deployed locally in each region and configured to execute the sample upload step in parallel and upload measurement samples to the cloud server;
a policy learning module, deployed on the cloud server and configured to learn the policy of each controller in parallel and send the updated policies to the regional controllers;
wherein the controller module, the sample upload module, and the policy learning module are invoked and executed repeatedly.
The advantages and beneficial effects of the invention are as follows:
when executing a control operation, each regional controller does not need to communicate with the cloud server or with other controllers; it can quickly generate control instructions from its stored policy neural network, making efficient use of fast flexible resources and improving the efficiency of reactive voltage control;
all controllers run in parallel, and the three steps of local control, sample upload, and centralized learning also run in parallel, so communication and computing resources are fully used and robustness to communication and computing conditions is good;
based on multi-agent deep reinforcement learning, no accurate grid model needs to be built; the characteristics of the grid are learned from control process data alone, model-free optimization is performed, and the reactive power distribution of the grid can be steered to an optimized state even when the model is incomplete;
compared with other distributed learning methods, centralized learning greatly reduces the computing cost of each controller and improves the utilization of cloud computing resources;
compared with existing grid optimization methods based on multi-agent reinforcement learning, the method offers high sample efficiency, high voltage safety, a simple control structure, and lower implementation cost.
With the power grid reactive voltage distributed control method and system, distributed control provides fast, flexible, and communication-robust reactive voltage control, while online learning from control process data with deep reinforcement learning provides optimal reactive voltage control when the model is incomplete. The method and system can meet the requirement of continuous online operation of grid reactive voltage control, greatly improve the voltage quality of the grid, and reduce its operating loss.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 shows a flow chart of a power grid reactive voltage distributed control method according to an embodiment of the invention;
Fig. 2 shows a block diagram of a power grid reactive voltage distributed control system according to an embodiment of the invention;
Fig. 3 shows a schematic structural diagram of the modules of the power grid reactive voltage distributed control system according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a distributed reactive voltage control method for a power grid, specifically one based on multi-agent deep reinforcement learning, which, as shown in Fig. 1, comprises the following steps:
Step 1: according to the overall reactive voltage control objective and optimization model of the controlled power grid, formulate the reactive voltage control objective of each controlled region and establish a reactive voltage optimization model;
Step 2: construct a multi-agent interactive training framework based on a Markov game, combining the optimization model with the actual configuration of the power grid;
Step 3: initialize each neural network and the related control process variables and send them to each controlled region;
Step 4: the local controllers of all regions execute the control step in parallel according to the received policy neural networks;
Step 5: the local controllers of all regions execute the sampling step in parallel and upload measurement samples to the cloud server;
Step 6: the cloud server learns the policies of all controllers in parallel and sends the updated policies to the regional controllers;
Step 7: repeat steps 4, 5, and 6, executing them in parallel.
The specific implementation of each step is described in detail below.
In step 1, the reactive voltage control objective of each controlled region is formulated and the reactive voltage optimization model is established according to the overall reactive voltage control objective and optimization model of the controlled power grid. This step may be performed at the regional grid regulation center shown in Fig. 2, and in particular on a cloud server.
Step 1 comprises the following sub-steps:
Step 1-1: establish the overall reactive voltage control objective and optimization model of the controlled power grid:
Figure BDA0002552639940000091
where
Figure BDA0002552639940000092
is the set of all nodes of the power grid; $V_j$ is the voltage magnitude of node j; $P_j$ is the active power output of node j; $Q_{Gj}$ is the DG reactive power output of node j; $Q_{Cj}$ is the SVC (Static Var Compensator) reactive power output of node j;
Figure BDA0002552639940000093
are the lower and upper voltage limits of node j, respectively;
Figure BDA0002552639940000094
are the lower and upper limits of the SVC reactive power output of node j, respectively; $S_{Gj}$ and $P_{Gj}$ are the DG installed capacity and active power output of node j, respectively.
Step 1-2: split the reactive voltage control objective and optimization model into the reactive voltage control objective and optimization model of each controlled region.
As shown in Fig. 2, the controlled grid is divided into N regions according to the actual installation of controllers. Each region contains a number of nodes, which illustratively include DG nodes and SVC nodes, with branches connecting the nodes. Each region is equipped with a local controller: controlled region 1 with controlled-region controller 1, controlled region 2 with controlled-region controller 2, and so on up to controlled region N with controlled-region controller N. The controlled-region controller, or controller for short, can quickly obtain the measurement signals of its region. The controller also communicates with the cloud server of the regional grid regulation center, or cloud server for short. In the embodiment of the invention, the cloud server may comprise one or more computing devices. Specifically, the controller obtains voltage, current, power, and other measurements through the measuring devices installed at the nodes and uploads sample data of the reactive voltage control process to the cloud server. The controller also receives the reactive voltage control policy for its region from the cloud server and sends control signals to the nodes.
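The arrangement of regions, local controllers, and the cloud server can be pictured with a small data-structure sketch. The class names, field names, and the use of an integer version number to stand for a downloaded policy are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RegionController:
    """Local controller of one controlled region."""
    region_id: int
    nodes: List[int]
    policy_version: int = 0
    local_cache: list = field(default_factory=list)


@dataclass
class CloudServer:
    """Holds one experience base per region and the latest policy versions."""
    experience: Dict[int, list] = field(default_factory=dict)
    latest_policy: Dict[int, int] = field(default_factory=dict)

    def receive(self, region_id: int, samples: list) -> None:
        self.experience.setdefault(region_id, []).extend(samples)

    def publish(self, controller: RegionController) -> None:
        controller.policy_version = self.latest_policy.get(
            controller.region_id, controller.policy_version)


def partition(node_to_region: Dict[int, int]) -> Dict[int, List[int]]:
    """Group node ids by the region whose controller covers them."""
    regions: Dict[int, List[int]] = {}
    for node, region in sorted(node_to_region.items()):
        regions.setdefault(region, []).append(node)
    return regions
```

Keeping a separate experience base per region mirrors the text: each controller uploads only its own samples and receives only its own policy.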
In the embodiment of the invention, for the i-th controlled region, $i \in [1, N]$, the reactive voltage control objective and optimization model are split into the reactive voltage control objectives and optimization models of the N controlled regions:
Figure BDA0002552639940000101
where
Figure BDA0002552639940000102
is the complete set of nodes of the i-th region and
Figure BDA0002552639940000103
is the network output power of the i-th region. In the embodiments of the invention, identical symbols carry identical physical meanings; for example, $S_{Gj}$ and $P_{Gj}$ are the DG installed capacity and active power output of node j, respectively, where node j is a node
Figure BDA0002552639940000104
Step 2: Construct a Markov-game-based multi-agent interactive training framework by combining the optimization model with the actual configuration of the grid.
Step 2-1: Corresponding to the system measurements of each region, construct the observation variable $o_{i,t}$ of each region, as shown in (0.41):
Figure BDA0002552639940000111
where $P_i$ and $Q_i$ are the vectors of active and reactive power injections at the nodes of the i-th region; $V_i$ is the vector of node voltages of the i-th region;
Figure BDA0002552639940000112
are the network output active and reactive power of the i-th region; t is the discrete time variable of the control process.
Step 2-2: Corresponding to the reactive voltage optimization target of each region, construct the unified feedback variable $r_t$ of the regions, as shown in (0.42):
Figure BDA0002552639940000113
where $P_j$ is the active power output of node j, and
Figure BDA0002552639940000114
is the network output active power of region i.
Step 2-3: Corresponding to the reactive voltage optimization constraints of each region, construct the constraint feedback variable of each region
Figure BDA0002552639940000115
as shown in (0.43):
Figure BDA0002552639940000116
where $[x]_+ = \max(0, x)$; $\beta_i$ is the cooperation coefficient of the i-th region; $V_j(t)$ is the voltage of node j at time t;
Figure BDA0002552639940000117
denotes the upper voltage limit and $\underline{V}$ the lower voltage limit. The voltage limits are generally the same at every node, although they may differ in particular cases; here, following convention, a single upper limit and a single lower limit are used for all nodes;
Step 2-4: Corresponding to the reactive power of the controllable flexible resources, construct the action variable $a_{i,t}$ of each region, as shown in (0.44):
$a_{i,t} = (Q_{Gi}, Q_{Ci})_t$ (0.44)
where $Q_{Gi}$ and $Q_{Ci}$ are the vectors of DG and SVC reactive power outputs of the i-th region, respectively.
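Illustratively, the quantities of Step 2 may be assembled as in the following minimal Python sketch. It is only a sketch under stated assumptions: the function names, the flat-list layout of the observation, and the plain sum of limit violations are hypothetical, and the exact feedback formulas (including the cooperation coefficient) are those of the equations above, not of this code.

```python
from typing import List, Sequence


def build_observation(p: Sequence[float], q: Sequence[float], v: Sequence[float],
                      p_out: float, q_out: float) -> List[float]:
    """Concatenate regional measurements (P_i, Q_i, V_i and the network
    output active/reactive power) into one flat observation vector."""
    return [*p, *q, *v, p_out, q_out]


def voltage_violation(v: Sequence[float], v_min: float, v_max: float) -> float:
    """Sum of [V_j - v_max]_+ + [v_min - V_j]_+ over the nodes of one region.
    Assumption: an unweighted sum; the source weights terms with beta_i."""
    return sum(max(0.0, vj - v_max) + max(0.0, v_min - vj) for vj in v)


def build_action(q_dg: Sequence[float], q_svc: Sequence[float]) -> List[float]:
    """Action a_{i,t} = (Q_Gi, Q_Ci): DG and SVC reactive power outputs."""
    return [*q_dg, *q_svc]


# Hypothetical two-node region, per-unit values.
obs = build_observation(p=[0.5, -0.25], q=[0.125, 0.0], v=[1.0, 1.0625],
                        p_out=0.25, q_out=0.125)
violation = voltage_violation([1.0, 1.0625, 0.9375], v_min=0.95, v_max=1.05)
action = build_action(q_dg=[0.1], q_svc=[0.2, 0.3])
```

The violation is zero whenever every node voltage lies within the limits, which is the property the constraint feedback relies on.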
Step 3: Initialize each neural network and the related control process variables;
Step 3-1: Initialize each neural network and the related control process variables and issue them to each controlled region. First, initialize the neural networks corresponding to each region and store them on the cloud server, as follows:
Step 3-1-1: Define the neural network
Figure BDA0002552639940000121
as a neural network that takes $(o_{i,t}, a_{i,t})$ as input and outputs a single scalar value. It comprises several hidden layers (typically 2), each containing several neurons (typically 512), and uses the ReLU activation function, $\mathrm{ReLU}(x) = \max(0, x)$. Denote the network parameters of
Figure BDA0002552639940000122
as $\phi_i$ and the corresponding frozen parameters as
Figure BDA0002552639940000123
and randomly initialize $\phi_i$ and
Figure BDA0002552639940000124
Step 3-1-2: Define the neural network
Figure BDA0002552639940000125
as a neural network that takes $(o_{i,t}, a_{i,t})$ as input and outputs a single scalar value. It comprises several hidden layers (typically 2), each containing several neurons (typically 512), and uses the ReLU activation function. Denote the network parameters of
Figure BDA0002552639940000126
as
Figure BDA0002552639940000127
and the corresponding frozen parameters as
Figure BDA0002552639940000128
Randomly initialize
Figure BDA0002552639940000129
and
Figure BDA00025526399400001210
Step 3-1-3: Define
Figure BDA00025526399400001211
and
Figure BDA00025526399400001212
as two neural networks that take $o_{i,t}$ as input and output vectors of the same shape as the action $a_{i,t}$.
Figure BDA00025526399400001213
and
Figure BDA00025526399400001214
each have an independent output layer but share the same input layer and hidden layers. The shared part comprises several hidden layers (typically 2), each containing several neurons (typically 512), and uses the ReLU activation function. Denote all network parameters of
Figure BDA00025526399400001215
and
Figure BDA00025526399400001216
as $\theta_i$, and randomly initialize $\theta_i$.
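Illustratively, the network shapes described in Steps 3-1-1 to 3-1-3 may be sketched in plain Python as follows. This is a minimal sketch and not the implementation of the embodiment: the class names `Critic` and `Policy`, the head names `head_mu` and `head_sigma` (assuming the two policy outputs are the mean and spread of a stochastic policy), the uniform initialization range, and the small hidden width used in the demonstration are all assumptions; the text above gives 2 hidden layers of 512 neurons as typical.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Matrix:
    """Random weights, one row per output unit; the last entry of a row is the bias."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_out)]


def linear(w: Matrix, x: Sequence[float]) -> List[float]:
    """Affine map: weights times x plus bias."""
    return [sum(wi * xi for wi, xi in zip(row[:-1], x)) + row[-1] for row in w]


def relu(x: Sequence[float]) -> List[float]:
    return [max(0.0, xi) for xi in x]


class Critic:
    """Maps (o, a) to a single scalar through two ReLU hidden layers."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int, rng: random.Random):
        self.layers = [init_layer(obs_dim + act_dim, hidden, rng),
                       init_layer(hidden, hidden, rng),
                       init_layer(hidden, 1, rng)]

    def __call__(self, obs: Sequence[float], act: Sequence[float]) -> float:
        h = list(obs) + list(act)
        for w in self.layers[:-1]:
            h = relu(linear(w, h))
        return linear(self.layers[-1], h)[0]


class Policy:
    """Shared input and hidden layers with two independent output layers."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int, rng: random.Random):
        self.trunk = [init_layer(obs_dim, hidden, rng), init_layer(hidden, hidden, rng)]
        self.head_mu = init_layer(hidden, act_dim, rng)     # assumed: mean head
        self.head_sigma = init_layer(hidden, act_dim, rng)  # assumed: spread head

    def __call__(self, obs: Sequence[float]) -> Tuple[List[float], List[float]]:
        h = list(obs)
        for w in self.trunk:
            h = relu(linear(w, h))
        return linear(self.head_mu, h), linear(self.head_sigma, h)


rng = random.Random(0)
critic = Critic(obs_dim=4, act_dim=2, hidden=8, rng=rng)
policy = Policy(obs_dim=4, act_dim=2, hidden=8, rng=rng)
q_value = critic([0.1, 0.2, 1.0, 1.02], [0.05, -0.05])
mu, sigma_raw = policy([0.1, 0.2, 1.0, 1.02])
```

Both policy outputs have the same length as the action vector, as Step 3-1-3 requires.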
Step 3-2: Initialize the Lagrange multiplier $\lambda_i$ of each region, a scalar with a typical initial value of 1;
Step 3-3: Issue the initial strategy neural networks
Figure BDA00025526399400001217
and
Figure BDA00025526399400001218
to the controller of region i through the communication network;
Step 3-4: Initialize the discrete time variable t = 0. The actual time interval between two steps is Δt, and control is executed once per step; Δt is determined by the actual measurement speed and command execution speed of the local controllers;
Step 3-5: Initialize the strategy update period $T_u$, i.e. a strategy update is executed once every $T_u \Delta t$. $T_u$ is determined by the training speed of the cloud server; a typical value is $T_u = 8$;
Step 3-6: Initialize the sample upload period $T_s$ and the sample upload ratio $m \in [1, T_s]$. Every $T_s \Delta t$, each controller uploads samples once, uploading m samples from the preceding upload period. $T_s$ and m are determined by the communication speed; typical values are $T_s = 8$, $m = 1$;
Step 3-7: Initialize the cloud server experience bases
Figure BDA0002552639940000131
and the local cache experience base of each controller
Figure BDA0002552639940000132
Step 4: The local controllers of all regions execute the control steps in parallel according to the received strategy neural networks. At time t, the local controller of each region i executes the following control steps; the controllers run in parallel without interfering with one another:
Step 4-1: Obtain measurement signals from the measuring devices of the regional grid to form the corresponding observation variable $o_{i,t}$;
Step 4-2: According to the local strategy neural networks
Figure BDA0002552639940000133
and
Figure BDA0002552639940000134
generate the action $a_{i,t}$ for the current time:
Figure BDA0002552639940000135
Step 4-3: The controller sends $a_{i,t}$ to the locally controlled flexible resources, such as DG nodes and SVC nodes;
Step 4-4: Store $(o_{i,t}, a_{i,t})$ in
Figure BDA0002552639940000136
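Illustratively, the action generation of Step 4-2 may be sketched as follows. The sampling rule used here, a Gaussian sample squashed by tanh, is an assumption in the style of common stochastic-policy methods; the actual rule is the one given by the equation of Step 4-2. The helper `scale_to_limits` and its arguments are likewise hypothetical and only show how a value in [-1, 1] could be mapped to a reactive power range.

```python
import math
import random
from typing import List, Sequence


def sample_action(mu: Sequence[float], sigma: Sequence[float],
                  rng: random.Random) -> List[float]:
    """Assumed rule: tanh(mu + sigma * xi) with xi ~ N(0, 1), element-wise."""
    return [math.tanh(m + s * rng.gauss(0.0, 1.0)) for m, s in zip(mu, sigma)]


def scale_to_limits(a: Sequence[float], lower: Sequence[float],
                    upper: Sequence[float]) -> List[float]:
    """Map each component from [-1, 1] to [lower, upper] (hypothetical helper)."""
    return [lo + 0.5 * (x + 1.0) * (hi - lo) for x, lo, hi in zip(a, lower, upper)]


rng = random.Random(0)
raw = sample_action(mu=[0.0, 0.5], sigma=[0.1, 0.2], rng=rng)
setpoints = scale_to_limits(raw, lower=[-2.0, -1.0], upper=[2.0, 1.0])
```

Because tanh is bounded, every sampled component stays within the device range after scaling, whatever the network outputs.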
Step 5: The local controllers of all regions execute the sample upload steps in parallel, uploading measurement samples to the cloud server. Each region controller uploads its local samples to the cloud server according to the upload period. Illustratively, if $t \bmod T_s = 0$, the local controller of each region i executes the following upload steps at time t; the controllers run in parallel without interfering with one another:
Step 5-1: Through the communication network, upload m + 1 samples from
Figure BDA0002552639940000141
to the experience base $D_i$ of the cloud server;
Step 5-2: Empty
Figure BDA0002552639940000142
Step 5-3: After all controllers have uploaded, compute on the cloud server, for the first m groups of data uploaded in the current round, $r_t$ and
Figure BDA0002552639940000143
Step 5-4: If a communication fault prevents the samples of some region from being uploaded, this upload can simply be skipped; subsequent execution is not affected.
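Illustratively, the local caching of Step 4-4 and the upload logic of Step 5 may be sketched as follows. The class name `LocalController`, the `send` callback, the choice of the most recent m + 1 samples, and clearing the cache after a failed upload are assumptions made for the sketch; the text above only states that m + 1 samples are uploaded, the cache is emptied, and a failed upload is skipped.

```python
from typing import Callable, List, Tuple

Sample = Tuple[List[float], List[float]]  # (observation, action)


class LocalController:
    """Caches (o, a) pairs and uploads them every upload_period steps."""

    def __init__(self, upload_period: int, m: int):
        self.upload_period = upload_period  # T_s
        self.m = m                          # samples per upload period
        self.cache: List[Sample] = []

    def record(self, obs: List[float], act: List[float]) -> None:
        self.cache.append((obs, act))

    def maybe_upload(self, t: int, send: Callable[[List[Sample]], None]) -> bool:
        """When t mod T_s == 0, send the latest m + 1 samples, then clear the cache."""
        if t % self.upload_period != 0:
            return False
        batch = self.cache[-(self.m + 1):]
        try:
            send(batch)
            uploaded = True
        except ConnectionError:
            uploaded = False  # communication fault: skip this upload
        self.cache.clear()
        return uploaded


experience_base: List[Sample] = []  # stands in for D_i on the cloud server
ctrl = LocalController(upload_period=8, m=1)
for t in range(1, 17):
    ctrl.record([float(t)], [0.0])
    ctrl.maybe_upload(t, experience_base.extend)
```

Uploading m + 1 consecutive samples gives the server m transitions, since each transition needs the observation that follows it.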
Step 6: The cloud server learns the strategies of all controllers in parallel and issues the updated strategies to the region controllers. Using the updated experience bases, the cloud server learns each controller's strategy in parallel according to the update period and sends the resulting updated strategy to each controller. Illustratively, if $t \bmod T_u = 0$, the cloud server learns each controller's strategy in parallel at time t and issues it; that is, it executes the following learning steps for the neural networks of each region i several times (a typical value is $T_u$ times, adjustable according to the computing power of the cloud server):
Step 6-1: Extract from the experience base $D_i$ a set of experiences
Figure BDA0002552639940000144
of size B (typical value 64);
step 6-2: calculating a parameter phiiLoss function of
Figure BDA0002552639940000145
Wherein x is (o)1,...,oN) All regional observations; x' is the observation value at the next moment corresponding to x; a is1,...,aNMotion vectors for region 1 to region N, respectively;
Figure BDA0002552639940000146
is shown in
Figure BDA00025526399400001413
Obtaining; y isiComprises the following steps:
Figure BDA0002552639940000148
wherein γ is a reduction coefficient, typically 0.98; alpha is alphaiAn entropy maximization factor for region i, with a typical value of 0.1;
Figure BDA0002552639940000149
to get to
Figure BDA00025526399400001410
A probability value of (d);
Figure BDA00025526399400001411
comprises the following steps:
Figure BDA00025526399400001412
l denotes bit-wise multiplication o'iIs the observed value of the area i at the next moment. In the embodiment of the invention, the cloud server learns the strategies of all the controllers in parallel, and the global observation value is used for learning and calculation of each region. I.e. learning using global information and execution using only local information. The reliability and superiority of the control strategy are improved.
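Illustratively, the batch extraction of Step 6-1 and the target computation of Step 6-2 may be sketched as follows. The target form used here, reward plus discounted frozen-network value minus an entropy term, and the mean squared error loss are assumptions in the style of entropy-regularized actor-critic methods; the actual expressions are those of the equations above. All function names are hypothetical.

```python
import random
from typing import List, Sequence


def sample_batch(experience: Sequence, batch_size: int, rng: random.Random) -> List:
    """Draw up to batch_size experiences without replacement."""
    return rng.sample(list(experience), min(batch_size, len(experience)))


def critic_target(reward: float, next_q: float, next_log_prob: float,
                  gamma: float = 0.98, alpha: float = 0.1) -> float:
    """Assumed target: y = r + gamma * (Q_frozen(x', a') - alpha * log pi(a'|o'))."""
    return reward + gamma * (next_q - alpha * next_log_prob)


def critic_loss(q_values: Sequence[float], targets: Sequence[float]) -> float:
    """Assumed loss: mean squared error between Q(x, a) and the targets y."""
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(q_values)


rng = random.Random(0)
batch = sample_batch(range(100), batch_size=64, rng=rng)
y = critic_target(reward=1.0, next_q=2.0, next_log_prob=0.0, gamma=0.5)
loss = critic_loss([1.0, 3.0], [0.0, 1.0])
```

The defaults for `gamma` and `alpha` are the typical values quoted above.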
Step 6-3: Update the parameter $\phi_i$:
Figure BDA0002552639940000151
where $\rho_i$ is the learning step size, with a typical value of 0.0001, and
Figure BDA0002552639940000152
denotes the gradient with respect to the variable $\phi_i$.
Step 6-4: Compute the loss function of the parameter
Figure BDA0002552639940000153
as follows:
Figure BDA0002552639940000154
where
Figure BDA0002552639940000155
is:
Figure BDA0002552639940000156
The superscript C stands for "constraint" and marks constraint-related variables.
Step 6-5: Update the parameter
Figure BDA0002552639940000157
Figure BDA0002552639940000158
Step 6-6: Compute the Lagrangian function:
Figure BDA0002552639940000159
where
Figure BDA00025526399400001510
is the limit on the degree of voltage limit violation, with a typical value of 0, and
Figure BDA00025526399400001511
is:
Figure BDA00025526399400001512
Step 6-7: Update the parameter $\theta_i$:
Figure BDA00025526399400001513
Step 6-8: Update the parameter $\lambda_i$:
Figure BDA0002552639940000161
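Illustratively, a Lagrange multiplier update of the kind used in Step 6-8 may be sketched as follows. The rule shown, projected dual ascent on the gap between the estimated constraint value and its limit, is an assumption based on common constrained reinforcement learning practice; the update actually used is the one given by the equation of Step 6-8 and may differ in sign convention or scaling. The function name and arguments are hypothetical.

```python
from typing import Sequence


def update_multiplier(lam: float, constraint_q: Sequence[float],
                      limit: float = 0.0, step: float = 1e-4) -> float:
    """Assumed rule: lam <- max(0, lam + step * (mean(Q^C) - limit)).

    constraint_q holds constraint-critic estimates for a batch, assumed to grow
    with the amount of voltage limit violation; limit is the allowed level.
    """
    mean_q = sum(constraint_q) / len(constraint_q)
    return max(0.0, lam + step * (mean_q - limit))


lam_up = update_multiplier(1.0, [0.5, 0.5], limit=0.0, step=0.5)
lam_floor = update_multiplier(0.0, [-1.0], limit=0.0, step=0.5)
```

Under this assumption the multiplier grows while the constraint is violated on average and is held at zero otherwise, so the penalty on violations adapts during training.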
Step 6-9: Update the frozen parameters
Figure BDA0002552639940000162
and
Figure BDA0002552639940000163
Figure BDA0002552639940000164
where η is the freezing coefficient, with a typical value of 0.995.
Step 6-10: Issue the updated strategy neural networks
Figure BDA0002552639940000165
and
Figure BDA0002552639940000166
to region i through the communication network.
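Illustratively, a frozen-parameter update with freezing coefficient η may be sketched as follows, assuming the usual exponential moving average in which the frozen copy moves a fraction (1 − η) toward the live parameters. The function name and the flat-list parameter layout are hypothetical; the exact expression is that of the equation of Step 6-9.

```python
from typing import List, Sequence


def soft_update(frozen: Sequence[float], live: Sequence[float],
                eta: float = 0.995) -> List[float]:
    """Assumed rule: frozen <- eta * frozen + (1 - eta) * live, element-wise."""
    return [eta * f + (1.0 - eta) * p for f, p in zip(frozen, live)]


half_way = soft_update([1.0, 0.0], [0.0, 1.0], eta=0.5)
slow = soft_update([1.0], [0.0])  # typical eta: frozen copy changes by 0.5%
```

With η close to 1 the frozen networks change slowly, which keeps the learning targets stable between updates.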
Step 7: At the next time step, repeat steps 4, 5, and 6 in parallel. Specifically, set t = t + 1, return to step 4, and repeat steps 4, 5, and 6. Steps 4, 5, and 6 can be executed in parallel without interfering with one another; the communication and computation involved do not obstruct the normal execution of other controllers or other steps.
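Illustratively, the timing of steps 4 to 7 may be sketched as the following schedule, which lists what is triggered at each discrete time t. It is a serial sketch for readability only; in the embodiment the three activities run in parallel on the controllers and the cloud server. The function name and event labels are hypothetical, and the default periods are the typical values $T_s = 8$ and $T_u = 8$.

```python
from typing import List, Tuple


def schedule(steps: int, upload_period: int = 8,
             update_period: int = 8) -> List[Tuple[int, List[str]]]:
    """Return, for each t, the activities triggered at that time step."""
    log = []
    for t in range(steps):
        events = ["control"]                 # step 4: every time step
        if t % upload_period == 0:
            events.append("upload")          # step 5: when t mod T_s == 0
        if t % update_period == 0:
            events.append("learn")           # step 6: when t mod T_u == 0
        log.append((t, events))
    return log


timeline = schedule(steps=17)
```

Control runs at every step, while uploads and learning only run at multiples of their periods, so slow communication or training does not hold up local control.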
Based on the same inventive concept, an embodiment of the present invention further provides a grid reactive voltage distributed control system. As shown in Fig. 3, the system includes:
a model building module, configured to formulate the reactive voltage control target of each controlled region according to the overall reactive voltage control target and optimization model of the controlled grid, and to establish the reactive voltage optimization model;
a training framework construction module, configured to construct a Markov-game-based multi-agent interactive training framework by combining the optimization model with the actual configuration of the grid;
an initialization module, configured to initialize each neural network and the related control process variables and to issue them to each controlled region;
a controller module, deployed locally in each region (i.e. on local computer equipment), configured to execute the control steps in parallel according to the received strategy neural networks;
a sample uploading module, deployed locally in each region, configured to execute the sample upload steps in parallel and upload measurement samples to the cloud server;
a strategy learning module, deployed on the cloud server, configured to learn each controller's strategy in parallel and issue the updated strategies to the region controllers;
wherein the controller module, the sample uploading module, and the strategy learning module are invoked repeatedly and can be executed in parallel.
Without loss of generality, the model building module, the training framework construction module, and the initialization module can be deployed on the cloud server or on computer equipment other than the cloud server. The modules on the server exchange data with the modules deployed locally in each controlled region through a communication network.
The specific execution process and algorithm of each module may be obtained according to the embodiment of the distributed control method for reactive voltage of the power grid, and are not described herein again.
The control method and control system adopt a control framework that combines online centralized learning with distributed control. Using an efficient deep reinforcement learning algorithm, the control data of every controller is collected continuously and centrally, and the control strategy of each controller is learned centrally on the cloud server. After the strategies are issued to the controllers through the communication network, each controller executes its strategy locally according to local measurements. On the one hand, the invention makes full use of the speed advantage of distributed control: a local controller can act quickly on real-time local measurements without communication, which is particularly suitable for reactive voltage control of fast DG and SVC resources. On the other hand, it provides an efficient deep reinforcement learning algorithm that makes full use of the information advantage of centralized learning to obtain the optimal strategy of each agent, ensuring optimal system operation even when the model is incomplete. The method greatly improves the efficiency, safety, and flexibility of grid reactive voltage control under incomplete models and is particularly suitable for regional grids with severely incomplete models. It saves the high cost of repeatedly maintaining an accurate model, reduces the requirements on the communication and computing capabilities of each controller, retains the flexibility and efficiency of distributed control, avoids problems of centralized control such as high single-point-failure risk and large control command delay, and is suitable for large-scale deployment.
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A grid reactive voltage distributed control method, characterized by comprising the following steps:
Step 1: formulating the reactive voltage control target of each controlled region according to the overall reactive voltage control target and optimization model of the controlled grid, and establishing a reactive voltage optimization model;
Step 2: constructing a Markov-game-based multi-agent interactive training framework by combining the optimization model with the actual configuration of the grid;
Step 3: initializing each neural network and the related control process variables and issuing them to each controlled region;
Step 4: the local controllers of all regions executing control steps in parallel according to the received strategy neural networks;
Step 5: the local controllers of all regions executing sample upload steps in parallel and uploading measurement samples to the cloud server;
Step 6: the cloud server learning the strategies of all controllers in parallel and issuing the updated strategies to the region controllers;
Step 7: repeating steps 4, 5, and 6.
2. The grid reactive voltage distributed control method according to claim 1, wherein the step 1 comprises:
Step 1-1: establishing the overall reactive voltage control target and optimization model of the controlled grid:
Figure FDA0002552639930000011
where
Figure FDA0002552639930000012
is the set of all nodes of the grid; $V_j$ is the voltage magnitude of node j; $P_j$ is the active power output of node j; $Q_{Gj}$ is the DG reactive power output of node j; $Q_{Cj}$ is the SVC reactive power output of node j; $\underline{V}_j$ and
Figure FDA0002552639930000013
are the lower and upper voltage limits of node j, respectively; $\underline{Q}_{Cj}$ and
Figure FDA0002552639930000014
are the lower and upper limits of the SVC reactive power output of node j, respectively; $S_{Gj}$ and $P_{Gj}$ are the DG installed capacity and active power output of node j, respectively;
Step 1-2: splitting the reactive voltage control target and optimization model into the reactive voltage control targets and optimization models of the controlled regions:
Figure FDA0002552639930000021
where
Figure FDA0002552639930000022
is the complete set of nodes of the i-th region, and
Figure FDA0002552639930000023
is the network output power of the i-th region.
3. The grid reactive voltage distributed control method according to claim 2, wherein the step 2 comprises:
Step 2-1: corresponding to the system measurements of each region, constructing the observation variable $o_{i,t}$ of each region:
Figure FDA0002552639930000024
where $P_i$ and $Q_i$ are the vectors of active and reactive power injections at the nodes of the i-th region; $V_i$ is the vector of node voltages of the i-th region; $P_i^e$ and
Figure FDA0002552639930000025
are the network output active and reactive power of the i-th region; t is the discrete time variable of the control process;
Step 2-2: corresponding to the reactive voltage optimization target of each region, constructing the unified feedback variable $r_t$ of the regions:
Figure FDA0002552639930000026
where $P_j$ is the active power output of node j, and
Figure FDA0002552639930000027
is the network output active power of region i;
Step 2-3: corresponding to the reactive voltage optimization constraints of each region, constructing the constraint feedback variable of each region
Figure FDA0002552639930000028
Figure FDA0002552639930000029
where $[x]_+ = \max(0, x)$; $\beta_i$ is the cooperation coefficient of the i-th region; $V_j(t)$ is the voltage of node j at time t;
Figure FDA0002552639930000031
denotes the upper voltage limit and $\underline{V}$ the lower voltage limit;
Step 2-4: corresponding to the reactive power of the controllable flexible resources, constructing the action variable $a_{i,t}$ of each region:
$a_{i,t} = (Q_{Gi}, Q_{Ci})_t$ (0.6)
where $Q_{Gi}$ and $Q_{Ci}$ are the vectors of DG and SVC reactive power outputs of the i-th region, respectively.
4. The grid reactive voltage distributed control method according to claim 3, wherein the step 3 comprises:
Step 3-1: initializing each neural network and the related control process variables and issuing them to each controlled region;
Step 3-2: initializing the Lagrange multiplier $\lambda_i$ of each region, which is a scalar;
Step 3-3: issuing the initial strategy neural networks
Figure FDA00025526399300000319
and
Figure FDA00025526399300000318
to the controller of region i through the communication network;
Step 3-4: initializing the discrete time variable t = 0, the actual time interval between two steps being Δt;
Step 3-5: initializing the strategy update period $T_u$, a strategy update being executed once every $T_u \Delta t$;
Step 3-6: initializing the sample upload period $T_s$ and the sample upload ratio $m \in [1, T_s]$, each controller uploading samples once every $T_s \Delta t$ and uploading m samples from the preceding upload period;
Step 3-7: initializing the cloud server experience bases
Figure FDA0002552639930000032
and the local cache experience base of each controller
Figure FDA0002552639930000033
5. The grid reactive voltage distributed control method according to claim 4, wherein the step 3-1 comprises:
Step 3-1-1: defining the neural network
Figure FDA0002552639930000034
as a neural network that takes $(o_{i,t}, a_{i,t})$ as input and outputs a single scalar value, the activation function being the ReLU function; denoting the network parameters of
Figure FDA0002552639930000035
as $\phi_i$ and the corresponding frozen parameters as
Figure FDA0002552639930000036
and randomly initializing $\phi_i$ and
Figure FDA0002552639930000037
Step 3-1-2: defining the neural network
Figure FDA0002552639930000038
as a neural network that takes $(o_{i,t}, a_{i,t})$ as input and outputs a single scalar value, the activation function being the ReLU function; denoting the network parameters of
Figure FDA0002552639930000039
as
Figure FDA00025526399300000310
and the corresponding frozen parameters as
Figure FDA00025526399300000311
and randomly initializing
Figure FDA00025526399300000312
and
Figure FDA00025526399300000313
Step 3-1-3: defining
Figure FDA00025526399300000314
and
Figure FDA00025526399300000315
as two neural networks that take $o_{i,t}$ as input and output vectors of the same shape as the action $a_{i,t}$, wherein
Figure FDA00025526399300000316
and
Figure FDA00025526399300000317
each have an independent output layer and share the same neural network input layer and hidden layers, the activation function being the ReLU function; denoting all network parameters of
Figure FDA0002552639930000041
and
Figure FDA0002552639930000042
as $\theta_i$, and randomly initializing $\theta_i$.
6. The grid reactive voltage distributed control method according to claim 5, wherein the step 4 comprises:
Step 4-1: obtaining measurement signals from the measuring devices of the regional grid to form the corresponding observation variable $o_{i,t}$;
Step 4-2: generating, according to the local strategy neural networks
Figure FDA0002552639930000043
and
Figure FDA0002552639930000044
the action $a_{i,t}$ for the current time:
Figure FDA0002552639930000045
Step 4-3: the controller sending $a_{i,t}$ to the locally controlled flexible resources, such as DG nodes and SVC nodes;
Step 4-4: storing $(o_{i,t}, a_{i,t})$ in
Figure FDA0002552639930000046
7. The grid reactive voltage distributed control method according to claim 6, wherein the step 5 comprises:
Step 5-1: uploading m + 1 samples from
Figure FDA0002552639930000047
to the experience base $D_i$ of the cloud server;
Step 5-2: emptying
Figure FDA0002552639930000048
Step 5-3: computing on the cloud server, for the first m groups of data uploaded in the current round, $r_t$ and
Figure FDA0002552639930000049
Step 5-4: if a communication fault prevents the samples of some region from being uploaded, skipping this upload.
8. The grid reactive voltage distributed control method according to claim 7, wherein the step 6 comprises:
Step 6-1: extracting from the experience base $D_i$ a set of experiences
Figure FDA00025526399300000410
of size B;
Step 6-2: computing the loss function of the parameter $\phi_i$:
Figure FDA00025526399300000411
where $x = (o_1, ..., o_N)$ is the observation of all regions; $x'$ is the observation at the next time step corresponding to x; $a_1, ..., a_N$ are the action vectors of region 1 to region N, respectively;
Figure FDA0002552639930000051
is obtained from
Figure FDA0002552639930000052
and $y_i$ is:
Figure FDA0002552639930000053
where γ is the discount factor; $\alpha_i$ is the entropy maximization factor of region i;
Figure FDA0002552639930000054
is the probability of taking
Figure FDA0002552639930000055
and
Figure FDA0002552639930000056
is:
Figure FDA0002552639930000057
where ⊙ denotes element-wise multiplication and $o'_i$ is the observation of region i at the next time step;
Step 6-3: updating the parameter $\phi_i$:
Figure FDA0002552639930000058
where $\rho_i$ is the learning step size, and
Figure FDA0002552639930000059
denotes the gradient with respect to the variable $\phi_i$;
Step 6-4: computing the loss function of the parameter
Figure FDA00025526399300000510
as follows:
Figure FDA00025526399300000511
where
Figure FDA00025526399300000512
is:
Figure FDA00025526399300000513
Step 6-5: updating the parameter
Figure FDA00025526399300000514
Figure FDA00025526399300000515
Step 6-6: computing the Lagrangian function:
Figure FDA00025526399300000516
where
Figure FDA00025526399300000517
is the limit on the degree of voltage limit violation, and
Figure FDA00025526399300000518
is:
Figure FDA00025526399300000519
Step 6-7: updating the parameter $\theta_i$:
Figure FDA00025526399300000520
Step 6-8: updating the parameter $\lambda_i$:
Figure FDA0002552639930000061
Step 6-9: updating the frozen parameters
Figure FDA0002552639930000062
and
Figure FDA0002552639930000063
Figure FDA0002552639930000064
where η is the freezing coefficient;
Step 6-10: issuing the updated strategy neural networks
Figure FDA0002552639930000065
and
Figure FDA0002552639930000066
to region i.
9. The grid reactive voltage distributed control method according to any of claims 1-8, wherein step 4, step 5, and step 6 are executed in parallel.
10. A grid reactive voltage distributed control system, comprising:
a model building module, configured to formulate the reactive voltage control target of each controlled region according to the overall reactive voltage control target and optimization model of the controlled grid, and to establish a reactive voltage optimization model;
a training framework construction module, configured to construct a Markov-game-based multi-agent interactive training framework by combining the optimization model with the actual configuration of the grid;
an initialization module, configured to initialize each neural network and the related control process variables and to issue them to each controlled region;
a controller module, deployed locally in each region and configured to execute the control steps in parallel according to the received strategy neural networks;
a sample uploading module, deployed locally in each region and configured to execute the sample upload steps in parallel and upload measurement samples to the cloud server;
a strategy learning module, deployed on the cloud server and configured to learn each controller's strategy in parallel and issue the updated strategies to the region controllers;
wherein the controller module, the sample uploading module, and the strategy learning module are invoked and executed repeatedly.
CN202010581959.4A 2020-06-23 2020-06-23 Voltage distributed control method and system based on multi-agent deep reinforcement learning Active CN111799808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010581959.4A CN111799808B (en) 2020-06-23 2020-06-23 Voltage distributed control method and system based on multi-agent deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN111799808A true CN111799808A (en) 2020-10-20
CN111799808B CN111799808B (en) 2022-06-28

Family

ID=72803612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010581959.4A Active CN111799808B (en) 2020-06-23 2020-06-23 Voltage distributed control method and system based on multi-agent deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN111799808B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160105023A1 (en) * 2013-05-22 2016-04-14 Vito Nv Power supply network control system and method
CN103580061A (en) * 2013-10-28 2014-02-12 贵州电网公司电网规划研究中心 Microgrid operating method
US20200082305A1 (en) * 2018-09-06 2020-03-12 Trevor N. Werho Induced markov chain for wind farm generation forecasting
CN109120011A (en) * 2018-09-29 2019-01-01 清华大学 A kind of Distributed power net congestion dispatching method considering distributed generation resource
US20200119556A1 (en) * 2018-10-11 2020-04-16 Di Shi Autonomous Voltage Control for Power System Using Deep Reinforcement Learning Considering N-1 Contingency
CN110729740A (en) * 2019-07-03 2020-01-24 清华大学 Power distribution network reactive power optimization method and device, computer equipment and readable storage medium
CN110365056A (en) * 2019-08-14 2019-10-22 南方电网科学研究院有限责任公司 A kind of distributed energy participation power distribution network pressure regulation optimization method based on DDPG
CN110768262A (en) * 2019-10-31 2020-02-07 上海电力大学 Active power distribution network reactive power supply configuration method based on node clustering partition

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
AUGUSTO C. RUEDA-MEDINA et al.: "Distributed Generators as Providers of Reactive Power Support—A Market Approach", 《IEEE TRANSACTIONS ON POWER SYSTEMS》 *
DI CAO et al.: "Distributed Voltage Regulation of Active Distribution System Based on Enhanced Multi-agent Deep Reinforcement Learning", 《ARXIV》 *
NAN ZOU et al.: "Auxiliary Frequency and Voltage Regulation in Microgrid via Intelligent Electric Vehicle Charging", 《2014 IEEE INTERNATIONAL CONFERENCE ON SMART GRID COMMUNICATIONS》 *
PENG KOU et al.: "Safe deep reinforcement learning-based constrained optimal control scheme for active distribution networks", 《APPLIED ENERGY》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507614A (en) * 2020-12-01 2021-03-16 广东电网有限责任公司中山供电局 Comprehensive optimization method for power grid in distributed power supply high-permeability area
CN113258581A (en) * 2021-05-31 2021-08-13 广东电网有限责任公司佛山供电局 Source-load coordination voltage control method and device based on multiple intelligent agents
CN113258581B (en) * 2021-05-31 2021-10-08 广东电网有限责任公司佛山供电局 Source-load coordination voltage control method and device based on multiple intelligent agents
EP4148939A1 (en) * 2021-09-09 2023-03-15 Siemens Aktiengesellschaft System and method for controlling power distribution systems using graph-based reinforcement learning

Also Published As

Publication number Publication date
CN111799808B (en) 2022-06-28

Similar Documents

Publication Publication Date Title
Li et al. Coordinated load frequency control of multi-area integrated energy system using multi-agent deep reinforcement learning
CN110535146B (en) Electric power system reactive power optimization method based on depth determination strategy gradient reinforcement learning
Xi et al. A novel multi-agent DDQN-AD method-based distributed strategy for automatic generation control of integrated energy systems
CN111799808B (en) Voltage distributed control method and system based on multi-agent deep reinforcement learning
CN111564849B (en) Two-stage deep reinforcement learning-based power grid reactive voltage control method
CN112615379A (en) Power grid multi-section power automatic control method based on distributed multi-agent reinforcement learning
CN111666713B (en) Power grid reactive voltage control model training method and system
CN113471982B (en) Cloud edge cooperation and power grid privacy protection distributed power supply in-situ voltage control method
CN114217524A (en) Power grid real-time self-adaptive decision-making method based on deep reinforcement learning
Xi et al. A virtual generation ecosystem control strategy for automatic generation control of interconnected microgrids
Li et al. Grid-area coordinated load frequency control strategy using large-scale multi-agent deep reinforcement learning
CN109494721A (en) A kind of power distribution network distributed self-adaption control method suitable for being switched containing flexible multimode
CN110429652A (en) A kind of intelligent power generation control method for expanding the adaptive Dynamic Programming of deep width
CN110165714A (en) Micro-capacitance sensor integration scheduling and control method, computer readable storage medium based on limit dynamic programming algorithm
CN113422371B (en) Distributed power supply local voltage control method based on graph convolution neural network
Yin et al. Quantum deep reinforcement learning for rotor side converter control of double-fed induction generator-based wind turbines
CN117039981A (en) Large-scale power grid optimal scheduling method, device and storage medium for new energy
CN113872213B (en) Autonomous optimization control method and device for power distribution network voltage
Li et al. Distributed deep reinforcement learning for integrated generation‐control and power‐dispatch of interconnected power grid with various renewable units
Xi et al. Multi-agent deep reinforcement learning strategy for distributed energy
Vohra et al. End-to-end learning with multiple modalities for system-optimised renewables nowcasting
Wang et al. Intelligent load frequency control for improving wind power penetration in power systems
Wang et al. Robust active yaw control for offshore wind farms using stochastic predictive control based on online adaptive scenario generation
Flórez et al. Explicit coordination for MPC-based distributed control with application to Hydro-Power Valleys
Ma et al. A Reinforcement learning based coordinated but differentiated load frequency control method with heterogeneous frequency regulation resources

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant