CN111799808B - Voltage distributed control method and system based on multi-agent deep reinforcement learning - Google Patents

Voltage distributed control method and system based on multi-agent deep reinforcement learning Download PDF

Info

Publication number
CN111799808B
CN111799808B CN202010581959.4A CN202010581959A CN111799808B CN 111799808 B CN111799808 B CN 111799808B CN 202010581959 A CN202010581959 A CN 202010581959A CN 111799808 B CN111799808 B CN 111799808B
Authority
CN
China
Prior art keywords
steps
voltage
area
control
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010581959.4A
Other languages
Chinese (zh)
Other versions
CN111799808A (en
Inventor
吴文传
刘昊天
孙宏斌
王彬
郭庆来
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202010581959.4A priority Critical patent/CN111799808B/en
Publication of CN111799808A publication Critical patent/CN111799808A/en
Application granted granted Critical
Publication of CN111799808B publication Critical patent/CN111799808B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/12Circuit arrangements for ac mains or ac distribution networks for adjusting voltage in ac networks by changing a characteristic of the network load
    • H02J3/16Circuit arrangements for ac mains or ac distribution networks for adjusting voltage in ac networks by changing a characteristic of the network load by adjustment of reactive power
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/27Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J13/00Circuit arrangements for providing remote indication of network conditions, e.g. an instantaneous record of the open or closed condition of each circuitbreaker in the network; Circuit arrangements for providing remote control of switching means in a power distribution network, e.g. switching in and out of current consumers by using a pulse code signal carried by the network
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00Details relating to the application field
    • G06F2113/04Power grid distribution networks
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/10Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20Simulating, e g planning, reliability check, modelling or computer assisted design [CAD]
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20The dispersed energy generation being of renewable origin
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02EREDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E40/00Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E40/30Reactive power compensation
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02EREDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E40/00Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E40/70Smart grids as climate change mitigation technology in the energy generation sector
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Power Engineering (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Supply And Distribution Of Alternating Current (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention provides a voltage distributed control method based on multi-agent deep reinforcement learning, which comprises the following steps: according to the whole reactive voltage control target and the optimization model of the controlled power grid, formulating reactive voltage control targets of all controlled areas and establishing a reactive voltage optimization model; constructing a multi-agent interactive training framework based on the Markov game by combining the actual configuration conditions of the optimization model and the power grid; initializing each neural network and relevant control process variables and issuing the neural networks and relevant control process variables to each control area; the local controllers in all the areas execute control steps in parallel according to the received strategy neural network; the local controllers in all the areas execute the step of uploading samples in parallel, and upload the measurement samples to the cloud server; the cloud server learns the strategies of all the controllers in parallel and issues the updated strategies to all the regional controllers. The invention realizes the flexible control of reactive voltage and the optimal control under the condition of incomplete model.

Description

Voltage distributed control method and system based on multi-agent deep reinforcement learning
Technical Field
The invention belongs to the technical field of operation and control of power systems, and particularly relates to a voltage distributed control method and system based on multi-agent deep reinforcement learning.
Background
Under the promotion of energy and environmental problems, the proportion of clean and decentralized renewable energy (DG for short) in a power grid is increased day by day, and large-scale DG power Generation and grid connection with high permeability become the frontier and hot points in the fields of energy and power. Due to the large dispersion and strong fluctuation of the DG quantity, the DG quantity brings a series of negative effects on the aspects of voltage quality, dispatching operation and the like of a power distribution network and even a power transmission network. The DGs are usually connected to the grid through power electronic inverters, and have flexible and high-speed regulation capacity. In order to efficiently control the DG and improve the voltage quality of the high-permeability power grid, reactive voltage control has become an important issue for the regulation and control operation of the power grid. In a traditional power grid, reactive voltage control is usually realized by adopting a centralized optimization method based on a power grid model, and the loss of a controlled power grid is improved while voltage out-of-limit is eliminated.
However, the centralized optimization control method often has the key problems of single point failure, high communication and calculation burden, serious influence of communication delay, and the like. Particularly, in a high-permeability power grid, controlled DGs are numerous, and the network structure is complex, so that a centralized control method is severely limited, and high-speed resources cannot be reasonably regulated and controlled. Therefore, a series of distributed reactive voltage control methods have been developed, and compared with a centralized method, the distributed method tends to have weaker requirements on communication conditions and faster control speed.
However, the existing distributed control often adopts a model-based optimization method, because an ideal model of the power grid is difficult to obtain, the model-based optimization method cannot guarantee the control effect, the existing distributed control optimization method often has the situations that the control instruction is far away from the optimal point and the power grid runs in a suboptimal state, and the requirements of efficient and safe control are more difficult to meet under a continuous online operation scene.
Therefore, it is an urgent technical problem in the art to provide a method for controlling reactive voltage of a power grid with high safety, high efficiency and high flexibility.
Disclosure of Invention
In order to solve the above problems, the present invention provides a voltage distributed control method based on multi-agent deep reinforcement learning, which comprises:
step 1: according to the whole reactive voltage control target and the optimization model of the controlled power grid, formulating reactive voltage control targets of all controlled areas and establishing a reactive voltage optimization model;
and 2, step: constructing a multi-agent interactive training framework based on a Markov game by combining the actual configuration conditions of the optimization model and the power grid;
and step 3: initializing each neural network and relevant control process variables and issuing the neural networks and relevant control process variables to each control area;
And 4, step 4: the local controllers in each area execute control steps in parallel according to the received strategy neural network;
and 5: the local controllers in all the areas execute the step of uploading the samples in parallel, and the measured samples are uploaded to a cloud server;
and 6: the cloud server parallelly learns the strategies of each controller and issues the updated strategies to each regional controller;
and 7: and (5) repeatedly executing the steps 4, 5 and 6.
Further, the step 1 comprises:
step 1-1: establishing a whole reactive voltage control target and optimization model of the controlled power grid:
Figure GDA0002560516510000021
wherein the content of the first and second substances,
Figure GDA0002560516510000022
is a collection of all nodes of the grid, VjIs the voltage amplitude of node j; pjIs the active power output of node j; qGjDG reactive power output for node j; qCjSVC reactive power output for node j;
Figure GDA0002560516510000023
the lower voltage limit and the upper voltage limit of the node j are respectively;
Figure GDA0002560516510000024
respectively is the lower limit and the upper limit of the SVC reactive power output of the node j; sGj,PGjDG installed capacity and active power output, respectively, for node j;
step 1-2: splitting the reactive voltage control target and the optimization model to form reactive voltage control targets and optimization models of each controlled area:
Figure GDA0002560516510000031
wherein the content of the first and second substances,
Figure GDA0002560516510000032
for the complete set of nodes for the ith region,
Figure GDA0002560516510000033
the network output power for the ith zone.
Further, step 2 comprises:
step 2-1: corresponding to the system measurement of each region, construct the observation variable o of each regioni,t
Figure GDA0002560516510000034
Wherein P isi,QiInjecting vectors formed by active power and reactive power into each node of the ith area; viA vector formed by voltages of all nodes in the ith area;
Figure GDA0002560516510000035
outputting active power and reactive power for the network of the ith area; t is a discrete time variable of the control process;
step 2-2: corresponding to the reactive voltage optimization target of each region, establishing a uniform feedback variable r of each regiont
Figure GDA0002560516510000036
PjIs the active power output of the node j,
Figure GDA0002560516510000037
outputting active power for the network of the area i;
step 2-3: corresponding to the reactive voltage optimization constraint of each region, constructing constraint feedback variables of each region
Figure GDA0002560516510000038
Figure GDA0002560516510000039
Wherein [ x ]]+=max(0,x);βiIs the cooperation coefficient of the i-th area, Vj(t) is the voltage at node j at time t,
Figure GDA00025605165100000310
the upper limit of the voltage is represented,Vis the upper voltage limit;
step 2-4: corresponding to the reactive power of the controllable flexible resources, constructing action variables a of each areai,t
ai,t=(QGi,QCi)t (1.6)
Wherein Q isGi,QCiThe vectors of the DG and SVC reactive power output of the ith area are respectively.
Further, the step 3 comprises:
step 3-1: initializing each neural network and relevant control process variables and issuing the neural networks and relevant control process variables to each control area;
step 3-2: initializing each region Lagrange multiplier lambda iIs a scalar;
step 3-3: issuing an initial strategy neural network through a communication network
Figure GDA00025605165100000423
And with
Figure GDA00025605165100000424
A controller to zone i;
step 3-4: initializing a discrete time variable t to be 0, wherein the actual time interval between two steps is delta t;
step 3-5: initialization policy update period TuFor every TuPerforming strategy updating once at the delta t time;
step 3-6: initialization sample upload period TsThe ratio of m to sample upload is equal to [1, T ∈s]For every TsEach controller of delta t uploads a sample once and uploads m samples in the previous uploading period;
step 3-7: initializing cloud server experience bases
Figure GDA0002560516510000041
Local caching experience base of each controller
Figure GDA0002560516510000042
Further, the step 3-1 comprises:
step 3-1-1: defining a neural network
Figure GDA0002560516510000043
Is an input (o)i,t,ai,t) A neural network outputting a single scalar value; the activation function is a ReLU function; note the book
Figure GDA0002560516510000044
Has a network parameter of phiiCorresponding freezing parameter is
Figure GDA0002560516510000045
And randomly initializing phiiAnd
Figure GDA0002560516510000046
step 3-1-2: defining a neural network
Figure GDA0002560516510000047
Is an input (o)i,t,ai,t) Neural network outputting single scalar value(ii) a The activation function is a ReLU function; note the book
Figure GDA0002560516510000048
Is recorded as
Figure GDA0002560516510000049
The corresponding freezing parameter is
Figure GDA00025605165100000410
Random initialization
Figure GDA00025605165100000411
And
Figure GDA00025605165100000412
step 3-1-3: definition of
Figure GDA00025605165100000413
And
Figure GDA00025605165100000414
for two inputs oi,tOutput and action a i,tThe neural networks of the same vector shape,
Figure GDA00025605165100000415
and with
Figure GDA00025605165100000416
The device is provided with independent output layers respectively, and simultaneously shares the same neural network input layer and hidden layer; the activation function is a ReLU function; note book
Figure GDA00025605165100000417
And
Figure GDA00025605165100000418
all network parameters of (2) are thetaiRandom initialization of thetai
Further, the step 4 comprises:
step 4-1: measuring device from regional power gridObtaining the measurement signal to form the corresponding observed variable oi,t
Step 4-2: neural network according to local policy
Figure GDA00025605165100000419
And
Figure GDA00025605165100000420
generating the corresponding action a of the current timei,t
Figure GDA00025605165100000421
Step 4-3: the controller will ai,tSending the data to local controlled flexible resources, such as DG nodes and SVC nodes;
step 4-4: will (o)i,t,ai,t) Is stored to
Figure GDA00025605165100000422
In (1).
Further, the step 5 comprises:
step 5-1: will be provided with
Figure GDA0002560516510000051
Uploading m +1 samples to experience base D of cloud serveriPerforming the following steps;
step 5-2: emptying
Figure GDA0002560516510000052
Step 5-3: calculating r for the first m groups of uploaded data of the current round on the cloud servertAnd
Figure GDA0002560516510000053
step 5-4: if communication faults occur, samples in a certain area cannot be uploaded, and the sampling uploading at this time can be directly ignored.
Further, the step 6 comprises:
step 6-1: from experience libraries DiExtract a set of experiences
Figure GDA0002560516510000054
The number is B;
step 6-2: calculating a parameter phiiLoss function of (2):
Figure GDA0002560516510000055
wherein x is (o) 1,...,oN) Observed values for all regions; x' is the observation value at the next moment corresponding to x; a is a1,...,aNMotion vectors for region 1 to region N, respectively;
Figure GDA0002560516510000056
is shown in
Figure GDA00025605165100000518
Obtaining; y isiComprises the following steps:
Figure GDA0002560516510000058
wherein γ is a reduction coefficient; alpha is alphaiAn entropy maximization factor for region i;
Figure GDA0002560516510000059
to get to
Figure GDA00025605165100000510
A probability value of (d);
Figure GDA00025605165100000511
comprises the following steps:
Figure GDA00025605165100000512
l denotes bit-wise multiplication o'iIs the observed value of the area i at the next moment;
step 6-3: updating the parameter phii
Figure GDA00025605165100000513
Where ρ isiIn order to learn the step size,
Figure GDA00025605165100000514
the expression is for a variable phiiCalculating a gradient;
step 6-4: calculating parameters
Figure GDA00025605165100000515
A loss function of (d);
Figure GDA00025605165100000516
wherein
Figure GDA00025605165100000517
Comprises the following steps:
Figure GDA0002560516510000061
step 6-5: updating parameters
Figure GDA0002560516510000062
Figure GDA0002560516510000063
Step 6-6: calculate lagrangian function:
Figure GDA0002560516510000064
wherein
Figure GDA0002560516510000065
Limiting the voltage crossing thread degree;
Figure GDA0002560516510000066
comprises the following steps:
Figure GDA0002560516510000067
step 6-7: updating the parameter θi
Figure GDA0002560516510000068
And 6-8: updating the parameter lambdai
Figure GDA0002560516510000069
Step 6-9: updating freeze parameters
Figure GDA00025605165100000610
And
Figure GDA00025605165100000611
Figure GDA00025605165100000612
wherein η is the freezing coefficient;
step 6-10: issuing updated policy neural networks
Figure GDA00025605165100000613
And
Figure GDA00025605165100000614
to region i.
Further, the step 4, the step 5 and the step 6 are executed in parallel.
The invention also provides a voltage distributed control system based on multi-agent deep reinforcement learning, which comprises:
the model building module is used for making reactive voltage control targets of all controlled areas according to the whole reactive voltage control target and the optimization model of the controlled power grid and building a reactive voltage optimization model;
The training frame construction module is used for constructing a multi-agent interactive training frame based on the Markov game by combining the actual configuration conditions of the optimization model and the power grid;
the initialization module is used for initializing each neural network and relevant control process variables and issuing the neural networks and the relevant control process variables to each control area;
the controller module is used for being arranged in each region and executing the control step in parallel according to the received strategy neural network;
the sample uploading module is used for setting the sample uploading module in each local area, executing the sample uploading step in parallel and uploading the measurement samples to the cloud server;
the strategy learning module is arranged on the cloud server, is used for learning each controller strategy in parallel and issuing the updated strategy to each regional controller;
the controller module, the sample uploading module and the strategy learning module are used for being repeatedly called and executed.
The invention has the advantages and beneficial effects that:
when each region controller executes control operation, the region controller does not need to communicate with a cloud server or other controllers, can quickly generate control instructions according to a stored strategy neural network, efficiently utilizes high-speed flexible resources, and improves the efficiency of reactive voltage control;
all controllers run in parallel, and the three steps of local control, sample uploading and centralized learning run in parallel, so that communication and computing resources can be fully utilized, and the robustness to communication and computing conditions is good.
Based on multi-agent deep reinforcement learning, an accurate power grid model can not be established, the characteristics of the power grid are learned only through control process data, model-free optimization is carried out, and reactive power distribution of the power grid can still be controlled to be in an optimized state under the condition that the model is incomplete;
compared with other distributed learning methods, the centralized learning method has the advantages that the computing cost of each controller can be greatly saved, and the utilization efficiency of cloud computing resources is improved;
compared with the existing power grid optimization method based on multi-agent reinforcement learning, the method has the advantages of high sample efficiency, high voltage safety, simple control structure and lower implementation cost.
According to the voltage distributed control method and system based on multi-agent deep reinforcement learning, on one hand, high-speed flexible control and high-speed reactive voltage control of communication robustness are achieved through distributed control, on the other hand, optimal reactive voltage control under the condition of incomplete model is achieved through online learning of control process data through the deep reinforcement learning method, the requirement of continuous online operation of reactive voltage control of a power grid can be met, the voltage quality of the power grid is greatly improved, and the operation grid loss of the power grid is reduced.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 shows a flow diagram of a voltage distributed control method based on multi-agent deep reinforcement learning according to an embodiment of the invention;
FIG. 2 illustrates a block diagram of a multi-agent deep reinforcement learning based voltage distributed control system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a module structure of a voltage distributed control system based on multi-agent deep reinforcement learning according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The embodiment of the invention provides a voltage distributed control method based on multi-agent deep reinforcement learning, in particular to a power grid reactive voltage distributed control method based on multi-agent deep reinforcement learning, and as shown in figure 1, the method comprises the following steps:
step 1: according to the whole reactive voltage control target and the optimization model of the controlled power grid, formulating reactive voltage control targets of all controlled areas and establishing a reactive voltage optimization model;
and 2, step: constructing a multi-agent interactive training framework based on the Markov game by combining the actual configuration conditions of the optimization model and the power grid;
and step 3: initializing each neural network and relevant control process variables and issuing the neural networks and relevant control process variables to each control area;
And 4, step 4: the local controllers in each area execute control steps in parallel according to the received strategy neural network;
and 5: the local controllers in all the areas execute the sampling step in parallel, and upload the measurement samples to the cloud server;
and 6: the cloud server parallelly learns the strategies of all the controllers and issues the updated strategies to all the regional controllers;
and 7: and repeating and executing the steps 4, 5 and 6 in parallel.
The specific implementation of each step is described in detail below.
In the step 1, according to the whole reactive voltage control target and the optimization model of the controlled power grid, the reactive voltage control target of each controlled area is formulated, and the reactive voltage optimization model is established. This step may be performed at a regional grid regulation center as shown in fig. 2, and in particular may be performed on a cloud server. The method comprises the following steps:
step 1-1: establishing a whole reactive voltage control target and optimization model of the controlled power grid:
Figure GDA0002560516510000091
wherein the content of the first and second substances,
Figure GDA0002560516510000092
is a collection of all nodes of the grid, VjIs the voltage amplitude of node j; pjIs the active power output of node j; qGjDG reactive power output for node j; qCjSVC (Static Var Compensator) reactive power output for node j;
Figure GDA0002560516510000093
the lower voltage limit and the upper voltage limit of the node j are respectively;
Figure GDA0002560516510000094
Respectively is the lower limit and the upper limit of the SVC reactive power output of the node j; sGj,PGjDG installed capacity and active power output, respectively, for node j.
Step 1-2: and splitting the reactive voltage control target and the optimization model to form the reactive voltage control target and the optimization model of each controlled area.
As shown in fig. 2, the controlled grid is divided into N regions according to an actual controller installation situation, each region includes a plurality of nodes, illustratively, the nodes include DG nodes and SVC nodes, and a branch is formed between the nodes. Each zone is equipped with a local controller. Illustratively, the controlled area 1 is equipped with a controlled area controller 1, and the controlled area 2 is equipped with a controlled area controller 2 …, and the controlled area N is equipped with a controlled area controller N. The controller of the controlled area, called controller for short, can quickly obtain the measuring signal of the area. The controller is also communicated with a cloud server of a regional power grid regulation and control center, namely a cloud server for short, through communication. In the embodiment of the present invention, the cloud server may include one or more computing devices. Specifically, the controller can obtain voltage measurement, current measurement, power measurement and the like of the nodes through the measuring devices installed on the nodes, and upload the sample data of the reactive voltage control process to the cloud server. The controller also receives a reactive voltage control strategy corresponding to the region from the cloud server and issues a control signal to the node.
In the embodiment of the invention, for the ith epsilon [1, N ] controlled area, splitting a reactive voltage control target and an optimization model into the controlled area reactive voltage control target and the optimization model corresponding to N areas:
Figure GDA0002560516510000101
wherein, the first and the second end of the pipe are connected with each other,
Figure GDA0002560516510000102
for the entire set of nodes of the ith zone,
Figure GDA0002560516510000103
the network output power for the ith zone. In the examples of the present invention, the same symbols appearing represent the same physical meanings, e.g. SGj,PGjDG installed capacity and active power output, respectively, of node j, where node j is a node
Figure GDA0002560516510000104
And 2, step: and constructing a multi-agent interactive training framework based on the Markov game by combining the actual configuration conditions of the optimization model and the power grid.
Step 2-1: corresponding to the system measurement of each region, construct the observation variable o of each regioni,tSuch as (A), (B), (C)1.3) is shown.
Figure GDA0002560516510000105
Wherein P isi,QiInjecting vectors formed by active power and reactive power into each node of the ith area; viA vector formed by voltages of all nodes in the ith area;
Figure GDA0002560516510000111
outputting active power and reactive power for the network of the ith area; t is a discrete time variable of the control process.
Step 2-2: corresponding to the reactive voltage optimization target of each region, establishing a uniform feedback variable r of each regiontAs shown in (1.4).
Figure GDA0002560516510000112
PjIs the active power output of the node j,
Figure GDA0002560516510000113
And outputting active power for the network of the area i.
Step 2-3: corresponding to the reactive voltage optimization constraint of each region, constructing constraint feedback variables of each region
Figure GDA0002560516510000114
As shown in (1.5):
Figure GDA0002560516510000115
wherein [ x ]]+=max(0,x);βiThe cooperation coefficient of the ith area; vj(t) is the voltage at node j at time t,
Figure GDA0002560516510000116
the upper limit of the voltage is represented,Vis the upper limit of voltage(ii) a Generally, the upper voltage limit is consistent across the nodes, although it may vary under particular circumstances; here, according to the convention, the voltage upper limit is taken as the same, namely the voltage upper limit identifies the voltage upper limit of each node, and the voltage lower limit also does;
step 2-4: corresponding to the reactive power of the controllable flexible resources, constructing action variables a of each areai,tAs shown in (1.6):
ai,t=(QGi,QCi)t (1.6)
wherein Q isGi,QCiThe vectors of the DG and SVC reactive power output of the ith area are respectively.
And step 3: initializing each neural network and related control process variables;
step 3-1: initializing each neural network and relevant control process variables and issuing the neural networks and relevant control process variables to each control area. Firstly, initializing a neural network corresponding to each region, and storing the neural network on a cloud server, wherein the neural network comprises the following steps:
step 3-1-1: defining a neural network
Figure GDA0002560516510000121
Is an input (o)i,t,ai,t) A neural network that outputs a single scalar value, comprising several hidden layers (typically taken as 2 hidden layers), each hidden layer containing several neurons (typically taken as 512 neurons), the activation function being a ReLU function, the mathematical expression of which is ReLU (x) max (0, x). Note the book
Figure GDA0002560516510000122
Has a network parameter of phiiCorresponding freezing parameter is
Figure GDA0002560516510000123
And randomly initializing phiiAnd
Figure GDA0002560516510000124
step 3-1-2: defining a neural network
Figure GDA0002560516510000125
Is an input (o)i,t,ai,t) A neural network outputting a single scalar value comprises a plurality of hidden layers (typically 2 hidden layers), each hidden layer comprises a plurality of neurons (typically 512 neurons), and an activation function is a ReLU function. Note the book
Figure GDA0002560516510000126
Is recorded as
Figure GDA0002560516510000127
The corresponding freezing parameter is
Figure GDA0002560516510000128
Random initialization
Figure GDA0002560516510000129
And
Figure GDA00025605165100001210
step 3-1-3: definition of
Figure GDA00025605165100001211
And
Figure GDA00025605165100001212
for two inputs oi,tOutput and action ai,tNeural networks of the same vector shape.
Figure GDA00025605165100001213
And
Figure GDA00025605165100001214
the neural network has independent output layers respectively, and simultaneously shares the same neural network input layer and hidden layer, and comprises a plurality of hidden layers (typically 2 hidden layers), each hidden layer comprises a plurality of neurons (typically 512 neurons), and the activation function is a ReLU function. Note the book
Figure GDA00025605165100001215
And
Figure GDA00025605165100001216
all network parameters of (2) are thetai. Random initialization of thetai
Step 3-2: initializing each region Lagrange multiplier lambdaiIs a scalar, typically with an initial value of 1;
step 3-3: issuing an initial strategy neural network through a communication network
Figure GDA00025605165100001217
And
Figure GDA00025605165100001218
a controller to zone i;
step 3-4: initializing a discrete time variable t as 0, wherein the actual time interval between two steps is delta t, controlling the time interval once every step, and specifically determining according to the actual measurement and the instruction control speed of a local controller;
Step 3-5: initialization policy update period TuI.e. every TuStrategy updating is executed once at delta T time, the strategy updating is determined according to the training speed of the cloud server, and the typical value can be Tu=8;
Step 3-6: initialization sample upload period TsThe ratio of m to sample upload is equal to [1, T ∈s]. Every other TsAnd (4) uploading samples once by each controller, and uploading m samples in the previous uploading period. T issM is determined according to the communication speed, and the typical value can be Ts=8,m=1;
Step 3-7: initializing cloud server experience bases
Figure GDA0002560516510000131
Local caching experience base of each controller
Figure GDA0002560516510000132
And 4, step 4: and the local controllers of all the regions execute control steps in parallel according to the received strategy neural network. The local controllers of the areas i execute the following control steps at the time t, and the control steps are executed in parallel without interference:
step 4-1: obtaining measurement signals from a measurement device of a regional power grid to form a corresponding observation variable oi,t
Step 4-2: neural network according to local policy
Figure GDA0002560516510000133
And
Figure GDA0002560516510000134
generating the corresponding action a of the current timei,t
Figure GDA0002560516510000135
Step 4-3: the controller will ai,tSending the data to local controlled flexible resources, such as DG nodes and SVC nodes;
step 4-4: will (o)i,t,ai,t) Is stored to
Figure GDA0002560516510000139
In (1).
And 5: and the local controllers of all the areas execute the step of uploading the samples in parallel and upload the measurement samples to the cloud server. And uploading the local samples to a cloud server by the region controller according to the uploading period. Illustratively, if tmodT sAnd (3) when the local controller of each area i is equal to 0, the following sampling steps are executed at the time t, and the following steps are executed in parallel without interference:
step 5-1: through a communication network, will
Figure GDA0002560516510000136
Uploading m +1 samples before the process to an experience library D of a cloud serveriThe preparation method comprises the following steps of (1) performing;
step 5-2: emptying
Figure GDA0002560516510000137
Step 5-3: after all the controllers are uploaded, the front m groups of r of the data uploaded in the current round are calculated on the cloud servertAnd with
Figure GDA0002560516510000138
Step 5-4: if communication faults occur, samples in a certain area cannot be uploaded, the sampling uploading can be directly ignored, and the follow-up execution is not affected.
Step 6: the cloud server learns the strategies of all the controllers in parallel and issues the updated strategies to all the regional controllers. And the cloud server uses the updated experience base to learn the strategies of each controller in parallel according to the updating period, and sends the generated updated strategy to each controller. Illustratively, if tmodTuWhen the value is 0, the cloud server parallelly learns each controller strategy at the time T and issues the strategy, namely, the following learning steps are executed for the neural network of each area i for a plurality of times (the typical value is T)uSecond, adjustable according to cloud server computing power):
step 6-1: from experience libraries DiExtract a set of experiences
Figure GDA0002560516510000141
The number B (typical value 64);
Step 6-2: calculating a parameter phiiLoss function of
Figure GDA0002560516510000142
Wherein x is (o)1,…,oN) All regional observations; x' is the observation value at the next moment corresponding to x; a is1,…,aNMotion vectors for region 1 to region N, respectively;
Figure GDA0002560516510000143
is shown in
Figure GDA00025605165100001413
Obtaining; y isiComprises the following steps:
Figure GDA0002560516510000145
wherein γ is a reduction coefficient, typically 0.98; alpha is alphaiAn entropy maximization factor for region i, with a typical value of 0.1;
Figure GDA0002560516510000146
to get to
Figure GDA0002560516510000147
A probability value of (d);
Figure GDA0002560516510000148
comprises the following steps:
Figure GDA0002560516510000149
l denotes bit-wise multiplication o'iIs the observed value of the area i at the next moment. In the embodiment of the invention, the cloud server learns the strategies of all the controllers in parallel, and the global observation value is used for learning and calculation of each region. I.e. learning using global information and execution using only local information. The reliability and superiority of the control strategy are improved.
Step 6-3: updating the parameter phii
Figure GDA00025605165100001410
Where ρ isiFor learning the step size, a typical value is 0.0001,
Figure GDA00025605165100001411
the expression is for a variable phiiAnd (5) calculating a gradient.
Step 6-4: calculating parameters
Figure GDA00025605165100001412
A loss function of (d);
Figure GDA0002560516510000151
wherein
Figure GDA0002560516510000152
Comprises the following steps:
Figure GDA0002560516510000153
the superscript C denotes "constraint", a constraint-related variable.
Step 6-5: updating parameters
Figure GDA0002560516510000154
Figure GDA0002560516510000155
Step 6-6: calculate lagrangian function:
Figure GDA0002560516510000156
wherein
Figure GDA0002560516510000157
To override the threading limit for voltage, a typical value is taken to be 0.
Figure GDA0002560516510000158
Comprises the following steps:
Figure GDA0002560516510000159
step 6-7: updating the parameter θ i
Figure GDA00025605165100001510
And 6-8: updating the parameter lambdai
Figure GDA00025605165100001511
Step 6-9: updating freeze parameters
Figure GDA00025605165100001512
And
Figure GDA00025605165100001513
Figure GDA00025605165100001514
where η is the freezing coefficient and a typical value is 0.995.
Step 6-10: issuing updated strategy neural network through communication network
Figure GDA00025605165100001515
And
Figure GDA00025605165100001516
to region i.
And 7: in the next operation, steps 4, 5, and 6 are repeatedly executed in parallel. Specifically, t is t +1, the procedure returns to step 4, and steps 4, 5, and 6 are repeated. The steps 4, 5 and 6 can be executed in parallel without mutual interference, and the related communication and calculation do not obstruct the normal execution of other controllers and other steps.
Based on the same inventive concept, an embodiment of the present invention further provides a voltage distributed control system based on multi-agent deep reinforcement learning, as shown in fig. 3, the system includes:
the model building module is used for making reactive voltage control targets of all controlled areas according to the whole reactive voltage control target and the optimization model of the controlled power grid and building a reactive voltage optimization model;
the training frame construction module is used for constructing a multi-agent interactive training frame based on the Markov game by combining the actual configuration conditions of the optimization model and the power grid;
the initialization module is used for initializing each neural network and relevant control process variables and issuing the neural networks and the relevant control process variables to each control area;
The controller module is used for being arranged in each region locally, namely local computer equipment, and the controller module executes the control steps in parallel according to the received strategy neural network;
the sample uploading module is used for being arranged in each local area, parallelly executing the step of uploading the samples and uploading the measurement samples to the cloud server;
the strategy learning module is arranged on the cloud server and used for learning each controller strategy in parallel and issuing the updated strategy to each regional controller;
the controller module, the sample uploading module and the strategy learning module are repeatedly called and executed and can be executed in parallel.
Without loss of generality, the model building module, the training framework building module and the initialization module can be deployed on the cloud server, and can also be deployed on computer equipment different from the cloud server. The modules on the server are in data connection with the modules on the local part of each control area through a communication network.
The specific execution process and algorithm of each module can be obtained according to the embodiment of the voltage distributed control method based on multi-agent deep reinforcement learning, and are not described herein again.
The control method and the control system adopt a control framework combining online centralized learning and distributed control, continuously and intensively collect control data of each controller through an efficient deep reinforcement learning algorithm, intensively learn on a cloud server to obtain a control strategy of each controller, and locally execute the strategy by each controller according to local measurement after the strategy is issued to each controller by a communication network. On one hand, the invention gives full play to the speed advantage of distributed control, and the local controller can carry out rapid control according to real-time local measurement without communication, thereby being particularly suitable for the reactive voltage control of high-speed DG resources and SVC resources; and on the other hand, an efficient deep reinforcement learning algorithm is provided, the information advantages of centralized learning are fully utilized, the optimal strategy of each intelligent agent is obtained, and the optimal operation of the system is guaranteed under the condition that the model is incomplete. The method greatly improves the efficiency, safety and flexibility of the reactive voltage control method of the power grid under the condition of model imperfection, is particularly suitable for regional power grids with serious model imperfection problems, saves high cost caused by repeated maintenance of accurate models, reduces the requirements on communication conditions and calculation conditions of each controller, exerts the advantages of flexibility and high efficiency of distributed control, avoids the problems of high single-point failure risk, large control instruction delay and the like caused by centralized control, and is suitable for large-scale popularization.
Although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (3)

1. The voltage distributed control method based on multi-agent deep reinforcement learning is characterized by comprising the following steps:
step 1: according to the whole reactive voltage control target and the optimization model of the controlled power grid, formulating reactive voltage control targets of all controlled areas and establishing a reactive voltage optimization model;
step 2: constructing a multi-agent interactive training framework based on the Markov game by combining the actual configuration conditions of the optimization model and the power grid;
and 3, step 3: initializing each neural network and relevant control process variables and issuing the neural networks and relevant control process variables to each control area;
and 4, step 4: the local controllers in all the areas execute control steps in parallel according to the received strategy neural network;
and 5: the local controllers in all the areas execute the step of uploading samples in parallel, and upload the measurement samples to the cloud server;
And 6: the cloud server parallelly learns the strategies of each controller and issues the updated strategies to each regional controller;
and 7: repeating the steps 4, 5 and 6;
the step 1 comprises the following steps:
step 1-1: establishing a whole reactive voltage control target and optimization model of the controlled power grid:
Figure FDA0003348051700000011
wherein the content of the first and second substances,
Figure FDA0003348051700000014
is a collection of all nodes of the grid, VjIs the voltage amplitude of node j; pjIs the active power output of node j; qGjDG reactive power output for node j; qCjSVC reactive power output for node j;
Figure FDA0003348051700000012
the lower voltage limit and the upper voltage limit of the node j are respectively;
Figure FDA0003348051700000013
respectively is the lower limit and the upper limit of the SVC reactive power output of the node j; sGj,PGjDG installed capacity and active power output, respectively, for node j;
step 1-2: splitting the reactive voltage control target and the optimization model to form reactive voltage control targets and optimization models of each controlled area:
Figure FDA0003348051700000021
wherein the content of the first and second substances,
Figure FDA0003348051700000022
for the complete set of nodes for the ith region,
Figure FDA0003348051700000023
network output power for the ith area;
the step 2 comprises the following steps:
step 2-1: corresponding to the system measurement of each region, construct the observation variable o of each regioni,t
Figure FDA0003348051700000024
Wherein P isi,QiInjecting vectors formed by active power and reactive power into each node of the ith area; viA vector formed by voltages of all nodes in the ith area;
Figure FDA0003348051700000025
Outputting active power and reactive power for the network of the ith area; t is a discrete time variable of the control process;
step 2-2: corresponding to the reactive voltage optimization target of each area, a unified feedback variable r of each area is constructedt
Figure FDA0003348051700000026
PjIs the active power output of the node j,
Figure FDA0003348051700000027
outputting active power for the network of the area i;
step 2-3: corresponding to the reactive voltage optimization constraint of each region, constructing constraint feedback variables of each region
Figure FDA0003348051700000028
Figure FDA0003348051700000029
Wherein [ x ]]+=max(0,x);βiIs the cooperation coefficient of the i-th area, Vj(t) is the voltage at node j at time t,
Figure FDA00033480517000000210
the upper limit of the voltage is represented,Vis the upper voltage limit;
step 2-4: corresponding to the reactive power of the controllable flexible resources, constructing action variables a of each areai,t
ai,t=(QGi,QCi)t (1.6)
Wherein Q isGi,QCiRespectively outputting vectors of DG and SVC reactive power of the ith area;
the step 3 comprises the following steps:
step 3-1: initializing each neural network and relevant control process variables and issuing the neural networks and relevant control process variables to each control area;
step 3-2: initializing each region Lagrange multiplier lambdaiIs a scalar;
step 3-3: issuing an initial strategy neural network through a communication network
Figure FDA0003348051700000031
And
Figure FDA0003348051700000032
a controller to zone i;
step 3-4: initializing a discrete time variable t as 0, wherein the actual time interval between two steps is delta t;
step 3-5: initialization policy update period T uFor every TuPerforming strategy updating once at the delta t time;
step 3-6: initialization sample upload period TsAnd a sampleThe upload proportion m is equal to [1, T ∈s]For every TsEach controller of delta t uploads a sample once and uploads m samples in the previous uploading period;
step 3-7: initializing cloud server experience bases
Figure FDA0003348051700000033
Local caching experience base of each controller
Figure FDA0003348051700000034
The step 3-1 comprises the following steps:
step 3-1-1: defining a neural network QφiIs an input (o)i,t,ai,t) A neural network outputting a single scalar value; the activation function is a ReLU function; note QφiHas a network parameter of phiiCorresponding freezing parameter is
Figure FDA0003348051700000035
And randomly initializing phiiAnd
Figure FDA0003348051700000036
step 3-1-2: defining a neural network
Figure FDA0003348051700000037
Is an input (o)i,t,ai,t) A neural network outputting a single scalar value; the activation function is a ReLU function; note the book
Figure FDA0003348051700000038
Is recorded as
Figure FDA0003348051700000039
The corresponding freezing parameter is
Figure FDA00033480517000000310
Random initialization
Figure FDA00033480517000000311
And
Figure FDA00033480517000000312
step 3-1-3: definition of
Figure FDA00033480517000000313
And
Figure FDA00033480517000000314
for two inputs oi,tOutput and action ai,tThe neural networks with the same shape as the vector,
Figure FDA00033480517000000315
and
Figure FDA00033480517000000316
the device is provided with independent output layers respectively, and simultaneously shares the same neural network input layer and hidden layer; the activation function is a ReLU function; note the book
Figure FDA00033480517000000317
And
Figure FDA00033480517000000318
all network parameters of (2) are thetaiRandom initialization of thetai
The step 4 comprises the following steps:
step 4-1: obtaining measurement signals from a measurement device of a regional power grid to form a corresponding observation variable o i,t
Step 4-2: neural network according to local policy
Figure FDA00033480517000000319
And
Figure FDA00033480517000000320
generating the corresponding action a of the current timei,t
Figure FDA00033480517000000321
Step 4-3: the controller will ai,tSending the data to local controlled flexible resources, such as DG nodes and SVC nodes;
step 4-4: will (o)i,t,ai,t) Is stored to
Figure FDA0003348051700000041
Performing the following steps;
the step 5 comprises the following steps:
step 5-1: will be provided with
Figure FDA0003348051700000042
Uploading m +1 samples to experience base D of cloud serveriPerforming the following steps;
step 5-2: emptying
Figure FDA0003348051700000043
Step 5-3: calculating r for the first m groups of uploaded data of the current round on the cloud servertAnd
Figure FDA0003348051700000044
step 5-4: if communication faults occur, samples in a certain area cannot be uploaded, and the sampling uploading at this time can be directly ignored;
the step 6 comprises the following steps:
step 6-1: from experience libraries DiExtract a set of experiences
Figure FDA0003348051700000045
The number is B;
step 6-2: calculating a parameter phiiLoss function of (2):
Figure FDA0003348051700000046
wherein x is (o)1,…,oN) All regional observations; x' is the observation value at the next moment corresponding to x; a is1,…,aNMotion vectors for region 1 to region N, respectively;
Figure FDA0003348051700000047
is shown in
Figure FDA0003348051700000048
Obtaining; y isiComprises the following steps:
Figure FDA0003348051700000049
wherein γ is a reduction coefficient; alpha is alphaiAn entropy maximization factor for region i;
Figure FDA00033480517000000410
to get to
Figure FDA00033480517000000411
A probability value of (d);
Figure FDA00033480517000000412
comprises the following steps:
Figure FDA00033480517000000413
l denotes bit-wise multiplication o'iIs the observed value of the area i at the next moment;
step 6-3: updating the parameter phii
Figure FDA00033480517000000414
Where ρ isiIn order to learn the step size,
Figure FDA00033480517000000415
the expression is for a variable phi iCalculating a gradient;
and 6-4: calculating parameters
Figure FDA00033480517000000416
A loss function of (d);
Figure FDA00033480517000000417
wherein
Figure FDA0003348051700000051
Comprises the following steps:
Figure FDA0003348051700000052
step 6-5: updating parameters
Figure FDA0003348051700000053
Figure FDA0003348051700000054
Step 6-6: calculate lagrangian function:
Figure FDA0003348051700000055
wherein
Figure FDA0003348051700000056
Limiting the voltage crossing thread degree;
Figure FDA0003348051700000057
comprises the following steps:
Figure FDA0003348051700000058
step 6-7: updating the parameter θi
Figure FDA0003348051700000059
And 6-8: updating the parameter lambdai
Figure FDA00033480517000000510
Step 6-9: updating freeze parameters
Figure FDA00033480517000000511
And
Figure FDA00033480517000000512
Figure FDA00033480517000000513
wherein η is the freezing coefficient;
step 6-10: issuing updated policy neural networks
Figure FDA00033480517000000514
And
Figure FDA00033480517000000515
to region i.
2. The multi-agent deep reinforcement learning-based voltage distributed control method according to claim 1,
and the step 4, the step 5 and the step 6 are executed in parallel.
3. A multi-agent deep reinforcement learning based voltage distributed control system for performing the method of claim 1 or 2, the system comprising:
the model building module is used for making reactive voltage control targets of all controlled areas according to the whole reactive voltage control target and the optimization model of the controlled power grid and building a reactive voltage optimization model;
the training frame construction module is used for constructing a multi-agent interactive training frame based on the Markov game by combining the actual configuration conditions of the optimization model and the power grid;
the initialization module is used for initializing each neural network and relevant control process variables and issuing the neural networks and the relevant control process variables to each control area;
The controller module is used for being arranged in each region and executing the control step in parallel according to the received strategy neural network;
the sample uploading module is used for setting the sample uploading module in each local area, executing the sample uploading step in parallel and uploading the measurement samples to the cloud server;
the strategy learning module is arranged on the cloud server, is used for learning each controller strategy in parallel and issuing the updated strategy to each regional controller;
the controller module, the sample uploading module and the strategy learning module are used for being repeatedly called and executed.
CN202010581959.4A 2020-06-23 2020-06-23 Voltage distributed control method and system based on multi-agent deep reinforcement learning Active CN111799808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010581959.4A CN111799808B (en) 2020-06-23 2020-06-23 Voltage distributed control method and system based on multi-agent deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010581959.4A CN111799808B (en) 2020-06-23 2020-06-23 Voltage distributed control method and system based on multi-agent deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN111799808A CN111799808A (en) 2020-10-20
CN111799808B true CN111799808B (en) 2022-06-28

Family

ID=72803612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010581959.4A Active CN111799808B (en) 2020-06-23 2020-06-23 Voltage distributed control method and system based on multi-agent deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN111799808B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507614B (en) * 2020-12-01 2021-09-07 广东电网有限责任公司中山供电局 Comprehensive optimization method for power grid in distributed power supply high-permeability area
CN113258581B (en) * 2021-05-31 2021-10-08 广东电网有限责任公司佛山供电局 Source-load coordination voltage control method and device based on multiple intelligent agents
US20230074995A1 (en) * 2021-09-09 2023-03-09 Siemens Aktiengesellschaft System and method for controlling power distribution systems using graph-based reinforcement learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103580061A (en) * 2013-10-28 2014-02-12 贵州电网公司电网规划研究中心 Microgrid operating method
CN110729740A (en) * 2019-07-03 2020-01-24 清华大学 Power distribution network reactive power optimization method and device, computer equipment and readable storage medium
CN110768262A (en) * 2019-10-31 2020-02-07 上海电力大学 Active power distribution network reactive power supply configuration method based on node clustering partition

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2806520A1 (en) * 2013-05-22 2014-11-26 Vito NV Power supply network control system and method
US10796252B2 (en) * 2018-09-06 2020-10-06 Arizona Board Of Regents On Behalf Of Arizona State University Induced Markov chain for wind farm generation forecasting
CN109120011B (en) * 2018-09-29 2019-12-13 清华大学 distributed power distribution network congestion scheduling method considering distributed power sources
US20200119556A1 (en) * 2018-10-11 2020-04-16 Di Shi Autonomous Voltage Control for Power System Using Deep Reinforcement Learning Considering N-1 Contingency
CN110365056B (en) * 2019-08-14 2021-03-12 南方电网科学研究院有限责任公司 Distributed energy participation power distribution network voltage regulation optimization method based on DDPG

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103580061A (en) * 2013-10-28 2014-02-12 贵州电网公司电网规划研究中心 Microgrid operating method
CN110729740A (en) * 2019-07-03 2020-01-24 清华大学 Power distribution network reactive power optimization method and device, computer equipment and readable storage medium
CN110768262A (en) * 2019-10-31 2020-02-07 上海电力大学 Active power distribution network reactive power supply configuration method based on node clustering partition

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Auxiliary Frequency and Voltage Regulation in Microgrid via Intelligent Electric Vehicle Charging;Nan Zou 等;《2014 IEEE International Conference on Smart Grid Communications》;20141106;全文 *
Distributed Generators as Providers of Reactive Power Support—A Market Approach;Augusto C. Rueda-Medina 等;《IEEE TRANSACTIONS ON POWER SYSTEMS》;20130228;全文 *
Distributed Voltage Regulation of Active Distribution System Based on Enhanced Multi-agent Deep Reinforcement Learning;Di Cao 等;《ARXIV》;20200531;第1-8页 *
Safe deep reinforcement learning-based constrained optimal control scheme for active distribution networks;Peng Kou 等;《Applied Energy》;20200306;全文 *

Also Published As

Publication number Publication date
CN111799808A (en) 2020-10-20

Similar Documents

Publication Publication Date Title
CN111799808B (en) Voltage distributed control method and system based on multi-agent deep reinforcement learning
Li et al. Coordinated load frequency control of multi-area integrated energy system using multi-agent deep reinforcement learning
CN111564849B (en) Two-stage deep reinforcement learning-based power grid reactive voltage control method
Xi et al. A novel multi-agent DDQN-AD method-based distributed strategy for automatic generation control of integrated energy systems
CN112615379B (en) Power grid multi-section power control method based on distributed multi-agent reinforcement learning
Wang et al. Wind power interval prediction based on improved PSO and BP neural network
CN111666713B (en) Power grid reactive voltage control model training method and system
CN114217524A (en) Power grid real-time self-adaptive decision-making method based on deep reinforcement learning
Chen et al. Reinforcement-based robust variable pitch control of wind turbines
Xi et al. A virtual generation ecosystem control strategy for automatic generation control of interconnected microgrids
Li et al. Grid-area coordinated load frequency control strategy using large-scale multi-agent deep reinforcement learning
CN113489015A (en) Power distribution network multi-time scale reactive voltage control method based on reinforcement learning
CN110165714A (en) Micro-capacitance sensor integration scheduling and control method, computer readable storage medium based on limit dynamic programming algorithm
CN113471982A (en) Cloud edge cooperation and power grid privacy protection distributed power supply in-situ voltage control method
Yin et al. Quantum deep reinforcement learning for rotor side converter control of double-fed induction generator-based wind turbines
CN113422371B (en) Distributed power supply local voltage control method based on graph convolution neural network
Li et al. Distributed deep reinforcement learning for integrated generation‐control and power‐dispatch of interconnected power grid with various renewable units
CN113872213B (en) Autonomous optimization control method and device for power distribution network voltage
Xi et al. Multi-agent deep reinforcement learning strategy for distributed energy
Tao et al. On comparing six optimization algorithms for network-based wind speed forecasting
Vohra et al. End-to-end learning with multiple modalities for system-optimised renewables nowcasting
Wang et al. Intelligent load frequency control for improving wind power penetration in power systems
CN115632406B (en) Reactive voltage control method and system based on digital-mechanism fusion driving modeling
CN111799820A (en) Double-layer intelligent hybrid zero-star cloud energy storage countermeasure regulation and control method for power system
CN115793456A (en) Lightweight sensitivity-based power distribution network edge side multi-mode self-adaptive control method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant