CN111799808A - Power grid reactive voltage distributed control method and system - Google Patents
Power grid reactive voltage distributed control method and system
- Publication number: CN111799808A (application CN202010581959.4A)
- Authority: CN (China)
- Prior art keywords: reactive voltage, reactive, neural network, region, grid
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H02J3/16 — Adjusting voltage in AC networks by adjustment of reactive power
- G06F30/27 — Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM]
- H02J13/00 — Circuit arrangements for providing remote indication of network conditions or remote control of switching means in a power distribution network
- G06F2113/04 — Power grid distribution networks
- H02J2203/10 — Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, meshed network optimisation, active network management
- H02J2203/20 — Simulating, e.g. planning, reliability check, modelling or computer-assisted design [CAD]
- H02J2300/20 — The dispersed energy generation being of renewable origin
- Y02E40/30 — Reactive power compensation
- Y02E40/70 — Smart grids as climate change mitigation technology in the energy generation sector
- Y04S10/50 — Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention provides a distributed control method for power grid reactive voltage, comprising the following steps: according to the overall reactive voltage control target and optimization model of the controlled grid, formulate a reactive voltage control target for each controlled area and establish a reactive voltage optimization model; construct a multi-agent interactive training framework based on the Markov game, combining the optimization model with the actual configuration of the grid; initialize the neural networks and related control-process variables and issue them to each control area; the local controller of each area executes the control steps in parallel according to the received policy neural network; the local controllers of all areas execute the sample-uploading steps in parallel, uploading measurement samples to the cloud server; the cloud server learns every controller's policy in parallel and issues the updated policies to the area controllers. The invention achieves flexible reactive voltage control and optimal control under incomplete model conditions.
Description
Technical Field
The invention belongs to the technical field of operation and control of power systems, and particularly relates to a power grid reactive voltage distributed control method and system.
Background
Driven by energy and environmental concerns, the share of clean, dispersed renewable distributed generation (DG) in the power grid is growing steadily, and large-scale, high-penetration grid-connected DG generation has become a frontier and hot spot of the energy and power field. Because DG output is highly dispersed and strongly fluctuating, it brings a series of negative effects on the voltage quality and dispatch operation of distribution networks and even transmission networks. DGs are usually connected to the grid through power-electronic inverters and therefore have flexible, high-speed regulation capability. To control DGs efficiently and improve the voltage quality of high-penetration grids, reactive voltage control has become an important issue in grid regulation and operation. In a traditional grid, reactive voltage control is usually realized by centralized, model-based optimization, which eliminates voltage limit violations while reducing the losses of the controlled grid.

However, centralized optimization suffers from key problems such as a single point of failure, heavy communication and computation burden, and severe sensitivity to communication delay. In high-penetration grids in particular, the controlled DGs are numerous and the network structure is complex, so centralized control is severely limited and cannot reasonably regulate high-speed resources. A series of distributed reactive voltage control methods has therefore been developed; compared with centralized methods, distributed methods place weaker requirements on communication conditions and control faster.

Existing distributed control, however, usually relies on model-based optimization. Because an accurate model of the grid is difficult to obtain, such methods cannot guarantee the control effect: control instructions often land far from the optimal point, the grid operates in a suboptimal state, and the requirements of efficient and safe control are even harder to meet in continuous online operation.

Providing a reactive voltage control method for the power grid with high safety, high efficiency and high flexibility is therefore an urgent technical problem to be solved in the art.
Disclosure of Invention
In order to solve the above problems, the present invention provides a distributed control method for power grid reactive voltage, comprising:

Step 1: according to the overall reactive voltage control target and optimization model of the controlled grid, formulate a reactive voltage control target for each controlled area and establish a reactive voltage optimization model;

Step 2: construct a multi-agent interactive training framework based on the Markov game, combining the optimization model with the actual configuration of the grid;

Step 3: initialize each neural network and the related control-process variables, and issue them to each control area;

Step 4: the local controller of each area executes the control steps in parallel according to the received policy neural network;

Step 5: the local controller of each area executes the sample-uploading steps in parallel, uploading measurement samples to the cloud server;

Step 6: the cloud server learns each controller's policy in parallel and issues the updated policies to the area controllers;

Step 7: repeat steps 4, 5 and 6.
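The interaction of the local control, sample-upload and learning steps can be sketched as the following loop. This is a minimal illustration only: the class and method names (`Controller`, `CloudServer`, `run`, etc.) are hypothetical and not from the patent, and the policy and training bodies are placeholders for the neural-network machinery described below.

```python
import random

class Controller:
    """Hypothetical local controller for one grid region (illustration only)."""
    def __init__(self, region_id):
        self.region_id = region_id
        self.policy = lambda obs: [0.0 for _ in obs]   # placeholder policy network
        self.buffer = []                               # local cache experience base
        self.dispatched = 0

    def observe(self):
        # stand-in for the (P_i, Q_i, V_i, P_out, Q_out) measurements
        return [random.random() for _ in range(3)]

    def step(self, t):
        o = self.observe()
        a = self.policy(o)        # step 4-2: act from the stored policy alone,
        self.dispatched += 1      # step 4-3: no cloud communication needed here
        self.buffer.append((t, o, a))

class CloudServer:
    """Hypothetical cloud server holding per-region experience bases D_i."""
    def __init__(self, n_regions):
        self.experience = {i: [] for i in range(n_regions)}
        self.updates = 0

    def upload(self, region_id, samples):
        self.experience[region_id].extend(samples)

    def train(self):
        self.updates += 1         # placeholder for the actor-critic updates of step 6
        return lambda obs: [0.0 for _ in obs]

def run(n_regions=3, horizon=16, T_s=4, T_u=8):
    ctrls = [Controller(i) for i in range(n_regions)]
    cloud = CloudServer(n_regions)
    for t in range(1, horizon + 1):
        for c in ctrls:                       # step 4 (parallel in practice)
            c.step(t)
        if t % T_s == 0:                      # step 5: periodic sample upload
            for c in ctrls:
                cloud.upload(c.region_id, c.buffer)
                c.buffer = []
        if t % T_u == 0:                      # step 6: centralized learning,
            new_policy = cloud.train()        # then push updated policies out
            for c in ctrls:
                c.policy = new_policy
    return ctrls, cloud

ctrls, cloud = run()
```

In the real system the three procedures run asynchronously and in parallel rather than in one sequential loop; the sketch only shows how the periods T_s and T_u interleave control, uploading and learning.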
Further, step 1 comprises:

Step 1-1: establish the overall reactive voltage control target and optimization model of the controlled grid:

$$\min_{Q_G,\,Q_C}\ \sum_{j\in\mathcal{N}} P_j \quad\text{(total active network loss)}$$

subject to, for every node $j\in\mathcal{N}$,

$$\underline{V}_j \le V_j \le \overline{V}_j,\qquad \underline{Q}_{Cj} \le Q_{Cj} \le \overline{Q}_{Cj},\qquad Q_{Gj}^2 \le S_{Gj}^2 - P_{Gj}^2,$$

where $\mathcal{N}$ is the set of all nodes of the grid; $V_j$ is the voltage magnitude of node j; $P_j$ is the active power output of node j; $Q_{Gj}$ is the DG reactive power output of node j; $Q_{Cj}$ is the SVC reactive power output of node j; $\underline{V}_j, \overline{V}_j$ are the lower and upper voltage limits of node j; $\underline{Q}_{Cj}, \overline{Q}_{Cj}$ are the lower and upper limits of the SVC reactive power output of node j; and $S_{Gj}, P_{Gj}$ are the DG installed capacity and active power output of node j.

Step 1-2: split the overall reactive voltage control target and optimization model to form a reactive voltage control target and optimization model for each controlled area: each area minimizes its contribution to the network loss, expressed through its nodal injections $\sum_{j\in\mathcal{N}_i} P_j$ and its exchange power, subject to the voltage and reactive-power constraints restricted to $j\in\mathcal{N}_i$, where $\mathcal{N}_i$ is the complete node set of the i-th area and $P^{\mathrm{out}}_i$ is the network output power of the i-th area.
Further, step 2 comprises:

Step 2-1: corresponding to the system measurements of each area, construct the observation variable $o_{i,t}$ of each area:

$$o_{i,t} = \big(P_i,\ Q_i,\ V_i,\ P^{\mathrm{out}}_i,\ Q^{\mathrm{out}}_i\big)_t$$

where $P_i, Q_i$ are the vectors of active and reactive power injected at the nodes of the i-th area; $V_i$ is the vector of node voltages of the i-th area; $P^{\mathrm{out}}_i, Q^{\mathrm{out}}_i$ are the network output active and reactive power of the i-th area; and t is the discrete time variable of the control process;

Step 2-2: corresponding to the reactive voltage optimization target of each area, establish the uniform feedback variable $r_t$ shared by all areas:

$$r_t = -\sum_{i=1}^{N}\sum_{j\in\mathcal{N}_i} P_j(t)$$

i.e. the negative of the total active network loss;

Step 2-3: corresponding to the reactive voltage optimization constraints of each area, construct the constraint feedback variable $r^{c}_{i,t}$ of each area:

$$r^{c}_{i,t} = -\beta_i \sum_{j\in\mathcal{N}_i}\Big(\big[V_j(t)-\overline{V}\big]^+ + \big[\underline{V}-V_j(t)\big]^+\Big)$$

where $[x]^+=\max(0,x)$; $\beta_i$ is the cooperation coefficient of the i-th area; $V_j(t)$ is the voltage of node j at time t; and $\overline{V}$, $\underline{V}$ are the upper and lower voltage limits;

Step 2-4: corresponding to the reactive power of the controllable flexible resources, construct the action variable $a_{i,t}$ of each area:

$$a_{i,t} = (Q_{Gi},\ Q_{Ci})_t \tag{0.25}$$

where $Q_{Gi}, Q_{Ci}$ are the vectors of the DG and SVC reactive power outputs of the i-th area.
Further, step 3 comprises:

Step 3-1: initialize each neural network and the related control-process variables and issue them to each control area;

Step 3-2: initialize the Lagrange multiplier $\lambda_i$ of each area as a scalar;

Step 3-3: issue the initial policy neural networks $\hat{\mu}_i$ and $\hat{\sigma}_i$ to the controller of area i through the communication network;

Step 3-4: initialize the discrete time variable t = 0, with an actual time interval of Δt between two steps;

Step 3-5: initialize the policy update period $T_u$: a policy update is performed once every $T_u\,\Delta t$;

Step 3-6: initialize the sample upload period $T_s$ and the sample upload count $m \in [1, T_s]$: every $T_s\,\Delta t$, each controller uploads samples once, uploading the m samples of the preceding upload period;

Step 3-7: initialize the experience bases $\mathcal{D}_i$ on the cloud server and the local cache experience base of each controller.

Further, step 3-1 comprises:

Step 3-1-1: define a neural network (denote it $\hat{Q}_{r,i}$) that takes $(o_{i,t}, a_{i,t})$ as input and outputs a single scalar value; the activation function is the ReLU function. Denote the network parameters of $\hat{Q}_{r,i}$ by $\phi_i$, with corresponding frozen parameters $\bar{\phi}_i$, and randomly initialize $\phi_i$ and $\bar{\phi}_i$;

Step 3-1-2: define a neural network (denote it $\hat{Q}_{c,i}$) that takes $(o_{i,t}, a_{i,t})$ as input and outputs a single scalar value; the activation function is the ReLU function. Denote its network parameters by $\psi_i$, with corresponding frozen parameters $\bar{\psi}_i$, and randomly initialize $\psi_i$ and $\bar{\psi}_i$;

Step 3-1-3: define $\hat{\mu}_i$ and $\hat{\sigma}_i$ as two neural networks that take $o_{i,t}$ as input and output vectors with the same shape as the action vector $a_{i,t}$; $\hat{\mu}_i$ and $\hat{\sigma}_i$ each have an independent output layer while sharing the same input layer and hidden layers; the activation function is the ReLU function. Denote all the network parameters of $\hat{\mu}_i$ and $\hat{\sigma}_i$ by $\theta_i$ and randomly initialize $\theta_i$.
Further, step 4 comprises:

Step 4-1: obtain measurement signals from the measurement devices of the area grid to form the corresponding observation variable $o_{i,t}$;

Step 4-2: generate the action $a_{i,t}$ for the current time from the local policy neural networks $\hat{\mu}_i$ and $\hat{\sigma}_i$;

Step 4-3: the controller sends $a_{i,t}$ to the locally controlled flexible resources, such as the DG nodes and SVC nodes.

Further, step 5 comprises:

Step 5-1: upload the m+1 samples to the experience base $\mathcal{D}_i$ of the cloud server;

Step 5-3: on the cloud server, compute $r_t$ and $r^{c}_{i,t}$ for the first m groups of data uploaded in the current round;

Step 5-4: if a communication fault prevents the samples of some area from being uploaded, that upload round can simply be ignored.
Further, step 6 comprises:

Step 6-2: compute the loss function of the parameters $\phi_i$:

$$L(\phi_i) = \mathbb{E}_{(x,a,x')\sim\mathcal{D}}\Big[\big(\hat{Q}_{r,i}(o_i, a_i) - y_i\big)^2\Big]$$

where $x=(o_1,\dots,o_N)$ are the observations of all areas; $x'$ is the observation at the next time corresponding to x; $a_1,\dots,a_N$ are the action vectors of area 1 to area N; the expectation is taken over samples drawn from the experience base; and $y_i$ is

$$y_i = r + \gamma\Big(\hat{Q}_{r,i}(o'_i, a'_i;\bar{\phi}_i) - \alpha_i \log \pi_i(a'_i \mid o'_i)\Big)$$

where γ is the discount coefficient; $\alpha_i$ is the entropy-maximization factor of area i; $\pi_i(a'_i \mid o'_i)$ is the probability of the action $a'_i$, with

$$a'_i = \tanh\big(\hat{\mu}_i(o'_i) + \hat{\sigma}_i(o'_i) \odot \xi\big),\qquad \xi \sim \mathcal{N}(0, I),$$

where ⊙ denotes element-wise multiplication and $o'_i$ is the observation of area i at the next time;

Step 6-3: update the parameters $\phi_i$:

$$\phi_i \leftarrow \phi_i - \rho_i \nabla_{\phi_i} L(\phi_i)$$

where $\rho_i$ is the learning step size and $\nabla_{\phi_i}$ denotes the gradient with respect to the variables $\phi_i$;

Step 6-6: compute the Lagrangian function of the policy;

Step 6-7: update the parameters $\theta_i$ by a gradient step on the Lagrangian;

Step 6-8: update the parameters $\lambda_i$, and update the frozen parameters as

$$\bar{\phi}_i \leftarrow \eta\,\phi_i + (1-\eta)\,\bar{\phi}_i$$

where η is the freezing coefficient.
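Two of the simplest parameter updates in step 6, the frozen-parameter (target-network) update with freezing coefficient η and the Lagrange-multiplier update, can be sketched numerically as follows. This is a minimal sketch under assumed conventions: parameters are plain lists of floats, and the sign convention for the multiplier step (ascent on the average constraint violation, projected to stay non-negative) is an assumption, not taken from the patent.

```python
def soft_update(frozen, live, eta):
    """Frozen-parameter update of step 6-8: phi_bar <- eta*phi + (1-eta)*phi_bar.

    `frozen` and `live` are flat lists of parameters (a simplification)."""
    return [eta * p + (1.0 - eta) * f for f, p in zip(frozen, live)]

def lagrange_step(lmbda, avg_constraint_violation, rho):
    """Dual ascent on lambda_i (assumed convention): the multiplier grows while
    the voltage constraint is violated on average, and is projected to >= 0."""
    return max(0.0, lmbda + rho * avg_constraint_violation)
```

With a small η the frozen copy trails the live parameters slowly, which stabilizes the bootstrapped target $y_i$; the projection keeps $\lambda_i$ a valid multiplier.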
Further, step 4, step 5 and step 6 are executed in parallel.
The invention also provides a power grid reactive voltage distributed control system, comprising:

a model building module, for formulating the reactive voltage control target of each controlled area according to the overall reactive voltage control target and optimization model of the controlled grid, and establishing the reactive voltage optimization model;

a training framework construction module, for constructing a multi-agent interactive training framework based on the Markov game, combining the optimization model with the actual configuration of the grid;

an initialization module, for initializing each neural network and the related control-process variables and issuing them to each control area;

a controller module, deployed locally in each area, for executing the control steps in parallel according to the received policy neural network;

a sample uploading module, deployed locally in each area, for executing the sample-uploading steps in parallel and uploading the measurement samples to the cloud server;

a policy learning module, deployed on the cloud server, for learning each controller's policy in parallel and issuing the updated policies to the area controllers;

wherein the controller module, the sample uploading module and the policy learning module are invoked and executed repeatedly.
The advantages and beneficial effects of the invention are as follows:

When an area controller executes a control operation, it needs no communication with the cloud server or other controllers: it can quickly generate control instructions from its stored policy neural network, efficiently exploiting high-speed flexible resources and improving the efficiency of reactive voltage control.

All controllers run in parallel, and the three procedures of local control, sample uploading and centralized learning also run in parallel, so communication and computing resources are fully utilized and the scheme is robust to communication and computing conditions.

Based on multi-agent deep reinforcement learning, no accurate grid model needs to be established: the characteristics of the grid are learned purely from control-process data, enabling model-free optimization, so the reactive power distribution of the grid can be driven to an optimized state even when the model is incomplete.

Compared with other distributed learning methods, centralized learning greatly reduces the computing cost of each controller and improves the utilization of cloud computing resources.

Compared with existing grid optimization methods based on multi-agent reinforcement learning, the method offers high sample efficiency, high voltage safety, a simple control structure and lower implementation cost.

With the power grid reactive voltage distributed control method and system, distributed control achieves high-speed, flexible, communication-robust reactive voltage control on the one hand; on the other hand, online learning from control-process data via deep reinforcement learning achieves optimal reactive voltage control under incomplete model conditions. The scheme meets the requirements of continuous online operation of grid reactive voltage control, greatly improving the voltage quality of the grid and reducing its operating losses.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 shows a flow chart of a method for distributed control of reactive voltage of a power grid according to an embodiment of the invention;
fig. 2 shows a block diagram of a grid reactive voltage distributed control system according to an embodiment of the invention;
fig. 3 shows a schematic structural diagram of a module of the grid reactive voltage distributed control system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a distributed control method for power grid reactive voltage, specifically a distributed reactive voltage control method based on multi-agent deep reinforcement learning, comprising the following steps as shown in figure 1:

Step 1: according to the overall reactive voltage control target and optimization model of the controlled grid, formulate a reactive voltage control target for each controlled area and establish a reactive voltage optimization model;

Step 2: construct a multi-agent interactive training framework based on the Markov game, combining the optimization model with the actual configuration of the grid;

Step 3: initialize each neural network and the related control-process variables, and issue them to each control area;

Step 4: the local controller of each area executes the control steps in parallel according to the received policy neural network;

Step 5: the local controller of each area executes the sampling steps in parallel, uploading measurement samples to the cloud server;

Step 6: the cloud server learns each controller's policy in parallel and issues the updated policies to the area controllers;

Step 7: repeat steps 4, 5 and 6 in parallel.
The specific implementation of each step is described in detail below.
In the step 1, according to the whole reactive voltage control target and the optimization model of the controlled power grid, the reactive voltage control target of each controlled area is formulated, and the reactive voltage optimization model is established. This step may be performed at a regional grid regulation center as shown in fig. 2, and in particular may be performed on a cloud server.
The method comprises the following steps:
step 1-1: establishing a whole reactive voltage control target and optimization model of the controlled power grid:
wherein the content of the first and second substances,is a collection of all nodes of the grid, VjIs the voltage amplitude of node j; pjIs the active power output of node j; qGjDG reactive power output for node j; qCjSVC (Static Var Compensator) reactive power output for node j;the lower voltage limit and the upper voltage limit of the node j are respectively;respectively is the lower limit and the upper limit of the SVC reactive power output of the node j; sGj,PGjDG installed capacity and active power output, respectively, for node j.
Step 1-2: and splitting the reactive voltage control target and the optimization model to form the reactive voltage control target and the optimization model of each controlled area.
As shown in fig. 2, the controlled grid is divided into N regions according to an actual controller installation situation, each region includes a plurality of nodes, illustratively, the nodes include DG nodes and SVC nodes, and a branch is formed between the nodes. Each zone is equipped with a local controller. Illustratively, the controlled area 1 is equipped with a controlled area controller 1, and the controlled area 2 is equipped with a controlled area controller 2 …, and the controlled area N is equipped with a controlled area controller N. The controller of the controlled area, called controller for short, can quickly obtain the measuring signal of the area. The controller is also communicated with a cloud server of a regional power grid regulation and control center, namely a cloud server for short, through communication. In the embodiment of the present invention, the cloud server may include one or more computing devices. Specifically, the controller can obtain voltage measurement, current measurement, power measurement and the like of the nodes through the measuring devices installed on the nodes, and upload the sample data of the reactive voltage control process to the cloud server. The controller also receives a reactive voltage control strategy corresponding to the region from the cloud server and issues a control signal to the node.
In the embodiment of the invention, for the ith epsilon [1, N ] controlled area, splitting a reactive voltage control target and an optimization model into the controlled area reactive voltage control target and the optimization model corresponding to N areas:
wherein the content of the first and second substances,for the complete set of nodes for the ith region,the network output power for the ith zone. In the examples of the present invention, the same symbols appearing represent the same physical meanings, such as SGj,PGjDG installed capacity and active power output, respectively, of node j, where node j is a node
Step 2: and constructing a multi-agent interactive training framework based on the Markov game by combining the actual configuration conditions of the optimization model and the power grid.
Step 2-1: corresponding to the system measurement of each region, construct the observation variable o of each regioni,tAs shown in (0.41).
Wherein P isi,QiInjecting vectors formed by active power and reactive power into each node of the ith area; viVector formed by voltage of each node of ith area;Outputting active power and reactive power for the network of the ith area; t is a discrete time variable of the control process.
Step 2-2: corresponding to the reactive voltage optimization target of each region, establishing a uniform feedback variable r of each regiontAs shown at (0.42).
PjIs the active power output of the node j,and outputting active power for the network of the area i.
Step 2-3: corresponding to the reactive voltage optimization constraint of each region, constructing constraint feedback variables of each regionAs shown in (0.43):
wherein [ x ]]+=max(0,x);βiThe cooperation coefficient of the ith area; vj(t) is the voltage at node j at time t,the upper limit of the voltage is represented,Vis the upper voltage limit; generally, the upper voltage limit is consistent across the nodes, although it may vary under particular circumstances; here, according to the convention, the voltage upper limit is taken as the same, namely the voltage upper limit identifies the voltage upper limit of each node, and the voltage lower limit also does;
step 2-4: corresponding to the reactive power of the controllable flexible resources, constructing action variables a of each areai,tAs shown in (0.44):
ai,t=(QGi,QCi)t(0.44)
wherein Q isGi,QCiThe vectors of the DG and SVC reactive power output of the ith area are respectively.
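The observation of (0.41) and the constraint feedback of (0.43) are simple enough to spell out directly. The following is a minimal sketch with hypothetical function names; the sign convention (the feedback is zero when all voltages are within limits and grows more negative with the violation, scaled by the cooperation coefficient) is an assumption consistent with a penalty interpretation.

```python
def observation(P_i, Q_i, V_i, P_out, Q_out):
    """o_{i,t} per (0.41): concatenation of the area's nodal active and
    reactive injections, nodal voltages and exchange power."""
    return list(P_i) + list(Q_i) + list(V_i) + [P_out, Q_out]

def constraint_feedback(V_area, v_lo, v_hi, beta_i):
    """r^c_{i,t} per (0.43): penalise voltages outside [v_lo, v_hi],
    with [x]^+ = max(0, x) applied to each side of the band."""
    violation = sum(max(0.0, v - v_hi) + max(0.0, v_lo - v) for v in V_area)
    return -beta_i * violation
```

For example, an area whose voltages all sit inside the band returns a feedback of zero, while a node 0.01 p.u. above the upper limit contributes a penalty of 0.01 scaled by $\beta_i$.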
And step 3: initializing each neural network and related control process variables;
step 3-1: initializing each neural network and relevant control process variables and issuing the neural networks and relevant control process variables to each control area. Firstly, initializing a neural network corresponding to each region, and storing the neural network on a cloud server, wherein the neural network comprises the following steps:
step 3-1-1: defining a neural networkIs an input (o)i,t,ai,t) A neural network that outputs a single scalar value, comprising several hidden layers (typically taken as 2 hidden layers), each hidden layer containing several neurons (typically taken as 512 neurons), the activation function being a ReLU function, the mathematical expression of which is ReLU (x) max (0, x). Note the bookHas a network parameter of phiiCorresponding freezing parameter isAnd randomly initializing phiiAnd
step 3-1-2: defining a neural networkIs an input (o)i,t,ai,t) A neural network outputting a single scalar value comprises a plurality of hidden layers (typically 2 hidden layers), each hidden layer comprises a plurality of neurons (typically 512 neurons), and an activation function is a ReLU function. Note the bookIs recorded asThe corresponding freezing parameter isRandom initializationAnd
step 3-1-3: definition ofAndfor two inputs oi,tOutput and action ai,tNeural networks of the same vector shape.Andthe neural network has independent output layers respectively, and simultaneously shares the same neural network input layer and hidden layer, and comprises a plurality of hidden layers (typically 2 hidden layers), each hidden layer comprises a plurality of neurons (typically 512 neurons), and the activation function is a ReLU function. Note the bookAndall network parameters of (2) are thetai. Random initialization of thetai。
Step 3-2: initializing each region Lagrange multiplier lambdaiIs a scalar, typically with an initial value of 1;
step 3-3: issuing an initial strategy neural network through a communication networkAnda controller to zone i;
step 3-4: initializing a discrete time variable t as 0, wherein the actual time interval between two steps is delta t, controlling the time interval once every step, and specifically determining according to the actual measurement and the instruction control speed of a local controller;
step 3-5: initialization policy update period TuI.e. every TuStrategy updating is executed once at delta T time, the strategy updating is determined according to the training speed of the cloud server, and the typical value can be Tu=8;
Step 3-6: initialization sample upload period TsThe ratio of m to sample upload is equal to [1, T ∈s]. Every other TsAnd (4) uploading samples once by each controller, and uploading m samples in the previous uploading period. T issM is determined according to the communication speed, and the typical value can be Ts=8,m=1;
Step 3-7: initialize the experience base Di on the cloud server and the local cache experience base of each controller.
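The experience bases of step 3-7 and the m + 1-sample upload of step 5-1 can be sketched as follows (the class and method names are illustrative, not from the patent):

```python
from collections import deque

class ControllerCache:
    """Local cache of one controller: keeps recent transitions so the
    m + 1 most recent samples can be uploaded each period (step 5-1)."""
    def __init__(self):
        self.buf = deque()

    def store(self, sample):
        self.buf.append(sample)

    def tail(self, m):
        # the m + 1 most recent samples, as uploaded in step 5-1
        return list(self.buf)[-(m + 1):]

class CloudExperienceBase:
    """Cloud-side experience base Di for region i (step 3-7)."""
    def __init__(self):
        self.data = []

    def extend(self, samples):
        self.data.extend(samples)

cache = ControllerCache()
for t in range(10):
    cache.store({"t": t})

m = 1
upload = cache.tail(m)          # the 2 most recent samples: t = 8 and t = 9
D_i = CloudExperienceBase()
D_i.extend(upload)
```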
Step 4: the local controller of each region executes the control steps in parallel according to the received policy neural network. At time t, the local controller of region i executes the following control steps; the regions run in parallel without interference:
Step 4-1: obtain measurement signals from the measurement devices of the regional power grid to form the corresponding observation variable oi,t;
Step 4-2: generate the action ai,t of the current time according to the local policy neural networks:
Step 4-3: the controller sends ai,t to the locally controlled flexible resources, such as DG nodes and SVC nodes;
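The control step above can be sketched as follows. How the two policy heads map to ai,t is shown only in the figures; the SAC-style choice assumed here samples a Gaussian from assumed mean/log-std heads, squashes it with tanh, and scales the result to the reactive-power limits of the local DG/SVC resources:

```python
import numpy as np

def local_control_step(o_t, mean_fn, log_std_fn, q_limits, rng):
    """One control step of the region-i controller (steps 4-1 to 4-3).
    Assumes, SAC-style, that the two policy heads give the mean and
    log-std of a Gaussian over the action (an assumption, not the
    patent's literal construction)."""
    mu = mean_fn(o_t)
    std = np.exp(np.clip(log_std_fn(o_t), -20.0, 2.0))
    u = mu + std * rng.standard_normal(mu.shape)  # reparameterized sample
    a = np.tanh(u)                                # squashed into (-1, 1)
    lo, hi = q_limits
    return lo + 0.5 * (a + 1.0) * (hi - lo)       # scaled to [Q_min, Q_max]

rng = np.random.default_rng(1)
o_t = rng.standard_normal(6)                      # observation o_it (toy)
mean_fn = lambda o: np.zeros(3)                   # stand-in policy heads
log_std_fn = lambda o: np.full(3, -5.0)           # nearly deterministic
q_cmd = local_control_step(
    o_t, mean_fn, log_std_fn,
    (np.full(3, -1.0), np.full(3, 1.0)), rng)     # reactive-power setpoints
```

The returned setpoints would then be sent to the local DG/SVC nodes as in step 4-3.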
Step 5: the local controllers of all regions execute the sample-upload step in parallel, uploading measurement samples to the cloud server according to the upload period. Illustratively, if t mod Ts = 0, the local controller of each region i executes the following sampling steps at time t, in parallel and without interference:
Step 5-1: upload m + 1 samples through the communication network into the experience base Di of the cloud server;
Step 5-3: after all controllers have uploaded, compute rt and the constraint feedback for the first m groups of data uploaded in this round on the cloud server;
Step 5-4: if a communication fault prevents the samples of some region from being uploaded, this upload can simply be skipped without affecting subsequent execution.
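The fault-tolerant upload of steps 5-1 to 5-4 might look like this on the cloud side (the function name and exception type are illustrative):

```python
def gather_uploads(controllers, t, T_s, m):
    """Cloud-side collection for step 5: every T_s steps each controller
    uploads its recent samples; a region whose upload fails due to a
    communication fault is simply skipped, as in step 5-4."""
    if t % T_s != 0:
        return {}
    received = {}
    for i, ctrl in controllers.items():
        try:
            received[i] = ctrl()    # may raise on communication failure
        except ConnectionError:
            continue                # ignore this region's upload this round
    return received

controllers = {
    1: lambda: [{"t": 7}, {"t": 8}],                               # m + 1 samples
    2: lambda: (_ for _ in ()).throw(ConnectionError("link down")),  # faulty region
}
got = gather_uploads(controllers, t=8, T_s=8, m=1)  # only region 1 arrives
```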
Step 6: the cloud server learns the policies of all controllers in parallel and issues the updated policies to each region controller. According to the update period, the cloud server trains each controller's policy in parallel on the updated experience base and sends the resulting policies to the controllers. Illustratively, if t mod Tu = 0, the cloud server learns and issues each controller's policy at time t, i.e. it executes the following learning steps several times for the neural networks of each region i (typically Tu times, adjustable according to the computing power of the cloud server):
Step 6-2: compute the loss function of the parameters φi:
where x = (o1, ..., oN) is the set of observations of all regions; x′ is the next-time observation corresponding to x; a1, ..., aN are the action vectors of region 1 to region N, obtained from the corresponding policy networks; yi is:
where γ is the discount coefficient, typically 0.98; αi is the entropy maximization coefficient of region i, with typical value 0.1; the probability of the sampled action under the current policy enters the target as follows:
Here ⊙ denotes element-wise multiplication and o′i is the observation of region i at the next time. In the embodiment of the invention, the cloud server learns the policies of all controllers in parallel and uses the global observations in the learning computation of each region: learning uses global information while execution uses only local information, which improves the reliability and optimality of the control policy.
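The target yi appears only as an image in this text; a reconstruction in the standard soft-Q (SAC) form, consistent with the symbols named in the text (γ, αi, frozen parameters), would be:

```python
def soft_q_target(r_t, q_next_frozen, logp_next, gamma=0.98, alpha=0.1):
    """Soft Q-learning target y_i (step 6-2), in the standard SAC form
    (an assumed reconstruction, not the patent's literal formula):
        y_i = r_t + gamma * (Q_frozen(x', a') - alpha * log pi(a' | o'))
    where Q_frozen is evaluated with the frozen parameters and a' is
    sampled from the current policy at the next observation."""
    return r_t + gamma * (q_next_frozen - alpha * logp_next)

y = soft_q_target(r_t=1.0, q_next_frozen=2.0, logp_next=-0.5)
# The loss for phi_i is then the mean squared error (Q(x, a) - y)^2
# averaged over a batch drawn from the experience base D_i.
```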
Step 6-3: update the parameters φi:
where ρi is the learning step size, with typical value 0.0001, and ∇φi denotes taking the gradient with respect to the variable φi.
The superscript C denotes "constraint", i.e. it marks a constraint-related variable.
Step 6-6: compute the Lagrangian function:
where the threshold in the constraint term is the permitted voltage violation limit, with a typical value of 0, and the remaining quantity is:
Step 6-7: update the parameters θi:
Step 6-8: update the multiplier λi:
where η is the freezing coefficient, with typical value 0.995.
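Steps 6-6 to 6-8 can be sketched as follows. The exact Lagrangian is given only in the figures; this sketch uses the standard constrained-RL form (the soft policy objective plus λi times the constraint critic's excess over the voltage violation limit), with a dual-ascent multiplier update and the η-weighted frozen-parameter update. All shapes here are assumptions, not the patent's literal formulas:

```python
def lagrangian(q_value, q_c_value, logp, lam, alpha=0.1, v_limit=0.0):
    """Scalar Lagrangian of step 6-6 (assumed form): the unconstrained
    soft policy objective plus lambda_i times the constraint critic's
    excess over the voltage violation limit (typical value 0)."""
    return (alpha * logp - q_value) + lam * (q_c_value - v_limit)

def update_multiplier(lam, q_c_value, step=0.01, v_limit=0.0):
    """Dual ascent on lambda_i (step 6-8): grow the multiplier while the
    constraint critic predicts violations, and keep it non-negative."""
    return max(0.0, lam + step * (q_c_value - v_limit))

def soft_update(frozen, live, eta=0.995):
    """Frozen-parameter update with freezing coefficient eta:
    frozen <- eta * frozen + (1 - eta) * live."""
    return eta * frozen + (1.0 - eta) * live

lam = 1.0                           # initial multiplier from step 3-2
for _ in range(100):
    lam = update_multiplier(lam, q_c_value=0.5)   # persistent violation
# after repeated violations the multiplier has grown from 1.0 to 1.5

frozen = soft_update(frozen=10.0, live=0.0)       # moves only 0.5% per step
```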
Step 7: at the next operating time, steps 4, 5 and 6 are executed repeatedly in parallel. Specifically, set t = t + 1 and return to step 4, repeating steps 4, 5 and 6. These steps run in parallel without mutual interference; their communication and computation do not block the normal execution of other controllers or of the other steps.
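The timing of step 7 (control every Δt, upload every TsΔt, learning every TuΔt) can be sketched as:

```python
def simulate(T, T_s, T_u):
    """Which steps fire at each discrete time t under step 7's repetition:
    control (step 4) runs every step; upload (step 5) every T_s steps;
    learning (step 6) every T_u steps. In deployment these run in
    parallel and do not block one another."""
    events = []
    for t in range(T):
        fired = ["control"]
        if t % T_s == 0:
            fired.append("upload")
        if t % T_u == 0:
            fired.append("learn")
        events.append((t, fired))
    return events

ev = simulate(T=17, T_s=8, T_u=8)   # typical values T_s = T_u = 8
```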
Based on the same inventive concept, an embodiment of the present invention further provides a grid reactive voltage distributed control system, as shown in fig. 3, the system includes:
the model building module is used for making reactive voltage control targets of all controlled areas according to the whole reactive voltage control target and the optimization model of the controlled power grid and building a reactive voltage optimization model;
the training frame construction module is used for constructing a multi-agent interactive training frame based on the Markov game by combining the actual configuration conditions of the optimization model and the power grid;
the initialization module is used for initializing each neural network and relevant control process variables and issuing the neural networks and the relevant control process variables to each control area;
the controller module, deployed locally in each region (i.e. on local computer equipment), which executes the control steps in parallel according to the received policy neural network;
the sample uploading module, deployed locally in each region, which executes the sample-upload step in parallel and uploads the measurement samples to the cloud server;
the policy learning module, deployed on the cloud server, which learns each controller's policy in parallel and issues the updated policies to each region controller;
the controller module, the sample uploading module and the policy learning module are invoked repeatedly and may execute in parallel.
Without loss of generality, the model building module, the training framework construction module and the initialization module may be deployed on the cloud server or on computer equipment distinct from the cloud server. The modules on the server exchange data with the local modules of each control region through the communication network.
The specific execution process and algorithm of each module may be obtained according to the embodiment of the distributed control method for reactive voltage of the power grid, and are not described herein again.
The control method and system adopt a framework combining online centralized learning with distributed control: an efficient deep reinforcement learning algorithm continuously and centrally collects the control data of each controller, the control policy of each controller is learned centrally on the cloud server, and after the policies are issued to the controllers through the communication network, each controller executes its policy locally according to local measurements. On the one hand, the invention fully exploits the speed advantage of distributed control: the local controllers can act quickly on real-time local measurements without communication, which makes the method particularly suitable for reactive voltage control of fast DG and SVC resources. On the other hand, the efficient deep reinforcement learning algorithm fully utilizes the information advantage of centralized learning to obtain the optimal policy of each agent, guaranteeing optimal system operation even when the model is incomplete. The method greatly improves the efficiency, safety and flexibility of power grid reactive voltage control under incomplete models. It is particularly suitable for regional power grids with severely incomplete models, saves the high cost of repeatedly maintaining accurate models, reduces the communication and computation requirements on each controller, exploits the flexibility and efficiency of distributed control, and avoids the high single-point-failure risk and large command delay of centralized control, making it suitable for large-scale deployment.
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A reactive voltage distributed control method for a power grid is characterized by comprising the following steps:
step 1: according to the whole reactive voltage control target and the optimization model of the controlled power grid, formulating reactive voltage control targets of all controlled areas and establishing a reactive voltage optimization model;
step 2: constructing a multi-agent interactive training framework based on the Markov game by combining the actual configuration conditions of the optimization model and the power grid;
and step 3: initializing each neural network and relevant control process variables and issuing the neural networks and relevant control process variables to each control area;
and 4, step 4: the local controllers in all the areas execute control steps in parallel according to the received strategy neural network;
and 5: the local controllers in all the areas execute the step of uploading samples in parallel, and upload the measurement samples to the cloud server;
step 6: the cloud server parallelly learns the strategies of all the controllers and issues the updated strategies to all the regional controllers;
and 7: and (5) repeatedly executing the steps 4, 5 and 6.
2. The grid reactive voltage distributed control method according to claim 1, wherein the step 1 comprises:
step 1-1: establishing a whole reactive voltage control target and optimization model of the controlled power grid:
wherein the first symbol denotes the set of all nodes of the grid; Vj is the voltage amplitude of node j; Pj is the active power output of node j; QGj is the DG reactive power output of node j; QCj is the SVC reactive power output of node j; V̲j and V̄j are respectively the lower and upper voltage limits of node j; Q̲Cj and Q̄Cj are respectively the lower and upper limits of the SVC reactive power output of node j; SGj and PGj are respectively the DG installed capacity and active power output of node j;
step 1-2: splitting the reactive voltage control target and the optimization model to form reactive voltage control targets and optimization models of each controlled area:
3. The grid reactive voltage distributed control method according to claim 2, wherein the step 2 comprises:
step 2-1: constructing the observation variable oi,t of each region corresponding to the system measurements of each region:
wherein Pi, Qi are the vectors of active and reactive power injections at the nodes of region i; Vi is the vector of node voltages of region i; Pie, Qie are the active and reactive power outputs at the network interface of region i; t is the discrete time variable of the control process;
step 2-2: corresponding to the reactive voltage optimization target of each region, establishing a uniform feedback variable r of each regiont:
step 2-3: corresponding to the reactive voltage optimization constraint of each region, constructing constraint feedback variables of each region
wherein [x]+ = max(0, x); βi is the cooperation coefficient of region i; Vj(t) is the voltage of node j at time t; V̄ denotes the upper voltage limit and V̲ the lower voltage limit;
step 2-4: corresponding to the reactive power of the controllable flexible resources, constructing action variables a of each areai,t:
ai,t = (QGi, QCi)t    (0.6)
wherein QGi, QCi are respectively the vectors of DG and SVC reactive power outputs of region i.
4. The grid reactive voltage distributed control method according to claim 3, wherein the step 3 comprises:
step 3-1: initializing each neural network and relevant control process variables and issuing the neural networks and relevant control process variables to each control area;
step 3-2: initializing the Lagrange multiplier λi of each region as a scalar;
step 3-3: issuing the initial policy neural networks through a communication network to the controller of region i;
step 3-4: initializing a discrete time variable t = 0, wherein the actual time interval between two steps is Δt;
step 3-5: initializing the policy update period Tu, a policy update being executed every TuΔt;
step 3-6: initializing the sample upload period Ts and the upload batch size m ∈ [1, Ts], each controller uploading once every TsΔt the m samples of the previous upload period;
5. The grid reactive voltage distributed control method according to claim 4, wherein the step 3-1 comprises:
step 3-1-1: defining a neural network that takes (oi,t, ai,t) as input and outputs a single scalar value; the activation function is a ReLU function; its network parameters are recorded as φi with corresponding frozen parameters, and φi and the frozen parameters are randomly initialized;
step 3-1-2: defining a second neural network that takes (oi,t, ai,t) as input and outputs a single scalar value; the activation function is a ReLU function; its network parameters and the corresponding frozen parameters are recorded and randomly initialized;
step 3-1-3: defining two neural networks that take oi,t as input and whose outputs have the same vector shape as the action ai,t, the two networks having independent output layers while sharing the same input layer and hidden layers; the activation function is a ReLU function; all of their network parameters are recorded as θi, and θi is randomly initialized.
6. The grid reactive voltage distributed control method according to claim 5, wherein the step 4 comprises:
step 4-1: obtaining measurement signals from a measurement device of a regional power grid to form a corresponding observation variable oi,t;
step 4-2: generating the action ai,t of the current time according to the local policy neural networks:
Step 4-3: the controller will ai,tSending the data to local controlled flexible resources, such as DG nodes and SVC nodes;
7. The grid reactive voltage distributed control method according to claim 6, wherein the step 5 comprises:
step 5-1: uploading m + 1 samples into the experience base Di of the cloud server;
step 5-3: calculating rt and the constraint feedback for the first m groups of data uploaded in this round on the cloud server;
step 5-4: if a communication fault prevents the samples of some region from being uploaded, this upload is simply ignored.
8. The grid reactive voltage distributed control method according to claim 7, wherein the step 6 comprises:
step 6-2: calculating a parameter phiiLoss function of (2):
wherein x = (o1, ..., oN) is the set of observations of all regions; x′ is the next-time observation corresponding to x; a1, ..., aN are the action vectors of region 1 to region N, obtained from the corresponding policy networks; yi is:
wherein γ is the discount coefficient; αi is the entropy maximization coefficient of region i; the probability of the sampled action under the current policy enters the target as follows:
⊙ denotes element-wise multiplication; o′i is the observation of region i at the next time;
step 6-3: updating the parameter phii:
wherein ρi is the learning step size, and ∇φi denotes taking the gradient with respect to the variable φi;
Step 6-6: calculate lagrangian function:
step 6-7: updating the parameter θi:
And 6-8: updating the parameter lambdai:
wherein η is the freezing coefficient;
9. The grid reactive voltage distributed control method according to any of claims 1-8,
and the step 4, the step 5 and the step 6 are executed in parallel.
10. A grid reactive voltage distributed control system, comprising:
the model building module is used for making reactive voltage control targets of all controlled areas according to the whole reactive voltage control target and the optimization model of the controlled power grid and building a reactive voltage optimization model;
the training frame construction module is used for constructing a multi-agent interactive training frame based on the Markov game by combining the actual configuration conditions of the optimization model and the power grid;
the initialization module is used for initializing each neural network and relevant control process variables and issuing the neural networks and the relevant control process variables to each control area;
the controller module, arranged locally in each region and configured to execute the control steps in parallel according to the received policy neural network;
the sample uploading module, arranged locally in each region and configured to execute the sample-upload step in parallel and upload the measurement samples to the cloud server;
the policy learning module, arranged on the cloud server and configured to learn each controller's policy in parallel and issue the updated policies to each region controller;
wherein the controller module, the sample uploading module and the policy learning module are configured to be invoked and executed repeatedly.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010581959.4A CN111799808B (en) | 2020-06-23 | 2020-06-23 | Voltage distributed control method and system based on multi-agent deep reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010581959.4A CN111799808B (en) | 2020-06-23 | 2020-06-23 | Voltage distributed control method and system based on multi-agent deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111799808A true CN111799808A (en) | 2020-10-20 |
CN111799808B CN111799808B (en) | 2022-06-28 |
Family
ID=72803612
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010581959.4A Active CN111799808B (en) | 2020-06-23 | 2020-06-23 | Voltage distributed control method and system based on multi-agent deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111799808B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112507614A (en) * | 2020-12-01 | 2021-03-16 | 广东电网有限责任公司中山供电局 | Comprehensive optimization method for power grid in distributed power supply high-permeability area |
CN113258581A (en) * | 2021-05-31 | 2021-08-13 | 广东电网有限责任公司佛山供电局 | Source-load coordination voltage control method and device based on multiple intelligent agents |
EP4148939A1 (en) * | 2021-09-09 | 2023-03-15 | Siemens Aktiengesellschaft | System and method for controlling power distribution systems using graph-based reinforcement learning |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103580061A (en) * | 2013-10-28 | 2014-02-12 | 贵州电网公司电网规划研究中心 | Microgrid operating method |
US20160105023A1 (en) * | 2013-05-22 | 2016-04-14 | Vito Nv | Power supply network control system and method |
CN109120011A (en) * | 2018-09-29 | 2019-01-01 | 清华大学 | A kind of Distributed power net congestion dispatching method considering distributed generation resource |
CN110365056A (en) * | 2019-08-14 | 2019-10-22 | 南方电网科学研究院有限责任公司 | A kind of distributed energy participation power distribution network pressure regulation optimization method based on DDPG |
CN110729740A (en) * | 2019-07-03 | 2020-01-24 | 清华大学 | Power distribution network reactive power optimization method and device, computer equipment and readable storage medium |
CN110768262A (en) * | 2019-10-31 | 2020-02-07 | 上海电力大学 | Active power distribution network reactive power supply configuration method based on node clustering partition |
US20200082305A1 (en) * | 2018-09-06 | 2020-03-12 | Trevor N. Werho | Induced markov chain for wind farm generation forecasting |
US20200119556A1 (en) * | 2018-10-11 | 2020-04-16 | Di Shi | Autonomous Voltage Control for Power System Using Deep Reinforcement Learning Considering N-1 Contingency |
- 2020-06-23: CN application CN202010581959.4A filed (patent CN111799808B/en, status Active)
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160105023A1 (en) * | 2013-05-22 | 2016-04-14 | Vito Nv | Power supply network control system and method |
CN103580061A (en) * | 2013-10-28 | 2014-02-12 | 贵州电网公司电网规划研究中心 | Microgrid operating method |
US20200082305A1 (en) * | 2018-09-06 | 2020-03-12 | Trevor N. Werho | Induced markov chain for wind farm generation forecasting |
CN109120011A (en) * | 2018-09-29 | 2019-01-01 | 清华大学 | A kind of Distributed power net congestion dispatching method considering distributed generation resource |
US20200119556A1 (en) * | 2018-10-11 | 2020-04-16 | Di Shi | Autonomous Voltage Control for Power System Using Deep Reinforcement Learning Considering N-1 Contingency |
CN110729740A (en) * | 2019-07-03 | 2020-01-24 | 清华大学 | Power distribution network reactive power optimization method and device, computer equipment and readable storage medium |
CN110365056A (en) * | 2019-08-14 | 2019-10-22 | 南方电网科学研究院有限责任公司 | A kind of distributed energy participation power distribution network pressure regulation optimization method based on DDPG |
CN110768262A (en) * | 2019-10-31 | 2020-02-07 | 上海电力大学 | Active power distribution network reactive power supply configuration method based on node clustering partition |
Non-Patent Citations (4)
Title |
---|
AUGUSTO C. RUEDA-MEDINA 等: "Distributed Generators as Providers of Reactive Power Support—A Market Approach", 《IEEE TRANSACTIONS ON POWER SYSTEMS》 * |
DI CAO 等: "Distributed Voltage Regulation of Active Distribution System Based on Enhanced Multi-agent Deep Reinforcement Learning", 《ARXIV》 * |
NAN ZOU 等: "Auxiliary Frequency and Voltage Regulation in Microgrid via Intelligent Electric Vehicle Charging", 《2014 IEEE INTERNATIONAL CONFERENCE ON SMART GRID COMMUNICATIONS》 * |
PENG KOU 等: "Safe deep reinforcement learning-based constrained optimal control scheme for active distribution networks", 《APPLIED ENERGY》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112507614A (en) * | 2020-12-01 | 2021-03-16 | 广东电网有限责任公司中山供电局 | Comprehensive optimization method for power grid in distributed power supply high-permeability area |
CN113258581A (en) * | 2021-05-31 | 2021-08-13 | 广东电网有限责任公司佛山供电局 | Source-load coordination voltage control method and device based on multiple intelligent agents |
CN113258581B (en) * | 2021-05-31 | 2021-10-08 | 广东电网有限责任公司佛山供电局 | Source-load coordination voltage control method and device based on multiple intelligent agents |
EP4148939A1 (en) * | 2021-09-09 | 2023-03-15 | Siemens Aktiengesellschaft | System and method for controlling power distribution systems using graph-based reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN111799808B (en) | 2022-06-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Li et al. | Coordinated load frequency control of multi-area integrated energy system using multi-agent deep reinforcement learning | |
CN110535146B (en) | Electric power system reactive power optimization method based on depth determination strategy gradient reinforcement learning | |
Xi et al. | A novel multi-agent DDQN-AD method-based distributed strategy for automatic generation control of integrated energy systems | |
CN111799808B (en) | Voltage distributed control method and system based on multi-agent deep reinforcement learning | |
CN111564849B (en) | Two-stage deep reinforcement learning-based power grid reactive voltage control method | |
CN112615379A (en) | Power grid multi-section power automatic control method based on distributed multi-agent reinforcement learning | |
CN111666713B (en) | Power grid reactive voltage control model training method and system | |
CN113471982B (en) | Cloud edge cooperation and power grid privacy protection distributed power supply in-situ voltage control method | |
CN114217524A (en) | Power grid real-time self-adaptive decision-making method based on deep reinforcement learning | |
Xi et al. | A virtual generation ecosystem control strategy for automatic generation control of interconnected microgrids | |
Li et al. | Grid-area coordinated load frequency control strategy using large-scale multi-agent deep reinforcement learning | |
CN109494721A (en) | A kind of power distribution network distributed self-adaption control method suitable for being switched containing flexible multimode | |
CN110429652A (en) | A kind of intelligent power generation control method for expanding the adaptive Dynamic Programming of deep width | |
CN110165714A (en) | Micro-capacitance sensor integration scheduling and control method, computer readable storage medium based on limit dynamic programming algorithm | |
CN113422371B (en) | Distributed power supply local voltage control method based on graph convolution neural network | |
Yin et al. | Quantum deep reinforcement learning for rotor side converter control of double-fed induction generator-based wind turbines | |
CN117039981A (en) | Large-scale power grid optimal scheduling method, device and storage medium for new energy | |
CN113872213B (en) | Autonomous optimization control method and device for power distribution network voltage | |
Li et al. | Distributed deep reinforcement learning for integrated generation‐control and power‐dispatch of interconnected power grid with various renewable units | |
Xi et al. | Multi-agent deep reinforcement learning strategy for distributed energy | |
Vohra et al. | End-to-end learning with multiple modalities for system-optimised renewables nowcasting | |
Wang et al. | Intelligent load frequency control for improving wind power penetration in power systems | |
Wang et al. | Robust active yaw control for offshore wind farms using stochastic predictive control based on online adaptive scenario generation | |
Flórez et al. | Explicit coordination for MPC-based distributed control with application to Hydro-Power Valleys | |
Ma et al. | A Reinforcement learning based coordinated but differentiated load frequency control method with heterogeneous frequency regulation resources |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |