CN115333111A - Multi-region power grid collaborative optimization method, system, equipment and readable storage medium - Google Patents
- Publication number
- CN115333111A (application CN202211109903.4A)
- Authority
- CN
- China
- Prior art keywords
- region
- generating unit
- agent
- thermal power
- renewable energy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/04—Circuit arrangements for ac mains or ac distribution networks for connecting networks of the same frequency but supplied from different sources
- H02J3/06—Controlling transfer of power between connected networks; Controlling sharing of load between connected networks
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/38—Arrangements for parallely feeding a single network by two or more generators, converters or transformers
- H02J3/46—Controlling of the sharing of output between the generators, converters, or transformers
- H02J3/466—Scheduling the operation of the generators, e.g. connecting or disconnecting generators to meet a given demand
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/38—Arrangements for parallely feeding a single network by two or more generators, converters or transformers
- H02J3/46—Controlling of the sharing of output between the generators, converters, or transformers
- H02J3/48—Controlling the sharing of the in-phase component
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/38—Arrangements for parallely feeding a single network by two or more generators, converters or transformers
- H02J3/46—Controlling of the sharing of output between the generators, converters, or transformers
- H02J3/50—Controlling the sharing of the out-of-phase component
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/10—Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/20—Simulating, e g planning, reliability check, modelling or computer assisted design [CAD]
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2300/00—Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
- H02J2300/20—The dispersed energy generation being of renewable origin
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Landscapes
- Engineering & Computer Science (AREA)
- Power Engineering (AREA)
- Supply And Distribution Of Alternating Current (AREA)
Abstract
A multi-region power grid collaborative optimization method, system, equipment and readable storage medium are provided. The optimization method comprises: collecting observation data of each agent region; constructing a multi-region power grid collaborative optimization model containing renewable energy sources; designing this model as a reinforcement learning model in a multi-agent environment according to a state space, an action space, an environment and a reward function; and solving the reinforcement learning model in the multi-agent environment and outputting a collaborative optimization result. The invention adopts a distributed model with multiple decision centers, which reduces the communication pressure on the system; it can account for the dual uncertainty of renewable energy and load during training and therefore copes better with uncertainty; it enables real-time decision making; and, owing to the centralized-training, distributed-execution characteristic of multi-agent learning, after training each regional power grid can make decisions on its controllable units using only local observations, which helps protect the privacy of each regional power grid.
Description
Technical Field
The invention belongs to the technical field of regional power grid optimization and scheduling, and particularly relates to a multi-region power grid collaborative optimization method, system, equipment and readable storage medium, which are particularly suitable for the collaborative optimization of multi-region power grids containing a high proportion of renewable energy.
Background
Constructing a new power system based on renewable energy has become an important measure for reducing carbon emissions. With the continuously rising penetration of renewable energy, the traditional power system dominated by coal-fired units with continuously controllable output is being converted into a novel power system dominated by renewable sources with strong uncertainty and weak output controllability, and this conversion makes the internal power and energy balance of a regional power grid difficult. To guarantee safe and stable grid operation, reduce investment in traditional standby units and lower grid operating costs, interconnecting multiple regional power grids and fully exploiting both the internal autonomy of each regional grid and inter-region information interaction and collaborative optimization is of great significance. At present, a centralized optimization method is mainly adopted for the regional power grid collaborative optimization problem: data of the whole system are collected, a decision is made by a dispatching center, and the decision is sent to each execution unit to complete the optimization of the whole system. However, the penetration of distributed generation in the novel power system keeps rising and its operation modes are changeable, which reduces the controllability of the system and makes global operating information difficult to collect.
Therefore, the traditional centralized control method suffers from a large data-acquisition burden, high communication cost and complex modeling; it has clear limitations in coping with uncertainty and in solution efficiency, and is difficult to apply to the online control of complex systems containing numerous distributed power supplies.
Object of the Invention
The invention aims to provide a multi-region power grid collaborative optimization method, system, equipment and readable storage medium addressing the problems in the prior art, and, based on multi-agent deep reinforcement learning, to offer a more economical, accurate and reliable solution for the collaborative optimization of multi-region power grids containing a high proportion of renewable energy.
To achieve this purpose, the invention adopts the following technical solutions:
in a first aspect, a multi-region power grid collaborative optimization method is provided, including:
collecting observation data of each intelligent agent region in a multi-region power grid to be optimized;
constructing a multi-region power grid collaborative optimization model containing renewable energy sources on the basis of the observation data;
designing the multi-region power grid collaborative optimization model containing renewable energy sources into a reinforcement learning model under a multi-agent environment according to a state space, an action space, an environment and a reward function;
and solving the reinforcement learning model in the multi-agent environment, and outputting a collaborative optimization result to carry out collaborative optimization on the multi-region power grid.
As a preferred scheme of the multi-region power grid collaborative optimization method, the method further comprises a step of partitioning the power grid into agents: the node standard system is divided into different regions, a different agent is set for each region, and the regional agents serve as decision centers to realize collaborative optimization operation of the multi-region power grid.
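As a plain illustration of this partitioning step, the sketch below maps the buses of a node standard system (e.g. an IEEE test case) onto regional agents. The three-region split, bus ranges and names are assumptions for illustration only, not the patent's actual partition.

```python
# Hypothetical assignment of buses to regional decision-center agents.
REGIONS = {
    "agent_1": list(range(1, 14)),    # buses 1-13 (assumed region 1)
    "agent_2": list(range(14, 27)),   # buses 14-26 (assumed region 2)
    "agent_3": list(range(27, 40)),   # buses 27-39 (assumed region 3)
}

def agent_of(bus: int) -> str:
    """Return the regional agent responsible for a given bus."""
    for agent, buses in REGIONS.items():
        if bus in buses:
            return agent
    raise ValueError(f"bus {bus} is not assigned to any region")
```

Each agent then observes and controls only the units inside its own bus set, which is what later allows decisions from local observations only.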
As a preferred solution of the multi-region grid collaborative optimization method in the present invention, in the step of collecting observation data of each agent region in the multi-region grid to be optimized, the observation data of each agent region includes:
load data p L Renewable energy source actual output powerActual output power of thermal power generating unitAnd the operating cost coefficient c of the thermal power generating unit o Cost coefficient of wind and light abandoning c g Cost coefficient of starter SU i Shutdown cost factor SD i 。
As a preferred scheme of the multi-region grid collaborative optimization method, in the step of constructing the multi-region grid collaborative optimization model containing renewable energy, the following objective function is established:
F = min Σ_{n=1}^{N} ( C_n^G + C_n^W )

wherein N is the number of divided regions, C_n^G is the cost of the thermal power generating units in region n, and C_n^W is the wind and solar curtailment penalty of the renewable energy units in region n; C_n^G and C_n^W are respectively:

C_n^G = Σ_{i=1}^{M_k} Σ_{t=1}^{T} [ c_o u_{i,t} P_{i,t}^G Δt + SU_i u_{i,t} (1 − u_{i,t−1}) + SD_i u_{i,t−1} (1 − u_{i,t}) ]

in the formula, M_k is the number of thermal power generating units in region n; T is the calculation duration; u_{i,t} is the operating state of thermal unit i in period t: u_{i,t} = 1 indicates that unit i is running, and u_{i,t} = 0 indicates that it is shut down; c_o is the operating cost coefficient of the thermal units; P_{i,t}^G is the power output by thermal unit i in period t; Δt is the operating time interval; SU_i is the cost of one start-up of thermal unit i; SD_i is the shutdown cost of thermal unit i, incurred on each shutdown transition of unit i;

C_n^W = Σ_{j=1}^{M_n} Σ_{t=1}^{T} c_g ( P_{j,t}^{W,max} − P_{j,t}^W ) Δt

wherein M_n is the number of renewable energy units in region n; T is the calculation duration; c_g is the wind and solar curtailment penalty coefficient; P_{j,t}^W is the actual output power of renewable unit j in period t; and P_{j,t}^{W,max} is the upper limit of the output power of renewable unit j in period t.
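The two regional cost terms described above, the thermal-unit cost and the curtailment penalty (denoted here C_n^G and C_n^W), can be sketched numerically as follows. The function names and array shapes are illustrative assumptions; a quadratic fuel curve could replace the linear c_o term.

```python
import numpy as np

def thermal_cost(u, p, c_o, SU, SD, dt=1.0):
    """Regional thermal cost: running cost plus start-up/shutdown costs,
    recovered from the variable definitions above (a sketch)."""
    u = np.asarray(u, dtype=float)        # on/off status, shape (M, T)
    p = np.asarray(p, dtype=float)        # output power, shape (M, T)
    SU = np.asarray(SU, dtype=float)      # start-up cost per unit
    SD = np.asarray(SD, dtype=float)      # shutdown cost per unit
    run = np.sum(c_o * u * p * dt)        # operating cost over all units/periods
    du = np.diff(u, axis=1, prepend=0.0)  # +1 on a start, -1 on a stop
    start = np.sum(SU[:, None] * np.clip(du, 0, 1))
    stop = np.sum(SD[:, None] * np.clip(-du, 0, 1))
    return run + start + stop

def curtailment_penalty(p_avail, p_actual, c_g, dt=1.0):
    """Curtailment penalty: c_g per unit of spilled renewable energy."""
    spill = np.asarray(p_avail, dtype=float) - np.asarray(p_actual, dtype=float)
    return c_g * np.sum(spill) * dt
```

For one unit that starts in period 2 and stops after period 3, the cost is its running cost plus one SU and one SD charge.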
As a preferred scheme of the multi-region grid collaborative optimization method, in the step of constructing the multi-region grid collaborative optimization model containing renewable energy, equality constraints and inequality constraints are established on the objective function;
equality constraints including network power flow constraints and power balance constraints;
the expression of the network power flow constraint is as follows:
P_{G,i} − P_{D,i} = U_i Σ_{j=1}^{N_e} U_j ( G_{ij} cos θ_{ij} + B_{ij} sin θ_{ij} )
Q_{G,i} − Q_{D,i} = U_i Σ_{j=1}^{N_e} U_j ( G_{ij} sin θ_{ij} − B_{ij} cos θ_{ij} )

in the formula, P_{G,i}, Q_{G,i} are the active and reactive power injected at node i; P_{D,i}, Q_{D,i} are the active and reactive power consumed by the load at node i; N_e is the number of nodes of the distribution network; U_i, U_j are the voltage amplitudes of nodes i and j; G_{ij}, B_{ij} are the conductance and susceptance, respectively; and θ_{ij} is the phase angle difference between nodes i and j;
the expression of the power balance constraint, requiring total generation to match the total load in each period, is as follows:

Σ_{i=1}^{M_k} P_{i,t}^G + Σ_{j=1}^{M_n} P_{j,t}^W = P_{L,t}

wherein P_{L,t} is the total load power in period t;
inequality constraints comprising upper limit constraint of branch current, upper and lower limit constraint of node voltage, upper and lower limit constraint of output power of a thermal power generating unit, startup and shutdown time constraint of the thermal power generating unit, climbing constraint of the thermal power generating unit, rotating standby constraint of the thermal power generating unit and upper and lower limit constraint of output power of a renewable energy source unit;
the expression of the branch current upper limit constraint is as follows:
I_{ij} ≤ I_{ij}^{max}

in the formula, I_{ij} is the magnitude of the current flowing on branch ij, and I_{ij}^{max} is the maximum current amplitude allowed to flow on branch ij;
the expression of the upper and lower node voltage limit constraints is as follows:
U_i^{min} ≤ U_i ≤ U_i^{max}

wherein U_i^{max} and U_i^{min} respectively represent the upper and lower voltage limits of node i;
the expression of the upper and lower limit constraints of the output power of the thermal power generating unit is as follows:
u_{i,t} P_i^{min} ≤ P_{i,t}^G ≤ u_{i,t} P_i^{max}

wherein P_i^{max} and P_i^{min} respectively represent the upper and lower output power limits of thermal unit i;
the expression of the thermal power generating unit on-off time constraint is as follows:
( T_{i,t−1}^{on} − T_i^{on,min} ) ( u_{i,t−1} − u_{i,t} ) ≥ 0
( T_{i,t−1}^{off} − T_i^{off,min} ) ( u_{i,t} − u_{i,t−1} ) ≥ 0

in the formula, T_{i,t}^{on}, T_{i,t}^{off}, T_i^{on,min} and T_i^{off,min} are the running time, shutdown time, minimum running time and minimum shutdown time of thermal unit i;
the climbing constraint expression of the thermal power generating unit is as follows:
−D_i Δt ≤ P_{i,t}^G − P_{i,t−1}^G ≤ U_i Δt

in the formula, U_i and D_i are the maximum ramp-up and ramp-down rates of the active power of thermal unit i in unit time;
the expression of the rotating standby constraint of the thermal power generating unit is as follows:
Σ_{i=1}^{M_k} ( u_{i,t} P_i^{max} − P_{i,t}^G ) ≥ R_{L,t} + Σ_{j=1}^{M_n} R_w P_{j,t}^{W,max}

in the formula, R_{L,t} is the load reserve capacity at time t, and R_w is the output prediction error of renewable unit j; the expression of the upper and lower limits of the output power of the renewable energy units is as follows:

P_j^{W,min} ≤ P_{j,t}^W ≤ P_{j,t}^{W,max}

in the formula, P_{j,t}^{W,max} and P_j^{W,min} are the upper and lower limits of the output power of renewable unit j.
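A minimal feasibility screen over the box-type inequality constraints above (voltage band, branch current limit, thermal output limits) might look like the sketch below. The argument layout is an assumption; the ramping, reserve and minimum up/down-time checks would follow the same pattern.

```python
import numpy as np

def inequality_violated(U, U_min, U_max, I, I_max, p, u, p_min, p_max):
    """Return True if any node voltage, branch current or thermal unit
    output leaves its allowed band (a sketch of the inequality checks)."""
    U, I, p, u = map(np.asarray, (U, I, p, u))
    if np.any(U < U_min) or np.any(U > U_max):
        return True                       # node voltage out of band
    if np.any(I > I_max):
        return True                       # branch current above limit
    if np.any(p < u * p_min) or np.any(p > u * p_max):
        return True                       # output outside u * [p_min, p_max]
    return False
```

Note that multiplying the limits by the on/off status u correctly forces a shut-down unit's output to zero.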
As a preferred embodiment of the multi-region power grid collaborative optimization method of the present invention, in the step of designing the multi-region power grid collaborative optimization model containing renewable energy into a reinforcement learning model in a multi-agent environment according to a state space, an action space, an environment and a reward function, the state space variables include the load data p_L of each agent region, the actual output power of the renewable energy units P^W, the upper limit of renewable output power P^{W,max}, the actual output power of the thermal power generating units P^G, the operating cost coefficient of the thermal units c_o, the wind and solar curtailment cost coefficient c_g, the start-up cost SU_i, the shutdown cost SD_i, and the node voltage U_i; the expression of the state space is as follows:

S = { p_L, P^W, P^{W,max}, P^G, c_o, c_g, SU_i, SD_i, U_i }

The action space variables include the output power of the thermal power generating units P^G, the start-stop state of the thermal power generating units u, and the output power of the renewable energy units P^W; the expression of the action space is as follows:

A = { P^G, u, P^W }
In the multi-region power grid collaborative optimization model containing renewable energy, the equality and inequality constraints established for the objective function serve as the environment. At each moment, after every agent has taken its action, one power flow calculation is performed, the relevant state quantities of the power grid are fed back to evaluate the reward function, and the system transitions to the next moment; this cycle repeats;
In the multi-region power grid collaborative optimization model containing renewable energy, the negative of the objective function is used as the instant reward of each agent. According to the equality and inequality constraints established for the objective function, if a corresponding variable does not satisfy a constraint, a penalty value r_push is imposed and, together with the instant reward, forms the final reward function of the agent, whose expression is as follows:

R = { F + r_push }
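The reward construction above (negative objective as instant reward, plus a penalty on constraint violation) reduces to a few lines. The default value of r_push below is an assumption, since the patent leaves it as a design parameter.

```python
def agent_reward(objective_cost, violated, r_push=-1000.0):
    """Final reward R: negative of the objective cost as instant reward,
    plus the penalty r_push whenever any constraint is violated."""
    instant = -objective_cost          # cheaper schedules earn higher reward
    return instant + (r_push if violated else 0.0)
```

The penalty magnitude must dominate typical cost differences, otherwise agents may learn to trade feasibility for cost.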
As an optimal scheme of the multi-region power grid collaborative optimization method, in the step of solving the reinforcement learning model in the multi-agent environment, the MADDPG algorithm is adopted. In the MADDPG algorithm, the Actor of agent i acquires its own state information s_i; a_i is the action taken by agent i, r_i is its reward, and θ_i is its weight parameter. With N agents, the observation set x = (s_1, ..., s_N) contains the state information of all agents. The Actor continuously updates its own parameters θ_i to maximize the expectation of its own reward, i.e. to obtain a higher evaluation value from the Critic;
the policy update rule expression of Actor is as follows:
∇_{θ_i} J(μ_i) = E_{x,a∼D} [ ∇_{θ_i} μ_i(a_i | s_i) ∇_{a_i} Q_i^μ(x, a_1, ..., a_N) |_{a_i = μ_i(s_i)} ]

in the formula, Q_i^μ(x, a_1, ..., a_N) is the centralized state-action value function, learned and updated independently for each agent; D = (x, x′, a_1, ..., a_N, r_1, ..., r_N) is the replay unit storing the experiences of all agents, from which one batch is randomly selected for training in each round; for the continuous actions (the unit output powers), the MADDPG algorithm adopts the set μ of continuous deterministic policies of the N agents; for the 0-1 actions (unit start-stop), random values are taken in the training stage;
critic updates its parameters by minimizing the time difference error, and the loss function expression of Critic is:
L(θ_i) = E_{x,a,r,x′} [ ( Q_i^μ(x, a_1, ..., a_N) − y )^2 ],  y = r_i + γ Q_i^{μ′}(x′, a′_1, ..., a′_N) |_{a′_j = μ′_j(s_j)}

in the formula, Q_i^{μ′} is the state-action value function of the target network, μ′ is the policy set of the target network, and γ ∈ [0,1] is the discount factor;
the target network periodically copies the parameters from the evaluation network, with the following rules:
θ′ i =(1-τ)θ′ i +τθ i
wherein θ′_i is the target network parameter of agent i, τ is the soft update coefficient, and τ ≤ 1;
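The soft-update rule θ′_i = (1 − τ)θ′_i + τθ_i above can be sketched per parameter entry as follows; real implementations apply the same blend to every tensor of the target network, and the dict-of-floats layout here is an illustrative assumption.

```python
def soft_update(theta_target, theta_eval, tau=0.01):
    """Blend evaluation-network parameters into the target network:
    theta' <- (1 - tau) * theta' + tau * theta, entry by entry."""
    return {name: (1.0 - tau) * theta_target[name] + tau * theta_eval[name]
            for name in theta_target}
```

A small τ makes the target network track the evaluation network slowly, which stabilizes the Critic's bootstrap targets.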
Let the policy μ_i of agent i be a set of K sub-policies; only one sub-policy μ_i^{(k)} is used in each training round, so that over the whole training process the overall reward of the policy set is the highest. The final policy update of the Actor is:

∇_{θ_i^{(k)}} J_e(μ_i) = (1/K) E_{x,a∼D_i^{(k)}} [ ∇_{θ_i^{(k)}} μ_i^{(k)}(a_i | s_i) ∇_{a_i} Q_i^μ(x, a_1, ..., a_N) |_{a_i = μ_i^{(k)}(s_i)} ]
As an optimal scheme of the multi-region power grid collaborative optimization method, the step of solving the reinforcement learning model in the multi-agent environment with the MADDPG algorithm specifically comprises the following steps:
setting the optimized scheduling period T and the number of training rounds M of each agent, initializing both counters to 1, and randomly initializing the agent network parameters θ_i at the initial time;
loading the multi-region power grid collaborative optimization model into the environment and setting the state and action interface files of the MADDPG algorithm, so that power flow calculation can be carried out in real time according to the states and actions and the corresponding environment state quantities fed back;
each agent observing the state quantities of its own region and taking an action, interacting with the environment, receiving the fed-back state quantities to calculate the reward, performing the state transition to the next moment, observing the state at the next moment, and storing (x, a, r, x′) in the experience replay unit D;
randomly sampling a batch of k groups (x^{(k)}, a^{(k)}, r^{(k)}, x′^{(k)}) from the experience replay unit D, and updating the Critic and Actor parameters and the target network parameters;
judging whether the current training round number m has reached the set value M; if so, ending the training and outputting and saving the result, otherwise starting a new round of training.
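The solving steps above can be sketched as a training loop. `env.reset()`/`env.step()` and each agent's `act()`/`update()` are assumed interfaces, not the patent's code; centralized training happens inside `update`, while `act` uses only the agent's local observation.

```python
import random
from collections import deque

def train(env, agents, episodes=10, horizon=24, batch=32, capacity=10000):
    """Skeleton of the MADDPG training procedure described above."""
    D = deque(maxlen=capacity)                    # experience replay unit D
    for m in range(episodes):                     # training rounds (m vs. M)
        x = env.reset()                           # joint observation (s_1..s_N)
        for t in range(horizon):                  # scheduling periods
            a = [ag.act(s) for ag, s in zip(agents, x)]   # local decisions
            x_next, r = env.step(a)               # power flow -> rewards
            D.append((x, a, r, x_next))           # store the transition
            x = x_next
        if len(D) >= batch:
            samples = random.sample(list(D), batch)       # random mini-batch
            for ag in agents:
                ag.update(samples)                # Critic, Actor, target nets
    return agents
```

After training, only each agent's `act()` is needed online, which is the distributed-execution property claimed above.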
In a second aspect, a multi-region grid collaborative optimization system is provided, including:
the observation data collection module is used for collecting observation data of each intelligent agent region in the multi-region power grid to be optimized;
the collaborative optimization model building module is used for building a multi-region power grid collaborative optimization model containing renewable energy sources on the basis of the observation data;
the reinforcement learning model design module is used for designing the multi-region power grid collaborative optimization model containing the renewable energy into a reinforcement learning model under a multi-agent environment according to a state space, an action space, an environment and a reward function;
and the model solving module is used for solving the reinforcement learning model in the multi-agent environment and outputting a collaborative optimization result to carry out collaborative optimization on the multi-area power grid.
As a preferred scheme of the multi-region power grid collaborative optimization system, the system further comprises an agent partitioning module for partitioning the power grid into agents;
when the power grid is divided into different zones, the node standard system is divided into different zones, different intelligent agents are set according to the zones, and the zone intelligent agents are used as decision centers, so that collaborative optimization operation of the multi-zone power grid is achieved.
As a preferable solution of the multi-region grid collaborative optimization system of the present invention, the observation data collected by the observation data collection module for each agent region includes:
load data p L Renewable energy source actual output powerActual output power of thermal power generating unitAnd the operating cost coefficient c of the thermal power generating unit o Cost coefficient of wind and light abandoning c g Cost coefficient of starter SU i Shutdown cost factor SD i 。
As an optimal scheme of the multi-region grid collaborative optimization system, the collaborative optimization model building module establishes the following objective function:
F = min Σ_{n=1}^{N} ( C_n^G + C_n^W )

wherein N is the number of divided regions, C_n^G is the cost of the thermal power generating units in region n, and C_n^W is the wind and solar curtailment penalty of the renewable energy units in region n; C_n^G and C_n^W are respectively:

C_n^G = Σ_{i=1}^{M_k} Σ_{t=1}^{T} [ c_o u_{i,t} P_{i,t}^G Δt + SU_i u_{i,t} (1 − u_{i,t−1}) + SD_i u_{i,t−1} (1 − u_{i,t}) ]

in the formula, M_k is the number of thermal power generating units in region n; T is the calculation duration; u_{i,t} is the operating state of thermal unit i in period t: u_{i,t} = 1 indicates that unit i is running, and u_{i,t} = 0 indicates that it is shut down; c_o is the operating cost coefficient of the thermal units; P_{i,t}^G is the power output by thermal unit i in period t; Δt is the operating time interval; SU_i is the cost of one start-up of thermal unit i; SD_i is the shutdown cost of thermal unit i, incurred on each shutdown transition of unit i;

C_n^W = Σ_{j=1}^{M_n} Σ_{t=1}^{T} c_g ( P_{j,t}^{W,max} − P_{j,t}^W ) Δt

wherein M_n is the number of renewable energy units in region n; T is the calculation duration; c_g is the wind and solar curtailment penalty coefficient; P_{j,t}^W is the actual output power of renewable unit j in period t; and P_{j,t}^{W,max} is the upper limit of the output power of renewable unit j in period t.
As a preferred scheme of the multi-region power grid collaborative optimization system, the collaborative optimization model building module builds equality constraints and inequality constraints on the objective function;
equality constraints including network power flow constraints, power balance constraints;
the expression of the network flow constraint is as follows:
P_{G,i} − P_{D,i} = U_i Σ_{j=1}^{N_e} U_j ( G_{ij} cos θ_{ij} + B_{ij} sin θ_{ij} )
Q_{G,i} − Q_{D,i} = U_i Σ_{j=1}^{N_e} U_j ( G_{ij} sin θ_{ij} − B_{ij} cos θ_{ij} )

in the formula, P_{G,i}, Q_{G,i} are the active and reactive power injected at node i; P_{D,i}, Q_{D,i} are the active and reactive power consumed by the load at node i; N_e is the number of nodes of the distribution network; U_i, U_j are the voltage amplitudes of nodes i and j; G_{ij}, B_{ij} are the conductance and susceptance, respectively; and θ_{ij} is the phase angle difference between nodes i and j;
the expression of the power balance constraint, requiring total generation to match the total load in each period, is as follows:

Σ_{i=1}^{M_k} P_{i,t}^G + Σ_{j=1}^{M_n} P_{j,t}^W = P_{L,t}

wherein P_{L,t} is the total load power in period t;
inequality constraints comprising upper limit constraint of branch current, upper and lower limit constraint of node voltage, upper and lower limit constraint of output power of a thermal power generating unit, startup and shutdown time constraint of the thermal power generating unit, climbing constraint of the thermal power generating unit, rotating standby constraint of the thermal power generating unit and upper and lower limit constraint of output power of a renewable energy source unit;
the branch current upper limit constraint is expressed as follows:
I_{ij} ≤ I_{ij}^{max}

in the formula, I_{ij} is the magnitude of the current flowing on branch ij, and I_{ij}^{max} is the maximum current amplitude allowed to flow on branch ij;
the expression of the upper and lower node voltage limits constraint is as follows:
U_i^{min} ≤ U_i ≤ U_i^{max}

wherein U_i^{max} and U_i^{min} respectively represent the upper and lower voltage limits of node i;
the expression of the upper and lower limit constraints of the output power of the thermal power generating unit is as follows:
u_{i,t} P_i^{min} ≤ P_{i,t}^G ≤ u_{i,t} P_i^{max}

wherein P_i^{max} and P_i^{min} respectively represent the upper and lower output power limits of thermal unit i;
the expression of the thermal power generating unit on-off time constraint is as follows:
( T_{i,t−1}^{on} − T_i^{on,min} ) ( u_{i,t−1} − u_{i,t} ) ≥ 0
( T_{i,t−1}^{off} − T_i^{off,min} ) ( u_{i,t} − u_{i,t−1} ) ≥ 0

in the formula, T_{i,t}^{on}, T_{i,t}^{off}, T_i^{on,min} and T_i^{off,min} are the running time, shutdown time, minimum running time and minimum shutdown time of thermal unit i;
the climbing constraint expression of the thermal power generating unit is as follows:
−D_i Δt ≤ P_{i,t}^G − P_{i,t−1}^G ≤ U_i Δt

in the formula, U_i and D_i are the maximum ramp-up and ramp-down rates of the active power of thermal unit i in unit time;
the expression of the rotating standby constraint of the thermal power generating unit is as follows:
Σ_{i=1}^{M_k} ( u_{i,t} P_i^{max} − P_{i,t}^G ) ≥ R_{L,t} + Σ_{j=1}^{M_n} R_w P_{j,t}^{W,max}

in the formula, R_{L,t} is the load reserve capacity at time t, and R_w is the output prediction error of renewable unit j; the expression of the upper and lower limits of the output power of the renewable energy units is as follows:

P_j^{W,min} ≤ P_{j,t}^W ≤ P_{j,t}^{W,max}

in the formula, P_{j,t}^{W,max} and P_j^{W,min} are the upper and lower limits of the output power of renewable unit j.
As a preferable solution of the multi-region power grid collaborative optimization system of the present invention, the reinforcement learning model design module is configured to design the multi-region power grid collaborative optimization model containing renewable energy into a reinforcement learning model in a multi-agent environment according to a state space, an action space, an environment and a reward function, where the state space variables include the load data p_L of each agent region, the actual output power of the renewable energy units P^W, the upper limit of renewable output power P^{W,max}, the actual output power of the thermal power generating units P^G, the operating cost coefficient of the thermal units c_o, the wind and solar curtailment cost coefficient c_g, the start-up cost SU_i, the shutdown cost SD_i, and the node voltage U_i; the expression of the state space is as follows:

S = { p_L, P^W, P^{W,max}, P^G, c_o, c_g, SU_i, SD_i, U_i }

The action space variables include the output power of the thermal power generating units P^G, the start-stop state of the thermal power generating units u, and the output power of the renewable energy units P^W; the expression of the action space is as follows:

A = { P^G, u, P^W }
In the multi-region power grid collaborative optimization model containing renewable energy, the equality and inequality constraints established for the objective function serve as the environment. At each moment, after every agent has taken its action, one power flow calculation is performed, the relevant state quantities of the power grid are fed back to evaluate the reward function, and the system transitions to the next moment; this cycle repeats;
In the multi-region power grid collaborative optimization model containing renewable energy, the negative of the objective function is used as the instant reward of each agent. According to the equality and inequality constraints established for the objective function, if a corresponding variable does not satisfy a constraint, a penalty value r_push is imposed and, together with the instant reward, forms the final reward function of the agent, whose expression is as follows:

R = { F + r_push }
As an optimal solution of the multi-region power grid collaborative optimization system, the model solving module adopts the MADDPG algorithm to solve the reinforcement learning model in the multi-agent environment. In the MADDPG algorithm, the Actor of agent i acquires its own state information s_i; a_i is the action taken by agent i, r_i is its reward, and θ_i is its weight parameter. With N agents, the observation set x = (s_1, ..., s_N) contains the state information of all agents. The Actor continuously updates its own parameters θ_i to maximize the expectation of its own reward, i.e. to obtain a higher evaluation value from the Critic;
the policy update rule of the Actor is:

∇_{θ_i} J(μ_i) = E_{x,a~D} [ ∇_{θ_i} μ_i(a_i | s_i) ∇_{a_i} Q_i^μ(x, a_1, ..., a_N) |_{a_i = μ_i(s_i)} ]

where Q_i^μ(x, a_1, ..., a_N) is the centralized state-action value function of each agent, learned and updated independently; D = (x, x', a_1, ..., a_N, r_1, ..., r_N) is the replay unit storing the experiences of all agents, from which one random batch is drawn for each training step; for the continuous actions (unit output powers), the MADDPG algorithm adopts the set μ of continuous deterministic policies of the N agents; for the 0-1 actions (unit start-stop), random values are taken in the training stage;
Critic updates its parameters by minimizing the time-difference error, and the loss function of Critic is:

L(θ_i) = E_{x,a,r,x'} [ (Q_i^μ(x, a_1, ..., a_N) - y)^2 ],  y = r_i + γ Q_i^{μ'}(x', a'_1, ..., a'_N)

where y is the target value computed with the target networks, μ' is the policy set of the target network, and γ ∈ [0, 1] is the discount factor;
the target network periodically copies the parameters from the evaluation network, with the following rules:
θ'_i = (1 - τ)θ'_i + τθ_i

where θ'_i is the target network parameter of agent i, τ is the soft update coefficient, and τ ≤ 1;
let the policy μ_i of agent i consist of a set of K sub-policies, with only one sub-policy μ_i^(k) used in each training round, so that the overall reward of the policy set is highest over the whole training process; the final policy update of the Actor is:

∇_{θ_i^(k)} J_e(μ_i) = (1/K) E_{x,a~D_i^(k)} [ ∇_{θ_i^(k)} μ_i^(k)(a_i | s_i) ∇_{a_i} Q_i^μ(x, a_1, ..., a_N) |_{a_i = μ_i^(k)(s_i)} ]
as a preferred embodiment of the multi-region power grid collaborative optimization system of the present invention, the step of solving the reinforcement learning model in the multi-agent environment by the model solving module using the MADDPG algorithm specifically includes:
setting the optimized scheduling time period T and the number of training rounds M for each agent, initializing the round and time counters to 1, and randomly initializing each agent's network parameters θ_i at the initial time;
Loading the multi-region power grid collaborative optimization model into an environment, setting an interface file of states and actions in an MADDPG algorithm, so that load flow calculation can be performed in real time according to the state actions, and feeding back corresponding environment state quantities;
each agent observes the state quantities of its region and takes an action; the environment feeds back state quantities used to calculate the reward; the agent undergoes a state transition to the next moment, observes the state at the next moment, and stores (x, a, r, x') in the experience replay unit D;
randomly sampling a mini-batch of samples (x^(k), a^(k), r^(k), x'^(k)) from the experience replay unit D, updating the Critic and Actor parameters, and updating the target network parameters;
and judging whether the current training round number M reaches a set value M, if so, ending the training, outputting and storing a result, and if not, restarting a new round of training.
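The solving steps above can be sketched as a minimal training loop. This is an illustrative Python sketch only: `train_maddpg`, `ToyEnv` and `ToyAgent` are assumed stand-ins for the real power-flow environment and MADDPG agents, not part of the disclosed system.

```python
import random

def train_maddpg(env, agents, T, M):
    """Skeleton of the solving flow: M training rounds, each covering the
    scheduling horizon T; one (x, a, r, x') tuple is stored per step."""
    replay = []                          # experience playback unit D
    for _ in range(M):
        x = env.reset()                  # initial regional state quantities
        for _ in range(T):
            a = [ag.act(xi) for ag, xi in zip(agents, x)]
            x_next, r = env.step(a)      # power-flow feedback yields rewards
            replay.append((x, a, r, x_next))
            x = x_next
        batch = random.choice(replay)    # random sample from D
        for ag in agents:
            ag.update(batch)             # update critic/actor (and targets)
    return replay

class ToyEnv:
    """Stand-in environment: two regions, constant states and rewards."""
    def reset(self):
        return [0.0, 0.0]
    def step(self, actions):
        return [0.0, 0.0], [-1.0] * len(actions)

class ToyAgent:
    """Stand-in agent with trivial act/update hooks."""
    def act(self, observation):
        return 0.5
    def update(self, batch):
        pass
```

Running `train_maddpg(ToyEnv(), [ToyAgent(), ToyAgent()], T=4, M=3)` fills the replay unit with 12 transitions, one per agent step over 3 rounds of 4 periods.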
In a third aspect, an electronic device is provided, including:
a memory storing at least one instruction; and
a processor, which executes the instructions stored in the memory to implement the multi-region power grid collaborative optimization method.
In a fourth aspect, a computer-readable storage medium is provided, where a computer program is stored, and the computer program is executed by a processor to implement the multi-region grid collaborative optimization method.
Compared with the prior art, the first aspect of the invention has at least the following beneficial effects:
The invention relates to a multi-region power grid collaborative optimization method based on multi-agent deep reinforcement learning. It mainly addresses the problems of complex modeling and high optimization-solving difficulty caused by the high-dimensional, non-convex and nonlinear characteristics of the multi-region collaborative optimization problem in existing power grids, while also considering uncertain factors such as renewable energy. A multi-region power grid collaborative optimization model containing a high proportion of renewable energy is constructed, the multi-agent deep deterministic policy gradient method is adopted, and the optimization problem is cast as a distributed optimization problem in a multi-agent reinforcement learning environment, which can effectively solve the high-dimensional non-convex nonlinear multi-region power grid collaborative optimization problem. Compared with prior power grid optimal scheduling methods, the method has the following advantages: (1) a distributed model with multiple decision centers reduces the communication pressure of the system while achieving results nearly consistent with centralized optimization; (2) the proposed optimization method can consider the double uncertainty of renewable energy and load during training, and owing to the adaptivity of the algorithm the trained model copes with uncertainty better than traditional iterative solution methods, can realize real-time decision-making of the system, and is favorable for online application; (3) based on the "centralized training, distributed execution" characteristic of multiple agents, after training is completed each regional power grid makes decisions on its controllable units only according to its own local observations, protecting the privacy of each regional power grid.
It is understood that the beneficial effects of the second to fourth aspects can be seen from the description of the first aspect, and are not described herein again.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
FIG. 1 is a power grid topology structure for a multi-region power grid collaborative optimization method based on multi-agent deep reinforcement learning;
FIG. 2 is a flowchart of a multi-region power grid collaborative optimization method based on multi-agent deep reinforcement learning according to an embodiment of the present invention;
FIG. 3 is a flowchart of the overall algorithm solution of the IES distributed optimization model based on MADDPG according to the embodiment of the present invention;
FIG. 4 is a block diagram of a multi-region power grid collaborative optimization system based on multi-agent deep reinforcement learning according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Aiming at the problem that existing solution methods for the multi-region power grid collaborative optimization problem containing a high proportion of renewable energy are not economical, accurate and reliable, a multi-region power grid collaborative optimization method based on multi-agent deep reinforcement learning is provided. Compared with centralized optimization and traditional distributed optimization methods, the method provided by the invention: (1) adopts a multi-region power grid collaborative optimization model with multiple decision centers, which can fully consider the complex constraint conditions within each regional power grid, reduces the communication pressure of the system and the decision dimension, and establishes a high-precision regional power grid optimization model; (2) based on multi-agent deep reinforcement learning, casts the optimization problem as a distributed optimization problem in the multi-agent reinforcement learning environment, can consider the uncertainty of renewable energy and load under fluctuation scenarios during training, can realize real-time decision-making of the system, and facilitates online application; (3) based on the "centralized training, distributed execution" characteristic of multiple agents, after training is completed each regional power grid only needs to make decisions on its controllable units according to its own local observations, protecting the privacy of each regional power grid.
The invention relates to a regional power grid collaborative optimization method based on multi-agent deep reinforcement learning, in which the observation data of each agent region are collected, mainly comprising renewable energy unit output data, thermal power unit output data, load data and the like. On this basis, a multi-region power grid collaborative optimization model oriented to a high proportion of renewable energy is constructed, with the economy of all regional power grids in the system as the optimization target and the safe and stable operation of the system as the constraints. The Multi-Agent Deep Deterministic Policy Gradient method (MADDPG) is adopted, and the model is designed as a reinforcement learning model in a multi-agent environment according to a state space, an action space, an environment and a reward function. Finally, the model is programmed and solved with simulation software; comparing the solution results with other methods verifies that the trained model not only copes with uncertainty better but also realizes real-time decision-making of the system, while converging faster and training better during the training process, showing obvious advantages in coping with complex environment problems.
The invention provides a multi-region power grid collaborative optimization method which mainly comprises the following steps:
1. dividing the power grid into agent regions;
Taking the IEEE39-node standard system as an example, the following criteria are used as the basis for division: 1) connections between regions should be single loops as far as possible; 2) the structure between regions is clear and the power flow direction is single; 3) the power grid within each group is basically unconstrained and strongly structured. The IEEE39-node standard system is divided into different regions, different agents are set according to the regions, and the regional agents serve as decision centers to achieve collaborative optimization operation of the multi-region power grid; the region division is shown in figure 1.
2. The multi-region power grid collaborative optimization method based on multi-agent deep reinforcement learning comprises the following steps:
collecting observation data of each agent region, the observation data of each agent related to the invention comprises: the load data, the renewable energy source actual output power, the thermal power unit actual output power and the cost coefficient comprise a thermal power unit operation cost coefficient, a wind and light abandoning cost coefficient, a starting cost coefficient and a shutdown cost coefficient; each regional intelligent agent meets the demand of load in the region and the maximum consumption of renewable energy sources through generating sets (mainly thermal power generating sets and renewable energy generating sets) on one hand and energy interaction between regions on the other hand, and realizes the distribution autonomy of each region to a certain extent, wherein the model is as follows:
The total economy of all regional power grids in the system is taken as the optimization target, and the cost of each regional power grid is set as the thermal power unit cost plus the wind and solar curtailment penalty of renewable energy:

min F = Σ_{n=1}^{N} ( F_n^G + F_n^re )

where N is the number of divided regions, F_n^G is the thermal power unit cost in region n, and F_n^re is the wind and solar curtailment penalty of the renewable energy units in region n.
Determining the constraint conditions of the optimization model according to the constructed multi-region power grid collaborative optimization model containing the high-proportion renewable energy sources:
the equality constraint conditions of the system mainly comprise network power flow constraint and power balance constraint. The inequality constraint of the system mainly aims at ensuring the safe and stable operation of the system, and mainly comprises branch current upper limit constraint, node voltage upper and lower limit constraint, thermal power unit output power upper and lower limit constraint, thermal power unit startup and shutdown time constraint, thermal power unit climbing constraint, thermal power unit rotation standby constraint and renewable energy unit output power upper and lower limit constraint.
Based on basic elements of multi-agent deep reinforcement learning, a regional power grid collaborative optimization model containing high-proportion renewable energy is designed into a reinforcement learning model under a multi-agent environment according to a state space, an action space, an environment and a reward function, and a MADDPG method is adopted for solving.
The solution is implemented with computer software. Compared with the centralized optimization solution, it is verified that the trained model copes with uncertainty better and can realize real-time decision-making of the system, which is favorable for online application; compared with single-agent deep reinforcement learning methods, the MADDPG method converges faster, trains better, and shows obvious advantages in coping with complex environment problems.
3. The MADDPG method adopts a "centralized training, distributed execution" mode. Each agent is provided with independent Actor and Critic networks. Unlike DDPG, during training the Actor of each agent takes an action according to its own state, Critic evaluates the Actor's action, the Actor updates its policy according to the feedback, and the Critic of each agent obtains more accurate evaluation information by estimating the policies of the other agents. After training is finished, each agent only needs to use its Actor to take actions according to its own state; the information of other agents is no longer needed, and each agent completes its decisions independently. MADDPG obtains the optimal policy through centralized training and learning, only local information is needed in application, and the private information of each agent can be effectively protected.
4. In the multi-region power grid collaborative optimization method based on multi-agent deep reinforcement learning, the double uncertainty of renewable energy and load is considered in the model training process: uncertainty is superimposed on the data as they are input during training. That is, in the MADDPG training process each regional power grid agent needs to obtain the observable state data within its own region, and at the moment the renewable energy and load data are read, a random fluctuation obeying a normal distribution is superimposed on the obtained data. This ensures that the renewable energy and load data obtained in every training round are different, yet remain within a fluctuation range whose size can be flexibly adjusted.
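The superposition of a normally distributed fluctuation on the read-in data can be sketched as follows; the 5% relative standard deviation and the function name are illustrative assumptions, since the patent only states that the fluctuation range is adjustable.

```python
import random

def observe_with_uncertainty(nominal, rel_sigma=0.05, rng=random):
    """Superimpose a normally distributed fluctuation on nominal renewable
    output / load data, so each training round sees different samples.
    rel_sigma is the assumed relative standard deviation of the fluctuation."""
    if isinstance(nominal, (int, float)):
        return nominal + rng.gauss(0.0, rel_sigma * abs(nominal))
    return [v + rng.gauss(0.0, rel_sigma * abs(v)) for v in nominal]
```

Each call returns a perturbed copy of the nominal profile; widening `rel_sigma` widens the fluctuation range.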
Example 1
Referring to fig. 2, the multi-region power grid collaborative optimization method based on multi-agent deep reinforcement learning in the embodiment of the present invention includes the following steps:
s01, collecting observation data of each agent region, wherein the observation data of each agent comprises load data, renewable energy actual output power, thermal power unit actual output power, and cost coefficients comprising the thermal power unit operating cost coefficient, the wind and solar curtailment cost coefficient, the start-up cost coefficient and the shutdown cost coefficient;
s02, constructing a multi-region power grid collaborative optimization model containing high-proportion renewable energy, taking the total economy of all region power grids in the system as an optimization target, setting the cost of each region power grid as the cost of a thermal power unit and the wind and light abandoning punishment of the renewable energy, wherein the equality constraint condition of the system mainly comprises a network power flow constraint and a power balance constraint, the inequality constraint mainly aims at ensuring the safe and stable operation of the system, and mainly comprises an upper limit constraint, a lower limit constraint, a climbing constraint and a rotating standby constraint of variables;
s03, designing a multi-region power grid collaborative optimization model containing high-proportion renewable energy sources into a multi-agent environment reinforcement learning model according to a state space, an action space, an environment and a reward function based on basic elements of multi-agent deep reinforcement learning;
s04, solving by adopting an MADDPG method according to the constructed multi-area power grid collaborative optimization model based on multi-agent deep reinforcement learning, and giving a solving flow;
s05, solving the model with computer software and verifying, by comparison with the centralized optimization solution, that the trained model not only copes with uncertainty better but can also realize real-time decision-making of the system, which is favorable for online application; compared with single-agent deep reinforcement learning methods, the MADDPG method converges faster, trains better, and shows obvious advantages in coping with complex environment problems.
In one possible implementation, the observation data of each agent region are first collected; the observation data of each agent involved in the example include the load data p_L, the actual output power P_re of renewable energy, the actual output power P_G of the thermal power generating unit, the operating cost coefficient c_o of the thermal power generating unit, the wind and solar curtailment cost coefficient c_g, the start-up cost coefficient SU_i and the shutdown cost coefficient SD_i.
Then, constructing a multi-region power grid collaborative optimization model containing high-proportion renewable energy sources, comprising the following steps:
step 2.1: and establishing an objective function.
A multi-region power grid collaborative optimization model containing a high proportion of renewable energy is constructed; taking the IEEE39-node standard system as an example, it is divided into 3 regional power grids, the overall economy of all regions in the system is taken as the optimization target, and the cost of each regional power grid is set as the thermal power unit cost plus the wind and solar curtailment penalty of renewable energy. The objective function of the system is:

min F = Σ_{n=1}^{N} ( F_n^G + F_n^re )   (3)

where N is the number of divided regions, F_n^G is the thermal power unit cost in region n, and F_n^re is the wind and solar curtailment penalty of the renewable energy units in region n.
In formula (3), the thermal power unit cost is:

F_n^G = Σ_{i=1}^{M_k} Σ_{t=1}^{T} [ u_i^t c_o P_i^t Δt + SU_i max(u_i^t - u_i^{t-1}, 0) + SD_i max(u_i^{t-1} - u_i^t, 0) ]   (4)

where M_k is the number of thermal power generating units in region n; T is the calculation time length; u_i^t is the operating state of thermal power generating unit i in time period t, u_i^t = 1 representing that unit i is operating and u_i^t = 0 that it is shut down; c_o is the operating cost coefficient; P_i^t is the output power of thermal power generating unit i in time period t; Δt is the operating period interval; SU_i is the cost of one start-up of thermal power generating unit i; SD_i is the shutdown cost of thermal power generating unit i.
Also in formula (3), the curtailment penalty is:

F_n^re = Σ_{j=1}^{M_n} Σ_{t=1}^{T} c_g ( P_re,j^t,max - P_re,j^t ) Δt   (5)

where M_n is the number of renewable energy units in region n; T is the calculation time length; c_g is the wind and solar curtailment penalty coefficient; P_re,j^t is the actual output power of renewable energy unit j in time period t; P_re,j^t,max is the upper limit of the output power of renewable energy unit j in time period t.
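A minimal numeric sketch of the two regional cost terms described above (thermal unit operating, start-up and shut-down cost, and the wind/solar curtailment penalty). The helper names and the convention that no start-up is charged in the first period are assumptions for illustration.

```python
def thermal_cost(u, p, c_o, su, sd, dt=1.0):
    """Operating + start-up + shut-down cost of one thermal unit over T periods.
    u: 0/1 operating states per period; p: output power per period (MW).
    Assumption: the state before the first period equals u[0], so no
    start-up or shut-down is charged in period 1."""
    cost = sum(c_o * p_t * u_t * dt for u_t, p_t in zip(u, p))
    prev = u[0]
    for u_t in u[1:]:
        if u_t > prev:        # unit started this period -> SU_i
            cost += su
        elif u_t < prev:      # unit shut down this period -> SD_i
            cost += sd
        prev = u_t
    return cost

def curtailment_penalty(p_max, p_actual, c_g, dt=1.0):
    """Wind/solar curtailment penalty: c_g * (output upper limit - actual)."""
    return sum(c_g * (pm - pa) * dt for pm, pa in zip(p_max, p_actual))
```

For example, a unit running two periods at 10 MW with c_o = 2 costs 40, and curtailing 3 MWh at c_g = 1.5 is penalized 4.5.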
Step 2.2: and establishing constraint conditions of the optimization model.
The constraint conditions of the multi-region power grid collaborative optimization model containing the high-proportion renewable energy comprise equality constraint conditions and inequality constraint conditions.
1. The equation constrains:
the equality constraint conditions of the system mainly comprise network power flow constraint and power balance constraint.
The network power flow constraints are:

P_G,i - P_D,i = U_i Σ_{j=1}^{N_e} U_j ( G_ij cos θ_ij + B_ij sin θ_ij )
Q_G,i - Q_D,i = U_i Σ_{j=1}^{N_e} U_j ( G_ij sin θ_ij - B_ij cos θ_ij )   (6)

where P_G,i and Q_G,i are the active and reactive power injected at node i; P_D,i and Q_D,i are the active and reactive power consumed by the load at node i; N_e is the number of nodes of the power distribution network; U_i and U_j are the voltage amplitudes of nodes i and j; G_ij and B_ij are respectively the conductance and susceptance; θ_ij is the phase angle difference between node i and node j.
the power balance constraints are as follows:
wherein M is r Is the number of loads in the region n.Is the power of the load r over the time period t. The formula represents the injected power of the whole system, namely the sum of all renewable energy output power and thermal power unit output power, and the power consumed by all loads needs to be satisfied.
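The power balance equality can be checked numerically as a residual; the following sketch (function name assumed for illustration) returns zero exactly when the constraint holds.

```python
def power_balance_residual(p_thermal, p_renewable, p_load):
    """Injected power (thermal + renewable unit outputs) minus consumed
    load power; the equality constraint is satisfied when this is zero."""
    return sum(p_thermal) + sum(p_renewable) - sum(p_load)
```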
2. The inequality constrains:
the inequality constraint of the system mainly aims at ensuring the safe and stable operation of the system and meets the actual operation condition of the unit. The upper limit of the branch current is mainly restricted; limiting the upper limit and the lower limit of the node voltage; the method comprises the following steps of thermal power unit output power upper and lower limit constraint, start-up and shutdown time constraint, climbing constraint and rotation standby constraint; and (4) limiting the upper limit and the lower limit of the output power of the renewable energy source unit.
The branch current upper limit constraint is:

I_ij ≤ I_ij^max   (8)

where I_ij is the amplitude of the current flowing on branch ij, and I_ij^max is the maximum current amplitude allowed to flow on branch ij.
The node voltage upper and lower limit constraint is:

U_i^min ≤ U_i ≤ U_i^max   (9)

where U_i^max and U_i^min respectively represent the upper and lower limits of the voltage at node i.
The thermal power generating unit output power upper and lower limit constraint is:

P_i^min ≤ P_i^t ≤ P_i^max   (10)

where P_i^max and P_i^min respectively represent the upper and lower limits of the output power of thermal power generating unit i.
The thermal power generating unit start-stop time constraint is:

T_i^on ≥ T_i^on,min,  T_i^off ≥ T_i^off,min   (11)

where T_i^on, T_i^off, T_i^on,min and T_i^off,min are respectively the continuous on time, the continuous off time, the minimum on time and the minimum off time of thermal power generating unit i.
The thermal power generating unit ramping constraint is:

-D_i Δt ≤ P_i^t - P_i^{t-1} ≤ U_i Δt   (12)

where U_i and D_i are the maximum upward and downward ramp rates of the active power of thermal power generating unit i per unit time.
The thermal power generating unit spinning reserve constraint is:

Σ_{i=1}^{M_k} u_i^t P_i^max ≥ Σ_{r=1}^{M_r} ( P_L,r^t + R_L,r^t ) + R_w
Σ_{i=1}^{M_k} u_i^t P_i^min ≤ Σ_{r=1}^{M_r} ( P_L,r^t - R_L,r^t ) - R_w   (13)

where R_L,r^t is the load reserve capacity of node r at time t, typically 5% of the total load, and R_w is the output prediction error of renewable energy unit j. The first expression states that the upper limit of the output power of all committed units must exceed the sum of all loads plus the maximum positive error; the second states that the lower limit of the output power of all committed units must be smaller than all loads minus the maximum negative error.
The renewable energy unit output power upper and lower limit constraint is:

0 ≤ P_re,j^t ≤ P_re,j^t,max   (14)

where P_re,j^t,max is the upper limit of the output power of renewable energy unit j.
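One way to turn the box-type inequality constraints above into the penalty value r_push used in the reward design is to measure how far each bounded quantity leaves its band. This is an illustrative sketch: the function names and the penalty coefficient k are assumptions, not part of the disclosed model.

```python
def box_violation(value, lo, hi):
    """Distance by which a value leaves its [lo, hi] band (0 when feasible)."""
    return max(lo - value, 0.0) + max(value - hi, 0.0)

def penalty_r_push(voltages, v_lo, v_hi, p_units, p_lo, p_hi, k=100.0):
    """Aggregate non-positive penalty for node-voltage and unit-output bound
    violations; the coefficient k is an assumed tuning parameter."""
    v = sum(box_violation(u, v_lo, v_hi) for u in voltages)
    p = sum(box_violation(pi, lo, hi)
            for pi, lo, hi in zip(p_units, p_lo, p_hi))
    return -k * (v + p)
```

A feasible operating point yields zero penalty; any bound violation subtracts from the agent's reward in proportion to its magnitude.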
And then based on basic elements of multi-agent deep reinforcement learning, designing a multi-region collaborative optimization model containing high-proportion renewable energy sources into a reinforcement learning model under a multi-agent environment according to a state space, an action space, an environment and a reward function:
1. state space
The state space of the system mainly comprises the load data p_L of each agent region, the actual output power P_re of renewable energy, the upper limit P_re^max of renewable energy output power, the actual output power P_G of the thermal power generating unit, the operating cost coefficient c_o of the thermal power generating unit, the wind and solar curtailment cost coefficient c_g, the start-up cost coefficient SU_i, the shutdown cost coefficient SD_i, and the node voltage U_i, i.e.

s = { p_L, P_re, P_re^max, P_G, c_o, c_g, SU_i, SD_i, U_i }   (15)
2. Movement space
The action space variables correspond to the control variables of the system under study; each regional power grid in the system acts as an agent and, subject to its constraints, the action space variables comprise the output power P_G of the thermal power generating unit, the start-stop state u_G of the thermal power generating unit, and the renewable energy output power P_re, i.e.

a = { P_G, u_G, P_re }   (16)
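Since the actor outputs are continuous while the start-stop action is 0-1, a common mapping (illustrative only, not prescribed by the invention) clips the continuous output power to its box bounds and thresholds the binary component:

```python
def map_action(raw, p_min, p_max):
    """Map an actor's raw outputs in [-1, 1] to the action space: a continuous
    unit output power clipped to its box bounds, and a 0-1 start-stop flag."""
    power = p_min + (raw[0] + 1.0) / 2.0 * (p_max - p_min)
    power = max(p_min, min(p_max, power))   # respect the output power limits
    on_off = 1 if raw[1] >= 0.0 else 0      # threshold the start-stop action
    return power, on_off
```

During training the binary component may instead be sampled randomly, as the solving section notes for the 0-1 actions.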
3. Design of environment
The Actor of each agent takes an action according to its state at the current moment, interacts with the environment, obtains a reward and transfers to the state at the next moment; Critic evaluates the action and guides the agent's action at the next moment. Accordingly, the multi-region power grid model constraints (6)-(14) are taken as the environment; one power grid power flow calculation is performed after each agent takes its action at each moment, the relevant state quantities of the power grid are fed back for calculating the reward function, the process transfers to the next moment, and this repeats in a cycle.
4. Reward function
The reward function affects the convergence of the algorithm to some extent, so the reward signal must be able to convey the desired goal to the agent and thereby guide it to improve its actions towards maximizing the reward function. The opposite number of the objective function of the multi-region power grid model is taken as the instant reward of each agent. The corresponding constraint conditions must also be met in the optimization problem; according to the constraint conditions provided by the invention, if a corresponding variable does not satisfy its constraint, a penalty value r_push is set and, together with the instant reward, forms the final reward function of the agent, calculated as follows:

R = { F + r_push }   (17)
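As a numeric sketch of formula (17), with the instant reward taken as the opposite number of the regional objective value and r_push non-positive (the function name is an assumption for illustration):

```python
def agent_reward(region_cost, r_push):
    """Instant reward is the opposite number of the region's objective value;
    the non-positive constraint penalty r_push is added on top."""
    return -region_cost + r_push
```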
Finally, the model is solved with the MADDPG method according to the multi-region power grid collaborative optimization model based on multi-agent deep reinforcement learning. In the MADDPG method, the Actor of agent i only needs to acquire the state information s_i related to itself; a_i is the action taken by agent i, r_i is its reward, and θ_i is its own weight parameter. There are N agents, and the observation set x = (s_1, ..., s_N) is the state information of all agents. The Actor continuously updates its own parameter θ_i to maximize the expected value of its own reward, i.e. so that Critic's evaluation value is higher. The policy update rule of the Actor is:

∇_{θ_i} J(μ_i) = E_{x,a~D} [ ∇_{θ_i} μ_i(a_i | s_i) ∇_{a_i} Q_i^μ(x, a_1, ..., a_N) |_{a_i = μ_i(s_i)} ]   (18)
where Q_i^μ(x, a_1, ..., a_N) is the centralized state-action value function of each agent, learned and updated independently. D = (x, x', a_1, ..., a_N, r_1, ..., r_N) is the experience replay unit storing the experiences of all agents, from which a random batch is drawn for training each time. To avoid the convergence difficulty caused by sampling particular actions from an action distribution in a continuous space, for the continuous actions (unit output powers) MADDPG adopts the set μ of continuous deterministic policies of the N agents; for the 0-1 actions (unit start-stop), random values are taken in the training stage.
Critic updates its parameters mainly by minimizing the time-difference error, and its loss function is:

L(θ_i) = E_{x,a,r,x'} [ (Q_i^μ(x, a_1, ..., a_N) - y)^2 ],  y = r_i + γ Q_i^{μ'}(x', a'_1, ..., a'_N)   (19)

where y is the target value computed with the target networks, μ' is the policy set of the target network, and γ ∈ [0, 1] is the discount factor.
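The time-difference target and the mean squared error Critic minimizes can be sketched numerically; the batch values and function names here are illustrative assumptions, not the networks themselves.

```python
def td_target(r_i, q_next, gamma=0.95):
    """Target value y = r_i + gamma * Q'(x', a') from the target network."""
    return r_i + gamma * q_next

def critic_loss(q_batch, y_batch):
    """Mean squared time-difference error over a sampled mini-batch."""
    return sum((q - y) ** 2 for q, y in zip(q_batch, y_batch)) / len(q_batch)
```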
Finally, the target network periodically copies the parameters from the evaluation network in a soft update mode according to the following rules:
θ'_i = (1 - τ)θ'_i + τθ_i   (20)
where θ'_i is the target network parameter of agent i, τ is the soft update coefficient, and τ ≤ 1.
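The soft update rule of formula (20) is a one-line element-wise blend; here the parameters are represented as plain lists for illustration.

```python
def soft_update(theta_target, theta_eval, tau=0.01):
    """theta' <- (1 - tau) * theta' + tau * theta, element-wise, with tau <= 1
    so the target network tracks the evaluation network slowly."""
    return [(1.0 - tau) * tp + tau * te
            for tp, te in zip(theta_target, theta_eval)]
```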
In a multi-agent environment, agents interact with the environment simultaneously, so the environment is non-stationary from the viewpoint of each agent. MADDPG therefore proposes a policy-ensemble method: the policy μ_i of agent i consists of a set of K sub-policies, and only one sub-policy μ_i^(k) is used in each training round, so that the overall reward of the policy set is highest over the whole training process. The final policy update of the Actor is therefore:

∇_{θ_i^(k)} J_e(μ_i) = (1/K) E_{x,a~D_i^(k)} [ ∇_{θ_i^(k)} μ_i^(k)(a_i | s_i) ∇_{a_i} Q_i^μ(x, a_1, ..., a_N) |_{a_i = μ_i^(k)(s_i)} ]   (21)
referring to fig. 3, according to the solution principle of the madpg method, the overall algorithm flow of the multi-region power grid collaborative optimization model based on the madpg according to the embodiment of the present invention is as follows:
The method comprises the following steps. Step one: each agent synchronizes and initializes its parameters: set the optimized scheduling time period T and the number of training rounds M for each agent, initialize the counters to 1, and randomly set each agent's network parameters θ_i at the initial time.
Step two: the environment is initialized. And loading the multi-region power grid collaborative optimization model into an environment, setting interface files of states and actions in the MADDPG algorithm so as to carry out load flow calculation according to the states and actions in real time and feed back corresponding environment state quantities.
Step three: each agent interacts with the environment. Each agent observes the state quantities of its region and takes an action; the environment feeds back state quantities used to calculate the reward; the agent undergoes a state transition to the next moment, observes the next state, and stores (x, a, r, x') in the experience replay unit D.
Step four: and updating each network parameter. Randomly sampling k strategy next group (x) from empirical playback unit D (k) ,a (k) ,r (k) ,x′ (k) ) The critic and operator parameters are updated according to equations (17), (19), and the target network parameters are updated according to equation (18).
Step five: and judging whether the current training round number M reaches a set value M, if so, ending the training, outputting and storing a result, and if not, returning to the step two, and restarting a new round of training.
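The five steps above can be sketched as a training-loop skeleton. The `env` and `agents` objects and their `reset`/`step`/`act`/`update` interfaces are hypothetical stand-ins for the environment (load flow calculation plus reward feedback) and the MADDPG agents described in the text:

```python
import random

def train_maddpg(env, agents, T, M, batch_size=32):
    """Skeleton of steps one to five; all interfaces are assumed, not
    taken from the patent text."""
    replay = []  # experience playback unit D
    for episode in range(1, M + 1):          # step five: loop over M rounds
        x = env.reset()                      # step two: initialize environment
        for t in range(T):                   # step three: interact for T periods
            a = [ag.act(s) for ag, s in zip(agents, x)]
            x_next, r = env.step(a)          # load flow run + reward feedback
            replay.append((x, a, r, x_next)) # store (x, a, r, x') in D
            x = x_next
            if len(replay) >= batch_size:    # step four: sample and update
                batch = random.sample(replay, batch_size)
                for ag in agents:
                    ag.update(batch)         # Critic/Actor + target soft update
    return agents
```

The replay buffer decorrelates samples across time steps and episodes, which is what makes the random mini-batch updates of step four well behaved.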
The model is programmed and solved with simulation software. Compared with a centralized optimization solution, the trained model is verified to cope better with uncertainty and to support real-time decision making for the system, which facilitates online application; compared with single-agent deep reinforcement learning, the MADDPG method converges faster, trains to a better result, and responds markedly better to complex environment problems.
Example 2
Referring to fig. 4, a multi-region power grid collaborative optimization system according to an embodiment of the present invention includes:
the observation data collection module 2 is used for collecting the observation data of each intelligent agent region in the multi-region power grid to be optimized;
the collaborative optimization model building module 3 is used for building a multi-region power grid collaborative optimization model containing renewable energy sources on the basis of the observation data;
the reinforcement learning model design module 4 is used for designing the multi-region power grid collaborative optimization model containing renewable energy sources into a reinforcement learning model under a multi-agent environment according to a state space, an action space, an environment and a reward function;
and the model solving module 5 is used for solving the reinforcement learning model in the multi-agent environment and outputting a collaborative optimization result to carry out collaborative optimization on the multi-region power grid.
In a possible implementation manner, the system further comprises an agent partitioning module 1, configured to perform agent partitioning on the power grid;
when the power grid is partitioned into agents, the node standard system is divided into different regions, a different agent is set for each region, and the region agents are used as decision centers to realize collaborative optimization operation of the multi-region power grid.
In one possible embodiment, the observation data collected by the observation data collection module 2 for each agent region includes: the load data p_L, the actual output power of renewable energy, the actual output power of the thermal power generating units, the operating cost coefficient c_o of the thermal power generating units, the wind and solar curtailment cost coefficient c_g, the start-up cost coefficient SU_i, and the shutdown cost coefficient SD_i.
In one possible embodiment, the collaborative optimization model building module 3 builds the following objective function:
wherein N is the number of divided regions, one term is the cost of the thermal power generating units in region n, and the other is the wind and solar curtailment penalty of the renewable energy units in region n; their calculation expressions are respectively as follows:
In the formula, M_k is the number of thermal power generating units in region n; T is the calculation duration; the status variable represents the operating state of thermal power generating unit i in period t, one value indicating that unit i is running and the other that it is shut down; c_o is the operating cost coefficient of the thermal power generating units; the output power of unit i in period t enters the cost term together with the operating time interval Δt; SU_i is the cost of starting unit i once; SD_i is the shutdown cost of unit i; and the remaining quantity is the shutdown time of unit i;
In the formula, M_n is the number of renewable energy units in region n; T is the calculation duration; c_g is the wind and solar curtailment penalty coefficient; and the remaining quantities are the actual output power of renewable energy unit j in period t and the upper limit of the output power of renewable energy unit j in period t.
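The region objective — thermal operating plus start/stop costs and the wind/solar curtailment penalty — can be sketched as follows. All function and parameter names are illustrative, and start-up and shutdown events are inferred from 0→1 and 1→0 transitions of the unit status:

```python
def region_cost(u, P, c_o, dt, SU, SD, P_re, P_re_max, c_g):
    """Sketch of the region-n objective: thermal operating + start/stop
    cost plus the curtailment penalty. Names are illustrative."""
    thermal = 0.0
    for ui, Pi in zip(u, P):                     # loop over thermal units i
        for t in range(len(ui)):
            thermal += ui[t] * c_o * Pi[t] * dt  # running cost c_o * P * dt
            if t > 0 and ui[t] == 1 and ui[t - 1] == 0:
                thermal += SU                    # start-up cost SU_i
            if t > 0 and ui[t] == 0 and ui[t - 1] == 1:
                thermal += SD                    # shutdown cost SD_i
    # curtailment penalty: c_g * (upper output limit - actual output)
    curtail = sum(c_g * (pmax - p)
                  for Pj, Pj_max in zip(P_re, P_re_max)
                  for p, pmax in zip(Pj, Pj_max))
    return thermal + curtail
```

Summing this quantity over the N regions gives the overall objective F to be minimized.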
In a possible implementation manner, the collaborative optimization model building module 3 builds equality constraints and inequality constraints on the objective function;
equality constraints including network power flow constraints, power balance constraints;
the expression of the network flow constraint is as follows:
In the formula, P_G,i and Q_G,i are the active and reactive power injected at node i; P_D,i and Q_D,i are the active and reactive power consumed by the load at node i; N_e is the number of nodes of the power distribution network; U_i and U_j are the voltage amplitudes of nodes i and j; G_ij and B_ij are the conductance and susceptance, respectively; and θ_ij is the phase angle difference between nodes i and j;
the expression of the power balance constraint is as follows:
inequality constraints comprising upper limit constraint of branch current, upper and lower limit constraint of node voltage, upper and lower limit constraint of output power of a thermal power generating unit, startup and shutdown time constraint of the thermal power generating unit, climbing constraint of the thermal power generating unit, rotating standby constraint of the thermal power generating unit and upper and lower limit constraint of output power of a renewable energy source unit;
the expression of the branch current upper limit constraint is as follows:
In the formula, I_ij is the amplitude of the current flowing on branch ij, and the other quantity is the maximum current amplitude allowed to flow on branch ij;
the expression of the upper and lower node voltage limits constraint is as follows:
wherein the two quantities respectively represent the upper limit and the lower limit of the voltage of node i;
the thermal power unit output power upper and lower limit constraint expression is as follows:
wherein the two quantities respectively represent the upper limit and the lower limit of the output power of thermal power generating unit i;
the expression of the thermal power generating unit on-off time constraint is as follows:
In the formula, the quantities are the start-up time, shutdown time, minimum start-up time, and minimum shutdown time of thermal power generating unit i;
the climbing constraint expression of the thermal power generating unit is as follows:
In the formula, U_t and D_n are the maximum upward and downward active power ramp rates of thermal power generating unit i per unit time;
the expression of the rotating standby constraint of the thermal power generating unit is as follows:
In the formula, the first quantity is the load reserve capacity of node r at time t, and R_w is the output prediction error of renewable energy unit j. The expression of the upper and lower limits of the output power of the renewable energy units is as follows:
In the formula, the two quantities are the upper and lower limits of the output power of renewable energy unit j.
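Most of the inequality constraints above are box limits (voltage, output power, branch current) or ramp limits, which can be checked with small helpers such as the following sketch (all names are illustrative):

```python
def check_limits(values, lower, upper):
    """Generic box-constraint check used alike for node voltages, unit
    output power, and branch currents: lower <= value <= upper."""
    return [lo <= v <= hi for v, lo, hi in zip(values, lower, upper)]

def ramp_ok(P, up, down):
    """Ramp constraint: -down <= P[t] - P[t-1] <= up for each period t,
    where up/down are the maximum upward/downward ramp per interval."""
    return all(-down <= P[t] - P[t - 1] <= up for t in range(1, len(P)))
```

In the reinforcement learning model below, checks of this kind decide whether the penalty term of the reward function is triggered.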
In a possible implementation manner, the reinforcement learning model design module 4 designs the multi-region power grid collaborative optimization model containing renewable energy into a reinforcement learning model in the multi-agent environment according to the state space, action space, environment and reward function, wherein the state space variables comprise: the load data p_L of each agent region, the actual output power of renewable energy, the upper limit of the renewable energy output power, the actual output power of the thermal power generating units, the operating cost coefficient c_o of the thermal power generating units, the wind and solar curtailment cost coefficient c_g, the start-up cost coefficient SU_i, the shutdown cost coefficient SD_i, and the node voltage U_i. The expression of the state space is as follows:
The action space variables comprise: the output power of the thermal power generating units, the start-stop status of the thermal power generating units, and the output power of renewable energy. The expression of the action space is as follows:
In the multi-region power grid collaborative optimization model containing renewable energy, the equality and inequality constraints established for the objective function serve as the environment; after all agents take their actions at each moment, one power grid load flow calculation is performed, the relevant state quantities of the power grid are fed back to calculate the reward function, and the process then moves to the next moment, repeating in this way;
In the multi-region power grid collaborative optimization model containing renewable energy, the negative of the objective function is used as the instant reward of each agent; according to the equality and inequality constraints established for the objective function, if a corresponding variable does not satisfy a constraint, a penalty value r_push is set, which together with the instant reward forms the final reward function of the agent, whose expression is as follows:
R = {F + r_push}.
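A minimal sketch of this reward, assuming a fixed illustrative penalty magnitude for r_push:

```python
def reward(objective_value, violated, r_push=-100.0):
    """Final reward per R = {F + r_push}: the negative of the objective
    (lower cost means higher reward), plus the penalty r_push whenever
    any constraint is violated. The penalty magnitude is an
    illustrative choice, not taken from the patent text."""
    r = -objective_value          # instant reward = negative of objective
    if violated:                  # any equality/inequality constraint broken
        r += r_push               # add the penalty term
    return r
```

Making the penalty large relative to typical objective values steers the agents away from infeasible operating points during training.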
In one possible implementation, the model solving module 5 adopts the MADDPG algorithm to solve the reinforcement learning model in the multi-agent environment. In the MADDPG algorithm, the Actor of agent i obtains its own relevant state information s_i; a_i is the action taken by agent i, r_i is its reward, and θ_i is the weight parameter of the agent; there are n agents, and the observation set x = (s_1, ..., s_N) is the state information of all agents. The Actor continuously updates its own parameter θ_i to maximize its own expected reward, i.e., to obtain a higher Critic evaluation;
the policy update rule expression of Actor is as follows:
In the formula, the centralized state-action value function of each agent is learned and updated independently; D = (x, x′, a_1, ..., a_N, r_1, ..., r_N) is the playback unit storing the experiences of all the agents, from which one group is randomly selected for training in each training round; for continuous actions, the MADDPG algorithm adopts the continuous deterministic policy set μ of the n agents; for 0-1 actions, random values are taken in the training stage;
Critic updates its parameters by minimizing the temporal-difference error; the loss function expression of Critic is:
In the formula, μ′ is the policy set of the target network and γ ∈ [0,1] is the discount factor;
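The temporal-difference target and the mean-squared Critic loss can be sketched as follows (plain scalars here, rather than the batched network outputs of an actual implementation; the function names are illustrative):

```python
def td_target(r, q_next, gamma=0.95):
    """One-step TD target y = r + gamma * Q'(x', a') used in the Critic
    loss; q_next is the target network's value for the next joint
    state-action, and gamma is the discount factor in [0, 1]."""
    assert 0.0 <= gamma <= 1.0
    return r + gamma * q_next

def critic_loss(q_values, targets):
    """Mean squared temporal-difference error minimized by the Critic."""
    n = len(q_values)
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / n
```

Computing the target with the slowly updated target network (rather than the evaluation network) is what keeps this regression objective stable.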
The target network periodically copies parameters from the evaluation network as follows:
θ′_i = (1 − τ)θ′_i + τθ_i
where θ′_i is the target network parameter of agent i, τ is the soft update coefficient, and τ ≤ 1;
Let the policy μ_i of agent i be a set of K sub-policies, with only one sub-policy μ_i^(k) used in each training round, so that the overall reward of the policy set is highest over the whole training process; the final policy update of the Actor is then:
in one possible embodiment, the step of the model solving module 5 adopting the madpg algorithm to solve the reinforcement learning model in the multi-agent environment specifically includes:
setting the optimized scheduling period T and the number of training rounds M of each agent, with the corresponding counters set to initial values of 1, and randomly setting the agent network parameters θ_i at the initial time;
loading the multi-region power grid collaborative optimization model into the environment, and setting the interface files for states and actions in the MADDPG algorithm, so that load flow calculation can be performed in real time according to the states and actions and the corresponding environment state quantities can be fed back;
each agent observing the state quantity of its region and taking an action, interacting with the environment, and using the fed-back state quantities to calculate the reward; the agent then making a state transition to the next moment, observing the state at the next moment, and storing (x, a, r, x′) in the experience playback unit D;
randomly sampling a group of samples (x^(k), a^(k), r^(k), x′^(k)) from the experience playback unit D, updating the Critic and Actor parameters, and updating the target network parameters;
and judging whether the current training round number m reaches the set value M; if so, ending the training and outputting and saving the result; otherwise, starting a new round of training.
Example 3
Another embodiment of the present invention further provides an electronic device, including:
a memory storing at least one instruction; and
a processor that executes the instructions stored in the memory to implement the multi-region power grid collaborative optimization method.
Example 4
Another embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the method for collaborative optimization of a multi-region power grid according to the present invention is implemented.
The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable storage medium may include: any entity or device capable of carrying said computer program code, media, usb disk, removable hard disk, magnetic diskette, optical disk, computer memory, read-only memory, random access memory, electrical carrier wave signals, telecommunication signals, software distribution media, etc. It should be noted that the computer-readable medium may contain suitable additions or subtractions depending on the requirements of legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer-readable media may not include electrical carrier signals or telecommunication signals in accordance with legislation and patent practice. For convenience of explanation, the above description only shows the relevant parts of the embodiments of the present invention, and the detailed technical details are not disclosed, please refer to the method parts of the embodiments of the present invention. The computer-readable storage medium is non-transitory, and may be stored in a storage device formed by various electronic devices, and is capable of implementing the execution process described in the method of the embodiment of the present invention.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.
Claims (18)
1. A multi-region grid collaborative optimization method is characterized by comprising the following steps:
collecting observation data of each intelligent agent region in a multi-region power grid to be optimized;
constructing a multi-region power grid collaborative optimization model containing renewable energy sources on the basis of the observation data;
designing the multi-region power grid collaborative optimization model containing renewable energy sources into a reinforcement learning model under a multi-agent environment according to a state space, an action space, an environment and a reward function;
and solving the reinforcement learning model under the multi-agent environment, and outputting a cooperative optimization result to perform cooperative optimization on the multi-region power grid.
2. The multi-region power grid collaborative optimization method according to claim 1, further comprising a step of dividing the power grid into different regions, wherein the different regions are set as different agents and each region agent is used as a decision center to realize collaborative optimization operation of the multi-region power grid.
3. The multi-region grid collaborative optimization method according to claim 1, wherein in the step of collecting observation data of each smart region in the multi-region grid to be optimized, the observation data of each smart region includes:
4. The multi-region grid collaborative optimization method according to claim 3, wherein in the step of constructing the multi-region grid collaborative optimization model including renewable energy, the following objective function is established:
wherein N is the number of divided regions, one term is the cost of the thermal power generating units in region n, and the other is the wind and solar curtailment penalty of the renewable energy units in region n; their calculation expressions are respectively as follows:
In the formula, M_k is the number of thermal power generating units in region n; T is the calculation duration; the status variable represents the operating state of thermal power generating unit i in period t, one value indicating that unit i is running and the other that it is shut down; c_o is the operating cost coefficient of the thermal power generating units; the output power of unit i in period t enters the cost term together with the operating time interval Δt; SU_i is the cost of starting unit i once; SD_i is the shutdown cost of unit i; and the remaining quantity is the shutdown time of unit i;
In the formula, M_n is the number of renewable energy units in region n; T is the calculation duration; c_g is the wind and solar curtailment penalty coefficient; and the remaining quantities are the actual output power of renewable energy unit j in period t and the upper limit of the output power of renewable energy unit j in period t.
5. The multi-region grid collaborative optimization method according to claim 4, wherein in the step of constructing the multi-region grid collaborative optimization model including renewable energy, equality constraints and inequality constraints are established for the objective function;
equality constraints including network power flow constraints and power balance constraints;
the expression of the network power flow constraint is as follows:
In the formula, P_G,i and Q_G,i are the active and reactive power injected at node i; P_D,i and Q_D,i are the active and reactive power consumed by the load at node i; N_e is the number of nodes of the power distribution network; U_i and U_j are the voltage amplitudes of nodes i and j; G_ij and B_ij are the conductance and susceptance, respectively; and θ_ij is the phase angle difference between nodes i and j;
the expression of the power balance constraint is as follows:
inequality constraints comprising branch current upper limit constraints, node voltage upper and lower limit constraints, thermal power unit output power upper and lower limit constraints, thermal power unit startup and shutdown time constraints, thermal power unit climbing constraints, thermal power unit rotation standby constraints and renewable energy unit output power upper and lower limit constraints;
the expression of the branch current upper limit constraint is as follows:
In the formula, I_ij is the amplitude of the current flowing on branch ij, and the other quantity is the maximum current amplitude allowed to flow on branch ij;
the expression of the upper and lower node voltage limit constraints is as follows:
wherein the two quantities respectively represent the upper limit and the lower limit of the voltage of node i;
the expression of the upper and lower limit constraints of the output power of the thermal power generating unit is as follows:
wherein the two quantities respectively represent the upper limit and the lower limit of the output power of thermal power generating unit i;
the thermal power generating unit on-off time constraint expression is as follows:
In the formula, the quantities are the start-up time, shutdown time, minimum start-up time, and minimum shutdown time of thermal power generating unit i;
the climbing constraint expression of the thermal power generating unit is as follows:
In the formula, U_t and D_n are the maximum upward and downward active power ramp rates of thermal power generating unit i per unit time;
the thermal power generating unit rotation standby constraint expression is as follows:
In the formula, the first quantity is the load reserve capacity of node r at time t, and R_w is the output prediction error of renewable energy unit j;
the expression of the upper and lower limits of the output power of the renewable energy source set is as follows:
6. The multi-region power grid collaborative optimization method according to claim 1, wherein, in the step of designing the multi-region power grid collaborative optimization model containing renewable energy into the reinforcement learning model in the multi-agent environment according to the state space, action space, environment and reward function, the state space variables comprise: the load data p_L of each agent region, the actual output power of renewable energy, the upper limit of the renewable energy output power, the actual output power of the thermal power generating units, the operating cost coefficient c_o of the thermal power generating units, the wind and solar curtailment cost coefficient c_g, the start-up cost coefficient SU_i, the shutdown cost coefficient SD_i, and the node voltage U_i; the expression of the state space is as follows:
The action space variables comprise: the output power of the thermal power generating units, the start-stop status of the thermal power generating units, and the output power of renewable energy. The expression of the action space is as follows:
in the multi-region power grid collaborative optimization model containing renewable energy, the equality and inequality constraints established for the objective function serve as the environment; after all agents take their actions at each moment, one power grid load flow calculation is performed, the relevant state quantities of the power grid are fed back to calculate the reward function, and the process then moves to the next moment, repeating in this way;
in the multi-region power grid collaborative optimization model containing renewable energy, the negative of the objective function is used as the instant reward of each agent; according to the equality and inequality constraints established for the objective function, if a corresponding variable does not satisfy a constraint, a penalty value r_push is set, which together with the instant reward forms the final reward function of the agent, whose expression is as follows:
R = {F + r_push}.
7. The multi-region power grid collaborative optimization method according to claim 1, wherein, in the step of solving the reinforcement learning model in the multi-agent environment, the MADDPG algorithm is adopted for solving; in the MADDPG algorithm, the Actor of agent i obtains its own relevant state information s_i; a_i is the action taken by agent i, r_i is its reward, and θ_i is the weight parameter of the agent; there are n agents, and the observation set x = (s_1, ..., s_N) is the state information of all agents; the Actor continuously updates its own parameter θ_i to maximize its own expected reward, i.e., to obtain a higher Critic evaluation;
the policy update rule expression of Actor is as follows:
In the formula, the centralized state-action value function of each agent is learned and updated independently; D = (x, x′, a_1, ..., a_N, r_1, ..., r_N) is the playback unit storing the experiences of all the agents, from which one group is randomly selected for training in each training round; for continuous actions, the MADDPG algorithm adopts the continuous deterministic policy set μ of the n agents; for 0-1 actions, random values are taken in the training stage;
Critic updates its parameters by minimizing the temporal-difference error; the loss function expression of Critic is:
In the formula, μ′ is the policy set of the target network and γ ∈ [0,1] is the discount factor;
the target network periodically copies parameters from the evaluation network as follows:
θ′_i = (1 − τ)θ′_i + τθ_i
where θ′_i is the target network parameter of agent i, τ is the soft update coefficient, and τ ≤ 1;
let the policy μ_i of agent i be a set of K sub-policies, with only one sub-policy μ_i^(k) used in each training round, so that the overall reward of the policy set is highest over the whole training process; the final policy update of the Actor is then:
8. the multi-region power grid collaborative optimization method according to claim 7, wherein the step of solving the reinforcement learning model in the multi-agent environment by using a MADDPG algorithm specifically comprises:
setting the optimized scheduling period T and the number of training rounds M of each agent, with the corresponding counters set to initial values of 1, and randomly setting the agent network parameters θ_i at the initial time;
loading the multi-region power grid collaborative optimization model into the environment, and setting the interface files for states and actions in the MADDPG algorithm, so that load flow calculation can be performed in real time according to the states and actions and the corresponding environment state quantities can be fed back;
each agent observing the state quantity of its region and taking an action, interacting with the environment, and using the fed-back state quantities to calculate the reward; the agent then making a state transition to the next moment, observing the state at the next moment, and storing (x, a, r, x′) in the experience playback unit D;
randomly sampling a group of samples (x^(k), a^(k), r^(k), x′^(k)) from the experience playback unit D, updating the Critic and Actor parameters, and updating the target network parameters;
and judging whether the current training round number m reaches the set value M; if so, ending the training and outputting and saving the result; otherwise, starting a new round of training.
9. A multi-region grid collaborative optimization system, comprising:
the observation data collection module is used for collecting the observation data of each intelligent agent region in the multi-region power grid to be optimized;
the collaborative optimization model building module is used for building a multi-region power grid collaborative optimization model containing renewable energy sources on the basis of the observation data;
the reinforcement learning model design module is used for designing the multi-region power grid collaborative optimization model containing the renewable energy into a reinforcement learning model under a multi-agent environment according to a state space, an action space, an environment and a reward function;
and the model solving module is used for solving the reinforcement learning model in the multi-agent environment and outputting a collaborative optimization result to carry out collaborative optimization on the multi-area power grid.
10. The multi-region power grid collaborative optimization system according to claim 9, further comprising an agent partitioning module configured to partition an electric grid agent;
when the power grid is partitioned into agents, the node standard system is divided into different regions, a different agent is set for each region, and the region agents are used as decision centers to realize collaborative optimization operation of the multi-region power grid.
11. The multi-region grid collaborative optimization system according to claim 9, wherein the observation data of each agent region collected by the observation data collection module includes:
the load data p_L, the actual output power of renewable energy, the actual output power of the thermal power generating units, the operating cost coefficient c_o of the thermal power generating units, the wind and solar curtailment cost coefficient c_g, the start-up cost coefficient SU_i, and the shutdown cost coefficient SD_i.
12. The multi-region grid collaborative optimization system according to claim 11, wherein the collaborative optimization model building module establishes the following objective function:
wherein N is the number of divided regions, the first term is the cost of the thermal power units in region n, and the second term is the wind and solar curtailment penalty of the renewable energy units in region n; the two terms are given respectively as follows:
in the formula, M_k is the number of thermal power units in region n; T is the calculation horizon; the operating state of thermal power unit i in period t equals 1 when unit i is running and 0 when it is shut down; c_o is the operating cost coefficient of the thermal power units; the output power of thermal power unit i in period t enters the running-cost term; Δt is the operating time interval; SU_i is the cost of one start-up of thermal power unit i; SD_i is the shutdown cost of thermal power unit i; the last quantity is the shutdown time of thermal power unit i;
wherein M_n is the number of renewable energy units in region n; T is the calculation horizon; c_g is the wind and solar curtailment penalty coefficient; the two remaining quantities are the actual output power of renewable energy unit j in period t and the upper limit of the output power of renewable energy unit j in period t.
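The two cost terms of claim 12 can be illustrated numerically. The following is a minimal sketch, not the patented implementation: all input data are hypothetical, and the function and variable names (`thermal_cost`, `curtailment_penalty`, `u`, `p`) are chosen here for illustration, with c_o, c_g, SU_i, SD_i as defined in the claim text.

```python
# Illustrative sketch of the claim-12 objective terms (hypothetical data).
#   c_o - thermal operating cost coefficient
#   c_g - wind/solar curtailment penalty coefficient
#   SU  - start-up cost, SD - shutdown cost (per unit)

def thermal_cost(u, p, c_o, SU, SD, dt=1.0):
    """Operating plus start-up/shutdown cost of one region's thermal units.
    u[i][t] in {0,1} is the on/off state, p[i][t] the output power."""
    cost = 0.0
    for ui, pi in zip(u, p):
        for t in range(len(ui)):
            cost += ui[t] * c_o * pi[t] * dt  # running cost over interval dt
            if t > 0 and ui[t] > ui[t - 1]:
                cost += SU                    # start-up event
            if t > 0 and ui[t] < ui[t - 1]:
                cost += SD                    # shutdown event
    return cost

def curtailment_penalty(p_actual, p_limit, c_g, dt=1.0):
    """Wind/solar curtailment penalty: c_g * (available - delivered)."""
    return sum(c_g * (pl - pa) * dt
               for unit_a, unit_l in zip(p_actual, p_limit)
               for pa, pl in zip(unit_a, unit_l))

# Hypothetical two-period example: one thermal unit, one renewable unit.
F_G = thermal_cost(u=[[0, 1]], p=[[0.0, 50.0]], c_o=0.3, SU=100.0, SD=80.0)
F_re = curtailment_penalty(p_actual=[[40.0, 45.0]], p_limit=[[50.0, 45.0]], c_g=2.0)
F = F_G + F_re  # one region's objective: thermal cost + curtailment penalty
```

The regional objective F would then be summed over the N regions, as in the claim.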
13. The multi-region grid collaborative optimization system according to claim 12, wherein the collaborative optimization model building module establishes equality constraints and inequality constraints on the objective function;
equality constraints, including the network power flow constraint and the power balance constraint;
the expression of the network flow constraint is as follows:
in the formula, P_G,i and Q_G,i are the active and reactive power injected at node i; P_D,i and Q_D,i are the active and reactive power consumed by the load at node i; N_e is the number of distribution network nodes; U_i and U_j are the voltage magnitudes at nodes i and j; G_ij and B_ij are the conductance and susceptance, respectively; θ_ij is the phase-angle difference between nodes i and j;
the expression of the power balance constraint is as follows:
inequality constraints, comprising the branch current upper limit constraint, the node voltage upper and lower limit constraints, the thermal power unit output power upper and lower limit constraints, the thermal power unit start-up and shutdown time constraints, the thermal power unit ramping constraint, the thermal power unit spinning reserve constraint, and the renewable energy unit output power upper and lower limit constraints;
the expression of the branch current upper limit constraint is as follows:
in the formula, I_ij is the magnitude of the current flowing on branch ij, bounded above by the maximum current magnitude allowed to flow on branch ij;
the expression of the upper and lower node voltage limit constraints is as follows:
wherein the two bounds are, respectively, the upper limit and the lower limit of the voltage at node i;
the expression of the upper and lower limit constraints of the output power of the thermal power generating unit is as follows:
wherein the two bounds represent, respectively, the upper limit and the lower limit of the output power of thermal power unit i;
the thermal power unit start-up and shutdown time constraint expression is as follows:
in the formula, the four quantities are the start-up time, shutdown time, minimum start-up time, and minimum shutdown time of thermal power unit i;
the thermal power unit ramping constraint expression is as follows:
in the formula, U_t and D_n are the maximum ramp-up and ramp-down rates of the active power of thermal power unit i per unit time;
the expression of the thermal power unit spinning reserve constraint is as follows:
in the formula, the first quantity is the reserve capacity for the load at node r at time t, and R_w is the output prediction error of renewable energy unit j;
the expression of the upper and lower limits of the output power of the renewable energy source set is as follows:
14. The multi-region power grid collaborative optimization system according to claim 9, wherein the reinforcement learning model design module is configured to design the renewable-energy-containing multi-region power grid collaborative optimization model as a reinforcement learning model in a multi-agent environment according to a state space, an action space, an environment, and a reward function, wherein the state space variables include, for each agent region: the load data p_L, the actual output power of the renewable energy units, the upper limit of the renewable energy output power, the actual output power of the thermal power units, the operating cost coefficient of the thermal power units c_o, the wind and solar curtailment cost coefficient c_g, the start-up cost coefficient SU_i, the shutdown cost coefficient SD_i, and the node voltage U_i; the expression of the state space is as follows:
the action space variables include: output power of thermal power generating unitStart-stop of thermal power generating unitRenewable energy output powerThe expression of the action space is as follows:
in the renewable-energy-containing multi-region power grid collaborative optimization model, the equality and inequality constraints established for the objective function are taken as the environment; after every agent takes its action at each time step, one power flow calculation is performed, the relevant grid state quantities are fed back to compute the reward function, and the system transitions to the next time step, and so on in a cycle;
in the renewable-energy-containing multi-region power grid collaborative optimization model, the negative of the objective function is used as the instantaneous reward of each agent; according to the equality and inequality constraints established for the objective function, if a corresponding variable fails to satisfy a constraint, a penalty value r_push is set and, together with the instantaneous reward, forms the agent's final reward function, with the following expression:
R={F+r push }。
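The reward construction of claim 14 (negative objective plus a violation penalty) can be sketched as follows; the penalty magnitude of -1000 and the function name are hypothetical choices for illustration, not values from the patent:

```python
# Sketch of the claim-14 reward: the negative of the objective value F
# plus a penalty r_push when any constraint is violated.
# The penalty magnitude (-1000.0) is a hypothetical choice.

def agent_reward(objective_value, constraints_satisfied, r_push=-1000.0):
    """Instant reward = -objective; add r_push on constraint violation."""
    reward = -objective_value
    if not constraints_satisfied:
        reward += r_push
    return reward

r_feasible = agent_reward(135.0, True)    # no penalty applied
r_violated = agent_reward(135.0, False)   # penalised step
```

Because the agents maximize reward, minimizing the original cost objective and avoiding constraint violations become the same learning target.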
15. The multi-region power grid collaborative optimization system according to claim 9, wherein the model solving module adopts the MADDPG algorithm to solve the reinforcement learning model in the multi-agent environment; in the algorithm, the Actor of agent i obtains its own relevant state information s_i, a_i is the action taken by agent i, r_i is its reward, and θ_i is the agent's weight parameter; with n agents, the observation set x = (s_1, ..., s_N) is the state information of all agents; the Actor continuously updates its own parameters θ_i to maximize the expectation of its own reward, i.e., to obtain a higher Critic evaluation value;
the policy update rule expression of Actor is as follows:
in the formula, the centralized state-action value function is learned and updated independently for each agent; D = (x, x′, a_1, ..., a_N, r_1, ..., r_N) is the replay buffer that stores the experiences of all agents, from which one batch is randomly selected for training in each training step; for continuous actions, the MADDPG algorithm adopts the continuous deterministic policy set μ of the n agents; for 0-1 actions, random values are taken in the training stage;
the Critic updates its parameters by minimizing the temporal-difference error; the loss function of the Critic is:
in the formula, μ′ is the policy set of the target network and γ ∈ [0, 1] is the discount factor;
the target network periodically copies the parameters from the evaluation network, with the following rules:
θ′_i = (1 − τ)θ′_i + τθ_i
in the formula, θ′_i is the target network parameter of agent i and τ is the soft update coefficient, with τ ≤ 1;
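The soft target-network update rule θ′_i = (1 − τ)θ′_i + τθ_i quoted above can be sketched directly; the list-of-floats representation of the parameters is a simplification for illustration:

```python
# Soft (Polyak) update of target-network parameters, mirroring the rule
# theta'_i = (1 - tau) * theta'_i + tau * theta_i from the claim text.
# Parameters are represented as plain lists of floats for simplicity.

def soft_update(target_params, online_params, tau=0.01):
    """Blend online parameters into the target copy; tau <= 1."""
    return [(1.0 - tau) * tp + tau * op
            for tp, op in zip(target_params, online_params)]

target = [0.0, 1.0]
online = [1.0, 0.0]
target = soft_update(target, online, tau=0.5)  # halfway blend
```

With a small τ the target network changes slowly, which is what stabilizes the Critic's temporal-difference target.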
let the policy μ_i of agent i consist of a set of K sub-policies, with only one sub-policy μ_i(k) used in each training episode; over the whole training process the overall reward of the policy set is maximized, and the final policy of the Actor is updated as follows:
16. The multi-region power grid collaborative optimization system according to claim 15, wherein the step of the model solving module using the MADDPG algorithm to solve the reinforcement learning model in the multi-agent environment specifically comprises:
setting the optimal scheduling horizon T and the number of training rounds M for each agent, initializing the time step and round counters to 1, and randomly initializing the network parameters θ_i of each agent at the initial time;
loading the multi-region power grid collaborative optimization model into the environment and setting up the state and action interface files for the MADDPG algorithm, so that power flow calculation can be performed in real time according to the states and actions and the corresponding environment state quantities are fed back;
each agent observes the state quantities of its own region and takes an action, interacts with the environment, computes the reward from the fed-back state quantities, performs the state transition to the next time step, observes the state at the next time step, and stores (x, a, r, x′) in the experience replay buffer D;
randomly sampling a mini-batch of k samples (x(k), a(k), r(k), x′(k)) from the experience replay buffer D, updating the Critic and Actor parameters, and updating the target network parameters;
and judging whether the current training round number m has reached the set value M; if so, ending the training and outputting and saving the result; if not, starting a new training round.
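The solving steps of claims 15 and 16 amount to a standard MADDPG-style training loop. The skeleton below mirrors only that control flow; the environment, agents, and their update steps are stand-in stubs with hypothetical names, not the patented implementation:

```python
import random

# Skeleton of the MADDPG training loop described in claims 15-16.
# StubEnv / StubAgent are placeholders: only the loop structure
# (M rounds, T periods, replay buffer, batch sampling) is shown.

def train(env, agents, T=4, M=3, batch_size=2):
    replay = []  # experience replay buffer D
    for episode in range(1, M + 1):              # rounds m = 1 .. M
        x = env.reset()
        for t in range(T):                       # scheduling periods
            a = [ag.act(s) for ag, s in zip(agents, x)]
            x_next, r = env.step(a)              # power flow + reward feedback
            replay.append((x, a, r, x_next))     # store (x, a, r, x')
            if len(replay) >= batch_size:
                batch = random.sample(replay, batch_size)
                for ag in agents:
                    ag.update(batch)             # critic/actor + target updates
            x = x_next
    return replay

class StubAgent:
    def act(self, s): return 0.0
    def update(self, batch): pass

class StubEnv:
    def reset(self): return [0.0, 0.0]
    def step(self, a): return [0.0, 0.0], [0.0, 0.0]

buffer = train(StubEnv(), [StubAgent(), StubAgent()])
```

In a real instantiation, `env.step` would run the power flow calculation against the constraint environment of claim 13 and return the rewards of claim 14.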
17. An electronic device, comprising:
a memory storing at least one instruction; and
a processor executing instructions stored in the memory to implement the multi-region grid collaborative optimization method of any of claims 1 to 8.
18. A computer-readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the multi-region grid collaborative optimization method according to any of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211109903.4A CN115333111A (en) | 2022-09-13 | 2022-09-13 | Multi-region power grid collaborative optimization method, system, equipment and readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115333111A true CN115333111A (en) | 2022-11-11 |
Family
ID=83929270
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211109903.4A Pending CN115333111A (en) | 2022-09-13 | 2022-09-13 | Multi-region power grid collaborative optimization method, system, equipment and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115333111A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN116436029A (en) * | 2023-03-13 | 2023-07-14 | 华北电力大学 | New energy station frequency control method based on deep reinforcement learning
CN116436029B (en) * | 2023-03-13 | 2023-12-01 | 华北电力大学 | New energy station frequency control method based on deep reinforcement learning
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112615379B (en) | Power grid multi-section power control method based on distributed multi-agent reinforcement learning | |
CN114725936B (en) | Power distribution network optimization method based on multi-agent deep reinforcement learning | |
CN114091879A (en) | Multi-park energy scheduling method and system based on deep reinforcement learning | |
Zhang et al. | MOEA/D-based probabilistic PBI approach for risk-based optimal operation of hybrid energy system with intermittent power uncertainty | |
Xi et al. | A virtual generation ecosystem control strategy for automatic generation control of interconnected microgrids | |
CN114243797A (en) | Distributed power supply optimal scheduling method, system, equipment and storage medium | |
CN117057553A (en) | Deep reinforcement learning-based household energy demand response optimization method and system | |
CN114123273A (en) | Control method and system of wind power-photovoltaic-energy storage combined system | |
Dong et al. | Optimal scheduling framework of electricity-gas-heat integrated energy system based on asynchronous advantage actor-critic algorithm | |
Ebell et al. | Reinforcement learning control algorithm for a pv-battery-system providing frequency containment reserve power | |
CN115333111A (en) | Multi-region power grid collaborative optimization method, system, equipment and readable storage medium | |
CN115795992A (en) | Park energy Internet online scheduling method based on virtual deduction of operation situation | |
CN117039981A (en) | Large-scale power grid optimal scheduling method, device and storage medium for new energy | |
CN115907232A (en) | Regional comprehensive energy system cluster collaborative optimization method, system, equipment and medium | |
CN115313520A (en) | Distributed energy system game optimization scheduling method, system, equipment and medium | |
Nourianfar et al. | Economic emission dispatch considering electric vehicles and wind power using enhanced multi-objective exchange market algorithm | |
CN117439184A (en) | Wind power station control method and system based on reinforcement learning | |
CN115133540B (en) | Model-free real-time voltage control method for power distribution network | |
CN114285093B (en) | Source network charge storage interactive scheduling method and system | |
CN116544995A (en) | Cloud edge cooperation-based energy storage battery consistency charge and discharge control method and system | |
CN115579910A (en) | Micro-grid frequency and voltage control method and terminal | |
CN114048576B (en) | Intelligent control method for energy storage system for stabilizing power transmission section tide of power grid | |
CN115860180A (en) | Power grid multi-time scale economic dispatching method based on consistency reinforcement learning algorithm | |
CN115115276A (en) | Virtual power plant scheduling method and system considering uncertainty and privacy protection | |
CN116451880B (en) | Distributed energy optimization scheduling method and device based on hybrid learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||