CN115860180A

CN115860180A - Power grid multi-time scale economic dispatching method based on consistency reinforcement learning algorithm

Info

Publication number: CN115860180A
Application number: CN202211370283.XA
Authority: CN
Inventors: 郭笑岩; 盛潜; 卢坤; 傅荣荣; 李成睿; 汪涌泉; 刘家欣; 孔令稷
Original assignee: Weihai Power Supply Co of State Grid Shandong Electric Power Co Ltd
Current assignee: Weihai Power Supply Co of State Grid Shandong Electric Power Co Ltd
Priority date: 2022-11-03
Filing date: 2022-11-03
Publication date: 2023-03-28

Abstract

The invention relates to a power grid multi-time scale economic dispatching method based on a consistency reinforcement learning algorithm, which comprises the following steps: step 1, acquiring network topology data of a smart grid, and establishing a joint economic dispatch model based on unit combination and load distribution; step 2, preliminarily solving the model with a reinforcement learning algorithm to obtain the rough output and the accurate start-stop state of each adjustable unit in the smart grid at time t; step 3, with reference to the accurate start-stop state, optimizing the rough output from step 2 with a fully distributed consistency algorithm to obtain the accurate output of each adjustable unit at time t, completing the preliminary optimal scheduling of each adjustable unit; and step 4, optimally scheduling the flexible loads among the adjustable units with a multi-time scale scheduling strategy, according to the predicted wind power and the marginal cost of the flexible loads, so as to eliminate the uncertainty of wind power output. The method achieves economic dispatch of the power grid and guarantees its stable operation.

Description

Power grid multi-time scale economic dispatching method based on consistency reinforcement learning algorithm

Technical Field

The invention relates to the technical field of economic dispatching of smart power grids, in particular to a power grid multi-time scale economic dispatching method based on a consistency reinforcement learning algorithm.

Background

Renewable energy is developing rapidly, and the smart grid containing large-scale distributed generation (DG) has attracted the attention of many researchers as a new mode of energy supply. With the large-scale adoption of renewable energy, the uncertainty of distributed generation affects system economy, and the stability, economy and safety of the smart grid must be ensured by dispatchable equipment. Coordinating the source-network-load-storage links of the smart grid, completing economic dispatch under huge data volumes and complex operating conditions, and achieving economic and stable operation of the whole system have therefore become a research hotspot in the academic community.

Optimal scheduling problems in the smart grid are usually solved with a centralized algorithm: the operating data of all dispatchable devices are uploaded to a central dispatching center for centralized calculation, processing and uniform release. However, as the number of DGs connected to the grid increases year by year, the growing number and variety of users and the demand for plug-and-play operation make centralized algorithms ineffective for complex smart grid scheduling problems. Distributed algorithms suitable for large-scale decentralized computing are therefore favored by most scholars. The document 'two-layer power optimized distribution of a multi-energy local area network facing to an energy internet' (Miyang, liu hong industry, song dynasty, lizhang war, fuyang, lizhuang Kun. Electric power automation equipment, 2018,38 (07): 1-10.) proposes an economic dispatching method based on an improved consistency theory for the energy internet, which optimizes the output of the units in a multi-energy local area network. The document "active power distribution network source-load-storage distributed coordination optimization operation (II): a consistency algorithm considering a non-ideal telemetry environment" (Xixi lin, song Yi group, yao well-faithful, strict. Chinese Motor engineering report, 2018,38 (11): 3244-3254.) robustly optimizes the scheduling plan of each adjustable unit in an active power distribution network in a fully distributed manner through an improved consistency strategy. However, the unit combination sequence changes dynamically with the supply-demand balance of the system, so the economic dispatch problem of the smart grid cannot be solved effectively by considering only the output distribution of each unit. In addition, the influence of wind power output uncertainty cannot be neglected.

With the rise of artificial intelligence, research on reinforcement learning (RL) has deepened, and its advantages in solving power system optimization problems have been recognized by more and more scholars. The document "micro-grid composite energy storage coordination control method based on deep reinforcement learning" (zhuan-san-shi, qiuliming, zhuoxia, xushunwei, haxing-grid technology, 2019,43 (06): 1914-1921) constructs an islanded integrated energy system and realizes optimal scheduling with an improved reinforcement learning method. Although such research can use RL to obtain the Pareto optimal solution set of a multi-objective optimization, the reinforcement learning algorithm loses its advantages when it has to handle continuous variables and the plug-and-play characteristic of DG.

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides a power grid multi-time scale economic dispatching method based on a consistency reinforcement learning algorithm. The method solves the uncertainty problem of unit combination with a reinforcement learning algorithm, optimizes the economic output of each unit with a faster consistency algorithm to obtain the optimal economic power distribution, and gradually eliminates the uncertainty of wind power output with a multi-time scale scheduling strategy, so as to ensure the overall stable operation of the smart grid.

In order to achieve the above purpose, the present application provides a power grid multi-time scale economic dispatching method based on a consistency reinforcement learning algorithm, including the following steps:

step 1, acquiring network topology data of the smart grid, and establishing a joint economic dispatch model based on unit combination and load distribution;

step 2, preliminarily solving the joint economic dispatch model with a reinforcement learning algorithm to obtain the rough output and the accurate start-stop state $I_{i,t}$ of each adjustable unit i in the smart grid at time t;

step 3, with reference to the accurate start-stop state $I_{i,t}$ from step 2, optimizing the rough output from step 2 with a fully distributed consistency algorithm to obtain the accurate output $P_{i,t}$ of each adjustable unit i at time t, thereby completing the preliminary optimal scheduling of each adjustable unit;

step 4, according to the predicted wind power and the marginal cost of the flexible loads, further optimally scheduling the flexible loads among the adjustable units on three time scales of 24 h, 1 h and 15 min with a multi-time scale scheduling strategy, and eliminating the uncertainty of wind power output, so as to realize the economic dispatch of the power grid.

In some embodiments, in step 1, the expression of the joint economic dispatch model based on unit combination and load distribution is:

wherein N is the total number of adjustable units, the adjustable units comprising thermal generators, wind turbines and flexible loads; T is the total scheduling time; $I_{i,t}$ indicates whether adjustable unit i is running or stopped at time t; $C_{i,D}(t)$ is the shutdown cost of the i-th adjustable unit at time t; $C_{i,U}(t)$ is the start-up cost of the i-th adjustable unit at time t; $C_i(P_{i,t})$ is the cost function of the i-th adjustable unit at power $P_{i,t}$ at time t, the power being the generated power for a generator and the absorbed power for a flexible load;

wherein $\alpha_i$, $\beta_i$ and $\gamma_i$ are the cost coefficients of adjustable unit i;
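The expressions of the model did not survive in the text above. One plausible reading, offered only as a sketch consistent with the symbol definitions and with the statement in the detailed description that the running cost is typically a convex quadratic function, is given below. The arrangement of the start-up and shutdown terms and the assignment of $\alpha_i$ to the quadratic term are assumptions, not the original formulas.

```latex
\min F = \sum_{t=1}^{T} \sum_{i=1}^{N}
  \Big[ C_i(P_{i,t})\, I_{i,t}
      + C_{i,U}(t)\, I_{i,t}\,(1 - I_{i,t-1})
      + C_{i,D}(t)\, I_{i,t-1}\,(1 - I_{i,t}) \Big]

C_i(P_{i,t}) = \alpha_i P_{i,t}^{2} + \beta_i P_{i,t} + \gamma_i
```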

all adjustable units i = 1, 2, ..., N in the joint economic dispatch model must satisfy the following constraints: the power distribution constraint, the adjustable unit capacity constraint, the unit minimum continuous operation/shutdown time constraint, and the unit ramping (climbing) characteristic constraint;

wherein the power allocation constraint is expressed as:

wherein $P_{loss}$ is the power loss (transmission loss generally accounts for about 3% to 7% of the total load), and D is the non-adjustable rigid power, comprising rigid load and wind power;
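The power distribution constraint itself is missing from the text. A conventional power balance that matches the symbols defined here would read as follows; this is an assumed form, and the sign convention for flexible loads (which absorb rather than generate power) is not confirmed by the source.

```latex
\sum_{i=1}^{N} P_{i,t}\, I_{i,t} = D + P_{loss}
```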

the capacity constraint of the adjustable unit is expressed as:

$P_i^{\min} \le P_i \le P_i^{\max}$ (4)

wherein $P_i^{\min}$ and $P_i^{\max}$ are respectively the minimum and maximum limits of the output power of adjustable unit i;

the constraint on the shortest continuous operation/downtime of the unit is expressed as:

wherein $T_{i,U}$ and $T_{i,D}$ are respectively the minimum continuous operation time and the minimum continuous shutdown time of adjustable unit i; $X_{i,ON}(t-1)$ is the total time for which adjustable unit i has been running continuously up to time t-1, and $X_{i,OFF}(t-1)$ is the total time for which it has been continuously stopped up to time t-1; $I_{i,t-1}$ indicates whether adjustable unit i is running or stopped at time t-1;

the unit ramping (climbing) characteristic constraint is expressed as:

$-R_{i,D} \le (P_{i,t} - P_{i,t-1})\, I_{i,t}\, I_{i,t-1} \le R_{i,U}$ (6)

wherein $R_{i,U}$ and $R_{i,D}$ are respectively the upper and lower limits of the ramping constraint of the i-th adjustable unit; $P_{i,t-1}$ is the accurate output of adjustable unit i at time t-1.
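The two constraints that are written out explicitly, (4) and (6), can be checked with a few lines of code. The sketch below is illustrative only: the `Unit` container, its field names and the numeric values in the usage note are hypothetical, and only the two inequalities come from the text.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """Hypothetical container for the limits of one adjustable unit i."""
    p_min: float      # P_i^min
    p_max: float      # P_i^max
    ramp_up: float    # R_{i,U}
    ramp_down: float  # R_{i,D}


def capacity_ok(unit: Unit, p: float) -> bool:
    """Constraint (4): P_i^min <= P_i <= P_i^max."""
    return unit.p_min <= p <= unit.p_max


def ramp_ok(unit: Unit, p_now: float, p_prev: float,
            on_now: int, on_prev: int) -> bool:
    """Constraint (6): -R_D <= (P_t - P_{t-1}) * I_t * I_{t-1} <= R_U.

    The product I_t * I_{t-1} switches the limit off in any period
    where the unit starts up or shuts down.
    """
    delta = (p_now - p_prev) * on_now * on_prev
    return -unit.ramp_down <= delta <= unit.ramp_up
```

For example, with `Unit(p_min=20, p_max=100, ramp_up=30, ramp_down=25)`, a step from 60 to 80 between two running periods satisfies (6), a step from 60 to 100 does not, and a start-up from 0 to 100 is not limited by (6) because $I_{i,t-1} = 0$.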

In some embodiments, the step 2 specifically includes the following steps:

step 201: taking the accurate output $P_{i,t}$ and the accurate start-stop state $I_{i,t}$ of each adjustable unit i in the smart grid at time t as the elements of the state set $S_{I,t}$ and the action set $A_{I,t}$ of the adjustable unit set I, and establishing an initial state-action set function $Q(S_{I,t}, A_{I,t})$, wherein $Q(S_{I,t}, A_{I,t})$ is the set of the state-action value functions $Q(S_{i,t}, A_{i,t})$ at time t of the set I of adjustable units i represented by all agents;

step 202: selecting the action value $a_{I,t+1}$ of the adjustable unit set I at the next time t+1 with a greedy algorithm, and forming the action set $A_{I,t+1}$ at the next time t+1;

step 203: accepting or rejecting the action set $A_{I,t+1}$ at the next time t+1 according to the reward value r given by the reward and penalty function rew, and selecting the optimal action value;

step 204: updating the state-action set function $Q(S_{I,t+1}, A_{I,t+1})$ at the next time t+1 according to the action value $a_{I,t}$ of the adjustable unit set I at the current time t and the optimal action value $a_{I,t+1}$ at the next time t+1, then returning to step 202 and repeating steps 202 to 204;

step 205: when the current state-action value function $Q(S_{i,t}, A_{i,t})$ has been cumulatively updated to a preset degree, determining the optimal action value with the highest value function to obtain the optimal unit scheduling policy $\pi^*$, that is, the accurate start-stop state $I_{i,t}$ and the rough output of each adjustable unit i in the smart grid at time t.
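Steps 201 to 205 can be illustrated with a minimal tabular Q-learning sketch. Everything numeric here is hypothetical (two units, three periods, the capacities, running costs, load profile and penalty). The reward is a simplified stand-in for the reward function of the method: it only charges running cost plus a penalty when the committed capacity cannot cover the load. The update is the standard Q-learning rule with ε-greedy selection, which is assumed, not confirmed, to correspond to the update of step 204.

```python
import random

ACTIONS = [(0, 0), (1, 0), (0, 1), (1, 1)]  # start-stop states I_{i,t} of two units
CAPACITY = (100.0, 50.0)    # hypothetical maximum outputs
RUN_COST = (10.0, 8.0)      # hypothetical running cost per period
DEMAND = (60.0, 120.0, 60.0)  # hypothetical load per period
PENALTY = 100.0             # charged when committed capacity < load


def reward(t, action):
    """Simplified reward: negative running cost, minus a shortage penalty."""
    cost = sum(c * on for c, on in zip(RUN_COST, action))
    capacity = sum(p * on for p, on in zip(CAPACITY, action))
    return -cost - (PENALTY if capacity < DEMAND[t] else 0.0)


def train(episodes=3000, eta=0.5, tau=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; eta is the learning rate, tau the discount."""
    rng = random.Random(seed)
    horizon = len(DEMAND)
    # One row per period plus a terminal row that stays at zero.
    q = [[0.0] * len(ACTIONS) for _ in range(horizon + 1)]
    for _ in range(episodes):
        for t in range(horizon):
            if rng.random() < epsilon:   # explore
                a = rng.randrange(len(ACTIONS))
            else:                        # exploit the current best action
                a = max(range(len(ACTIONS)), key=lambda j: q[t][j])
            target = reward(t, ACTIONS[a]) + tau * max(q[t + 1])
            q[t][a] += eta * (target - q[t][a])
    return q


def greedy_policy(q):
    """Step 205: pick the action with the highest value in each period."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda j: row[j])]
            for row in q[:-1]]


q_table = train()
policy = greedy_policy(q_table)
```

In this toy setting the learned policy commits only the large unit in the two low-load periods and both units in the peak period. A real implementation would also carry the previous start-stop state in the state variable so that start-up, shutdown and minimum up/down time terms can be rewarded.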

In some embodiments, in step 201, the accurate output $P_{i,t}$ and the accurate start-stop state $I_{i,t}$ of each adjustable unit i in the smart grid at time t are taken as the elements of the state set $S_{I,t}$ and the action set $A_{I,t}$ of the adjustable unit set I, specifically expressed as follows:

in said step 203, according to the constraints of the joint economic dispatch model, the reward function $rew_{i,t}$ obtained after the agent of adjustable unit i executes an action at time t is defined as:

$rew_{i,t} = r_1 + r_2 + r_3 + r_4$ (8)

wherein $r_1$ is the power balance deviation reward term, $r_2$ is the ramping constraint reward term, and $r_3$ and $r_4$ are the reward terms for the minimum continuous operation and shutdown time constraints; $\Delta p_1$ and $\Delta p_2$ are deviation thresholds, and $r_1$ is selected according to the degree of deviation between the sum of the collected powers and the total load $L_{all}$, thereby realizing the rough adjustment of the power balance;

in said step 204, the iterative update formula of the state-action set function is as follows:

where the subscript I denotes the set of adjustable units i represented by all agents; η is the learning rate when the agent of adjustable unit i takes action $A_{i,t}$ in state $S_{i,t}$ at time t; τ is the discount coefficient; and $rew_{I,t}$ is the set of reward variables generated by the decision actions of the set I of adjustable units i represented by all agents at the current time t;
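The update formula itself is missing from the text. With a learning rate η, a discount coefficient τ and a reward $rew_{I,t}$, the standard Q-learning update would take the form below; it is shown as an assumed reconstruction, and the original may differ in detail.

```latex
Q(S_{I,t}, A_{I,t}) \leftarrow Q(S_{I,t}, A_{I,t})
  + \eta \Big[ rew_{I,t} + \tau \max_{A} Q(S_{I,t+1}, A) - Q(S_{I,t}, A_{I,t}) \Big]
```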

in step 205, when the state-action value function has been cumulatively updated to a preset degree, the optimal action value with the highest value function can be determined, which gives the optimal unit scheduling policy $\pi^*(s_{i,t})$; the specific formula is as follows:
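The formula is not preserved in the text. Selecting the action with the highest value is conventionally written as the following greedy policy, given here as an assumed form:

```latex
\pi^{*}(s_{i,t}) = \arg\max_{a} Q(s_{i,t}, a)
```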

In some embodiments, the step 3 specifically includes the following steps:

step 301: initializing the state and the output of each adjustable unit according to the accurate start-stop state $I_{i,t}$ and the rough output obtained in step 205;

step 302: iterating on the output of each adjustable unit with a consistency dual-order gradient descent estimation algorithm and updating the consistency variable of each adjustable unit; when all consistency variables converge to the same value, the result is the planned optimal solution of the output of each adjustable unit.
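The idea behind steps 301 and 302 can be sketched with a generic consensus-based dispatch iteration from the literature, in which every unit exchanges its incremental cost and a local estimate of the power mismatch with its neighbours only. This is not the dual-order gradient estimation variant of the method: it is the plain first-order scheme, and the cost coefficients, weight matrix, demand and the initial split of the mismatch are all assumptions. The adjustment coefficient 0.01 and the use of the transposed weights for the mismatch term follow the values stated in the text; capacity limits are ignored.

```python
def consensus_dispatch(alpha, beta, p0, demand, w, delta=0.01, iters=3000):
    """Consensus on incremental cost with a mismatch-tracking term.

    alpha, beta : quadratic and linear cost coefficients (assumed form
                  C_i = alpha_i * P^2 + beta_i * P + gamma_i)
    p0          : rough outputs used as the initial values P_i(0)
    w           : row-stochastic weight matrix of the communication graph
    """
    n = len(alpha)
    p = list(p0)
    # Incremental cost dC_i/dP_i serves as the consistency variable.
    lam = [2 * alpha[i] * p[i] + beta[i] for i in range(n)]
    # Local mismatch estimates; their sum equals demand - sum(p0).
    y = [demand / n - p[i] for i in range(n)]
    for _ in range(iters):
        lam_new = [sum(w[i][j] * lam[j] for j in range(n)) + delta * y[i]
                   for i in range(n)]
        p_new = [(lam_new[i] - beta[i]) / (2 * alpha[i]) for i in range(n)]
        # The mismatch is mixed with the transposed weights v_ij = w_ji,
        # which keeps sum(y) + sum(p) equal to the demand at every step.
        y = [sum(w[j][i] * y[j] for j in range(n)) - (p_new[i] - p[i])
             for i in range(n)]
        lam, p = lam_new, p_new
    return lam, p


# Hypothetical three-unit line network 0 - 1 - 2 with Metropolis weights.
W = [[2 / 3, 1 / 3, 0.0],
     [1 / 3, 1 / 3, 1 / 3],
     [0.0, 1 / 3, 2 / 3]]
ALPHA = [0.05, 0.04, 0.06]
BETA = [2.0, 3.0, 2.5]
lam, p = consensus_dispatch(ALPHA, BETA, p0=[90.0, 100.0, 100.0],
                            demand=300.0, w=W)
```

At convergence all entries of `lam` agree (the equal incremental cost condition) and the outputs in `p` sum to the demand, even though the initial rough outputs did not balance it.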

In some embodiments, in the step 301, when the initial values are set, the initial variable $P_i(0)$ is made to satisfy the initial constraint formula so that the power deviation of the iteration result approaches 0; a consistency matrix W is constructed, with a Laplace matrix used in place of the consistency matrix W; the initial constraint formula and the consistency matrix W are constructed as shown in formulas (16) and (17):

wherein $P_i(0)$ is the rough output obtained in step 205, used as the initial variable in this step; the deviation adjustment term of adjustable unit i is initialized for iteration 0; D is the non-adjustable rigid power; and L(G) is the Laplace matrix of the graph G, whose 0-1 matrix has zero diagonal elements and off-diagonal elements $d_{ij}$.

In some embodiments, in said step 302, the iterative update formulas of the consistency dual-order gradient descent estimation algorithm are as follows:

wherein $\mu_i(k)$ is the incremental cost (incremental consumption rate) of adjustable unit i at the k-th iteration, which also serves as the consistency variable; the deviation adjustment term of adjustable unit i is likewise updated at the k-th iteration; δ is the adjustment coefficient, set to 0.01; $w_{ij}$ is an element of the matrix W, and $v_{ij}$ is an element of the transpose of W; $P_i(k)$ is the accurate power of adjustable unit i after the k-th iteration; during the iteration, formula (20) determines the convergence direction of the consistency variable, so that the optimization result continuously approaches the optimal solution satisfying the power balance constraint;

the optimization is realized by updating the exponential moving averages of the first-order and second-order gradients, with the hyperparameters $\chi_1$ and $\chi_2$ controlling the exponential decay rates; the specific iterative formulas of the biased first-order gradient descent estimate m(k) and second-order gradient descent estimate v(k) at the k-th iteration are as follows:

wherein the first gradient term is the gradient of the cost function C with respect to the power P after the (k-1)-th iteration, and the second is the second-order gradient of the cost function C with respect to the power P after the (k-1)-th iteration; the specific iterative formulas of the bias-corrected first-order gradient descent estimate m'(k) and second-order gradient descent estimate v'(k) are as follows:

wherein $\chi_1^k$ and $\chi_2^k$ denote the hyperparameters $\chi_1$ and $\chi_2$ raised to the k-th power; the core update iteration formula of the proposed fully distributed consistency algorithm is expressed as follows:

where ε is the coefficient used for fine tuning; $w'_{ij}$ is the element in row i and column j of the transformation matrix W', which satisfies $\mathbf{1}^T W' = \mathbf{0}^T$ and $W'\mathbf{1} = \mathbf{0}$; $X_i$ is the index set of the variables adjacent to variable i, and $x_i$ is the weight of consistency variable i. By distributing formula (19) and formula (23), each variable $\mu_i$ and $P_i$ can still reach the global optimal solution through the algorithm while receiving only the parameter information of its adjacent variables $\mu_j$ and $P_j$.
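The description of exponential moving averages with decay rates $\chi_1$ and $\chi_2$ and a bias correction by $1 - \chi^k$ matches the moment estimates of the Adam optimizer. The sketch below shows those estimates under that assumption. Note two points where it may depart from the method: it uses the squared first-order gradient for v(k), whereas the text speaks of a second-order gradient, and the default values 0.9 and 0.999 are common Adam defaults, not values taken from the text.

```python
def moment_estimates(grads, chi1=0.9, chi2=0.999):
    """Return bias-corrected (m'(k), v'(k)) for a sequence of gradients.

    m(k) = chi1 * m(k-1) + (1 - chi1) * g(k)       biased first moment
    v(k) = chi2 * v(k-1) + (1 - chi2) * g(k)**2    biased second moment
    m'(k) = m(k) / (1 - chi1**k),  v'(k) = v(k) / (1 - chi2**k)
    """
    m = 0.0
    v = 0.0
    corrected = []
    for k, g in enumerate(grads, start=1):
        m = chi1 * m + (1 - chi1) * g
        v = chi2 * v + (1 - chi2) * g * g
        corrected.append((m / (1 - chi1 ** k), v / (1 - chi2 ** k)))
    return corrected
```

The bias correction matters in the first iterations: for a constant gradient g the uncorrected m(1) is only $(1-\chi_1) g$, while the corrected estimate equals g from the first step.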

In some embodiments, the step 4 specifically includes the following steps:

step 401: building a prediction model of the wind turbine power $P_{iw}$ that takes uncertainty into account, and constructing a marginal cost function for the flexible loads, namely incentive-based loads and interruptible loads;

step 402: optimally scheduling each flexible load resource on three time scales, namely the day-ahead 24 h scheduling stage, the intraday 1 h scheduling stage and the intraday 15 min scheduling stage, so as to reduce the unbalanced power and absorb the uncertainty of wind power output to the greatest extent.

In some embodiments, in said step 401, the power $P_{iw}(t)$ of the i-th wind turbine at time t, taking uncertainty into account, is first assumed to approximately obey the normal distribution $N(P_{wf}(t), \sigma^2)$, where the mean $P_{wf}(t)$ is the predicted active power output of the wind turbine at time t, and the variance $\sigma^2$ characterizes the error level of the power prediction and changes with the prediction time scale; the active output of the wind turbine can be expressed as:

wherein x is the actual value substituted;
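The normal model of the wind power can be sampled directly to generate scenarios for each scheduling stage. In the sketch below the standard deviations per time scale are hypothetical numbers chosen only to reflect that the forecast error shrinks as the horizon shortens; samples are not clipped to the physical range of the turbine.

```python
import random

# Hypothetical forecast-error standard deviations (MW) per time scale.
SIGMA_BY_SCALE = {"24h": 8.0, "1h": 4.0, "15min": 1.5}


def sample_wind_power(p_forecast, scale, rng):
    """Draw one realisation of P_iw(t) ~ N(P_wf(t), sigma^2)."""
    return rng.gauss(p_forecast, SIGMA_BY_SCALE[scale])


rng = random.Random(1)
scenarios = [sample_wind_power(50.0, "15min", rng) for _ in range(20000)]
mean_power = sum(scenarios) / len(scenarios)
```

With a forecast of 50 MW the sample mean stays close to the forecast, and the spread of the scenarios is the quantity the shorter time scales are meant to reduce.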

for flexible loads, users decide whether to trade according to the current market price, and the marginal cost is given by the following formula:

$\mu_{h1} = -\theta_{h0}(1 + 2\Delta P_{Dh})/\varphi_h P_{Dh0} + a_{h2}\theta_{h0}$ (26)

wherein $\varphi_h$ is the self-elasticity coefficient of the h-th flexible load, $\Delta P_{Dh}$ is the adjustment difference of the h-th flexible load power, $P_{Dh0}$ is the initial power of the h-th flexible load, $\theta_{h0}$ is the initial electricity price of the h-th flexible load, and $a_{h2}$ is the electricity price conversion rate of the h-th flexible load; the scheduling cost of the flexible load is given by the following formula:

$C_{Lh1}(\Delta P_{Dh}) = \mu_{h1} \cdot \Delta P_{Dh}$ (27)

wherein $\Delta P_{Dh}$ is the adjustment difference of the h-th flexible load power, and $C_{Lh1}(\Delta P_{Dh})$ is the flexible load scheduling cost at that adjustment difference.
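Formulas (26) and (27) translate into two short functions. The sketch assumes that the division in (26) applies to the product $\varphi_h P_{Dh0}$, which is one reading of the printed formula; the function and parameter names and the numbers in the usage note are hypothetical.

```python
def marginal_cost(delta_p, phi, p0, theta0, a2):
    """Formula (26), assuming the denominator is phi * p0.

    delta_p : adjustment difference of the flexible load power
    phi     : self-elasticity coefficient of the load
    p0      : initial power of the load
    theta0  : initial electricity price
    a2      : electricity price conversion rate
    """
    return -theta0 * (1 + 2 * delta_p) / (phi * p0) + a2 * theta0


def scheduling_cost(delta_p, phi, p0, theta0, a2):
    """Formula (27): marginal cost multiplied by the adjustment difference."""
    return marginal_cost(delta_p, phi, p0, theta0, a2) * delta_p
```

For instance, with `delta_p=10`, `phi=-0.2`, `p0=100`, `theta0=0.5` and `a2=1.0`, the marginal cost is 1.025 and the scheduling cost is 10.25 in the corresponding price units.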

In some embodiments, in said step 402, the scheduling strategy is divided into three time scales: the day-ahead 24 h scheduling stage, the intraday 1 h scheduling stage and the intraday 15 min scheduling stage. Each adjustable unit and each incentive-based and interruptible load resource participates in the optimal scheduling on these three time scales:

1. Day-ahead 24 h scheduling stage: executed once every 24 hours; the 24 hours are divided into 96 periods, and a 96-period scheduling plan is made for the unit combination of the next day;

2. Intraday 1 h scheduling stage: executed once every hour; the hour is divided into 4 periods, and a 4-period scheduling plan is made for the power distribution of the adjustable units in the coming hour;

3. Intraday 15 min scheduling stage: executed once every 15 minutes; a final distribution plan is made for the flexible load output in the next 15 minutes.
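The nesting of the three stages can be made concrete by listing which stage runs at the start of each 15-minute period of a day. The sketch below is a simplified timeline: stage names are hypothetical labels, and the day-ahead stage is shown at the first period although in practice it is executed before the day begins.

```python
def schedule_triggers(day_minutes=24 * 60, slot_minutes=15):
    """Return (start_minute, stages) for every scheduling period of a day."""
    slots_per_day = day_minutes // slot_minutes   # 96 periods
    slots_per_hour = 60 // slot_minutes           # 4 periods per hour
    plan = []
    for slot in range(slots_per_day):
        stages = []
        if slot == 0:
            stages.append("day-ahead-24h")   # plans all 96 periods
        if slot % slots_per_hour == 0:
            stages.append("intraday-1h")     # plans the next 4 periods
        stages.append("intraday-15min")      # fixes the next period
        plan.append((slot * slot_minutes, stages))
    return plan


plan = schedule_triggers()
```

Over one day this yields one day-ahead run, 24 hourly runs and 96 quarter-hourly runs, so each coarser plan is refined by the finer stages as the wind forecast improves.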

The power grid multi-time scale economic dispatching method based on the consistency reinforcement learning algorithm has the following advantages:

(1) The adjustable units are processed by a reinforcement learning algorithm, which calculates the optimal state and the rough output of each adjustable unit. This realizes the globally optimal scheduling of the unit combination, prepares for the subsequent optimal power distribution, and copes with a smart grid environment with a large data volume and a complex topology.

(2) The multi-time scale economic dispatching method of the smart grid is a fully distributed method. All variables in the method are linked through the topology data of the grid structure and the consistency principle, and the uncertainty of unit combination is taken into account. Therefore, when a unit is shut down or a new unit is put into operation, a new global optimal solution can be calculated, and the method adapts to the plug-and-play characteristic of distributed energy.

(3) The method combines the equal incremental cost criterion of power systems with the consistency principle and adds adjustment terms. This ensures that the incremental cost of each adjustable unit converges to the optimal value during the iterative update, and also gradually reduces the unbalanced power of the whole system during the iteration, so that the economic dispatch is optimized and the operating results are reasonable.

(4) A dual-order gradient descent estimation algorithm is added as an improvement, which further optimizes the power distribution of each adjustable unit obtained by the consistency algorithm, reduces the overall number of iterations of the algorithm and improves its convergence speed.

(5) Because the prediction accuracy of wind power output improves gradually as the time scale shortens, and because the flexible loads differ in scheduling elasticity, the flexible loads are optimized on three time scales. This largely eliminates the uncertainty of wind power output, reduces the wind curtailment rate, and improves the stability and economy of the system.

(6) Compared with conventional planning algorithms, the method has better real-time performance: when the computing environment changes, the variables do not need to be re-initialized and recalculated, and decisions are made online on the basis of the current state variables. Compared with popular heuristic algorithms, the method reaches a converged result more stably while still converging quickly, and is more robust. In addition, the method combines the variable information of all nodes through the consistency principle to complete the global optimization, realizing fully distributed computation without a centralized control and computing center.

(7) The multi-time scale economic dispatching method for the smart grid is an economic dispatching system that controls all adjustable units in the smart grid, such as generators and flexible loads. It includes multi-time scale optimization and economically optimizes all adjustable units from the detailed level to the overall level, forming a complete theoretical system.

Drawings

Fig. 1 shows a flowchart of a power grid multi-time scale economic dispatching method based on a consistency reinforcement learning algorithm in an embodiment.

FIG. 2 illustrates an IEEE-30 node Smart grid simulation in an embodiment.

FIG. 3 shows a wind power output multi-time scale prediction data graph in an embodiment.

Fig. 4 shows a simulation diagram of the consistent variable under single scheduling in the embodiment.

Fig. 5 shows an unbalanced power simulation diagram under single scheduling in an embodiment.

Fig. 6 shows a simulation diagram of the consistent variable under plug and play in the embodiment.

Fig. 7 shows an unbalanced power dissipation simulation diagram in an embodiment.

Detailed Description

The following description of the embodiments of the present application will be made with reference to the accompanying drawings.

As shown in fig. 1, the power grid multi-time scale economic dispatching method based on the consistency reinforcement learning algorithm includes the following steps:

Step 1, acquiring network topology data of the smart grid, and establishing a joint economic dispatch model based on unit combination and load distribution.

Step 2, preliminarily solving the joint economic dispatch model with a reinforcement learning algorithm to obtain the rough output and the accurate start-stop state $I_{i,t}$ of each adjustable unit i in the smart grid at time t.

Step 3, with reference to the accurate start-stop state $I_{i,t}$ from step 2, optimizing the rough output from step 2 with a fully distributed consistency algorithm to obtain the accurate output $P_{i,t}$ of each adjustable unit i at time t, thereby completing the preliminary optimal scheduling of each adjustable unit.

Step 4, according to the predicted wind power and the marginal cost of the flexible loads, further optimally scheduling the flexible loads among the adjustable units on three time scales of 24 h, 1 h and 15 min with a multi-time scale scheduling strategy, and eliminating the uncertainty of wind power output, so as to realize the economic dispatch of the power grid and guarantee the operating stability of the whole smart grid system.

The following is a detailed description of the above steps:

The objective of smart grid economic dispatch is to find, for a smart grid comprising a plurality of adjustable units that supplies a total load, the unit combination sequence and power distribution over a period T that minimize the total operating cost of the smart grid.

In the step 1, the expression of the joint economic dispatch model based on unit combination and load distribution is as follows:

wherein N is the total number of adjustable units, the adjustable units comprising thermal generators, wind turbines and flexible loads; T is the total scheduling time; $I_{i,t}$ indicates whether adjustable unit i is running or stopped at time t; $C_{i,D}(t)$ is the shutdown cost of the i-th adjustable unit at time t; $C_{i,U}(t)$ is the start-up cost of the i-th adjustable unit at time t; $C_i(P_{i,t})$ is the cost function of the i-th adjustable unit at power $P_{i,t}$ at time t, the power being the generated power for a generator and the absorbed power for a flexible load. Typically the running cost function is a convex quadratic function.

wherein $\alpha_i$, $\beta_i$ and $\gamma_i$ are the cost coefficients of adjustable unit i.

All adjustable units i = 1, 2, ..., N in the joint economic dispatch model must satisfy the following constraints: the power distribution constraint, the adjustable unit capacity constraint, the unit minimum continuous operation/shutdown time constraint, and the unit ramping (climbing) characteristic constraint.

Wherein the power allocation constraint is expressed as:

wherein $P_{loss}$ is the power loss (transmission loss typically accounts for about 3% to 7% of the total load), and D is the non-adjustable rigid power, comprising rigid load and wind power.

The capacity constraint of the adjustable unit is expressed as:

$P_i^{\min} \le P_i \le P_i^{\max}$ (4)

wherein $P_i^{\min}$ and $P_i^{\max}$ are respectively the minimum and maximum limits of the output power of adjustable unit i.

The constraint on the minimum continuous operation/shutdown time of the unit is expressed as:

wherein $T_{i,U}$ and $T_{i,D}$ are respectively the minimum continuous operation time and the minimum continuous shutdown time of adjustable unit i; $X_{i,ON}(t-1)$ is the total time for which adjustable unit i has been running continuously up to time t-1, and $X_{i,OFF}(t-1)$ is the total time for which it has been continuously stopped up to time t-1; $I_{i,t-1}$ indicates whether adjustable unit i is running or stopped at time t-1.

The unit ramping (climbing) characteristic constraint is expressed as:

$-R_{i,D} \le (P_{i,t} - P_{i,t-1})\, I_{i,t}\, I_{i,t-1} \le R_{i,U}$ (6)

wherein $R_{i,U}$ and $R_{i,D}$ are respectively the upper and lower limits of the ramping constraint of the i-th adjustable unit; $P_{i,t-1}$ is the accurate output of adjustable unit i at time t-1.

In this embodiment, the step 2 specifically includes the following steps:

Step 201: taking the accurate output $P_{i,t}$ and the accurate start-stop state $I_{i,t}$ of each adjustable unit i in the smart grid at time t as the elements of the state set $S_{I,t}$ and the action set $A_{I,t}$ of the adjustable unit set I, and establishing an initial state-action set function $Q(S_{I,t}, A_{I,t})$, wherein $Q(S_{I,t}, A_{I,t})$ is the set of the state-action value functions $Q(S_{i,t}, A_{i,t})$ at time t of the set I of adjustable units i represented by all agents.

Step 202: selecting the action value $a_{I,t+1}$ of the adjustable unit set I at the next time t+1 with a greedy algorithm, and forming the action set $A_{I,t+1}$ at the next time t+1.

Step 203: accepting or rejecting the action set $A_{I,t+1}$ at the next time t+1 according to the reward value r given by the reward and penalty function rew, and selecting the optimal action value.

In step 201, on the basis of the joint economic dispatch model of step 1, a reinforcement learning algorithm suited to large-scale data computation is used for the preliminary solution. The accurate output $P_{i,t}$ and the accurate start-stop state $I_{i,t}$ of each adjustable unit i in the smart grid at time t are taken as the elements of the state set $S_{I,t}$ and the action set $A_{I,t}$ of the adjustable unit set I, specifically expressed as follows:

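In step 203, according to the constraints of the joint economic dispatch model, the reward function $rew_{i,t}$ obtained after the agent of adjustable unit i executes an action at time t is defined as:

$rew_{i,t} = r_1 + r_2 + r_3 + r_4$ (8)

wherein $r_1$ is the power balance deviation reward term, $r_2$ is the ramping constraint reward term, and $r_3$ and $r_4$ are the reward terms for the minimum continuous operation and shutdown time constraints; $\Delta p_1$ and $\Delta p_2$ are deviation thresholds, and $r_1$ is selected according to the degree of deviation between the sum of the collected powers and the total load $L_{all}$, thereby realizing the rough adjustment of the power balance.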

In step 204, the state-action set function is updated by the iterative formula (14), where the subscript I denotes the set of adjustable units i represented by all agents; η is the learning rate when the agent of adjustable unit i takes action $A_{i,t}$ in state $S_{i,t}$ at time t; τ is the discount coefficient; and $rew_{I,t}$ is the set of reward variables generated by the decision actions of the set I of adjustable units i represented by all agents at the current time t.

$A_{I,t}$ in formula (7) is updated with an ε-greedy algorithm: the action that currently has the highest value is selected with probability 1-ε, and with probability ε one of the other actions is selected with equal probability. After the selection, the iterative formula (14) is applied, and the current unit scheduling policy $\pi(s_{i,t})$, that is, the set of current unit scheduling states, is changed according to the current state-action value function $Q(S_{i,t}, A_{i,t})$. When the state-action value function has been cumulatively updated to a preset degree, the optimal action value with the highest value function can be determined, which gives the optimal unit scheduling policy $\pi^*(s_{i,t})$; the specific formula is as follows:

In this embodiment, step 3 specifically includes the following steps:

Step 301: initialize the state and output of each adjustable unit.

Step 302: iterate on the output of each adjustable unit using the consistency dual-order gradient descent estimation algorithm, updating the consistency variable of each adjustable unit; when all consistency variables converge to the same value, the result obtained is the planned optimal solution for the output of each adjustable unit.

Specifically, in step 301, when setting initial values, the initial variable P_i(0) is set to satisfy the initial constraint formula so that the power deviation of the iteration result approaches 0; a consistency matrix W is constructed, with a Laplacian matrix used in place of W. The initial constraint formula and the consistency matrix W are constructed as shown in formulas (16) and (17):

In step 302, the iterative update formula of the consistency dual-order gradient descent estimation algorithm is as follows:


δ is the adjustment coefficient, set to 0.01; w_{ij} is an element of the matrix W, and v_{ij} is an element of the transpose of W; P_i(k) denotes the accurate power of adjustable unit i after the k-th iteration. During the iteration, formula (20) determines the convergence direction of the consistency variables, so that the optimization result continuously approaches the optimal solution that satisfies the power balance constraint.
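How a Laplacian-derived weight matrix drives the consistency variables to a common value can be illustrated with a plain averaging consensus step. The sketch below assumes W = I - step * L on a small undirected graph and deliberately omits the gradient and power-balance terms of the method; the three-node path graph, the step size and the initial values are hypothetical.

```python
def laplacian(n, edges):
    """Graph Laplacian of an undirected graph with n nodes."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1.0
        L[j][j] += 1.0
        L[i][j] -= 1.0
        L[j][i] -= 1.0
    return L

def consensus_weights(L, step):
    """Weight matrix W = I - step * L; each row sums to 1."""
    n = len(L)
    return [[(1.0 if i == j else 0.0) - step * L[i][j] for j in range(n)]
            for i in range(n)]

def consensus_iterate(W, x, iterations):
    """Repeatedly replace each value by a weighted mix of its neighbours.
    W[i][j] is zero for non-neighbours, so only local information is used."""
    n = len(x)
    for _ in range(iterations):
        x = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x

edges = [(0, 1), (1, 2)]  # hypothetical path graph 0 - 1 - 2
W = consensus_weights(laplacian(3, edges), step=0.3)
final = consensus_iterate(W, [4.0, 6.0, 8.0], 200)
```

With a step size smaller than the reciprocal of the largest node degree, all entries of `final` approach the mean of the initial values, which is the "converge to the same value" behaviour the step relies on.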

In this embodiment, optimization is achieved by updating the exponential moving averages of the first- and second-order gradients, with the hyperparameters χ_1 and χ_2 controlling the exponential decay rates; the specific iterative formulas for the biased first-order gradient descent estimate m(k) and second-order gradient descent estimate v(k) at the k-th iteration are as follows:

where the first-order term refers to the gradient of the cost function C with respect to the power P after the (k-1)-th iteration, and the second-order term refers to the second-order gradient of the cost function C with respect to the power P after the (k-1)-th iteration. The specific iterative formulas for the bias-corrected first-order gradient descent estimate m'(k) and second-order gradient descent estimate v'(k) are as follows:

where χ_1^k and χ_2^k denote the k-th powers of the hyperparameters χ_1 and χ_2. The core update iteration formula of the proposed fully distributed consistency algorithm is expressed as follows:

where ε is the fine-tuning coefficient; w'_{ij} is the element in row i and column j of the transformation matrix W', which satisfies 1^T W' = 0^T and W'1 = 0; X_i is the set of indices adjacent to consistency variable i; and x_i is the weight of consistency variable i. By making formulas (19) and (23) distributed, each variable μ_i and P_i can still reach the global optimal solution through the algorithm while receiving only the parameter information of the adjacent variables μ_j and P_j.
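The exponential moving averages and their bias correction can be sketched as follows. The sketch assumes the estimates take the familiar Adam-style form, in which the "second-order" estimate tracks the squared gradient; that reading, the default values of χ_1 and χ_2, and the function name are assumptions rather than the patent's stated formulas.

```python
def moment_estimates(gradients, chi1=0.9, chi2=0.999):
    """Yield bias-corrected (m_hat, v_hat) for each gradient in sequence.

    m(k) = chi1 * m(k-1) + (1 - chi1) * g
    v(k) = chi2 * v(k-1) + (1 - chi2) * g**2   (assumed Adam-style form)
    m_hat = m(k) / (1 - chi1**k),  v_hat = v(k) / (1 - chi2**k)
    """
    m = 0.0
    v = 0.0
    for k, g in enumerate(gradients, start=1):
        m = chi1 * m + (1.0 - chi1) * g
        v = chi2 * v + (1.0 - chi2) * g * g
        m_hat = m / (1.0 - chi1 ** k)
        v_hat = v / (1.0 - chi2 ** k)
        yield m_hat, v_hat

# With a constant gradient, the corrected estimates recover g and g**2.
estimates = list(moment_estimates([2.0, 2.0, 2.0]))
```

The division by 1 - χ^k removes the bias toward zero that the zero initialization introduces in the first iterations, which is why the k-th powers of the hyperparameters appear in the corrected estimates.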

The uncertainty of wind power output causes some waste of resources in the energy dispatch of the smart grid, and this surplus must be balanced and absorbed to improve energy utilization and economy. Because the prediction accuracy of wind power output improves as the time scale shortens, and each flexible load has a different scheduling elasticity, wind power prediction and flexible loads must be coordinated on different time scales when participating in dispatch.

In this embodiment, the step 4 specifically includes the following steps:

Step 401: perform predictive modeling of the wind turbine power P_{iw} with uncertainty taken into account, and construct marginal cost functions for the flexible loads, namely incentive-type loads and interruptible loads.

Specifically, in step 401, first, at time t, the power P_{iw}(t) of the i-th wind turbine with uncertainty taken into account is assumed to approximately follow the normal distribution N(P_{wf}(t), σ²), where the mean P_{wf}(t) is the predicted active power output of the wind turbine at time t, and the variance σ² characterizes the error level of the power prediction and varies with the prediction time scale; the active output of the wind turbine can be expressed as:

wherein x is the actual substitution amount.
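The normal-distribution model of wind power can be illustrated by sampling around the forecast with an error level that depends on the prediction time scale. The error fractions, the rated power and the clipping to a physical range in this sketch are hypothetical choices made for illustration only.

```python
import random

# Hypothetical forecast-error levels (fraction of the forecast) per time scale.
SIGMA_FRACTION = {
    "day_ahead_24h": 0.20,
    "intraday_1h": 0.10,
    "intraday_15min": 0.03,
}

def sample_wind_power(p_forecast, scale, rated_power, rng):
    """Draw one wind power sample from N(p_forecast, sigma**2).

    sigma depends on the prediction time scale; clipping to
    [0, rated_power] is an added assumption to keep the sample physical.
    """
    sigma = SIGMA_FRACTION[scale] * p_forecast
    sample = rng.gauss(p_forecast, sigma)
    return min(max(sample, 0.0), rated_power)

rng = random.Random(0)
samples = [sample_wind_power(30.0, "intraday_15min", 50.0, rng) for _ in range(5)]
```

A shorter time scale maps to a smaller σ here, so samples cluster more tightly around the forecast as the dispatch moment approaches.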

μ_{h1} = -θ_{h0}(1 + 2ΔP_{Dh})/φ_h P_{Dh0} + a_{h2} θ_{h0} (26)

C_{Lh1}(ΔP_{Dh}) = μ_{h1} · ΔP_{Dh} (27)
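Equations (26) and (27) can be evaluated with a short sketch. It assumes that the product φ_h P_{Dh0} forms the denominator of the first term in (26), which the flattened formula does not make explicit; the symbol meanings follow claim 9, and the numeric values are hypothetical.

```python
def marginal_cost(delta_p, theta0, phi, p0, a2):
    """mu_h1 from equation (26).

    Assumption: phi * p0 is read as the denominator of the first term.
    delta_p: power adjustment, theta0: initial electricity price,
    phi: self-elasticity coefficient, p0: initial power,
    a2: electricity price conversion rate.
    """
    return -theta0 * (1.0 + 2.0 * delta_p) / (phi * p0) + a2 * theta0

def scheduling_cost(delta_p, theta0, phi, p0, a2):
    """C_Lh1 from equation (27): marginal cost times the power adjustment."""
    return marginal_cost(delta_p, theta0, phi, p0, a2) * delta_p

# Hypothetical values chosen only to make the arithmetic easy to follow.
mu = marginal_cost(delta_p=2.0, theta0=0.5, phi=-0.5, p0=10.0, a2=1.0)
cost = scheduling_cost(delta_p=2.0, theta0=0.5, phi=-0.5, p0=10.0, a2=1.0)
```

With these inputs the first term is -0.5 × 5 / (-5) = 0.5 and the second is 0.5, so the marginal cost is 1.0 and the scheduling cost is 2.0.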

In step 402, the scheduling strategy is divided into three time scales: a 24 h day-ahead scheduling stage, a 1 h intraday scheduling stage and a 15 min intraday scheduling stage. The adjustable units and the incentive-type and interruptible load resources participate in optimized scheduling on these three time scales:

1. 24 h day-ahead scheduling stage: executed once every 24 hours. The 24 hours are divided into 96 periods, and a 96-period scheduling plan is made for the unit combination of the next day. Based on the day-ahead prediction data, each adjustable unit is optimized with the reinforcement learning algorithm of step 2. The flexible loads participating at this stage have a low scheduling elasticity requirement, so flexible loads with slow response and long adjustment periods can also participate, which prevents power overshoot from degrading the overall economy.

2. 1 h intraday scheduling stage: executed once every hour. The hour is divided into 4 periods, and a 4-period scheduling plan is made for the power allocation of the adjustable units over the next hour. Based on the 1 h intraday prediction data and the result of the day-ahead scheduling stage, the power of each adjustable unit for the next hour is adjusted with the fully distributed consistency algorithm of step 3. The scheduling objects are flexible loads with fast response and short adjustment periods, and the scheduling elasticity requirement is high.

3. 15 min intraday scheduling stage: executed once every 15 minutes, making the final allocation plan for the flexible load output over the next 15 minutes. Based on the 15 min intraday prediction data and the result of the 1 h intraday scheduling stage, power adjustment is applied to flexible loads with extremely fast response and extremely short adjustment times, so as to absorb the uncertainty of wind power output as far as possible.
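The three stages above share a 15-minute resolution, which a few lines of arithmetic make explicit. The stage names and the dictionary layout are hypothetical; treating the 15 min stage as a single period is an assumption based on its description.

```python
from datetime import timedelta

# Horizon and number of periods per stage, as described in the three stages.
STAGES = {
    "day_ahead_24h": {"horizon": timedelta(hours=24), "periods": 96},
    "intraday_1h": {"horizon": timedelta(hours=1), "periods": 4},
    "intraday_15min": {"horizon": timedelta(minutes=15), "periods": 1},
}

def period_length(stage):
    """Length of one scheduling period within a stage."""
    s = STAGES[stage]
    return s["horizon"] / s["periods"]

def runs_per_day(stage):
    """How many times the stage is executed in 24 hours."""
    return timedelta(hours=24) // STAGES[stage]["horizon"]

lengths = {name: period_length(name) for name in STAGES}
runs = {name: runs_per_day(name) for name in STAGES}
```

Every stage resolves to 15-minute periods, so a later stage refines the same slots that the earlier stage planned, only with a shorter look-ahead and better forecasts.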

The effectiveness and superiority of the method according to the present application are verified by the following examples.

An IEEE 30-node smart grid simulation system is built and a simulation experiment is carried out. The IEEE-30 simulation case is used to verify the effectiveness of the proposed method under different operating conditions and its superiority over other traditional algorithms. An IEEE 30-node power network is used to simulate the smart grid system; its structure is shown schematically in FIG. 2. The generator at node 11 is replaced by a wind turbine, and the 20 loads are classified, with the loads on nodes 2, 4, 7, 8, 10, 12, 17, 19, 24 and 29 being flexible loads. The multi-time scale prediction data of the wind power (WP) output are shown in FIG. 3.

Assume that the current rigid total load is 118 MW. First, through strategy optimization in the 24 h day-ahead scheduling stage, the reinforcement learning algorithm determines that 5 generators in the IEEE-30 simulation system need to be started, and the total power imbalance at this point is 0.94 MW. This means that the IEEE-30 simulation system will receive a Single Dispatch Instruction (SDI) and perform a single dispatch action.

Then, through strategy optimization in the two intraday scheduling stages, the output of each adjustable unit is finely allocated by the improved consistency algorithm of the present application. The simulation data are shown in FIG. 4 and FIG. 5.

As can be seen from the two simulation graphs, the consistency variables of the 15 adjustable units converge to the same value and stabilize, and the unbalanced power of the system finally approaches 0, which shows that the proposed algorithm converges well. The algorithm ran for 54 iterations with a total run time of 0.36741 seconds, indicating high computational efficiency.

Although the "equal incremental rate criterion" shows in principle that the obtained result is the optimal solution, the CPLEX optimization toolkit is called here to further verify the correctness of the calculation. The Lagrange relaxation multiplier corresponding to the optimal solution is solved for, and this multiplier can be used to represent the value of the consistency variable. Its value is 5.623, which agrees with the result in FIG. 4 and confirms that the simulation result is correct.
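For intuition on why a common multiplier can stand in for the consistency variable, the equal incremental rate condition can be sketched in closed form. The quadratic cost model, the neglect of capacity limits and the coefficients below are illustrative assumptions; they are not the IEEE-30 data and do not reproduce the 5.623 value.

```python
def equal_incremental_dispatch(a, b, demand):
    """Closed-form dispatch for assumed costs C_i(P) = a_i*P**2 + b_i*P,
    ignoring capacity limits.

    Returns (lam, outputs) such that 2*a_i*P_i + b_i == lam for every unit
    and the outputs sum to the demand.
    """
    inv = [1.0 / (2.0 * ai) for ai in a]
    lam = (demand + sum(bi * w for bi, w in zip(b, inv))) / sum(inv)
    outputs = [(lam - bi) * w for bi, w in zip(b, inv)]
    return lam, outputs

# Two hypothetical units sharing a 10 MW demand.
lam, outputs = equal_incremental_dispatch(a=[0.5, 0.25], b=[1.0, 2.0], demand=10.0)
```

Here the common incremental rate is 5.0 and the outputs are 4.0 and 6.0, so both units operate at the same marginal cost while meeting the demand exactly.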

The operation of the smart grid must support the plug-and-play (PAP) characteristic of distributed generation (DG). To verify the validity of the proposed method under plug-and-play conditions, the following scenario is set: the initial conditions of the simulation experiment are the same as in the previous experiment, and after one scheduling period (0.4 s) the distributed power source G16 is connected to system node 23, with the same operating parameters as the distributed power source G3. The simulation data are shown in FIG. 6.

As can be seen from FIG. 6, once the distributed power source G16 is connected to the system, it takes on part of the power load, which relieves the output burden on the other adjustable units, reduces their output power, and lowers the consistency variable of the system as a whole. This demonstrates the effectiveness of the method in achieving economic dispatch under plug-and-play conditions in the smart grid.

To verify the effectiveness of the proposed multi-time scale scheduling strategy for wind power consumption, the strategy is used to carry out one day of economic dispatch on the simulated smart grid, based on the day-ahead prediction data of wind turbine output in FIG. 3, so that the surplus wind power output in the system is absorbed and the utilization efficiency and economy of energy are improved. The simulation data for the unbalanced power of the simulated smart grid in each scheduling stage are shown in FIG. 7.

As can be seen from FIG. 7, when the time scale is large, the unbalanced power is not well absorbed because the prediction error of the wind turbine output is large, which is normal. This stage nevertheless reduces the power adjustment deviation to within a certain range, so that the unbalanced power can be absorbed with fewer flexible loads in the subsequent intraday scheduling stages. As the time scale is gradually reduced, the prediction error decreases and the absorption of unbalanced power becomes increasingly evident. As the time scale approaches zero, the effectiveness of the multi-time scale optimized scheduling strategy for wind power consumption is demonstrated.

The power grid multi-time-scale economic dispatching method based on the consistency reinforcement learning algorithm has the following advantages:

(1) Each adjustable unit is processed intelligently by a reinforcement learning algorithm, which computes its optimal state and rough output. This achieves globally optimal scheduling of the unit combination, prepares for the subsequent optimal power allocation, and can cope with smart grid environments that have large data volumes and complex topologies.

(2) The proposed multi-time scale economic dispatching method for the smart grid is fully distributed: all variables are linked through the grid topology data and the consistency principle, and the uncertainty of the unit combination is taken into account. Therefore, when a unit is shut down or a new unit is put into operation, a new global optimal solution can be computed, and the method can adapt to the plug-and-play characteristic of distributed energy.

(3) The equal incremental rate criterion of power systems is combined with the consistency principle, and adjustment terms are added, so that the incremental rates of all adjustable units are guaranteed to converge to an optimal value during iterative updating and the unbalanced power of the whole system decreases gradually over the iterations, achieving optimized economic dispatch and reasonable results.

(4) A dual-order gradient descent estimation algorithm is added as a refinement, which further optimizes the power allocation of each adjustable unit produced by the consistency algorithm, reducing the overall number of iterations and improving the convergence speed.

Although the embodiments of the present application have been described with reference to the accompanying drawings, they are not intended to limit the scope of the present application, and those skilled in the art should understand that various modifications and variations can be made without inventive effort.

Claims

1. A power grid multi-time scale economic dispatching method based on a consistency reinforcement learning algorithm, characterized in that the method comprises the following steps:

step 1, acquiring network topology structure data of a smart power grid, and establishing a combined economic dispatching model based on unit combination and load distribution;

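step 2, performing a preliminary solution of the model through a reinforcement learning algorithm to obtain the rough output and the accurate start-stop condition I_{i,t} of each adjustable unit in the smart grid at time t;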

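step 3, with reference to the accurate start-stop condition I_{i,t} of step 2, optimizing the rough output of step 2 through a fully distributed consistency algorithm to obtain the accurate output P_{i,t} of each adjustable unit at time t, completing the initial optimized scheduling of each adjustable unit;

step 4, performing optimized scheduling of the flexible loads among the adjustable units with a multi-time scale scheduling strategy, according to the predicted wind power and the marginal cost of the flexible loads, to eliminate the uncertainty of the wind power output.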

2. The power grid multi-time scale economic dispatching method based on the consistency reinforcement learning algorithm as claimed in claim 1, characterized in that: in the step 1, an expression of a joint economic dispatching model based on unit combination and load distribution is as follows:

wherein N represents the total number of adjustable units, the adjustable units including thermal power generators, wind turbines and flexible loads; T represents the total scheduling time; I_{i,t} indicates whether adjustable unit i is running or stopped at time t; C_{i,D}(t) represents the shutdown cost C_D of the i-th adjustable unit at time t; C_{i,U}(t) represents the start-up cost C_U of the i-th adjustable unit at time t; C_i(P_{i,t}) represents the cost function C of the i-th adjustable unit at power P at time t, where P denotes generated power for a generator and absorbed power for a flexible load;

wherein the power allocation constraint is expressed as:

the capacity constraint of the tunable element is expressed as:

P_i^{min} ≤ P_i ≤ P_i^{max} (4)


the unit ramping constraint is expressed as:

-R_{i,D} ≤ (P_{i,t} - P_{i,t-1}) I_{i,t} I_{i,t-1} ≤ R_{i,U} (6)

3. The power grid multi-time scale economic dispatching method based on the consistency reinforcement learning algorithm as claimed in claim 2, characterized in that: the step 2 specifically comprises the following steps:

step 202: selecting the action value a_{I,t+1} of the adjustable unit set I at the next time t+1 by using a greedy algorithm, and forming the action set A_{I,t+1} for the next time t+1;

step 205: when the current state-action value function Q(S_{i,t}, A_{i,t}) has been cumulatively updated to a preset degree, the optimal action value with the highest value function can be considered obtained, yielding the optimal unit scheduling policy π*, i.e. the accurate start-stop condition I_{i,t} and the rough output of each adjustable unit i in the smart grid at time t.

4. The power grid multi-time scale economic dispatching method based on the consistency reinforcement learning algorithm as claimed in claim 3, characterized in that: in step 201, the accurate output P_{i,t} and the accurate start-stop condition I_{i,t} of each adjustable unit i at time t in the smart grid are used to represent the elements of the state set S_{I,t} and the action set A_{I,t} of the adjustable unit set I, specifically as follows:

rew_{i,t} = r_1 + r_2 + r_3 + r_4 (8)


wherein r_1 is the power balance deviation reward term, r_2 is the ramping constraint reward term, and r_3 and r_4 are the minimum continuous upper- and lower-bound constraint reward terms; Δp_1 and Δp_2 are deviation thresholds, and r_1 is selected according to the degree of deviation between the sum of the collected powers and the total load L_{all}, thereby realizing a coarse adjustment of the power balance;

wherein the subscript I denotes the set of adjustable units i represented by all agents; η denotes the learning rate when the agent of adjustable unit i takes action A_{i,t} in state S_{i,t} at time t; τ denotes the discount coefficient; and rew_{I,t} denotes the set of reward variables generated by the decision actions of the set I of all adjustable units i represented by the agents at the current time t;

in step 205, when the state-action value function has been cumulatively updated to a preset degree, the optimal action value with the highest value function can be considered obtained, which yields the optimal unit scheduling policy π*(s_{i,t}); the specific formula is as follows:

5. The power grid multi-time scale economic dispatching method based on the consistency reinforcement learning algorithm as claimed in claim 4, wherein: the step 3 specifically comprises the following steps:

step 301: initializing the state and the output of each adjustable unit;

6. The power grid multi-time scale economic dispatching method based on the consistency reinforcement learning algorithm as claimed in claim 5, characterized in that: in step 301, when setting initial values, the initial variable P_i(0) is set to satisfy the initial constraint formula so that the power deviation of the iteration result approaches 0; a consistency matrix W is constructed, with a Laplacian matrix used in place of W; the initial constraint formula and the consistency matrix W are constructed as shown in formulas (16) and (17):

wherein D is the non-adjustable rigid power, and L(G) is set as a 0-1 matrix whose diagonal elements are zero and whose off-diagonal elements are d_{ij}.

7. The power grid multi-time scale economic dispatching method based on the consistency reinforcement learning algorithm as claimed in claim 6, characterized in that: in step 302, the iterative update formula of the consistency dual-order gradient descent estimation algorithm is as follows:

optimization is achieved by updating the exponential moving averages of the first- and second-order gradients, with the hyperparameters χ_1 and χ_2 controlling the exponential decay rates; the specific iterative formulas for the biased first-order gradient descent estimate m(k) and second-order gradient descent estimate v(k) at the k-th iteration are as follows:

wherein the first-order term denotes the gradient of the cost function C with respect to the power P after the (k-1)-th iteration, and the second-order term denotes the second-order gradient of the cost function C with respect to the power P after the (k-1)-th iteration; the specific iterative formulas for the bias-corrected first-order gradient descent estimate m'(k) and second-order gradient descent estimate v'(k) are as follows:

wherein χ_1^k and χ_2^k denote the k-th powers of the hyperparameters χ_1 and χ_2;

wherein ε is the fine-tuning coefficient; w'_{ij} is the element in row i and column j of the transformation matrix W', which satisfies 1^T W' = 0^T and W'1 = 0; X_i is the set of indices adjacent to consistency variable i; and x_i is the weight of consistency variable i; by making formulas (19) and (23) distributed, each variable μ_i and P_i can still reach the global optimal solution through the algorithm while receiving only the parameter information of the adjacent variables μ_j and P_j.

8. The power grid multi-time scale economic dispatching method based on the consistency reinforcement learning algorithm as claimed in claim 7, wherein: the step 4 specifically comprises the following steps:

9. The power grid multi-time scale economic dispatching method based on the consistency reinforcement learning algorithm as claimed in claim 8, wherein: in step 401, first, at time t, the power P_{iw}(t) of the i-th wind turbine with uncertainty taken into account is assumed to approximately follow the normal distribution N(P_{wf}(t), σ²), where the mean P_{wf}(t) is the predicted active power output of the wind turbine at time t, and the variance σ² characterizes the error level of the power prediction and varies with the prediction time scale; the active output of the wind turbine can be expressed as:

wherein x is the actual substitution amount;

μ_{h1} = -θ_{h0}(1 + 2ΔP_{Dh})/φ_h P_{Dh0} + a_{h2} θ_{h0} (26)

wherein φ_h is the self-elasticity coefficient of the h-th flexible load, ΔP_{Dh} denotes the power adjustment of the h-th flexible load, P_{Dh0} is the initial power of the h-th flexible load, θ_{h0} is the initial electricity price of the h-th flexible load, and a_{h2} is the electricity price conversion rate of the h-th flexible load; the scheduling cost of the flexible load is given by:

C_{Lh1}(ΔP_{Dh}) = μ_{h1} · ΔP_{Dh} (27)

10. The power grid multi-time scale economic dispatching method based on the consistency reinforcement learning algorithm as claimed in claim 9, wherein: in step 402, the scheduling strategy is divided into three time scales: a 24 h day-ahead scheduling stage, a 1 h intraday scheduling stage and a 15 min intraday scheduling stage; the adjustable units and the incentive-type and interruptible load resources participate in optimized scheduling on these three time scales:

1. 24 h day-ahead scheduling stage: executed once every 24 hours; the 24 hours are divided into 96 periods, and a 96-period scheduling plan is made for the unit combination of the next day;

2. 1 h intraday scheduling stage: executed once every hour; the hour is divided into 4 periods, and a 4-period scheduling plan is made for the power allocation of the adjustable units over the next hour;

3. 15 min intraday scheduling stage: executed once every 15 minutes, making the final allocation plan for the flexible load output over the next 15 minutes.