CN115860180A - Power grid multi-time scale economic dispatching method based on consistency reinforcement learning algorithm - Google Patents

Publication number
CN115860180A
Authority
CN
China
Prior art keywords
power
time
consistency
scheduling
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211370283.XA
Other languages
Chinese (zh)
Inventor
郭笑岩
盛潜
卢坤
傅荣荣
李成睿
汪涌泉
刘家欣
孔令稷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weihai Power Supply Co of State Grid Shandong Electric Power Co Ltd
Original Assignee
Weihai Power Supply Co of State Grid Shandong Electric Power Co Ltd
Application filed by Weihai Power Supply Co of State Grid Shandong Electric Power Co Ltd
Priority to CN202211370283.XA
Publication of CN115860180A
Legal status: Pending

Landscapes

  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention relates to a power grid multi-time scale economic dispatching method based on a consistency reinforcement learning algorithm, which comprises the following steps: step 1, acquiring network topology data of the smart grid and establishing a joint economic dispatch model based on unit combination and load distribution; step 2, solving the model preliminarily with a reinforcement learning algorithm to obtain the rough output and the accurate start-stop status of each adjustable unit in the smart grid at time t; step 3, with reference to the accurate start-stop status, optimizing the rough output from step 2 with a fully distributed consistency algorithm to obtain the accurate output of each adjustable unit at time t, completing the preliminary optimized scheduling of each adjustable unit; and step 4, using a multi-time-scale scheduling strategy to optimize the scheduling of the flexible loads among the adjustable units according to the predicted wind power and the marginal cost of the flexible loads, thereby eliminating the uncertainty of the wind power output. The method achieves economic dispatch of the power grid and safeguards its stable operation.

Description

Power grid multi-time scale economic dispatching method based on consistency reinforcement learning algorithm
Technical Field
The invention relates to the technical field of economic dispatching of smart power grids, in particular to a power grid multi-time scale economic dispatching method based on a consistency reinforcement learning algorithm.
Background
Renewable energy is developing rapidly, and the Smart Grid, which integrates large-scale Distributed Generation (DG), has attracted the attention of many researchers as a new mode of energy supply. With the large-scale adoption of renewable energy, the uncertainty of distributed generation affects system economy, so the stability, economy and safety of the smart grid must be ensured by dispatchable equipment. Coordinating the source-network-load-storage links of the smart grid, completing economic dispatch under huge data volumes and complex operating conditions, and achieving economical and stable operation of the whole system has therefore become a research focus in academia.
The optimal scheduling problem in the smart grid is usually solved with a centralized algorithm: the operating data of all schedulable devices are uploaded to a central dispatch center, which processes them centrally and issues the results uniformly. However, as the amount of DG connected to the grid grows year by year, the number and variety of users and the demand for plug-and-play operation make centralized algorithms less effective on complex smart grid scheduling problems. Distributed algorithms suited to large-scale decentralized computing are therefore favored by most scholars. The document "two-layer power optimized distribution of a multi-energy local area network facing to an energy internet" (Miyang, liu hong industry, song dynasty, lizhang war, fuyang, lizhuang Kun. Electric power automation equipment, 2018,38 (07): 1-10.) proposes an economic dispatching method based on an improved consistency theory for the energy internet, which optimizes the output of the units in the multi-energy local area network. The document "active power distribution network source-load-storage distributed coordination optimization operation (II): a consistency algorithm considering a non-ideal telemetry environment" (Xixi lin, song Yi group, yao well-faithful, strict. Chinese Motor engineering report, 2018,38 (11): 3244-3254.) robustly optimizes the scheduling plan of each adjustable unit in the active distribution network in a fully distributed manner through an improved consistency strategy. However, the unit combination sequence changes dynamically with the supply-demand balance of the system, so the economic dispatch problem of the smart grid cannot be solved effectively by considering only the output distribution of each unit.
In addition, the influence of uncertainty of wind power output is a non-negligible problem.
With the rise of artificial intelligence, research on Reinforcement Learning (RL) has deepened, and its advantages in solving power system optimization problems have been recognized by more and more scholars. The document "micro-grid composite energy storage coordination control method based on deep reinforcement learning" (zhuan-san-shi, qiuliming, zhuoxia, xushunwei, haxing-grid technology, 2019,43 (06): 1914-1921) constructs an islanded integrated energy system and realizes optimized scheduling by improving the reinforcement learning method. Although such research can use RL to obtain the Pareto optimal solution set of a multi-objective optimization problem, the reinforcement learning algorithm loses its advantages when handling continuous variables and the plug-and-play characteristic of DG.
Disclosure of Invention
To solve the problems in the prior art, the invention provides a power grid multi-time scale economic dispatching method based on a consistency reinforcement learning algorithm. The method uses a reinforcement learning algorithm to resolve the uncertainty of the unit combination, optimizes the economic output of each unit with a faster consistency algorithm to obtain the optimal economic power distribution, and gradually eliminates the uncertainty of the wind power output with a multi-time-scale scheduling strategy, ensuring stable operation of the smart grid as a whole.
In order to achieve the above purpose, the present application provides a power grid multi-time scale economic dispatching method based on a consistency reinforcement learning algorithm, including the following steps:
step 1, acquiring network topology data of the smart grid, and establishing a joint economic dispatch model based on unit combination and load distribution;
step 2, solving the joint economic dispatch model preliminarily with a reinforcement learning algorithm to obtain the rough output of each adjustable unit $i$ in the smart grid at time $t$ and its accurate start-stop status $I_{i,t}$;
step 3, with reference to the accurate start-stop status $I_{i,t}$ from step 2, optimizing the rough output from step 2 with a fully distributed consistency algorithm to obtain the accurate output $P_{i,t}$ of each adjustable unit $i$ at time $t$, thereby completing the preliminary optimized scheduling of each adjustable unit;
step 4, according to the predicted wind power and the marginal cost of the flexible loads, further optimizing the scheduling of the flexible loads among the adjustable units on three time scales (24 h, 1 h and 15 min) with a multi-time-scale scheduling strategy, eliminating the uncertainty of the wind power output so as to achieve economic dispatch of the power grid.
In some embodiments, in step 1, the joint economic dispatch model based on unit combination and load distribution is expressed as:
[Equation (1): objective function of the joint economic dispatch model; image not reproduced in this text]
where $N$ is the total number of adjustable units, which comprise thermal generators, wind turbines and flexible loads; $T$ is the total scheduling horizon; $I_{i,t}$ indicates whether adjustable unit $i$ is running or stopped at time $t$; $C_{i,D}(t)$ is the shutdown cost of the $i$-th adjustable unit at time $t$; $C_{i,U}(t)$ is its start-up cost at time $t$; and $C_i(P_{i,t})$ is the cost function of the $i$-th adjustable unit at power $P_{i,t}$ at time $t$, where the power is the generated power for a generator and the absorbed power for a flexible load. The cost function is given by:
[Equation (2): cost function $C_i(P_{i,t})$; image not reproduced in this text]
where $\alpha_i$, $\beta_i$ and $\gamma_i$ are the cost coefficients of adjustable unit $i$;
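As an illustration of how such an objective can be evaluated, the following Python sketch assumes a quadratic running cost $\alpha_i P^2 + \beta_i P + \gamma_i$ and the usual unit-commitment form in which a start-up cost is charged on an off-to-on transition and a shutdown cost on an on-to-off transition. The assignment of $\alpha_i$, $\beta_i$, $\gamma_i$ to the quadratic, linear and constant terms, the class `Unit` and the function names are assumptions made for this example; the source gives formulas (1) and (2) only as images.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Unit:
    alpha: float          # assumed quadratic cost coefficient
    beta: float           # assumed linear cost coefficient
    gamma: float          # assumed constant cost term
    startup_cost: float   # C_{i,U}, taken as constant over time here
    shutdown_cost: float  # C_{i,D}, taken as constant over time here


def running_cost(unit: Unit, power: float) -> float:
    """Assumed quadratic cost C_i(P) = alpha*P^2 + beta*P + gamma."""
    return unit.alpha * power * power + unit.beta * power + unit.gamma


def total_cost(units: Sequence[Unit],
               power: Sequence[Sequence[float]],
               status: Sequence[Sequence[int]],
               initial_status: Sequence[int]) -> float:
    """Sum running, start-up and shutdown costs over all periods.

    power[t][i] and status[t][i] hold P_{i,t} and I_{i,t} (0 or 1).
    """
    total = 0.0
    previous = list(initial_status)
    for p_t, s_t in zip(power, status):
        for unit, p, s, s_prev in zip(units, p_t, s_t, previous):
            total += s * running_cost(unit, p)               # only running units incur C_i
            total += unit.startup_cost * s * (1 - s_prev)    # off -> on transition
            total += unit.shutdown_cost * s_prev * (1 - s)   # on -> off transition
        previous = list(s_t)
    return total
```

For a single unit that starts up in the first period and shuts down in the second, the function adds one running cost, one start-up cost and one shutdown cost.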
all adjustable units $i = 1, 2, \ldots, N$ in the joint economic dispatch model must satisfy the following constraints: the power distribution constraint, the adjustable unit capacity constraint, the unit minimum continuous operation/shutdown time constraint, and the unit climbing (ramp-rate) constraint;
the power distribution constraint is expressed as:
[Equation (3): power distribution constraint; image not reproduced in this text]
where $P_{loss}$ is the power loss (transmission loss generally accounts for about 3% to 7% of the total load) and $D$ is the non-adjustable rigid power, comprising the rigid load and the wind power;
the capacity constraint of the adjustable unit is expressed as:
$P_i^{\min} \le P_i \le P_i^{\max}$ (4)
where $P_i^{\min}$ and $P_i^{\max}$ are the minimum and maximum output power limits of adjustable unit $i$;
the minimum continuous operation/shutdown time constraint of the unit is expressed as:
[Equation (5): minimum continuous operation/shutdown time constraint; image not reproduced in this text]
where $T_{i,U}$ and $T_{i,D}$ are the minimum continuous operation time and the minimum continuous shutdown time of adjustable unit $i$; $X_{i,ON}(t-1)$ is the total time for which adjustable unit $i$ has been running continuously up to time $t-1$; $X_{i,OFF}(t-1)$ is the total time for which it has been stopped continuously up to time $t-1$; and $I_{i,t-1}$ indicates whether adjustable unit $i$ is running or stopped at time $t-1$;
the unit climbing constraint is expressed as:
$-R_{i,D} \le (P_{i,t} - P_{i,t-1})\, I_{i,t}\, I_{i,t-1} \le R_{i,U}$ (6)
where $R_{i,U}$ and $R_{i,D}$ are the upper and lower climbing limits of the $i$-th adjustable unit, and $P_{i,t-1}$ is the accurate output of adjustable unit $i$ at time $t-1$.
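The constraints can be checked per unit and per period, as in the hedged Python sketch below. Formulas (4) and (6) are taken from the text. The minimum operation/shutdown time check uses a common unit-commitment form (a unit may switch off only after running for at least $T_{i,U}$ and switch on only after being stopped for at least $T_{i,D}$), which is an assumption because formula (5) is only available as an image. The rule that a stopped unit has zero output is likewise an assumption, and all function names are hypothetical.

```python
def capacity_ok(power: float, p_min: float, p_max: float, on: int) -> bool:
    """Formula (4) for a running unit; a stopped unit is assumed to output zero."""
    if not on:
        return power == 0.0
    return p_min <= power <= p_max


def ramp_ok(p_now: float, p_prev: float, on_now: int, on_prev: int,
            ramp_up: float, ramp_down: float) -> bool:
    """Formula (6): -R_D <= (P_t - P_{t-1}) * I_t * I_{t-1} <= R_U.

    The product with both status flags means the limit only binds
    when the unit runs in two consecutive periods.
    """
    delta = (p_now - p_prev) * on_now * on_prev
    return -ramp_down <= delta <= ramp_up


def min_time_ok(on_now: int, on_prev: int, time_on: float, time_off: float,
                min_up: float, min_down: float) -> bool:
    """Assumed standard form of the minimum operation/shutdown time constraint.

    time_on / time_off play the role of X_{i,ON}(t-1) / X_{i,OFF}(t-1).
    """
    if on_prev and not on_now:      # switching off
        return time_on >= min_up
    if on_now and not on_prev:      # switching on
        return time_off >= min_down
    return True                     # no status change, nothing to check
```

A scheduler or a reward function could call these checks for each candidate action and penalize any action for which one of them returns `False`.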
In some embodiments, step 2 specifically includes the following steps:
step 201: using the accurate output $P_{i,t}$ and the accurate start-stop status $I_{i,t}$ of each adjustable unit $i$ at time $t$ as the elements of the state set $S_{I,t}$ and the action set $A_{I,t}$ of the adjustable unit set $I$, and establishing an initial state-action set function $Q(S_{I,t}, A_{I,t})$, where $Q(S_{I,t}, A_{I,t})$ is the set of state-action value functions $Q(S_{i,t}, A_{i,t})$ at time $t$ of the adjustable units $i$ in the set $I$ represented by all agents;
step 202: selecting the action value $a_{I,t+1}$ of the adjustable unit set $I$ at the next time $t+1$ with a greedy algorithm, forming the action set $A_{I,t+1}$ at the next time $t+1$;
step 203: using the reward-penalty function $rew$ to accept or reject the actions in $A_{I,t+1}$ according to their reward value $r$, and selecting the optimal action value;
step 204: updating the state-action set function $Q(S_{I,t+1}, A_{I,t+1})$ at the next time $t+1$ according to the action value $a_{I,t}$ of the adjustable unit set $I$ at the current time $t$ and the optimal action value $a_{I,t+1}$ at the next time $t+1$, then returning to step 202 and repeating steps 202 to 204;
step 205: when the state-action value function $Q(S_{i,t}, A_{i,t})$ has been cumulatively updated to a preset degree, taking the action with the highest value as optimal to obtain the optimal unit scheduling strategy $\pi^*$, that is, the accurate start-stop status $I_{i,t}$ and the rough output of each adjustable unit $i$ in the smart grid at time $t$.
In some embodiments, in step 201, the accurate output $P_{i,t}$ and the accurate start-stop status $I_{i,t}$ of each adjustable unit $i$ at time $t$ are used as the elements of the state set $S_{I,t}$ and the action set $A_{I,t}$ of the adjustable unit set $I$, specifically:
[Equation (7): definition of the state set $S_{I,t}$ and the action set $A_{I,t}$; image not reproduced in this text]
in step 203, according to the constraints of the joint economic dispatch model, the reward function $rew_{i,t}$ obtained after the agent of adjustable unit $i$ executes an action at time $t$ is defined as:
$rew_{i,t} = r_1 + r_2 + r_3 + r_4$ (8)
[Equations (9) to (13): definitions of the reward terms; images not reproduced in this text]
where $r_1$ is the power balance deviation reward term, $r_2$ is the climbing constraint reward term, and $r_3$ and $r_4$ are the reward terms for the minimum continuous operation and shutdown time constraints; $\Delta p_1$ and $\Delta p_2$ are deviation thresholds, and $r_1$ is selected according to the degree of deviation between the sum of the collected powers and the total load $L_{all}$, which achieves a rough adjustment of the power balance;
in step 204, the iterative update formula of the state-action set function is:
[Equation (14): iterative update of the state-action set function; image not reproduced in this text]
where the subscript $I$ denotes the set of adjustable units $i$ represented by all agents; $\eta$ is the learning rate when the agent of adjustable unit $i$ takes action $A_{i,t}$ in state $S_{i,t}$ at time $t$; $\tau$ is the discount coefficient; and $rew_{I,t}$ is the set of reward variables generated by the decision actions of the adjustable unit set $I$ at the current time $t$;
in step 205, when the state-action value function has been cumulatively updated to a preset degree, the action with the highest value is taken as optimal, giving the optimal unit scheduling strategy $\pi^*(s_{i,t})$:
[Equation (15): optimal strategy $\pi^*(s_{i,t})$; image not reproduced in this text]
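Steps 202 to 205 follow the pattern of tabular value-based reinforcement learning. The Python sketch below shows the textbook Q-learning update with learning rate $\eta$ and discount coefficient $\tau$, together with an epsilon-greedy action choice. It is offered as an assumption about the general shape of the procedure, not as a reproduction of formulas (14) and (15), which are only available as images. The exploration parameter `epsilon`, the dictionary-based Q-table and all function names are hypothetical.

```python
import random
from collections import defaultdict


def select_action(q, state, actions, epsilon, rng):
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, actions, eta, tau):
    """Textbook Q-learning update:
    Q(s,a) <- Q(s,a) + eta * (reward + tau * max_a' Q(s',a') - Q(s,a)).
    """
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += eta * (reward + tau * best_next - q[(state, action)])


def greedy_policy(q, state, actions):
    """Step 205 analogue: pick the action with the highest learned value."""
    return max(actions, key=lambda a: q[(state, a)])


def make_q_table():
    """Q-table with all values initialised to zero."""
    return defaultdict(float)
```

In a dispatch setting the reward passed to `q_update` would be the sum $r_1 + r_2 + r_3 + r_4$ of formula (8), and the action would encode the start-stop decision and a discretized output level.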
In some embodiments, step 3 specifically includes the following steps:
step 301: initializing the state and the output of each adjustable unit according to the accurate start-stop status $I_{i,t}$ and the rough output obtained in step 205;
step 302: iterating on the output of each adjustable unit with a consistency dual-order gradient descent estimation algorithm and updating the consistency variable of each adjustable unit; when all consistency variables converge to the same value, the result is the optimal planned output of each adjustable unit.
In some embodiments, in step 301, the initial variable $P_i(0)$ is set so as to satisfy the initial constraint formula, so that the power deviation of the iteration result approaches 0; the consistency matrix $W$ is constructed using the Laplacian matrix; the initial constraint formula and the consistency matrix $W$ are given by formulas (16) and (17):
[Equation (16): initial constraint formula; image not reproduced in this text]
[Equation (17): consistency matrix $W$; image not reproduced in this text]
where $P_i(0)$ is the rough output from step 205, used as the initial variable in this step; the deviation adjustment term of adjustable unit $i$ is evaluated at iteration 0; $D$ is the non-adjustable rigid power; and $L(G)$ is the 0-1 matrix of the graph $G$, with zero diagonal elements and off-diagonal elements $d_{ij}$.
In some embodiments, in step 302, the iterative update formulas of the consistency dual-order gradient descent estimation algorithm are:
[Equations (18) to (20): iterative update formulas; images not reproduced in this text]
where $\mu_i(k)$ is the incremental consumption rate (micro-increment rate) of adjustable unit $i$ at the $k$-th iteration, which also serves as the consistency variable; the deviation adjustment term of adjustable unit $i$ is evaluated at the $k$-th iteration; $\delta$ is the adjustment coefficient, set to 0.01; $w_{ij}$ is an element of the matrix $W$, and $v_{ij}$ is an element of the transpose of $W$; $P_i(k)$ is the accurate power of adjustable unit $i$ after the $k$-th iteration. During the iteration, formula (20) determines the convergence direction of the consistency variable, so that the optimization result continuously approaches the optimal solution satisfying the power balance constraint.
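A classical consensus scheme with a power-mismatch feedback term has the same ingredients as the description above: an incremental-cost consistency variable $\mu_i$, a deviation adjustment term, a weight matrix $W$ and its transpose, and a small coefficient $\delta$. The Python sketch below implements that classical scheme as an illustration; it is not claimed to match formulas (18) to (20), which are only available as images. It assumes a quadratic cost with $\mu_i = 2\alpha_i P_i + \beta_i$, ignores capacity limits, builds $W$ as the identity minus a scaled graph Laplacian, and splits the initial mismatch equally among the units. These choices and all names are assumptions made for the example.

```python
def consensus_matrix(adjacency, step):
    """Assumed construction: W = I - step * L, with L the graph Laplacian."""
    n = len(adjacency)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        degree = sum(adjacency[i])
        for j in range(n):
            laplacian_ij = degree if i == j else -adjacency[i][j]
            w[i][j] = (1.0 if i == j else 0.0) - step * laplacian_ij
    return w


def consensus_dispatch(alpha, beta, p0, demand, w, delta=0.01, iterations=400):
    """Classical incremental-cost consensus with mismatch feedback.

    mu_i(k+1) = sum_j w_ij * mu_j(k) + delta * y_i(k)
    P_i(k+1)  = (mu_i(k+1) - beta_i) / (2 * alpha_i)
    y_i(k+1)  = sum_j w_ji * y_j(k) - (P_i(k+1) - P_i(k))
    """
    n = len(p0)
    p = list(p0)
    mu = [2.0 * alpha[i] * p[i] + beta[i] for i in range(n)]
    # Equal split of the initial mismatch; a field deployment would use local data.
    y = [(demand - sum(p0)) / n] * n
    for _ in range(iterations):
        mu_new = [sum(w[i][j] * mu[j] for j in range(n)) + delta * y[i]
                  for i in range(n)]
        p_new = [(mu_new[i] - beta[i]) / (2.0 * alpha[i]) for i in range(n)]
        # The mismatch estimate is mixed with the transpose of W.
        y = [sum(w[j][i] * y[j] for j in range(n)) - (p_new[i] - p[i])
             for i in range(n)]
        mu, p = mu_new, p_new
    return mu, p, y
```

Because the columns of the transposed matrix sum to one, the quantity $\sum_i y_i + \sum_i P_i$ stays equal to the demand at every iteration, so driving the $y_i$ to zero enforces the power balance. Equal values of `alpha` are used in the accompanying check only to keep the example simple.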
The exponential moving averages of the first-order and second-order gradients are updated, and the hyperparameters $\chi_1$ and $\chi_2$ control their exponential decay rates, to realize the optimization. The iterative formulas for the biased first-order gradient descent estimate $m(k)$ and second-order gradient descent estimate $v(k)$ at the $k$-th iteration are:
[Equation (21): biased estimates $m(k)$ and $v(k)$; image not reproduced in this text]
which use the gradient and the second-order gradient of the cost function $C$ with respect to the power $P$ after the $(k-1)$-th iteration. The iterative formulas for the bias-corrected first-order gradient descent estimate $m'(k)$ and second-order gradient descent estimate $v'(k)$ are:
[Equation (22): bias-corrected estimates $m'(k)$ and $v'(k)$; image not reproduced in this text]
which use the $k$-th powers of the hyperparameters $\chi_1$ and $\chi_2$. The core update iteration formulas of the proposed fully distributed consistency algorithm are:
[Equations (23) and (24): core update iteration formulas; images not reproduced in this text]
where $\varepsilon$ is the coefficient used for fine tuning; $w_{ij}'$ is the element in row $i$, column $j$ of the transformation matrix $W'$, which satisfies $\mathbf{1}^T W' = \mathbf{0}^T$ and $W'\mathbf{1} = \mathbf{0}$; $X_i$ is the index set of the variables adjacent to consistency variable $i$; and $x_i$ is the weight of consistency variable $i$. With formulas (19) and (23) applied in distributed form, each variable $\mu_i$ and $P_i$ receives only the parameter information of its adjacent variables $\mu_j$ and $P_j$, yet the algorithm still reaches the global optimal solution.
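The description of biased and bias-corrected moving averages controlled by two decay hyperparameters resembles the moment estimates of the Adam optimizer. The Python sketch below shows those Adam-style estimates under that assumption. It interprets the "second-order" quantity as the squared gradient, uses Adam's customary default decay rates 0.9 and 0.999, and adds a small constant `tiny` for numerical safety; none of these choices is confirmed by the source, and the function names are hypothetical.

```python
import math


def moment_estimates(grad, m_prev, v_prev, k, chi1=0.9, chi2=0.999):
    """Adam-style biased and bias-corrected moment estimates at iteration k >= 1."""
    m = chi1 * m_prev + (1.0 - chi1) * grad           # biased first-moment estimate
    v = chi2 * v_prev + (1.0 - chi2) * grad * grad    # biased second-moment estimate
    m_hat = m / (1.0 - chi1 ** k)                     # bias correction with chi1^k
    v_hat = v / (1.0 - chi2 ** k)                     # bias correction with chi2^k
    return m, v, m_hat, v_hat


def adaptive_step(value, m_hat, v_hat, epsilon=0.01, tiny=1e-8):
    """Move `value` against the corrected gradient with an adaptive step size."""
    return value - epsilon * m_hat / (math.sqrt(v_hat) + tiny)
```

At the first iteration, with both previous estimates equal to zero, the bias correction returns the raw gradient and its square, which is the purpose of dividing by $1 - \chi^k$.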
In some embodiments, step 4 specifically includes the following steps:
step 401: building a prediction model of the wind turbine power $P_{iw}$ that takes uncertainty into account, and constructing a marginal cost function for the flexible loads, namely excitation-type (incentive-based) loads and interruptible loads;
step 402: optimizing the scheduling of each flexible load resource on three time scales (the day-ahead 24 h stage, the intra-day 1 h stage and the intra-day 15 min stage), so as to reduce the unbalanced power and absorb the uncertainty of the wind power output as far as possible.
In some embodiments, in step 401, the power $P_{iw}(t)$ of the $i$-th wind turbine at time $t$, taking uncertainty into account, is assumed to approximately follow the normal distribution $N(P_{wf}(t), \sigma^2)$, where the mean $P_{wf}(t)$ is the predicted active power output of the wind turbine at time $t$ and the variance $\sigma^2$ characterizes the error level of the power prediction, which varies with the prediction time scale; the active output of the wind turbine can be expressed as:
[Equation (25): active output of the wind turbine; image not reproduced in this text]
where $x$ is the actual value substituted into the expression;
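The normal-distribution assumption can be illustrated with the short Python sketch below, which evaluates the density of $N(P_{wf}(t), \sigma^2)$ and draws sample outputs around a forecast. The stage-dependent relative errors in `ASSUMED_RELATIVE_SIGMA` are hypothetical numbers chosen only to show that a shorter horizon means a smaller $\sigma$; the source gives no values, and the function names are likewise assumptions.

```python
import math
import random


def wind_power_density(x, forecast, sigma):
    """Density of the normal distribution N(forecast, sigma^2) evaluated at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - forecast) ** 2) / (2.0 * sigma ** 2))


def sample_wind_power(forecast, sigma, rng, n):
    """Draw n possible wind outputs around the forecast value."""
    return [rng.gauss(forecast, sigma) for _ in range(n)]


# Hypothetical relative forecast errors per scheduling stage (not from the source).
ASSUMED_RELATIVE_SIGMA = {"24h": 0.15, "1h": 0.08, "15min": 0.03}


def sigma_for_stage(forecast, stage):
    """Standard deviation of the forecast error for a given stage."""
    return ASSUMED_RELATIVE_SIGMA[stage] * forecast
```

Sampling with `random.Random(seed)` keeps scenario generation reproducible, which helps when the same scenarios are reused across the scheduling stages.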
for the flexible loads, users decide whether to trade according to the current market price, and the marginal cost is given by:
$\mu_{h1} = -\theta_{h0}(1 + 2\Delta P_{Dh})/\varphi_h P_{Dh0} + a_{h2}\theta_{h0}$ (26)
where $\varphi_h$ is the self-elasticity coefficient of the $h$-th flexible load, $\Delta P_{Dh}$ is the adjustment of the $h$-th flexible load power, $P_{Dh0}$ is the initial power of the $h$-th flexible load, $\theta_{h0}$ is its initial electricity price, and $a_{h2}$ is its electricity price conversion rate; the scheduling cost of the flexible load is:
$C_{Lh1}(\Delta P_{Dh}) = \mu_{h1} \cdot \Delta P_{Dh}$ (27)
where $C_{Lh1}(\Delta P_{Dh})$ is the scheduling cost of the $h$-th flexible load for the power adjustment $\Delta P_{Dh}$.
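Formulas (26) and (27) translate directly into code, as in the Python sketch below. One reading is assumed and should be checked against the original drawing of the formula: the product $\varphi_h P_{Dh0}$ is taken as the whole denominator of the first term. The function and parameter names, and the negative self-elasticity value used in the usage note, are assumptions made for the example.

```python
def marginal_cost(delta_p, theta0, phi, p0, a2):
    """Formula (26), assuming phi * p0 forms the denominator of the first term."""
    return -theta0 * (1.0 + 2.0 * delta_p) / (phi * p0) + a2 * theta0


def scheduling_cost(delta_p, theta0, phi, p0, a2):
    """Formula (27): marginal cost multiplied by the power adjustment."""
    return marginal_cost(delta_p, theta0, phi, p0, a2) * delta_p
```

For example, with an initial price of 0.5, a self-elasticity coefficient of -0.2, an initial power of 100, a price conversion rate of 1.0 and an adjustment of 2, the marginal cost is 0.625 and the scheduling cost is 1.25 under this reading.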
In some embodiments, in step 402, the scheduling strategy is divided into 3 time scales: the day-ahead 24 h stage, the intra-day 1 h stage and the intra-day 15 min stage. The adjustable units and the excitation-type and interruptible load resources participate in the optimized scheduling on these three time scales:
1. day-ahead 24 h scheduling stage: executed once every 24 hours; the 24 hours are divided into 96 time periods, and a 96-period scheduling plan is made for the unit combination of the next day;
2. intra-day 1 h scheduling stage: executed once every hour; the hour is divided into 4 time periods, and a 4-period scheduling plan is made for the power distribution of the adjustable units over the coming hour;
3. intra-day 15 min scheduling stage: executed once every 15 minutes; a final distribution plan is made for the flexible load output over the next 15 minutes.
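The three stages share one grid of 96 fifteen-minute periods per day, which the Python sketch below makes explicit. It reports which stages are due at the start of each period and which periods each plan covers. Treating the day-ahead stage as triggered at period 0 is a simplification for illustration (in practice it runs before the operating day), and the stage labels and function names are hypothetical.

```python
PERIODS_PER_HOUR = 4
HOURS_PER_DAY = 24
PERIODS_PER_DAY = PERIODS_PER_HOUR * HOURS_PER_DAY  # 96 periods of 15 minutes


def stages_due(period):
    """Return the scheduling stages that run at the start of a 15-minute period."""
    if not 0 <= period < PERIODS_PER_DAY:
        raise ValueError("period must be in 0..95")
    stages = []
    if period == 0:
        stages.append("day-ahead 24 h")      # simplification: tied to period 0
    if period % PERIODS_PER_HOUR == 0:
        stages.append("intra-day 1 h")       # once per hour
    stages.append("intra-day 15 min")        # every period
    return stages


def planning_window(stage, period):
    """Return the range of 15-minute periods covered by a plan made at `period`."""
    if stage == "day-ahead 24 h":
        return range(0, PERIODS_PER_DAY)
    if stage == "intra-day 1 h":
        return range(period, period + PERIODS_PER_HOUR)
    return range(period, period + 1)
```

Counting the stages over one day gives 1 day-ahead run, 24 hourly runs and 96 fifteen-minute runs, matching the execution frequencies listed above.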
The power grid multi-time scale economic dispatching method based on the consistency reinforcement learning algorithm has the following advantages:
(1) The adjustable units are processed with a reinforcement learning algorithm, which computes the optimal state and rough output of each adjustable unit. This achieves globally optimal scheduling of the unit combination, prepares for the subsequent optimal power distribution, and copes with smart grid environments that have large data volumes and complex topologies.
(2) The method is fully distributed: all variables are linked through the topology data of the grid structure and the consistency principle, and the uncertainty of the unit combination is taken into account. When a unit shuts down or a new unit is put into operation, a new global optimal solution can therefore be computed, so the method adapts to the plug-and-play characteristic of distributed energy.
(3) The method combines the equal micro-increment rate criterion of power systems with the consistency principle and adds adjustment terms. This ensures that the micro-increment rate of each adjustable unit converges to the optimal value during iterative updating, while the unbalanced power of the whole system gradually decreases, achieving both optimal economic dispatch and reasonable operating results.
(4) A dual-order gradient descent estimation algorithm is added as an improvement. It further optimizes the power distribution of each adjustable unit computed by the consistency algorithm, reduces the overall number of iterations, and improves the convergence speed.
(5) Because the prediction accuracy of the wind power output improves as the time scale shortens, and because the flexible loads differ in scheduling elasticity, the flexible loads are optimized carefully on three time scales. This largely eliminates the uncertainty of the wind power output, reduces the wind curtailment rate, and improves the stability and economy of the system.
(6) Compared with conventional planning algorithms, the method has better real-time performance: when the computing environment changes, the variables need not all be re-initialized and recomputed, and decisions are made from the current state variables, so online decision-making is possible. Compared with popular heuristic algorithms, the method converges both quickly and more stably, and is more robust. In addition, the method combines the variable information of each node through the consistency principle to complete the global optimization, achieving fully distributed computation without a centralized control center.
(7) The method is an economic dispatch system that controls all adjustable units in the smart grid, such as generators and flexible loads. It includes multi-time-scale optimization, optimizes all adjustable units economically at every level from fine-grained to overall, and forms a complete theoretical system.
Drawings
Fig. 1 shows a flowchart of a power grid multi-time scale economic dispatching method based on a consistency reinforcement learning algorithm in an embodiment.
FIG. 2 illustrates an IEEE-30 node Smart grid simulation in an embodiment.
FIG. 3 shows a wind power output multi-time scale prediction data graph in an embodiment.
Fig. 4 shows a simulation diagram of the consistent variable under single scheduling in the embodiment.
Fig. 5 shows an unbalanced power simulation diagram under single scheduling in an embodiment.
Fig. 6 shows a simulation diagram of the consistent variable under plug and play in the embodiment.
Fig. 7 shows an unbalanced power dissipation simulation diagram in an embodiment.
Detailed Description
The following description of the embodiments of the present application will be made with reference to the accompanying drawings.
As shown in fig. 1, the power grid multi-time scale economic dispatching method based on the consistency reinforcement learning algorithm includes the following steps:
Step 1, acquiring network topology data of the smart grid, and establishing a joint economic dispatch model based on unit combination and load distribution.
Step 2, solving the joint economic dispatch model preliminarily with a reinforcement learning algorithm to obtain the rough output of each adjustable unit $i$ in the smart grid at time $t$ and its accurate start-stop status $I_{i,t}$.
Step 3, with reference to the accurate start-stop status $I_{i,t}$ from step 2, optimizing the rough output from step 2 with a fully distributed consistency algorithm to obtain the accurate output $P_{i,t}$ of each adjustable unit $i$ at time $t$, thereby completing the preliminary optimized scheduling of each adjustable unit.
Step 4, according to the predicted wind power and the marginal cost of the flexible loads, further optimizing the scheduling of the flexible loads among the adjustable units on three time scales (24 h, 1 h and 15 min) with a multi-time-scale scheduling strategy, eliminating the uncertainty of the wind power output so as to achieve economic dispatch of the power grid and guarantee the operating stability of the whole smart grid system.
The following is a detailed description of the above steps:
the objective of the smart grid economic dispatch is to find a unit combination sequence and power distribution within a period of time T when a smart grid comprising a plurality of adjustable units supplies power to a total load, so that the total running cost of the smart grid is minimized.
In step 1, the joint economic dispatch model based on unit combination and load distribution is expressed as:
[Equation (1): objective function of the joint economic dispatch model; image not reproduced in this text]
where $N$ is the total number of adjustable units, which comprise thermal generators, wind turbines and flexible loads; $T$ is the total scheduling horizon; $I_{i,t}$ indicates whether adjustable unit $i$ is running or stopped at time $t$; $C_{i,D}(t)$ is the shutdown cost of the $i$-th adjustable unit at time $t$; $C_{i,U}(t)$ is its start-up cost at time $t$; and $C_i(P_{i,t})$ is the cost function of the $i$-th adjustable unit at power $P_{i,t}$ at time $t$, where the power is the generated power for a generator and the absorbed power for a flexible load. The running cost function is typically a convex quadratic function:
[Equation (2): cost function $C_i(P_{i,t})$; image not reproduced in this text]
where $\alpha_i$, $\beta_i$ and $\gamma_i$ are the cost coefficients of adjustable unit $i$.
All adjustable units $i = 1, 2, \ldots, N$ in the joint economic dispatch model must satisfy the following constraints: the power distribution constraint, the adjustable unit capacity constraint, the unit minimum continuous operation/shutdown time constraint, and the unit climbing (ramp-rate) constraint.
The power distribution constraint is expressed as:
[Equation (3): power distribution constraint; image not reproduced in this text]
where $P_{loss}$ is the power loss (transmission loss generally accounts for about 3% to 7% of the total load) and $D$ is the non-adjustable rigid power, comprising the rigid load and the wind power.
The capacity constraint of the adjustable unit is expressed as:
$P_i^{\min} \le P_i \le P_i^{\max}$ (4)
where $P_i^{\min}$ and $P_i^{\max}$ are the minimum and maximum output power limits of adjustable unit $i$.
The minimum continuous operation/shutdown time constraint of the unit is expressed as:
[Equation (5): minimum continuous operation/shutdown time constraint; image not reproduced in this text]
where $T_{i,U}$ and $T_{i,D}$ are the minimum continuous operation time and the minimum continuous shutdown time of adjustable unit $i$; $X_{i,ON}(t-1)$ is the total time for which adjustable unit $i$ has been running continuously up to time $t-1$; $X_{i,OFF}(t-1)$ is the total time for which it has been stopped continuously up to time $t-1$; and $I_{i,t-1}$ indicates whether adjustable unit $i$ is running or stopped at time $t-1$.
The unit climbing constraint is expressed as:
$-R_{i,D} \le (P_{i,t} - P_{i,t-1})\, I_{i,t}\, I_{i,t-1} \le R_{i,U}$ (6)
where $R_{i,U}$ and $R_{i,D}$ are the upper and lower climbing limits of the $i$-th adjustable unit, and $P_{i,t-1}$ is the accurate output of adjustable unit $i$ at time $t-1$.
In this embodiment, the step 2 specifically includes the following steps:
Step 201: Use the precise output $P_{i,t}$ and the precise start-stop state $I_{i,t}$ of each adjustable unit $i$ in the smart grid at time $t$ to represent the elements of the state set $S_{I,t}$ and the action set $A_{I,t}$ of the adjustable-unit set $I$, and establish an initial state-action set function $Q(S_{I,t}, A_{I,t})$, where $Q(S_{I,t}, A_{I,t})$ is the collection of the state-action value functions $Q(S_{i,t}, A_{i,t})$ at time $t$ of all adjustable units $i$ in the set $I$ represented by agents.
Step 202: Use a greedy algorithm to select the action values $a_{I,t+1}$ of the adjustable-unit set $I$ at the next time $t+1$, forming the action set $A_{I,t+1}$ at time $t+1$.
Step 203: Use the reward-penalty function $rew$ to accept or reject actions in the action set $A_{I,t+1}$ at the next time $t+1$ according to the reward value $r$, and select the optimal action value.
Step 204: According to the action value $a_{I,t}$ of the adjustable-unit set $I$ at the current time $t$ and the optimal action value $a_{I,t+1}$ at the next time $t+1$, update the state-action set function $Q(S_{I,t+1}, A_{I,t+1})$ at the next time $t+1$, then return to step 202 and repeat steps 202 to 204.
Step 205: When the state-action value function $Q(S_{i,t}, A_{i,t})$ has been cumulatively updated to a preset degree, the action value with the highest value function can be identified as optimal, giving the optimal unit-scheduling policy $\pi^*$, that is, the precise start-stop state $I_{i,t}$ of each adjustable unit $i$ in the smart grid at time $t$ and its coarse output
Figure BDA0003924566550000111
In step 201, based on the joint economic dispatch model of step 1, a reinforcement learning algorithm suited to large-scale data computation is used to obtain a preliminary solution. The precise output $P_{i,t}$ and precise start-stop state $I_{i,t}$ of each adjustable unit $i$ at time $t$ represent the elements of the state set $S_{I,t}$ and the action set $A_{I,t}$ of the adjustable-unit set $I$, as follows:
Figure BDA0003924566550000112
In step 203, according to the constraints of the joint economic dispatch model, the reward function $rew_{i,t}$ obtained after the agent of adjustable unit $i$ executes an action at time $t$ is defined as:
$rew_{i,t} = r_1 + r_2 + r_3 + r_4$ (8)
Figure BDA0003924566550000113
Figure BDA0003924566550000114
Figure BDA0003924566550000121
Figure BDA0003924566550000122
Figure BDA0003924566550000123
where $r_1$ is the power-balance deviation reward term, $r_2$ is the ramping-constraint reward term, and $r_3$ and $r_4$ are the reward terms for the minimum continuous upper and lower limit constraints; $\Delta p_1$ and $\Delta p_2$ are deviation thresholds, and $r_1$ is selected according to the degree of deviation between the sum of the collected powers and the total load $L_{all}$, thereby achieving a coarse adjustment of the power balance.
In step 204, the iterative update formula of the state-action set function is as follows:
Figure BDA0003924566550000124
where the subscript $I$ denotes the set of all adjustable units $i$ represented by agents; $\eta$ is the learning rate when the agent of adjustable unit $i$ takes action $A_{i,t}$ in state $S_{i,t}$ at time $t$; $\tau$ is the discount coefficient; and $rew_{I,t}$ is the set of reward variables generated by the decision actions of the adjustable-unit set $I$ at the current time $t$.
The action set $A_{I,t}$ in equation (7) is updated by an ε-greedy algorithm: the action that currently has the highest value is selected with probability $1-\varepsilon$, and otherwise (with total probability $\varepsilon$) one of the other actions is selected with equal probability. After the selection, iterative formula (14) is updated, and the current unit scheduling policy $\pi(s_{i,t})$, that is, the set of current unit scheduling states, can be changed according to the current state-action value function $Q(S_{i,t}, A_{i,t})$. When the state-action value function has been cumulatively updated to a preset degree, the action value with the highest value function can be identified, giving the optimal unit-scheduling policy $\pi^*(s_{i,t})$, as follows:
Figure BDA0003924566550000125
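The ε-greedy selection and the value update of steps 202 to 204 can be sketched as follows. Because formula (14) is not reproduced in the text, the update below assumes the standard tabular Q-learning form $Q(s,a) \leftarrow Q(s,a) + \eta\,[r + \tau \max_{a'} Q(s',a') - Q(s,a)]$; the state labels, the two-action set, and the function names are hypothetical.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Pick the highest-valued action with probability 1 - epsilon,
    otherwise one of the remaining actions uniformly at random."""
    best = max(actions, key=lambda a: q[(state, a)])
    others = [a for a in actions if a != best]
    if others and rng.random() < epsilon:
        return rng.choice(others)
    return best


def q_update(q, state, action, reward, next_state, actions, eta, tau):
    """One tabular Q-learning step (assumed standard form, not formula (14) itself)."""
    target = reward + tau * max(q[(next_state, a)] for a in actions)
    q[(state, action)] += eta * (target - q[(state, action)])


q_table = defaultdict(float)   # unseen state-action pairs start at 0
actions = ["off", "on"]        # hypothetical start-stop actions
rng = random.Random(0)         # seeded for reproducibility
q_update(q_table, "s0", "on", reward=1.0, next_state="s1",
         actions=actions, eta=0.5, tau=0.9)
greedy_choice = epsilon_greedy(q_table, "s0", actions, epsilon=0.0, rng=rng)
```

With `epsilon=0.0` the selection is purely greedy; larger values trade exploitation for exploration of the other start-stop actions.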
In this embodiment, step 3 specifically includes the following steps:
Step 301: According to the precise start-stop state $I_{i,t}$ obtained in step 205 and the coarse output
Figure BDA0003924566550000126
initialize the state and output of each adjustable unit.
Step 302: Iterate on the output of each adjustable unit using the consistency dual-order gradient descent estimation algorithm, updating the consistency variable of each adjustable unit; when all consistency variables converge to the same value, the result is the planned optimal output of each adjustable unit.
Specifically, in step 301, when the initial values are set, the initial variable $P_i(0)$ must satisfy the initial constraint formula so that the power deviation of the iteration result approaches 0. A consistency matrix $W$ is constructed, with a Laplacian matrix used in its place; the initial constraint formula and the consistency matrix $W$ are given in formulas (16) and (17):
Figure BDA0003924566550000131
Figure BDA0003924566550000132
where $P_i(0)$ is the coarse output from step 205, used as the initial variable in this step;
Figure BDA0003924566550000133
denotes the deviation adjustment term of adjustable unit $i$ computed at iteration 0,
Figure BDA0003924566550000134
$D$ is the non-adjustable rigid power; and $L(G)$ is the matrix of the 0-1 graph $G$, whose diagonal elements are zero and whose off-diagonal elements are $d_{ij}$.
In step 302, the iterative update formulas of the consistency dual-order gradient descent estimation algorithm are as follows:
Figure BDA0003924566550000135
Figure BDA0003924566550000136
Figure BDA0003924566550000137
where $\mu_i(k)$ is the incremental cost (consumption micro-increment rate) of adjustable unit $i$ at the $k$-th iteration, which also serves as the consistency variable;
Figure BDA0003924566550000138
denotes the deviation adjustment term of adjustable unit $i$ computed at the $k$-th iteration,
Figure BDA0003924566550000139
$\delta$ is the adjustment coefficient, set to 0.01; $w_{ij}$ is an element of the matrix $W$ and $v_{ij}$ is an element of the transpose of $W$; and $P_i(k)$ is the precise power of adjustable unit $i$ after the $k$-th iteration. During the iteration, formula (20) determines the direction in which the consistency variables converge, so that the optimization result continually approaches the optimal solution that satisfies the power balance constraint.
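The following sketch shows one widely used form of a consistency iteration on incremental costs with a deviation feedback term. Since formulas (18) to (20) are not reproduced in the text, it is an assumption that they take this form. The sketch also assumes a quadratic cost $\alpha_i P^2 + \beta_i P$, swaps in Metropolis weights in place of the Laplacian-based matrix $W$, ignores capacity limits, and omits the dual-order gradient correction. The three-unit line network and all coefficients are hypothetical.

```python
def metropolis_weights(adjacency):
    """Doubly stochastic weights built only from neighbour degrees (an assumed choice)."""
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and adjacency[i][j]:
                w[i][j] = 1.0 / (1.0 + max(degree[i], degree[j]))
        w[i][i] = 1.0 - sum(w[i])   # diagonal entry makes the row sum to 1
    return w


def consensus_dispatch(alpha, beta, p0, demand, w, delta=0.01, iters=2000):
    """Iterate incremental costs until they agree and the power deviation vanishes."""
    n = len(p0)
    p = [float(x) for x in p0]
    mu = [2.0 * alpha[i] * p[i] + beta[i] for i in range(n)]  # marginal cost at coarse output
    mismatch = [(demand - sum(p)) / n] * n                     # local share of the deviation
    for _ in range(iters):
        # Consensus step on mu plus feedback of the local deviation estimate.
        mu = [sum(w[i][j] * mu[j] for j in range(n)) + delta * mismatch[i]
              for i in range(n)]
        # Output implied by the new incremental cost.
        p_new = [(mu[i] - beta[i]) / (2.0 * alpha[i]) for i in range(n)]
        # Deviation tracking with the transposed weights (v_ij = w_ji).
        mismatch = [sum(w[j][i] * mismatch[j] for j in range(n)) - (p_new[i] - p[i])
                    for i in range(n)]
        p = p_new
    return mu, p


adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # hypothetical 3-unit line network
alpha = [0.04, 0.05, 0.0625]                     # illustrative cost coefficients
beta = [2.0, 2.5, 3.0]
w = metropolis_weights(adjacency)
mu, p = consensus_dispatch(alpha, beta, p0=[30.0, 30.0, 30.0], demand=100.0, w=w)
```

Each unit only reads values from its neighbours in `adjacency`, yet the incremental costs in `mu` end up equal and the outputs in `p` sum to the demand, which is the behaviour the consistency variables are described as having.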
In this embodiment, optimization is achieved by updating the exponential moving averages of the first- and second-order gradients, with the hyperparameters $\chi_1$ and $\chi_2$ controlling the exponential decay rates. The iterative formulas for the biased first-order gradient descent estimate $m(k)$ and second-order gradient descent estimate $v(k)$ at the $k$-th iteration are as follows:
Figure BDA00039245665500001310
where
Figure BDA00039245665500001311
is the gradient of the cost function $C$ with respect to the power $P$ after the $(k-1)$-th iteration, and
Figure BDA00039245665500001312
is the second-order gradient of the cost function $C$ with respect to the power $P$ after the $(k-1)$-th iteration. The iterative formulas for the bias-corrected first-order gradient descent estimate $m'(k)$ and second-order gradient descent estimate $v'(k)$ are as follows:
Figure BDA0003924566550000141
where
Figure BDA0003924566550000142
and
Figure BDA0003924566550000143
denote the $k$-th powers of the hyperparameters $\chi_1$ and $\chi_2$, respectively. The core update iteration formulas of the proposed fully distributed consistency algorithm are as follows:
Figure BDA0003924566550000144
Figure BDA0003924566550000145
where $\varepsilon$ is the fine-tuning coefficient; $w'_{ij}$ is the element in row $i$, column $j$ of the transformation matrix $W'$, which satisfies $\mathbf{1}^T W' = \mathbf{0}^T$ and $W'\mathbf{1} = \mathbf{0}$; $X_i$ is the set of indices adjacent to consistency variable $i$; and $x_i$ is the weight of consistency variable $i$. With the distributed forms of formulas (19) and (23), the global optimal solution can still be computed even though each variable $\mu_i$ and $P_i$ receives only the parameter information of its neighboring variables $\mu_j$ and $P_j$.
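The exponential moving averages and their bias correction resemble the moment estimates of the Adam optimizer, and the sketch below follows that standard form. This is an assumption: the text describes the second estimate in terms of a second-order gradient, whereas the sketch uses the squared gradient as Adam does, and the default values of `chi1` and `chi2` are Adam's usual defaults rather than values from the patent.

```python
def bias_corrected_moments(gradients, chi1=0.9, chi2=0.999):
    """Return the bias-corrected estimates (m', v') after each iteration k."""
    m = 0.0
    v = 0.0
    history = []
    for k, g in enumerate(gradients, start=1):
        m = chi1 * m + (1.0 - chi1) * g        # biased first-order estimate m(k)
        v = chi2 * v + (1.0 - chi2) * g * g    # biased second estimate v(k), assumed g^2
        m_hat = m / (1.0 - chi1 ** k)          # correction using chi1 to the k-th power
        v_hat = v / (1.0 - chi2 ** k)          # correction using chi2 to the k-th power
        history.append((m_hat, v_hat))
    return history


# For a constant gradient the corrected estimates recover g and g^2 at every step.
estimates = bias_corrected_moments([2.0, 2.0, 2.0])
```

The correction matters mainly in the first iterations, when the uncorrected averages are still biased toward their zero initial values.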
The uncertainty of wind power output can cause a certain waste of resources in the energy scheduling of the smart grid; this surplus needs to be balanced and absorbed to improve energy utilization and economy. Because the prediction accuracy of wind power output improves as the time scale shortens, and each flexible load has a different scheduling elasticity, wind power prediction and flexible loads must be coordinated across different time scales when they participate in scheduling.
In this embodiment, the step 4 specifically includes the following steps:
Step 401: Build a prediction model for the wind turbine power $P_{iw}$ that accounts for uncertainty, and construct a marginal cost function for the flexible loads, namely incentive-type loads and interruptible loads.
Step 402: Optimize the scheduling of each flexible load resource on three time scales, namely the 24 h day-ahead scheduling stage, the 1 h intraday scheduling stage, and the 15 min intraday scheduling stage, so as to reduce the unbalanced power and absorb the uncertainty of wind power output as far as possible.
Specifically, in step 401, the power $P_{iw}(t)$ of the $i$-th wind turbine at time $t$, with uncertainty taken into account, is assumed to approximately follow the normal distribution $N(P_{wf}(t), \sigma^2)$, where the mean $P_{wf}(t)$ is the predicted active power output of the wind turbine at time $t$ and the variance $\sigma^2$ characterizes the error level of the power prediction, which varies with the prediction time scale. The active output of the wind turbine can be expressed as:
Figure BDA0003924566550000146
where $x$ is the actual value substituted into the expression.
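As a minimal sketch of the wind power model, the function below evaluates the density of $N(P_{wf}(t), \sigma^2)$ at a value $x$. Formula (25) is not reproduced in the text, so treating it as the normal density is an assumption; the per-stage values of $\sigma$ are hypothetical and only illustrate an error level that shrinks with the time scale.

```python
import math


def wind_power_pdf(x: float, p_forecast: float, sigma: float) -> float:
    """Density of N(p_forecast, sigma^2) evaluated at wind power x."""
    z = (x - p_forecast) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical forecast error levels (MW) that shrink with the time scale.
sigma_by_stage = {"24h": 6.0, "1h": 3.0, "15min": 1.0}
peak_density = {stage: wind_power_pdf(40.0, 40.0, sd)
                for stage, sd in sigma_by_stage.items()}
```

A smaller $\sigma$ concentrates the density around the forecast, which is why the shorter time scales leave less unbalanced power to absorb.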
For flexible loads, users decide whether to trade according to the current market price; the marginal cost is given by:
$\mu_{h1} = -\theta_{h0}\,(1 + 2\Delta P_{Dh}) / \phi_h P_{Dh0} + a_{h2}\,\theta_{h0}$ (26)
where $\phi_h$ is the self-elasticity coefficient of the $h$-th flexible load, $\Delta P_{Dh}$ is the adjustment difference of the $h$-th flexible load power, $P_{Dh0}$ is the initial power of the $h$-th flexible load, $\theta_{h0}$ is the initial electricity price of the $h$-th flexible load, and $a_{h2}$ is the electricity price conversion rate of the $h$-th flexible load. The scheduling cost of the flexible load is:
$C_{Lh1}(\Delta P_{Dh}) = \mu_{h1} \cdot \Delta P_{Dh}$ (27)
where $\Delta P_{Dh}$ is the adjustment difference of the $h$-th flexible load power and $C_{Lh1}(\Delta P_{Dh})$ is the flexible load scheduling cost at that adjustment difference.
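Formulas (26) and (27) can be evaluated as in the sketch below. It reads the denominator of formula (26) as the product $\phi_h P_{Dh0}$, which is an assumption about the intended grouping; the function names and the numeric values are illustrative only.

```python
def marginal_cost(delta_p, theta0, phi, p0, a2):
    """Formula (26), reading the denominator as (phi * p0) (an assumption)."""
    return -theta0 * (1.0 + 2.0 * delta_p) / (phi * p0) + a2 * theta0


def scheduling_cost(delta_p, theta0, phi, p0, a2):
    """Formula (27): marginal cost times the power adjustment."""
    return marginal_cost(delta_p, theta0, phi, p0, a2) * delta_p


# Illustrative values only; phi is taken negative, as is usual for a self-elasticity.
mu_h1 = marginal_cost(delta_p=0.5, theta0=0.5, phi=-0.2, p0=10.0, a2=1.0)
cost_h1 = scheduling_cost(delta_p=0.5, theta0=0.5, phi=-0.2, p0=10.0, a2=1.0)
```

With these values the first term is $-0.5 \times 2 / (-2) = 0.5$ and the second is $0.5$, so the marginal cost is 1.0 and the scheduling cost for an adjustment of 0.5 is 0.5.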
In step 402, the scheduling strategy is divided into three time scales: the 24 h day-ahead scheduling stage, the 1 h intraday scheduling stage, and the 15 min intraday scheduling stage. The adjustable units and the incentive-type and interruptible load resources participate in optimized scheduling on these three time scales:
1. 24 h day-ahead scheduling stage: executed once every 24 hours. The 24 hours are divided into 96 periods, and a 96-period schedule is made for the unit commitment of the next day. Based on the day-ahead prediction data, each adjustable unit is optimized using the reinforcement learning algorithm of step 2. The scheduling elasticity required of flexible loads at this stage is low, so flexible loads with slow response and long adjustment periods can also participate, which prevents power overshoot from harming the overall economy.
2. 1 h intraday scheduling stage: executed once every hour. The hour is divided into 4 periods, and a 4-period schedule is made for the power allocation of the adjustable units over the next hour. Based on the 1 h intraday prediction data and the result of the day-ahead stage, the power of each adjustable unit over the next hour is adjusted using the fully distributed consistency algorithm of step 3. The scheduled objects are flexible loads with fast response and short adjustment periods, and the required scheduling elasticity is high.
3. 15 min intraday scheduling stage: executed once every 15 minutes, making the final allocation plan for the flexible load output over the next 15 minutes. Based on the 15 min intraday prediction data and the 1 h intraday scheduling result, power adjustments are made to flexible loads with extremely fast response and extremely short adjustment times so as to absorb the uncertainty of wind power output to the greatest extent.
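The cadence of the three stages can be sketched as a simple slot calendar. The constants follow the text (96 periods per day, 4 periods per hour, 15 minutes per period); the function names and the rule that each stage is triggered at the start of the interval it covers are simplifying assumptions.

```python
from datetime import datetime, timedelta

SLOT = timedelta(minutes=15)      # finest scheduling period
SLOTS_PER_HOUR = 4                # the 1 h intraday stage covers 4 periods
SLOTS_PER_DAY = 96                # the 24 h day-ahead stage covers 96 periods


def day_ahead_slots(day_start: datetime) -> list:
    """Start times of the 96 periods planned in the day-ahead stage."""
    return [day_start + k * SLOT for k in range(SLOTS_PER_DAY)]


def stages_triggered(slot_index: int) -> list:
    """Scheduling stages that run at the start of a given 15 min period."""
    stages = ["15min"]                       # runs every period
    if slot_index % SLOTS_PER_HOUR == 0:
        stages.insert(0, "1h")               # runs at each hour boundary
    if slot_index % SLOTS_PER_DAY == 0:
        stages.insert(0, "24h")              # runs once per day
    return stages


slots = day_ahead_slots(datetime(2022, 11, 1))   # arbitrary example date
```

Running the coarser stages first lets each finer stage start from the result of the stage above it, matching the order in which the three stages are described.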
The effectiveness and superiority of the method according to the present application are verified by the following examples.
An IEEE 30-node smart grid simulation system was built and simulation experiments were carried out. The IEEE 30-node test case is used to verify the effectiveness of the proposed method under different operating conditions and its superiority over conventional algorithms. The IEEE 30-node power network is used to simulate a smart grid system; its structure is shown in Fig. 2. The generator at node 11 is replaced by a wind turbine, and the 20 loads are classified, with the loads at nodes 2, 4, 7, 8, 10, 12, 17, 19, 24, and 29 being flexible loads. The multi-time-scale prediction data for wind power (WP) output are shown in Fig. 3.
Assume that the current total rigid load is 118 MW. First, in the strategy optimization of the 24 h day-ahead scheduling stage, the reinforcement learning algorithm determines that 5 generators in the IEEE 30-node simulation system need to be started, at which point the total power imbalance is 0.94 MW. This means that the IEEE 30-node simulation system receives a single dispatch instruction (SDI) and performs a single dispatch action.
Then, in the strategy optimization of the two intraday scheduling stages, the power output of each adjustable unit is finely allocated by the improved consistency algorithm of this application; the simulation data are shown in Figs. 4 and 5.
As the two simulation plots show, the consistency variables of the 15 adjustable units converge to the same value and stabilize, and the unbalanced power of the system finally approaches 0, which demonstrates that the proposed algorithm converges. The algorithm ran for 54 iterations with a total running time of 0.36741 seconds, indicating high computational efficiency.
Although the "equal incremental cost criterion" shows in principle that the result is the optimal solution, the CPLEX optimization toolkit is called here to further verify the correctness of the result. The Lagrange multiplier corresponding to the optimal solution is solved; this multiplier represents the value of the consistency variable. Its value is 5.623, which agrees with the result in Fig. 4 and confirms that the simulation result is correct.
Operation of the smart grid must support the plug-and-play (PAP) feature of distributed generation (DG). To verify the validity of the proposed method under plug-and-play conditions, the following scenario is set: the initial conditions of the simulation are the same as in the previous experiment, and after one scheduling period (0.4 s) has elapsed, distributed generator G16 is connected to system node 23, with the same operating parameters as distributed generator G3. The simulation data are shown in Fig. 6.
As Fig. 6 shows, once distributed generator G16 is connected to the system, it takes on part of the power load, which relieves the output burden on the other adjustable units, reduces their output power, and lowers the consistency variable of the system as a whole. This demonstrates that the method achieves economic dispatch under plug-and-play conditions in the smart grid.
To verify the effectiveness of the proposed multi-time-scale scheduling strategy for wind power absorption, the strategy is used to carry out one day of economic dispatch on the simulated smart grid, based on the day-ahead prediction data of wind turbine output in Fig. 3, so that surplus wind power output in the system is absorbed and the utilization efficiency and economy of energy are improved. The unbalanced power of the simulated smart grid in each scheduling stage is shown in Fig. 7.
As Fig. 7 shows, when the time scale is large, the unbalanced power is not well absorbed because the prediction error of the wind turbine output is large, which is normal. That stage nevertheless reduces the power adjustment deviation to within a certain range, so that the subsequent intraday scheduling stages can absorb the unbalanced power with fewer flexible loads. As the time scale shrinks toward zero, the prediction error decreases and the absorption of unbalanced power becomes progressively more pronounced, which demonstrates the effectiveness of the multi-time-scale optimized scheduling strategy for wind power absorption.
The power grid multi-time-scale economic dispatching method based on the consistency reinforcement learning algorithm has the following advantages:
(1) Each adjustable unit is processed intelligently by the reinforcement learning algorithm, which computes the optimal state and coarse output of each unit. This achieves globally optimal scheduling of the unit commitment, prepares for the subsequent optimal power allocation, and copes with smart grid environments that have large data volumes and complex topologies.
(2) The multi-time-scale economic dispatch method for the smart grid is fully distributed: all variables are linked through the grid topology data and the consistency principle, and the uncertainty of the unit commitment is taken into account. Therefore, when a unit is shut down or a new unit is brought online, a new global optimum can be computed, and the method adapts to the plug-and-play nature of distributed energy resources.
(3) The equal incremental cost criterion of power systems is combined with the consistency principle, and adjustment terms are added, which ensures that the incremental costs of all adjustable units converge to an optimal value during the iterative updates while the unbalanced power of the whole system gradually decreases, so that the economic dispatch is optimized and the results are reasonable.
(4) A dual-order gradient descent estimation algorithm is added to further optimize the power allocation of each adjustable unit produced by the consistency algorithm, reducing the overall number of iterations and improving the convergence speed.
(5) Based on the fact that the prediction accuracy of wind power output improves as the time scale shortens, and on the different scheduling elasticities of the flexible loads, the flexible loads are optimized across three time scales, which largely absorbs the uncertainty of wind power output, reduces the wind curtailment rate, and improves the stability and economy of the system.
(6) Compared with conventional planning algorithms, the method has better real-time performance: when the computing environment changes, the variables need not all be re-initialized and recomputed; decisions are made from the current state variables, enabling online decision-making. Compared with popular heuristic algorithms, the method reaches a converged result more stably while still converging quickly, and is more robust. In addition, the method brings the variable information of each node together through the consistency principle to complete the global optimization, realizing fully distributed computation without a centralized control and computation center.
(7) The multi-time-scale economic dispatch method for the smart grid is an economic dispatch system that controls all adjustable units in the smart grid, such as generators and flexible loads. It includes multi-time-scale optimization, economically optimizes all adjustable units from the fine-grained level to the whole, and forms a complete theoretical framework.
Although the embodiments of the present application have been described with reference to the accompanying drawings, they are not intended to limit the scope of the present application, and those skilled in the art should understand that various modifications and variations can be made without inventive effort.

Claims (10)

1. A power grid multi-time-scale economic dispatching method based on a consistency reinforcement learning algorithm, characterized in that the method comprises the following steps:
step 1, acquiring network topology data of a smart grid, and establishing a joint economic dispatch model based on unit commitment and load distribution;
step 2, obtaining a preliminary solution of the joint economic dispatch model through a reinforcement learning algorithm, to obtain the coarse output of each adjustable unit $i$ in the smart grid at time $t$
Figure FDA0003924566540000011
and the precise start-stop state $I_{i,t}$;
step 3, with reference to the precise start-stop state $I_{i,t}$ from step 2, optimizing the coarse output from step 2
Figure FDA0003924566540000012
through a fully distributed consistency algorithm to obtain the precise output $P_{i,t}$ of each adjustable unit $i$ at time $t$, thereby completing the preliminary optimized scheduling of each adjustable unit;
step 4, according to the predicted wind power and the marginal cost of the flexible loads, further optimizing the scheduling of the flexible loads among the adjustable units on three time scales of 24 h, 1 h, and 15 min using a multi-time-scale scheduling strategy, absorbing the uncertainty of the wind power output, so as to realize economic dispatch of the power grid.
2. The power grid multi-time scale economic dispatching method based on the consistency reinforcement learning algorithm as claimed in claim 1, characterized in that: in the step 1, an expression of a joint economic dispatching model based on unit combination and load distribution is as follows:
Figure FDA0003924566540000013
where $N$ is the total number of adjustable units, which comprise thermal power generators, wind turbines, and flexible loads; $T$ is the total scheduling time; $I_{i,t}$ indicates whether adjustable unit $i$ is running or stopped at time $t$; $C_{i,D}(t)$ is the shutdown cost of the $i$-th adjustable unit at time $t$; $C_{i,U}(t)$ is its start-up cost at time $t$; and $C_i(P_{i,t})$ is the running cost of the $i$-th adjustable unit at power $P_{i,t}$ at time $t$, where the power is the generated power for a generator and the absorbed power for a flexible load; wherein
Figure FDA0003924566540000014
where $\alpha_i$, $\beta_i$, and $\gamma_i$ are the cost coefficients of adjustable unit $i$;
all adjustable units $i = 1, 2, \ldots, N$ in the joint economic dispatch model must satisfy the following constraints: the power allocation constraint, the adjustable-unit capacity constraint, the minimum continuous up/down time constraint, and the ramping (climbing) constraint;
the power allocation constraint is expressed as:
Figure FDA0003924566540000015
where $P_{loss}$ is the power loss (transmission loss typically accounts for about 3% to 7% of the total load) and $D$ is the non-adjustable rigid power, which includes the rigid load and wind power;
the capacity constraint of the adjustable unit is expressed as:
$P_i^{\min} \le P_i \le P_i^{\max}$ (4)
where $P_i^{\min}$ and $P_i^{\max}$ are the minimum and maximum output power limits of adjustable unit $i$;
the minimum continuous up/down time constraint of the unit is expressed as:
Figure FDA0003924566540000021
where $T_{i,U}$ and $T_{i,D}$ are the minimum continuous up time and minimum continuous down time of adjustable unit $i$, respectively; $X_{i,ON}(t-1)$ is the total time for which adjustable unit $i$ has been continuously running as of time $t-1$; $X_{i,OFF}(t-1)$ is the total time for which it has been continuously stopped as of time $t-1$; and $I_{i,t-1}$ indicates whether adjustable unit $i$ is running or stopped at time $t-1$;
the ramping (climbing) constraint of the unit is expressed as:
$-R_{i,D} \le (P_{i,t} - P_{i,t-1})\, I_{i,t}\, I_{i,t-1} \le R_{i,U}$ (6)
where $R_{i,U}$ and $R_{i,D}$ are the upper (ramp-up) and lower (ramp-down) limits of the ramping constraint of the $i$-th adjustable unit, and $P_{i,t-1}$ is the precise output of adjustable unit $i$ at time $t-1$.
3. The power grid multi-time scale economic dispatching method based on the consistency reinforcement learning algorithm as claimed in claim 2, characterized in that: the step 2 specifically comprises the following steps:
step 201: using the precise output $P_{i,t}$ and the precise start-stop state $I_{i,t}$ of each adjustable unit $i$ in the smart grid at time $t$ to represent the elements of the state set $S_{I,t}$ and the action set $A_{I,t}$ of the adjustable-unit set $I$, and establishing an initial state-action set function $Q(S_{I,t}, A_{I,t})$, where $Q(S_{I,t}, A_{I,t})$ is the collection of the state-action value functions $Q(S_{i,t}, A_{i,t})$ at time $t$ of all adjustable units $i$ in the set $I$ represented by agents;
step 202: using a greedy algorithm to select the action values $a_{I,t+1}$ of the adjustable-unit set $I$ at the next time $t+1$, forming the action set $A_{I,t+1}$ at time $t+1$;
step 203: using the reward-penalty function $rew$ to accept or reject actions in the action set $A_{I,t+1}$ at the next time $t+1$ according to the reward value $r$, and selecting the optimal action value;
step 204: according to the action value $a_{I,t}$ of the adjustable-unit set $I$ at the current time $t$ and the optimal action value $a_{I,t+1}$ at the next time $t+1$, updating the state-action set function $Q(S_{I,t+1}, A_{I,t+1})$ at the next time $t+1$, then returning to step 202 and repeating steps 202 to 204;
step 205: when the state-action value function $Q(S_{i,t}, A_{i,t})$ has been cumulatively updated to a preset degree, identifying the action value with the highest value function as optimal, giving the optimal unit-scheduling policy $\pi^*$, that is, the precise start-stop state $I_{i,t}$ of each adjustable unit $i$ in the smart grid at time $t$ and its coarse output
Figure FDA0003924566540000031
4. The power grid multi-time-scale economic dispatching method based on the consistency reinforcement learning algorithm as claimed in claim 3, characterized in that: in step 201, the precise output $P_{i,t}$ and precise start-stop state $I_{i,t}$ of each adjustable unit $i$ in the smart grid at time $t$ represent the elements of the state set $S_{I,t}$ and the action set $A_{I,t}$ of the adjustable-unit set $I$, as follows:
Figure FDA0003924566540000032
in step 203, according to the constraints of the joint economic dispatch model, the reward function $rew_{i,t}$ obtained after the agent of adjustable unit $i$ executes an action at time $t$ is defined as:
$rew_{i,t} = r_1 + r_2 + r_3 + r_4$ (8)
Figure FDA0003924566540000033
Figure FDA0003924566540000034
Figure FDA0003924566540000035
Figure FDA0003924566540000036
Figure FDA0003924566540000037
where $r_1$ is the power-balance deviation reward term, $r_2$ is the ramping-constraint reward term, and $r_3$ and $r_4$ are the reward terms for the minimum continuous upper and lower limit constraints; $\Delta p_1$ and $\Delta p_2$ are deviation thresholds, and $r_1$ is selected according to the degree of deviation between the sum of the collected powers and the total load $L_{all}$, thereby achieving a coarse adjustment of the power balance;
in step 204, the iterative update formula of the state-action set function is as follows:
Figure FDA0003924566540000038
where the subscript $I$ denotes the set of all adjustable units $i$ represented by agents; $\eta$ is the learning rate when the agent of adjustable unit $i$ takes action $A_{i,t}$ in state $S_{i,t}$ at time $t$; $\tau$ is the discount coefficient; and $rew_{I,t}$ is the set of reward variables generated by the decision actions of the adjustable-unit set $I$ at the current time $t$;
in step 205, when the state-action value function has been cumulatively updated to a preset degree, the action value with the highest value function can be identified, giving the optimal unit-scheduling policy $\pi^*(s_{i,t})$, as follows:
Figure FDA0003924566540000041
5. The power grid multi-time-scale economic dispatching method based on the consistency reinforcement learning algorithm as claimed in claim 4, characterized in that: step 3 specifically comprises the following steps:
step 301: according to the precise start-stop state $I_{i,t}$ obtained in step 205 and the coarse output
Figure FDA0003924566540000042
initializing the state and output of each adjustable unit;
step 302: iterating on the output of each adjustable unit using the consistency dual-order gradient descent estimation algorithm, updating the consistency variable of each adjustable unit; when all consistency variables converge to the same value, the result is the planned optimal output of each adjustable unit.
6. The power grid multi-time-scale economic dispatching method based on the consistency reinforcement learning algorithm as claimed in claim 5, characterized in that: in step 301, when the initial values are set, the initial variable $P_i(0)$ must satisfy the initial constraint formula so that the power deviation of the iteration result approaches 0; a consistency matrix $W$ is constructed, with a Laplacian matrix used in its place; the initial constraint formula and the consistency matrix $W$ are given in formulas (16) and (17):
Figure FDA0003924566540000043
Figure FDA0003924566540000044
where $P_i(0)$ is the coarse output from step 205, used as the initial variable in this step;
Figure FDA0003924566540000045
denotes the deviation adjustment term of adjustable unit $i$ computed at iteration 0,
Figure FDA0003924566540000046
$D$ is the non-adjustable rigid power; and $L(G)$ is the matrix of the 0-1 graph $G$, whose diagonal elements are zero and whose off-diagonal elements are $d_{ij}$.
7. The power grid multi-time-scale economic dispatching method based on the consistency reinforcement learning algorithm as claimed in claim 6, characterized in that: in step 302, the iterative update formulas of the consistency dual-order gradient descent estimation algorithm are as follows:
Figure FDA0003924566540000047
Figure FDA0003924566540000048
Figure FDA0003924566540000049
where $\mu_i(k)$ is the incremental consumption rate $\mu$ of adjustable unit i at the k-th iteration, which also serves as the consistency variable;
Figure FDA0003924566540000051
denotes the deviation adjustment term of adjustable unit i computed at the k-th iteration;
Figure FDA0003924566540000052
$\delta$ is the adjustment coefficient, set to 0.01; $w_{ij}$ is an element of the matrix W, and $v_{ij}$ is an element of the transpose of W; $P_i(k)$ is the accurate power of adjustable unit i after the k-th iteration; during the iteration, formula (20) determines the convergence direction of the consistency variable, so that the optimization result continuously approaches the optimal solution satisfying the power balance constraint;
the optimization updates the exponential moving averages of the first-order and second-order gradients, with the hyperparameters $\chi_1$ and $\chi_2$ controlling the exponential decay rates; at iteration k, the biased first-order gradient descent estimate m(k) and second-order gradient descent estimate v(k) are updated as follows:
Figure FDA0003924566540000053
where
Figure FDA0003924566540000054
is the gradient of the cost function C with respect to the power P computed at the (k-1)-th iteration, and
Figure FDA0003924566540000055
is the second-order gradient of the cost function C with respect to the power P computed at the (k-1)-th iteration; the bias-corrected first-order gradient descent estimate m'(k) and second-order gradient descent estimate v'(k) are given by:
Figure FDA0003924566540000056
where
Figure FDA0003924566540000057
and
Figure FDA0003924566540000058
denote the hyperparameters $\chi_1$ and $\chi_2$ raised to the k-th power; the core update iteration formulas of the proposed fully distributed consistency algorithm are expressed as follows:
Figure FDA0003924566540000059
Figure FDA00039245665400000510
where $\varepsilon$ is a fine-tuning coefficient; $w'_{ij}$ is the element in row i, column j of the transformation matrix $W'$, which satisfies $\mathbf{1}^T W' = \mathbf{0}^T$ and $W'\mathbf{1} = \mathbf{0}$; $X_i$ is the set of indices adjacent to consistency variable i; and $x_i$ is the weight of consistency variable i. With formulas (19) and (23) computed in a distributed manner, each variable $\mu_i$ and $P_i$ receives only the parameter information of its adjacent variables $\mu_j$ and $P_j$, yet the algorithm can still obtain the global optimal solution.
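The bias-corrected moving averages described in claim 7 resemble the moment estimates of the Adam optimizer. Formulas (21) and (22) are available here only as images, so the sketch below assumes the standard Adam form, including the squared gradient as the second estimate; the patent's exact expressions may differ. The decay rates 0.9 and 0.999 are common Adam defaults rather than values from the patent, and the quadratic cost used to produce a gradient is hypothetical.

```python
def moment_step(m_prev, v_prev, grad, k, chi1=0.9, chi2=0.999):
    """One update of the biased and bias-corrected estimates (assumed Adam form).

    k is the 1-based iteration index; m and v start from zero.
    """
    m = chi1 * m_prev + (1 - chi1) * grad          # biased first estimate m(k)
    v = chi2 * v_prev + (1 - chi2) * grad * grad   # biased second estimate v(k)
    m_hat = m / (1 - chi1 ** k)                    # bias-corrected m'(k)
    v_hat = v / (1 - chi2 ** k)                    # bias-corrected v'(k)
    return m, v, m_hat, v_hat


def run_constant_gradient(grad, steps):
    """Feed the same gradient for several iterations and return the last estimates."""
    m = v = m_hat = v_hat = 0.0
    for k in range(1, steps + 1):
        m, v, m_hat, v_hat = moment_step(m, v, grad, k)
    return m, v, m_hat, v_hat


# Hypothetical gradient of a quadratic cost C(P) = a*P^2 + b*P at P = 40.
a_coef, b_coef, power = 0.05, 3.0, 40.0
grad = 2 * a_coef * power + b_coef

m, v, m_hat, v_hat = run_constant_gradient(grad, steps=10)
print("biased m:", m, "corrected m':", m_hat)
print("biased v:", v, "corrected v':", v_hat)
```

Because m and v start at zero, the biased estimates understate the gradient during early iterations; dividing by one minus the decay rate raised to the k-th power removes that bias.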
8. The power grid multi-time scale economic dispatching method based on the consistency reinforcement learning algorithm as claimed in claim 7, wherein: the step 4 specifically comprises the following steps:
step 401: power P for a wind turbine with uncertainty taken into account iw Performing predictive modeling; constructing a marginal cost function formula for flexible loads, namely excitation type loads and interruptible type loads;
step 402: and performing optimized scheduling on each flexible load resource from three time scales of a 24h scheduling stage before the day, a 1h scheduling stage in the day and a 15min scheduling stage in the day, so as to reduce unbalanced power and furthest absorb uncertainty of wind power output.
9. The power grid multi-time scale economic dispatching method based on the consistency reinforcement learning algorithm as claimed in claim 8, wherein: in step 401, first, at time t, the power $P_{iw}(t)$ of the i-th wind turbine generator, considering uncertainty, is assumed to approximately follow the normal distribution $N(P_{wf}(t), \sigma^2)$, where the mean $P_{wf}(t)$ is the predicted active power output of the wind turbine generator at time t, and the variance $\sigma^2$ characterizes the error level of the power prediction and varies with the prediction time scale; the active output of the wind turbine can be expressed as:
Figure FDA0003924566540000061
where x is the actual value substituted into the formula;
for flexible loads, users decide whether to trade according to the current market price, and the marginal cost is given by:
$\mu_{h1} = -\theta_{h0}(1 + 2\Delta P_{Dh})/\phi_h P_{Dh0} + a_{h2}\theta_{h0}$ (26)
where $\phi_h$ is the self-elasticity coefficient of the h-th flexible load, $\Delta P_{Dh}$ is the power adjustment of the h-th flexible load, $P_{Dh0}$ is its initial power, $\theta_{h0}$ is its initial electricity price, and $a_{h2}$ is its electricity price conversion rate; the scheduling cost of the flexible load is given by:
$C_{Lh1}(\Delta P_{Dh}) = \mu_{h1} \cdot \Delta P_{Dh}$ (27)
where $C_{Lh1}(\Delta P_{Dh})$ is the scheduling cost of the h-th flexible load for the power adjustment $\Delta P_{Dh}$.
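The quantities in claim 9 can be sketched in a few lines. Two assumptions are made here and should be checked against the original drawings: the wind power density is taken to be the ordinary normal density implied by $N(P_{wf}(t), \sigma^2)$, since formula (25) is available only as an image, and the flattened formula (26) is read as a fraction whose denominator is the product of the self-elasticity coefficient and the initial power. All numerical values are hypothetical.

```python
import math


def wind_power_pdf(x, p_wf, sigma):
    """Normal density of wind power x around the forecast p_wf (assumed form of (25))."""
    z = (x - p_wf) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def marginal_cost(theta0, delta_p, phi, p0, a2):
    """Formula (26), reading the denominator as phi * p0 (an assumption)."""
    return -theta0 * (1 + 2 * delta_p) / (phi * p0) + a2 * theta0


def scheduling_cost(theta0, delta_p, phi, p0, a2):
    """Formula (27): marginal cost times the power adjustment."""
    return marginal_cost(theta0, delta_p, phi, p0, a2) * delta_p


# Hypothetical flexible load: price 0.5, adjustment 2.0, self-elasticity -0.2,
# initial power 10.0, price conversion rate 0.1.
mu_h1 = marginal_cost(theta0=0.5, delta_p=2.0, phi=-0.2, p0=10.0, a2=0.1)
cost = scheduling_cost(theta0=0.5, delta_p=2.0, phi=-0.2, p0=10.0, a2=0.1)
print("marginal cost:", mu_h1)
print("scheduling cost:", cost)

# Density at the forecast itself, for a forecast of 30 with sigma = 2.
print("pdf at forecast:", wind_power_pdf(30.0, p_wf=30.0, sigma=2.0))
```

A larger `sigma` would represent a longer prediction horizon, in line with the statement that the variance changes with the prediction time scale.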
10. The power grid multi-time scale economic dispatching method based on the consistency reinforcement learning algorithm as claimed in claim 9, wherein: in step 402, the scheduling strategy is divided into 3 time scales, namely the 24 h day-ahead scheduling stage, the 1 h intraday scheduling stage and the 15 min intraday scheduling stage; each adjustable unit and the incentive-type and interruptible load resources participate in the optimized scheduling on these three time scales:
1. 24 h day-ahead scheduling stage: executed once every 24 hours; the 24 hours are divided into 96 time periods, and a scheduling plan covering the 96 time periods is made for the unit combination of the next day;
2. 1 h intraday scheduling stage: executed once every hour; the hour is divided into 4 time periods, and a scheduling plan covering the 4 time periods is made for the power distribution of the adjustable units over the coming hour;
3. 15 min intraday scheduling stage: executed once every 15 minutes; a final distribution plan is made for the flexible load output over the next 15 minutes.
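The three time scales of claim 10 share a 15-minute period, which the following sketch makes explicit. The function names and the representation of periods as minute offsets from the start of the day are illustrative choices, not part of the claim.

```python
PERIOD_MIN = 15  # length of one scheduling period in minutes


def day_ahead_periods():
    """The 96 periods of the day-ahead plan as (start, end) minutes of the day."""
    count = 24 * 60 // PERIOD_MIN
    return [(k * PERIOD_MIN, (k + 1) * PERIOD_MIN) for k in range(count)]


def intraday_hour_periods(hour):
    """The 4 periods covered by the 1 h intraday plan starting at the given hour."""
    start = hour * 60
    return [(start + k * PERIOD_MIN, start + (k + 1) * PERIOD_MIN)
            for k in range(60 // PERIOD_MIN)]


def stages_due(minute_of_day):
    """Which scheduling stages are executed at a given minute of the day."""
    due = []
    if minute_of_day % (24 * 60) == 0:
        due.append("day-ahead 24 h")
    if minute_of_day % 60 == 0:
        due.append("intraday 1 h")
    if minute_of_day % PERIOD_MIN == 0:
        due.append("intraday 15 min")
    return due


print(len(day_ahead_periods()))      # number of day-ahead periods
print(intraday_hour_periods(1))      # periods of the second hour
print(stages_due(0), stages_due(60), stages_due(45))
```

At the start of the day all three stages run; at the top of every other hour the two intraday stages run; at the remaining quarter hours only the 15 min stage runs.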
CN202211370283.XA 2022-11-03 2022-11-03 Power grid multi-time scale economic dispatching method based on consistency reinforcement learning algorithm Pending CN115860180A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211370283.XA CN115860180A (en) 2022-11-03 2022-11-03 Power grid multi-time scale economic dispatching method based on consistency reinforcement learning algorithm

Publications (1)

Publication Number Publication Date
CN115860180A (en) 2023-03-28

Family

ID=85662390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211370283.XA Pending CN115860180A (en) 2022-11-03 2022-11-03 Power grid multi-time scale economic dispatching method based on consistency reinforcement learning algorithm

Country Status (1)

Country Link
CN (1) CN115860180A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117117973A (en) * 2023-10-24 2023-11-24 国网浙江省电力有限公司宁波供电公司 Distributed power supply scheduling method and device based on time scale and storage medium
CN117117973B (en) * 2023-10-24 2024-01-12 国网浙江省电力有限公司宁波供电公司 Distributed power supply scheduling method and device based on time scale and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination