CN116454996A - Hydropower station real-time load distribution method based on deep reinforcement learning - Google Patents

Hydropower station real-time load distribution method based on deep reinforcement learning

Info

Publication number
CN116454996A
Authority
CN
China
Prior art keywords
load
power station
unit
station
reinforcement learning
Prior art date
Legal status
Pending
Application number
CN202310488292.7A
Other languages
Chinese (zh)
Inventor
闻昕
谭乔凤
曾宇轩
王珍妮
吕俞锡
陈新宇
Current Assignee
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date
Filing date
Publication date
Application filed by Hohai University HHU filed Critical Hohai University HHU
Priority to CN202310488292.7A priority Critical patent/CN116454996A/en
Publication of CN116454996A publication Critical patent/CN116454996A/en
Pending legal-status Critical Current


Classifications

    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/46Controlling of the sharing of output between the generators, converters, or transformers
    • H02J3/466Scheduling the operation of the generators, e.g. connecting or disconnecting generators to meet a given demand
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06313Resource planning in a project environment
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06Electricity, gas or water supply
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/46Controlling of the sharing of output between the generators, converters, or transformers
    • H02J3/466Scheduling the operation of the generators, e.g. connecting or disconnecting generators to meet a given demand
    • H02J3/472For selectively connecting the AC sources in a particular order, e.g. sequential, alternating or subsets of sources
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/10Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20Simulating, e g planning, reliability check, modelling or computer assisted design [CAD]
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20The dispersed energy generation being of renewable origin
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention discloses a hydropower station real-time load distribution method based on deep reinforcement learning, which comprises the following steps: constructing a load optimization distribution model based on deep reinforcement learning and training it with historical power station operation data; after the power station receives a load instruction, updating the startup unit combination according to the current load instruction, then enumerating the other feasible startup unit combinations and performing fast rolling calculations with the trained model over the load-instruction and flow forecast information to optimize the startup unit combination; and updating the startup unit combination of the power station and using the trained model to solve the optimal distribution of the grid load instruction among the running units. The invention is suitable for real-time load distribution in hydropower stations and can guide the real-time operation of hydropower station units.

Description

Hydropower station real-time load distribution method based on deep reinforcement learning
Technical Field
The invention relates to a power station optimal scheduling technology, in particular to a hydropower station real-time load distribution method based on deep reinforcement learning.
Background
Existing algorithms for the real-time in-plant load distribution problem of hydropower stations fall into traditional optimization algorithms and intelligent algorithms. The traditional optimization algorithms mainly include the equal incremental rate method, dynamic programming and the load distribution table method; the intelligent algorithms mainly include genetic algorithms and particle swarm optimization. However, these methods are difficult to apply to actual real-time scheduling decision guidance, mainly because of the following problems:
(1) Among the traditional optimization algorithms, the equal incremental rate method requires the unit flow characteristic curves to be continuous and differentiable, but units have vibration zones, so most power stations cannot satisfy this condition; the dynamic programming method and the load distribution table method easily run into the curse of dimensionality and struggle with economic operation problems involving many units and high precision requirements.
(2) The intelligent algorithms suffer from unstable decision results and easily produce poorly optimized or infeasible solutions.
Disclosure of Invention
Purpose of the invention: the invention aims to provide a hydropower station real-time load distribution method based on deep reinforcement learning.
Technical scheme: the hydropower station real-time load distribution method based on deep reinforcement learning according to the invention comprises the following steps:
S1, constructing a load optimization distribution model based on deep reinforcement learning and training it with historical operation data; the load optimization distribution model comprises states, actions, rewards and constraints, and takes all factors that can influence the load distribution decision of the power station as the state s; the actions are designed around the idea of load adjustment, converting the distribution of one load instruction into a process of gradually adjusting a load distribution scheme, with two types of actions: load adjustment and scheme output; each action yields an instant reward r; the constraints of the model comprise the operating constraints of the power station;
S2, the power station receives a grid load instruction; the minimum and maximum output of each unit in the current startup unit combination are calculated, the minimum outputs are summed to obtain the lower boundary of the plant output range, and the maximum outputs are summed to obtain the upper boundary; whether the current startup unit combination can fulfil the load instruction is judged from the value of the load instruction; if it can, the current startup unit combination is kept unchanged and step S4 is executed; otherwise step S3 is executed;
S3, if the grid load instruction is greater than the upper boundary of the plant output range, the unit with the longest current shutdown time is started, and the start-stop states and load range of the plant's units are updated, until the grid load instruction lies between the lower and upper boundaries of the plant output range;
if the grid load instruction is smaller than the lower boundary of the plant output range, the unit with the longest current running time is shut down, and the start-stop states and load range of the plant's units are updated, until the grid load instruction lies between the lower and upper boundaries of the plant output range;
s4, judging whether other feasible startup unit combinations exist, if not, directly entering a step S6, otherwise, executing a step S5;
S5, generating the other feasible startup unit combinations under the rationality constraints on unit start-stop, and, using the deep-reinforcement-learning load optimization distribution model together with the load and flow forecast information for a future period, rolling-calculating the in-plant load distribution scheme of every feasible startup unit combination over the calculation horizon and the objective function value each scheme can obtain; calculating the total objective function value obtainable by each scheme over the forecast horizon, selecting the optimal startup unit combination according to the total objective function values, and updating the startup unit combination of the power station;
S6, generating an optimized load distribution scheme among the starting units by using a load optimization distribution model based on deep reinforcement learning.
Further, in step S1, the state is a one-dimensional tensor [Q_in, Q_flood, Q_generate, Q_g,t, Z, N_in-plant], where Q_in is the reservoir inflow, Q_flood is the flood discharge flow, Q_generate is the power generation flow, Q_g,t is the generation flow required at time t to meet the power station's water scheduling target, Z is the upstream water level, and N_in-plant is the vector of unit loads;
the calculation formula of the instant prize r is:
where r is the instant reward of the deep reinforcement learning model; n is the total number of units; s_j and s_{j+1} are the states before and after the action is executed; N_i^{s_j} and N_i^{s_{j+1}} are the outputs of unit i before and after the action is executed; H is the current head of the power station; Q(N_i^{s_j}, H) and Q(N_i^{s_{j+1}}, H) are the power generation flows when the output of unit i is N_i^{s_j} or N_i^{s_{j+1}} and the head is H;
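The reward expression itself is not reproduced in this text. A plausible reconstruction, consistent with the symbols defined above and treating the reward as the reduction of the gap between the plant's total generation flow and the target flow Q_g,t (an assumption, not a statement of the patented formula), is:

```latex
r = \left| Q_{g,t} - \sum_{i=1}^{n} Q\!\left(N_i^{s_j},\, H\right) \right|
  - \left| Q_{g,t} - \sum_{i=1}^{n} Q\!\left(N_i^{s_{j+1}},\, H\right) \right|
```

Under this form, an action that moves the total generation flow closer to the water-scheduling target earns a positive reward, and an action that moves it away earns a negative one.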
The constraints comprise the operating constraints of the power station: unit operation constraints, the water balance constraint, the water level-storage relation constraint and the tailwater level-outflow relation constraint.
Further, the generation flow Q_g,t required to meet the power station's water scheduling target is the optimal generation flow of the power station: when the power station needs to store water, Q_g,t can be taken as the minimum release flow of the power station; when the power station has a water-level control requirement, Q_g,t is calculated as:
Q_g,t = Q_in,t - Q_flood,t + (V(Z_t) - V(Z_g,t)) / Δt
where Q_in,t is the inflow of the power station at time t, Q_flood,t is the flood discharge flow of the power station at time t, V(Z_t) and V(Z_g,t) are the reservoir storages of the power station when the water level is Z_t or Z_g,t respectively, Z_t is the water level of the power station at time t, Z_g,t is the target water level of the power station at time t, and Δt is the time interval between load instructions.
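As an illustration of the Q_g,t calculation above, the following minimal Python sketch assumes a hypothetical storage-curve helper volume_from_level implementing V(Z); the function name and arguments are illustrative, not part of the patent:

```python
def target_generation_flow(q_in, q_flood, z_current, z_target, dt,
                           volume_from_level, q_min_release=None):
    """Generation flow Q_g,t needed to move the reservoir toward its target level.

    q_in, q_flood       -- inflow and flood-discharge flow at time t (m^3/s)
    z_current, z_target -- current and target upstream water levels (m)
    dt                  -- time interval between load instructions (s)
    volume_from_level   -- storage curve V(Z) in m^3 (hypothetical helper)
    q_min_release       -- if the plant is in water-storage mode, Q_g,t is simply
                           the minimum release flow instead of a level-tracking value
    """
    if q_min_release is not None:               # water-storage mode
        return q_min_release
    # Level-control mode: Q_g,t = Q_in,t - Q_flood,t + (V(Z_t) - V(Z_g,t)) / dt
    dv = volume_from_level(z_current) - volume_from_level(z_target)
    return q_in - q_flood + dv / dt
```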
Further, the training process of the load optimization distribution model based on the deep reinforcement learning in the step S1 specifically includes the following steps:
(1) Randomly generate an initial value function Q(s,a); set counters i=0 and j=0; let the number of training rounds be I, the number of samples be J, and the probability of randomly selecting an action be P_r; generate a sample library identical to the power station's historical operation database;
(2) From the sample library, randomly draw one operation record without replacement, and, keeping the current startup unit combination unchanged, randomly generate a feasible load distribution scheme N_in-plant for the load instruction N_c; construct the state at the current moment s = [Q_in, Q_flood, Q_generate, Q_g,t, Z, N_in-plant] from the operation record and the feasible load distribution scheme; if all the data in the sample library have been sampled, regenerate a sample library identical to the power station's historical operation database and let i = i + 1;
(3) Input the current state s into the load optimization distribution model, and identify and mask invalid actions that would cause a unit load to fall into a vibration zone or become negative; generate a random number P in [0,1]; if P ≤ P_r, randomly select one of the available actions as the current action a, otherwise select the available action with the largest value estimate as the current action a;
(4) Execute the current action a and update N_in-plant; calculate the instant reward r obtained by the action and obtain the next state s'; store (s, a, r, s') in the experience pool and let j = j + 1; if j ≥ J, extract J samples from the experience pool by the PER (prioritized experience replay) method and use them to update the agent's value function Q(s,a), whose calculation formula is:
Q'(s,a) = Q(s,a) + α[r + γ·max_{a'} Q(s',a') - Q(s,a)]
where Q(s,a) and Q'(s,a) are the value functions of the state-action pair (s,a) before and after the update respectively (Q(s,a) is the initial value function during the first learning step), Q(s',a') is the value function of the pair (s',a'), a and a' are the actions at the current and next moments respectively, α is the step-size factor, r is the instant reward obtained at the current moment, and γ is the decay rate;
(5) If the current action a is a load adjustment action, set s = s' and return to step (3); if the current action a is a scheme output action, go to step (6);
(6) If i ≥ I, training is complete; otherwise return to step (2).
Further, in step S5, the other feasible startup unit combinations are generated under the rationality constraints on unit start-stop as follows:
On the premise of fulfilling a load instruction, the power station starts or shuts down at most one unit; to ensure the rationality of unit start-stop, the following three start-stop constraint conditions are set, with the respective formulas:
the numbers of running units and shut-down units lie in a reasonable range:
n_on + n_off = n
where n_on is the number of running units of the power station, n_on > 0; n_off is the number of shut-down units of the power station, n_off > 0; and n is the total number of units of the power station;
when the power station load rises, the number of running units cannot decrease, and when the power station load falls, the number of running units cannot increase:
where n_on,t+1 is the number of running units at time t+1, n_on,t is the number of running units at time t, and N_c is the grid load instruction, which is compared with the current total output of the power station;
start-stop operations contrary to the future load trend are avoided:
where N_on^min is the minimum output of the power station after a unit is started, N_c^min is the minimum output in the load forecast, N_off^max is the maximum output of the power station after a unit is shut down, and N_c^max is the maximum output in the load forecast;
The feasible startup unit combination schemes that satisfy these constraints are retained as the other feasible startup unit combination schemes besides the current one.
Further, in step S5, the in-plant load distribution schemes of all feasible startup unit combination schemes over the calculation horizon, and the objective function values they can obtain, are rolling-calculated with the deep-reinforcement-learning load optimization distribution model in the following specific steps:
S521, read the load and flow forecast sequences of the power station for a future period, and read the forecast values of load and flow from the sequences in turn; check whether the forecast load lies within the load range of the current startup unit combination, and if not, update the startup unit combination of the power station by the method of step S3 until the load instruction can be fulfilled;
S522, use the deep-reinforcement-learning load optimization distribution model to calculate the in-plant load distribution scheme of the power station under the retained current startup unit combination and under each of the other feasible startup unit combination schemes, and calculate the objective function value obtainable by each load distribution scheme from the objective function, whose calculation formula is:
where f_t is the objective function value obtainable by the in-plant load distribution scheme, Q_g,t is the generation flow the power station needs to reach at the current time to meet the water scheduling target, N is the number of units of the power station, Q(N_i,t, H_t) is the generation flow as a function of unit output and head, N_i,t is the output of unit i at time t, H_t is the real-time head of the hydropower station, c is the number of start-stop operations, and Q_p is the start-stop penalty flow;
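The objective expression itself does not survive in this text. A plausible form consistent with the symbols defined above, rewarding closeness of the total generation flow to Q_g,t and penalising start-stop operations (an assumption, not the patented formula), is:

```latex
f_t = -\left| Q_{g,t} - \sum_{i=1}^{N} Q\!\left(N_{i,t},\, H_t\right) \right| - c\, Q_p
```

With this form, larger (less negative) values indicate a scheme whose generation flow is closer to the water-scheduling target with fewer start-stop operations, which matches the later selection of the combination with the largest total objective function value.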
S523, read the forecast values of the next load and flow in turn and return to step S522, until the forecast data of all loads and flows have been read and calculated.
Further, in step S5, the objective function values obtained by the in-plant load distribution schemes within the forecast horizon are summed to obtain the total objective function value obtainable by each scheme over the forecast horizon; the startup unit combination scheme with the largest total objective function value is selected as the optimal startup unit combination scheme, and the start-stop states of the power station are updated accordingly.
In another embodiment of the present invention, a hydropower station real-time load distribution system based on deep reinforcement learning includes:
the model construction and training module is used for constructing a load optimization distribution model based on deep reinforcement learning and training the load optimization distribution model based on the deep reinforcement learning by combining historical operation data; the load optimization distribution model comprises a state, an action, instant rewards and constraints, and takes all factors which can influence the load distribution decision of the power station as a state s; the method comprises the steps of designing actions according to the thought of load adjustment, converting the distribution of primary load instructions into a process of gradually adjusting a load distribution scheme, wherein the process comprises two types of actions of load adjustment and scheme output; each action has an instant prize r; constraints of the load optimization distribution model based on deep reinforcement learning comprise constraint conditions of power station operation;
The data calculation module is used for receiving a power grid load instruction, calculating the minimum output and the maximum output of each unit in the current starting-up unit combination, adding the minimum output of each unit to obtain the lower boundary of the output range of the power station, and adding the maximum output of each unit to obtain the upper boundary of the output range of the power station;
the judging and executing module is used for judging, from the value of the load instruction, whether the current startup unit combination can fulfil the load instruction; if it can, the current startup unit combination is kept unchanged and it is judged whether other feasible startup unit combinations exist: if not, the optimized load distribution scheme among the running units is generated with the deep-reinforcement-learning load optimization distribution model, and if so, the startup unit combination of the power station is updated by the unit updating module; if the load instruction cannot be fulfilled and the grid load instruction is greater than the upper boundary of the plant output range, the unit with the longest current shutdown time is started and the start-stop states and load range of the plant's units are updated until the grid load instruction lies between the lower and upper boundaries of the plant output range; if the grid load instruction is smaller than the lower boundary of the plant output range, the unit with the longest current running time is shut down and the start-stop states and load range of the plant's units are updated until the grid load instruction lies between the lower and upper boundaries of the plant output range;
The unit updating module is used for generating other feasible starting-up unit combinations by combining the rationality constraint of starting and stopping of the unit, and rolling and calculating the in-station load distribution scheme of all the feasible starting-up unit combination schemes in a calculation period by using a load optimization distribution model based on deep reinforcement learning and combining load and flow forecast information in a future period of time; and calculating the total objective function value which can be obtained by each scheme in the forecasting period, selecting the optimal startup unit combination by combining the total objective function value, and updating the startup unit combination of the power station.
In yet another embodiment of the present invention, a device comprises a memory and a processor, wherein:
a memory for storing a computer program capable of running on the processor;
and the processor is used for executing the steps of the hydropower station real-time load distribution method based on deep reinforcement learning when the computer program is run.
The storage medium of the present invention stores a computer program, which when executed by at least one processor, implements the steps of a hydropower station real-time load distribution method based on deep reinforcement learning as described above.
Beneficial effects: compared with the prior art, the invention has the following notable technical effects:
A load optimization distribution model based on deep reinforcement learning is constructed with a deep reinforcement learning algorithm and trained with historical power station operation data, enabling fast distribution of real-time load instructions in a hydropower station. After the power station receives a load instruction, the method first updates the startup unit combination according to the current load instruction, then optimizes the startup unit combination through rolling calculations over the load and flow forecast information, and finally distributes the load instruction among the running units. Because the method takes the power station's water scheduling requirement into account in the objective function, it can, while meeting the power generation scheduling requirement, further adjust the plant's generation flow in line with the water scheduling requirement by optimizing the start-stop scheme and adjusting the load distribution scheme. At the same time, realizing load distribution with deep reinforcement learning alleviates the curse of dimensionality and achieves high-precision, high-efficiency real-time load distribution decisions. Compared with the methods commonly used for real-time in-plant load distribution, the method of the invention satisfies both the power generation scheduling requirement and the water scheduling requirement of the power station, while achieving both decision accuracy and decision efficiency.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of a load adjustment action;
FIG. 3 is an overall architecture diagram of the deep reinforcement learning D3QN algorithm;
FIG. 4 is a graph of the cumulative reward obtained during training of the deep reinforcement learning D3QN algorithm;
FIG. 5 is a plot of the water level process of the Shenxigou station over the two test periods;
FIG. 6 is a plot of the unit operation process of the Shenxigou station during the dry-season test period;
FIG. 7 is a plot of the unit operation process of the Shenxigou station during the flood-season test period.
Detailed Description
The invention will now be described in detail with reference to the drawings and specific examples. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
When existing small and medium-sized hydropower stations undertake peak-shaving and frequency-regulation tasks, load instruction distribution considers only the power generation scheduling requirement and ignores the water scheduling requirement, which easily causes large fluctuations of water level and flow and frequent adjustment of flood discharge facilities. To address this, the invention proposes a hydropower station real-time load distribution method based on deep reinforcement learning, which can guide the real-time load distribution of the power station and solve the problems that current real-time load distribution decisions struggle to account for the water scheduling requirement and to achieve both decision accuracy and decision efficiency.
The hydropower station real-time load distribution method based on deep reinforcement learning can efficiently solve the hydropower station real-time load distribution problem and comprises: analysing the operating requirements of the power station, constructing a load optimization distribution model based on deep reinforcement learning, and training it with historical power station operation data; after the power station receives a load instruction, updating the startup unit combination according to the current load instruction, then enumerating the other feasible startup unit combinations and performing fast rolling calculations with the trained model over the load-instruction and flow forecast information to optimize the startup unit combination; and updating the startup unit combination of the power station and using the trained model to solve the optimal distribution of the grid load instruction among the running units. The invention is suitable for real-time load distribution in hydropower stations and can guide the real-time operation of hydropower station units. As shown in FIG. 1, the method specifically comprises the following steps:
S1, constructing a load optimization distribution model based on deep reinforcement learning and training it with historical operation data; the model is built with the deep reinforcement learning D3QN algorithm and realizes optimal load distribution among a fixed set of running units. The specific steps are as follows:
S11, the deep reinforcement learning model comprises an agent and an environment; the agent obtains instant rewards by interacting with the environment to guide its actions, and the learning objective is to maximize the cumulative reward. The environment model has to simulate the computing environment of the in-plant load distribution problem, including the states, instant rewards and operating constraints of the power station. By executing actions, the agent continuously obtains feedback from the environment, thereby continuously improving the accuracy of its value function so as to better judge the value of each action in a given state; the agent side comprises the actions and the agent learning algorithm;
(1) Status of
All factors that can influence the power station's load distribution decision through in-plant load distribution are taken as the state s. The invention selects the real-time reservoir inflow Q_in, flood discharge flow Q_flood, power generation flow Q_generate, generation flow Q_g,t required to meet the power station's water scheduling target, upstream water level Z and unit loads N_in-plant, and constructs the one-dimensional tensor [Q_in, Q_flood, Q_generate, Q_g,t, Z, N_in-plant] as the state.
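A minimal sketch of assembling this state tensor is given below; the field names and the numerical example are illustrative only:

```python
import numpy as np

def build_state(q_in, q_flood, q_generate, q_gt, z_upstream, unit_loads):
    """Assemble the one-dimensional state [Q_in, Q_flood, Q_generate, Q_g,t, Z, N_in-plant].

    unit_loads -- iterable with the current load of every unit (MW).
    """
    return np.array([q_in, q_flood, q_generate, q_gt, z_upstream, *unit_loads],
                    dtype=np.float32)

# Illustrative call for a four-unit plant (values are made up for the example)
state = build_state(q_in=850.0, q_flood=120.0, q_generate=700.0,
                    q_gt=680.0, z_upstream=657.2,
                    unit_loads=[120.0, 135.0, 0.0, 150.0])
```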
(2) Constraint conditions
Corresponding constraint conditions are required to be set in the environment model so as to ensure that the output result can meet the operation requirement of the power station, and the constraint conditions of the model consider all operation constraint conditions required to be met in the actual operation process of the power station, and are specifically as follows:
Unit operation constraint:
N_i,t ≤ N_i,t,max
where N_i,t is the output of unit i at time t, N_i,t,max is the maximum output unit i can reach at time t, N is the total number of units of the power station, N_c is the load instruction, N_i,t^low is the lower boundary of the vibration zone of unit i at time t, and N_i,t^up is the upper boundary of the vibration zone of unit i at time t.
Water balance constraint:
where V_{t+1} is the reservoir storage of the power station at time t+1, V_t is the storage at time t, Q_in,t is the inflow of the power station at time t, Q_flood,t is the flood discharge flow of the power station at time t, and Δt is the time interval between times t and t+1.
Water level constraint:
Z_min ≤ Z_t ≤ Z_max
where Z_max and Z_min are the normal storage water level and the dead water level of the power station, respectively.
Water level-water storage capacity relation constraint:
Z=f(V)
wherein Z is the upstream water level of the power station, V is the water storage capacity of the power station, and f (V) is a function calculation formula of the upstream water level of the power station when the water storage capacity of the power station is V.
Tailstock level-ex-warehouse flow relation constraint:
Z_tail = g(Q_out)
where Z_tail is the tailwater level of the power station, Q_out is the outflow of the power station, and g(Q_out) is the function giving the tailwater level of the power station when its outflow is Q_out.
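As a compact illustration of how these operating constraints could be checked for one candidate load distribution scheme, the following sketch assumes the vibration-zone bounds and unit maxima are supplied by the caller (all names are hypothetical):

```python
def scheme_is_feasible(unit_loads, load_command, unit_max, vib_zones, tol=1e-6):
    """Check a load distribution scheme against the unit operation constraints.

    unit_loads   -- proposed output of each running unit (MW)
    load_command -- grid load instruction N_c (MW)
    unit_max     -- maximum output of each unit at the current head (MW)
    vib_zones    -- (lower, upper) vibration-zone bounds for each unit (MW)
    """
    # Output limits: 0 <= N_i,t <= N_i,t,max
    for n_i, n_max in zip(unit_loads, unit_max):
        if n_i < -tol or n_i > n_max + tol:
            return False
    # Vibration-zone avoidance: no unit load strictly inside its vibration zone
    for n_i, (lo, up) in zip(unit_loads, vib_zones):
        if lo + tol < n_i < up - tol:
            return False
    # Load balance: unit outputs must sum to the load instruction
    return abs(sum(unit_loads) - load_command) <= tol
```

The water balance, water-level and stage-discharge relations would be enforced analogously in the environment model when the reservoir state is advanced.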
(3) Action
The load optimization distribution model based on deep reinforcement learning converts the distribution of primary load instructions into a process of gradually adjusting a load distribution scheme, and the process comprises two types of actions of load adjustment and scheme output. The model sets a fixed unit load according to the required load discrete precision, the load adjustment action can adjust a unit load between two starting units to form a new load distribution scheme, and the scheme output action means that the optimization of the load distribution scheme is finished and the optimized load distribution scheme is output.
When the load optimization distribution model based on deep reinforcement learning selects actions, actions which can cause the unit load to fall into a vibration area and the unit load to be negative are required to be shielded.
Because the startup unit combination is fixed while the deep-reinforcement-learning load optimization distribution model makes its decision, and the head H is determined during load distribution, the model only needs to quickly solve the optimal load distribution scheme under the fixed startup unit combination for the current load instruction. With the head fixed at H, take a hydropower station with two running units as an example: the relation between unit load and generation flow is shown in FIG. 2(a), where N_a denotes the load of running unit 1 and N_b denotes the load of running unit 2. When the total load N is given, taking the cross-section N_a + N_b = N yields the relation curve between load and generation flow among the running units with the total load fixed at N, as shown in FIG. 2(b). The task of the model is to quickly search this curve for the load distribution scheme whose generation flow is closest to Q_g,t.
If the load value each unit must bear were used directly as the action, a very large number of actions would be generated, increasing the learning difficulty of the agent. Considering that the startup unit combination is fixed, and following the analysis in the previous paragraph, the invention designs the actions around the idea of load adjustment in order to reduce the number of actions and improve the decision performance of the model. This idea converts the distribution of one load instruction into a process of gradually adjusting a load distribution scheme, with two types of actions: load adjustment and scheme output. The model sets a fixed unit load ΔN according to the required load discrete precision; a load adjustment action transfers one unit load between two running units to form a new load distribution scheme. As shown in FIG. 2(b), after one unit load is transferred from running unit 2 to running unit 1, the initial solution (N_a, N_b) changes to (N_a + ΔN, N_b - ΔN), which means the unit load distribution scheme changes accordingly and the generation flow of the power station changes from Q_0 to Q_1. The scheme output action means ending the optimization of the load distribution scheme and outputting it. To guarantee the learning effect and the accuracy of the decision result, some unreasonable actions can be excluded by simple checks when the agent selects an action: (1) endless load adjustment actions, e.g. transferring load back to unit 1 immediately after transferring it out; (2) actions that would produce an infeasible solution, e.g. actions that would cause a unit load to fall into a vibration zone or become negative.
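A sketch of this action design for an n-unit plant is shown below: action 0 outputs the current scheme, and every other action transfers one unit load ΔN between an ordered pair of running units, with infeasible transfers masked. All names are illustrative:

```python
from itertools import permutations

def build_actions(n_units, delta_n=1.0):
    """Action 0 = output the scheme; actions 1..k = move delta_n MW from one unit to another."""
    transfers = list(permutations(range(n_units), 2))      # ordered (source, target) pairs
    return [None] + [(src, dst, delta_n) for src, dst in transfers]

def valid_actions(actions, unit_loads, unit_max, vib_zones, running):
    """Indices of actions that keep every unit load non-negative, within its limit
    and outside its vibration zone."""
    ok = [0]                                                # outputting the scheme is always allowed
    for idx, act in enumerate(actions[1:], start=1):
        src, dst, d = act
        if not (running[src] and running[dst]):
            continue
        new = list(unit_loads)
        new[src] -= d
        new[dst] += d
        if new[src] < 0 or new[dst] > unit_max[dst]:
            continue                                        # negative load or above maximum
        if any(lo < new[i] < up for i, (lo, up) in enumerate(vib_zones)):
            continue                                        # some unit would sit in a vibration zone
        ok.append(idx)
    return ok
```

For a four-unit plant this yields the 12 transfer actions plus the scheme-output action used later in the embodiment.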
(4) Instant rewards
The calculation formula of the instant reward r of each action is as follows:
where r is the instant reward of the deep reinforcement learning model; s_j and s_{j+1} are the states before and after the action is executed; N_i^{s_j} and N_i^{s_{j+1}} are the outputs of unit i before and after the action; H is the current head of the power station; Q(N_i^{s_j}, H) and Q(N_i^{s_{j+1}}, H) are the generation flows when the output of unit i is N_i^{s_j} or N_i^{s_{j+1}} and the head is H; and Q_g,t is the generation flow required at time t to meet the power station's water scheduling target. When the power station needs to store water, Q_g,t can be taken as the minimum release flow of the power station; when the power station has a water-level control requirement, Q_g,t is calculated as:
Q_g,t = Q_in,t - Q_flood,t + (V(Z_t) - V(Z_g,t)) / Δt
where Q_in,t is the inflow of the power station at time t, Q_flood,t is the flood discharge flow of the power station at time t, V(Z_t) and V(Z_g,t) are the reservoir storages of the power station when the water level is Z_t or Z_g,t respectively, Z_t is the water level of the power station at time t, Z_g,t is the target water level of the power station at time t, and Δt is the time interval between load instructions.
S12, the decision of the deep-reinforcement-learning load optimization distribution model is realized by the agent, which is a neural network containing a large number of parameters. The neural network must learn from a large amount of data to update its parameters. The invention builds a historical operation database from the reservoir inflow Q_in, flood discharge flow Q_flood, power generation flow Q_generate, generation flow Q_g,t required to meet the power station's water scheduling target, upstream water level Z and unit loads N_in-plant recorded in the actual dispatching operation of a power station in recent years. The model is trained with the power station operation data in this historical operation database; the specific training steps are as follows:
Step 1: randomly generate an initial value function Q(s,a); set counters i=0 and j=0; let the number of training rounds be I, the number of samples be J, and the probability of randomly selecting an action be P_r; generate a sample library identical to the power station's historical operation database;
Step 2: from the sample library, randomly draw one operation record without replacement, and, keeping the current startup unit combination unchanged, randomly generate a feasible load distribution scheme N_in-plant for the load instruction N_c; construct the current state s = [Q_in, Q_flood, Q_generate, Q_g,t, Z, N_in-plant] from the operation record and the feasible load distribution scheme; if all the data in the sample library have been sampled, regenerate a sample library identical to the power station's historical operation database and let i = i + 1;
Step 3: input the current state s into the load optimization distribution model, and identify and mask invalid actions that would cause a unit load to fall into a vibration zone or become negative; generate a random number P in [0,1]; if P ≤ P_r, randomly select one of the available actions as the current action a, otherwise select the available action with the largest value estimate as the current action a;
Step 4: execute the current action a and update N_in-plant; calculate the instant reward r obtained by the action and obtain the next state s'; store (s, a, r, s') in the experience pool and let j = j + 1; if j ≥ J, extract J samples from the experience pool by the PER method and use them to update the agent's value function Q(s,a), whose calculation formula is:
Q'(s,a) = Q(s,a) + α[r + γ·max_{a'} Q(s',a') - Q(s,a)]
where Q(s,a) and Q'(s,a) are the value functions of the state-action pair (s,a) before and after the update respectively (Q(s,a) is the initial value function during the first learning step), Q(s',a') is the value function of the pair (s',a'), s and s' are the states at the current and next moments, a and a' are the actions at the current and next moments respectively, α is the step-size factor, r is the instant reward obtained at the current moment, and γ is the decay rate.
Step 5: if the current action a is a load adjustment action, set s = s' and return to Step 3; if the current action a is a scheme output action, go to Step 6;
sixth step: if I is not less than I, training is completed, otherwise, returning to the second step.
After the deep reinforcement learning model has been constructed and trained, it can be applied to the decision of a real-time load distribution scheme; the flow is shown in FIG. 3: once the startup unit combination is fixed, the initial load distribution scheme is input into the model, and the model outputs the optimized load distribution scheme.
S2, after the power station receives the grid load instruction, whether the current startup unit combination can fulfil the load instruction is judged from its value, as follows: to guarantee safe grid operation, the running units of the power station must be able to fulfil the load instruction after it is received, so the current start-stop states of the units, the real-time head H and the load instruction N_c are read, and the minimum and maximum output of each unit are calculated from H; the minimum outputs of the running units are summed to obtain the lower boundary N_low of the plant output range, and the maximum outputs are summed to obtain the upper boundary N_up, giving the load range [N_low, N_up] the power station can bear under the current startup unit combination. If N_low ≤ N_c ≤ N_up, the current startup unit combination is kept unchanged and step S4 is entered; otherwise step S3 is entered.
S3, if N_c > N_up, a new unit must be started to fulfil the load instruction; to balance the running time of the units and prolong their service life, the unit with the longest current shutdown time is started. If N_c < N_low, the unit with the longest current running time is shut down. The start-stop states and load range of the power station's units are then updated, and this judgement is repeated until the load instruction falls within the load range the power station can bear.
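A sketch of this S2-S3 logic, with illustrative data structures, is as follows:

```python
def adjust_commitment(load_cmd, running, unit_min, unit_max, idle_time, run_time):
    """Start or stop units until the load instruction fits the plant output range.

    running   -- list of booleans, True if unit i is currently started
    unit_min  -- minimum output of each unit at the current head H (MW)
    unit_max  -- maximum output of each unit at the current head H (MW)
    idle_time -- time each unit has been shut down
    run_time  -- time each unit has been running
    """
    running = list(running)
    while True:
        lower = sum(n for n, on in zip(unit_min, running) if on)   # lower boundary of plant range
        upper = sum(n for n, on in zip(unit_max, running) if on)   # upper boundary of plant range
        if lower <= load_cmd <= upper:
            return running
        if load_cmd > upper:
            off = [i for i, on in enumerate(running) if not on]
            if not off:
                return running                                      # every unit already running
            running[max(off, key=lambda i: idle_time[i])] = True    # start longest-idle unit
        else:
            on = [i for i, on in enumerate(running) if on]
            if not on:
                return running                                      # every unit already stopped
            running[max(on, key=lambda i: run_time[i])] = False     # stop longest-running unit
```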
S4, judging whether other feasible startup unit combinations exist, if not, directly entering a step S6, otherwise, executing a step S5;
S5, generating the other feasible startup unit combinations under the rationality constraints on unit start-stop, and, using the deep-reinforcement-learning load optimization distribution model together with the load and flow forecast information for a future period, rolling-calculating the in-plant load distribution scheme of every feasible startup unit combination over the calculation horizon and the objective function value each scheme can obtain; calculating the total objective function value obtainable by each scheme over the forecast horizon, selecting the optimal startup unit combination according to the total objective function values, and updating the startup unit combination of the power station;
S51, generating other feasible starting machine set combinations by combining rationality constraint of starting and stopping of the machine set, wherein the specific steps are as follows:
To avoid large-scale start-stop of the power station's units, the invention assumes that the power station starts or shuts down at most one unit on the premise that the current startup unit combination can fulfil the load instruction. Therefore, in the optimization of the startup unit combination, two further candidate combinations are set on the basis of the current start-stop state of the power station: starting one additional unit, and shutting down one unit.
The load instruction forecast data for a future period are read, and whether these two candidate startup unit combination schemes are reasonable is judged with the following constraints:
the numbers of running units and shut-down units lie in a reasonable range:
n_on + n_off = n
where n_on is the number of running units of the power station, n_on > 0; n_off is the number of shut-down units of the power station, n_off > 0; and n is the total number of units of the power station.
when the power station load rises, the number of running units cannot decrease, and when the power station load falls, the number of running units cannot increase:
where n_on,t+1 is the number of running units at time t+1, n_on,t is the number of running units at time t, and N_c is the grid load instruction, which is compared with the current total output of the power station.
Start-stop operations contrary to the future load trend are avoided: for example, if after starting a unit the minimum output of the power station exceeds the minimum load in the forecast period, the power station would have to shut a unit down again, and if after shutting down a unit the maximum output of the power station is below the maximum load in the forecast period, the power station would have to start a unit again:
where N_on^min is the minimum output of the power station after a unit is started, N_c^min is the minimum output in the load forecast, N_off^max is the maximum output of the power station after a unit is shut down, and N_c^max is the maximum output in the load forecast.
The feasible startup unit combination schemes that satisfy these constraints are retained as the other feasible startup unit combination schemes besides the current one.
When the unit is started or closed, the unit with the longest current shutdown time is started, and the unit with the longest current startup time is closed.
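The three rationality checks can be expressed as a single predicate, sketched below with illustrative parameter names (all outputs in MW):

```python
def candidate_is_reasonable(n_on_next, n_on_now, n_units,
                            load_cmd, current_output,
                            cand_min_output, cand_max_output,
                            forecast_min_load, forecast_max_load):
    """Return True if a candidate startup-unit combination satisfies the three
    start-stop rationality constraints described above."""
    n_off_next = n_units - n_on_next
    # (1) Both the number of running units and of shut-down units stay positive
    if n_on_next <= 0 or n_off_next <= 0:
        return False
    # (2) Do not reduce running units when the load rises, nor add when it falls
    if load_cmd > current_output and n_on_next < n_on_now:
        return False
    if load_cmd < current_output and n_on_next > n_on_now:
        return False
    # (3) Avoid start-stop operations contrary to the forecast load trend
    if n_on_next > n_on_now and cand_min_output > forecast_min_load:
        return False       # starting a unit would force a later shutdown
    if n_on_next < n_on_now and cand_max_output < forecast_max_load:
        return False       # stopping a unit would force a later startup
    return True
```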
S52, the in-plant load distribution schemes of all feasible startup unit combination schemes over the calculation horizon are rolling-calculated with the deep-reinforcement-learning load optimization distribution model, in the following specific steps:
s521, reading a load and flow prediction sequence of the power station in a future period of time, and sequentially reading predicted values of the load and the flow in the prediction sequence; and (3) calculating whether the load predicted value is in the load range of the current starting-up unit combination, if not, updating the starting-up unit combination of the power station by combining the calculation method of the step (S3) until the load instruction can be completed.
S522, a load optimization distribution model based on deep reinforcement learning is used for respectively calculating and keeping the in-station load distribution scheme of the power station under the current on-off machine set combination and the rest feasible on-off machine set combination schemes, and an objective function value which can be obtained by the load distribution scheme is calculated by combining an objective function, wherein the calculation formula of the objective function value is as follows:
where f_t is the objective function value obtainable by the in-plant load distribution scheme, Q_g,t is the generation flow the power station needs to reach at the current time to meet the water scheduling target, N is the number of units of the power station, Q(N_i,t, H_t) is the generation flow as a function of unit output and head, N_i,t is the output of unit i at time t, H_t is the real-time head of the hydropower station, c is the number of start-stop operations, and Q_p is the start-stop penalty flow.
S523, the predicted values of the next load and flow are sequentially read, and the step returns to S522 until the predicted data of all loads and flows are read and calculated.
S53, the total objective function value obtainable by each scheme over the forecast horizon is calculated, the optimal startup unit combination is selected accordingly, and the startup unit combination of the power station is updated. Specifically: the objective function values obtained by the in-plant load distribution schemes within the forecast horizon are summed to obtain the total objective function value obtainable by each scheme; the startup unit combination scheme with the largest total objective function value is selected as the optimal scheme, and the start-stop states of the power station are updated accordingly.
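Steps S51-S53 can be summarised in the following rolling-evaluation sketch; allocate and objective are hypothetical stand-ins for the trained distribution model and the objective function of step S522:

```python
def choose_best_combination(candidates, forecast, allocate, objective):
    """Score every feasible startup-unit combination over the forecast horizon
    and return the one with the largest total objective value.

    candidates -- startup-unit combinations, including the current one
    forecast   -- sequence of (load, inflow) forecast pairs
    allocate   -- model call returning an in-plant load distribution scheme
    objective  -- per-step score of a scheme (larger is better)
    """
    best, best_score = None, float("-inf")
    for combo in candidates:
        total = 0.0
        for load, inflow in forecast:
            # If the forecast load left the combination's range, the commitment
            # would be adjusted here as in step S3 (omitted in this sketch).
            scheme = allocate(combo, load, inflow)       # in-plant distribution
            total += objective(scheme, load, inflow)     # e.g. water-target deviation and start-stop penalty
        if total > best_score:
            best, best_score = combo, total
    return best
```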
S6, combining the current start-stop state of the power station, the real-time load distribution scheme under the selected startup unit combination is generated with the deep-reinforcement-learning load optimization distribution model; the calculation flow is shown in FIG. 3: once the startup unit combination is fixed, the initial load distribution scheme is input into the model and the model outputs the optimized load distribution scheme.
The embodiment of the invention is explained concretely through real-time in-plant load distribution at the Shenxigou hydropower station in the Dadu River basin. The Dadu River is located in central-western Sichuan Province, China, and is the largest tributary of the Min River. Its mainstream is 1,062 km long, the basin area is 77,400 km², and the mean annual runoff is 47 billion m³; it is rich in hydropower resources and is the fifth largest hydropower base in China. At present, 22 hydropower stations have been built on the Dadu River. The Shenxigou hydropower station is the 18th cascade station on the mainstream of the Dadu River; it is equipped with 4 axial-flow units with a unit capacity of 165 MW, for a total installed capacity of 660 MW. The Shenxigou station is one of the hydropower stations undertaking peak-shaving and frequency-regulation tasks for the Sichuan power grid, with large fluctuations in load instructions and high response-speed requirements. Since the Shenxigou station is a daily-regulation station, its reservoir storage is relatively small and is affected by peak-shaving and frequency-regulation demands, and problems such as insufficient power generation benefit in the dry season and large fluctuations of water level and flow in the flood season exist. The hydropower station real-time load distribution method based on deep reinforcement learning is applied to solve the real-time in-plant load distribution problem of the Shenxigou hydropower station:
Because the operation objectives of the power station differ between the dead water period and the flood season, in the dead water period the station seeks to reduce generation flow and improve generation benefit, while in the flood season flood discharge occurs and the units must regulate the generation flow in coordination with the station's water level control target. Therefore, historical operation databases are built separately for the dead water period and the flood season, and two load optimization distribution models based on deep reinforcement learning are constructed.
The historical scheduling operation database for the dead water period is built from the five-minute operation data of the Shenxigou station in January of 2017-2019, and the historical scheduling operation database for the flood season is built from the five-minute operation data of the Shenxigou station in August of 2017-2019.
Based on the unit configuration of the Shenxigou station, the load adjustment and scheme output actions are designed following the load adjustment idea. The scheme output action is numbered action 0; in addition there are 12 possible load adjustment actions among the four units, numbered actions 1-12, each of which transfers 1 MW of load between two operating units. The unit numbers corresponding to each action are listed in Table 1, and an enumeration sketch is given after the table.
Table 1. Unit numbers corresponding to each load adjustment action (table contents not reproduced in this text)
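As a reading aid, the 13-action design for a four-unit station can be enumerated as below; action 0 outputs the current scheme and actions 1-12 each move 1 MW from one unit to another. The ordering of the 12 source-target pairs is an assumption, since the contents of Table 1 are not available here.

UNITS = [1, 2, 3, 4]
ACTIONS = {0: "output current load distribution scheme"}   # scheme output action
ACTIONS.update({
    idx: (src, dst)                                        # move 1 MW from unit src to unit dst
    for idx, (src, dst) in enumerate(
        [(s, d) for s in UNITS for d in UNITS if s != d], start=1)
})
assert len(ACTIONS) == 13                                  # 1 output action + 12 adjustments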
The target generation flow Q_g,t of the power station is set according to the water scheduling requirements of the Shenxigou station in the dead water period and the flood season. From the reward function of the load optimization distribution model based on deep reinforcement learning it follows that, when Q_g,t is fixed at a sufficiently small value, the generation flow of the power station is driven towards that value, which yields the minimum-water-consumption effect. Therefore, in the dead-water-period model Q_g,t is fixed at the ecological flow of the Shenxigou station, 327 m³/s; for the flood season, the scheduled water level target is fixed at the middle water level of the station, 657.5 m, and Q_g,t is calculated with this value as the target water level. The parameters of the two models are listed in Table 2.
Table 2. Parameters of the two models

Parameter                     | Dead-water-period model | Flood-season model
Network structure             | 128×1024×512            | 256×1024×512
Memory pool capacity          | 1,000,000               | 1,000,000
Number of samples             | 64                      | 128
Number of training iterations | 200                     | 200
Discount factor               | 0.9                     | 0.9
Learning rate                 | 0.001                   | 0.0005
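For readers reproducing the setup, the Table 2 parameters can be collected in plain configuration dictionaries as sketched below; the field names, the reading of the network structure as hidden-layer widths of a fully connected Q-network, and the reading of the number of samples as a replay batch size are assumptions of this sketch.

DEAD_WATER_MODEL = {
    "hidden_layers": (128, 1024, 512),   # assumed to be fully connected layer widths
    "replay_capacity": 1_000_000,        # memory pool capacity
    "batch_size": 64,                    # "number of samples" read as replay batch size
    "training_iterations": 200,
    "discount_factor": 0.9,
    "learning_rate": 1e-3,
}
FLOOD_SEASON_MODEL = {
    "hidden_layers": (256, 1024, 512),
    "replay_capacity": 1_000_000,
    "batch_size": 128,
    "training_iterations": 200,
    "discount_factor": 0.9,
    "learning_rate": 5e-4,
}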
The models are trained with the power station operation data in the historical operation databases; one round is defined as completing random-sampling learning over all data in the training set. The operation data of January 1-20 in 2017, 2018 and 2019 are used as the training set for in-station real-time load distribution in the dead water period, and the operation data of August 1-20 in 2017, 2018 and 2019 are used as the training set for in-station real-time load distribution in the flood season. The agent is trained for 100 rounds on each training set; the cumulative rewards obtained during training are shown in figure 4, where figure 4(a) shows the cumulative rewards over 100 training rounds on the dead-water-period training set and figure 4(b) shows those on the flood-season training set. The learning behaviour is similar in both cases: the cumulative reward improves greatly within the first 10 rounds of learning but still fluctuates considerably during the first 50 rounds, mainly because the agent takes random actions in the early exploration phase; it improves slightly further between rounds 50 and 80; after 80 rounds the fluctuation gradually decreases and the cumulative reward converges, indicating that the model has learned a good load distribution strategy.
The trained models are used for real-time scheduling decisions of the power station. The five-minute actual operation data of the Shenxigou station from January 21 to 23 are used as test data for in-station real-time load distribution in the dead water period, with total generation water consumption as the evaluation index; the operation data of the Shenxigou station from August 21 to 23 are used as test data for in-station real-time load distribution in the flood season, with the cumulative deviation between the operating water level and the target water level as the evaluation index.
To verify the effectiveness of the proposed model, in-station real-time load distribution decisions made with a load distribution summary table of 2 MW load discretization precision are compared with the decisions of the proposed method. To verify whether rolling calculation considering forecast information can improve the unit start-stop scheme, comparison schemes with and without forecast information are set, both calculated with the load distribution summary table. To verify whether the D3QN algorithm can balance decision efficiency and accuracy, the D3QN algorithm and the summary-table-based in-station real-time load distribution are applied respectively. Based on this analysis, three comparison schemes are set up as follows:
scheme 1: and constructing a load distribution summary table in advance, and calculating a load distribution scheme by using the load distribution summary table. When the starting machine unit makes a combined decision, only real-time load instructions are considered, and flow penalties are set for starting and stopping.
Scheme 2: the method comprises the steps of constructing a load distribution summary table in advance, considering rolling calculation of forecast information, and calculating a load distribution scheme by using the load distribution summary table. The forecast data is load and flow forecast data of five minutes in the next two hours.
Scheme 3: the method comprises the steps of training a deep reinforcement learning model by historical data in advance, calculating the load distribution scheme by taking rolling of forecast information into consideration, and calculating the load distribution scheme by using a trained load optimization distribution model based on deep reinforcement learning. The forecast data is load and flow forecast data within two hours in the future.
Schemes 1, 2 and 3 are applied to the power station operation data in the dead-water-period and flood-season test sets for in-station load distribution. The indices related to the station's water scheduling targets and the unit start-stop counts of each scheme are listed in Table 3. The operating water level of the power station on the dead-water-period test set is shown in figure 5(a) and that on the flood-season test set in figure 5(b). The number of operating units of each scheme on the dead-water-period test set is shown in figure 6(a), and the unit load processes of Schemes 1, 2 and 3 on that test set are shown in figures 6(b), 6(c) and 6(d), respectively. The number of operating units of each scheme on the flood-season test set is shown in figure 7(a), and the unit load processes of Schemes 1, 2 and 3 on that test set are shown in figures 7(b), 7(c) and 7(d), respectively.
Table 3. Water scheduling indices and unit start-stop counts of each scheme (table contents not reproduced in this text)
The advantages of the method provided by the invention are compared from three aspects: unit operation strategy, optimization effect, and computation time.
In terms of unit operation strategy, both Scheme 1 and Scheme 2 calculate the load distribution scheme with the load distribution summary table, but Scheme 2 adds rolling calculation with forecast data. Compared with Scheme 1, although considering forecast information increases the number of unit start-stops by one in both the minimum-water-consumption case and the water-level-control case, Scheme 2 achieves better values of the water scheduling evaluation indices. This verifies that rolling calculation combined with forecast information can effectively optimize the unit start-stop strategy according to the station's future load and flow trends and its water scheduling target, so that the station achieves a better water scheduling effect.
In terms of optimization effect, both Scheme 2 and Scheme 3 consider forecast information, but when calculating the load distribution scheme, Scheme 2 uses a load distribution summary table with 2 MW load discretization precision, whereas Scheme 3 uses the load optimization distribution model based on deep reinforcement learning with 1 MW precision. Scheme 3 achieves better values of the water scheduling evaluation indices than Scheme 2, which shows that the load optimization distribution model based on deep reinforcement learning can effectively optimize the in-station real-time load distribution scheme of the power station.
The calculation time of each scheme is shown in Table 4.
Table 4. Calculation time of each scheme (s)

Scheme   | Dead-water-period case | Flood-season case
Scheme 1 | 168.27                 | 174.64
Scheme 2 | 2132.63                | 1845.30
Scheme 3 | 201.75                 | 278.69
Because rolling calculation with forecast information greatly increases the computational load of the summary-table model, the average time of a single decision in Scheme 2 is 4.94 s in the dead-water-period case and 2.14 s in the flood-season case, about 13 times and 11 times that of Scheme 1, respectively, which makes it difficult to meet the time requirement of real-time load distribution. Both Scheme 2 and Scheme 3 consider forecast information, and Scheme 3 works at a finer load precision than Scheme 2, yet its average single-decision time is 0.23 s in the dead-water-period case, only 4.66% of Scheme 2, and 0.32 s in the flood-season case, only 14.95% of Scheme 2. The method provided by the invention therefore balances the decision accuracy and speed of load distribution and can effectively improve the station's response quality and response speed to grid load instructions.

Claims (10)

1. A hydropower station real-time load distribution method based on deep reinforcement learning is characterized by comprising the following steps:
s1, constructing a load optimization distribution model based on deep reinforcement learning and training it with historical operation data; the load optimization distribution model comprises a state, actions, rewards and constraints, and takes all factors that can influence the load distribution decision of the power station as the state s; the actions are designed following the idea of load adjustment, so that the allocation of a load instruction is converted into a process of gradually adjusting a load distribution scheme, comprising two types of actions, load adjustment and scheme output; each action yields an instant reward r; the constraints of the load optimization distribution model based on deep reinforcement learning comprise the constraint conditions of power station operation;
S2, the power station receives a grid load instruction; the minimum and maximum outputs of each unit in the current startup unit combination are calculated, the minimum outputs are summed to obtain the lower bound of the station output range, and the maximum outputs are summed to obtain the upper bound of the station output range; whether the current startup unit combination can meet the load instruction is judged from the value of the load instruction; if it can, the current startup unit combination is kept unchanged and step S4 is executed; otherwise step S3 is executed;
s3, if the power grid load instruction is greater than the upper boundary of the output range of the power station, starting a unit with the longest current shutdown time, and updating the startup and shutdown state and the load range of the unit of the power station until the power grid load instruction is between the lower boundary of the output range of the power station and the upper boundary of the output range of the power station;
if the power grid load instruction is smaller than the lower boundary of the output range of the power station, closing the unit with the longest current starting time, and updating the starting and stopping state and the load range of the unit of the power station until the power grid load instruction is between the lower boundary of the output range of the power station and the upper boundary of the output range of the power station;
s4, judging whether other feasible startup unit combinations exist, if not, directly entering a step S6, otherwise, executing a step S5;
S5, the remaining feasible startup unit combinations are generated under the rationality constraints of unit start-stop; using the load optimization distribution model based on deep reinforcement learning together with the load and flow forecast information for a future period, the in-station load distribution schemes of all feasible startup unit combination schemes within the calculation period, and the objective function values obtainable by these load distribution schemes, are rolling-calculated; the total objective function value obtainable by each scheme in the forecast period is calculated, the optimal startup unit combination is selected accordingly, and the startup unit combination of the power station is updated;
s6, generating an optimized load distribution scheme among the starting units by using a load optimization distribution model based on deep reinforcement learning.
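A minimal sketch of the range check and unit start/stop adjustment of steps S2 and S3 is given below, assuming each unit exposes its minimum and maximum output, its on/off state and the time elapsed since its last state change; the Unit type and all names are illustrative, not part of the claimed method.

from dataclasses import dataclass

@dataclass
class Unit:
    min_mw: float
    max_mw: float
    on: bool
    idle_minutes: float          # time since the unit last changed state

def adjust_units(units: list, load_cmd: float) -> None:
    # S2-S3: start or stop units until the grid load instruction lies inside
    # the station output range; assumes the command is achievable at all.
    while True:
        running = [u for u in units if u.on]
        lower = sum(u.min_mw for u in running)       # lower bound of station output
        upper = sum(u.max_mw for u in running)       # upper bound of station output
        if lower <= load_cmd <= upper:
            return                                   # current combination can meet the command
        if load_cmd > upper:                         # start the unit shut down the longest
            candidate = max((u for u in units if not u.on), key=lambda u: u.idle_minutes)
            candidate.on = True
        else:                                        # shut down the unit running the longest
            candidate = max(running, key=lambda u: u.idle_minutes)
            candidate.on = False
        candidate.idle_minutes = 0.0                 # reset the timer after the state change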
2. The hydropower station real-time load distribution method based on deep reinforcement learning according to claim 1, wherein the state in step S1 is a one-dimensional tensor [Q_in, Q_flood, Q_generate, Q_g,t, Z, N_in-plant], where Q_in is the inflow, Q_flood is the flood discharge flow, Q_generate is the generation flow, Q_g,t is the generation flow required at time t to meet the power station water scheduling target, Z is the upstream water level, and N_in-plant is the unit load;
the calculation formula of the instant reward r is:
wherein r is the instant reward of the deep reinforcement learning model; n is the total number of units; s_j and s_{j+1} are the states before and after the action is executed, respectively; N_i^{s_j} and N_i^{s_{j+1}} are the outputs of unit i before and after the action is executed, respectively; H is the current head of the power station; Q(N_i^{s_j}, H) and Q(N_i^{s_{j+1}}, H) are the generation flows when the unit output is N_i^{s_j} or N_i^{s_{j+1}} and the head is H;
constraints include constraints on the operation of the plant: unit operation constraint, water balance constraint, water level-water storage capacity relation constraint and tail water level-delivery flow relation constraint.
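Because the reward formula itself is not reproduced in this text, the sketch below shows only one plausible reading consistent with the symbols listed above: the reward measures how much an action moves the station's total generation flow towards the target flow Q_g,t. Both the functional form and all names are assumptions, and the actual claimed formula may differ.

def instant_reward(loads_before, loads_after, head, q_target, q_func):
    # loads_before / loads_after: unit outputs under states s_j and s_{j+1};
    # q_func(N, H) stands in for the unit flow characteristic Q(N, H).
    q_before = sum(q_func(n, head) for n in loads_before)
    q_after = sum(q_func(n, head) for n in loads_after)
    return abs(q_before - q_target) - abs(q_after - q_target)   # > 0 if the action helps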
3. The hydropower station real-time load distribution method based on deep reinforcement learning according to claim 2, wherein the generation flow Q_g,t required to meet the power station water scheduling target is the optimal generation flow of the station: when the station needs to store water, Q_g,t can be taken as the minimum discharge flow of the station; when the station has a water level control requirement, Q_g,t is calculated as:
Q_g,t = Q_in,t - Q_flood,t + (V(Z_t) - V(Z_g,t)) / Δt
wherein Q_in,t is the inflow of the power station at time t, Q_flood,t is the flood discharge flow of the power station at time t, V(Z_t) and V(Z_g,t) are the reservoir storage volumes of the power station when the water level is Z_t and Z_g,t respectively, Z_t is the water level of the power station at time t, Z_g,t is the target water level of the power station at time t, and Δt is the time interval between load instructions.
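A direct transcription of the claim 3 formula is sketched below; the storage curve lookup volume_of(z), which stands for V(Z), is assumed to be supplied by the caller.

def target_generation_flow(q_in, q_flood, z_now, z_target, dt_seconds, volume_of):
    # Q_g,t = Q_in,t - Q_flood,t + (V(Z_t) - V(Z_g,t)) / dt
    return q_in - q_flood + (volume_of(z_now) - volume_of(z_target)) / dt_seconds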
4. The hydropower station real-time load distribution method based on deep reinforcement learning according to claim 1, wherein the training process of the load optimization distribution model based on deep reinforcement learning in step S1 specifically comprises the following steps:
(1) randomly generate an initial value function Q(s, a); set counters i = 0 and j = 0; let the number of training rounds be I, the number of samples be J, and the probability of selecting a random action be P_r; generate a sample library identical to the power station historical operation database;
(2) randomly draw one piece of operation data from the sample library without replacement; keeping the current startup unit combination unchanged, randomly generate a feasible load distribution scheme N_in-plant for the load instruction N_c; construct the current state s = [Q_in, Q_flood, Q_generate, Q_g,t, Z, N_in-plant] from the operation data and the feasible load distribution scheme; if all data in the sample library have been sampled, regenerate a sample library identical to the power station historical operation database and let i = i + 1;
(3) input the current state s into the load optimization distribution model; identify and mask the invalid actions that would cause a unit load to fall into a vibration zone or become negative; generate a random number P in [0, 1]; if P ≤ P_r, randomly select an action from the selectable actions as the current action a; otherwise select from the selectable actions the action with the maximum value as the current action a;
(4) execute the current action a and update N_in-plant; calculate the instant reward r obtained by the action and obtain the next state s'; store (s, a, r, s') in the experience pool and let j = j + 1; if j ≥ J, extract J samples from the experience pool according to the PER method and update the agent's value function Q(s, a) with these samples, using the formula:
Q'(s, a) = Q(s, a) + α[r + γ max_a' Q(s', a') - Q(s, a)]
wherein Q(s, a) and Q'(s, a) are the value functions of the state-action pair (s, a) before and after the update respectively, Q(s, a) being the initial value function in the first learning step, Q(s', a') is the value function of the state-action pair (s', a'), a and a' are the actions at the current and next time steps respectively, α is the step-size factor, r is the instant reward obtained at the current step, and γ is the discount rate;
(5) if the current action a is a load adjustment action, set s = s' and return to step (3); if the current action a is a scheme output action, go to step (6);
(6) if i ≥ I, training is complete; otherwise return to step (2).
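The training procedure of claim 4 can be condensed into the sketch below. A real implementation would use a neural Q-network with prioritized experience replay (PER); this sketch stores Q-values in a dictionary, samples the replay pool uniformly, and assumes an env object exposing reset(), step() and feasible_actions(), all of which are simplifications and assumed interfaces rather than the claimed implementation.

import random
from collections import defaultdict, deque

def train(env, rounds_I, batch_J, p_random=0.1, alpha=0.1, gamma=0.9):
    Q = defaultdict(float)                       # value table keyed by (state, action)
    replay = deque(maxlen=100_000)               # experience pool
    for _ in range(rounds_I):                    # one round = one pass over the sample library
        s = env.reset()                          # draw operation data and a random feasible scheme
        done = False
        while not done:
            feasible = env.feasible_actions(s)   # mask vibration-zone / negative-load actions
            if random.random() <= p_random:      # epsilon-greedy exploration
                a = random.choice(feasible)
            else:
                a = max(feasible, key=lambda act: Q[(s, act)])
            s2, r, done = env.step(a)            # done=True once the scheme-output action fires
            replay.append((s, a, r, s2))
            if len(replay) >= batch_J:           # uniform sampling here, PER in the claim
                for (si, ai, ri, s2i) in random.sample(replay, batch_J):
                    best_next = max((Q[(s2i, act)] for act in env.feasible_actions(s2i)),
                                    default=0.0)
                    Q[(si, ai)] += alpha * (ri + gamma * best_next - Q[(si, ai)])
            s = s2
    return Q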
5. The hydropower station real-time load distribution method based on deep reinforcement learning according to claim 1, wherein the remaining feasible startup unit combinations in step S5 are generated under the rationality constraints of unit start-stop as follows:
on the premise of completing the load instruction, the power station starts or shuts down at most one unit at a time; to ensure the rationality of unit start-stop, the following three constraint conditions are set, with the respective formulas:
the numbers of operating units and shut-down units are within a reasonable range:
n_on + n_off = n
wherein n_on is the number of operating units of the power station, n_on > 0, n_off is the number of shut-down units of the power station, n_off > 0, and n is the number of units of the power station;
when the load of the power station rises the number of operating units cannot decrease, and when the load of the power station falls the number of operating units cannot increase:
wherein n_on,t+1 is the number of operating units at time t+1, n_on,t is the number of operating units at time t, N_c is the grid load instruction, and the remaining symbol is the current total output of the power station;
start-stop operations that run against the future load trend are avoided:
wherein the first symbol is the minimum output of the power station after the unit is started, the second is the minimum load in the load forecast, the third is the maximum output of the power station after the unit is shut down, and the fourth is the maximum load in the load forecast;
the feasible startup unit combination schemes that satisfy these constraints are retained as the remaining feasible startup unit combination schemes besides the current one.
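A sketch of the claim 5 feasibility screen for a candidate startup unit combination follows. The argument names are illustrative, and the third constraint is read here as requiring that a start-up still suits the minimum forecast load and that a shut-down still covers the maximum forecast load, which is an interpretation rather than the literal claimed formula.

def combination_is_reasonable(n_on, n_off, n_units,
                              n_on_now, load_cmd, output_now,
                              station_min_after, station_max_after,
                              forecast_min, forecast_max) -> bool:
    if n_on + n_off != n_units:                              # unit counts must add up
        return False
    if load_cmd > output_now and n_on < n_on_now:            # load rising: do not reduce units
        return False
    if load_cmd < output_now and n_on > n_on_now:            # load falling: do not add units
        return False
    if n_on > n_on_now and station_min_after > forecast_min: # start-up must suit future lows
        return False
    if n_on < n_on_now and station_max_after < forecast_max: # shut-down must suit future highs
        return False
    return True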
6. The hydropower station real-time load distribution method based on deep reinforcement learning according to claim 1, wherein in step S5 the in-station load distribution schemes of all feasible startup unit combination schemes within the calculation period, and the objective function values obtainable by these load distribution schemes, are rolling-calculated with the load optimization distribution model based on deep reinforcement learning, with the following specific steps:
S521, the load and flow forecast sequences of the power station for a future period are read, and the forecast values of load and flow in the forecast sequences are read in turn; whether the forecast load lies within the load range of the current startup unit combination is checked, and if not, the startup unit combination of the power station is updated with the calculation method of step S3 until the load instruction can be met;
S522, the load optimization distribution model based on deep reinforcement learning is used to calculate the in-station load distribution schemes of the power station under the current startup unit combination and under each remaining feasible startup unit combination scheme, and the objective function value obtainable by each load distribution scheme is calculated with the objective function, whose value is calculated as follows:
wherein f_t is the objective function value obtainable by each in-station load distribution scheme of the power station, Q_g,t is the generation flow the power station must reach at the current time to meet the water scheduling target, N is the number of units of the power station, Q(N_i,t, H_t) is the generation flow as a function of unit output and head, N_i,t is the output of unit i at time t, H_t is the real-time head of the hydropower station, c is the number of start-stop operations, and Q_p is the start-stop penalty flow;
S523, the next forecast values of load and flow are read in turn and the procedure returns to step S522, until all forecast load and flow data have been read and calculated.
7. The hydropower station real-time load distribution method based on deep reinforcement learning according to claim 1, wherein in step S5 the objective function values obtained by the in-station load distribution schemes at each time step within the forecast period are summed to obtain the total objective function value obtainable by each scheme over the forecast period; the startup unit combination scheme with the largest total objective function value is selected as the optimal startup unit combination scheme, and the on/off states of the power station units are updated according to that scheme.
8. Hydropower station real-time load distribution system based on deep reinforcement learning, which is characterized by comprising:
the model construction and training module is used for constructing a load optimization distribution model based on deep reinforcement learning and training the load optimization distribution model based on the deep reinforcement learning by combining historical operation data; the load optimization distribution model comprises a state, an action, instant rewards and constraints, and takes all factors which can influence the load distribution decision of the power station as a state s; the method comprises the steps of designing actions according to the thought of load adjustment, converting the distribution of primary load instructions into a process of gradually adjusting a load distribution scheme, wherein the process comprises two types of actions of load adjustment and scheme output; each action has an instant prize r; constraints of the load optimization distribution model based on deep reinforcement learning comprise constraint conditions of power station operation;
The data calculation module is used for receiving a power grid load instruction, calculating the minimum output and the maximum output of each unit in the current starting-up unit combination, adding the minimum output of each unit to obtain the lower boundary of the output range of the power station, and adding the maximum output of each unit to obtain the upper boundary of the output range of the power station;
the judging and executing module is used for judging whether the current starting machine set combination can complete the load instruction or not according to the value of the load instruction; if the load instruction can be completed, keeping the current starting machine set combination unchanged, judging whether other feasible starting machine set combinations exist, if not, generating an optimized load distribution scheme among the starting machine sets by using a load optimization distribution model based on deep reinforcement learning, and if so, updating the starting machine set combinations of the power station according to a machine set updating module; if the load instruction cannot be completed, if the power grid load instruction is larger than the upper boundary of the output range of the power station, starting a unit with the longest current shutdown time, and updating the startup and shutdown state and the load range of the unit of the power station until the power grid load instruction is between the lower boundary of the output range of the power station and the upper boundary of the output range of the power station; if the power grid load instruction is smaller than the lower boundary of the output range of the power station, closing the unit with the longest current starting time, and updating the starting and stopping state and the load range of the unit of the power station until the power grid load instruction is between the lower boundary of the output range of the power station and the upper boundary of the output range of the power station;
The unit updating module is used for generating other feasible starting-up unit combinations by combining the rationality constraint of starting and stopping of the unit, and rolling and calculating the in-station load distribution scheme of all the feasible starting-up unit combination schemes in a calculation period by using a load optimization distribution model based on deep reinforcement learning and combining load and flow forecast information in a future period of time; and calculating the total objective function value which can be obtained by each scheme in the forecasting period, selecting the optimal startup unit combination by combining the total objective function value, and updating the startup unit combination of the power station.
9. An apparatus comprising a memory and a processor, wherein:
a memory for storing a computer program capable of running on the processor;
a processor for performing the steps of a hydropower station real-time load distribution method based on deep reinforcement learning as claimed in any one of claims 1-7 when running said computer program.
10. A storage medium having stored thereon a computer program which, when executed by at least one processor, implements the steps of a hydropower station real-time load distribution method based on deep reinforcement learning as claimed in any one of claims 1-7.