CN116683530A

CN116683530A - Stochastic optimization scheduling method for cascade reservoirs with a hybrid pumped storage power station and wind-solar power

Info

Publication number: CN116683530A
Application number: CN202310481214.4A
Authority: CN
Inventors: 周佳妮; 李文武; 范钟耀; 张一凡
Original assignee: China Three Gorges University CTGU
Current assignee: China Three Gorges University CTGU
Priority date: 2023-04-28
Filing date: 2023-04-28
Publication date: 2023-09-01

Abstract

The invention provides a stochastic optimization scheduling method for cascade reservoirs with a hybrid pumped storage power station and wind-solar power. The method comprises: constructing water, wind and solar stochastic scenarios by solving Markov state transition probabilities from historical data; reducing the scenarios with an ISODATA algorithm based on correlation distance, and combining the reduced scenarios into a medium- and long-term scheduling environment for the water-wind-solar complementary system; constructing a medium- and long-term stochastic optimization scheduling model and a coupled medium-long-term and short-term optimization scheduling model of the water-wind-solar complementary system; solving the medium- and long-term stochastic optimization scheduling strategy with the PER-DQN algorithm, and solving the scheduling strategy of the coupled medium-long-term and short-term model with the Q-learning algorithm; and constructing a real-time scheduling simulation model in which hydropower compensates the prediction deviation and the scheduling process of the cascade reservoirs containing the hybrid pumped storage power station is updated on a rolling basis. The method improves the economy, reliability and stability of the water-wind-solar complementary system while also improving computational efficiency.

Description

Stochastic optimization scheduling method for cascade reservoirs with a hybrid pumped storage power station and wind-solar power

Technical Field

The invention relates to the field of new energy and optimal reservoir scheduling, and in particular to a stochastic optimization scheduling method for cascade reservoirs with a hybrid pumped storage power station and wind-solar power.

Background

In recent years, countries around the world have actively studied the consumption of new energy in order to cope with global climate change. China is vigorously promoting the high-quality, leapfrog development of renewable energy, and its installed capacity of hydropower, wind power, photovoltaic power and biomass power has ranked first in the world for many consecutive years. However, with the rapid development of renewable clean energy, the uncertainty of wind and photovoltaic power puts greater pressure on new energy consumption and on the safe and stable operation of the power system. The pumped storage power station is the most technically mature and most economical green, low-carbon, flexible regulating power source of the power system, and the one best suited to large-scale development; it works well in combination with wind power, photovoltaic power, hydropower, nuclear power, thermal power and the like. It is therefore necessary to optimize the scheduling of complementary systems that contain pumped storage power stations, so as to reduce the curtailment of wind, solar and water resources and to ensure the reliability and stability of the power system while maximizing the economy of the complementary system.

Disclosure of Invention

The invention aims to overcome the above shortcomings by providing a stochastic optimization scheduling method for cascade reservoirs with a hybrid pumped storage power station and wind-solar power. The method addresses the large prediction errors of medium- and long-term runoff, wind power output and photovoltaic output, and the reduced stability and reliability of the power system caused by the uncertainty and fluctuation of wind and photovoltaic output.

To solve the above technical problems, the invention adopts the following technical solution: a stochastic optimization scheduling method for cascade reservoirs with a hybrid pumped storage power station and wind-solar power, comprising the following steps:

Step 1: Perform a randomness analysis of wind power output, photovoltaic output and cascade reservoir inflow based on historical data, and construct the water, wind and solar stochastic scenarios for medium- and long-term scheduling of the water-wind-solar complementary system;

Step 2: Use an ISODATA algorithm based on correlation distance to reduce the water, wind and solar stochastic scenarios separately, and construct the medium- and long-term scheduling environment of the water-wind-solar complementary system;

Step 3: Construct the medium- and long-term stochastic optimization scheduling model of the water-wind-solar complementary system with the objective of maximizing system power generation;

Step 4: Take the output of each scheduling period obtained from medium- and long-term optimal scheduling as the boundary constraint of the short-term scheduling model, and construct the coupled medium-long-term and short-term optimization scheduling model of the water-wind-solar complementary system from the three perspectives of reliability, stability and economy, with the objectives of minimum mean square error of the residual load, minimum system output fluctuation and minimum squared total output deviation;

Step 5: Solve the medium- and long-term stochastic optimization scheduling strategy with the reinforcement learning PER-DQN algorithm;

Step 6: Use the reinforcement learning Q-learning algorithm to solve the short-term optimal scheduling strategy of the coupled medium-long-term and short-term optimization scheduling model, update the medium- and long-term stochastic optimization scheduling strategy, and determine the day-ahead generation plan;

Step 7: Construct a real-time scheduling simulation model in which hydropower compensates the water, wind and solar prediction deviations, and update the scheduling process of the cascade reservoirs containing the hybrid pumped storage power station on a rolling basis according to the day-ahead generation plan and the actual wind power output, photovoltaic output and runoff data.

Preferably, in step 1, the water, wind and solar stochastic scenarios are constructed as follows:

S1.1: For each stage, the historical data of the random variable are discretized over the range between their minimum and maximum values into the same number of discrete values, which serve as the states of that stage in the Markov decision process. The correlation between the states (random values) of adjacent periods is checked with the Pearson correlation test, as shown in formula (1):

$$\operatorname{cov}(s_t, s_{t+1}) = \frac{\sum_{i=1}^{N}\left(s_t^{i}-\bar{s}_t\right)\left(s_{t+1}^{i}-\bar{s}_{t+1}\right)}{N\,\sigma_t\,\sigma_{t+1}} \quad (1)$$

where: $\operatorname{cov}(s_t, s_{t+1})$ is the correlation coefficient between the random-variable states (random values) of adjacent periods; $N$ is the total number of discrete random values in each stage; $s_t^{i}$ and $s_{t+1}^{i}$ are the $i$-th discrete random values of periods $t$ and $t+1$; $\bar{s}_t$ and $\bar{s}_{t+1}$ are the means of the discrete random values of periods $t$ and $t+1$; $\sigma_t$ and $\sigma_{t+1}$ are the standard deviations of the discrete random values of periods $t$ and $t+1$;

If two adjacent periods are strongly correlated, they satisfy the Markov property, and the discrete random values of adjacent stages can be described by Markov transition probabilities. Otherwise, the discrete random values of adjacent stages are mutually independent, that is, any discrete value of one stage may transition to any discrete value of the next, and each element of the state transition probability matrix is set to the reciprocal of the number of discrete values per stage;

S1.2: Based on historical data, the observed wind power output, photovoltaic output and cascade reservoir inflow in each scheduling period of past years are counted to obtain the frequencies of the transitions between discrete values of adjacent stages that satisfy the Markov property. The state transition probabilities are computed from these frequencies and assembled into the state transition probability matrix of each pair of adjacent stages;

S1.3: According to the state transition probability matrix of each stage, long time series of water, wind and solar values, together with their corresponding state transition probabilities, are sampled directly by the Monte Carlo method and used as the water, wind and solar stochastic scenarios.
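Steps S1.1 to S1.3 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names (`discretize`, `transition_matrix`, `sample_scenario`), the equal-width discretization and the uniform fallback for states never observed in the history are hypothetical choices, not details given in the patent.

```python
import random

def discretize(value, lo, hi, n_states):
    """Map a value in [lo, hi] to a state index 0..n_states-1 (equal-width bins, an assumption)."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_states)
    return min(max(idx, 0), n_states - 1)

def transition_matrix(states_t, states_t1, n_states):
    """Estimate P[i][j] = Pr(state j in period t+1 | state i in period t)
    from paired historical observations (one pair per year)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for i, j in zip(states_t, states_t1):
        counts[i][j] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # Assumption: a state never observed gets a uniform row.
            matrix.append([1.0 / n_states] * n_states)
        else:
            matrix.append([c / total for c in row])
    return matrix

def sample_scenario(initial_probs, matrices, rng):
    """Monte Carlo draw of one state sequence and the probability of that path."""
    states = list(range(len(initial_probs)))
    s = rng.choices(states, weights=initial_probs)[0]
    path, prob = [s], initial_probs[s]
    for P in matrices:
        nxt = rng.choices(states, weights=P[s])[0]
        prob *= P[s][nxt]
        path.append(nxt)
        s = nxt
    return path, prob
```

Calling `sample_scenario` repeatedly with a seeded `random.Random` would give a reproducible scenario set that can then be passed to the scenario reduction of step 2.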

Preferably, in step 2, the scenarios are reduced with an ISODATA algorithm based on correlation distance, and the medium- and long-term scheduling environment of the water-wind-solar complementary system is constructed. In the stochastic scenarios of the complementary system, each random value is represented by a long time series, and the distance functions commonly used in clustering algorithms, which measure the distance between two points, cannot adequately describe the distance between two time series. The correlation distance is therefore used for clustering when the stochastic scenarios are reduced. The specific process is as follows:

S2.1: Input the time series data of the random variable. Preselect $N_c$ initial cluster centers, and set the expected number of cluster centers $K$, the minimum number of samples $\theta_N$ in each cluster (a cluster with fewer samples is not treated as an independent cluster), the standard deviation threshold $\theta_S$ of the sample distance distribution within a cluster, the minimum distance $\theta_c$ between two cluster centers (two clusters closer than this must be merged), the maximum number $L$ of cluster-center pairs that can be merged in one iteration, and the number of iterations $I$;

S2.2: Calculate the correlation distance between each sample and each cluster center, and assign each of the $N$ samples to its nearest cluster $S_j$;

S2.3: If the number of samples in cluster $S_j$ is less than $\theta_N$, discard that cluster, reduce the number of cluster centers $N_c$ by 1, and restart clustering;

S2.4: Recalculate the cluster parameters, including the cluster center $Z_j$, the correlation distance $D_j$ from the samples in a cluster to its center, and the total correlation distance from all samples to their corresponding centers.

The cluster center is given by formula (2):

$$Z_j = \frac{1}{N_j}\sum_{X \in S_j} X \quad (2)$$

where: $Z_j$ is the $j$-th cluster center; $N_j$ is the number of samples in the $j$-th cluster; $X$ is a sample in the cluster; $S_j$ is the $j$-th cluster;

The correlation distance from a sample in a cluster to the cluster center is given by formula (3):

$$D_j = 1 - \frac{\operatorname{cov}(X, Z_j)}{\sqrt{\operatorname{var}(X)\,\operatorname{var}(Z_j)}} \quad (3)$$

where: $D_j$ is the correlation distance from the sample to the cluster center; $\operatorname{cov}(X, Z_j)$ is the covariance of the series $X$ and $Z_j$; $\operatorname{var}(X)$ and $\operatorname{var}(Z_j)$ are the variances of the series $X$ and $Z_j$, respectively;

The total correlation distance from all samples to their corresponding centers is given by formula (4):

$$\bar{D} = \frac{1}{N}\sum_{j=1}^{N_c} N_j \bar{D}_j \quad (4)$$

where: $\bar{D}$ is the correlation distance from the samples in all clusters to their corresponding cluster centers; $N$ is the total number of samples; $N_c$ is the number of cluster centers; $N_j$ is the number of samples in the $j$-th cluster; $\bar{D}_j$ is the correlation distance from the samples in the $j$-th cluster to its cluster center;

S2.5: Decide whether to stop, split or merge:

If the number of iterations equals the iteration threshold, set $\theta_c$ to 0 and go to S2.6;

If $N_c \le K/2$, that is, there are too few clusters, go to S2.6;

If $N_c \ge 2K$, that is, there are too many clusters, go to S2.8;

If $K/2 \le N_c \le 2K$, go to S2.9 when the iteration number is even, and otherwise go to S2.7;

S2.6: Calculate the standard deviation vector of the sample distances within each cluster, $\sigma_j = (\sigma_{1j}, \sigma_{2j}, \ldots, \sigma_{nj})^{T}$, whose components are given by formula (5):

$$\sigma_{ij} = \sqrt{\frac{1}{N_j}\sum_{k=1}^{N_j}\left(X_{ik} - Z_{ij}\right)^2} \quad (5)$$

where: $i$ is the dimension index of the sample feature vector; $j$ is the cluster index; $N_j$ is the total number of samples in the $j$-th cluster; $X_{ik}$ is the $k$-th sample in the $i$-th feature dimension; $Z_{ij}$ is the $j$-th cluster center in the $i$-th feature dimension. The maximum component $\sigma_{j\max}$ of $\sigma_j$ is then found;

S2.7: If $\sigma_{j\max}$ is greater than $\theta_S$ and either of two conditions holds, namely that the correlation distance within cluster $j$ exceeds the total correlation distance over all clusters, or that $N_j > 2(\theta_N + 1)$, then $Z_j$ is split into two clusters and $N_c$ is increased by 1. Increase the iteration count by 1 and go to S2.1;

S2.8: Calculate the correlation distance $D_{ij}$ between every pair of cluster centers;

S2.9: Compare $D_{ij}$ with $\theta_c$, and arrange the values that satisfy $D_{ij} < \theta_c$ in descending order;

S2.10: Merge each pair of cluster centers that are adjacent after the arrangement, compute the new cluster center by formula (6), and reduce $N_c$ by 1:

$$Z_k^{*} = \frac{N_{i_k} Z_{i_k} + N_{j_k} Z_{j_k}}{N_{i_k} + N_{j_k}}, \quad k = 1, 2, \ldots, L \quad (6)$$

where: $Z_k^{*}$ is the new cluster center; $N_{i_k}$ and $N_{j_k}$ are the numbers of samples in the $i_k$-th and $j_k$-th clusters after the arrangement; $Z_{i_k}$ and $Z_{j_k}$ are the $i_k$-th and $j_k$-th cluster centers; $L$ is the number of cluster-center pairs merged in this step;

S2.11: If the iteration threshold $I$ is reached, or the algorithm no longer splits or merges (the cluster centers no longer change), the algorithm ends. Otherwise, the algorithm iterates again;

S2.12: Output the cluster centers;

The cluster centers of the time series are taken as the representative water, wind and solar stochastic scenarios and are combined to obtain the stochastic scenarios of the water-wind-solar system. The Markov state transition probability matrix of each stage is then consulted to obtain the stage-by-stage Markov state transition probabilities of the cluster-center time series, and the medium- and long-term scheduling environment of the water-wind-solar complementary system is constructed.
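The assignment step S2.2 and the center update of S2.4 can be illustrated with a short Python sketch. It assumes that the correlation distance is one minus the Pearson correlation of two equal-length series, which is the common definition but is an assumption about the patent's formula (3); the function names and the handling of constant series are likewise hypothetical. The split and merge logic of S2.5 to S2.10 is not shown.

```python
import math

def correlation_distance(x, z):
    """Assumed form: 1 - Pearson correlation of two equal-length series."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    cov = sum((a - mx) * (b - mz) for a, b in zip(x, z)) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_z = sum((b - mz) ** 2 for b in z) / n
    if var_x == 0 or var_z == 0:
        # Assumption: a constant series is treated as uncorrelated.
        return 1.0
    return 1.0 - cov / math.sqrt(var_x * var_z)

def assign_to_clusters(samples, centers):
    """S2.2: put each time series into the cluster whose center is nearest."""
    clusters = [[] for _ in centers]
    for x in samples:
        j = min(range(len(centers)),
                key=lambda c: correlation_distance(x, centers[c]))
        clusters[j].append(x)
    return clusters

def cluster_center(cluster):
    """S2.4: the center is the element-wise mean of the series in the cluster."""
    n = len(cluster)
    return [sum(column) / n for column in zip(*cluster)]
```

Two series with the same shape but different magnitudes, such as `[1, 2, 3]` and `[2, 4, 6]`, have a correlation distance near zero, which is why this measure groups scenarios by temporal pattern rather than by level.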

Preferably, in step 3, the medium- and long-term stochastic optimization scheduling model of the water-wind-solar complementary system takes maximum system power generation as its objective and uses one day as the scheduling period, so that the total power generation of the complementary system is maximized. The total power generation of the system in each scheduling period is given by formula (7):

where: $R_t$ is the total power generation of the water-wind-solar cascade complementary system in period $t$; $\Delta T_t$ is the length of scheduling period $t$; $N_{i,t}$ is the output in period $t$ of the cascade reservoirs containing the hybrid pumped storage power station; $N_{w,t}$ is the total wind power output in period $t$; $N_{s,t}$ is the total photovoltaic output in period $t$; $n$ is the total number of cascade reservoirs; the hydropower terms are the generation flow of the $n$-th reservoir in period $t$, the comprehensive output coefficient $A^{n}$ of the $n$-th reservoir, and the average generation head in period $t$ between the $n$-th and $(n+1)$-th reservoirs; $K_t$ is the operating mode of the reversible pump-turbine unit in period $t$, with $K_t = 1$ for generating and $K_t = 0$ for pumping; $E_{h,t}$ and $E_{p,t}$ are the energy generated and the energy consumed for pumping by the reversible pump-turbine unit in period $t$, as given by formula (8):

where: $E_{h,t}$ and $E_{p,t}$ are the energy generated and the energy consumed for pumping by the reversible pump-turbine unit in period $t$; $P_{h,t}$ and $P_{p,t}$ are the generating power and the pumping power of the unit in period $t$; $\eta_h$ and $\eta_p$ are the generating efficiency and the pumping efficiency of the unit; $H_t$ is the average generation head in period $t$; $Q_{h,t}$ and $Q_{p,t}$ are the flows of the unit in generating mode and pumping mode in period $t$; $\eta_{wp}$ is the pipeline efficiency; $\Delta T_t$ is the length of scheduling period $t$; $\rho$ is the density of water; $g$ is the gravitational acceleration;

The water balance constraint of the cascade reservoirs containing the hybrid pumped storage power station is given by formula (9):

where: the storage terms are the initial and final storage volumes of the $n$-th reservoir in period $t$; the flow terms are the inflow and outflow of the $n$-th reservoir in period $t$; the interval inflow term is the interval inflow of the $n$-th reservoir, which is zero for the first reservoir, that is, the first-stage reservoir has no interval inflow; $Q_{h,t}$ and $Q_{p,t}$ are the flows of the reversible pump-turbine unit in generating mode and pumping mode in period $t$; $\Delta T_t$ is the length of scheduling period $t$;

The cascade water level constraint is given by formula (10):

where: the bounds are the minimum and maximum water level limits of the $n$-th reservoir at the beginning of period $t$, and the constrained quantity is the water level of the $n$-th reservoir at the beginning of period $t$;

The generation flow constraint is given by formula (11):

where: the upper bound is the maximum turbine flow of the $n$-th reservoir, and the constrained quantity is the generation flow of the $n$-th reservoir.

The generating power and the pumping power of the reversible pump-turbine unit are limited as shown in formula (12):

$$0 \le P_{h,t} \le P_{h,\max}, \qquad 0 \le P_{p,t} \le P_{p,\max} \quad (12)$$

where: $P_{h,t}$ and $P_{p,t}$ are the generating power and the pumping power of the reversible pump-turbine unit in period $t$; $P_{h,\max}$ and $P_{p,\max}$ are the maximum generating power and the maximum pumping power of the unit;

The wind power and photovoltaic output limits are given by formula (13):

$$0 \le N_{w,t} \le N_{w,\max}, \qquad 0 \le N_{s,t} \le N_{s,\max} \quad (13)$$

where: $N_{w,t}$ is the wind power output in period $t$; $N_{s,t}$ is the photovoltaic output in period $t$; $N_{w,\max}$ is the maximum wind power output limit; $N_{s,\max}$ is the maximum photovoltaic output limit.
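The role of the operating mode $K_t$ and of the generated and pumped energy of the reversible unit can be illustrated with a small Python sketch. It uses the textbook hydropower relation (power equals efficiency times density times gravity times flow times head) rather than the patent's own formula (8), which is not reproduced here. How the pipeline efficiency enters (multiplying generation, dividing pumping), the units and all function names are assumptions.

```python
RHO = 1000.0  # density of water, kg/m^3
G = 9.81      # gravitational acceleration, m/s^2

def generation_energy_mwh(q_h, head, hours, eta_h, eta_wp=1.0):
    """Energy generated in one period (MWh), assuming P = eta_h * eta_wp * rho * g * Q * H."""
    power_w = eta_h * eta_wp * RHO * G * q_h * head
    return power_w * hours / 1e6

def pumping_energy_mwh(q_p, head, hours, eta_p, eta_wp=1.0):
    """Energy consumed for pumping in one period (MWh), assuming P = rho * g * Q * H / (eta_p * eta_wp)."""
    power_w = RHO * G * q_p * head / (eta_p * eta_wp)
    return power_w * hours / 1e6

def unit_net_energy_mwh(k_t, q_h, q_p, head, hours, eta_h, eta_p, eta_wp=1.0):
    """Net contribution of the reversible unit: positive when generating (k_t = 1),
    negative when pumping (k_t = 0)."""
    if k_t == 1:
        return generation_energy_mwh(q_h, head, hours, eta_h, eta_wp)
    return -pumping_energy_mwh(q_p, head, hours, eta_p, eta_wp)
```

With a flow of 100 m³/s, a head of 50 m and an efficiency of 0.9 in both directions, one hour of generation yields about 44 MWh while one hour of pumping consumes about 54.5 MWh, which shows why the pumping periods have to be chosen carefully when total generation is the objective.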

Preferably, in step 4, the coupled medium-long-term and short-term optimization scheduling model of the water-wind-solar complementary system takes the daily system output of the medium- and long-term scheduling strategy as the boundary constraint on the total output deviation of the short-term scheduling system, calculates the short-term scheduling strategy, feeds the short-term scheduling result back to the medium- and long-term scheduling, and updates the medium- and long-term scheduling strategy. In addition, the short-term scheduling is carried out at an hourly resolution, so that the daily mean square error of the residual load of the complementary system, the fluctuation of the system output and the squared total output deviation are all minimized;

The mean square error of the residual load is given by formula (14):

$$f_1 = \frac{1}{T}\sum_{t=1}^{T}\left(P_t - N_{i,t} - N_{w,t} - N_{s,t}\right)^2 \quad (14)$$

where: $f_1$ is the mean square error of the residual load; $T$ is the total number of periods; $N_{i,t}$, $N_{w,t}$ and $N_{s,t}$ are, respectively, the output of the cascade reservoirs containing the hybrid pumped storage power station, the total wind power output and the total photovoltaic output in period $t$; $P_t$ is the load demand in period $t$;

The system output fluctuation is expressed by the coefficient of variation, as shown in formula (15):

$$f_2 = \frac{1}{\bar{N}}\sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(N_{i,t} + N_{w,t} + N_{s,t} - \bar{N}\right)^2} \quad (15)$$

where: $f_2$ is the fluctuation coefficient of the system output; $T$ is the total number of periods; $N_{i,t}$, $N_{w,t}$ and $N_{s,t}$ are, respectively, the output of the cascade reservoirs containing the hybrid pumped storage power station, the total wind power output and the total photovoltaic output in period $t$; $\bar{N}$ is the mean total output of the complementary system;

The squared total output deviation of the system is given by formula (16):

$$f_3 = (C_d - N_d)^2 \quad (16)$$

where: $f_3$ is the output deviation penalty; $C_d$ is the total output on day $d$ obtained from the medium- and long-term scheduling model; $N_d$ is the total output on day $d$ calculated by the short-term scheduling model;

The three objective functions are converted into a single comprehensive objective function by the weighting method, as shown in formula (17):

$$\min F = \omega_1 f_1 + \omega_2 f_2 + \omega_3 f_3 \quad (17)$$

where: $F$ is the comprehensive objective function of short-term scheduling; $\omega_1$, $\omega_2$ and $\omega_3$ are the weight coefficients of the objective functions $f_1$, $f_2$ and $f_3$, respectively.
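The three short-term objectives and their weighted combination can be sketched in Python as below. The sketch assumes that the residual-load term is the mean of the squared differences between load and total output, and that the fluctuation term is the population coefficient of variation of total output; these forms, the function names and the example weights are assumptions for illustration, while the deviation penalty follows formula (16) directly.

```python
import statistics

def total_output(hydro, wind, solar):
    """Total system output per period."""
    return [h + w + s for h, w, s in zip(hydro, wind, solar)]

def residual_load_mse(load, hydro, wind, solar):
    """f1 (assumed form): mean squared residual load over the periods."""
    totals = total_output(hydro, wind, solar)
    return sum((p - n) ** 2 for p, n in zip(load, totals)) / len(load)

def output_fluctuation(hydro, wind, solar):
    """f2 (assumed form): population standard deviation of total output divided by its mean."""
    totals = total_output(hydro, wind, solar)
    return statistics.pstdev(totals) / statistics.mean(totals)

def deviation_penalty(c_d, n_d):
    """f3, formula (16): squared gap between the medium-long-term and short-term daily totals."""
    return (c_d - n_d) ** 2

def composite_objective(f1, f2, f3, weights):
    """F: weighted sum of the three objectives, to be minimized."""
    w1, w2, w3 = weights
    return w1 * f1 + w2 * f2 + w3 * f3
```

Because the three terms have different units and magnitudes, the weights in practice would also have to absorb a normalization; the patent does not state how the weights are chosen.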

Preferably, in step 5, the medium- and long-term stochastic optimization scheduling strategy is solved with the reinforcement learning PER-DQN algorithm as follows: the medium- and long-term stochastic optimization scheduling of the cascade reservoirs containing the hybrid pumped storage power station with wind and solar access is treated as a multi-stage decision problem, each scheduling period is taken as one stage, and the optimal decision sequence of the problem, that is, the optimal scheduling strategy, is solved with the PER-DQN algorithm;

Prioritized experience replay uses the temporal difference (TD) error as the priority index and combines it with the sum tree (SumTree) algorithm to sample according to sample priority. The larger the TD error, the more room there is to improve prediction accuracy, so the sample has greater learning value and a higher priority. The SumTree algorithm samples with a binary tree structure: each leaf node stores the priority of one sample, and the other nodes have no sample of their own; the priority of a parent node is the sum of the priorities of its two child nodes, so the priority of the root node is the sum of all sample priorities. During training, the priority of the root node is divided by the number of samples in each batch to split the total priority into intervals, one number is drawn at random from each interval, and the tree is searched from the root downward to find the leaf node that determines the sampled data.

Therefore, the priority based on the TD error is defined as shown in formula (18):

$$\delta_i = r_{t+1} + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}; \theta^{-}) - Q(s_t, a_t; \theta), \qquad P_i = |\delta_i| + \varepsilon \quad (18)$$

where: $\delta_i$ is the temporal difference error of the $i$-th sample; $P_i$ is the priority of the $i$-th sample; $\varepsilon$ is a very small constant that prevents the priority from being 0; $r_{t+1}$ is the benefit of period $t+1$, that is, the action reward; $Q(s_t, a_t; \theta)$ is the action value calculated for state $s_t$ and action $a_t$ when the parameters of the main neural network are $\theta$; $Q(s_{t+1}, a_{t+1}; \theta^{-})$ is the action value calculated for state $s_{t+1}$ and action $a_{t+1}$ when the parameters of the fixed target network are $\theta^{-}$, that is, the Q target value; $\gamma$ is the discount rate, which describes the effect of future benefits on the present; $\theta$ denotes the parameters of the main neural network; $\theta^{-}$ denotes the parameters of the fixed target network; $s_t$ and $s_{t+1}$ are the states of the random variables in periods $t$ and $t+1$; $a_t$ and $a_{t+1}$ are the actions taken in periods $t$ and $t+1$;
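The sum tree sampling described above can be sketched in Python as follows. This is a minimal array-based sum tree of the kind commonly used for prioritized experience replay, not the patent's own implementation; the class and function names, the overwrite-oldest policy and the default value of `eps` are assumptions.

```python
import random

class SumTree:
    """Binary tree stored in an array: leaves hold sample priorities,
    every internal node holds the sum of its two children."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity - 1)  # internal nodes first, then leaves
        self.data = [None] * capacity
        self.write = 0  # next leaf slot to overwrite (assumption: oldest first)

    def total(self):
        return self.tree[0]  # root = sum of all priorities

    def update(self, leaf, priority):
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf != 0:  # propagate the change up to the root
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def add(self, priority, sample):
        leaf = self.write + self.capacity - 1
        self.data[self.write] = sample
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity

    def get(self, value):
        """Descend from the root to the leaf whose cumulative range contains value."""
        idx = 0
        while 2 * idx + 1 < len(self.tree):
            left = 2 * idx + 1
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]

def td_priority(td_error, eps=1e-6):
    """Priority from the TD error; eps keeps the priority above zero."""
    return abs(td_error) + eps

def sample_batch(tree, batch_size, rng):
    """Split the total priority into batch_size intervals and draw once from each."""
    segment = tree.total() / batch_size
    return [tree.get(rng.uniform(segment * k, segment * (k + 1)))
            for k in range(batch_size)]
```

After a training step, the priorities of the sampled leaves would be refreshed with `tree.update(leaf, td_priority(new_td_error))` so that samples the network already predicts well are drawn less often.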

In addition, the Q values and neural network parameters of the PER-DQN algorithm are updated in the same way as in the DQN algorithm: the neural network parameters and the Q values are updated through continuous iteration, that is, by training the agent, so that the decision-making ability of the agent improves and the optimal scheduling strategy is obtained, as shown in formula (19):

where: $L(\theta)$ is the loss function based on the temporal difference error; $Q^{k}(s_t, a_t; \theta)$ and $Q^{k+1}(s_t, a_t; \theta)$ are the action values calculated in the $k$-th and $(k+1)$-th training for state $s_t$ and action $a_t$ when the parameters of the main neural network are $\theta$, that is, the Q estimates; $Q^{k}(s_{t+1}, a_{t+1}; \theta^{-})$ is the action value calculated for state $s_{t+1}$ and action $a_{t+1}$ when the parameters of the fixed target network are $\theta^{-}$, that is, the Q target value; $\gamma$ is the discount rate, which describes the effect of future benefits on the present; $\theta$ denotes the parameters of the main neural network; $\theta^{-}$ denotes the parameters of the fixed target network; $r_{t+1}$ is the benefit of period $t+1$, that is, the action reward; $\nabla$ denotes the gradient of the loss function; $\alpha$ is the learning rate, which determines the degree to which the error is learned;

The specific solving steps are as follows:

S5.1: Initialize the neural network parameters, the upper limit of training times, and the number of training iterations between neural network parameter updates;

S5.2: Input the data needed to construct the medium- and long-term scheduling environment of the water-wind-solar complementary system;

S5.3: Put the time series data of the stochastic scenarios into the experience pool as samples for prioritized experience replay;

S5.4: Take out a batch of samples and input them into the neural network to map the Q values;

S5.5: The agent interacts with the environment to acquire knowledge and Q values;

S5.6: Based on the ideas of temporal difference and gradient descent, update the parameters of the main neural network to reduce the prediction deviation of the network, and update the Q values to improve the decision-making ability of the agent, as shown in formula (19);

S5.7: Copy the parameters of the main neural network to the fixed target network each time the update interval of training iterations has elapsed;

S5.8: Repeat S3 to S4 until the agent reaches the final state, which completes one training episode;

S5.9: Repeat S5 until the upper limit of training times is reached;

S5.10: Output the medium- and long-term scheduling strategy of the water-wind-solar complementary system.

Preferably, in step 6, the short-term optimal scheduling strategy of the coupled medium-long-term and short-term optimization scheduling model is solved with the reinforcement learning Q-learning algorithm as follows. The Q-learning algorithm builds a temporal difference from the difference between the value at the next time step and the value at the current time step, and uses it to update the current value estimate. During updating, a Q-value table stores the Q values of all states in every stage of the multi-stage decision problem, and during training the old values are continually overwritten with better Q values so that the Q values approach the optimal action values, which improves the decision-making ability of the agent and yields the optimal scheduling strategy. For a deterministic model with a small discrete state space, the structurally simple Q-learning algorithm maintains calculation accuracy with a small memory footprint, which saves computational cost;

The Q-value update of the Q-learning algorithm is given by formula (20):

where: $Q^{k}(s_t, a_t)$ and $Q^{k+1}(s_t, a_t)$ are the action values, that is, the Q values, obtained in the $k$-th and $(k+1)$-th training after taking action $a_t$ in state $s_t$; $Q^{k}(s_{t+1}, a_{t+1})$ is the action value obtained in the $k$-th training for state $s_{t+1}$ and action $a_{t+1}$; the transition term is the Markov transition probability of the cascade reservoir runoff state between adjacent periods; $r_{t+1}$ is the benefit of period $t+1$, that is, the action reward; $s_t$ and $s_{t+1}$ are the states of the random variables in periods $t$ and $t+1$; $a_t$ and $a_{t+1}$ are the actions taken in periods $t$ and $t+1$; $S$ is the set of random-variable states; $\delta_t$ is the temporal difference error; $\alpha$ is the learning rate, which determines the degree to which the error is learned;

The specific solving steps are as follows:

S6.1: Initialize the Q-value table, set the upper limit of training times to N, set the state transition probability between adjacent stages to 1, and set the day index i = 0;

S6.2: Input the short-term forecasts of the water, wind and solar random variables for day i, and the output of day i from the medium- and long-term scheduling strategy;

S6.3: Construct a deterministic scheduling environment from the short-term forecasts of the water, wind and solar random variables, that is, the transition probability of each stage is 1;

S6.4: The agent interacts with the environment, makes decisions and acquires knowledge;

S6.5: From the knowledge samples, calculate the Q value with the Q-value update formula (20) of the Q-learning algorithm, compare it with the Q value in the Q-value table, and write the larger Q value to the corresponding position of the table, so as to improve the decision-making ability of the agent;

S6.6: Repeat S3 to S4 until the agent reaches the final state, which completes one training episode;

S6.7: Repeat S5 until the upper limit of training times is reached;

S6.8: Output the short-term scheduling strategy of the water-wind-solar complementary system for day i;

S6.9: Take the final state of the short-term scheduling strategy of the water-wind-solar complementary system for day i as the initial state of the medium- and long-term scheduling, call the medium- and long-term stochastic optimization scheduling model of the complementary system based on the PER-DQN algorithm, and calculate and update the medium- and long-term stochastic optimization scheduling strategy;

S6.10: Output the medium- and long-term stochastic optimization scheduling strategy of the water-wind-solar complementary system with day i+1 as the initial state;

The above steps are repeated to update the short-term scheduling strategy of the water-wind-solar complementary system, so as to compile and update the day-ahead generation plan for the real-time scheduling simulation model.
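The table update of S6.5 in a deterministic environment (transition probability 1) can be sketched in Python as below. The sketch uses the standard tabular Q-learning temporal difference and then keeps the larger of the old and new values, as S6.5 describes. The discount factor `gamma`, its default of 1.0, the dictionary-based table and the function name are assumptions, since the patent's formula (20) is not reproduced here.

```python
def q_learning_update(q_table, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=1.0):
    """One tabular update for a deterministic transition state -> next_state.

    q_table maps (state, action) to a Q value; missing entries count as 0.
    Returns the temporal difference error of this sample.
    """
    old_q = q_table.get((state, action), 0.0)
    best_next = max(q_table.get((next_state, a), 0.0) for a in actions)
    td_error = reward + gamma * best_next - old_q
    new_q = old_q + alpha * td_error
    # S6.5: compare with the stored value and keep the larger one.
    q_table[(state, action)] = max(old_q, new_q)
    return td_error
```

In the scheduling setting, a state would correspond to a discretized reservoir level at the start of an hour, an action to the level chosen for the end of that hour, and the reward to the negative of the short-term comprehensive objective for that hour; that mapping is an interpretation rather than something the text spells out.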

Preferably, in the step 7, a real-time scheduling simulation model of the water-wind-solar complementary system follows the principle of 'electricity-based water determination', monitors the running state of a hydropower station warehouse in real time strictly according to a day-ahead power generation plan, uses a model containing a hybrid pumped storage power station step reservoir for adjusting the water-wind-solar prediction deviation, updates the water-wind-solar prediction value by a water-wind-solar observation value according to time intervals, performs step reservoir scheduling containing the hybrid pumped storage power station, uses the scheduled water level as an initial water level at the beginning of the next time interval to update the day-ahead scheduling plan, and continuously repeats the process for rolling update;

the method comprises the following specific steps:

S7.1: invoking the medium-long term and short-term coupling optimization scheduling model of the water-wind-solar complementary system, solving the short-term scheduling strategy through the Q-learning algorithm, and compiling the day-ahead power generation plan;

S7.2: according to the day-ahead power generation plan and the measured data of the first scheduling period for cascade reservoir inflow, wind power output and photovoltaic output, compensating the deviation between the predicted values and the measured data of cascade reservoir inflow, wind power output and photovoltaic output by means of the cascade reservoir containing the hybrid pumped storage power station;

S7.3: taking the compensated final water level as the initial state of the deterministic medium-long term and short-term coupling optimization scheduling model, and updating the day-ahead power generation plan;

S7.4: repeating the above steps to update the cascade reservoir scheduling process of the hybrid pumped storage power station in a rolling manner.
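The compensation and rolling update of steps S7.2 to S7.4 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions rather than the model of the invention: the function name `rolling_compensation`, the single aggregated reservoir, and the linear coefficient `level_per_mwh` that converts hydropower output into a water level change are all hypothetical, and re-solving the day-ahead plan in S7.3 is left out.

```python
def rolling_compensation(plan, forecast, observed, level0, level_per_mwh):
    """Compensate wind/solar forecast deviation with hydropower, period by period.

    plan          -- planned total system output per period (day-ahead plan)
    forecast      -- forecast wind + photovoltaic output per period
    observed      -- measured wind + photovoltaic output per period
    level0        -- initial water level (hypothetical single aggregated reservoir)
    level_per_mwh -- hypothetical linear water level drop per unit of hydro output
    """
    level = level0
    hydro_out, levels = [], []
    for p, f, o in zip(plan, forecast, observed):
        planned_hydro = p - f
        deviation = f - o                         # shortfall (+) or surplus (-)
        actual_hydro = planned_hydro + deviation  # hydropower absorbs the deviation
        level -= actual_hydro * level_per_mwh     # a negative value would mean pumping
        hydro_out.append(actual_hydro)
        levels.append(level)                      # initial level of the next period
    return hydro_out, levels


hydro, levels = rolling_compensation(
    plan=[100.0, 100.0], forecast=[30.0, 40.0], observed=[20.0, 50.0],
    level0=200.0, level_per_mwh=0.01)
```

In this sketch the total output stays on the day-ahead plan in every period, and the water level reached at the end of one period is carried forward as the starting level of the next.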

By considering the randomness of runoff, wind power output and photovoltaic output in medium-long term scheduling, and by combining the medium-long term scheduling, short-term scheduling and real-time scheduling simulation models, the invention provides a random optimization scheduling method for the cascade reservoir of a wind-light-containing hybrid pumping and storing station, which has the following technical effects:

1) The invention solves the Markov transition probabilities from historical data to construct Markov chains and generates random scenes for runoff, wind power output and photovoltaic output, thereby effectively analyzing and describing the randomness of runoff, wind power output and photovoltaic output in the medium-long term scheduling of the water-wind-solar complementary system.

2) Aiming at the redundancy of random scenes caused by complex randomness, the invention provides an ISODATA algorithm based on the correlation distance to reduce the random scenes of runoff, wind power output and photovoltaic output separately, and then constructs the medium-long term scheduling environment of the water-wind-solar complementary system through combination. In this way, the constructed scheduling environment contains fewer scenes while still effectively describing the random characteristics of the scheduling environment, which improves the training speed of the reinforcement learning algorithm used for the medium-long term random scheduling of the water-wind-solar complementary system.

3) Aiming at the inaccuracy of medium-long term forecasts, the invention constructs a medium-long term random optimization scheduling model of the water-wind-solar complementary system, which takes the cascade reservoir of a hybrid pumped storage power station with wind-solar access as its object and the maximum total power generation as its target. The model improves the economy of the water-wind-solar complementary system from a medium-long term perspective and yields a monthly scheduling strategy of the water-wind-solar complementary system that guides short-term scheduling.

4) The invention adopts the PER-DQN algorithm to solve the constructed medium-long term random optimization scheduling model of the water-wind-solar complementary system. As an improved reinforcement learning algorithm, PER-DQN builds on the Markov decision process under the reinforcement learning framework and uses a deep neural network and prioritized experience replay to mitigate the curse of dimensionality that traditional reinforcement learning faces in large discrete spaces. It accelerates learning and can more effectively solve the scheduling strategies of complex optimization scheduling models that are high-dimensional, random, non-convex, multi-stage and discrete.

5) In order to improve the economy, reliability and stability of the water-wind-solar complementary system, the invention constructs a medium-long term and short-term coupling optimization scheduling model of the water-wind-solar complementary system, which takes the cascade reservoir of a hybrid pumped storage power station with wind-solar access as its object and takes the minimum residual load mean square error, the minimum fluctuation of system output and the minimum square of the total system output deviation as its targets.

From a short-term perspective, the model effectively smooths the fluctuation of the output and of the residual load of the water-wind-solar complementary system, and at the same time maximizes the economic benefit of the system through the mutual feedback between medium-long term and short-term scheduling.

6) The invention adopts the Q-learning algorithm to solve the constructed medium-long term and short-term coupling optimization scheduling model of the water-wind-solar complementary system. Because short-term water, wind and solar forecasts are comparatively accurate and the Q-learning algorithm occupies little memory when running, the traditional reinforcement learning Q-learning algorithm is used to carry out a deterministic calculation directly. This saves calculation cost while ensuring calculation precision and speed, and is effective for solving the scheduling strategy of the short-term optimization scheduling model of the water-wind-solar complementary system.

7) The invention builds a real-time scheduling simulation model that uses the cascade reservoir of the hybrid pumped storage power station to compensate the wind-solar prediction deviation, so as to address the deviation between actual output and predicted output caused by inaccurate wind-solar prediction. The cascade reservoir containing the hybrid pumped storage power station has strong regulation capability and high scheduling flexibility; using it to compensate the water-wind-solar prediction deviation allows the day-ahead plan to be implemented smoothly, makes real-time operation safer and more reliable, and brings the benefit of the reservoirs into full play.

Description of the drawings:

FIG. 1 is a flow chart of the random optimization scheduling method for the cascade reservoir of a wind-light-containing hybrid pumping and storing station;

FIG. 2 is a flow chart of scene reduction by the ISODATA algorithm based on the correlation distance;

FIG. 3 is a flow chart of medium-long term random optimization scheduling based on the PER-DQN algorithm;

FIG. 4 is a flow chart of medium-long term and short-term coupled scheduling based on the Q-learning algorithm;

FIG. 5 is a flow chart of the real-time scheduling simulation.

Detailed description of the embodiments:

The invention is described in further detail below with reference to the drawings and specific examples.

As shown in fig. 1, the wind-light-containing hybrid pumping and storing station cascade reservoir random optimization scheduling method comprises the following steps:

In the step 1, the method for constructing the water, wind and light random scene is as follows:

In the step 2, scene reduction is performed using the ISODATA algorithm based on the correlation distance, and the medium-long term scheduling environment of the water-wind-solar complementary system is constructed. In a random scene of the water-wind-solar complementary system, each random value is represented by a long time series, and the point-to-point distance functions commonly used in clustering algorithms can hardly describe the distance between two time series; therefore the correlation distance is used for clustering when the random scenes of the water-wind-solar complementary system are reduced. The specific process is as follows:

S2.2: calculating the correlation distance between each sample and each cluster center, and assigning each of the N samples to the nearest cluster $S_j$;

wherein the cluster center is given by formula (2):

wherein: […] is the correlation distance of the samples in all clusters to their respective cluster centers; $N_c$ is the number of cluster centers; $N_j$ is the number of samples in the j-th cluster; […] is the correlation distance from the samples in the j-th cluster to their cluster center;

S2.5: stopping split and merge calculations

S2.6: calculating the standard deviation vector of the sample data in each cluster from the distance values: $\sigma_j = (\sigma_{1j}, \sigma_{2j}, \ldots, \sigma_{\pi j})^T$, wherein the vector components are given by formula (5):

S2.7: if $\sigma_{j\max}$ is greater than $\theta_S$ and either […] or $N_j > 2(\theta_N + 1)$ is satisfied, $z_j$ is split into two clusters and 1 is added to $\sigma_{j\max}$; the iteration count is increased by 1 and the process returns to S2.1;

S2.8: calculating the correlation distance $D_{ij}$ between cluster centers;

wherein: […] is the new cluster center; $N_{i_k}$ is the number of samples in the rearranged $i_k$-th cluster; $N_{j_k}$ is the number of samples in the rearranged $j_k$-th cluster; $z_{i_k}$ is the $i_k$-th cluster center; $z_{j_k}$ is the $j_k$-th cluster center; l is the number of cluster centers left in the process;

S2.12: outputting the cluster centers;

In the step 3, the medium-long term random optimization scheduling model of the water-wind-solar complementary system takes the maximum power generation of the system as its objective and schedules the system with the day as the scheduling period, so that the total power generation of the water-wind-solar complementary system is maximized; the total power generation of the system in each scheduling period is shown in formula (7):

wherein: $E_{h,t}$ and $E_{p,t}$ are respectively the energy generated and the pumping energy consumed by the reversible pump-turbine unit in period t; $P_{h,t}$ and $P_{p,t}$ are respectively the generating power and the pumping power of the reversible pump-turbine unit in period t; $\eta_h$ and $\eta_p$ are respectively the generating efficiency and the pumping efficiency of the reversible pump-turbine unit; $H_t$ is the average generating head in period t; $Q_{h,t}$ and $Q_{p,t}$ are respectively the flows of the reversible pump-turbine unit under the generating condition and the pumping condition in period t; $\eta_{wp}$ is the pipeline efficiency; $\Delta T_t$ is the scheduling duration of period t; $\rho$ is the density of water; $g$ is the gravitational acceleration;

Cascade water level constraint, as shown in formula (10):

wherein: […] are respectively the minimum and maximum limit water levels of the n-th stage reservoir at the beginning of period t; […] is the water level of the n-th stage reservoir at the beginning of period t;

Power generation flow constraint, as shown in formula (11):

Wind and photovoltaic output limits, as shown in formula (13):

In the step 4, the medium-long term and short-term coupling optimization scheduling model of the water-wind-solar complementary system uses the daily system output from the medium-long term scheduling strategy as the boundary constraint on the total output deviation of the short-term scheduling system when solving the short-term scheduling strategy, and feeds the short-term scheduling result back to the medium-long term scheduling to update the medium-long term scheduling strategy. In addition, the short-term scheduling is performed at an hourly level, so that the daily residual load mean square error of the water-wind-solar complementary system, the fluctuation of the system output and the square of the total output deviation are all minimized;

$f_3 = (C_d - N_d)^2$ (16)

In the step 5, the method for solving the medium-long term random optimization scheduling strategy based on the reinforcement learning PER-DQN algorithm treats the medium-long term random optimization scheduling of the cascade reservoir containing the hybrid pumping and storing station with wind-solar access as a multi-stage decision problem, takes each scheduling period as one stage, and solves the optimal decision sequence of the problem, namely the optimal scheduling strategy, through the PER-DQN algorithm;

as shown in expression (19):

wherein: $L(\theta)$ is the temporal-difference error; $Q^k(s_t, a_t; \theta)$ and $Q^{k+1}(s_t, a_t; \theta)$ are respectively the action values, i.e. the Q estimates, calculated in the k-th and (k+1)-th training for state $s_t$ and action $a_t$ with the main neural network parameter $\theta$; $Q^k(s_{t+1}, a_{t+1}; \theta^-)$ is the action value, i.e. the Q target value, calculated for state $s_{t+1}$ and action $a_{t+1}$ with the fixed target network parameter $\theta^-$; $\gamma$ is the discount rate, which describes the effect of future benefits on the present; $\theta$ is the network parameter of the main neural network; $\theta^-$ is the network parameter of the fixed target network; $r_{t+1}$ is the benefit, i.e. the action reward value, of period t+1; $\nabla_\theta L(\theta)$ denotes the gradient of the loss function; $\alpha$ is the learning rate, which determines the degree to which the error is learned;
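The quantities defined above can be sketched numerically. The following Python fragment assumes the usual DQN form of the temporal-difference target, in which the target network value is maximized over the next action; expression (19) itself is not reproduced in this text, so this form, the function name `td_loss` and the sample numbers are assumptions made for illustration.

```python
def td_loss(q_main, q_target_next, action, reward, gamma, terminal=False):
    """Return (squared TD error, TD error) for one transition.

    q_main        -- Q estimates of the main network for state s_t, one per action
    q_target_next -- Q values of the fixed target network for state s_{t+1}
    """
    # Assumed Q target: r_{t+1} + gamma * max_a Q(s_{t+1}, a; theta^-)
    target = reward if terminal else reward + gamma * max(q_target_next)
    td_error = target - q_main[action]
    return td_error ** 2, td_error


loss, err = td_loss(q_main=[1.0, 2.0], q_target_next=[0.5, 3.0],
                    action=1, reward=1.0, gamma=0.9)
```

In a full implementation the main network parameters would then be moved against the gradient of this loss with step size α; that step depends on the network library and is omitted here.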

the specific solving steps are as follows:

S5.9: repeating S5 until the upper limit of training times is reached;

In the step 6, a method is provided for solving the short-term optimization scheduling strategy in the medium-long term and short-term coupling optimization scheduling model based on the reinforcement learning Q-learning algorithm. The Q-learning algorithm constructs a temporal-difference value from the difference between the value at the next moment and the value at the current moment and uses it to update the value estimate at the current moment. During updating, a Q value table stores the Q values of all states in each stage of the multi-stage decision problem, and during training the old value is always overwritten by a better Q value, so that the Q value approaches the optimal action value, which improves the decision capability of the agent and yields the optimal scheduling strategy. For a deterministic model with a small discrete state space, the structurally simpler Q-learning algorithm ensures calculation precision while occupying little memory, thereby saving calculation cost;

the specific solving steps are as follows:

S6.7: repeating S5 until the upper limit of training times is reached;

In the step 7, the real-time scheduling simulation model of the water-wind-solar complementary system follows the principle of "determining water by electricity": it operates strictly according to the day-ahead power generation plan, monitors the running state of the hydropower station reservoirs in real time, and uses the cascade reservoir of the hybrid pumped storage power station to compensate the water-wind-solar prediction deviation. In each time period, the water-wind-solar predicted values are updated with the water-wind-solar observed values, the cascade reservoir of the hybrid pumped storage power station is scheduled, and the scheduled water level is used as the initial water level at the beginning of the next time period to update the day-ahead scheduling plan; this process is repeated continuously as a rolling update;

The method comprises the following specific steps:

In the step 1, the wind power output refers to the output power of a wind generating set, which converts the kinetic energy of wind into electric energy. The photovoltaic output refers to the output power of solar cells, which convert solar energy directly into electric energy. The cascade reservoir inflow refers to the inflow runoff of the first-stage reservoir and the interval inflow of the downstream reservoirs, for a series of reservoirs built in a cascade from the upstream to the downstream of the river basin. The inflow runoff and the interval inflow are water from rainfall, melting ice and snow, or irrigation that flows along the surface or underground under the action of gravity. The historical data refers to the historical data of wind power output, photovoltaic output and cascade reservoir inflow. The water-wind-solar complementary system in the invention specifically refers to a combined power generation system consisting of cascade hydropower stations that contain a hybrid pumped storage power station and have annual or multi-year regulation capacity, together with wind power and photovoltaic power. The hybrid pumped storage power station uses reversible units in the cascade hydropower stations and takes two adjacent reservoirs of the cascade as the upper and lower reservoirs for pumped storage. The term "hybrid pumping and storing station" is short for "hybrid pumped storage power station". The medium-long term scheduling refers to a reservoir scheduling mode that takes a month as the scheduling cycle and a day as the scheduling period. "Water" refers to the inflow of the cascade hydropower stations containing the hybrid pumped storage power station, "wind" refers to the wind farm output, and "light" refers to the photovoltaic power station output.

The randomness analysis refers to describing the likelihood of random events with probabilities; in the present invention it is represented by the Markov transition probabilities of the scheduling periods, which are obtained from historical data. The Markov state transition probability is the probability that a random variable changes state in a Markov decision process. In the Markov state transition process, the next state is determined only by the current state (the process is memoryless), as shown in formula (21):

$P(X_{t+1} = x_{t+1} \mid X_t = x_t, X_{t-1} = x_{t-1}, \ldots) = P(X_{t+1} = x_{t+1} \mid X_t = x_t)$ (21)

wherein: t represents a discrete time; $X_t$ is the state set of the random variable at time t; $x_t$ is the state at time t.

The state transition probabilities are in matrix form as shown in equation (22):

wherein: M is the total number of discrete values; j, k ∈ [1, M], where j is the j-th state at time t-1 and k is the k-th state at time t; $P_{t,jk}$ is the state transition probability of period t, i.e. the conditional probability that the state in period t is k given that the state in period t-1 is j.

$P_{t,jk}$ must satisfy the following two conditions, as shown in formula (23):

The element $P_{jk}$ of the state transition probability matrix is solved as shown in formula (24):

wherein: $f_{t,jk}$ is the state transition frequency, i.e. the number of times the runoff state moves from state j in period t to state k in period t+1, obtained through statistics of the historical data; the remaining symbols are as defined for formula (22).
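A minimal Python sketch of estimating one period's transition matrix from historical state sequences is given here for illustration. It assumes the usual relative-frequency estimate (each count divided by its row total), since formula (24) is not reproduced in this text; the function name `transition_matrix`, the zero-based state indices, the uniform fallback for states never observed, and the sample data are likewise assumptions.

```python
def transition_matrix(history, t, n_states):
    """Estimate the transition matrix between periods t-1 and t.

    history  -- one list per historical year, holding the discrete state
                index (0 .. n_states-1) observed in each period
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for year in history:
        counts[year[t - 1]][year[t]] += 1      # transition frequency j -> k
    probs = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # Assumption: a state never observed gets a uniform row
            probs.append([1.0 / n_states] * n_states)
        else:
            probs.append([c / total for c in row])
    return probs


history = [[0, 1], [0, 1], [0, 0], [1, 1]]     # four hypothetical years
P = transition_matrix(history, t=1, n_states=2)
```

Each row of the result is non-negative and sums to one, which is what a transition probability matrix is normally required to satisfy.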

In the step 1, the random scene refers to a long time series containing the value of each random variable in every scheduling period of the scheduling cycle.
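How such a scene can be drawn from per-period transition matrices is sketched below in Python. This is an illustrative sampling routine, not the generation procedure of the invention: the function name `generate_scenario`, the zero-based state indices and the example matrices are hypothetical.

```python
import random


def generate_scenario(initial_state, matrices, rng):
    """Sample one state sequence from a chain of per-period transition matrices.

    matrices -- list of matrices, matrices[i][j][k] being the probability of
                moving from state j to state k in the i-th transition
    rng      -- a random.Random instance, passed in for reproducibility
    """
    states = [initial_state]
    for p_t in matrices:
        row = p_t[states[-1]]                  # depends only on the current state
        nxt = rng.choices(range(len(row)), weights=row)[0]
        states.append(nxt)
    return states


flip = [[0.0, 1.0], [1.0, 0.0]]                # deterministic two-state example
scenario = generate_scenario(0, [flip, flip, flip], random.Random(0))

mixed = [[0.7, 0.3], [0.4, 0.6]]
long_scenario = generate_scenario(1, [mixed] * 12, random.Random(42))
```

Repeating the call with different random seeds gives a set of scenes; mapping each state index back to a representative runoff, wind or photovoltaic value would turn the state sequence into a time series.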

In the step 2, the correlation distance is a value reflecting the degree of correlation between two series: the smaller the value, the greater the correlation between the two series, and the larger the value, the smaller the correlation.

In a random scene of the water-wind-solar complementary system, each random value is represented by a long time series, and the point-to-point distance functions commonly used in clustering algorithms can hardly describe the distance between two time series. Therefore, the correlation distance is used for clustering when the random scenes of the water-wind-solar complementary system are reduced.
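As an illustration, the following Python sketch uses one common definition of the correlation distance, namely one minus the Pearson correlation coefficient, and assigns each series to the nearest cluster center. The text does not state the exact formula used by the invention, so this definition and the names `correlation_distance` and `assign_to_nearest` are assumptions.

```python
import math


def correlation_distance(x, y):
    """1 - Pearson correlation: 0 for perfectly correlated series, 2 for opposite.

    Assumes both series have the same length and are not constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def assign_to_nearest(samples, centers):
    """Return, for each sample, the index of the center with the smallest distance."""
    return [min(range(len(centers)),
                key=lambda j: correlation_distance(s, centers[j]))
            for s in samples]


centers = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]   # a rising and a falling pattern
samples = [[10.0, 20.0, 31.0], [5.0, 4.0, 1.0]]
labels = assign_to_nearest(samples, centers)
```

Under this definition two series with very different magnitudes but the same shape are close to each other, which a point-to-point Euclidean distance would not capture.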

In the step 2, the ISODATA algorithm is a clustering method that adds merging and splitting of the clustering result to the traditional K-means clustering algorithm, which relieves the dependence of the K value on experience. Splitting is performed when the number of samples in a class of the clustering result is too large, when the variance within a class is too large, or when the number of classes is far smaller than the set number of classes.

In the step 2, scene reduction refers to removing from the full set of scenes those whose probability is too small or whose similarity is too high, so as to shorten the calculation time and improve the calculation efficiency.

In the step 2, the scheduling environment refers to the set of random scenes and their corresponding Markov state transition probabilities; it describes the characteristics of the random variables in the water-wind-solar complementary system and is used for solving the scheduling strategy.
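Building the environment "through combination" can be pictured with the short Python sketch below. It assumes that the reduced runoff, wind and photovoltaic scenes are combined as a Cartesian product and that their probabilities multiply, i.e. that the three sources are treated as independent; the text does not state this, and the function name `combine_scenarios` and the scene labels are hypothetical.

```python
from itertools import product


def combine_scenarios(runoff, wind, solar):
    """Combine reduced scene sets into joint scenes.

    Each argument is a list of (scene, probability) pairs after reduction.
    Assumption: the three sources are independent, so probabilities multiply.
    """
    env = []
    for (r, pr), (w, pw), (s, ps) in product(runoff, wind, solar):
        env.append({"runoff": r, "wind": w, "solar": s, "prob": pr * pw * ps})
    return env


env = combine_scenarios(
    runoff=[("R1", 0.6), ("R2", 0.4)],
    wind=[("W1", 0.5), ("W2", 0.5)],
    solar=[("S1", 1.0)])
```

The example also shows why reduction matters: the number of joint scenes is the product of the three set sizes, so trimming each set keeps the environment small.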

In the step 4, the medium-long term and short-term coupling scheduling means: taking the daily output in the medium-long term scheduling strategy as the boundary constraint on the total output deviation of the short-term scheduling system to solve the short-term scheduling strategy, feeding the short-term scheduling result back to the medium-long term scheduling, and updating the medium-long term scheduling strategy. The short-term scheduling refers to a scheduling method that takes a day as the scheduling cycle and an hour as the scheduling period.

In the step 4, the reliability of the complementary system is an index for evaluating the capability of the output of the complementary system to meet the load demand, which requires the output curve and the load curve to be as consistent as possible. Therefore, the invention improves the reliability of the complementary system by setting an objective function with the minimum residual load mean square error in the short-term scheduling model.

In the step 4, the stability of the complementary system is an index for evaluating how steady the output of the complementary system is. Because the wind-solar output fluctuates, its fluctuation needs to be smoothed through hydropower to ensure the safe and stable operation of the power system and to make the output of the complementary system steadier. Therefore, the invention improves the stability of the complementary system by setting an objective function with the minimum fluctuation of the system output in the short-term scheduling model.

In the step 4, the economy of the complementary system is an index for measuring the power generation benefit of the complementary system. In order to maximize the power generation benefit, the invention solves the monthly scheduling strategy with the maximum total power generation as the target in the medium-long term scheduling model, and applies a penalty on the short-term scheduling output deviation, minimizing the deviation between the short-term scheduling output and the daily output that maximizes the total monthly power generation, so as to improve the economy of the complementary system.

In the step 4, the short-term scheduling model targeting the minimum residual load mean square error, the minimum system output fluctuation and the minimum square of the total system output deviation means: through scheduling at an hourly level, the daily residual load mean square error of the water-wind-solar complementary system, the fluctuation of the system output and the square of the total output deviation are all minimized.
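The three short-term objectives can be evaluated for a candidate hourly schedule as in the Python sketch below. Only the form of the third objective, $f_3 = (C_d - N_d)^2$, is given in this text; the sketch assumes that $C_d$ is the daily output taken from the medium-long term strategy and $N_d$ is the summed short-term output, that the first objective is the variance of the residual load (load minus system output), and that the second is the sum of squared output changes between adjacent hours. The function name and the numbers are hypothetical.

```python
def short_term_objectives(load, output, daily_target):
    """Return (f1, f2, f3) for one day of hourly values.

    f1 -- assumed: mean squared deviation of the residual load from its mean
    f2 -- assumed: sum of squared output changes between adjacent periods
    f3 -- (C_d - N_d)^2, with C_d = daily_target and N_d = sum(output) assumed
    """
    residual = [l - o for l, o in zip(load, output)]
    mean_res = sum(residual) / len(residual)
    f1 = sum((r - mean_res) ** 2 for r in residual) / len(residual)
    f2 = sum((b - a) ** 2 for a, b in zip(output, output[1:]))
    f3 = (daily_target - sum(output)) ** 2
    return f1, f2, f3


f1, f2, f3 = short_term_objectives(
    load=[10.0, 12.0, 14.0], output=[8.0, 9.0, 10.0], daily_target=30.0)
```

How the three values are weighted or combined into a single reward for the agent is not specified here and is left open in the sketch.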

In the step 5, reinforcement learning is a method for solving a multi-stage decision problem through continuous trial and error with the aim of maximizing the long-term benefit. A multi-stage decision problem is a problem whose decision process can be divided into several interrelated stages, in which a decision is made at each stage to obtain the decision sequence with the best overall result. In the invention, each stage is one scheduling period.

In the traditional DQN algorithm, experience transitions are sampled uniformly from the experience pool with equal frequency, which makes sampling highly random and learning slow. The PER-DQN algorithm remedies this by introducing a prioritized experience replay mechanism built on a sum tree structure, which improves learning efficiency and accelerates convergence. Because Q values are updated through a neural network, the method also mitigates the curse of dimensionality of large discrete state spaces, and it is effective for the medium-long term random optimization scheduling of the water-wind-solar complementary system.

The elements of reinforcement learning are the agent, the environment, the action strategy, the benefit signal and the value function. In each stage the environment is in some state, and the agent acts as the decision maker of the decision problem; based on the Markov decision process and the action strategy, the agent interacts with the environment in discrete time steps to obtain knowledge. Interaction with the environment means that the agent makes a decision, i.e. takes an action. By repeating these steps, the value estimate trained by the agent is updated, the estimate of the value function is maximized, and the optimal strategy is finally obtained. The value function describes the expected value obtained when the agent takes an action in a given state; its value is also called the Q value. The action strategy defines the behavior of the agent at a specific time, i.e. how it makes decisions.

In the Markov decision process of reinforcement learning, the knowledge acquired through the interaction of the agent with the environment is represented by a five-tuple (S, A, R, π, P), namely states, actions, rewards, strategy and state transition probabilities, and the state-action sequence is a Markov chain with a benefit process.

In the step 5, the PER-DQN algorithm is an improved version of the Deep Q-Network (DQN) algorithm. To address the high sampling randomness and slow learning caused by the uniform, equal-frequency sampling of the experience pool in the traditional DQN algorithm, the PER-DQN algorithm introduces a prioritized experience replay mechanism with a sum tree structure.

Prioritized experience replay measures the importance of samples by their priority and increases the sampling weight of high-priority samples, thereby avoiding the drawback of uniform sampling in the DQN algorithm. It samples by priority by combining a sum tree (SumTree) algorithm with the temporal-difference (TD) error as the priority index. The larger the TD error, the greater the room for improving prediction accuracy, so the sample has a higher learning value and a higher priority. The SumTree algorithm samples using a binary tree structure. Each leaf node of the binary tree corresponds to the priority of one sample, and the other nodes correspond to no sample. The priority of a parent node is the sum of the priorities of its two child nodes, and the priority of the root node is the sum of all sample priorities. When training, the priority of the root node is divided by the number of samples in each batch to split the priority range into that many intervals, a number is drawn at random from each interval, and the tree is searched from the root node downward to the leaf node that determines the sampled data.
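The sum tree and the interval-based sampling described above can be sketched in Python as follows. This is a minimal array-based illustration, not the implementation of the invention; the class and method names (`SumTree`, `add`, `update`, `get`, `sample_batch`) and the example priorities are hypothetical.

```python
import random


class SumTree:
    """Binary tree stored in a list: leaves hold priorities, parents hold sums."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity - 1)   # internal nodes, then leaves
        self.data = [None] * capacity
        self.write = 0                           # next slot to overwrite

    def add(self, priority, sample):
        leaf = self.write + self.capacity - 1
        self.data[self.write] = sample
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, leaf, priority):
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf > 0:                          # propagate the change to the root
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def total(self):
        return self.tree[0]                      # root = sum of all priorities

    def get(self, value):
        """Walk from the root to the leaf whose priority range contains value."""
        node = 0
        while 2 * node + 1 < len(self.tree):
            left = 2 * node + 1
            if value <= self.tree[left]:
                node = left
            else:
                value -= self.tree[left]
                node = left + 1
        return self.data[node - self.capacity + 1], self.tree[node]


def sample_batch(tree, batch_size, rng):
    """Split [0, total] into batch_size intervals and draw one sample from each."""
    segment = tree.total() / batch_size
    return [tree.get(rng.uniform(i * segment, (i + 1) * segment))[0]
            for i in range(batch_size)]


tree = SumTree(4)
for priority, sample in zip([1.0, 2.0, 3.0, 4.0], "abcd"):
    tree.add(priority, sample)
batch = sample_batch(tree, 2, random.Random(1))
```

With these priorities sample "d" occupies 40% of the priority range and sample "a" only 10%, so "d" is drawn four times as often on average, while the tree walk keeps each draw logarithmic in the pool size.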

In addition, like the DQN algorithm, the PER-DQN algorithm has a main neural network and a fixed target network with identical structures, and its parameter and Q value updating processes are the same as those of DQN. A neural network is a computing structure that imitates the working process of a biological neural network and generally consists of an input layer, hidden layers and an output layer: data enters through the input layer, is processed by the hidden layers and leaves through the output layer, and the parameters of the network are adjusted through training to improve its prediction capability. The main neural network fits the Q value corresponding to the current state and action, i.e. the Q predicted value, and the fixed target network fits the Q value corresponding to the next state and action, i.e. the Q target value. The difference between them is that the main neural network parameters are updated in every Q value update, whereas the fixed target network parameters are copied from the main neural network parameters every fixed number of training iterations.
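The periodic copy from the main network to the fixed target network can be pictured with this small Python sketch. The parameters are represented as a plain dictionary for illustration only, and the function name `maybe_sync` and the interval `sync_every` are hypothetical; a real implementation would copy the weights of its network library instead.

```python
import copy


def maybe_sync(main_params, target_params, step, sync_every):
    """Return the target parameters to use after training step `step`.

    Every `sync_every` steps the target network receives an independent copy
    of the main network parameters; otherwise it is left unchanged.
    """
    if step % sync_every == 0:
        return copy.deepcopy(main_params)
    return target_params


main_params = {"w": [1.0, 2.0]}
target_params = {"w": [0.0, 0.0]}
unchanged = maybe_sync(main_params, target_params, step=3, sync_every=5)
synced = maybe_sync(main_params, target_params, step=5, sync_every=5)
```

A deep copy is used so that later updates of the main network do not leak into the target network between two synchronizations.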

And through continuous iteration, namely training of the agent, the decision-making capability of the agent is improved, and the optimal scheduling strategy is obtained.

In the step 5, the medium-long term random optimization scheduling strategy refers to taking the randomness of random variables in the medium-long term model into consideration, regarding the scheduling process as a multi-stage decision problem, and obtaining an optimal decision sequence meeting the model objective function, namely an optimal scheduling strategy by optimizing the scheduling process of the water-wind-solar complementary system. The scheduling process refers to a step reservoir water level adjusting process, and the scheduling strategy is an array containing step reservoir water level values of each scheduling period.

In the step 6, the Q-learning algorithm is a reinforcement learning algorithm that updates the value function based on the idea of temporal difference (TD). The Q-learning algorithm constructs a temporal-difference value from the difference between the value at the next moment and the value at the current moment and uses it to update the value estimate at the current moment. During updating, a Q value table stores the Q values of all states in each stage of the multi-stage decision problem, and during training the old value is always overwritten by a better Q value, so that the Q value approaches the optimal action value, the decision capability of the agent is improved, and the optimal scheduling strategy is obtained.
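A single tabular update of this kind is sketched below in Python. It uses the standard Q-learning update rule and, following the description that the old value is only overwritten by a better one, keeps the larger of the old and newly computed Q values; the function name `q_learning_update`, the dictionary-based table, and the values of `alpha` and `gamma` are assumptions for illustration.

```python
def q_learning_update(q_table, state, action, reward, next_state,
                      alpha=0.5, gamma=0.9, keep_larger=True):
    """Update one entry of the Q value table and return the stored value.

    q_table    -- dict mapping state -> list of Q values, one per action
    next_state -- None marks the final state (no future value)
    """
    best_next = 0.0 if next_state is None else max(q_table[next_state])
    old = q_table[state][action]
    # Temporal-difference update toward reward + discounted best next value
    candidate = old + alpha * (reward + gamma * best_next - old)
    q_table[state][action] = max(old, candidate) if keep_larger else candidate
    return q_table[state][action]


q_table = {0: [0.0, 0.0], 1: [1.0, 2.0]}
first = q_learning_update(q_table, 0, 1, reward=1.0, next_state=1)
second = q_learning_update(q_table, 0, 1, reward=-10.0, next_state=1)
```

Keeping the larger value is reasonable for a deterministic model, where a given state and action always yield the same reward; in a stochastic setting the plain update (`keep_larger=False`) would be the usual choice.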

In the step 6, the short-term optimization scheduling policy refers to that the short-term optimization scheduling process is regarded as a multi-stage decision problem, and the short-term scheduling process of the water-wind-solar complementary system is optimized to obtain an optimal decision sequence meeting the model objective function, namely an optimal scheduling policy. The scheduling process refers to a step reservoir water level adjusting process, and the scheduling strategy is an array containing step reservoir water level values of each scheduling period.

In step 7, the hydropower is generated by the cascade reservoirs containing the hybrid pumped storage power station. The water-wind-solar prediction deviation is the difference between the predicted and observed inflow of the cascade reservoirs, together with the difference between the predicted and observed wind and solar output. The real-time scheduling simulation model of the water-wind-solar complementary system simulates operation under the principle of "determining water by electricity": it follows the day-ahead power generation plan strictly, monitors the operating state of the hydropower station reservoirs in real time, and uses the cascade reservoirs containing the hybrid pumped storage power station to compensate for the water-wind-solar prediction deviation.

In step 7, the rolling update of the cascade reservoir scheduling process of the hybrid pumped storage power station proceeds as follows: in each time period, the water-wind-solar predicted values are updated with the observed values and the cascade reservoir scheduling of the hybrid pumped storage power station is carried out; the scheduled water level is then used as the initial water level at the start of the next period to update the day-ahead scheduling plan; and this update is repeated continuously.
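The rolling update can be sketched as a simple loop. This is an illustrative sketch only: `rolling_schedule`, `toy_dispatch`, the single-reservoir water balance, and all numbers are hypothetical stand-ins for the patent's scheduling model, which is not reproduced here.

```python
def toy_dispatch(level, inflows, release=5.0):
    """Hypothetical scheduler: a plain water balance with a fixed release per period."""
    plan = []
    for inflow in inflows:
        level = level + inflow - release
        plan.append(level)
    return plan


def rolling_schedule(initial_level, forecasts, observations, dispatch):
    """Re-plan period by period, carrying the scheduled level forward."""
    level = initial_level
    inputs = list(forecasts)
    realized = []
    for t, observed in enumerate(observations):
        inputs[t] = observed                 # replace the forecast with the observation
        plan = dispatch(level, inputs[t:])   # re-plan from the current period onward
        level = plan[0]                      # scheduled level becomes the next initial level
        realized.append(level)
    return realized


levels = rolling_schedule(100.0, [5.0, 5.0, 5.0], [6.0, 4.0, 5.0], toy_dispatch)
```

Passing the scheduler in as a function keeps the rolling logic separate from whatever optimization model produces the plan.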

To demonstrate the advantages of the PER-DQN algorithm over the DQN algorithm, both algorithms are applied to the medium- and long-term random optimization scheduling model of the water-wind-solar complementary system for comparative analysis and verification. Taking a cascade formed by two reservoirs, A and B, in series as an example, 1000 random scene samples of the water-wind-solar system are computed from historical monthly hydrological data and used as the algorithm data set, with the training, verification, and test sets set to 70%, 20%, and 10%, respectively. The model parameters are listed in Table 1, and the solution results are shown in Table 2.
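The data set split can be illustrated as follows. The sketch assumes a random shuffle with a fixed seed, which the text does not specify; `split_scenarios` and the integer stand-ins for scene samples are hypothetical.

```python
import random


def split_scenarios(samples, fractions=(0.7, 0.2, 0.1), seed=0):
    """Shuffle the samples and split them into training, verification, and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # seeded shuffle (an assumption)
    n_train = round(len(shuffled) * fractions[0])
    n_val = round(len(shuffled) * fractions[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])      # the remainder forms the test set


# 1000 scene samples, represented here by their indices
train, val, test = split_scenarios(range(1000))
```

With 1000 samples this gives 700 training, 200 verification, and 100 test scenes.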

As can be seen from Table 2, compared with the DQN algorithm, the PER-DQN algorithm improves the annual energy production of each reservoir and increases total energy production by 1.87×10⁶ kW·h. The PER-DQN algorithm also converges noticeably faster: the DQN algorithm converges after about 7500 training iterations, whereas the PER-DQN algorithm converges after about 5000. The solving times are 246.35 minutes for the PER-DQN algorithm and 378.64 minutes for the DQN algorithm, so the solving speed is 34.93% higher than that of the DQN algorithm. The PER-DQN algorithm therefore improves on the DQN algorithm in both solution quality and solution speed.
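As a quick check of the reported figures, take 378.64 minutes as the DQN solving time and 246.35 minutes as the PER-DQN solving time, and assume the speed increase is measured as the relative reduction in solving time with respect to the DQN algorithm (an interpretation, since the text does not state the formula). The arithmetic is then:

```latex
\[
\frac{378.64 - 246.35}{378.64} = \frac{132.29}{378.64} \approx 0.349,
\qquad
\frac{7500 - 5000}{7500} \approx 0.333
\]
```

Under this reading, the solving time falls by about 34.9%, in line with the reported 34.93%, and the number of training iterations needed to converge falls by about one third.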

Table 1. DQN model parameters

Table 2. Comparison of algorithm optimization results

Claims

1. A wind-light-containing hybrid pumping and storing station cascade reservoir random optimization scheduling method, characterized by comprising the following steps:

2. The wind-light-containing hybrid pumping and storing station cascade reservoir random optimization scheduling method according to claim 1, wherein in step 1, the water, wind and light random scene is constructed as follows:

3. The wind-light-containing hybrid pumping and storing station cascade reservoir random optimization scheduling method according to claim 1, wherein in step 2, scene reduction is performed using an ISODATA algorithm based on the correlation distance to construct the medium- and long-term scheduling environment of the water-wind-solar complementary system; because in a random scene of the water-wind-solar complementary system each random value is represented by a long time series, and the distance functions commonly used in clustering algorithms, which measure the distance between two points, cannot adequately describe the distance between two time series, the correlation distance is used for clustering when reducing the random scenes of the water-wind-solar complementary system; the specific process is as follows:

S2.1: Input the random-variable time series data; preselect N_c initial cluster centers {Z_1, Z_2, …, Z_Nc}, the expected number of cluster centers K, the minimum number of samples θ_N in each cluster domain (a cluster with fewer samples is not treated as an independent cluster), the threshold θ_S on the standard deviation of the sample distance distribution within a cluster domain, the minimum distance θ_c between two cluster centers (if the distance is smaller than this, the two clusters are merged), the maximum number L of cluster-center pairs that can be merged in one iteration, and the number of iterations I;

Wherein the cluster center is given by formula (2):

S2.5: Stop the split and merge calculations

S2.6: Calculate the standard deviation vector of the sample data in each cluster from the distance values: σ_j = (σ_1j, σ_2j, …, σ_nj)^T, where each vector component is given by formula (5):

where i is the dimension index of the sample feature vector; j is the index of the cluster; N_j is the total number of samples in the j-th cluster; X_ik is the k-th sample in the i-th dimension of the sample feature vector; Z_ij is the j-th cluster center in the i-th dimension of the sample feature vector; then take the maximum component σ_jmax of σ_j;

S2.7: If σ_jmax is greater than θ_S and either of the following conditions holds: … or N_j > 2(θ_N + 1), split Z_j into two clusters, increase σ_jmax by 1, increase the iteration count by 1, and return to S2.1;

S2.8: Calculate the correlation distance D_ij between cluster centers;

In the formula, the computed quantity is the new cluster center; N_ik is the number of samples in the i_k-th cluster center after rearrangement; N_jk is the number of samples in the j_k-th cluster center after rearrangement; Z_ik is the i_k-th cluster center; Z_jk is the j_k-th cluster center; L is the number of cluster centers left in the process;

S2.12: Output the cluster centers;
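Three building blocks of the procedure above can be sketched in a few lines. The formulas of the patent are not reproduced in this text, so the sketch rests on stated assumptions: the correlation distance is taken as 1 minus the Pearson correlation coefficient, the standard deviation vector uses the per-dimension form of classical ISODATA, and the merge uses the classical sample-count-weighted mean. All function names are hypothetical.

```python
import math


def correlation_distance(x, y):
    """Return 1 - Pearson correlation of two equal-length series (assumed definition).

    Undefined (division by zero) if either series is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / math.sqrt(var_x * var_y)


def cluster_std_vector(samples, center):
    """Per-dimension standard deviation of the samples around a cluster center."""
    n = len(samples)
    return [
        math.sqrt(sum((x[i] - center[i]) ** 2 for x in samples) / n)
        for i in range(len(center))
    ]


def merge_centers(z_i, n_i, z_j, n_j):
    """Merge two cluster centers as a mean weighted by their sample counts."""
    return [(n_i * a + n_j * b) / (n_i + n_j) for a, b in zip(z_i, z_j)]


# Hypothetical usage
d_same = correlation_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # same shape
d_opposite = correlation_distance([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])  # opposite trend
sigma_j = cluster_std_vector([[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]], [3.0, 2.0])
sigma_jmax = max(sigma_j)  # compared against the threshold in the split test
merged = merge_centers([0.0, 0.0], 1, [3.0, 6.0], 2)
```

A correlation-based distance compares the shape of two time series rather than their pointwise separation, which is the motivation the text gives for using it in scene reduction.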

4. The wind-light-containing hybrid pumping and storage station cascade reservoir random optimization scheduling method according to claim 1, wherein in step 3, the medium- and long-term random optimization scheduling model of the water-wind-solar complementary system takes maximum system power generation as its objective and schedules the system with the day as the scheduling period so as to maximize the total power generation of the water-wind-solar complementary system; the total power generation of the system in each scheduling period is given by formula (7):

Cascade water level constraint, as shown in formula (10):

Power generation flow constraint, as shown in formula (11):

Wind and solar output limits, as shown in formula (13):

5. The wind-light-containing hybrid pumping and storage station cascade reservoir random optimization scheduling method according to claim 1, wherein in step 4, the medium- and long-term and short-term coupled optimization scheduling model of the water-wind-solar complementary system uses the daily system output from the medium- and long-term scheduling strategy as the boundary constraint on the total output deviation of the short-term scheduling system when solving the short-term scheduling strategy, and feeds the short-term scheduling results back to the medium- and long-term scheduling to update the scheduling process from which the medium- and long-term scheduling strategy is solved; in addition, the short-term scheduling is performed at an hourly resolution, with the objectives of minimizing the mean square error of the daily residual load of the water-wind-solar complementary system, minimizing the fluctuation of the system output, and minimizing the squared total output deviation;

f_3 = (C_d - N_d)^2    (16)

6. The wind-light-containing hybrid pumping and storage station cascade reservoir random optimization scheduling method according to claim 1, wherein in step 5, the method for solving the medium- and long-term random optimization scheduling strategy based on the reinforcement learning PER-DQN algorithm treats the medium- and long-term random optimization scheduling of the cascade reservoirs containing the hybrid pumped storage power station with wind and solar access as a multi-stage decision problem, takes each scheduling period as a stage, and solves for the optimal decision sequence of the problem, namely the optimal scheduling strategy, using the reinforcement learning PER-DQN algorithm;

as shown in expression (19):

the specific solving steps are as follows:

S5.9: Repeat S5 until the upper limit on the number of training iterations is reached;
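The prioritized experience replay (PER) component of the PER-DQN algorithm can be illustrated with a minimal sketch of proportional prioritization, in which transitions with larger temporal-difference errors are replayed more often. This is the generic form of the technique, not the patent's implementation: the function names, the priority exponent, and the small constant `eps` are assumptions, and importance-sampling weights and the Q network itself are omitted.

```python
import random


def per_probabilities(td_errors, priority_exponent=0.6, eps=1e-6):
    """Sampling probabilities proportional to (|TD error| + eps) ** priority_exponent."""
    priorities = [(abs(d) + eps) ** priority_exponent for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_batch(buffer, td_errors, batch_size, rng=random):
    """Draw transitions (with replacement) according to their priorities."""
    probs = per_probabilities(td_errors)
    return rng.choices(buffer, weights=probs, k=batch_size)


# Hypothetical replay buffer holding three stored transitions
buffer = ["transition_0", "transition_1", "transition_2"]
td_errors = [0.1, 1.0, 5.0]
probs = per_probabilities(td_errors)
batch = sample_batch(buffer, td_errors, batch_size=4, rng=random.Random(0))
```

Setting the priority exponent to 0 recovers the uniform sampling used by plain DQN replay.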

7. The wind-light-containing hybrid pumping and storage station cascade reservoir random optimization scheduling method according to claim 1, wherein in step 6, a method is provided for solving the short-term optimization scheduling strategy in the medium- and long-term and short-term coupled optimization scheduling model based on the reinforcement learning Q-learning algorithm, in which the Q-learning algorithm constructs a temporal-difference error from the difference between the value at the next time step and the value at the current time step and uses it to update the current value estimate; during updating, a Q-value table stores the Q values for all states at each stage of the multi-stage decision problem, and during training an old value is always overwritten by a better Q value, so that the Q values approach the optimal action values, thereby improving the decision-making capability of the agent and obtaining the optimal scheduling strategy; for a deterministic model with a small discrete state space, the structurally simpler Q-learning algorithm can ensure calculation accuracy while occupying little memory, thereby saving calculation cost;

wherein: Q^k(s_t, a_t) and Q^(k+1)(s_t, a_t) denote the action value, namely the Q value, obtained after taking action a_t in state s_t in the k-th and (k+1)-th training iterations, respectively; Q^k(s_t+1, a_t+1) denotes the action value, namely the Q value, obtained by taking action a_t+1 in state s_t+1 in the k-th training iteration; the transition-probability term is the Markov transition probability of the cascade reservoir runoff states between adjacent periods; r_t+1 is the benefit of period t+1, namely the action reward value; s_t and s_t+1 are the states of the random variables in period t and period t+1, respectively; a_t and a_t+1 are the actions taken in period t and period t+1, respectively; S is the set of random-variable states; δ_t is the temporal-difference error; α is the learning rate, which determines the degree to which the error is learned;

the specific solving steps are as follows:

S6.7: Repeat S5 until the upper limit on the number of training iterations is reached;

8. The wind-light-containing hybrid pumping and storage station cascade reservoir random optimization scheduling method according to claim 1, wherein in step 7, the real-time scheduling simulation model of the water-wind-solar complementary system is a model that follows the principle of "determining water by electricity", follows the day-ahead power generation plan strictly, monitors the operating state of the hydropower station reservoirs in real time, and uses the cascade reservoirs of the hybrid pumped storage power station to compensate for the water-wind-solar prediction deviation; in each time period, the water-wind-solar predicted values are updated with the observed values to carry out the cascade reservoir scheduling of the hybrid pumped storage power station, the scheduled water level is used as the initial water level at the start of the next period to update the day-ahead scheduling plan, and this process is repeated continuously to perform the rolling update;

The specific steps are as follows: