CN112054561B - Wind power-pumped storage combined system daily random dynamic scheduling method based on SARSA (lambda) algorithm - Google Patents


Info

Publication number
CN112054561B
Authority
CN
China
Prior art keywords
wind power
algorithm
time period
state
pumped storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010973224.6A
Other languages
Chinese (zh)
Other versions
CN112054561A (en
Inventor
李文武
郑凯新
刘江鹏
石强
余跃
赵迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Three Gorges University CTGU
Original Assignee
China Three Gorges University CTGU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Three Gorges University CTGU
Priority to CN202010973224.6A
Publication of CN112054561A
Application granted
Publication of CN112054561B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J: CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00: Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38: Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/46: Controlling of the sharing of output between the generators, converters, or transformers
    • H02J3/466: Scheduling the operation of the generators, e.g. connecting or disconnecting generators to meet a given demand
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063: Operations research, analysis or management
    • G06Q10/0631: Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312: Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06: Energy or water supply
    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J: CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00: Circuit arrangements for ac mains or ac distribution networks
    • H02J3/007: Arrangements for selectively connecting the load or loads to one or several among a plurality of power lines or power sources
    • H02J3/0075: Arrangements for selectively connecting the load or loads to one or several among a plurality of power lines or power sources for providing alternative feeding paths between load and source according to economic or energy efficiency considerations, e.g. economic dispatch
    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J: CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00: Circuit arrangements for ac mains or ac distribution networks
    • H02J3/28: Arrangements for balancing of the load in a network by storage of energy
    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J: CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00: Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/10: Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J: CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00: Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20: Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J: CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00: Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20: The dispersed energy generation being of renewable origin
    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J: CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00: Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20: The dispersed energy generation being of renewable origin
    • H02J2300/28: The renewable source being wind energy
    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J: CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00: Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/40: Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation wherein a plurality of decentralised, dispersed or local energy generation technologies are operated simultaneously
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E: REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E70/00: Other energy conversion or management systems reducing GHG emissions
    • Y02E70/30: Systems combining energy storage with energy generation of non-fossil origin
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04: INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S: SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00: Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention provides a daily random dynamic scheduling method for a wind power-pumped storage combined system based on the SARSA(λ) algorithm. First, the randomness of wind power output is taken into account, and the probability distribution of the wind power output is represented by a Beta distribution. Second, a daily random dynamic scheduling model of the wind power-pumped storage combined system is established that accounts for time-of-use electricity prices. Finally, the multi-step temporal-difference SARSA(λ) algorithm from reinforcement learning is introduced to solve the model: it learns from historical scenario data and accumulates experience through continual trial and error. The method offers a new approach to the multi-stage decision problem of wind-storage joint optimal scheduling under randomness, and improves solution efficiency while achieving the scheduling objective.

Description

Wind power-pumped storage combined system daily random dynamic scheduling method based on SARSA (lambda) algorithm
Technical Field
The invention belongs to the field of water resource recycling and natural water collection and distribution within water-saving activities in the energy-saving and environmental-protection industry, and addresses these problems with a reinforcement learning method from big data. In particular, it relates to a daily random dynamic scheduling method for a wind power-pumped storage combined system based on the reinforcement learning SARSA(λ) algorithm.
Background
As the energy industry moves toward high-quality development, wind power generation is now widely used around the world. At the same time, the randomness and volatility of wind power pose challenges for grid operation and scheduling. How to control the power characteristics of grid-connected wind power has become an urgent problem for the efficient accommodation of large-scale wind power.
With the development of energy storage technology, the pumped storage power station, a mature and widely used energy storage facility, offers flexible response and rapid start and stop. Equipping a power system with a pumped storage power station not only enables peak shaving and valley filling, but also provides dynamic services such as spinning reserve, load following, phase modulation, and frequency control. It improves the static and dynamic stability of the system, brings considerable benefits, and helps ensure the safe and stable operation of the grid. Jointly optimizing the operation of wind power and a pumped storage power station can effectively improve the operating benefit of wind power, ease the restrictions on wind power grid connection, and yield considerable economic benefit.
Existing methods schedule the wind power-pumped storage combined system with a conventional stochastic dynamic programming algorithm, which suffers from poor scheduling results and low efficiency.
Disclosure of Invention
The invention provides a wind power-pumped storage combined system daily random dynamic scheduling method based on an SARSA (lambda) algorithm, which is used for solving or at least partially solving the technical problems of poor scheduling effect and low efficiency in the prior art.
In order to solve the technical problem, the invention provides a wind power-pumped storage combined system daily random dynamic scheduling method based on an SARSA (lambda) algorithm, which comprises the following steps:
S1: describing the randomness of the wind power output;
S2: constructing, according to the randomness of the wind power output and the time-of-use electricity price, a daily random dynamic scheduling model of the wind power-pumped storage combined system:
$$\max F = E\left[\sum_{t=1}^{T} R_t\left(V_t, P_t^{gd}\right)\right]$$
where: $T$ is the number of time periods in one scheduling cycle; $R_t$ is the index function of period $t$; $V_t$ is the storage capacity of the upper reservoir of the pumped storage power station at the beginning of period $t$; $P_t^{gd}$ is the power of the pumped storage power station in period $t$, which denotes pumping when less than 0 and generating when greater than 0. $R_t$ and $P_t^{gd}$ are expressed as follows:
$$R_t = C_t \sum_{i=1}^{N} p_{t,i}\left(P_{t,i}^{w} + P_t^{gd}\right) - G_h n_t, \qquad P_t^{gd} = u_t^{g} P_t^{g} - u_t^{d} P_t^{d}$$
where: $C_t$ is the peak-valley electricity price in period $t$; after the wind power prediction error distribution curve of period $t$ is discretized into $N$ values, $P_{t,i}^{w}$ is the $i$-th corresponding wind power and $p_{t,i}$ is its probability; $G_h$ is the start/stop cost of a single unit of the pumped storage power station, and $n_t$ is the number of units started or stopped in period $t$; $u_t^{g}$ is 1 when the pumped storage units are in the generating state in period $t$ and 0 otherwise; $P_t^{g}$ is the generation output of the units in period $t$; $u_t^{d}$ is 1 when the pumped storage units are in the pumping (motor) state in period $t$ and 0 otherwise; $P_t^{d}$ is the pumping power of the units in period $t$;
S3: determining the constraint conditions of the daily random dynamic scheduling model of the wind power-pumped storage combined system;
S4: solving the daily random dynamic scheduling model of the wind power-pumped storage combined system with the SARSA(λ) algorithm from reinforcement learning to obtain the scheduling result.
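As a rough illustration of the index function in S2, the sketch below evaluates the expected revenue of one period. It assumes that the index function is the time-of-use price multiplied by the probability-weighted combined output of wind and pumped storage, minus the unit start/stop cost. The function name `period_reward`, its argument names, and the sample numbers are illustrative assumptions, not taken from the patent.

```python
def period_reward(price, wind_levels, wind_probs, net_storage_power,
                  start_stop_cost, units_switched):
    """Expected revenue of one period (hypothetical helper).

    price             -- time-of-use price of the period
    wind_levels       -- N discretized wind power values of the period
    wind_probs        -- probabilities of those N values (must sum to 1)
    net_storage_power -- pumped storage power, > 0 generating, < 0 pumping
    start_stop_cost   -- cost of starting or stopping a single unit
    units_switched    -- number of units started or stopped in the period
    """
    if abs(sum(wind_probs) - 1.0) > 1e-9:
        raise ValueError("wind_probs must sum to 1")
    # Probability-weighted combined output of wind and pumped storage.
    expected_output = sum(p * (w + net_storage_power)
                          for w, p in zip(wind_levels, wind_probs))
    return price * expected_output - start_stop_cost * units_switched


# Valley period: pump 20 units of power at a low price.
reward_pumping = period_reward(0.3, [40.0, 60.0], [0.5, 0.5], -20.0, 5.0, 1)
# Peak period: generate 20 units of power at a high price.
reward_generating = period_reward(0.9, [40.0, 60.0], [0.5, 0.5], 20.0, 5.0, 1)
```

With these made-up numbers, pumping in the valley period gives up some revenue at a low price, while generating in the peak period adds revenue at a high price, which is the trade-off the scheduling model optimizes over the day.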
In one embodiment, S1 specifically includes:
The probability density function of the wind power prediction error is represented by a Beta distribution:
$$f(x) = \frac{x^{a-1}(1-x)^{b-1}}{B(a, b)}, \qquad 0 \le x \le 1$$
where: $x$ is the wind power output prediction error; $a$ and $b$ are the shape parameters of the Beta distribution. Changing the values of $a$ and $b$ yields Beta distributions of different shapes, which can accommodate the positive or negative skew that may occur in the wind power output prediction error. $B(a, b)$ is given by:
$$B(a, b) = \int_0^1 x^{a-1}(1-x)^{b-1}\,dx$$
Historical data of the wind farm are collected and sorted to obtain the frequency distribution of its prediction error, and the shape parameters $a$ and $b$ of the Beta distribution are calculated from the mean and variance of the prediction error by solving:
$$\mu = \frac{a}{a+b}, \qquad \sigma^2 = \frac{ab}{(a+b)^2(a+b+1)}$$
where: $\mu$ is the mean of the prediction error; $\sigma$ is the standard deviation of the prediction error.
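The fitting step can be sketched as follows. This is a minimal example that assumes the prediction errors have already been normalized to the open interval (0, 1). It solves the mean and variance relations of the Beta distribution for the two shape parameters (method of moments) and then discretizes the fitted density into a chosen number of values. The function names and the midpoint discretization rule are assumptions made for illustration; the patent does not state how the discretization is carried out.

```python
import math


def fit_beta_by_moments(samples):
    """Estimate Beta shape parameters (a, b) from the sample mean and variance."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n
    if not 0.0 < var < mu * (1.0 - mu):
        raise ValueError("sample variance is incompatible with a Beta distribution")
    common = mu * (1.0 - mu) / var - 1.0
    return mu * common, (1.0 - mu) * common


def beta_pdf(x, a, b):
    """Beta density at x in (0, 1), computed in log space for stability."""
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x) - log_beta)


def discretize(a, b, n_values):
    """Midpoint discretization (an assumed rule): n_values points with normalized probabilities."""
    xs = [(i + 0.5) / n_values for i in range(n_values)]
    weights = [beta_pdf(x, a, b) for x in xs]
    total = sum(weights)
    return xs, [w / total for w in weights]


# Made-up normalized prediction errors; they give mean 0.5 and variance 0.05.
a_hat, b_hat = fit_beta_by_moments([0.2, 0.4, 0.6, 0.8])
error_values, error_probs = discretize(a_hat, b_hat, 4)
```

The resulting values and probabilities play the role of the discretized wind power levels and their probabilities used in the index function of the scheduling model.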
In one embodiment, the constraints in S3 include:
(1) Reservoir capacity constraint:
$$V_{\min} \le V_t \le V_{\max}$$
where: $V_{\min}$ and $V_{\max}$ are the minimum and maximum usable storage capacity of the upper reservoir of the pumped storage power station, and $V_t$ is the actual usable storage capacity of the upper reservoir in period $t$;
(2) Constraint on the change in reservoir capacity between the first and last periods of each day:
$$V_{24} - V_0 = 0$$
where the pumped storage reservoir is regulated on a daily basis, so the storage capacities in the first and last periods of the day are equal; $V_{24}$ and $V_0$ are the storage capacities of the reservoir at hour 24 and hour 0, respectively;
(3) Generation and pumping power constraints:
$$P_{\min}^{g} \le P_t^{g} \le P_{\max}^{g}$$
where: $P_{\max}^{g}$ and $P_{\min}^{g}$ are the upper and lower limits of the generation output of the pumped storage units, and $P_t^{g}$ is the actual generation output of the pumped storage units in period $t$;
The pumping power constraint is:
$$P_t^{d} = P_d k_t$$
where: $P_t^{d}$ is the actual pumping power of the pumped storage power station in period $t$, $P_d$ is the rated pumping power of a single unit, and $k_t$ is the total number of pumping units in operation in period $t$;
(4) Pumping/generation mutual exclusion constraint:
$$u_t^{g} + u_t^{d} \le 1$$
This constraint states that the pumped storage units cannot be in the generating and pumping states at the same time within one period; $u_t^{g}$ indicates whether the pumped storage units are in the generating state in period $t$, and $u_t^{d}$ indicates whether they are in the pumping (motor) state in period $t$;
(5) Constraint on the total number of units:
$$0 \le n_t^{\mathrm{run}} \le N$$
where: $n_t^{\mathrm{run}}$ is the total number of units in operation in period $t$, and $N$ is the total number of available units of the pumped storage power station.
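The five constraints can be checked for a candidate schedule with a small helper such as the one below. It is only a sketch: the function name, the argument layout (one list entry per period, plus one extra entry for the reservoir volumes), and the constraint labels are assumptions made for illustration.

```python
def violated_constraints(volumes, v_min, v_max, gen_flags, pump_flags,
                         gen_powers, gen_min, gen_max,
                         units_running, units_total, tol=1e-9):
    """Return the set of constraint names violated by a candidate schedule.

    volumes holds the reservoir storage at the start of each period plus the
    final value (T + 1 entries); every other list holds one entry per period.
    """
    violated = set()
    # (1) reservoir capacity limits in every period
    if any(not v_min <= v <= v_max for v in volumes):
        violated.add("reservoir capacity")
    # (2) storage at the end of the day equals storage at the start
    if abs(volumes[-1] - volumes[0]) > tol:
        violated.add("daily balance")
    for gen, pump, power, running in zip(gen_flags, pump_flags,
                                         gen_powers, units_running):
        # (4) generating and pumping are mutually exclusive
        if gen + pump > 1:
            violated.add("mutual exclusion")
        # (3) generation output limits apply while generating
        if gen == 1 and not gen_min <= power <= gen_max:
            violated.add("generation limits")
        # (5) number of running units cannot exceed the available units
        if not 0 <= running <= units_total:
            violated.add("unit count")
    return violated


def pumping_power(rated_pump_power, pumps_running):
    """Pumping power as rated single-unit power times the number of running pumps."""
    return rated_pump_power * pumps_running


feasible = violated_constraints(
    volumes=[50.0, 60.0, 50.0], v_min=20.0, v_max=100.0,
    gen_flags=[0, 1], pump_flags=[1, 0], gen_powers=[0.0, 30.0],
    gen_min=10.0, gen_max=40.0, units_running=[2, 1], units_total=4)
infeasible = violated_constraints(
    volumes=[50.0, 110.0, 70.0], v_min=20.0, v_max=100.0,
    gen_flags=[1, 1], pump_flags=[1, 0], gen_powers=[5.0, 30.0],
    gen_min=10.0, gen_max=40.0, units_running=[5, 1], units_total=4)
```

In a learning loop, a check of this kind could be used either to mask infeasible actions before selection or to apply a penalty to the reward; which of the two the patent intends is not stated in this section.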
In one embodiment, the SARSA(λ) algorithm in S4 builds on the model-free SARSA algorithm by introducing an E matrix, which records the path traveled in each episode and its decay, together with a step decay coefficient λ, thereby changing the single-step update of the SARSA algorithm into the episode-wise update of the SARSA(λ) algorithm.
In one embodiment, after each step within an episode, that is, after an action is selected in the current state, the value at position (S, A) of the eligibility trace function E(S, A) is incremented by 1. After each action, both the value function Q(S, A) and the eligibility trace function E(S, A) are updated. Q(S, A) is updated as follows:
$$\delta = R + \gamma Q(S', A') - Q(S, A), \qquad Q(S, A) \leftarrow Q(S, A) + \alpha\,\delta\,E(S, A)$$
where: $\alpha$ is the learning rate, which controls the convergence of learning; $\gamma$ is the discount factor, which reduces the influence of future returns on the current policy; the TD error $\delta$ is the difference between the target value and the current value of Q(S, A);
E(S, A) is updated as follows:
$$E(S, A) \leftarrow \gamma \lambda E(S, A)$$
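One update step of this kind can be sketched in tabular form as follows. The example uses the standard accumulating-trace version of SARSA(λ) and stores Q and E in dictionaries keyed by (state, action); the function name and the toy numbers are assumptions for illustration only.

```python
from collections import defaultdict


def sarsa_lambda_update(q, e, s, a, reward, s_next, a_next, alpha, gamma, lam):
    """Apply one SARSA(lambda) step to the tables q and e, and return the TD error."""
    # TD error: target value minus current estimate of Q(S, A).
    delta = reward + gamma * q[(s_next, a_next)] - q[(s, a)]
    # Mark the visited state-action pair (accumulating trace).
    e[(s, a)] += 1.0
    # Update every traced pair in proportion to its trace, then decay the trace.
    for key in list(e):
        q[key] += alpha * delta * e[key]
        e[key] *= gamma * lam
    return delta


q_table = defaultdict(float)   # all Q values start at 0
traces = defaultdict(float)    # all eligibility traces start at 0
td_error = sarsa_lambda_update(q_table, traces, s=0, a=1, reward=2.0,
                               s_next=1, a_next=0,
                               alpha=0.5, gamma=0.9, lam=0.8)
```

Because every pair with a non-zero trace is updated at once, a reward received late in the day also adjusts the values of decisions taken in earlier periods, which is what distinguishes this update from the single-step SARSA update.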
In one embodiment, an ε-greedy strategy is used during the iterations of the SARSA(λ) algorithm to select the action in each state. Specifically, a random number between 0 and 1 is generated and compared with the exploration probability ε. If the number is smaller than ε, the system selects an action at random, with every action equally likely; otherwise, the system selects the best known action in the current state, as given by:
$$A_i = \arg\max_{A} Q(S, A)$$
where $A_i$ is the best known action in state S.
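The selection rule can be written in a few lines. In this sketch the Q values of the current state are passed as a plain list indexed by action; the function name and that representation are assumptions for illustration.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick an action index from the Q values of the current state.

    With probability epsilon a uniformly random action is explored;
    otherwise the action with the largest known Q value is exploited.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return q_row.index(max(q_row))


q_values_for_state = [0.2, 1.5, -0.3]   # made-up Q values for three actions
greedy_choice = epsilon_greedy(q_values_for_state, epsilon=0.0)
explored_choice = epsilon_greedy(q_values_for_state, epsilon=1.0,
                                 rng=random.Random(0))
```

Setting the exploration probability to 0 reproduces the purely greedy choice, which matches selecting the action with the maximum Q value once pre-learning is complete.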
In one embodiment, S4 specifically includes:
S4.1: processing the historical electricity price data and the historical wind power output data, and feeding the processed data into the solution of the daily random dynamic scheduling model of the wind power-pumped storage combined system for pre-learning; in the pre-learning stage, experience is accumulated through continual exploration to update the element values of the Q-value table and of the eligibility trace function E;
S4.2: performing online learning with the Q-value table and the element values of the eligibility trace function E obtained in the pre-learning stage, and selecting the action with the maximum Q value in the current state according to the greedy strategy.
In one embodiment, S4.1 specifically includes:
S4.1.1: initializing the Q-value table, the eligibility trace function E, the number of iterations, and the learning rate; at the start of the SARSA(λ) algorithm every element of the Q-value table is 0, and the eligibility trace function E is 0 for all state-action pairs;
S4.1.2: taking the upper-reservoir storage value of the current period as the first state of the state sequence, and obtaining the wind power output of the current period from the wind power forecast for that period and the Beta-distributed probability density function of the wind power prediction error;
S4.1.3: determining the state S corresponding to the upper-reservoir storage value of the pumped storage power station in the current period, selecting a pumping/generation action with the ε-greedy strategy according to the electricity price of the current period and the price trend over all periods, and determining the pumping/generation flow of the upper reservoir from that action;
S4.1.4: obtaining, from the state transition equation, the new state S' corresponding to the upper-reservoir storage of the pumped storage power station in the next period after the action is taken, together with the reward R obtained by taking the action;
S4.1.5: obtaining the wind power output of the next period;
S4.1.6: selecting a new pumping/generation action with the ε-greedy strategy according to the electricity price of the next period and the price trend over all periods;
S4.1.7: updating the eligibility trace function E(S, A) and the TD error δ according to the update rule of the SARSA(λ) algorithm;
S4.1.8: updating the value function Q for all upper-reservoir storage states S and pumping/generation flows A of the current period, and decaying the eligibility trace function E updated in S4.1.7;
S4.1.9: determining whether the algorithm has reached the specified number of iterations; if not, setting t = t + 1 and returning to S4.1.2 to continue iterating.
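The pre-learning steps above can be sketched end to end on a toy problem. Everything concrete in this example is an assumption made for illustration: the reservoir is reduced to five discrete storage levels, each action moves the level by one step, the prices and wind forecast are made-up constants, the state is taken as the pair (period, level), the traces are reset at the start of each day, and the daily storage balance is enforced with a terminal penalty. None of these choices is stated in the patent.

```python
import random
from collections import defaultdict

ACTIONS = (-1, 0, 1)      # -1 pump, 0 idle, +1 generate (generating lowers the level)
LEVELS = 5                # discretized upper-reservoir storage levels 0..4
START_LEVEL = 2           # storage level at the start of the day
PRICES = [0.3] * 8 + [0.9] * 8 + [0.6] * 8   # hypothetical time-of-use prices
UNIT_POWER = 10.0         # hypothetical power per action step
WIND_FORECAST = 30.0      # hypothetical flat wind forecast
PENALTY = 20.0            # hypothetical penalty per level of end-of-day imbalance


def feasible(level, action):
    return 0 <= level - action < LEVELS


def choose(q, state, epsilon, rng):
    """Epsilon-greedy choice among the actions that keep the level in range."""
    options = [a for a in ACTIONS if feasible(state[1], a)]
    if rng.random() < epsilon:
        return rng.choice(options)
    return max(options, key=lambda a: q[(state, a)])


def step(state, action, rng):
    """State transition and reward for one period, with Beta-distributed wind."""
    t, level = state
    wind = WIND_FORECAST * (0.5 + rng.betavariate(2.0, 2.0))
    reward = PRICES[t] * (wind + UNIT_POWER * action)
    next_level = level - action
    if t + 1 == len(PRICES):
        reward -= PENALTY * abs(next_level - START_LEVEL)
    return (t + 1, next_level), reward


def pre_learn(episodes, alpha=0.1, gamma=1.0, lam=0.8, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)                      # Q-value table starts at 0
    for _ in range(episodes):
        e = defaultdict(float)                  # traces start at 0 each day
        state = (0, START_LEVEL)
        action = choose(q, state, epsilon, rng)
        while state[0] < len(PRICES):
            next_state, reward = step(state, action, rng)
            done = next_state[0] == len(PRICES)
            next_action = None if done else choose(q, next_state, epsilon, rng)
            target = 0.0 if done else q[(next_state, next_action)]
            delta = reward + gamma * target - q[(state, action)]
            e[(state, action)] += 1.0
            for key in list(e):                 # update Q, then decay the traces
                q[key] += alpha * delta * e[key]
                e[key] *= gamma * lam
            state, action = next_state, next_action
    return q


def greedy_schedule(q):
    """Read a day-long schedule out of the learned table without exploration."""
    rng = random.Random(0)
    state, plan = (0, START_LEVEL), []
    while state[0] < len(PRICES):
        action = choose(q, state, 0.0, rng)
        plan.append(action)
        state = (state[0] + 1, state[1] - action)
    return plan


plan = greedy_schedule(pre_learn(episodes=200))
```

The returned `plan` lists one action per period. With so few episodes the schedule is not guaranteed to be optimal; the sketch is meant to show the order of the steps (sample wind, act, observe reward and next state, choose the next action, update E and Q) rather than a tuned solver.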
One or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
The invention provides a daily random dynamic scheduling method for a wind power-pumped storage combined system based on the SARSA(λ) algorithm. The randomness of wind power output is considered first, and a daily random dynamic scheduling model of the combined system that accounts for time-of-use electricity prices is established; the constraint conditions of the model are then determined; finally, the SARSA(λ) algorithm from reinforcement learning is introduced to solve the model. The method offers a new approach to wind-storage joint optimal scheduling under randomness. Analyzing the randomness of wind power output when studying the optimal scheduling of the combined system can improve prediction accuracy; establishing the scheduling model and its constraints with time-of-use electricity prices can improve the scheduling effect; and solving the model with the SARSA(λ) algorithm can improve the efficiency of solution and scheduling.
Furthermore, the method makes full use of local historical data of the wind farm to analyze the randomness of wind power output, and also uses the historical data to pre-train the system's optimization decision-making, so that experience accumulates gradually.
Furthermore, the algorithm is applied for the first time to the wind-storage joint optimal scheduling problem with randomness. Updating the Q-value table through pre-learning and adopting the episode-wise update of the algorithm greatly shortens the solution time and avoids the curse of dimensionality that conventional algorithms are prone to, while the optimal policy is learned through continual interaction with the environment, yielding a high-quality solution.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is an overall structural diagram of a scheduling method provided by the present invention;
Fig. 2 is a diagram of the Markov decision process of the present invention;
FIG. 3 is a schematic diagram of reinforcement learning according to the present invention;
FIG. 4 is a flow chart of the SARSA (λ) algorithm of the present invention;
FIG. 5 is a flowchart illustrating an application of the SARSA (λ) algorithm in daily random dynamic scheduling of a wind storage combined system according to the present invention.
Detailed Description
Through extensive research and practice, the inventors of the present application found the following. As theoretical research on the optimal scheduling of wind power and pumped storage has matured, the factors considered in the problem have become increasingly complex, placing higher demands on the solution method. Because wind power output prediction carries a certain error, the randomness of wind power output must be analyzed first when studying the optimal scheduling of a wind power-pumped storage combined system, so as to improve prediction accuracy. The stochastic optimal scheduling of the wind power-pumped storage combined system is therefore a high-dimensional, multi-stage, nonlinear optimization problem with complex constraints and many factors to be considered together. Commonly used algorithms for such problems include particle swarm optimization and stochastic dynamic programming. As the system scale grows, however, these algorithms show limitations: particle swarm optimization easily falls into a local optimum, must escape from it during the computation, and has difficulty finding the theoretical global optimum; stochastic dynamic programming can find the theoretical optimum, but it is prone to the curse of dimensionality, so the solution time becomes too long for practical application.
Reinforcement Learning (RL) is an important branch of Machine Learning (ML). It generally involves two parts, an agent and an environment. The agent is not told which action to take in the environment; instead, it must discover through its own attempts which actions yield the greatest return. An action often affects not only the immediate return but also the next state, and through it the subsequent returns. Trial-and-error search and delayed return are the two most important distinguishing features of reinforcement learning.
Building on existing research, the invention applies the SARSA(λ) algorithm from reinforcement learning, for the first time, to solving the daily random dynamic scheduling model of the wind power-pumped storage combined system. It takes into account the randomness of wind power output and the Markov decision process. In the pre-learning stage, experience is accumulated through continual trial and error until essentially every element of the Q-value table has been updated; the table is then used in online learning, so that the model obtains a high-quality solution while effectively avoiding the curse of dimensionality. The algorithm provides a new approach to multi-energy complementary problems with randomness.
In order to achieve the technical effects, the main concept of the invention is as follows:
A new method is provided for solving the daily random dynamic scheduling problem of the wind power-pumped storage combined system; it improves the scheduling effect while effectively reducing the solution time and improving scheduling efficiency. First, reinforcement learning learns the optimal policy through continual interaction with the environment, which effectively avoids the curse of dimensionality that conventional stochastic dynamic programming is prone to. Second, building on the model-free SARSA algorithm, an E matrix that records the path of each episode and its decay is introduced together with a step decay coefficient λ, turning the single-step update of SARSA into the episode-wise update of SARSA(λ); SARSA(λ) is also an online learning method, and this improvement lets it find a high-quality solution faster and shortens the solution time. The invention studies the daily random dynamic scheduling problem of the wind power-pumped storage combined system: within the scheduling period it makes full use of the pumped storage units, stores cheap wind power in the pumped storage power station, and delivers it at high prices during peak periods, so that the economic benefit of the system is maximized subject to all constraints. The research addresses the following aspects:
1) Describing the randomness of the wind power output.
2) Establishing a daily random dynamic scheduling model of the wind power-pumped storage combined system that accounts for time-of-use electricity prices, and determining the objective function and constraint conditions with the goal of maximizing the economic benefit of the combined system over one day.
3) Applying reinforcement learning theory to the daily random dynamic scheduling problem of the wind power-pumped storage combined system, and determining the recursion equation and the state transition equation.
4) Applying the SARSA(λ) algorithm to the model solution and determining the solution procedure.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
The embodiment of the invention provides a wind power-pumped storage combined system daily random dynamic scheduling method based on an SARSA (lambda) algorithm, which comprises the following steps:
S1: describing the randomness of wind power output;
S2: according to the randomness of wind power output and the time-of-use electricity price, constructing a daily random dynamic scheduling model of the wind power-pumped storage combined system:
Figure BDA0002684848080000081
In the formula: T is the number of time periods in a cycle; R_t is the index function of period t; V_t is the storage capacity of the upper reservoir of the pumped storage power station at the beginning of period t; P_t^{gd} is the power output of the pumped storage power station in period t, where a value below 0 means pumping and a value above 0 means generating. The expressions of R_t and P_t^{gd} are as follows:
Figure BDA0002684848080000082
In the formula: C_t is the peak-valley electricity price of period t; after the wind power prediction error distribution function curve of period t is discretized into N values, the corresponding power is
Figure BDA0002684848080000083
and the corresponding probability is p_{t,i}; G_h is the cost of starting or stopping a single unit of the pumped storage power station, and n_t is the number of units started or stopped in period t; when the pumped storage power station unit is in the generating state in period t,
Figure BDA0002684848080000084
is 1, otherwise it is 0; P_t^g is the generating output of the unit in period t; when the pumped storage power station unit is in the motor (pumping) state in period t,
Figure BDA0002684848080000085
is 1, otherwise it is 0; P_t^d is the pumping power of the unit in period t;
S3: determining the constraint conditions of the daily random dynamic scheduling model of the wind power-pumped storage combined system;
S4: solving the daily random dynamic scheduling model of the wind power-pumped storage combined system by adopting the SARSA(λ) algorithm in reinforcement learning to obtain a scheduling result.
Specifically, the invention provides a daily random dynamic scheduling method for a wind power-pumped storage combined system based on the SARSA(λ) algorithm. The method first describes the randomness of the wind power output; it then establishes a daily random dynamic scheduling model of the combined system that considers the time-of-use electricity price; next it determines the constraint conditions of the model; finally it introduces the SARSA(λ) algorithm into the model solution, which includes determining the state transition equation and the recursion equation of the model and determining the solution procedure of the SARSA(λ) algorithm. The specific implementation comprises two processes, pre-learning and online learning: the historical electricity price data and historical wind power output data are first processed and put into the model solution for pre-learning, and after pre-learning is finished the method is put into online learning. The overall structure of the method is shown in FIG. 1.
On the one hand, the method provided by the invention considers the randomness of the wind power output, which ensures the precision of the predicted wind power output and makes the model results more realistic.
On the other hand, the method considers the time-of-use electricity price. The peak electricity consumption periods are usually at noon and in the evening, when the wind power output is low, so the reservoir releases water to generate electricity; late at night, in the off-peak period, the wind power output is high, and the surplus wind power is stored by pumping water into the reservoir. This maximizes the daily income of the combined system, achieves peak shaving and valley filling, and effectively addresses the wind power consumption problem.
In one embodiment, S1 specifically includes:
the probability density function of the wind power prediction error is expressed by adopting Beta distribution, and the expression is as follows:
Figure BDA0002684848080000091
In the formula: x is the wind power output prediction error; a and b are the shape parameters of the Beta distribution. Changing the values of a and b yields Beta distributions of different shapes, which can accommodate the positive or negative skew that may occur in the wind power output prediction error. B(a, b) is given by:
Figure BDA0002684848080000092
acquiring and sorting historical data of the wind power plant to obtain the prediction error frequency distribution of the wind power plant, and calculating shape parameters a and b of Beta distribution according to the mean value and variance of prediction errors, wherein the calculation equation is as follows:
Figure BDA0002684848080000093
In the formula: μ is the mean of the prediction error; σ is the standard deviation of the prediction error.
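As a minimal illustrative sketch, the shape parameters can be estimated from the mean and standard deviation by the standard method of moments for the Beta distribution. The patent's own equation is contained in the figure and is not reproduced here, so the relation below is an assumption, as is the premise that the prediction error has been normalized to the interval (0, 1); the function name `beta_shape_from_moments` is hypothetical.

```python
def beta_shape_from_moments(mu, sigma):
    """Method-of-moments estimate of the Beta shape parameters (a, b).

    Assumption: the prediction error is normalized to (0, 1), the interval
    on which the Beta distribution is defined.
    """
    var = sigma ** 2
    if not 0.0 < mu < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if not 0.0 < var < mu * (1.0 - mu):
        raise ValueError("variance must be positive and below mu * (1 - mu)")
    # Standard relation: a + b = mu * (1 - mu) / var - 1
    common = mu * (1.0 - mu) / var - 1.0
    return mu * common, (1.0 - mu) * common


a, b = beta_shape_from_moments(mu=0.5, sigma=0.1)
print(a, b)
```

A symmetric error (mean 0.5) gives a = b, while a mean below or above 0.5 gives a skewed distribution.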
Specifically, the randomness of the wind power output is determined by the time-varying property of the wind speed. For the in-day prediction of a large-scale wind power plant, the probability density function of the wind power prediction error is represented by Beta distribution.
After the Beta-distribution-based probability density function of the wind power output prediction error is obtained, it is integrated to obtain the corresponding prediction error distribution function. The prediction error distribution function is then discretized into N values, where the abscissa is the prediction error and the ordinate is the corresponding probability; combined with the predicted wind power output of period t, this yields the expected wind power output of period t. These quantities are used later to solve the expected wind power output, namely p_{t,i} and
Figure BDA0002684848080000101
in the model, and P_t^{w*} in the reward value formula R_t.
Combining the predicted wind power output of each period of the next day gives the wind power output of period t:
Figure BDA0002684848080000102
In the formula: the predicted wind power output P_t^w of period t is a deterministic value; the prediction error of the wind power output in period t,
Figure BDA0002684848080000103
is a random variable; therefore, the wind power output in period t,
Figure BDA0002684848080000104
is also a random variable. The wind power outputs of all periods form a wind power output sequence, and the expected wind power output of each period reflects the randomness of the wind power output.
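The discretization described above can be sketched as follows. This is only an illustration under stated assumptions: the error is normalized to (0, 1) and rescaled linearly to a hypothetical range [e_min_mw, e_max_mw], the wind power output is taken as forecast plus error, and each bin probability is approximated by the density at the bin midpoint. All function and parameter names are hypothetical.

```python
import math


def beta_pdf(x, a, b):
    """Beta(a, b) density at x in (0, 1), computed via log-gamma for stability."""
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_beta)


def discretize_error(a, b, n):
    """Split (0, 1) into n equal bins; return bin midpoints and probabilities.

    Each probability is approximated by density(midpoint) * width, then the
    probabilities are renormalized so that they sum to 1.
    """
    width = 1.0 / n
    mids = [(i + 0.5) * width for i in range(n)]
    weights = [beta_pdf(x, a, b) * width for x in mids]
    total = sum(weights)
    return mids, [w / total for w in weights]


def expected_wind_power(forecast_mw, e_min_mw, e_max_mw, a, b, n=10):
    """Expected output = forecast + expected error (error rescaled to MW)."""
    mids, probs = discretize_error(a, b, n)
    errors = [e_min_mw + x * (e_max_mw - e_min_mw) for x in mids]
    return forecast_mw + sum(p * e for p, e in zip(probs, errors))


print(expected_wind_power(100.0, -20.0, 20.0, a=12.0, b=12.0))
```

With a symmetric error distribution the expected output stays at the forecast; a skewed distribution shifts it up or down.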
In one embodiment, the constraints in S3 include:
(1) reservoir capacity constraint:
V_min ≤ V_t ≤ V_max
In the formula: V_min and V_max are respectively the minimum and maximum available storage capacity of the upper reservoir of the pumped storage power station, and V_t is the actual available storage capacity of the upper reservoir in period t;
(2) constraint on the change in reservoir storage capacity between the first and last periods of each day:
V_24 - V_0 = 0
wherein the pumped storage reservoir is of the daily regulation type, so the reservoir storage capacities in the first and last periods of each day are equal; V_24 and V_0 are the reservoir storage capacities at hour 24 and hour 0, respectively;
(3) power generation and pumping power constraint:
Figure BDA0002684848080000105
in the formula:
Figure BDA0002684848080000106
are respectively the upper and lower limits of the generating output of the pumped storage power station unit, and P_t^g is the actual generating output of the unit in period t;
the pumping power constraint is as follows:
P_t^d = P_d k_t
In the formula: P_t^d is the actual pumping power of the pumped storage power station units in period t, P_d is the rated pumping power of a single unit of the pumped storage power station, and k_t is the total number of units pumping in period t;
(4) pumping-generating mutual exclusion constraint:
Figure BDA0002684848080000111
this constraint indicates that the pumped storage power station unit cannot be in the generating state and the pumping state at the same time within one period;
Figure BDA0002684848080000112
is the flag indicating whether the pumped storage power station unit is in the generating state in period t, and
Figure BDA0002684848080000113
is the flag indicating whether the pumped storage power station unit is in the motor (pumping) state in period t;
(5) constraint on the total number of units:
Figure BDA0002684848080000114
In the formula:
Figure BDA0002684848080000115
is the total number of units in the working state in period t, and N is the total number of available units of the pumped storage power station.
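The five constraints above can be checked for a single period with a small helper, sketched below. The exact inequalities for constraints (3) to (5) are contained in the figures, so their form here is an assumption inferred from the surrounding definitions; `PlantLimits`, `is_feasible`, `pumping_power` and `daily_balance_ok` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PlantLimits:
    v_min: float         # minimum upper-reservoir storage (V_min)
    v_max: float         # maximum upper-reservoir storage (V_max)
    p_gen_min: float     # lower limit of generating output
    p_gen_max: float     # upper limit of generating output
    p_pump_rated: float  # rated pumping power of one unit (P_d)
    n_units: int         # total number of available units


def is_feasible(lim, v_t, p_gen, units_generating, units_pumping):
    """Check one period against constraints (1), (3), (4) and (5)."""
    generating = units_generating > 0
    pumping = units_pumping > 0
    if not lim.v_min <= v_t <= lim.v_max:                # (1) storage bounds
        return False
    if generating and pumping:                           # (4) mutual exclusion
        return False
    if units_generating + units_pumping > lim.n_units:   # (5) unit count
        return False
    if generating and not lim.p_gen_min <= p_gen <= lim.p_gen_max:  # (3)
        return False
    return True


def pumping_power(lim, units_pumping):
    """Constraint (3), pumping side: P_t^d = P_d * k_t."""
    return lim.p_pump_rated * units_pumping


def daily_balance_ok(v_start, v_end, tol=1e-6):
    """Constraint (2): storage at hour 24 equals storage at hour 0."""
    return abs(v_end - v_start) <= tol


limits = PlantLimits(100.0, 500.0, 20.0, 80.0, 50.0, 4)
print(is_feasible(limits, 300.0, 60.0, 1, 0))
```

In a reinforcement learning setting, such a check would typically be used to mask infeasible actions before one is selected.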
In one embodiment, the SARSA (λ) algorithm in S4 introduces an E matrix recording the paths and attenuation in each round and a step attenuation coefficient λ based on the model-independent SARSA algorithm, and changes the single-step updating mode of the SARSA algorithm into the round updating mode of the SARSA (λ) algorithm.
Specifically, a round is the sequence of steps in the reinforcement learning algorithm from the first step until the final reward is obtained.
E matrix: SARSA(λ) is a round-update algorithm, and the matrix E is introduced to record each step of the path taken in a round. Each time an action is selected in a state, 1 is added to the corresponding (state, action) position of the E matrix; a larger element value indicates that the step has been taken more often and is therefore more important within the whole round. The matrix E is the utility trace function E.
In one embodiment, after one step in the round is taken, that is, after one action is selected in the current state, the value at the (S, A) position of the utility trace function E(S, A) is incremented by 1, and after each action the value function Q(S, A) and the utility trace function E(S, A) are updated, where Q(S, A) is updated as follows:
Figure BDA0002684848080000116
wherein α is the learning rate, used to control the convergence of learning; γ is the attenuation factor, used to reduce the influence of future returns on the current strategy; the TD error δ represents the error between the ideal value and the actual value of Q(S, A);
E(S, A) is updated as follows:
E(S, A) = γλE(S, A).
Specifically, reinforcement learning is a method of learning the best strategy through continuous interaction with the environment. Besides the Agent and the environment, a reinforcement learning system has four core elements: a policy, a reward signal, a value function, and (optionally) a model of the environment. The policy defines how the learning Agent behaves at a given time; it is a mapping from environment states to actions. The reward signal defines the goal of the reinforcement learning problem and is the main basis for changing the policy; it indicates which actions are good in the short term. The value function indicates what is good in the long term: the value of a state is the total reward the Agent can expect to accumulate in the future starting from that state. While the reward determines the immediate, intrinsic desirability of an environment state, the value represents the long-term expectation over all states likely to follow. If the environment is modeled, the model is used for planning, that is, possible future situations are considered before they are actually experienced in order to decide in advance what action to take. Methods that solve reinforcement learning problems using an environment model and planning are called model-based methods. Simpler model-free methods learn directly by trial and error, selecting the best action from experience accumulated over time. The reinforcement learning process is illustrated in FIG. 3.
The SARSA(λ) algorithm is a model-free, multi-step temporal-difference online control algorithm and an improved version of the traditional SARSA algorithm. The core elements of the SARSA algorithm are the current state S, the current action A, the reward R obtained after taking the action, the next state S′ entered after taking the action, and the action A′ taken in the next state. The difference between the two is that the SARSA algorithm updates in single steps, whereas the SARSA(λ) algorithm updates by rounds; to implement the round update, the SARSA(λ) algorithm introduces a utility trace function E(S, A) and a step attenuation coefficient λ. The utility trace function E(S, A) records the path taken in each round and its attenuation.
Referring to fig. 2 and fig. 3, fig. 2 is a diagram of a markov decision process, and fig. 3 is a diagram of reinforcement learning according to the present invention.
In one embodiment, an ε-greedy strategy is adopted in the iterative process of the SARSA(λ) algorithm to select the action in each state, specifically: a random decimal between 0 and 1 is generated and compared with the exploration probability ε; if it is smaller than ε, the system selects an action at random, with every action equally likely to be selected; if it is not smaller than ε, the system selects the best known action in the current state, as shown in the following formula:
Figure BDA0002684848080000121
In the formula, A_i is the best known action in state S.
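The selection rule just described can be sketched in a few lines. This is a minimal illustration rather than the patented implementation; the function name `epsilon_greedy` and the representation of the Q values as a plain list per state are assumptions.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick an action index from the Q values of the current state.

    q_row   : list of Q values for the current state, one per action
    epsilon : exploration probability in [0, 1]
    """
    if rng.random() < epsilon:
        # Explore: every action is equally likely.
        return rng.randrange(len(q_row))
    # Exploit: best known action (the first index wins on ties).
    return max(range(len(q_row)), key=q_row.__getitem__)


print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))
```

Setting ε to 0 makes the rule purely greedy, which corresponds to always selecting the action with the maximum Q value.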
Specifically, the flow of the SARSA (λ) algorithm is summarized as follows:
Input:
Number of iterations T, state set S, action set A, learning rate α, attenuation factor γ, exploration probability ε, and step attenuation coefficient λ.
Start
Initialize the values Q corresponding to all states and actions; in the initial stage, each element of the Q value table is 0.
For i from 1 to T:
a) Initialize the utility trace function E corresponding to all state-action pairs to 0, and initialize S as the first state of the current state sequence. Set A as the action selected by the ε-greedy strategy in the current state S.
b) After action A is performed in state S, a new state S′ and a reward R are obtained.
c) Select the action A′ in state S′ by the ε-greedy strategy.
d) Update the utility trace function E(S, A) and the TD error δ:
E(S, A) = E(S, A) + 1
δ = R + γQ(S′, A′) - Q(S, A)
e) For all states S and corresponding actions A of the current sequence, update the value function Q(S, A) and the utility trace function E(S, A):
Q(S, A) = Q(S, A) + αδE(S, A)
E(S, A) = γλE(S, A)
f) S = S′, A = A′
g) Judge whether the number of iterations has been reached; if so, end the iteration; otherwise, go to step b).
End
Output:
The Q values corresponding to all states and actions in the Q value table.
In the algorithm flow, to ensure that the action value function Q converges, the learning rate α generally needs to be reduced gradually as the iteration progresses. The flow chart of the SARSA(λ) algorithm is shown in FIG. 4.
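The flow above can be sketched as a tabular SARSA(λ) routine with accumulating traces. This is an illustrative sketch only: the five-state chain environment `chain_step` is a hypothetical toy problem and not the scheduling model, the value of a terminal state is taken as 0 (a common convention that the flow above does not state), ties between equal Q values are broken at random, and `max_steps` is an added safety cap.

```python
import random


def sarsa_lambda(n_states, n_actions, step, start, episodes,
                 alpha=0.1, gamma=0.9, lam=0.8, epsilon=0.1,
                 max_steps=1000, seed=0):
    """Tabular SARSA(lambda); step(s, a) returns (next_state, reward, done)."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]

    def choose(s):
        # epsilon-greedy selection with random tie-breaking
        if rng.random() < epsilon:
            return rng.randrange(n_actions)
        best = max(q[s])
        return rng.choice([j for j in range(n_actions) if q[s][j] == best])

    for _episode in range(episodes):
        e = [[0.0] * n_actions for _ in range(n_states)]  # step a)
        s = start
        a = choose(s)
        for _ in range(max_steps):
            s2, r, done = step(s, a)                       # step b)
            a2 = choose(s2)                                # step c)
            e[s][a] += 1.0                                 # step d)
            target = r if done else r + gamma * q[s2][a2]
            delta = target - q[s][a]
            for i in range(n_states):                      # step e)
                for j in range(n_actions):
                    q[i][j] += alpha * delta * e[i][j]
                    e[i][j] *= gamma * lam
            if done:
                break
            s, a = s2, a2                                  # step f)
    return q


def chain_step(s, a):
    """Toy chain of states 0..4: action 1 moves right, action 0 moves left.

    Reaching state 4 gives reward 1 and ends the round.
    """
    s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == 4 else 0.0), s2 == 4


q_table = sarsa_lambda(5, 2, chain_step, start=0, episodes=300)
print([round(v, 3) for v in q_table[3]])
```

Because the trace matrix spreads each TD error over every state-action pair visited in the round, a single reward at the end of the chain updates the whole path at once, which is the round-update behavior the text attributes to SARSA(λ).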
In one embodiment, S4 specifically includes:
S4.1: processing the historical electricity price data and the historical wind power output data and putting them into the solution of the daily random dynamic scheduling model of the wind power-pumped storage combined system for pre-learning, in which experience is accumulated through continuous exploration to update the Q value table and the element values of the utility trace function E;
S4.2: performing online learning based on the updated Q value table and the element values of the utility trace function E obtained in the pre-learning stage, and selecting the action with the maximum Q value in the current state according to a greedy strategy.
Specifically, in the initial stage of the SARSA(λ) algorithm the elements of the Q value table and of the utility trace function E are all 0. If the method were put directly into online learning, a large amount of exploration would be needed in the early iterations, and most of the actions selected in each state would be random rather than optimal. Therefore, before online learning, the historical electricity price data and historical wind power output data are processed and put into the model solution for pre-learning. Experience is accumulated through continuous exploration in the pre-learning stage, the Q value table and the element values of the utility trace function E are updated, and the updated values are then used in online learning, so that the system is largely able to give the optimal action strategy from the start.
After pre-learning is finished, the method is put into online learning. At this point the elements of the Q value table have largely been updated and a good strategy can be given at each stage, so the exploration probability is set to a small value and the system selects the action with the maximum Q value in the current state according to the greedy strategy. Through these steps, the SARSA(λ) algorithm can obtain a high-quality solution with maximum economic benefit for the daily random dynamic scheduling problem of the wind power-pumped storage combined system, effectively reduce the solution time, and avoid the curse of dimensionality.
Compared with the traditional optimization method, the SARSA (lambda) algorithm based on reinforcement learning provided by the invention has the following three improvements in solving the problem of wind storage joint random optimization scheduling:
First: a daily random dynamic scheduling model of the wind power-pumped storage combined system is established, which takes maximization of the daily economic benefit of the combined system as its objective and considers the time-of-use electricity price.
Second: by introducing reinforcement learning theory, the decision-making capability of the system is gradually improved through continuous interaction between the learning system and the environment, the value function Q converges within a short time, and the curse of dimensionality that conventional algorithms are prone to is avoided.
Third: the round-update SARSA(λ) algorithm is applied to the model solution, which shortens the solution time while still guaranteeing a high-quality solution.
In one embodiment, S4.1 specifically includes:
S4.1.1: initializing the Q value table, the utility trace function E, the number of iterations and the learning rate, wherein in the initial stage of the SARSA(λ) algorithm each element of the Q value table is 0 and the utility trace function E for all state-action pairs is 0;
S4.1.2: determining the upper reservoir storage capacity value of the current period as the first state of the state sequence, and solving the wind power output of the current period from the predicted wind power of the current period and the Beta-distributed probability density function of the wind power prediction error;
S4.1.3: determining the state S corresponding to the upper reservoir storage capacity value of the pumped storage power station in the current period, selecting a pumping/generating action by the ε-greedy strategy according to the electricity price of the current period and the electricity price trend across periods, and determining the pumping/generating flow of the upper reservoir according to the action;
S4.1.4: obtaining, from the state transition equation, the new state S′ corresponding to the upper reservoir storage capacity in the next period after the action is taken, together with the reward R obtained by taking the action;
S4.1.5: solving the wind power output of the next period;
S4.1.6: selecting a new pumping/generating action by the ε-greedy strategy according to the electricity price of the next period and the electricity price trend across periods;
S4.1.7: updating the utility trace function E(S, A) and the TD error δ according to the update rule of the SARSA(λ) algorithm;
S4.1.8: updating the value function Q for all upper reservoir storage capacity states S and pumping/generating flows A of the current sequence, and attenuating the utility trace function E updated in S4.1.7;
S4.1.9: judging whether the algorithm has reached the specified number of iterations; if not, setting the period t = t + 1 and returning to S4.1.2 to continue the iteration.
Specifically, the SARSA(λ) algorithm is introduced into the model solution, and the state transition equation and recursion equation of the model are determined. The choice of state variable is crucial: it must be closely related to the decision variable, it must reflect the objective well and allow the value of the decision variable to be derived, and, most importantly, it must capture the stage-by-stage recursion of the whole process so that the "no aftereffect" property is satisfied.
The daily random dynamic scheduling problem of the wind power-pumped storage combined system is a multi-stage decision problem, and solving it with the reinforcement learning SARSA(λ) algorithm requires the state variables and decision variables to be discretized. The state variable is the storage capacity of the upper reservoir of the pumped storage power station: the water level Z_i is discretized into M values from small to large, with corresponding storage capacities V_i. The decision variable is the pumping/generating power of the pumped storage power station: the pumping power of a single unit is a fixed value, so the pumping power is discretized by the number of units, while the generating power is discretized uniformly at a fixed power interval. Once the state variables and decision variables are determined, the state transition equation is obtained as follows:
Figure BDA0002684848080000151
In the formula, V_t and V_{t+1} are respectively the reservoir storage capacities at the beginning and end of period t; Q_c is the pumping flow in period t, in m³/s; Q_fd is the generating flow in period t, in m³/s; and ΔT is the duration of generating/pumping in period t.
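Based on the definitions above, the state transition can be sketched as a water balance on the upper reservoir. The patent's equation is contained in the figure, so the form V_{t+1} = V_t + (Q_c - Q_fd)·ΔT used below is an assumption inferred from those definitions; the helper `nearest_state`, which maps a storage value back onto the discretized state grid, is likewise hypothetical.

```python
def next_storage(v_t, q_pump, q_gen, dt_seconds):
    """Assumed water balance: V_{t+1} = V_t + (Q_c - Q_fd) * dT.

    v_t is in m^3, flows are in m^3/s and dt_seconds is in seconds.
    Pumping raises the upper-reservoir storage, generating lowers it.
    """
    return v_t + (q_pump - q_gen) * dt_seconds


def nearest_state(v, storage_grid):
    """Index of the discrete storage value closest to v."""
    return min(range(len(storage_grid)), key=lambda i: abs(storage_grid[i] - v))


grid = [1.0e6, 1.2e6, 1.4e6]                      # hypothetical storage grid, m^3
v_next = next_storage(1.0e6, 50.0, 0.0, 3600.0)   # one hour of pumping
print(v_next, nearest_state(v_next, grid))
```

Snapping the result to the grid keeps the next state inside the finite Q value table.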
According to the Bellman principle, under a certain state, maximizing the future reward is equivalent to maximizing the sum of the instant reward and the maximum future reward of the next state, and when the wind power output in each time period is an independent random variable, a recurrence equation can be obtained:
Figure BDA0002684848080000161
In the formula:
Figure BDA0002684848080000162
is the expected value of the maximized economic benefit from period t to period 24;
Figure BDA0002684848080000163
is the economic benefit value of period t;
Figure BDA0002684848080000164
is the expected value of the maximized economic benefit from period (t+1) to period 24.
In the same state, different decisions (actions) yield different rewards. In each state, the reward value is:
R_t = C_t(P_t^{w*} + P_t^{gd}) - G_h n_t
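The reward formula translates directly into code. The sketch below is illustrative; the function and parameter names are hypothetical, and the numbers in the example are made up for demonstration only.

```python
def period_reward(price, wind_expected, p_gd, start_stop_cost, units_switched):
    """R_t = C_t * (P_t^{w*} + P_t^{gd}) - G_h * n_t.

    p_gd is negative when pumping and positive when generating, so pumping
    in a period reduces the energy sold in that period.
    """
    return price * (wind_expected + p_gd) - start_stop_cost * units_switched


# Pumping 40 units of power out of 100 units of expected wind, one unit started.
print(period_reward(0.5, 100.0, -40.0, 2.0, 1))
```

Pumping at a low price and generating at a high price is what makes the summed reward over the day larger than selling the wind power as it arrives.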
FIG. 5 shows the flow chart of applying the SARSA(λ) algorithm to the daily random dynamic scheduling of the wind power-pumped storage combined system. In the specific implementation, the predicted wind power of each period is known: on the previous day, the scheduling center obtains the predicted wind power for each period of the current day from historical wind power data. The electricity price of the current period and the electricity price trend across periods are also known, and can be obtained from the historical daily electricity price trend of the region. The method of solving the wind power output of the next period in S4.1.5 is the same as in step S4.1.2.
The specific implementation examples described in this invention are merely illustrative of the system of the present invention. Those skilled in the art to which the invention relates may make various changes, additions or modifications to the described embodiments (i.e., using similar alternatives), without departing from the principles and spirit of the invention or exceeding the scope thereof as defined in the appended claims. The scope of the invention is only limited by the appended claims.

Claims (7)

1. A wind power-pumped storage combined system daily random dynamic scheduling method based on an SARSA (lambda) algorithm is characterized by comprising the following steps:
S1: describing the randomness of wind power output;
S2: according to the randomness of wind power output and the time-of-use electricity price, constructing the objective function of the daily random dynamic scheduling model of the wind power-pumped storage combined system:
Figure FDA0003574228360000011
in the formula: t is the number of time segments in a period; rtAn index function for a period t; vtThe storage capacity of an upper reservoir of the pumped storage power station at the beginning of the t time period; pt gdFor the pumped storage power station to emit in t time periodThe power of less than 0 is water pumping, and the power of more than 0 is power generation; rt、Pt gdThe expression of (a) is as follows:
Figure FDA0003574228360000012
in the formula: C_t is the peak-valley electricity price of period t; after the wind power output prediction error distribution function curve of period t is discretized into N values, the corresponding power is
Figure FDA0003574228360000013
and the corresponding probability is p_{t,i}; G_h is the cost of starting or stopping a single unit of the pumped storage power station, and n_t is the number of units started or stopped in period t; when the pumped storage power station unit is in the generating state in period t,
Figure FDA0003574228360000014
is 1, otherwise it is 0; P_t^g is the generating output of the unit in period t; when the pumped storage power station unit is in the motor state in period t,
Figure FDA0003574228360000015
is 1, otherwise it is 0; P_t^d is the pumping power of the unit in period t;
S3: determining the constraint conditions of the daily random dynamic scheduling model of the wind power-pumped storage combined system;
S4: solving the daily random dynamic scheduling model of the wind power-pumped storage combined system by adopting the SARSA(λ) algorithm in reinforcement learning to obtain a scheduling result;
wherein, S4 specifically includes:
S4.1: processing the historical electricity price data and the historical wind power output data and putting them into the solution of the daily random dynamic scheduling model of the wind power-pumped storage combined system for pre-learning, in which experience is accumulated through continuous exploration to update the Q value table and the element values of the utility trace function E;
S4.2: performing online learning based on the updated Q value table and the element values of the utility trace function E obtained in the pre-learning stage, and selecting the action with the maximum Q value in the current state according to a greedy strategy.
2. The dynamic scheduling method of claim 1, wherein the S1 specifically comprises:
the probability density function of the wind power prediction error is expressed by adopting Beta distribution, and the expression is as follows:
Figure FDA0003574228360000021
in the formula: x is the wind power output prediction error; a and b are the shape parameters of the Beta distribution, and changing the values of a and b yields Beta distributions of different shapes, which can accommodate the positive or negative skew that may occur in the wind power output prediction error; wherein B(a, b) is given by:
Figure FDA0003574228360000022
acquiring and sorting historical data of the wind power plant to obtain wind power plant prediction error frequency distribution, and calculating shape parameters a and b of Beta distribution according to the mean value and variance of prediction errors, wherein the calculation equation is as follows:
Figure FDA0003574228360000023
in the formula: μ is the mean of the prediction error; σ is the standard deviation of the prediction error.
3. The dynamic scheduling method of claim 1 wherein the constraints in S3 include:
(1) reservoir capacity constraint:
V_min ≤ V_t ≤ V_max
in the formula: V_min and V_max are respectively the minimum and maximum available storage capacity of the upper reservoir of the pumped storage power station, and V_t is the actual available storage capacity of the upper reservoir in period t;
(2) constraint on the change in reservoir storage capacity between the first and last periods of each day:
V_24 - V_0 = 0
wherein the pumped storage reservoir is of the daily regulation type, so the reservoir storage capacities in the first and last periods of each day are equal; V_24 and V_0 are the reservoir storage capacities at hour 24 and hour 0, respectively;
(3) power generation and pumping power constraint:
Figure FDA0003574228360000024
in the formula:
Figure FDA0003574228360000031
are respectively the upper and lower limits of the generating output of the pumped storage power station unit, and P_t^g is the actual generating output of the unit in period t;
the pumping power constraint is as follows:
P_t^d = P_d k_t
in the formula: P_t^d is the actual pumping power of the pumped storage power station units in period t, P_d is the rated pumping power of a single unit of the pumped storage power station, and k_t is the total number of units pumping in period t;
(4) pumping-generating mutual exclusion constraint:
Figure FDA0003574228360000032
this constraint indicates that the pumped storage power station unit cannot be in the generating state and the pumping state at the same time within one period;
Figure FDA0003574228360000033
is the flag indicating whether the pumped storage power station unit is in the generating state in period t, and
Figure FDA0003574228360000034
is the flag indicating whether the pumped storage power station unit is in the motor state in period t;
(5) constraint on the total number of units:
Figure FDA0003574228360000035
in the formula:
Figure FDA0003574228360000036
is the total number of units in the working state in period t, and N′ is the total number of available units of the pumped storage power station.
4. The dynamic scheduling method of claim 1, wherein the SARSA(λ) algorithm in S4 introduces, on the basis of the model-free SARSA algorithm, an E matrix recording the path and attenuation of each round and a step attenuation coefficient λ, and changes the single-step update of the SARSA algorithm into the round update of the SARSA(λ) algorithm, where the E matrix is the utility trace function.
5. The dynamic scheduling method of claim 4, wherein the SARSA(λ) algorithm increments the value at the (S, A) position of the utility trace function E(S, A) by 1 after each step in the round, that is, after an action is selected in the current state, and updates the action value function Q(S, A) and the utility trace function E(S, A) after each action, where Q(S, A) is updated as follows:
Figure FDA0003574228360000037
wherein α is the learning rate, which controls the convergence of learning; γ is the attenuation (discount) factor, which reduces the influence of future returns on the current strategy; the TD error δ represents the error between the ideal value and the actual value of Q (S, A); S represents the current state, A the current action, R the reward obtained after action A is taken, S' the next state entered after the action is taken, and A' the action taken in the next state;
E (S, A) is updated as follows:
E(S, A) = γλE(S, A).
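The update described in claims 4 and 5 can be sketched as a single function on tabular Q and E. This is a minimal sketch, not the patented implementation: the function name `sarsa_lambda_update` and the list-of-lists tables are hypothetical, and the Q update is assumed to take the textbook SARSA(λ) form δ = R + γQ(S', A') − Q(S, A), Q ← Q + αδE, since the claim gives the formula only as a figure.

```python
def sarsa_lambda_update(Q, E, s, a, r, s_next, a_next, alpha, gamma, lam):
    """Apply one SARSA(lambda) step in place and return the TD error.

    Q and E are tables indexed as Q[state][action]; both are modified.
    Assumes the standard form of the Q update (see lead-in).
    """
    # TD error: gap between the bootstrapped target and the current estimate
    delta = r + gamma * Q[s_next][a_next] - Q[s][a]
    # Increment the trace at the (S, A) position that was just visited
    E[s][a] += 1.0
    for i in range(len(Q)):
        for j in range(len(Q[i])):
            # Every traced pair receives a share of the TD error
            Q[i][j] += alpha * delta * E[i][j]
            # Trace decay: E(S, A) = gamma * lambda * E(S, A)
            E[i][j] *= gamma * lam
    return delta
```

Because every entry of E is scaled by γλ after each step, state-action pairs visited earlier in the round still receive credit, but with geometrically shrinking weight.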
6. The dynamic scheduling method of claim 5, wherein a greedy (ε-greedy) strategy is adopted during the iteration of the SARSA (λ) algorithm to select the action in each state, specifically: a decimal between 0 and 1 is randomly generated and compared with the exploration probability ε; if the decimal is smaller than ε, the system selects an action at random, each action being selected with the same probability; if the decimal is not smaller than ε, the system selects the known optimal action in the current state, as shown in the following formula:
Figure FDA0003574228360000041
in the formula, A_i is the known optimal strategy in state S.
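The selection rule of claim 6 can be written in a few lines. The function name `epsilon_greedy` and the representation of one state's Q values as a plain list are assumptions made for illustration; ties between equally good actions are broken in favour of the lowest index, which the claim does not specify.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick an action index for one state.

    q_row    list of Q values, one per action, for the current state
    epsilon  exploration probability
    rng      source of randomness (the random module or a random.Random)
    """
    # Draw a decimal in [0, 1) and compare it with epsilon
    if rng.random() < epsilon:
        # Explore: every action has the same probability
        return rng.randrange(len(q_row))
    # Exploit: the known best action (first index on ties)
    return max(range(len(q_row)), key=lambda a: q_row[a])
```

With `epsilon=0.0` the rule is purely greedy, and with `epsilon=1.0` it is purely random.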
7. The dynamic scheduling method of claim 1, wherein S4.1 specifically comprises:
S4.1.1: initializing the Q value table, the utility trace function E, the number of iterations and the learning rate, wherein at the initial stage of the SARSA (λ) algorithm every element of the Q value table is 0 and the utility trace function E for every state-action pair is 0;
S4.1.2: determining the upper reservoir storage capacity value corresponding to the current time period as the first state of the state sequence, and solving the wind power output of the current time period from the wind power predicted value for the current time period and the wind power prediction error probability density function, which obeys a Beta distribution;
S4.1.3: determining the state corresponding to the upper reservoir storage capacity value of the pumped storage power station in the current time period, selecting a pumping/generating action by the greedy strategy according to the electricity price of the current time period and the electricity price trend over the time periods, and determining the pumping/generating flow of the upper reservoir of the pumped storage power station according to the action;
S4.1.4: obtaining, from the state transition equation, the new state corresponding to the upper reservoir storage capacity of the pumped storage power station in the next time period after the action is taken, together with the reward obtained by taking the action;
S4.1.5: solving the wind power output of the next time period;
S4.1.6: selecting a new pumping/generating action by the greedy strategy according to the electricity price of the next time period and the electricity price trend over the time periods;
S4.1.7: updating the utility trace function E (S, A) and the TD error δ according to the update rule of the SARSA (λ) algorithm;
S4.1.8: updating the value function Q for the upper reservoir storage capacity states and pumping/generating flows corresponding to all current time periods, and attenuating the trace function E updated in S4.1.7;
S4.1.9: judging whether the algorithm has reached the specified number of iterations; if not, setting the time period t = t + 1 and returning to S4.1.2 to continue the iteration.
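Steps S4.1.1 to S4.1.9 can be sketched as a toy training loop. Everything concrete here is an assumption made for illustration and is not taken from the patent: five discrete reservoir levels, three actions (pump, idle, generate) that move the level by one, a reward equal to price times (wind output plus or minus one unit of energy), a Beta(2, 2) relative forecast error of at most 20%, and the names `wind_output`, `choose`, `step` and `train`. The end-of-day storage constraint and pumping losses are not modelled.

```python
import random

N_LEVELS, T = 5, 24
ACTIONS = (-1, 0, 1)  # -1 pump, 0 idle, +1 generate (hypothetical action set)


def wind_output(forecast, rng, a=2.0, b=2.0, spread=0.2):
    """Forecast plus a Beta-distributed relative error in [-spread, +spread]."""
    err = (rng.betavariate(a, b) - 0.5) * 2.0 * spread
    return max(0.0, forecast * (1.0 + err))


def choose(q_t, s, eps, rng):
    """Epsilon-greedy choice among actions that keep the level in range."""
    legal = [i for i, d in enumerate(ACTIONS) if 0 <= s - d < N_LEVELS]
    if rng.random() < eps:
        return rng.choice(legal)
    return max(legal, key=lambda i: q_t[s][i])


def step(s, a, price, wind):
    """Toy state transition: generating lowers the upper reservoir level."""
    d = ACTIONS[a]
    return s - d, price * (wind + d)


def train(prices, forecasts, episodes=200, alpha=0.1, gamma=0.9,
          lam=0.8, eps=0.1, seed=0):
    rng = random.Random(seed)
    # S4.1.1: Q is 0 for every (period, level, action); row T is terminal
    Q = [[[0.0] * len(ACTIONS) for _ in range(N_LEVELS)] for _ in range(T + 1)]
    for _ in range(episodes):
        E = {}                                         # traces, 0 unless visited
        s = N_LEVELS // 2                              # S4.1.2: first state
        wind = wind_output(forecasts[0], rng)
        a = choose(Q[0], s, eps, rng)                  # S4.1.3
        for t in range(T):
            s2, r = step(s, a, prices[t], wind)        # S4.1.4
            if t + 1 < T:
                wind = wind_output(forecasts[t + 1], rng)   # S4.1.5
            a2 = choose(Q[t + 1], s2, eps, rng)        # S4.1.6
            delta = r + gamma * Q[t + 1][s2][a2] - Q[t][s][a]  # S4.1.7
            E[(t, s, a)] = E.get((t, s, a), 0.0) + 1.0
            for (ti, si, ai), e in list(E.items()):    # S4.1.8
                Q[ti][si][ai] += alpha * delta * e
                E[(ti, si, ai)] = e * gamma * lam
            s, a = s2, a2                              # S4.1.9: t = t + 1
    return Q
```

Indexing Q by time period as well as reservoir level is a design choice of this sketch: it lets the table reflect that the same storage level is worth more or less depending on the electricity prices still to come that day.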
CN202010973224.6A 2020-09-16 2020-09-16 Wind power-pumped storage combined system daily random dynamic scheduling method based on SARSA (lambda) algorithm Active CN112054561B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010973224.6A CN112054561B (en) 2020-09-16 2020-09-16 Wind power-pumped storage combined system daily random dynamic scheduling method based on SARSA (lambda) algorithm

Publications (2)

Publication Number Publication Date
CN112054561A CN112054561A (en) 2020-12-08
CN112054561B true CN112054561B (en) 2022-06-14

Family

ID=73604365

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010973224.6A Active CN112054561B (en) 2020-09-16 2020-09-16 Wind power-pumped storage combined system daily random dynamic scheduling method based on SARSA (lambda) algorithm

Country Status (1)

Country Link
CN (1) CN112054561B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112581312B (en) * 2020-12-21 2024-03-08 国网陕西省电力公司电力科学研究院 Wind power prediction error distribution analysis method, wind power prediction error distribution analysis device, computer equipment and readable storage medium
CN113437757B (en) * 2021-06-24 2022-08-05 三峡大学 Electric quantity decomposition method of wind-storage combined system based on prospect theory
CN113406939A (en) * 2021-07-12 2021-09-17 哈尔滨理工大学 Unrelated parallel machine dynamic hybrid flow shop scheduling method based on deep Q network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106899026A (en) * 2017-03-24 2017-06-27 三峡大学 Intelligent power generation control method based on the multiple agent intensified learning with time warp thought
CN110365057A (en) * 2019-08-14 2019-10-22 南方电网科学研究院有限责任公司 Distributed energy based on intensified learning participates in power distribution network peak regulation method for optimizing scheduling
CN110880048A (en) * 2019-11-06 2020-03-13 国网湖北省电力有限公司宜昌供电公司 Cascade reservoir ecological random optimization scheduling model and solving method
CN111064229A (en) * 2019-12-18 2020-04-24 广东工业大学 Wind-light-gas-storage combined dynamic economic dispatching optimization method based on Q learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
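Reinforcement-Learning-Based Energy Storage System Operation Strategies to Manage Wind Power Forecast Uncertainty; EUNSUNG OH et al.; 《IEEE ACCESS》; 20200123; pp. 20965-20976 *
Research on long-term stochastic optimal reservoir scheduling based on the SARSA algorithm (基于SARSA算法的水库长期随机优化调度研究); Li Wenwu et al.; 《水电能源科学》 (Water Resources and Power); 20180930; Vol. 36, No. 9; pp. 72-75 *
Research on daily joint operation optimization of wind power and pumped storage based on dynamic discrete electricity price agreements (基于动态离散电价协约的风电-抽蓄联合日运行优化研究); You Wenxia et al.; 《水电能源科学》 (Water Resources and Power); 20151231; Vol. 33, No. 12; pp. 209-214 *
Cooperative decision-making for wind power and energy storage based on reinforcement learning (基于强化学习方法的风储合作决策); Liu Guojing et al.; 《电网技术》 (Power System Technology); 20160930; Vol. 40, No. 9; pp. 2729-2736 *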

Also Published As

Publication number Publication date
CN112054561A (en) 2020-12-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant