CN115049292A - Intelligent single reservoir flood control scheduling method based on DQN deep reinforcement learning algorithm - Google Patents
- Publication number
- CN115049292A CN115049292A CN202210741864.3A CN202210741864A CN115049292A CN 115049292 A CN115049292 A CN 115049292A CN 202210741864 A CN202210741864 A CN 202210741864A CN 115049292 A CN115049292 A CN 115049292A
- Authority
- CN
- China
- Prior art keywords
- reservoir
- scheduling
- decision
- learning
- period
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06312—Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A10/00—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
- Y02A10/40—Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Economics (AREA)
- Strategic Management (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Entrepreneurship & Innovation (AREA)
- General Physics & Mathematics (AREA)
- Marketing (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Development Economics (AREA)
- Game Theory and Decision Science (AREA)
- Educational Administration (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Primary Health Care (AREA)
- Life Sciences & Earth Sciences (AREA)
- Water Supply & Treatment (AREA)
- Biomedical Technology (AREA)
- Public Health (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Feedback Control In General (AREA)
Abstract
The invention relates to an intelligent single-reservoir flood control scheduling method based on the DQN deep reinforcement learning algorithm, comprising the following steps: constructing an artificial-intelligence-based "unsupervised deep learning" model for reservoir scheduling; establishing DRL reward feedback based on reservoir power generation scheduling; and establishing an artificial intelligence scheduling expert for a given reservoir based on its measured inflow runoff process. Compared with the optimal power generation scheduling process solved by dynamic programming, the power generation scheduling result of the intelligent single-reservoir flood control scheduling method based on the DQN deep reinforcement learning algorithm is clearly superior to that of traditional decision-tree-based reservoir power generation scheduling, and the unsupervised deep learning model for reservoir scheduling shows strong learning and decision-making capability and strong adaptability in reservoir scheduling decisions.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a single reservoir intelligent flood control scheduling method based on a DQN deep reinforcement learning algorithm.
Background
In 2016, the success of the Go program AlphaGo revealed the potential of artificial intelligence. The emergence of AlphaGo was a milestone that triggered a surge in artificial intelligence development. Driven by this wave, core AI technologies have accelerated and spread to other industries. During play, AlphaGo must weigh the after-effects and the winning probability of every move. The core algorithm of AlphaGo is Deep Reinforcement Learning (DRL), which is suited to mapping states to decisions, and in particular to decision processes with the Markov property.
Traditional reinforcement learning theory has been continuously refined over recent decades, but it still struggles with complex real-world problems, especially those involving many states and many decisions. Deep reinforcement learning (DRL) is the product of combining deep learning with reinforcement learning: it integrates the strong comprehension ability of deep learning on problems such as vision and cognition with the decision-making ability of reinforcement learning, forming a new mode of end-to-end learning from perception to action. This mode gives machine learning "autonomous learning" in a real sense. DRL has made artificial intelligence genuinely practical, giving it strong learning and adaptive ability in complex environments with high-dimensional states and decisions.
In the water conservancy industry, reservoir scheduling is a typical Markov decision process: scheduling decisions depend on state conditions such as reservoir storage and incoming water, so reservoir scheduling and the DRL algorithm fit together closely. If DRL technology is extended to the water conservancy industry, reservoir scheduling will be one of its main fields of application.
Therefore, how to introduce the DRL technology into reservoir scheduling, adapt to reservoir scheduling decisions and determine the optimal control process of reservoir power generation is a problem to be solved.
Disclosure of Invention
In order to overcome the problems in the prior art, the invention provides a single reservoir intelligent flood control scheduling method based on a DQN deep reinforcement learning algorithm. The method is based on a DQN network and a DRL model, adopts reservoir power generation as reward feedback, establishes a reservoir operation control model based on a deep reinforcement learning algorithm, and establishes a reservoir dispatching artificial intelligence expert, thereby determining the optimal control process of the reservoir power generation.
The purpose of the invention is realized as follows:
a single reservoir intelligent flood control scheduling method based on a DQN deep reinforcement learning algorithm comprises the following steps:
step 1: constructing an artificial intelligence-based reservoir scheduling unsupervised deep learning model:
respectively establishing a brain, a memory library and an 'autonomous learning' algorithm module of the Agent by taking a DRL technical architecture as a reference;
the brain of the Agent is constructed with the Deep Q-Network (DQN) algorithm and contains two neural networks: an Action Network (AN) and a Target Network (TN);
the memory bank stores the scheduling knowledge generated during scheduling; the scheduling decision of each time period forms one piece of knowledge;
the "autonomous learning" module continuously increases the value function based on the Bellman equation, so that the decision-making capability of the Agent keeps improving; as the number of learning passes grows, the value function of the current pass is represented by the average of the value functions computed over the k learning passes so far:

U_k = (1/k) · Σ_{i=1}^{k} u_i = ((k − 1) · U_{k−1} + u_k) / k

where u_k is the decision value function under the given scheduling state at the k-th learning pass; u_i is the decision value function under the given scheduling state at the i-th pass; U_k is the average value function obtained after the k-th pass; and U_{k−1} is the average value function obtained after the (k−1)-th pass.
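As a minimal sketch (not from the patent text), this running average can be computed incrementally rather than by re-summing all passes; the function name and sample values below are illustrative:

```python
def incremental_average(U_prev: float, u_k: float, k: int) -> float:
    """Running mean over k learning passes:
    U_k = ((k - 1) * U_{k-1} + u_k) / k, written in incremental form."""
    return U_prev + (u_k - U_prev) / k

# The incremental form reproduces the batch mean of all passes seen so far.
passes = [4.0, 6.0, 5.0, 9.0]   # u_1 .. u_4 (illustrative values)
U = 0.0
for k, u in enumerate(passes, start=1):
    U = incremental_average(U, u, k)
```

The incremental form avoids storing every past value function, which matters once the number of learning passes grows large.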
After each autonomous learning pass, the value function is updated as follows:

U_k(S_t, A_t) = (1 − α) · U_{k−1}(S_t, A_t) + α · u(S_t, A_t)

where S_t is the condition attribute at the beginning of period t; A_t is the decision attribute at the beginning of period t; α is the learning rate; and u is the decision value function under the given scheduling state;
in the decision value estimation of reservoir power generation scheduling, the U value of each decision in the decision set is computed from the state S_{t+1}, and the decision benefit is evaluated by averaging:

u(S_t, A_t) = R_t + λ · (1/n) · Σ_{A_{t+1}} U(S_{t+1}, A_{t+1})

where R_t is the decision benefit value obtained in period t; S_{t+1} is the condition attribute at the end of period t; A_{t+1} is a decision attribute at the end of period t; n is the number of decisions in the decision set; and λ is the discount factor;
after each learning pass, the error feedback used to update the neural network weight parameters by gradient descent is computed from the change of the value function:

E_k = U_k − U_{k−1}

where E_k is the difference between the value functions of the (k−1)-th and k-th learning passes;
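Taken together, these relations amount to a small update rule. The following sketch (names and numbers are illustrative, not from the patent) strings the averaged decision-value estimate, the learning-rate update, and the error feedback together:

```python
from statistics import mean

def decision_value(R_t: float, next_values: list, lam: float) -> float:
    """u(S_t, A_t): period benefit plus the discounted MEAN value of the
    decisions available at the end-of-period state S_{t+1} (averaging
    evaluation, as described above, rather than a max)."""
    return R_t + lam * mean(next_values)

def update_value(U_prev: float, u: float, alpha: float) -> float:
    """U_k = (1 - alpha) * U_{k-1} + alpha * u."""
    return (1 - alpha) * U_prev + alpha * u

u_val = decision_value(R_t=10.0, next_values=[2.0, 4.0, 6.0], lam=0.9)
U_new = update_value(U_prev=12.0, u=u_val, alpha=0.5)
E_k = U_new - 12.0   # error fed back to the network weights
```

With α = 0.5 the new estimate splits the difference between the remembered value and the fresh one; larger α weights the current decision value more heavily.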
step 2: and (3) establishing reward feedback of the DRL on the basis of reservoir power generation scheduling:
evaluating the benefit of the decision according to the state of the current time period and the obtained decision, and feeding back the benefit in a reward mode; wherein, the generated energy and whether the guaranteed output is reached are taken as indexes of benefit evaluation;
Step 3: establish an artificial intelligence scheduling expert for a given reservoir based on its measured inflow runoff process:
the measured reservoir inflow runoff information and the corresponding scheduling period are taken as the input state; autonomous learning is carried out through the "autonomous learning" algorithm module, and the brain of the Agent decides the reservoir operation for the coming period, i.e., the power generation output; on this basis, the reward of this operation, i.e., the power generation benefit, is estimated and returned through a reservoir power generation scheduling simulation; the state, operation and benefit of the reservoir are then stored in the memory bank as a piece of knowledge; when the memory bank holds enough knowledge and the learning condition is met, the Agent begins to learn from the knowledge in memory, then continues actual scheduling operations to obtain new knowledge and update the memory bank; by cycling this learning–scheduling process, the Agent gradually matures into an artificial intelligence expert for reservoir scheduling;
and using the established reservoir dispatching artificial intelligence expert for the reservoir power generation dispatching decision to determine the optimal control process of the reservoir power generation.
Further, in the memory bank, the scheduling decision of period t is formed into a piece of knowledge by the condition attribute at the beginning of period t (S_t), the decision attribute (Action), the reward (Reward), and the condition attribute at the end of period t (S_{t+1}), and is stored as follows:

<S_t = (T_t, L_t, Q_t), Reward = R_t, Action = A_t, S_{t+1} = (T_{t+1}, L_{t+1}, Q_{t+1})>

where S_t and S_{t+1} are the condition attributes at the beginning and end of period t, respectively; R_t is the penalized power generation benefit of period t; A_t is the decision attribute of period t; T_t and T_{t+1} are the numbers of the scheduling periods within the year; L_t and L_{t+1} are the reservoir control water levels at the beginning of periods t and t+1, respectively; and Q_t and Q_{t+1} are the reservoir inflow volumes of periods t and t+1, respectively.
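A piece of knowledge of this shape maps naturally onto a replay-memory record. The sketch below assumes a fixed capacity and uses illustrative numbers; neither is specified in the patent:

```python
import random
from collections import deque, namedtuple

# One piece of scheduling knowledge: <S_t, Reward, Action, S_{t+1}>,
# with each state S = (T, L, Q): period number, control water level, inflow.
Knowledge = namedtuple("Knowledge", ["S_t", "reward", "action", "S_next"])

class MemoryBank:
    """Fixed-capacity memory bank; the oldest knowledge is evicted when full."""
    def __init__(self, capacity: int = 10_000):
        self.buffer = deque(maxlen=capacity)

    def store(self, piece: Knowledge) -> None:
        self.buffer.append(piece)

    def sample(self, batch_size: int):
        """Random batch for 'recollection' learning."""
        return random.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)

bank = MemoryBank()
bank.store(Knowledge(S_t=(14, 295.0, 1200.0), reward=3.2,
                     action=220.0, S_next=(15, 296.1, 980.0)))
```

Random sampling from the bank breaks the temporal correlation between consecutive scheduling periods, which is the usual reason DQN-style methods learn from a memory rather than from the latest transition alone.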
Further, the reward feedback in step 2 is given by:

R(K_t, Q_t, N_t) = [b(K_t, Q_t, N_t) − a · {max(e − b(K_t, Q_t, N_t), 0)}^b] · Δt

V_{t+1} = V_t + Q_t − Q_{p,t} − Q_{s,t}

where R is the penalized power generation benefit of period t, i.e. the Reward; K_t is the water storage level at the beginning of period t; Q_t is the total inflow volume of the reservoir in period t; N_t is the power generation output of the reservoir in period t; Q_{p,t} is the total power generation flow in period t; Q_{s,t} is the spilled water volume in period t; b(·) is the hydropower station's power generation in period t; a and b are penalty coefficients; e is the guaranteed output of the system; V_t is the storage volume at the beginning of period t; Δt is the length of the scheduling period; and V_{t+1} is the storage volume at the end of period t;

the constraints are:

K_min ≤ K_t ≤ K_max
0 ≤ N_t ≤ N_M
0 ≤ Q_t ≤ Q_M

where K_min and K_max are the minimum and maximum allowed reservoir water levels in period t; N_t is the decision output of period t; N_M is the installed capacity of the reservoir, i.e. its maximum power generation output; and Q_M is the maximum discharge capacity of the turbines.
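A direct transcription of the penalized reward, the water balance, and the constraints might look like the following sketch (the penalty exponent is named `p` here to avoid clashing with the energy function `b(·)`; all numbers are illustrative):

```python
def reward(b_t: float, e: float, a: float = 1.0, p: float = 2.0,
           dt: float = 1.0) -> float:
    """Penalized generation benefit for one period:
    R = [b - a * max(e - b, 0) ** p] * dt,
    where b_t is the period's energy output and e the guaranteed output."""
    return (b_t - a * max(e - b_t, 0.0) ** p) * dt

def water_balance(V_t: float, Q_in: float, Q_gen: float, Q_spill: float) -> float:
    """V_{t+1} = V_t + Q_t - Q_{p,t} - Q_{s,t} (consistent volume units)."""
    return V_t + Q_in - Q_gen - Q_spill

def feasible(K_t, N_t, Q_t, K_min, K_max, N_M, Q_M) -> bool:
    """Water-level, output, and turbine discharge-capacity constraints."""
    return (K_min <= K_t <= K_max) and (0 <= N_t <= N_M) and (0 <= Q_t <= Q_M)
```

When output falls below the guaranteed value, the quadratic penalty dominates the reward; this is what steers the agent away from under-producing relative to the firm output.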
The DRL algorithm is oriented to decision and control problems, and decision and control directly determine the degree of intelligence of an artificial intelligence system. Conventional reinforcement learning makes decisions with a state–decision table, which limits its capability. The DRL algorithm takes the Bellman equation as its core and uses a deep neural network to model the relation between state and decision, effectively improving learning, decision-making and control in high-dimensional state and decision environments. Reservoir scheduling is a typical Markov decision process whose decisions depend on state conditions such as reservoir storage and incoming water, so reservoir scheduling and the DRL algorithm fit together closely. Extending DRL technology to the water conservancy industry, reservoir scheduling will be one of its main fields of application.
Compared with the prior art, the invention has the advantages and beneficial effects that:
the invention applies the deep reinforcement learning technology in the artificial intelligence algorithm to the reservoir power generation scheduling decision, explores the coupling mode of the deep reinforcement learning and the reservoir power generation scheduling decision, and has application potential. Firstly, establishing a DRL learning model based on a deep Q value learning algorithm (DQN) according to a theoretical framework of an RL algorithm; then coupling the DRL algorithm with a reservoir power generation dispatching model by taking the Reward estimation as a connection point; and finally, on the basis of a random simulation runoff process, constructing an unsupervised deep learning model for reservoir scheduling through unsupervised autonomous learning. Compared with the optimal power generation scheduling process of dynamic planning and solving, the power generation scheduling result of the single reservoir intelligent flood control scheduling method based on the DQN deep reinforcement learning algorithm is obviously superior to the traditional reservoir power generation scheduling result based on the decision tree, and the reservoir scheduling unsupervised deep learning model has strong learning capacity and decision making capacity and strong adaptability in reservoir scheduling decision making.
Drawings
The invention is further illustrated by the following figures and examples.
FIG. 1 is a schematic network structure diagram of an artificial intelligence-based reservoir scheduling unsupervised deep learning model of the invention;
FIG. 2 is a Q-Learning algorithm based cost function update process according to the present invention;
FIG. 3 is a comparison diagram of the water level control processes of the Huanren reservoir under different networks according to the present invention;
FIG. 4 is a graph illustrating the effect of learning efficiency parameters on DRL "autonomous learning" efficiency;
FIG. 5 is a diagram of the deviations of the DRL-based and decision-tree-rule-based power generation scheduling processes from the optimal process.
Detailed Description
Embodiment:
In this embodiment, the power generation scheduling of the Huanren reservoir is taken as an example: DRL technology is introduced into reservoir scheduling, and the intelligent single-reservoir flood control scheduling method based on the DQN deep reinforcement learning algorithm is applied using the artificial-intelligence-based "unsupervised deep learning" model for reservoir scheduling constructed above. The Huanren reservoir is located in the middle and lower reaches of the Hun River, at approximately east longitude 124°36′50″ and north latitude 40°42′15″, with a dam-site control basin area of 10,364 km². The mean annual rainfall of the basin is 860 mm; about 70% of the rainfall is concentrated from June to September, and large floods generally occur from late July to mid-August. The Huanren reservoir lies at the upstream end of the reservoir group and is the leading reservoir of the Hun River cascade; it has annual regulation capability, serves power generation as its main purpose, and provides comprehensive benefits including flood control, irrigation, aquaculture and tourism.
The basic parameters of the Huanren reservoir are shown in Table 1.
TABLE 1. Basic parameters of the Huanren reservoir
In this embodiment, a 400-year runoff process is generated by stochastic simulation on the basis of the measured ten-day average inflow of the Huanren reservoir up to 2010, and the DRL deep learning model is trained on the simulated runoff process, thereby establishing an artificial intelligence expert for ten-day scheduling of the Huanren reservoir.
The intelligent flood control scheduling method comprises the following steps:
step 1: constructing an artificial intelligence-based reservoir scheduling unsupervised deep learning model:
respectively establishing a brain, a memory library and an 'autonomous learning' algorithm module of the Agent by taking a DRL technical architecture as a reference;
the brain of the Agent is constructed with the Deep Q-Network (DQN) algorithm; the Agent contains two neural networks, an Action Network (AN) and a Target Network (TN); the DRL learns by "recollection", the aim of learning being to train the AN and TN, and the Sarsa algorithm is adopted to update the Q value in the DQN.
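The AN/TN pairing and the Sarsa-style Q-value target can be sketched as follows; the linear model is a toy stand-in for the patent's neural networks, and all dimensions are illustrative:

```python
class LinearQ:
    """Toy linear Q-model per action, standing in for the AN/TN networks."""
    def __init__(self, n_features: int, n_actions: int):
        self.w = [[0.0] * n_features for _ in range(n_actions)]

    def q(self, state, action: int) -> float:
        return sum(wi * si for wi, si in zip(self.w[action], state))

    def copy_from(self, other: "LinearQ") -> None:
        """Periodic synchronisation: TN <- AN, as in DQN."""
        self.w = [row[:] for row in other.w]

action_net = LinearQ(n_features=3, n_actions=4)  # AN: chooses decisions, trained every pass
target_net = LinearQ(n_features=3, n_actions=4)  # TN: frozen copy used for targets

def sarsa_target(r: float, s_next, a_next: int, lam: float) -> float:
    """On-policy (Sarsa) target: r + lam * Q_TN(S_{t+1}, A_{t+1})."""
    return r + lam * target_net.q(s_next, a_next)
```

Keeping the target network frozen between synchronisations stabilises the moving target that the action network is trained against; the Sarsa target uses the decision actually taken at S_{t+1} rather than the maximising one.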
In the memory bank, the scheduling knowledge generated during scheduling is stored, and the scheduling decision of each period forms one piece of knowledge. Specifically, the scheduling decision of period t is formed into a piece of knowledge by the condition attribute at the beginning of period t (S_t), the decision attribute (Action), the reward (Reward), and the condition attribute at the end of period t (S_{t+1}), and is stored in the memory bank as follows:

<S_t = (T_t, L_t, Q_t), Reward = R_t, Action = A_t, S_{t+1} = (T_{t+1}, L_{t+1}, Q_{t+1})>   (1)

where S_t and S_{t+1} are the condition attributes at the beginning and end of period t, respectively; R_t is the penalized power generation benefit of period t; A_t is the decision attribute of period t; T_t and T_{t+1} are the numbers of the scheduling periods within the year; L_t and L_{t+1} are the reservoir control water levels at the beginning of periods t and t+1, respectively; and Q_t and Q_{t+1} are the reservoir inflow volumes of periods t and t+1, respectively.
The "autonomous learning" module continuously increases the value function based on the Bellman equation, so that the decision-making capability of the Agent keeps improving; as the number of learning passes grows, the value function of the current pass is represented by the average of the value functions computed over the k learning passes so far:

U_k = (1/k) · Σ_{i=1}^{k} u_i = ((k − 1) · U_{k−1} + u_k) / k   (2)

where u_k is the decision value function under the given scheduling state at the k-th learning pass; u_i is the decision value function under the given scheduling state at the i-th pass; U_k is the average value function obtained after the k-th pass; and U_{k−1} is the average value function obtained after the (k−1)-th pass.
After each autonomous learning pass, the value function is updated as follows:

U_k(S_t, A_t) = (1 − α) · U_{k−1}(S_t, A_t) + α · u(S_t, A_t)   (3)

where S_t is the condition attribute at the beginning of period t; A_t is the decision attribute at the beginning of period t; α is the learning rate (the larger its value, the more weight is placed on the current decision value); and u is the decision value function under the given scheduling state;
in the decision value estimation of reservoir power generation scheduling, the U value of each decision in the decision set is computed from the state S_{t+1}, and the decision benefit is evaluated by averaging:

u(S_t, A_t) = R_t + λ · (1/n) · Σ_{A_{t+1}} U(S_{t+1}, A_{t+1})   (4)

where R_t is the decision benefit value obtained in period t; S_{t+1} is the condition attribute at the end of period t; A_{t+1} is a decision attribute at the end of period t; n is the number of decisions in the decision set; and λ is the discount factor (the larger its value, the greater the influence of the remaining periods on the decision value function);
after each learning pass, the error feedback used to update the neural network weight parameters by gradient descent is computed from the change of the value function:

E_k = U_k − U_{k−1}   (5)

where E_k is the difference between the value functions of the (k−1)-th and k-th learning passes.
Step 2: establish the reward feedback of the DRL based on reservoir power generation scheduling:
the benefit of a decision is evaluated from the state of the current period and the decision taken, and is fed back in the form of a reward; the power generation amount, and whether the guaranteed output is reached, serve as the benefit-evaluation indexes.
The reward feedback is given by:

R(K_t, Q_t, N_t) = [b(K_t, Q_t, N_t) − a · {max(e − b(K_t, Q_t, N_t), 0)}^b] · Δt   (6)

V_{t+1} = V_t + Q_t − Q_{p,t} − Q_{s,t}   (7)

where R is the penalized power generation benefit of period t, i.e. the Reward; K_t is the water storage level at the beginning of period t; Q_t is the total inflow volume of the reservoir in period t; N_t is the power generation output of the reservoir in period t; Q_{p,t} is the total power generation flow in period t; Q_{s,t} is the spilled water volume in period t; b(·) is the hydropower station's power generation in period t, computed from the water consumption rate, the water head, etc.; a and b are penalty coefficients determined by the power generation guarantee rate of the hydropower station, taken here as 1 and 2 respectively; e is the guaranteed output of the system; V_t is the storage volume at the beginning of period t; Δt is the length of the scheduling period; and V_{t+1} is the storage volume at the end of period t;
the constraints are:

K_min ≤ K_t ≤ K_max   (8)
0 ≤ N_t ≤ N_M   (9)
0 ≤ Q_t ≤ Q_M   (10)

where K_min and K_max are the minimum and maximum allowed reservoir water levels in period t; N_t is the decision output of period t; N_M is the installed capacity of the reservoir, i.e. its maximum power generation output; and Q_M is the maximum discharge capacity of the turbines.
Step 3: establish an artificial intelligence scheduling expert for a given reservoir based on its measured inflow runoff process:
In an artificial intelligence system, an Agent denotes an object with behavioral capability, such as a robot. The main task of reinforcement learning is to enable the Agent to learn and master knowledge of the Environment through exploration, forming its own knowledge system and memory.
The learning mode of the DRL can be illustrated by game playing. The Agent perceives the state of the characters in the game Environment through its eyes, selects the best keyboard operation for them through its brain, evaluates the quality of each operation using the state fed back by the Environment, and then continuously adjusts its operations to steer the game toward victory. After the Agent has played the game hundreds of times, the operating process and the keys to winning are stored in its memory. The Agent's memory comes from the experience of its own play, and the experience of others can also be stored in it. By continuously learning and summarizing through recollection, the Agent gradually becomes an expert player of the game.
The cultivation of the reservoir-scheduling artificial intelligence expert follows the same process, as shown in Fig. 1. The measured reservoir inflow runoff information and the corresponding scheduling period are taken as the input state, and autonomous learning is carried out by the 'autonomous learning' algorithm module: the Agent first perceives the current state of the reservoir, including the water storage level and the incoming water, and its brain decides the reservoir operation for the coming period. On this basis, a reservoir power-generation scheduling simulation estimates and returns the reward of that operation, namely the power generation benefit. The state, operation and benefit of the reservoir are then stored in the memory bank as one piece of knowledge; once the memory bank holds enough knowledge and the learning condition is met, the Agent starts to learn from the knowledge in memory, and then continues actual scheduling operations to obtain new knowledge and update the memory bank. By cycling this learning-scheduling process, the Agent gradually matures into an artificial intelligence expert for reservoir scheduling, which is then used in reservoir power-generation scheduling decisions to determine the optimal control process of reservoir power generation.
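The perceive-decide-reward-memorize-learn cycle described above can be sketched in a few lines. The `agent`, `env`, `MemoryBank` and `train` names are illustrative stand-ins, and the capacity and threshold defaults merely echo the values of Table 4; the real 'autonomous learning' module is the DQN-based model of step 1.

```python
import random
from collections import deque

class MemoryBank:
    """Stores scheduling knowledge; one decision per period = one knowledge."""
    def __init__(self, capacity=2000):
        self.bank = deque(maxlen=capacity)
    def store(self, knowledge):
        self.bank.append(knowledge)
    def sample(self, w=200):
        return random.sample(self.bank, min(w, len(self.bank)))

def train(agent, env, episodes=400, learn_threshold=200):
    """Cycle: sense state -> decide -> simulate reward -> store -> learn."""
    memory = MemoryBank()
    for _ in range(episodes):                 # one episode = one simulated runoff year
        state = env.reset()                   # (period number, water level, inflow)
        done = False
        while not done:
            action = agent.decide(state)                 # brain picks an output level
            next_state, rew, done = env.step(action)     # simulated generation benefit
            memory.store((state, action, rew, next_state))
            if len(memory.bank) >= learn_threshold:
                agent.learn(memory.sample())             # replay stored knowledge
            state = next_state
    return memory
```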
The research results for the case-study reservoir are as follows:
an artificial intelligence agent is established for the power generation scheduling of the reservoir. Because reservoir power generation scheduling is a Markov process, in which different reservoir states call for different scheduling strategies and lead to very different end-of-period states, the water storage level and the incoming water of the reservoir are used as input states. In addition, because the incoming-water process varies strongly within the year, the remaining-period benefit differs between scheduling periods, so the scheduling period must also be one of the input states. In this embodiment, an artificial intelligence expert for ten-day scheduling is established based on runoff information at the ten-day scale, so the state space of the scheduling period is the ten-day-period number within one year, i.e. 1, 2, …, 36. The decision of reservoir power generation scheduling can be either the generation discharge flow or the power generation output of the units; this embodiment takes the power generation output as the decision.
In the deep learning of the DRL based on the DQN network, both the state and the decision exist in discrete form. To improve operability in practical application, the inflow runoff of the case-study reservoir is discretized at every magnitude. The discretization jointly considers the flow requirements of the reservoir's units and of the downstream reservoir, and the inflow runoff is discretized according to the criteria of Table 2, where 150 m³/s is the maximum discharge capacity of a single unit at full output, 300 m³/s is the flow required by the downstream reservoir, and 500 m³/s is the full-gate flow.
TABLE 2 Grading standard of the 1-, 3-, 7- and 10-day average inflow runoff of the case-study reservoir
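A minimal sketch of the inflow discretization implied by Table 2, assuming the grade boundaries are exactly the three thresholds quoted above (150, 300 and 500 m³/s); the grade indices are illustrative, since the table body is not reproduced here.

```python
def inflow_grade(q_m3s):
    """Map an average inflow (m^3/s) to a discrete grade index."""
    thresholds = (150.0, 300.0, 500.0)   # single-unit max / downstream demand / full gate
    for grade, limit in enumerate(thresholds):
        if q_m3s <= limit:
            return grade
    return len(thresholds)               # inflow above the highest threshold
```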
According to the actual power generation requirements and unit configuration of the case-study reservoir, the output of the reservoir is discretized into 6 grades, i.e. 6 decisions, as shown in Table 3. According to the input state information, the DQN network selects the optimal output from these 6 decisions to carry out reservoir scheduling.
TABLE 3 Cluster centers of the power generation output of the case-study reservoir (thousand kilowatts)
The DRL model simulates a 400-year runoff process from the measured inflow runoff process, performs 'autonomous simulation' on the simulated runoff, and uses the knowledge obtained from the simulation for learning. The learned DRL is then used to decide the power generation scheduling process of the case-study reservoir. To evaluate the decision-making ability of the DRL in reservoir power generation scheduling, dynamic programming is used to determine the optimal control process of reservoir power generation as a benchmark.
To analyze the influence of the decision value function on reservoir power generation scheduling when the unsupervised deep learning model is applied, the decision benefit is evaluated with the maximum value and with the average value respectively, giving the DRL1 and DRL2 models.
In the DRL1 model, in the decision value estimation of reservoir power generation scheduling, the U value of each decision in the decision set is calculated from the state S_{t+1}, and the maximum U value is selected as the remaining-period power generation benefit (Rr), as shown in Fig. 2(a).
In the DRL2 model, according to Eq. (4), the U value of each decision in the decision set is likewise calculated from the state S_{t+1}, and the average of these U values is taken as the remaining-period power generation benefit (Rr), as shown in Fig. 2(b).
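The two estimators can be sketched side by side. The function names and the list `u_next` (assumed to hold U(S_{t+1}, A) for each candidate decision A in the decision set) are illustrative; `mode="max"` corresponds to DRL1 and `mode="mean"` to DRL2.

```python
def remaining_benefit(u_next, mode="mean"):
    """Remaining-period benefit Rr from the U values of the decision set."""
    if mode == "max":                       # DRL1: chase the single largest U value
        return max(u_next)
    return sum(u_next) / len(u_next)        # DRL2: average over the decision set

def decision_value(reward, u_next, lam=0.9, mode="mean"):
    # Decision value = immediate benefit + discounted remaining-period benefit
    return reward + lam * remaining_benefit(u_next, mode)
```

Averaging keeps the estimate from being dominated by a single optimistic decision, which is exactly the failure mode the DRL1 results below exhibit.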
After the DRL1 and DRL2 models each learned 2000 times on the 400-year simulated runoff, they were used to decide the power generation scheduling process of the case-study reservoir over the 1116 ten-day periods of 1980-2010. The resulting water level control processes, compared with the optimal control process determined by dynamic programming, are shown in Fig. 3.
Fig. 3(a) compares the water level control process of the case-study reservoir based on the DRL1 model with the optimal water level control process. The comparison shows that the DRL1 water level stays at the dead water level in most scheduling periods, and rises to the normal storage level only in the few periods when the flood-season inflow is particularly large. The main reason is that when the DRL1 model evaluates the decision value function, the remaining-period power generation benefit is represented by the maximum U value, so during learning the Agent always chooses the decision with the maximum power generation.
Fig. 3(b) compares the water level control process of the case-study reservoir based on the DRL2 model with the optimal water level control process. The comparison shows that DRL2 has strong decision-making ability: its water level control process is highly consistent with the optimal one.
Therefore, the invention evaluates the decision benefit with the average value rather than the maximum value.
DRL algorithm theory shows that the learning efficiency of the DRL model is governed by its parameters; the parameter values used in this embodiment are listed in Table 4. The model parameters fall into two categories. The first category, knowledge control parameters, controls the memory capacity, the start condition of 'autonomous learning', and so on; these are low-sensitivity parameters in DRL learning. The second category, learning efficiency parameters, controls the stability of 'autonomous learning', the search of the decision space and the convergence rate; these are sensitive parameters. This embodiment therefore analyzes the influence of the learning efficiency parameters on the 'autonomous learning' of reservoir power generation scheduling.
TABLE 4 Control parameters of the DRL deep learning system
Knowledge control parameter | Value | Learning efficiency parameter | Value |
Total knowledge in memory (M) | 2000 | Learning rate (α) | 0.03 |
Knowledge per learning pass (W) | 200 | Discount factor (λ) | 0.9 |
Learning interval threshold (L) | 50 | Greedy probability (ε) | 0.9 |
Learning knowledge threshold (D) | 200 | Weight update interval (K) | 30 |
FIG. 4 shows the change process of the Reward under different parameter values.
FIG. 4(a) shows the Reward change process under different values of the greedy probability (ε). The greedy probability determines how the scheduling decision switches between 'exploitation' and 'exploration' during 'autonomous simulation'. When ε is 0.95, the probability of 'exploration' is only 0.05, which is unfavorable for finding new knowledge samples, so the learning efficiency is low. When ε is 0.8, the probability of 'exploration' reaches 0.2, and a large amount of 'exploration sample' knowledge is generated and stored in the memory bank during 'autonomous learning'; such knowledge reflects the diversity of the samples. However, only one of the 'exploration samples' is optimal, and if a large amount of 'exploration sample' knowledge remains in the memory bank for a long time, its 'inferior' entries impair the stability and accuracy of learning.
Fig. 4(b) shows the Reward change process with increasing learning passes under different values of the discount factor (λ). The discount factor represents the degree to which the remaining-period power generation benefit influences the decision value: the larger λ, the stronger that influence. When λ is 0.95, the decision value consists mainly of the remaining-period benefit, and the Reward of the current power generation decision has little influence, so the decision value cannot fully reflect the scheduling effect of the decision and the learning efficiency decreases. When λ is too low, the influence of the current Reward increases: the scheduling decision focuses too much on the immediate benefit, the decision values of 'exploration sample' knowledge fluctuate violently, and learning from such samples makes the network model unstable.
Fig. 4(c) shows the Reward change process with increasing learning passes under different values of the learning rate (α). Over the 2000 learning passes, the model achieves the highest Reward when α is 0.03. When α is 0.001, Eq. (3) shows that the average decision value is only weakly influenced by the Reward, and the DRL learning effect is the worst: the average value function over-weights its history during updating, so its value changes little, which is unfavorable for learning reservoir scheduling. When α is 0.3, the current decision value takes too large a share in the update of the average value function. Because the decision value is affected by the period scheduling decision (Action), when the decision is chosen randomly in 'exploration' mode, the decision value of 'exploration sample' knowledge fluctuates strongly, harming the stability of learning and reducing the learning efficiency.
Fig. 4(d) shows the effect of the update interval (K) of the TN network weights in the DQN on the Reward value. K = 10 means the weight parameters of the AN network are assigned to the TN network after every 10 learning passes. When K is small, this assignment is executed frequently and the average value function becomes unstable. When K is large, the assignment must wait a long time, so the weight parameters of the AN and TN networks differ greatly, and updating the average value function by Eq. (2) distorts the value estimate.
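The role of the update interval K can be sketched as a hard copy of the Action Network weights into the Target Network every K-th learning step, as in vanilla DQN; plain Python lists stand in for the network weights here.

```python
def sync_target(an_weights, tn_weights, step, K=30):
    """Copy the AN weights into the TN every K-th learning step (hard update)."""
    if step % K == 0:
        tn_weights[:] = an_weights        # in-place copy, as in vanilla DQN
    return tn_weights
```

Between syncs the TN stays frozen, which is what keeps the learning target stable; too small a K removes that stability, too large a K lets the two networks drift apart.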
Comparative example:
to compare the learning effect of the DRL, this comparative example uses dynamic programming and a decision tree (C5.0) to build power generation scheduling models for the case-study reservoir. The ten-day power generation result of 1980-2010 solved by dynamic programming is taken as the optimal scheduling result, with the energy production and the assurance rate as the comparison benchmarks.
In reservoir power generation scheduling research, decision trees are often used to mine scheduling knowledge. This comparative example uses the decision tree (C5.0) algorithm to mine the optimal power generation scheduling process and to establish a set of power generation scheduling rules suitable for the case-study reservoir. First, the optimal power generation process is obtained by dynamic programming on the simulated 400-year runoff process. Then the state and decision variables of the reservoir are discretized using the standards of Tables 2 and 3 of the embodiment. Finally, the decision tree (C5.0) algorithm is used to mine the power generation scheduling rule set of the case-study reservoir, and according to these rules the power generation scheduling process of the reservoir in 1980-2010 is simulated.
Fig. 5 shows the deviations of the simulated reservoir water level processes, based on the DRL and on the decision-tree rules, from the DP optimal process. As can be seen from Fig. 5, compared with the water level process of DP optimal scheduling, the reservoir water level control process based on the decision-tree scheduling rules fluctuates over a wider range than the DRL water level process; the DRL-based power generation scheduling decisions are closer to the optimal decisions.
Based on the DRL model after 2000 learning passes, the ten-day power generation scheduling process of the case-study reservoir was simulated. The energy production and assurance rates of the dynamic programming, decision tree and DRL models are given in Table 5. DP, as the optimal solution, has the highest energy production and assurance rate. Power generation scheduling based on the decision-tree rules gives poor simulation results. The energy production and assurance rate of the DRL differ little from the DP results, showing that the DRL has good decision-making ability.
TABLE 5 Energy production and assurance rates of the dynamic programming, decision tree and DRL models
On the basis of the DQN (Deep Q-Network) and the DRL (deep reinforcement learning) model, with the reservoir power generation as reward feedback, an unsupervised deep learning model for reservoir scheduling based on artificial intelligence is established. Taking the case-study reservoir as an example, the DRL model is trained on a simulated 400-year runoff process, and its decision-making ability is checked and evaluated on the measured runoff process of 1980-2010. The conclusions are as follows:
(1) when the DRL model is applied to reservoir power generation scheduling, the remaining-period power generation benefit of the reservoir must be evaluated with the average value rather than the maximum value during value-function evaluation. If the uncertainty of runoff or runoff forecast information is considered, Markov state transition or Bayesian theory can be adopted to evaluate the remaining-period power generation benefit of the reservoir.
(2) Under the influence of the model parameters, the learning efficiency of the DRL differs greatly. By comparing the learning rate, the discount factor, the greedy coefficient and the weight update interval, the invention preliminarily determines the value ranges of the different parameters in reservoir power generation scheduling, as well as the degree and manner of their influence on learning efficiency.
(3) Measured against the optimal power generation scheduling result solved by dynamic programming, the DRL-based scheduling result is clearly superior to the traditional decision-tree-based result. This fully displays the strong learning and decision-making abilities of the DRL and its strong adaptability in reservoir scheduling decisions.
Finally, it should be noted that the above is only intended to illustrate, not to limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to a preferred arrangement, those skilled in the art should understand that the technical solution of the present invention (such as the application of the various formulas, the sequence of steps, etc.) can be modified or equivalently replaced without departing from its spirit and scope.
Claims (3)
1. A single reservoir intelligent flood control scheduling method based on a DQN deep reinforcement learning algorithm is characterized by comprising the following steps:
step 1: constructing an artificial intelligence-based reservoir scheduling unsupervised deep learning model:
respectively establishing the brain, the memory bank and the 'autonomous learning' algorithm module of the Agent with reference to the DRL technical architecture;
the brain of the Agent is constructed with the Deep Q-Network (DQN) algorithm and has a double-layer neural network, namely an Action Network (AN) and a Target Network (TN);
the memory bank stores the scheduling knowledge generated during the scheduling process, each period's scheduling decision forming one piece of knowledge;
the 'autonomous learning' module continuously improves the value function based on the Bellman equation, so that the decision-making ability of the Agent keeps improving; as the number of learning passes increases, the value function of the current pass is embodied by the average of the value functions computed over the adjacent k learning passes, with the formula:
U_k = (u_1 + u_2 + … + u_k) / k
in the formula, u_k is the decision value function under the given scheduling state learned in the k-th pass; u_i is the decision value function under the given scheduling state learned in the i-th pass; U_k is the average value function obtained after the k-th learning pass; U_{k-1} is the average value function obtained after the (k-1)-th learning pass;
after the autonomous learning, the value function is updated with the following formula:
U_k(S_t, A_t) = (1 − α)·U_{k-1}(S_t, A_t) + α·u(S_t, A_t)
in the formula, S_t is the condition attribute at the beginning of period t; A_t is the decision attribute at the beginning of period t; α is the learning rate; u is the decision value function under the given scheduling state;
in the decision value estimation of reservoir power generation scheduling, the U value of each decision in the decision set is calculated from the state S_{t+1}, and the U value evaluates the decision benefit in the form of an average, with the formula:
u(S_t, A_t) = R_t + λ·Ū(S_{t+1})
in the formula, R_t is the decision benefit value obtained in period t; S_{t+1} is the condition attribute at the end of period t; Ū(S_{t+1}) is the average of the U values of all decisions in the decision set at the end of period t; λ is the discount factor;
after each learning pass, the error feedback for updating the neural network weight parameters by the gradient descent method is obtained from the change of the value function, with the formula:
E_k = U_k − U_{k-1}
in the formula, E_k is the difference between the value functions of the (k-1)-th and k-th learning passes;
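A numeric sketch of the claim-1 update chain, assuming the formulas above: the averaged value U_k = (1 − α)·U_{k-1} + α·u, and the error E_k = U_k − U_{k-1} that drives the gradient-descent weight update.

```python
def update_value(U_prev, u_new, alpha=0.03):
    """One value update U_k = (1-alpha)*U_{k-1} + alpha*u, with error E_k."""
    U_new = (1.0 - alpha) * U_prev + alpha * u_new
    return U_new, U_new - U_prev            # (U_k, E_k) feeds the weight update
```

A small α keeps U_k close to its history, so E_k stays small; this is the averaging behavior whose sensitivity is analyzed in Fig. 4(c).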
step 2: establishing reward feedback of the DRL on the basis of reservoir power generation scheduling:
evaluating the benefit of the decision according to the state of the current period and the decision made, and feeding the benefit back as a reward, with the energy production and whether the guaranteed output is reached as the benefit evaluation indexes;
step 3: establishing a scheduling artificial intelligence expert for a given reservoir based on its measured inflow runoff process:
taking the measured reservoir inflow runoff information and the corresponding scheduling period as the input state, carrying out autonomous learning through the 'autonomous learning' algorithm module, and deciding the reservoir operation in the coming period, namely the power generation output, through the brain of the Agent; on this basis, estimating and returning the reward of that operation, namely the power generation benefit, in a reservoir power-generation scheduling simulation mode; then storing the state, operation and benefit of the reservoir into the memory bank as knowledge; when the memory bank holds enough knowledge and the learning condition is met, starting to learn the knowledge in memory, then continuing actual scheduling operations to obtain new knowledge and update the memory bank; and by cycling this learning-actual scheduling process, enabling the Agent to gradually mature into an artificial intelligence expert for reservoir scheduling;
and using the established reservoir dispatching artificial intelligence expert for the reservoir power generation dispatching decision to determine the optimal control process of the reservoir power generation.
2. The intelligent single-reservoir flood control scheduling method based on the DQN deep reinforcement learning algorithm as claimed in claim 1, wherein in the memory bank the scheduling decision of period t, together with the condition attribute at the beginning of the period (S_t), the decision attribute (Action), the reward (Reward) and the condition attribute at the end of the period (S_{t+1}), forms one piece of knowledge stored in the memory bank, with the formula:
<S_t = (T_t, L_t, Q_t), Reward = R_t, Action = A_t, S_{t+1} = (T_{t+1}, L_{t+1}, Q_{t+1})>
in the formula, S_t and S_{t+1} are the condition attributes at the beginning and at the end of period t respectively; R_t is the penalized energy production benefit of period t; A_t is the decision attribute of period t; T_t and T_{t+1} are the scheduling period numbers within the year; L_t and L_{t+1} are the controlled reservoir water levels at the beginning of periods t and t+1 respectively; Q_t and Q_{t+1} are the total inflow volumes of the reservoir in periods t and t+1 respectively.
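One 'knowledge' record of claim 2 can be sketched as a named tuple; the concrete types and the sample numbers below are illustrative assumptions, not values from the patent.

```python
from typing import NamedTuple, Tuple

State = Tuple[int, float, float]            # (T: period number, L: water level, Q: inflow)

class Knowledge(NamedTuple):
    s_t: State                              # S_t = (T_t, L_t, Q_t)
    reward: float                           # R_t, penalized energy benefit
    action: int                             # A_t, index of the chosen output decision
    s_next: State                           # S_{t+1} = (T_{t+1}, L_{t+1}, Q_{t+1})

k = Knowledge(s_t=(12, 300.5, 210.0), reward=38.2, action=3,
              s_next=(13, 301.2, 180.0))
```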
3. The intelligent single-reservoir flood control scheduling method based on the DQN deep reinforcement learning algorithm as claimed in claim 1, wherein the formula of the reward feedback in step 2 is:
R(K_t, Q_t, N_t) = [b(K_t, Q_t, N_t) − a·{max(e − b(K_t, Q_t, N_t), 0)}^b]·Δt
V_{t+1} = V_t + Q_t − Q_{p,t} − Q_{s,t}
in the formula: r is the electric energy generating capacity benefit after the penalty of t time period, namely Reward; k t Is the initial water storage level of the t time period; q t Is the total water volume of the reservoir in the period t; n is a radical of t Generating output of the reservoir at the time t; q p,t Representing the total generated flow in the t period; q s,t The water abandon amount in the period t; b (-) is the hydropower station generated energy in the period t; a and b are penalty coefficients; e, ensuring the output of the system; v t The reservoir capacity at time t; Δ t represents a scheduling period length; v t+1 The capacity of the water storage at the end of the t period;
the constraints are as follows:
K_min ≤ K_t ≤ K_max
0 ≤ N_t ≤ N_M
0 ≤ Q_t ≤ Q_M
wherein K_min and K_max respectively denote the minimum and maximum allowed reservoir water levels in period t; N_t denotes the decision output in period t; N_M is the installed capacity of the reservoir, i.e. its maximum power generation output; Q_M is the maximum discharge capacity of the turbines.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210741864.3A CN115049292B (en) | 2022-06-28 | 2022-06-28 | Intelligent single reservoir flood control scheduling method based on DQN deep reinforcement learning algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115049292A true CN115049292A (en) | 2022-09-13 |
CN115049292B CN115049292B (en) | 2023-03-24 |
Family
ID=83163984
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210741864.3A Active CN115049292B (en) | 2022-06-28 | 2022-06-28 | Intelligent single reservoir flood control scheduling method based on DQN deep reinforcement learning algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115049292B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115952958A (en) * | 2023-03-14 | 2023-04-11 | 珠江水利委员会珠江水利科学研究院 | Reservoir group joint optimization scheduling method based on MADDPG reinforcement learning |
CN117132089A (en) * | 2023-10-27 | 2023-11-28 | 邯郸欣和电力建设有限公司 | Power utilization strategy optimization scheduling method and device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109636226A (en) * | 2018-12-21 | 2019-04-16 | 华中科技大学 | A kind of reservoir multi-objective Hierarchical Flood Control Dispatch method |
CN110930016A (en) * | 2019-11-19 | 2020-03-27 | 三峡大学 | Cascade reservoir random optimization scheduling method based on deep Q learning |
US20200119556A1 (en) * | 2018-10-11 | 2020-04-16 | Di Shi | Autonomous Voltage Control for Power System Using Deep Reinforcement Learning Considering N-1 Contingency |
CN112186743A (en) * | 2020-09-16 | 2021-01-05 | 北京交通大学 | Dynamic power system economic dispatching method based on deep reinforcement learning |
WO2022083029A1 (en) * | 2020-10-19 | 2022-04-28 | 深圳大学 | Decision-making method based on deep reinforcement learning |
Non-Patent Citations (2)
Title |
---|
练继建 (LIAN Jijian) et al.: "Research progress in Agent-based water resources management models", Advances in Water Science (《水科学进展》) *
董香栾 (DONG Xiangluan): "Research on optimal scheduling strategies for integrated energy systems based on the DQN algorithm", China Master's Theses Full-text Database (Electronic Journal), Engineering Science and Technology II *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |