CN115049292A - Intelligent single reservoir flood control scheduling method based on DQN deep reinforcement learning algorithm - Google Patents

Intelligent single reservoir flood control scheduling method based on DQN deep reinforcement learning algorithm

Info

Publication number
CN115049292A
CN115049292A
Authority
CN
China
Prior art keywords
reservoir
scheduling
decision
learning
period
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210741864.3A
Other languages
Chinese (zh)
Other versions
CN115049292B (en)
Inventor
任明磊
徐炜
刘昌军
魏国振
王刚
赵丽平
顾李华
王凯
张琪
刘小虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaihe River Water Resources Commission Hydrology Bureau (information Center)
China Institute of Water Resources and Hydropower Research
Chongqing Jiaotong University
Original Assignee
Huaihe River Water Resources Commission Hydrology Bureau (information Center)
China Institute of Water Resources and Hydropower Research
Chongqing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaihe River Water Resources Commission Hydrology Bureau (information Center), China Institute of Water Resources and Hydropower Research, Chongqing Jiaotong University filed Critical Huaihe River Water Resources Commission Hydrology Bureau (information Center)
Priority to CN202210741864.3A priority Critical patent/CN115049292B/en
Publication of CN115049292A publication Critical patent/CN115049292A/en
Application granted granted Critical
Publication of CN115049292B publication Critical patent/CN115049292B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A10/00TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
    • Y02A10/40Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Primary Health Care (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Water Supply & Treatment (AREA)
  • Biomedical Technology (AREA)
  • Public Health (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention relates to an intelligent single-reservoir flood control scheduling method based on the DQN deep reinforcement learning algorithm, which comprises the following steps: constructing an artificial-intelligence-based unsupervised deep learning model for reservoir scheduling; establishing DRL reward feedback based on reservoir power generation scheduling; and establishing an artificial intelligence scheduling expert for a given reservoir based on its measured inflow runoff process. Compared with the optimal power generation scheduling process solved by dynamic programming, the power generation scheduling results of the intelligent single-reservoir flood control scheduling method based on the DQN deep reinforcement learning algorithm are clearly superior to traditional decision-tree-based reservoir power generation scheduling results, and the unsupervised deep learning model for reservoir scheduling shows strong learning and decision-making capability and strong adaptability in reservoir scheduling decisions.

Description

Intelligent single reservoir flood control scheduling method based on DQN deep reinforcement learning algorithm
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a single reservoir intelligent flood control scheduling method based on a DQN deep reinforcement learning algorithm.
Background
In 2016, the success of the Go program AlphaGo demonstrated the potential of artificial intelligence. The emergence of AlphaGo was a milestone that triggered a surge in artificial intelligence development; under this wave, its core technologies accelerated further and spread to other industries. During play, AlphaGo must consider the after-effect of each move and the maximum winning probability of each decision. The core algorithm in AlphaGo is Deep Reinforcement Learning (DRL), which is suited to the state-versus-decision mode, and in particular to decision processes with the Markov property.
Traditional reinforcement learning theory has been continuously refined over recent decades, but it still struggles with complex real-world problems, especially in multi-state, multi-decision situations. Deep reinforcement learning (DRL) is the product of combining deep learning with reinforcement learning. DRL integrates the strong comprehension ability of deep learning on problems of vision and cognition with the decision-making ability of reinforcement learning, forming a brand-new mode of End-to-End Learning from Perception to Action. This mode gives machine learning a real sense of "autonomous learning". DRL technology makes artificial intelligence genuinely practical, giving it strong learning and survival ability in complex environments with high-dimensional states and decisions.
In the water conservancy industry, reservoir scheduling has the characteristics of a typical Markov decision process, and scheduling decisions depend on state conditions such as reservoir storage and incoming water, so reservoir scheduling and the DRL algorithm are highly aligned conceptually. If DRL technology spreads to the water conservancy industry, reservoir dispatching will be one of its main battlefields.
Therefore, how to introduce the DRL technology into reservoir scheduling, adapt to reservoir scheduling decisions and determine the optimal control process of reservoir power generation is a problem to be solved.
Disclosure of Invention
In order to overcome the problems in the prior art, the invention provides a single reservoir intelligent flood control scheduling method based on a DQN deep reinforcement learning algorithm. The method is based on a DQN network and a DRL model, adopts reservoir power generation as reward feedback, establishes a reservoir operation control model based on a deep reinforcement learning algorithm, and establishes a reservoir dispatching artificial intelligence expert, thereby determining the optimal control process of the reservoir power generation.
The purpose of the invention is realized as follows:
a single reservoir intelligent flood control scheduling method based on a DQN deep reinforcement learning algorithm comprises the following steps:
step 1: constructing an artificial intelligence-based reservoir scheduling unsupervised deep learning model:
respectively establishing a brain, a memory library and an 'autonomous learning' algorithm module of the Agent by taking a DRL technical architecture as a reference;
the brain of the Agent is constructed by adopting the Deep Q-Network (DQN) algorithm and is provided with a double-layer neural network, namely an Action Network (AN) and a Target Network (TN);
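As an illustrative sketch only (the layer sizes, activation, and state encoding below are assumptions, not the patent's implementation), the two-network arrangement can be modelled with a small NumPy network pair, where the Target Network is a periodically synchronized copy of the Action Network:

```python
import numpy as np

class QNetwork:
    """Tiny two-layer MLP mapping a state vector to one Q value per discrete decision."""
    def __init__(self, n_state, n_action, rng):
        self.w1 = rng.normal(0.0, 0.1, (n_state, 16))
        self.w2 = rng.normal(0.0, 0.1, (16, n_action))

    def q_values(self, state):
        hidden = np.tanh(state @ self.w1)   # hidden layer
        return hidden @ self.w2             # one value per decision in the decision set

def sync_target(action_net, target_net):
    """Periodically copy the Action Network weights into the Target Network."""
    target_net.w1 = action_net.w1.copy()
    target_net.w2 = action_net.w2.copy()

rng = np.random.default_rng(0)
an = QNetwork(n_state=3, n_action=5, rng=rng)   # AN: proposes the next decision
tn = QNetwork(n_state=3, n_action=5, rng=rng)   # TN: evaluates decision values
sync_target(an, tn)

state = np.array([0.4, 0.6, 0.2])               # e.g. (period, water level, inflow), normalized
best_decision = int(np.argmax(an.q_values(state)))
```

The AN selects decisions during scheduling while the TN supplies stable value targets during learning; syncing only periodically is the standard DQN stabilization trick.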
the memory base stores the scheduling knowledge generated in the scheduling process, and the scheduling decision of each time interval can form a knowledge;
the value function of the autonomous learning module based on the Bellman equation is continuously increased, so that the Agent decision-making capability is continuously improved; with the increase of the learning times, the learning cost function of this time is embodied by the average value of the cost functions calculated by adjacent k times of learning, and the formula is as follows:
Figure BDA0003718351980000021
in the formula u k A decision cost function under the condition of a given scheduling state is learned for the kth time; u. of i A decision cost function under the condition of a given scheduling state is learned for the ith time; u shape k An average value function obtained after the current learning is performed for the kth learning; u shape k-1 The average cost function obtained after the current learning is learned for the k-1 th time.
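The running average of the cost function over successive learnings can be computed incrementally; a minimal sketch (the function name and sample values are mine):

```python
def update_mean_value(U_prev, u_k, k):
    """Incremental form of the running mean:
    U_k = U_{k-1} + (u_k - U_{k-1}) / k, equivalent to averaging u_1 ... u_k."""
    return U_prev + (u_k - U_prev) / k

U = 0.0
for k, u in enumerate([4.0, 6.0, 8.0], start=1):   # running mean ends at 6.0
    U = update_mean_value(U, u, k)
```

This avoids storing all past cost-function values: only the previous average U_{k−1} and the count k are needed.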
After the autonomous learning, the cost function is updated with the following formula:

U_k(S_t, A_t) = (1 − α)·U_{k−1}(S_t, A_t) + α·u(S_t, A_t)

where S_t is the condition attribute at the beginning of period t; A_t is the decision attribute at the beginning of period t; α is the learning rate; and u is the decision value function under the given scheduling state;
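The learning-rate update above is a simple convex blend of the old value and the newly observed one; a sketch (the default α is an assumption):

```python
def update_value(U_prev, u_new, alpha=0.1):
    """U_k(S_t, A_t) = (1 - alpha) * U_{k-1}(S_t, A_t) + alpha * u(S_t, A_t);
    a larger learning rate alpha gives more weight to the newly observed value u."""
    return (1.0 - alpha) * U_prev + alpha * u_new

U_new = update_value(10.0, 20.0, alpha=0.1)   # 0.9 * 10 + 0.1 * 20 = 11.0
```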
in the decision value estimation of reservoir power generation dispatching, according to the state S t+1 Calculating the U value of each decision in the decision set, and evaluating the decision by adopting an average value modeThe strategy benefit is as follows:
Figure BDA0003718351980000022
in the formula, R t A decision benefit value obtained for a time period t; s t+1 Is the condition attribute at the end of the t period,
Figure BDA0003718351980000023
the decision attribute at the end of the t period; λ is a discount factor;
after each learning, the error feedback for the neural network weight parameters is updated by gradient descent according to the change of the value function:

E_k = U_k − U_{k−1}

where E_k is the difference between the cost functions of the (k−1)-th and k-th learning;
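A hedged sketch of the two quantities defined above, the averaged decision value u(S_t, A_t) and the error feedback E_k (note the averaging over next-period decisions described here differs from the max used in vanilla Q-learning; all numbers are illustrative):

```python
def decision_value(reward, next_U_values, discount=0.9):
    """u(S_t, A_t) = R_t + lambda * mean of U(S_{t+1}, A_{t+1}) over the decision set,
    per the averaging evaluation described in the text."""
    return reward + discount * sum(next_U_values) / len(next_U_values)

def error_feedback(U_k, U_prev):
    """E_k = U_k - U_{k-1}: the error driving gradient-descent weight updates."""
    return U_k - U_prev

u = decision_value(5.0, [1.0, 2.0, 3.0], discount=0.5)   # 5 + 0.5 * 2 = 6.0
e = error_feedback(u, 5.5)                               # 0.5
```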
step 2: and (3) establishing reward feedback of the DRL on the basis of reservoir power generation scheduling:
evaluating the benefit of the decision according to the state of the current time period and the obtained decision, and feeding back the benefit in a reward mode; wherein, the generated energy and whether the guaranteed output is reached are taken as indexes of benefit evaluation;
and step 3: establishing a scheduling artificial intelligence expert aiming at a certain reservoir based on the actual measurement and warehousing runoff process of the reservoir:
taking the measured reservoir inflow runoff information and the corresponding scheduling period as the input state, carrying out autonomous learning through the "autonomous learning" algorithm module, and deciding the reservoir operation for the coming period, i.e. the generation output, through the brain of the Agent. On this basis, a reservoir power generation dispatching simulation is used to estimate and return the reward of the operation, i.e. the power generation benefit. The state, operation and benefit of the reservoir are then stored in the memory bank as a piece of knowledge. When the memory bank holds enough knowledge and the learning condition is met, the Agent starts to learn from the knowledge in memory, then continues actual scheduling operation to obtain new knowledge and update the memory bank. By cycling this learning and actual-scheduling process, the Agent gradually matures into an artificial intelligence expert for reservoir scheduling;
and using the established reservoir dispatching artificial intelligence expert for the reservoir power generation dispatching decision to determine the optimal control process of the reservoir power generation.
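The learn-then-schedule cycle of step 3 can be sketched as a standard DQN-style loop. Everything below (the toy simulator, the random policy standing in for epsilon-greedy, the batch size) is an illustrative assumption, not the patent's code:

```python
import random

def toy_simulate(state, action):
    """Stand-in for the reservoir power generation dispatching simulation: returns
    (reward, next_state). The dynamics are purely illustrative, not the patent's model."""
    period, level, inflow = state
    release = action * 0.1                                 # decision -> generation release
    next_level = max(0.0, level + inflow - release)
    reward = release - (0.5 if next_level > 1.0 else 0.0)  # penalize an overfull reservoir
    return reward, ((period + 1) % 36, next_level, inflow)

random.seed(0)
memory_bank = []                   # the Agent's memory of knowledge tuples
state = (0, 0.5, 0.2)
for step in range(100):
    action = random.randrange(5)   # an epsilon-greedy policy would consult the AN here
    reward, next_state = toy_simulate(state, action)
    memory_bank.append((state, action, reward, next_state))  # <S_t, A_t, R_t, S_{t+1}>
    if len(memory_bank) >= 32:
        batch = random.sample(memory_bank, 32)  # recall stored knowledge and learn from it
        # ...update the Action Network on `batch`; periodically sync the Target Network
    state = next_state
```

The key structural point is that acting (simulation), remembering (appending knowledge), and learning (sampling recalled knowledge once enough has accumulated) interleave inside one loop.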
Further, in the memory bank, the scheduling decision of period t forms a piece of knowledge from the condition attribute at the beginning of period t (S_t), the decision attribute (Action), the reward (Reward), and the condition attribute at the end of period t (S_{t+1}), stored as:

<S_t = (T_t, L_t, Q_t), Reward = R_t, Action = A_t, S_{t+1} = (T_{t+1}, L_{t+1}, Q_{t+1})>

where S_t and S_{t+1} are the condition attributes at the beginning and end of period t, respectively; R_t is the penalized power generation benefit in period t; A_t is the decision attribute for period t; T_t and T_{t+1} are the scheduling period numbers within the year; L_t and L_{t+1} are the reservoir control water levels at the beginning of periods t and t+1, respectively; and Q_t and Q_{t+1} are the total reservoir inflow volumes in periods t and t+1, respectively.
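One piece of scheduling "knowledge" maps naturally onto a named tuple in a bounded replay buffer. A sketch, where the field names follow the formula but the buffer capacity and sample values are assumptions:

```python
from collections import namedtuple, deque

# <S_t = (T_t, L_t, Q_t), Reward = R_t, Action = A_t, S_{t+1} = (T_{t+1}, L_{t+1}, Q_{t+1})>
Knowledge = namedtuple("Knowledge", ["S_t", "Reward", "Action", "S_next"])

bank = deque(maxlen=10_000)   # oldest knowledge is discarded once the bank is full

k = Knowledge(S_t=(14, 295.0, 120.0),   # (period number, water level, inflow)
              Reward=3.2,               # penalized generation benefit R_t
              Action=2,                 # index of the chosen output decision
              S_next=(15, 296.1, 95.0))
bank.append(k)
```

A `deque` with `maxlen` gives the memory bank a fixed capacity without manual eviction logic.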
Further, the reward feedback in step 2 is given by:

R(K_t, Q_t, N_t) = [b(K_t, Q_t, N_t) − a·{max(e − b(K_t, Q_t, N_t), 0)}^b]·Δt

V_{t+1} = V_t + Q_t − Q_{p,t} − Q_{s,t}

where R is the penalized power generation benefit in period t, i.e. the Reward; K_t is the water storage level at the beginning of period t; Q_t is the total reservoir inflow volume in period t; N_t is the generation output of the reservoir in period t; Q_{p,t} is the total generation flow in period t; Q_{s,t} is the spilled water volume in period t; b(·) is the hydropower station's energy production in period t; a and b are penalty coefficients; e is the system's guaranteed output; V_t is the storage volume at the beginning of period t; Δt is the scheduling period length; and V_{t+1} is the storage volume at the end of period t.

The constraints are:

K_min ≤ K_t ≤ K_max
0 ≤ N_t ≤ N_M
0 ≤ Q_t ≤ Q_M

where K_min and K_max are the minimum and maximum reservoir water levels in period t, respectively; N_t is the decision output for period t; N_M is the installed capacity of the reservoir, i.e. its maximum generation output; and Q_M is the maximum discharge capacity of the turbines.
The DRL algorithm is oriented to decision and control problems, and decision and control directly determine the degree of intelligence of artificial intelligence. Traditional reinforcement learning uses a state-decision table to make decisions, which limits its capability. The DRL algorithm takes the Bellman equation as its core and uses a deep neural network to fit the relation between state and decision, effectively improving learning, decision-making and control ability in high-dimensional state and decision environments. Reservoir scheduling has the characteristics of a typical Markov decision process, and scheduling decisions depend on state conditions such as reservoir storage and incoming water, so reservoir scheduling and the DRL algorithm are highly aligned. As DRL technology spreads to the water conservancy industry, reservoir dispatching will be one of its main fields of application.
Compared with the prior art, the invention has the advantages and beneficial effects that:
the invention applies the deep reinforcement learning technology in the artificial intelligence algorithm to the reservoir power generation scheduling decision, explores the coupling mode of the deep reinforcement learning and the reservoir power generation scheduling decision, and has application potential. Firstly, establishing a DRL learning model based on a deep Q value learning algorithm (DQN) according to a theoretical framework of an RL algorithm; then coupling the DRL algorithm with a reservoir power generation dispatching model by taking the Reward estimation as a connection point; and finally, on the basis of a random simulation runoff process, constructing an unsupervised deep learning model for reservoir scheduling through unsupervised autonomous learning. Compared with the optimal power generation scheduling process of dynamic planning and solving, the power generation scheduling result of the single reservoir intelligent flood control scheduling method based on the DQN deep reinforcement learning algorithm is obviously superior to the traditional reservoir power generation scheduling result based on the decision tree, and the reservoir scheduling unsupervised deep learning model has strong learning capacity and decision making capacity and strong adaptability in reservoir scheduling decision making.
Drawings
The invention is further illustrated by the following figures and examples.
FIG. 1 is a schematic network structure diagram of an artificial intelligence-based reservoir scheduling unsupervised deep learning model of the invention;
FIG. 2 is a Q-Learning algorithm based cost function update process according to the present invention;
FIG. 3 is a comparison diagram of the water level control processes of the Huanren reservoir under different networks according to the present invention;
FIG. 4 is a graph illustrating the effect of learning efficiency parameters on DRL "autonomous learning" efficiency;
FIG. 5 is a deviation result diagram of the DRL and decision tree rule based power generation scheduling process and the optimization process of the present invention.
Detailed Description
Example (b):
in the embodiment, the power generation dispatching of the Huaren reservoir is taken as an example, the DRL technology is introduced into the reservoir dispatching, and the intelligent flood control dispatching method for the single reservoir based on the DQN depth-enhanced learning algorithm is provided by utilizing the reservoir dispatching 'unsupervised deep learning' model based on artificial intelligence constructed in the embodiment 1. The stem reservoir is located at middle and downstream of muddy river and is positioned at 124-136-50 'of east longitude, 40-42-15' of north latitude and 10364km of dam site control watershed area 2 . The average annual rainfall of the drainage basin for many years is 860mm, 70% of rainfall is concentrated between 6 and 9 months, and heavy flooding generally occurs in late 7 to middle 8 months. The hull reservoir is positioned at the upstream of the reservoir group, is a leading reservoir in the muddy river step reservoir group, has annual adjustment capacity, is mainly used for power generation, and has comprehensive utilization benefits of flood control, irrigation, cultivation, travel and the like.
The basic parameters of the Huanren reservoir are shown in Table 1.

TABLE 1 Basic parameters of the Huanren reservoir

(Table 1 is provided as an image in the original patent document and is not recoverable from the text.)
In this embodiment, a 400-year runoff process is generated by random simulation on the basis of the observed ten-day average inflow flows (data through 2010), and the DRL deep learning is trained on the simulated runoff process, so as to establish an artificial intelligence expert for ten-day-scale scheduling.
The intelligent flood control scheduling method comprises the following steps:
step 1: constructing an artificial intelligence-based reservoir scheduling unsupervised deep learning model:
respectively establishing a brain, a memory library and an 'autonomous learning' algorithm module of the Agent by taking a DRL technical architecture as a reference;
the brain of the Agent is constructed by adopting the Deep Q-Network (DQN) algorithm; the Agent is provided with a double-layer neural network, namely an Action Network (AN) and a Target Network (TN). The DRL learns by recollection; the aim of learning is to train the AN and the TN, and the Sarsa algorithm is adopted to update the Q value in the DQN.
In the memory bank, the scheduling knowledge generated in the scheduling process is stored, and the scheduling decision of each period forms a piece of knowledge. Specifically, the scheduling decision of period t forms a piece of knowledge from the condition attribute at the beginning of period t (S_t), the decision attribute (Action), the reward (Reward), and the condition attribute at the end of period t (S_{t+1}), stored as:

<S_t = (T_t, L_t, Q_t), Reward = R_t, Action = A_t, S_{t+1} = (T_{t+1}, L_{t+1}, Q_{t+1})>   (1)

where S_t and S_{t+1} are the condition attributes at the beginning and end of period t, respectively; R_t is the penalized power generation benefit in period t; A_t is the decision attribute for period t; T_t and T_{t+1} are the scheduling period numbers within the year; L_t and L_{t+1} are the reservoir control water levels at the beginning of periods t and t+1, respectively; and Q_t and Q_{t+1} are the total reservoir inflow volumes in periods t and t+1, respectively.
The value function of the "autonomous learning" module, which is based on the Bellman equation, increases continuously, so that the Agent's decision-making capability continuously improves. As the number of learning iterations increases, the current learning cost function is represented by the average of the cost functions computed over the k learning iterations so far:

U_k = (1/k)·Σ_{i=1}^{k} u_i = U_{k−1} + (u_k − U_{k−1})/k   (2)

where u_k is the decision cost function under the given scheduling state at the k-th learning; u_i is the decision cost function under the given scheduling state at the i-th learning; U_k is the average cost function obtained after the k-th learning; and U_{k−1} is the average cost function obtained after the (k−1)-th learning.
After the autonomous learning, the cost function is updated with the following formula:

U_k(S_t, A_t) = (1 − α)·U_{k−1}(S_t, A_t) + α·u(S_t, A_t)   (3)

where S_t is the condition attribute at the beginning of period t; A_t is the decision attribute at the beginning of period t; α is the learning rate, a larger value giving more weight to the current decision value; and u is the decision value function under the given scheduling state;
in the decision value estimation of reservoir power generation dispatching, according to the state S t+1 Calculating the U value of each decision in the decision set, wherein the U value in the formula is used for evaluating the decision benefit in an average value mode, and the formula is as follows:
Figure BDA0003718351980000062
in the formula, R t A decision benefit value obtained for a time period t; s t+1 Is the condition attribute at the end of the t period,
Figure BDA0003718351980000063
the decision attribute at the end of the t period; lambda is a discount factor, the larger the value is, the larger the influence of the remaining period on the decision value function is, and the smaller the influence is;
after each learning, the error feedback for the neural network weight parameters is updated by gradient descent according to the change of the value function:

E_k = U_k − U_{k−1}   (5)

where E_k is the difference between the cost functions of the (k−1)-th and k-th learning.
Step 2: and (3) establishing reward feedback of the DRL on the basis of reservoir power generation scheduling:
evaluating the benefit of the decision according to the state of the current time period and the obtained decision, and feeding back the benefit in a reward mode; and taking the generated energy and whether the guaranteed output is reached as indexes of benefit evaluation.
The formula for the reward feedback is:

R(K_t, Q_t, N_t) = [b(K_t, Q_t, N_t) − a·{max(e − b(K_t, Q_t, N_t), 0)}^b]·Δt   (6)

V_{t+1} = V_t + Q_t − Q_{p,t} − Q_{s,t}   (7)

where R is the penalized power generation benefit in period t, i.e. the Reward; K_t is the water storage level at the beginning of period t; Q_t is the total reservoir inflow volume in period t; N_t is the generation output of the reservoir in period t; Q_{p,t} is the total generation flow in period t; Q_{s,t} is the spilled water volume in period t; b(·) is the hydropower station's energy production in period t, calculated from the water consumption rate, water head, etc.; a and b are penalty coefficients, determined by the hydropower station's generation assurance rate and taken as 1 and 2, respectively; e is the system's guaranteed output; V_t is the storage volume at the beginning of period t; Δt is the scheduling period length; and V_{t+1} is the storage volume at the end of period t;
the constraints are as follows:

K_min ≤ K_t ≤ K_max   (8)

0 ≤ N_t ≤ N_M   (9)

0 ≤ Q_t ≤ Q_M   (10)

where K_min and K_max are the minimum and maximum reservoir water levels in period t, respectively; N_t is the decision output for period t; N_M is the installed capacity of the reservoir, i.e. its maximum generation output; and Q_M is the maximum discharge capacity of the turbines.
And step 3: establishing a scheduling artificial intelligence expert aiming at a certain reservoir based on the actual measurement and warehousing runoff process of the reservoir:
in an artificial intelligence system, an Agent is used to represent an object with behavioral capabilities, such as a robot. The main task of reinforcement learning is to realize that the Agent learns and masters the knowledge of Environment through exploration and forms own knowledge system and memory.
The learning mode of DRL can be introduced through the process of playing a video game. The Agent perceives the state of the characters in the game Environment through its eyes, selects the best keyboard operation for the characters through its brain, evaluates the quality of the operation using the state fed back from the Environment, and then continuously adjusts the operation, striving to move the game toward victory. After the Agent has played the game hundreds of times, the operation process and the keys to winning are stored in the Agent's memory. The Agent's memory derives from the experience gained by playing the game, and the experience of others can also be stored in the Agent's memory. By continuously learning and summarizing through recollection, the Agent gradually becomes a skilled player of the game.
The cultivation of the reservoir scheduling artificial intelligence expert can follow the same process, as shown in FIG. 1. The measured reservoir inflow runoff information and the corresponding scheduling period are taken as the input state, and autonomous learning is carried out through the "autonomous learning" algorithm module: the Agent first perceives the current state of the reservoir, including the water storage level, incoming water and other information, and decides the reservoir operation for the coming period through its brain. On this basis, a reservoir power generation dispatching simulation is used to estimate and return the reward of the operation, i.e. the power generation benefit. The state, operation and benefit of the reservoir are then stored in the memory bank as a piece of knowledge. When the memory bank holds enough knowledge and the learning condition is met, the Agent starts to learn from the knowledge in memory, then continues actual scheduling operation to obtain new knowledge and update the memory bank. By cycling this learning and actual-scheduling process, the Agent gradually matures into an artificial intelligence expert for reservoir scheduling. The established reservoir dispatching artificial intelligence expert is then used for reservoir power generation dispatching decisions to determine the optimal control process of reservoir power generation.
The following presents the research results for the Huanren reservoir:
an artificial intelligence agent is established for the power generation scheduling of the reservoir. Because reservoir power-generation scheduling is a Markov process — different reservoir states call for different scheduling strategies and lead to very different end-of-period states — the reservoir water storage level and incoming water are used as input states. In addition, because the within-year inflow process varies greatly, the remaining-period benefits differ across scheduling periods, so the scheduling period must also be one of the input states. In this embodiment, an artificial intelligence expert for ten-day scheduling is established based on ten-day-scale runoff information, so the state space of the scheduling period is the ten-day index within a year, i.e., 1, 2, …, 36. The power-generation scheduling decision can be expressed either as the generation discharge flow or as the generating output; this embodiment takes the generating output as the decision.
In the deep learning of the DRL based on the DQN network, both the state and the decision exist in discrete form. To improve operability in practical application, the inflow runoff of the Huanren reservoir at each magnitude is discretized. The discretization jointly considers the flow requirements of the reservoir units and the downstream reservoir, and the inflow runoff is discretized according to the criteria of Table 2, where 150 m³/s is the maximum flow capacity at the maximum output of a single unit, 300 m³/s is the flow required by the downstream Huilongshan reservoir, and 500 m³/s is the flow at full-station generation.
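A hypothetical discretization using the three threshold flows named above can be sketched as follows. The actual grade boundaries of Table 2 are not recoverable here, so the boundary semantics (inclusive upper bounds) and the grade numbering are assumptions for illustration only.

```python
# Illustrative inflow discretization by the three threshold flows in the text.
def inflow_grade(q_m3s: float) -> int:
    """Map an inflow (m^3/s) to a discrete grade index."""
    thresholds = [150.0, 300.0, 500.0]  # single-unit max, downstream demand, full-station flow
    for grade, t in enumerate(thresholds):
        if q_m3s <= t:
            return grade
    return len(thresholds)  # above the highest threshold
```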
TABLE 2 Grading standard of 1-, 3-, 7- and 10-day average inflow runoff of the Huanren reservoir
According to the actual power generation requirements and unit configuration of the Huanren reservoir, the reservoir output is discretized into 6 levels, i.e., 6 decisions, as shown in Table 3. From the input state information, the DQN network selects the optimal output among these 6 decisions to schedule the reservoir.
TABLE 3 Cluster centres of the power generation output of the Huanren reservoir (thousand kilowatts)
The DRL model simulates a 400-year runoff process from the measured inflow runoff process, performs "autonomous simulation" on the simulated runoff, and uses the knowledge obtained for learning. The trained DRL then decides the power-generation scheduling process of the Huanren reservoir. To evaluate the decision-making ability of the DRL in reservoir power-generation scheduling, dynamic programming is used to determine the optimal control process of reservoir power generation as a benchmark.
To analyze the influence of the decision value function on reservoir power-generation scheduling when the unsupervised deep learning model is applied, the decision benefit is evaluated in a maximum-value form and in an average-value form, giving the DRL1 and DRL2 models respectively.
In the decision value estimation of reservoir power-generation scheduling, the U value of each decision in the decision set is calculated from the state S_{t+1}, and the maximum U value is taken as the remaining-period power generation benefit (Rr), as shown in fig. 2(a):

u(S_t, A_t) = R_t + λ · max_A U(S_{t+1}, A)
Alternatively, the U value of each decision in the decision set is calculated from the state S_{t+1} and the average of these U values is taken as the remaining-period power generation benefit (Rr), as shown in fig. 2(b):

u(S_t, A_t) = R_t + λ · (1/n) · Σ_{i=1}^{n} U(S_{t+1}, A_i)
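The two remaining-benefit estimators compared as DRL1 and DRL2 differ only in how the next-state values are aggregated. A minimal sketch (the function name and the numeric inputs are illustrative):

```python
# DRL1 backs up the maximum next-state value; DRL2 backs up the mean.
def td_target(reward, next_q_values, lam=0.9, mode="mean"):
    """One-step target: immediate reward plus discounted remaining benefit."""
    if mode == "max":                      # DRL1-style
        remaining = max(next_q_values)
    else:                                  # DRL2-style
        remaining = sum(next_q_values) / len(next_q_values)
    return reward + lam * remaining

q_next = [1.0, 2.0, 6.0]
drl1 = td_target(0.5, q_next, lam=0.9, mode="max")   # 0.5 + 0.9*6.0 = 5.9
drl2 = td_target(0.5, q_next, lam=0.9, mode="mean")  # 0.5 + 0.9*3.0 = 3.2
```

The max form systematically over-values high-output decisions, which matches the observed DRL1 behavior of always choosing the maximum power generation.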
After the DRL1 and DRL2 models each learn 2000 times on the 400-year simulated runoff, they are used to decide the power-generation scheduling process of the Huanren reservoir over the 1116 ten-day periods of 1980-2010. The resulting water level control processes, compared with the optimal control process determined by dynamic programming, are shown in fig. 3.
Fig. 3(a) compares the Huanren reservoir water level control process based on the DRL1 model with the optimal water level control process. The comparison shows that under DRL1 the water level stays at the dead water level in most scheduling periods, rising to the normal storage level only in the few periods when the flood-season inflow is particularly large. The main reason is that when the DRL1 model evaluates the decision value function, the remaining-period power generation benefit is represented by the maximum U value, so during learning the Agent always chooses the decision with the maximum power generation.
Fig. 3(b) compares the water level control process of the Huanren reservoir based on the DRL2 model with the optimal water level control process. The comparison shows that DRL2 has strong decision-making ability: its water level control process is highly consistent with the optimal one.
Therefore, the invention evaluates the decision benefit in average-value form rather than maximum-value form.
DRL algorithm theory shows that the learning efficiency of the DRL model is controlled by the model parameters; the values used in this embodiment are shown in Table 4. The model parameters fall into two categories. The first category, knowledge control parameters, governs the memory capacity, the "autonomous learning" start conditions, and the like; these are low-sensitivity parameters in DRL learning. The second category, learning efficiency parameters, governs the stability of "autonomous learning", the search of the decision space, and the convergence rate; these are high-sensitivity parameters. This embodiment therefore analyzes the influence of the learning efficiency parameters on the "autonomous learning" of reservoir power-generation scheduling.
TABLE 4 Control parameters of the DRL deep learning system
Knowledge control parameter            Value    Learning efficiency parameter    Value
Memory total knowledge (M)             2000     Learning rate (α)                0.03
One-time learning knowledge amount (W) 200      Discount factor (λ)              0.9
Learning interval threshold (L)        50       Greedy probability (ε)           0.9
Learning knowledge amount threshold (D) 200     Weight update interval (K)       30
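The Table 4 parameters can be collected into a single configuration object; the field names below are illustrative, while the values come from the table.

```python
# Table 4 parameters as one configuration object (field names are assumptions).
from dataclasses import dataclass

@dataclass
class DRLConfig:
    memory_size: int = 2000      # total knowledge M
    batch_size: int = 200        # one-time learning knowledge amount W
    learn_interval: int = 50     # learning interval threshold L
    learn_threshold: int = 200   # learning knowledge amount threshold D
    learning_rate: float = 0.03  # α
    discount: float = 0.9        # λ
    epsilon: float = 0.9         # greedy probability ε
    target_update: int = 30      # weight update interval K
```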
FIG. 4 shows how Reward changes under different parameter values.
FIG. 4(a) shows the Reward under different values of the greedy probability (ε). The greedy probability determines how the scheduling decision switches between "exploitation" and "exploration" during "autonomous simulation". When ε is 0.95, the exploration probability is only 0.05, which hinders the discovery of new knowledge samples, so learning efficiency is low. When ε is 0.8, the exploration probability reaches 0.2, and during "autonomous learning" a large amount of "exploration sample" knowledge is generated and stored in the memory bank; this knowledge reflects the diversity of the samples. However, only one of the exploration samples is optimal, and if a large amount of exploration-sample knowledge is retained in the memory bank for a long time, the "inferior" exploration samples degrade the stability and accuracy of learning.
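The ε-greedy rule described above can be sketched in a few lines; the function name and the injectable random source are assumptions for testability.

```python
# ε-greedy: with probability ε exploit the best-known decision, otherwise explore.
import random

def epsilon_greedy(q_values, epsilon=0.9, rng=random):
    if rng.random() < epsilon:
        # exploit: decision with the highest estimated value
        return max(range(len(q_values)), key=lambda a: q_values[a])
    # explore: uniformly random decision
    return rng.randrange(len(q_values))
```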
Fig. 4(b) shows the change of Reward with the number of learnings under different values of the discount factor (λ). The discount factor expresses how strongly the remaining-period power generation benefit influences the decision value: in the decision value estimation of reservoir power-generation scheduling, the larger λ, the stronger this influence. When λ is 0.95, the decision value consists mainly of the remaining-period benefit and the Reward of the current power-generation decision carries little weight; the decision value then cannot fully reflect the scheduling effect of the decision, and learning efficiency drops. When λ is too low, the influence of the current decision's Reward increases, the scheduling decision over-weights the immediate benefit, and the decision value of "exploration sample" knowledge fluctuates violently; learning from such samples makes the network model unstable.
Fig. 4(c) shows the change of Reward with the number of learnings under different values of the learning rate (α). Over the 2000 learnings, α = 0.03 yields the highest Reward. When α = 0.001, equation (3) shows that the average decision value is barely affected by Reward, and the DRL learning effect is the worst: the average value function over-weights its historical average during updating, so its value barely changes, which is unfavorable for learning reservoir scheduling. When α = 0.3, the new decision value dominates the update of the average value function; since the decision value is driven by the period scheduling decision (Action), random actions chosen in "exploration" mode make the decision value of exploration-sample knowledge fluctuate strongly, harming learning stability and reducing learning efficiency.
Fig. 4(d) shows the effect of the update interval (K) of the TN network weights in the DQN on the Reward. An update interval of K = 10 means the weight parameters of the AN network are assigned to the TN network after every 10 learnings. When K is small, the assignment is executed frequently and the average value function becomes unstable. When K is large, the assignment waits a long time and the weight parameters of the AN and TN networks diverge widely, so updating the average value function with equation (2) distorts the value.
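The periodic AN-to-TN weight copy can be sketched as follows, using plain lists to stand in for network weights; the function name and the copy-on-multiple-of-K convention are assumptions.

```python
# Copy action-network (AN) weights to the target network (TN) every K steps.
def maybe_sync_target(step, an_weights, tn_weights, K=30):
    if step % K == 0:
        tn_weights[:] = an_weights  # in-place copy of all weight values
    return tn_weights
```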
Comparative example:
to compare the learning effect of the DRL, the comparative example uses dynamic programming and a decision tree (C5.0) to establish a power-generation scheduling model of the Huanren reservoir. The ten-day power generation results of 1980-2010 solved by dynamic programming are taken as the optimal scheduling result, and the power generation amount and the assurance rate are used as the comparison benchmarks.
In reservoir power-generation scheduling research, decision trees are often used to mine scheduling knowledge. The comparative example applies the decision tree (C5.0) algorithm to mine the optimal power-generation scheduling process and establish a rule set suited to the Huanren reservoir. First, the optimal power generation process of the reservoir is obtained by dynamic programming under the simulated 400-year runoff process. Then, the state and decision variables of the reservoir are discretized using the standards of Tables 2 and 3 of this embodiment; finally, the decision tree (C5.0) algorithm mines the power-generation scheduling rule set of the Huanren reservoir. According to these rules, the 1980-2010 power-generation scheduling process of the reservoir is simulated.
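The rule-mining step can be illustrated with a deliberately simplified stand-in. C5.0 itself is a proprietary algorithm; on fully discrete states, the degenerate limit of such rule mining is a lookup table mapping each state to the most frequent DP-optimal decision seen for it, which is what the sketch below builds. The state tuples and decisions are synthetic placeholders, not the DP-optimal schedule.

```python
# Minimal "rule mining" stand-in for C5.0 on discrete (state -> decision) pairs:
# map each state to its most frequently observed optimal decision.
from collections import Counter, defaultdict

def mine_rules(states, decisions):
    table = defaultdict(Counter)
    for s, d in zip(states, decisions):
        table[s][d] += 1
    # one rule per state: the majority decision
    return {s: c.most_common(1)[0][0] for s, c in table.items()}

# state = (ten-day index, water-level grade, inflow grade); values are synthetic
rules = mine_rules([(18, 2, 3), (18, 2, 3), (1, 0, 0)], [5, 5, 0])
```

A real C5.0 tree would additionally generalize across unseen states by splitting on the most informative attributes, which a lookup table cannot do.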
Fig. 5 shows the deviation of the simulated reservoir water level processes, based on the DRL and on the decision-tree rules, from the DP-optimal process. As seen from fig. 5, compared with the DP-optimal water level process, the water level control process based on the decision-tree scheduling rules fluctuates over a wider range than the DRL water level process; the DRL-based reservoir power-generation scheduling decisions are closer to the optimal decisions.
Based on the DRL model trained 2000 times, the ten-day power-generation scheduling process of the Huanren reservoir is simulated. The power generation and assurance rates of the dynamic programming, decision tree, and DRL models are listed in Table 5. DP, taken as the optimal solution, has the highest power generation and assurance rate. Power-generation scheduling based on the decision-tree rules performs poorly, while the DRL results differ only slightly from the DP results, showing that the DRL has good decision-making ability.
TABLE 5 Power generation and assurance rates of the dynamic programming, decision tree, and DRL models
On the basis of the DQN (Deep Q-Network) and the DRL (deep reinforcement learning) model, with reservoir power generation as the reward feedback, an artificial-intelligence-based unsupervised deep learning model for reservoir scheduling is established. Taking the Huanren reservoir as an example, the DRL model is trained on a simulated 400-year runoff process, and its decision-making ability is checked and evaluated against the measured 1980-2010 runoff process. The conclusions are as follows:
(1) When the DRL model is applied to reservoir power-generation scheduling, the remaining-period power generation benefit must be evaluated in average-value form rather than maximum-value form when evaluating the value function. If runoff uncertainty or runoff forecast information is to be considered, Markov state transition or Bayesian theory can be adopted to evaluate the remaining-period power generation benefit of the reservoir.
(2) The learning efficiency of the DRL varies greatly with the model parameters. By comparing the learning rate, discount factor, greedy coefficient, and weight update interval, the invention preliminarily determines the value ranges of the different parameters in reservoir power-generation scheduling and their degree and mode of influence on learning efficiency.
(3) Benchmarked against the optimal power-generation scheduling result solved by dynamic programming, the DRL-based scheduling result is clearly superior to the traditional decision-tree-based result. This fully demonstrates the strong learning and decision-making abilities of the DRL and its strong adaptability in reservoir scheduling decisions.
Finally, it should be noted that the above only illustrates the technical solution of the present invention and does not limit it. Although the present invention is described in detail with reference to the preferred arrangement, those skilled in the art should understand that the technical solution of the present invention (such as the application of the formulas, the sequence of steps, etc.) can be modified or equivalently replaced without departing from the spirit and scope of the technical solution of the present invention.

Claims (3)

1. A single reservoir intelligent flood control scheduling method based on a DQN deep reinforcement learning algorithm is characterized by comprising the following steps:
step 1: constructing an artificial intelligence-based reservoir scheduling unsupervised deep learning model:
respectively establishing a brain, a memory library and an 'autonomous learning' algorithm module of the Agent by taking a DRL technical architecture as a reference;
the brain of the Agent is constructed with the Deep Q-Network (DQN) algorithm and comprises two neural networks, an Action Network (AN) and a Target Network (TN);
the memory bank stores the scheduling knowledge generated in the scheduling process, the scheduling decision of each period forming one piece of knowledge;
the "autonomous learning" module continuously improves the value function based on the Bellman equation, so that the Agent's decision-making ability keeps improving; as the number of learnings increases, the value function of the current learning is embodied by the average of the value functions calculated over the adjacent k learnings, with the formula:
U_k = (1/k) · Σ_{i=1}^{k} u_i = U_{k-1} + (u_k − U_{k-1}) / k
where u_k is the decision value function under the given scheduling state at the k-th learning; u_i is the decision value function under the given scheduling state at the i-th learning; U_k is the average value function obtained after the k-th learning; U_{k-1} is the average value function obtained after the (k-1)-th learning;
after the autonomous learning, the value function is updated by the following formula:
U_k(S_t, A_t) = (1 − α) · U_{k-1}(S_t, A_t) + α · u(S_t, A_t)
where S_t is the condition attribute at the beginning of period t; A_t is the decision attribute at the beginning of period t; α is the learning rate; u is the decision value function under the given scheduling state;
in the decision value estimation of reservoir power-generation scheduling, the U value of each decision in the decision set is calculated from the state S_{t+1}; the U value in the formula evaluates the decision benefit in average-value form, with the formula:
u(S_t, A_t) = R_t + λ · (1/n) · Σ_{i=1}^{n} U(S_{t+1}, A_{t+1,i})
where R_t is the decision benefit value obtained in period t; S_{t+1} is the condition attribute at the end of period t; A_{t+1,i} is the i-th decision attribute at the end of period t; n is the number of decisions in the decision set; λ is the discount factor;
after each learning, the error feedback for updating the neural network weight parameters by gradient descent is derived from the change of the value function, with the formula:
E_k = U_k − U_{k-1}
where E_k is the difference between the value functions of the (k-1)-th and k-th learnings;
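The three claim-1 updates can be sketched directly in code: the running average over k learnings, the α-weighted value update, and the error fed to gradient descent. The function names are illustrative, and the incremental-average reading of the first formula is an assumption consistent with the stated definitions of u_k, U_k, and U_{k-1}.

```python
# The three value-function updates of claim 1, as plain scalar functions.
def incremental_average(U_prev, u_k, k):
    """U_k = U_{k-1} + (u_k - U_{k-1}) / k  (running mean over k learnings)."""
    return U_prev + (u_k - U_prev) / k

def alpha_update(U_prev, u, alpha=0.03):
    """U_k = (1 - alpha) * U_{k-1} + alpha * u  (learning-rate-weighted update)."""
    return (1 - alpha) * U_prev + alpha * u

def weight_error(U_k, U_prev):
    """E_k = U_k - U_{k-1}, the error signal for gradient-descent weight updates."""
    return U_k - U_prev
```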
step 2: and (3) establishing reward feedback of the DRL on the basis of reservoir power generation scheduling:
the benefit of the decision is evaluated from the state of the current period and the decision taken, and fed back in the form of a reward; the power generation amount and whether the guaranteed output is reached are taken as the indexes of the benefit evaluation;
and step 3: establishing a scheduling artificial intelligence expert aiming at a certain reservoir based on the actual measurement and warehousing runoff process of the reservoir:
measured reservoir inflow runoff information and the corresponding scheduling period are taken as the input state, autonomous learning is performed by the "autonomous learning" algorithm module, and the Agent's brain decides the reservoir operation of the coming period, i.e., the generating output; on this basis, a reservoir power-generation scheduling simulation estimates and returns the reward of the operation, i.e., the power generation benefit; the reservoir state, operation, and benefit are then stored in the memory bank as knowledge; when the memory bank holds enough knowledge and the learning conditions are met, the Agent begins to learn from the knowledge in memory, then continues actual scheduling operations to obtain new knowledge and update the memory bank; by cycling this learning-scheduling process, the Agent gradually matures into an artificial intelligence expert for reservoir scheduling;
and using the established reservoir dispatching artificial intelligence expert for the reservoir power generation dispatching decision to determine the optimal control process of the reservoir power generation.
2. The intelligent single-reservoir flood control scheduling method based on the DQN deep reinforcement learning algorithm of claim 1, wherein in the memory bank the scheduling decision of period t combines the condition attribute at the beginning of the period (S_t), the decision attribute (Action), the reward (Reward), and the condition attribute at the end of period t (S_{t+1}) into one piece of knowledge stored in the memory bank, with the formula:
<S_t = (T_t, L_t, Q_t), Reward = R_t, Action = A_t, S_{t+1} = (T_{t+1}, L_{t+1}, Q_{t+1})>
where S_t and S_{t+1} are the condition attributes at the beginning and end of period t respectively; R_t is the penalized power generation benefit of period t; A_t is the decision attribute of period t; T_t and T_{t+1} are the scheduling period numbers within the year; L_t and L_{t+1} are the reservoir control water levels at the beginning of periods t and t+1 respectively; Q_t and Q_{t+1} are the total water volumes of the reservoir in periods t and t+1 respectively.
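One piece of "knowledge" in the memory bank, following the tuple layout of the claim, can be represented as follows; the field names and example values are illustrative.

```python
# A knowledge record <S_t, Reward, Action, S_{t+1}> as typed named tuples.
from typing import NamedTuple

class State(NamedTuple):
    period: int      # T_t: scheduling period number within the year
    level: float     # L_t: reservoir control water level
    inflow: float    # Q_t: total water volume of the period

class Knowledge(NamedTuple):
    state: State
    reward: float    # R_t: penalized power generation benefit
    action: int      # A_t: output decision index
    next_state: State

k = Knowledge(State(18, 295.0, 320.0), 1.2e6, 4, State(19, 296.5, 280.0))
```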
3. The intelligent single-reservoir flood control scheduling method based on the DQN deep reinforcement learning algorithm of claim 1, wherein the formula of the reward feedback in step 2 is:
R(K_t, Q_t, N_t) = [b(K_t, Q_t, N_t) − a · {max(e − b(K_t, Q_t, N_t), 0)}^b] · Δt
V_{t+1} = V_t + Q_t − Q_{p,t} − Q_{s,t}
in the formula: r is the electric energy generating capacity benefit after the penalty of t time period, namely Reward; k t Is the initial water storage level of the t time period; q t Is the total water volume of the reservoir in the period t; n is a radical of t Generating output of the reservoir at the time t; q p,t Representing the total generated flow in the t period; q s,t The water abandon amount in the period t; b (-) is the hydropower station generated energy in the period t; a and b are penalty coefficients; e, ensuring the output of the system; v t The reservoir capacity at time t; Δ t represents a scheduling period length; v t+1 The capacity of the water storage at the end of the t period;
the constraints are as follows:
K_min ≤ K_t ≤ K_max
0 ≤ N_t ≤ N_M
0 ≤ Q_t ≤ Q_M
where K_min and K_max are the minimum and maximum reservoir water levels in period t respectively; N_t is the decision output of period t; N_M is the installed capacity of the reservoir, i.e., its maximum generating output; Q_M is the maximum flow capacity of the turbines.
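The claim-3 reward and water balance can be sketched as follows. Since the generation function b(·) is not specified, its value is passed in precomputed; the penalty exponent is named `b_exp` to avoid clashing with the function name, and all parameter values are illustrative.

```python
# Penalized generation reward and water-balance bookkeeping of claim 3.
def reward(b_gen, e, a=1.0, b_exp=2.0, dt=1.0):
    """R = [b - a * max(e - b, 0)^b_exp] * dt, penalizing shortfall below
    the guaranteed output e."""
    shortfall = max(e - b_gen, 0.0)
    return (b_gen - a * shortfall ** b_exp) * dt

def next_storage(V_t, Q_in, Q_gen, Q_spill):
    """V_{t+1} = V_t + Q_t - Q_{p,t} - Q_{s,t}"""
    return V_t + Q_in - Q_gen - Q_spill
```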
CN202210741864.3A 2022-06-28 2022-06-28 Intelligent single reservoir flood control scheduling method based on DQN deep reinforcement learning algorithm Active CN115049292B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210741864.3A CN115049292B (en) 2022-06-28 2022-06-28 Intelligent single reservoir flood control scheduling method based on DQN deep reinforcement learning algorithm


Publications (2)

Publication Number Publication Date
CN115049292A true CN115049292A (en) 2022-09-13
CN115049292B CN115049292B (en) 2023-03-24

Family

ID=83163984

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210741864.3A Active CN115049292B (en) 2022-06-28 2022-06-28 Intelligent single reservoir flood control scheduling method based on DQN deep reinforcement learning algorithm

Country Status (1)

Country Link
CN (1) CN115049292B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115952958A (en) * 2023-03-14 2023-04-11 珠江水利委员会珠江水利科学研究院 Reservoir group joint optimization scheduling method based on MADDPG reinforcement learning
CN117132089A (en) * 2023-10-27 2023-11-28 邯郸欣和电力建设有限公司 Power utilization strategy optimization scheduling method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109636226A (en) * 2018-12-21 2019-04-16 华中科技大学 A kind of reservoir multi-objective Hierarchical Flood Control Dispatch method
CN110930016A (en) * 2019-11-19 2020-03-27 三峡大学 Cascade reservoir random optimization scheduling method based on deep Q learning
US20200119556A1 (en) * 2018-10-11 2020-04-16 Di Shi Autonomous Voltage Control for Power System Using Deep Reinforcement Learning Considering N-1 Contingency
CN112186743A (en) * 2020-09-16 2021-01-05 北京交通大学 Dynamic power system economic dispatching method based on deep reinforcement learning
WO2022083029A1 (en) * 2020-10-19 2022-04-28 深圳大学 Decision-making method based on deep reinforcement learning


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIAN Jijian et al.: "Research progress of Agent-based water resources management models", Advances in Water Science *
DONG Xiangluan: "Research on optimized dispatching strategy of integrated energy systems based on the DQN algorithm", China Master's Theses Full-text Database (Electronic Journal), Engineering Science and Technology II *




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant