WO2022206265A1 - Method for calibrating hydrological forecasting model parameters based on deep reinforcement learning - Google Patents

Method for calibrating hydrological forecasting model parameters based on deep reinforcement learning

Info

Publication number
WO2022206265A1
Authority
WO
WIPO (PCT)
Prior art keywords
reinforcement learning
calibration
parameter
hydrological
time
Prior art date
Application number
PCT/CN2022/078763
Other languages
English (en)
French (fr)
Inventor
胡鹤轩
胡强
张晔
胡震云
Original Assignee
河海大学 (Hohai University)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 河海大学 (Hohai University)
Priority to US 17/906,995, published as US20230281459A1
Publication of WO2022206265A1

Classifications

    • G06N3/092: Reinforcement learning
    • G06F30/27: Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G06N3/006: Artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G06F2111/06: Multi-objective optimisation, e.g. Pareto optimisation using simulated annealing [SA], ant colony algorithms or genetic algorithms [GA]
    • G06F2113/08: Fluids (details relating to the application field)
    • Y02A10/40: Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping (technologies for adaptation to climate change at coastal zones and river basins)

Definitions

  • The invention belongs to the technical field of hydrological forecasting model parameter calibration, and in particular relates to a method for calibrating hydrological forecasting model parameters based on deep reinforcement learning.
  • Hydrological forecasting models are widely used in rainfall simulation and forecasting, flood forecasting and early warning, hydrological process analysis and other fields, and play an important role in improving the efficiency of hydrological research. Because the structure of a hydrological forecasting model is generally complex, determining the model parameters after the model has been established becomes the core problem to be solved. Parameter calibration means finding a set of optimal parameter values for the hydrological forecasting model so that the simulated and forecast results are as close as possible to the measured data. For a hydrological forecasting model, parameter calibration directly affects forecasting accuracy; therefore, studying how to improve the speed and accuracy of parameter calibration is of great scientific significance and application value.
  • Early parameter calibration methods for hydrological forecasting models mostly relied on the traditional manual trial-and-error method, the gradient descent method, and the like. Although intuitive, these methods demand considerable operator experience, are easily affected by personal subjectivity, and offer relatively low efficiency and accuracy.
  • With the development of computer technology, modern intelligent algorithms such as the genetic algorithm and the particle swarm optimization algorithm have been widely applied to automatic calibration of hydrological forecasting model parameters, making up for the shortcomings of the traditional methods.
  • However, although modern intelligent algorithms can search a wide range of candidate solutions, they suffer from premature convergence and a tendency to become trapped in local optima, which compromises the selection of the global optimum.
  • The purpose of the present invention is to overcome the defects of the prior art and to provide a method for calibrating hydrological forecasting model parameters based on deep reinforcement learning.
  • By setting the stride of the action values of the deep reinforcement learning model, the present invention can freely control the final optimization accuracy of the calibration parameters, and it uses the DQN algorithm to search the entire space of the calibration parameters, thereby ensuring the optimality of the calibrated parameters and avoiding the premature convergence and local-optimum trapping of modern intelligent algorithms.
  • To solve the above technical problems, the present invention adopts the following technical solutions.
  • The method for calibrating hydrological forecasting model parameters based on deep reinforcement learning according to the present invention includes the following steps:
  • Step 1: select a hydrological forecasting model and determine the parameters to be calibrated; the hydrological forecasting model takes rainfall and evaporation time series as input and outputs a time series of forecast flow;
  • Step 2: establish a reinforcement learning model for calibrating the hydrological forecasting model parameters;
  • reinforcement learning refers to the process in which an agent learns by interacting with the environment, and its three key elements are the state space, the action space and the reward function;
  • Step 3: apply the deep reinforcement learning method DQN to optimize the calibration parameters of the hydrological forecasting model.
  • In step 1, the process of selecting the hydrological forecasting model and determining the calibration parameters includes selecting a model according to the basin characteristics, determining the parameters w_i (i = 1, 2, ..., N) to be calibrated, and determining the value range of each parameter.
  • In step 2, a reinforcement learning model for calibrating the hydrological forecasting model parameters is established; the process consists of determining the state space, the action space and the reward function:
  • the value of parameter w_i^t at time t can change in two ways, increasing or decreasing; if the stride of increase or decrease of parameter w_i is Δ_i, the value of the parameter at time t+1 may be w_i^t + Δ_i or w_i^t - Δ_i;
  • 2^N is the number of actions in the reinforcement learning action space; each row of the matrix A is a selectable action, that is, a possible value of the action a_t at time t;
  • C_1 is a constant greater than 0, C_2 is a constant less than 0, and C_3 is a constant greater than 0.
  • In step 3, the deep reinforcement learning method DQN is applied to optimize the calibration parameters of the hydrological forecasting model.
  • When the optimization curve of the DQN algorithm converges, the total reward value exhibits only slight jitter, and the parameter values of the corresponding state are taken as the optimal calibration result.
  • Compared with the prior art, the present invention has the following advantages and beneficial effects:
  • By setting the stride of the action values of the deep reinforcement learning model, the present invention can freely control, for calibration parameters of different natures, the accuracy to which each parameter is optimized, ensuring the accuracy and rationality of calibration parameter optimization and avoiding devoting excessive computing resources to non-critical parameters.
  • The present invention uses the DQN algorithm to search the entire space of the calibration parameters and, through the self-decision and self-correction capability of reinforcement learning, ensures the optimality of the calibrated parameters, avoiding the premature convergence and local-optimum trapping of modern intelligent algorithms.
  • FIG. 1 is a flowchart of a method according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a hydrological prediction model according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of reinforcement learning according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of an action value network and a target action value network according to an embodiment of the present invention.
  • FIG. 5 is a flowchart of a DQN optimization algorithm according to an embodiment of the present invention.
  • The invention discloses a method for calibrating hydrological forecasting model parameters based on deep reinforcement learning, which includes: selecting a suitable hydrological forecasting model according to the basin characteristics and determining the parameters to be calibrated and their value ranges; constructing the three key elements of reinforcement learning, namely the state space, the action space and the reward function; and applying the deep reinforcement learning method DQN to optimize the calibration parameters of the hydrological forecasting model.
  • By setting the stride of the action values of the deep reinforcement learning model, the present invention can freely control the final optimization accuracy of the calibration parameters; the DQN algorithm searches the entire space of the calibration parameters to ensure the optimality of the calibrated parameters and to avoid the premature convergence and local-optimum trapping of modern intelligent algorithms.
  • FIG. 1 is a flowchart of a method according to an embodiment of the present invention. As shown in Figure 1, the method of this embodiment includes the following steps:
  • Step 1: select a hydrological forecasting model and determine the required calibration parameters.
  • Each parameter has a value range w_i ∈ [w_i^min, w_i^max], where w_i^min and w_i^max are respectively the minimum and maximum values of the i-th calibration parameter.
  • Step 2: establish a reinforcement learning model for calibrating the hydrological forecasting model parameters.
  • reinforcement learning is the process of interactive learning between the agent and the environment.
  • the agent can take corresponding actions according to the current state of the environment, so that the current state of the environment changes.
  • the three key elements of reinforcement learning are state space, action space and reward value function.
  • the value of parameter w_i^t at time t can change in two ways, increasing or decreasing; if the stride of increase or decrease of parameter w_i is Δ_i, the value of the parameter at time t+1 may be w_i^t + Δ_i or w_i^t - Δ_i.
  • 2^N is the number of actions in the reinforcement learning action space; each row of matrix A is a selectable action, that is, a possible value of the action a_t at time t.
  • C_1 is a constant greater than 0, C_2 is a constant less than 0, and C_3 is a constant greater than 0.
  • Step 3: apply the deep reinforcement learning method DQN to optimize the calibration parameters of the hydrological forecasting model.
  • Figure 4 is a schematic diagram of the action value network and the target action value network.
  • The network takes the state as its input neurons, the number of inputs being the number of parameters to be calibrated by the hydrological forecasting model; the action values are the outputs, the number of outputs being the number of actions in the action space.
  • The action-value network is a value function used to evaluate the current state-action pair; a neural network is used because the number of states is large.
  • The target action-value network is used to update the Q values slowly.
  • The algorithm updates the parameters of the action-value network according to the loss function; after every C iterations, the parameters of the action-value network are copied to the target action-value network.
  • Because the Q values produced by the target action-value network remain unchanged for a period of time, the possibility of the loss oscillating or diverging during training is reduced, which improves the stability of the algorithm.
  • FIG. 5 shows the flow chart of the DQN optimization algorithm, in which MainNet is the action value network, and targetNet is the target action value network.
  • the action-value network Q is initialized with random weights ⁇ , and the input and output of this network are illustrated in Figure 4;
  • the action-value network performs a gradient descent step on (y_j - Q(s_j, a_j; θ))^2 to update the network parameters θ; a minimal sketch of this update is given below.
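By way of illustration only, the following Python sketch (using PyTorch) shows an action-value network whose inputs are the calibration parameters and whose outputs are the action values, together with the gradient step on (y_j - Q(s_j, a_j; θ))^2 and the copy of θ into the target network. The hidden-layer sizes, optimizer and learning rate are assumptions, not values from the patent.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state (the N calibration parameters) to one Q-value per action."""
    def __init__(self, n_params: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_params, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

n_params, n_actions = 3, 2 ** 3              # e.g. N = 3 calibration parameters
main_net = QNetwork(n_params, n_actions)     # action-value network (MainNet), parameters theta
target_net = QNetwork(n_params, n_actions)   # target action-value network (targetNet), parameters theta'
target_net.load_state_dict(main_net.state_dict())   # theta' = theta

optimizer = torch.optim.Adam(main_net.parameters(), lr=1e-3)

def td_step(states, actions, targets):
    """One gradient descent step on (y_j - Q(s_j, a_j; theta))^2."""
    q = main_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = torch.mean((targets - q) ** 2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```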

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Feedback Control In General (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for calibrating hydrological forecasting model parameters based on deep reinforcement learning, comprising: selecting a suitable hydrological forecasting model according to the basin characteristics and determining the parameters to be calibrated and their value ranges; establishing a reinforcement learning model for calibrating the hydrological forecasting model parameters and determining the three key elements of reinforcement learning, namely the state space, the action space and the reward function; and applying the deep reinforcement learning method DQN to optimize the calibration parameters of the hydrological forecasting model. By setting the stride of the action values of the deep reinforcement learning model, the invention can freely control the final optimization accuracy of the calibration parameters, and it uses the DQN algorithm to search the entire space of the calibration parameters to ensure the optimality of the calibrated parameters, thereby avoiding the premature convergence and local-optimum trapping of modern intelligent algorithms.

Description

Method for calibrating hydrological forecasting model parameters based on deep reinforcement learning
Technical Field
The invention belongs to the technical field of hydrological forecasting model parameter calibration, and in particular relates to a method for calibrating hydrological forecasting model parameters based on deep reinforcement learning.
Background Art
Hydrological forecasting models are widely used in rainfall simulation and forecasting, flood forecasting and early warning, hydrological process analysis and other fields, and play an important role in improving the efficiency of hydrological research. Because the structure of a hydrological forecasting model is generally complex, determining the model parameters after the model has been established becomes the core problem to be solved. Parameter calibration means finding a set of optimal parameter values for the hydrological forecasting model so that the simulated and forecast results are as close as possible to the measured data. For a hydrological forecasting model, parameter calibration directly affects forecasting accuracy; therefore, studying how to improve the speed and accuracy of parameter calibration is of great scientific significance and application value.
Early parameter calibration methods for hydrological forecasting models mostly relied on the traditional manual trial-and-error method, the gradient descent method, and the like. Although intuitive, these methods demand considerable operator experience, are easily affected by personal subjectivity, and offer relatively low calibration efficiency and accuracy. With the development of computer technology, modern intelligent algorithms such as the genetic algorithm and the particle swarm optimization algorithm have been widely applied to automatic calibration of hydrological forecasting model parameters, making up for the shortcomings of the traditional methods. However, although these algorithms can search a wide range of candidate solutions, they suffer from premature convergence and a tendency to become trapped in local optima, which compromises the selection of the global optimum.
Summary of the Invention
The purpose of the present invention is to overcome the defects of the prior art and to provide a method for calibrating hydrological forecasting model parameters based on deep reinforcement learning. By setting the stride of the action values of the deep reinforcement learning model, the present invention can freely control the final optimization accuracy of the calibration parameters; the DQN algorithm is used to search the entire space of the calibration parameters, ensuring the optimality of the calibrated parameters and avoiding the premature convergence and local-optimum trapping of modern intelligent algorithms.
To solve the above technical problems, the present invention adopts the following technical solutions.
The method for calibrating hydrological forecasting model parameters based on deep reinforcement learning according to the present invention includes the following steps:
Step 1: select a hydrological forecasting model and determine the parameters to be calibrated; the hydrological forecasting model takes rainfall and evaporation time series as input and outputs a time series of forecast flow.
Step 2: establish a reinforcement learning model for calibrating the hydrological forecasting model parameters; reinforcement learning refers to the process in which an agent learns by interacting with the environment, and its three key elements are the state space, the action space and the reward function.
Step 3: apply the deep reinforcement learning method DQN to optimize the calibration parameters of the hydrological forecasting model.
进一步的,在所述步骤1中,选定水文预报模型及确定若干率定参数的过程包括:
根据流域特性选取不同的水文预报模型,确定该水文预报模型所需要率定的参数w i,i=1,2,...,N,N为水文预报模型所需率定参数的个数;
每个参数的取值范围:
Figure PCTCN2022078763-appb-000001
其中,
Figure PCTCN2022078763-appb-000002
分别为第i个率定参数的最小值和最大值。
Further, the process of establishing the reinforcement learning model for parameter calibration of the hydrological forecasting model in step 2 includes:
2-1) Determine the state space of reinforcement learning:
define the reinforcement learning state at time t as the one-dimensional vector s_t composed of the calibration parameters of the hydrological forecasting model, s_t = [w_1^t, w_2^t, ..., w_N^t], where w_i^t is the value of the i-th calibration parameter of the hydrological forecasting model at the current time t;
the value of parameter w_i^t at time t can change in two ways, increasing or decreasing; if the stride of increase or decrease of parameter w_i is Δ_i, the value of the parameter at time t+1 may be w_i^t + Δ_i or w_i^t - Δ_i.
2-2) Determine the action space of reinforcement learning:
define the reinforcement learning action space A as all possible combinations of changes of the calibration parameters, i.e. a matrix with 2^N rows whose rows enumerate every combination of ±Δ_1, ±Δ_2, ..., ±Δ_N, where 2^N is the number of actions in the reinforcement learning action space; each row of the matrix A is a selectable action, i.e. a possible value of the action a_t at time t.
2-3) Determine the reward function of reinforcement learning:
let [q_1, q_2, ..., q_M] be the measured flow values of M periods, where q_i is the measured flow value of the i-th period;
from the state s_t at time t and the state s_{t+1} at time t+1, the forecast flow series obtained through the hydrological forecasting model are respectively [q̂_1^t, q̂_2^t, ..., q̂_M^t] and [q̂_1^{t+1}, q̂_2^{t+1}, ..., q̂_M^{t+1}], where q̂_i^t and q̂_i^{t+1} are the forecast flow values of the i-th period at times t and t+1;
define the root-mean-square error at time t as RMS_t and the root-mean-square error at time t+1 as RMS_{t+1}:
RMS_t = sqrt((1/M)·Σ_{i=1..M}(q̂_i^t - q_i)^2), RMS_{t+1} = sqrt((1/M)·Σ_{i=1..M}(q̂_i^{t+1} - q_i)^2);
define the reward value r_t(s_t, a_t, s_{t+1}) obtained by executing action a_t in state s_t at time t and reaching state s_{t+1} at time t+1 as a piecewise function, in terms of the constants C_1, C_2 and C_3, of the comparison between RMS_{t+1} and RMS_t, where C_1 is a constant greater than 0, C_2 is a constant less than 0, and C_3 is a constant greater than 0.
Further, the process of applying the deep reinforcement learning method DQN in step 3 to optimize the calibration parameters of the hydrological forecasting model includes:
according to the determined key elements of reinforcement learning, executing the following reinforcement learning DQN algorithm:
Input: initialize the experience replay pool D with capacity N;
initialize the action-value network Q with random weights θ;
initialize the target action-value network Q̂ with weights θ′ = θ.
Process:
For episode = 1, ..., M do
Initialize the state s_1;
For t = 1, ..., T do
With probability ε select a random action a_t, and with probability 1 - ε select a_t = argmax_a Q(s_t, a; θ);
Execute action a_t, obtain the reward r_t, and observe the next state s_{t+1};
Store (s_t, a_t, r_t, s_{t+1}) in the experience pool D;
When the experience pool D has accumulated a sufficient number of samples, randomly draw a mini-batch of tuples (s_t, a_t, r_t, s_{t+1}) from it;
Set the target value y_j from the target action-value network (in standard DQN, y_j = r_j + γ·max_{a′} Q̂(s_{j+1}, a′; θ′), with γ the discount factor);
Perform a gradient descent step on (y_j - Q(s_j, a_j; θ))^2 to update the action-value network parameters θ;
Every C steps reset the target action-value network Q̂ by setting θ′ = θ;
End For
End For
Output: the optimal state, i.e. the vector of calibrated parameter values.
Further, when the optimization curve of the DQN algorithm converges, the total reward value exhibits only slight jitter, and the parameter values of the corresponding state are taken as the optimal calibration result.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. By setting the stride of the action values of the deep reinforcement learning model, the present invention can freely control, for calibration parameters of different natures, the accuracy to which each parameter is optimized, ensuring the accuracy and rationality of calibration parameter optimization and avoiding devoting excessive computing resources to non-critical parameters.
2. The present invention uses the DQN algorithm to search the entire space of the calibration parameters and, through the self-decision and self-correction capability of reinforcement learning, ensures the optimality of the calibrated parameters, avoiding the premature convergence and local-optimum trapping of modern intelligent algorithms.
Brief Description of the Drawings
FIG. 1 is a flowchart of the method according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of the hydrological forecasting model according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of reinforcement learning according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of the action-value network and the target action-value network according to an embodiment of the present invention.
FIG. 5 is a flowchart of the DQN optimization algorithm according to an embodiment of the present invention.
Detailed Description of Embodiments
The invention discloses a method for calibrating hydrological forecasting model parameters based on deep reinforcement learning, which includes: selecting a suitable hydrological forecasting model according to the basin characteristics and determining the parameters to be calibrated and their value ranges; constructing the three key elements of reinforcement learning, namely the state space, the action space and the reward function; and applying the deep reinforcement learning method DQN to optimize the calibration parameters of the hydrological forecasting model. By setting the stride of the action values of the deep reinforcement learning model, the invention can freely control the final optimization accuracy of the calibration parameters; the DQN algorithm searches the entire space of the calibration parameters, ensuring the optimality of the calibrated parameters and avoiding the premature convergence and local-optimum trapping of modern intelligent algorithms.
The present invention is described in further detail below with reference to the accompanying drawings.
FIG. 1 is a flowchart of the method according to an embodiment of the present invention. As shown in FIG. 1, the method of this embodiment includes the following steps:
Step 1: select a hydrological forecasting model and determine the required calibration parameters.
A hydrological forecasting model is selected according to the basin characteristics, as shown in FIG. 2; the model takes rainfall and evaporation time series as input and outputs a time series of forecast flow. The parameters to be calibrated, w_i, i = 1, 2, ..., N, are determined, where N is the number of parameters that the hydrological forecasting model requires to be calibrated.
Each parameter has a value range w_i ∈ [w_i^min, w_i^max], where w_i^min and w_i^max are respectively the minimum and maximum values of the i-th calibration parameter.
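As a concrete illustration of this step, the sketch below sets up calibration parameters with value ranges and strides and draws a random initial state inside the ranges. The parameter names, bounds and stride values are hypothetical placeholders, not values from the patent.

```python
import numpy as np

# Hypothetical calibration parameters w_i with value ranges [w_i_min, w_i_max];
# the names and numbers below are placeholders, not values from the patent.
param_bounds = {
    "K":  (0.1, 1.5),     # hypothetical evaporation coefficient
    "WM": (80.0, 200.0),  # hypothetical tension water storage capacity
    "B":  (0.1, 0.6),     # hypothetical storage-capacity curve exponent
}
strides = np.array([0.01, 1.0, 0.01])   # Delta_i: the stride of each parameter

lo = np.array([b[0] for b in param_bounds.values()])
hi = np.array([b[1] for b in param_bounds.values()])

def random_initial_state(rng: np.random.Generator) -> np.ndarray:
    """Draw an initial state s_1 uniformly inside the parameter ranges."""
    return rng.uniform(lo, hi)

s1 = random_initial_state(np.random.default_rng(0))
```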
Step 2: establish a reinforcement learning model for calibrating the hydrological forecasting model parameters.
As shown in FIG. 3, reinforcement learning is the process in which an agent learns by interacting with the environment: the agent takes an action according to the current state of the environment, which in turn changes that state. The three key elements of reinforcement learning are the state space, the action space and the reward function.
(1) Determine the state space of reinforcement learning:
Define the reinforcement learning state at time t as the one-dimensional vector s_t composed of the calibration parameters of the hydrological forecasting model, s_t = [w_1^t, w_2^t, ..., w_N^t], where w_i^t is the value of the i-th calibration parameter of the hydrological forecasting model at the current time t.
The value of parameter w_i^t at time t can change in two ways, increasing or decreasing; if the stride of increase or decrease of parameter w_i is Δ_i, the value of the parameter at time t+1 may be w_i^t + Δ_i or w_i^t - Δ_i.
(2) Determine the action space of reinforcement learning:
Define the reinforcement learning action space A as all possible combinations of changes of the calibration parameters, i.e. a matrix with 2^N rows whose rows enumerate every combination of ±Δ_1, ±Δ_2, ..., ±Δ_N, where 2^N is the number of actions in the action space; each row of the matrix A is a selectable action, i.e. a possible value of the action a_t at time t. For example, when N = 2 and Δ_1 = Δ_2 = 0.1, the possible values of a_t are [0.1, 0.1], [-0.1, 0.1], [0.1, -0.1] and [-0.1, -0.1].
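A minimal sketch of how this 2^N-row action matrix can be enumerated is given below; itertools.product generates every ± combination of the strides, matching the N = 2 example above. Clipping the updated state to the parameter bounds is an added safeguard, not something stated in the patent. Note that the action space has 2^N rows, so it grows exponentially with the number of calibrated parameters.

```python
from itertools import product

import numpy as np

def build_action_space(strides):
    """Return a (2^N, N) matrix whose rows are all +/-Delta_i combinations."""
    signs = product((1.0, -1.0), repeat=len(strides))
    return np.array([[s * d for s, d in zip(row, strides)] for row in signs])

A = build_action_space([0.1, 0.1])
# A -> [[ 0.1,  0.1], [ 0.1, -0.1], [-0.1,  0.1], [-0.1, -0.1]]

def apply_action(state, action, lo, hi):
    """s_{t+1} = s_t + a_t, kept inside the parameter value ranges."""
    return np.clip(state + action, lo, hi)
```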
(3) Determine the reward function of reinforcement learning:
Let [q_1, q_2, ..., q_M] be the measured flow values of M periods, where q_i is the measured flow value of the i-th period.
From the state s_t at time t and the state s_{t+1} at time t+1, the forecast flow series obtained through the hydrological forecasting model are respectively [q̂_1^t, q̂_2^t, ..., q̂_M^t] and [q̂_1^{t+1}, q̂_2^{t+1}, ..., q̂_M^{t+1}], where q̂_i^t and q̂_i^{t+1} are the forecast flow values of the i-th period at times t and t+1.
Define the root-mean-square error at time t as RMS_t and the root-mean-square error at time t+1 as RMS_{t+1}:
RMS_t = sqrt((1/M)·Σ_{i=1..M}(q̂_i^t - q_i)^2), RMS_{t+1} = sqrt((1/M)·Σ_{i=1..M}(q̂_i^{t+1} - q_i)^2).
Define the reward value r_t(s_t, a_t, s_{t+1}) obtained by executing action a_t in state s_t at time t and reaching state s_{t+1} at time t+1 as a piecewise function, in terms of the constants C_1, C_2 and C_3, of the comparison between RMS_{t+1} and RMS_t, where C_1 is a constant greater than 0, C_2 is a constant less than 0, and C_3 is a constant greater than 0.
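The sketch below computes the root-mean-square error and a reward based on it. Since the piecewise reward formula itself is not reproduced here, the exact branching used in this sketch (C_1 when the error decreases, C_2 otherwise, and C_3 as a bonus once the error falls below a tolerance) is an assumed reading, not the patent's definition; the constants and tolerance are likewise illustrative.

```python
import numpy as np

def rms(q_forecast: np.ndarray, q_obs: np.ndarray) -> float:
    """Root-mean-square error over the M periods between forecast and measured flow."""
    return float(np.sqrt(np.mean((q_forecast - q_obs) ** 2)))

def reward(rms_t: float, rms_t1: float,
           c1: float = 1.0, c2: float = -1.0, c3: float = 10.0,
           tol: float = 0.05) -> float:
    # Assumed piecewise form: C3 bonus near convergence, otherwise C1/C2
    # according to whether the error decreased from time t to t+1.
    if rms_t1 < tol:
        return c3
    return c1 if rms_t1 < rms_t else c2
```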
Step 3: apply the deep reinforcement learning method DQN to optimize the calibration parameters of the hydrological forecasting model.
FIG. 4 is a schematic diagram of the action-value network and the target action-value network. The network takes the state as its input neurons, the number of inputs being the number of parameters to be calibrated by the hydrological forecasting model; the action values are the outputs, the number of outputs being the number of actions in the action space. The action-value network is a value function used to evaluate the current state-action pair; a neural network is used because the number of states is large. The target action-value network is used to update the Q values slowly: the algorithm updates the parameters of the action-value network according to the loss function, and after every C iterations the parameters of the action-value network are copied to the target action-value network. Because the Q values produced by the target action-value network remain unchanged for a period of time, the possibility of the loss oscillating or diverging during training is reduced, which improves the stability of the algorithm.
FIG. 5 shows the flowchart of the DQN optimization algorithm, in which MainNet is the action-value network and targetNet is the target action-value network. According to the three key elements of reinforcement learning determined in step 2, the reinforcement learning DQN algorithm is executed as follows:
Input: initialize the experience replay pool D with capacity N;
initialize the action-value network Q with random weights θ (the input and output of this network are illustrated in FIG. 4);
initialize the target action-value network Q̂ with weights θ′ = θ (the input and output of this network are likewise illustrated in FIG. 4).
Process:
For episode = 1, ..., M do
Randomly initialize the state s_1;
For t = 1, ..., T do
With probability ε (a small value) select a random action a_t, and with probability 1 - ε select a_t = argmax_a Q(s_t, a; θ), computed by the action-value network;
Execute action a_t to obtain the next state s_{t+1} and the reward r_t, where r_t is computed from the formulas in step 2;
Store (s_t, a_t, r_t, s_{t+1}) in the experience pool D; at this point check whether the capacity of the experience pool is full, and when it is full the experience pool D can be updated with a first-in-first-out policy;
When the experience pool D has accumulated a sufficient number of samples, randomly draw several tuples (s_t, a_t, r_t, s_{t+1}) from it as the training samples of the neural network;
Obtain the target value y_j from the target action-value network (in standard DQN, y_j = r_j + γ·max_{a′} Q̂(s_{j+1}, a′; θ′), with γ the discount factor);
The action-value network performs a gradient descent step on (y_j - Q(s_j, a_j; θ))^2 to update its parameters θ;
Every C steps reset the target action-value network Q̂, i.e. set the parameters θ′ of the target action-value network to the parameters θ of the action-value network;
End For
End For
Output: the optimal state, i.e. the vector of calibrated parameter values.
When the optimization curve of the DQN algorithm converges, the total reward value exhibits only slight jitter, and the parameter values of the corresponding state are taken as the optimal calibration result.
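To make the flow of FIG. 5 concrete, the following self-contained Python sketch (PyTorch) strings the pieces together: epsilon-greedy action selection, the first-in-first-out experience pool, the mini-batch gradient step on (y_j - Q(s_j, a_j; θ))^2, and the periodic copy of θ into θ′. The run_hydro_model placeholder stands in for the selected hydrological forecasting model, and all numeric settings (γ, ε, network size, reward constants) are illustrative assumptions rather than values from the patent.

```python
import random
from collections import deque
from itertools import product

import numpy as np
import torch
import torch.nn as nn

def run_hydro_model(params: np.ndarray) -> np.ndarray:
    """Placeholder: return the forecast flow series for the given parameter vector."""
    raise NotImplementedError

def train_dqn(q_obs, lo, hi, strides, episodes=50, horizon=200,
              gamma=0.95, eps=0.1, batch=32, sync_every=100):
    n_params = len(strides)
    actions = np.array([[s * d for s, d in zip(row, strides)]
                        for row in product((1.0, -1.0), repeat=n_params)])

    def make_net():   # state in, one Q-value per action out
        return nn.Sequential(nn.Linear(n_params, 64), nn.ReLU(),
                             nn.Linear(64, len(actions)))

    main_net, target_net = make_net(), make_net()
    target_net.load_state_dict(main_net.state_dict())          # theta' = theta
    opt = torch.optim.Adam(main_net.parameters(), lr=1e-3)
    replay = deque(maxlen=10_000)                               # FIFO experience pool D
    rng, step = np.random.default_rng(0), 0

    def rms(p):
        return float(np.sqrt(np.mean((run_hydro_model(p) - q_obs) ** 2)))

    best_state, best_rms = None, float("inf")
    for _ in range(episodes):
        s = rng.uniform(lo, hi)                                 # random initial state s_1
        for _ in range(horizon):
            if random.random() < eps:                           # epsilon-greedy choice of a_t
                a = random.randrange(len(actions))
            else:
                with torch.no_grad():
                    a = int(main_net(torch.tensor(s, dtype=torch.float32)).argmax())
            s_next = np.clip(s + actions[a], lo, hi)
            rms_t, rms_t1 = rms(s), rms(s_next)
            r = 1.0 if rms_t1 < rms_t else -1.0                 # assumed C1/C2 reward
            replay.append((s, a, r, s_next))
            if rms_t1 < best_rms:
                best_state, best_rms = s_next, rms_t1
            s, step = s_next, step + 1

            if len(replay) >= batch:                            # mini-batch TD update
                sb, ab, rb, nb = map(np.array, zip(*random.sample(list(replay), batch)))
                with torch.no_grad():                           # y_j from the target network
                    y = torch.tensor(rb, dtype=torch.float32) + gamma * \
                        target_net(torch.tensor(nb, dtype=torch.float32)).max(dim=1).values
                q = main_net(torch.tensor(sb, dtype=torch.float32)) \
                    .gather(1, torch.tensor(ab).unsqueeze(1)).squeeze(1)
                loss = torch.mean((y - q) ** 2)                 # (y_j - Q(s_j, a_j; theta))^2
                opt.zero_grad()
                loss.backward()
                opt.step()
            if step % sync_every == 0:                          # every C steps: theta' <- theta
                target_net.load_state_dict(main_net.state_dict())
    return best_state, best_rms
```

A caller would pass the measured flow series q_obs, the parameter bounds lo and hi, and the strides Δ_i from step 1, and read the calibrated parameter vector from the returned best_state.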

Claims (5)

  1. A method for calibrating hydrological forecasting model parameters based on deep reinforcement learning, characterized by comprising the following steps:
    Step 1: selecting a hydrological forecasting model and determining the parameters to be calibrated, wherein the hydrological forecasting model takes rainfall and evaporation time series as input and outputs a time series of forecast flow;
    Step 2: establishing a reinforcement learning model for calibrating the hydrological forecasting model parameters, wherein reinforcement learning refers to the process in which an agent learns by interacting with the environment, and its three key elements are the state space, the action space and the reward function;
    Step 3: applying the deep reinforcement learning method DQN to optimize the calibration parameters of the hydrological forecasting model.
  2. The method for calibrating hydrological forecasting model parameters based on deep reinforcement learning according to claim 1, characterized in that, in step 1, the process of selecting the hydrological forecasting model and determining the calibration parameters comprises:
    selecting a hydrological forecasting model according to the basin characteristics and determining the parameters to be calibrated, w_i, i = 1, 2, ..., N, where N is the number of parameters that the hydrological forecasting model requires to be calibrated;
    the value range of each parameter being w_i ∈ [w_i^min, w_i^max], where w_i^min and w_i^max are respectively the minimum and maximum values of the i-th calibration parameter.
  3. The method for calibrating hydrological forecasting model parameters based on deep reinforcement learning according to claim 1, characterized in that the process of establishing the reinforcement learning model for parameter calibration of the hydrological forecasting model in step 2 comprises:
    2-1) determining the state space of reinforcement learning:
    defining the reinforcement learning state at time t as the one-dimensional vector s_t composed of the calibration parameters of the hydrological forecasting model, s_t = [w_1^t, w_2^t, ..., w_N^t], where w_i^t is the value of the i-th calibration parameter of the hydrological forecasting model at the current time t;
    the value of parameter w_i^t at time t being able to change in two ways, increasing or decreasing; if the stride of increase or decrease of parameter w_i is Δ_i, the value of the parameter at time t+1 may be w_i^t + Δ_i or w_i^t - Δ_i;
    2-2) determining the action space of reinforcement learning:
    defining the reinforcement learning action space A as all possible combinations of changes of the calibration parameters, i.e. a matrix with 2^N rows whose rows enumerate every combination of ±Δ_1, ±Δ_2, ..., ±Δ_N, where 2^N is the number of actions in the reinforcement learning action space, and each row of the matrix A is a selectable action, i.e. a possible value of the action a_t at time t;
    2-3) determining the reward function of reinforcement learning:
    letting [q_1, q_2, ..., q_M] be the measured flow values of M periods, where q_i is the measured flow value of the i-th period;
    from the state s_t at time t and the state s_{t+1} at time t+1, obtaining through the hydrological forecasting model the forecast flow series [q̂_1^t, q̂_2^t, ..., q̂_M^t] and [q̂_1^{t+1}, q̂_2^{t+1}, ..., q̂_M^{t+1}], where q̂_i^t and q̂_i^{t+1} are the forecast flow values of the i-th period at times t and t+1;
    defining the root-mean-square error at time t as RMS_t and the root-mean-square error at time t+1 as RMS_{t+1}:
    RMS_t = sqrt((1/M)·Σ_{i=1..M}(q̂_i^t - q_i)^2), RMS_{t+1} = sqrt((1/M)·Σ_{i=1..M}(q̂_i^{t+1} - q_i)^2);
    defining the reward value r_t(s_t, a_t, s_{t+1}) obtained by executing action a_t in state s_t at time t and reaching state s_{t+1} at time t+1 as a piecewise function, in terms of the constants C_1, C_2 and C_3, of the comparison between RMS_{t+1} and RMS_t, where C_1 is a constant greater than 0, C_2 is a constant less than 0, and C_3 is a constant greater than 0.
  4. The method for calibrating hydrological forecasting model parameters based on deep reinforcement learning according to claim 3, characterized in that the process of applying the deep reinforcement learning method DQN in step 3 to optimize the calibration parameters of the hydrological forecasting model comprises:
    according to the determined key elements of reinforcement learning, executing the following reinforcement learning DQN algorithm:
    Input: initialize the experience replay pool D with capacity N;
    initialize the action-value network Q with random weights θ;
    initialize the target action-value network Q̂ with weights θ′ = θ.
    Process:
    For episode = 1, ..., M do
    Initialize the state s_1;
    For t = 1, ..., T do
    With probability ε select a random action a_t, and with probability 1 - ε select a_t = argmax_a Q(s_t, a; θ);
    Execute action a_t, obtain the reward r_t, and observe the next state s_{t+1};
    Store (s_t, a_t, r_t, s_{t+1}) in the experience pool D;
    When the experience pool D has accumulated a sufficient number of samples, randomly draw a mini-batch of tuples (s_t, a_t, r_t, s_{t+1}) from it;
    Set the target value y_j from the target action-value network (in standard DQN, y_j = r_j + γ·max_{a′} Q̂(s_{j+1}, a′; θ′), with γ the discount factor);
    Perform a gradient descent step on (y_j - Q(s_j, a_j; θ))^2 to update the action-value network parameters θ;
    Every C steps reset the target action-value network Q̂ by setting θ′ = θ;
    End For
    End For
    Output: the optimal state, i.e. the vector of calibrated parameter values.
  5. The method for calibrating hydrological forecasting model parameters based on deep reinforcement learning according to claim 4, characterized in that, when the optimization curve of the DQN algorithm converges, the total reward value exhibits only slight jitter, and the parameter values of the corresponding state are taken as the optimal calibration result.
PCT/CN2022/078763 2021-04-02 2022-03-02 Method for calibrating hydrological forecasting model parameters based on deep reinforcement learning WO2022206265A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/906,995 US20230281459A1 (en) 2021-04-02 2022-03-02 Method for calibrating parameters of hydrology forecasting model based on deep reinforcement learning

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110361225.XA CN113255206B (zh) 2021-04-02 2021-04-02 一种基于深度强化学习的水文预报模型参数率定方法
CN202110361225.X 2021-04-02

Publications (1)

Publication Number Publication Date
WO2022206265A1 true WO2022206265A1 (zh) 2022-10-06

Family

ID=77220265

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/078763 WO2022206265A1 (zh) 2021-04-02 2022-03-02 一种基于深度强化学习的水文预报模型参数率定方法

Country Status (3)

Country Link
US (1) US20230281459A1 (zh)
CN (1) CN113255206B (zh)
WO (1) WO2022206265A1 (zh)


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255206B (zh) * 2021-04-02 2023-05-12 河海大学 一种基于深度强化学习的水文预报模型参数率定方法
CN113783782B (zh) * 2021-09-09 2023-05-30 哈尔滨工程大学 一种深度强化学习的机会路由候选集节点排序方法
CN114739229A (zh) * 2022-03-07 2022-07-12 天津大学 一种基于强化学习的换热过程重要参数控制方法
CN116933949B (zh) * 2023-09-18 2023-12-19 北京金水永利科技有限公司 一种融合水动力模型和数理模型的水质预测方法及系统
CN117150975B (zh) * 2023-10-31 2024-01-26 长江三峡集团实业发展(北京)有限公司 一种水动力模型参数优化、水动力过程模拟方法及装置


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106682355B (zh) * 2017-01-12 2018-12-21 中国水利水电科学研究院 一种基于pso-ga混合算法的水文模型参数率定方法
CN110619432B (zh) * 2019-09-17 2022-08-30 长江水利委员会水文局 一种基于深度学习的特征提取水文预报的方法
CN110930016A (zh) * 2019-11-19 2020-03-27 三峡大学 一种基于深度q学习的梯级水库随机优化调度方法
CN111259522B (zh) * 2020-01-09 2023-07-18 河海大学 一种水文模型在地理空间上多流域并行率定的方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103366099A (zh) * 2013-08-02 2013-10-23 贵州东方世纪科技有限责任公司 一种水文模型参数调试方法
US20190340940A1 (en) * 2017-11-03 2019-11-07 Climacell Inc. Improved real-time weather forecasting for transportation systems
CN111768028A (zh) * 2020-06-05 2020-10-13 天津大学 一种基于深度强化学习的gwlf模型参数调节方法
CN111795681A (zh) * 2020-06-30 2020-10-20 杭州鲁尔物联科技有限公司 一种山洪灾害预警方法、装置、服务器及存储介质
CN113255206A (zh) * 2021-04-02 2021-08-13 河海大学 一种基于深度强化学习的水文预报模型参数率定方法

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116500994A (zh) * 2023-05-05 2023-07-28 成都理工大学 一种低碳分布式柔性作业车间的动态多目标调度方法
CN116500994B (zh) * 2023-05-05 2024-05-03 成都理工大学 一种低碳分布式柔性作业车间的动态多目标调度方法

Also Published As

Publication number Publication date
CN113255206A (zh) 2021-08-13
CN113255206B (zh) 2023-05-12
US20230281459A1 (en) 2023-09-07

Similar Documents

Publication Publication Date Title
WO2022206265A1 (zh) 一种基于深度强化学习的水文预报模型参数率定方法
CN109902801B (zh) 一种基于变分推理贝叶斯神经网络的洪水集合预报方法
CN104834215B (zh) 一种变异粒子群优化的bp神经网络pid控制算法
WO2021109644A1 (zh) 一种基于元学习的混合动力车辆工况预测方法
CN105512832A (zh) 基于时变权最小方差的城市需水量组合预测方法
CN112434787A (zh) 基于楼宇总能耗的末端空间能耗预测方法、介质及设备
CN109445484A (zh) 一种基于猫群优化和免疫模糊pid的孵化室温度控制方法
CN107273693A (zh) 一种碳氢燃料机理简化方法
CN112926795A (zh) 一种基于sbo优化cnn的高层住宅建筑群热负荷预测方法及系统
CN103942434A (zh) 基于sspso-grnn的水电站厂坝结构振动响应预测方法
CN108830376A (zh) 针对时间敏感的环境的多价值网络深度强化学习方法
CN110097929A (zh) 一种高炉铁水硅含量在线预测方法
CN108805346A (zh) 一种基于多隐层极限学习机的热连轧轧制力预报方法
CN115310727B (zh) 一种基于迁移学习的建筑冷热电负荷预测方法及系统
CN114648170A (zh) 基于混合深度学习模型的水库水位预测预警方法及系统
CN114219131A (zh) 一种基于lstm的流域径流预测方法
CN114861881A (zh) 一种应用机器学习优化超冷原子蒸发冷却参数的方法
CN112801416A (zh) 基于多维水文信息的lstm流域径流量预测方法
CN113705922A (zh) 一种改进的超短期风电功率预测算法及模型建立方法
WO2022032874A1 (zh) 一种基于对抗神经网络的有资料地区水文参数率定方法
CN111950698A (zh) 基于卷积-门控循环神经网络的水泥回转窑电耗预测方法
CN116702292A (zh) 基于深度强化学习的扁平钢箱梁风嘴气动优化方法
CN116741315A (zh) 一种针对地聚物混凝土强度进行预测方法
CN114995106A (zh) 基于改进小波神经网络的pid自整定方法、装置和设备
WO2022032873A1 (zh) 一种基于对抗神经网络的无资料地区水文参数率定方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22778449

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22778449

Country of ref document: EP

Kind code of ref document: A1