CN110969282A - Runoff stability prediction method based on LSTM composite network - Google Patents
Runoff stability prediction method based on LSTM composite network
- Publication number
- CN110969282A (application number CN201910987183.3A)
- Authority
- CN
- China
- Prior art keywords
- model
- prediction
- data
- runoff
- differential
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 49
- 239000002131 composite material Substances 0.000 title claims abstract description 9
- 238000004364 calculation method Methods 0.000 claims abstract description 3
- 238000012549 training Methods 0.000 claims description 33
- 238000012360 testing method Methods 0.000 claims description 15
- 238000012545 processing Methods 0.000 claims description 11
- 238000010606 normalization Methods 0.000 claims description 5
- 230000000694 effects Effects 0.000 description 19
- 230000006870 function Effects 0.000 description 16
- 230000008569 process Effects 0.000 description 15
- 230000004913 activation Effects 0.000 description 8
- 238000001556 precipitation Methods 0.000 description 8
- 238000013528 artificial neural network Methods 0.000 description 7
- 238000007781 pre-processing Methods 0.000 description 7
- 238000002474 experimental method Methods 0.000 description 6
- 238000012795 verification Methods 0.000 description 6
- 238000010586 diagram Methods 0.000 description 4
- 238000004088 simulation Methods 0.000 description 4
- 238000011156 evaluation Methods 0.000 description 3
- 238000010200 validation analysis Methods 0.000 description 3
- 230000007423 decrease Effects 0.000 description 2
- 230000010355 oscillation Effects 0.000 description 2
- 230000035515 penetration Effects 0.000 description 2
- 230000000306 recurrent effect Effects 0.000 description 2
- 239000002352 surface water Substances 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 210000004027 cell Anatomy 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 230000003247 decreasing effect Effects 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 230000001934 delay Effects 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 230000008034 disappearance Effects 0.000 description 1
- 239000003344 environmental pollutant Substances 0.000 description 1
- 238000001704 evaporation Methods 0.000 description 1
- 230000008020 evaporation Effects 0.000 description 1
- 230000015654 memory Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 210000002569 neuron Anatomy 0.000 description 1
- 238000005192 partition Methods 0.000 description 1
- 231100000719 pollutant Toxicity 0.000 description 1
- 238000011084 recovery Methods 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
- 230000006403 short-term memory Effects 0.000 description 1
- 238000000638 solvent extraction Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- Human Resources & Organizations (AREA)
- Game Theory and Decision Science (AREA)
- Development Economics (AREA)
- Entrepreneurship & Innovation (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a runoff stability prediction method based on an LSTM composite network, which comprises the following steps: selecting hydrological, meteorological and land-related data from a public database; establishing a direct prediction model and a differential prediction model from the selected data; selecting the respective prediction results of the direct prediction model and the differential prediction model; combining the direct prediction result, the differential prediction result and the meteorological data to compute and generate a calibration model; and performing an optimization calculation on the calibration model to obtain and output an optimal calibration model. By using two LSTM-based prediction modes, the method improves the performance stability of the prediction model.
Description
Technical Field
The invention relates to the field of hydrological prediction using deep learning methods, in which hydrological conditions are predicted quantitatively and qualitatively, and in particular to a runoff stability prediction method based on an LSTM composite network.
Background
1. Daily mean runoff
Runoff is the amount of water that passes through a flow cross-section of a river over a period of time. Runoff is the main link of the water cycle, one of the most important hydrological factors on land and the basic factor of the water balance. Runoff is affected by factors such as the weather and land type of the watershed. Averaging the instantaneous runoff over time gives the mean runoff for a given period (one day, one month, one year, etc.), such as the daily mean runoff, monthly mean runoff and annual mean runoff. The total amount of water passing in a given period is called the total runoff, such as the daily, monthly or annual total runoff; runoff totals are expressed in cubic meters, tens of thousands of cubic meters or hundreds of millions of cubic meters.
2.LSTM
Recurrent neural networks are suited to processing time-series data but suffer from the vanishing-gradient problem. To solve this problem, the long short-term memory network (LSTM) was proposed, which has advantages over the general recurrent neural network (RNN) in time-series simulation. It is a recurrent neural network suitable for processing and predicting important events with relatively long intervals and delays in a time series.
Disclosure of Invention
In view of the technical problems in the prior art, the invention provides a runoff stability prediction method based on an LSTM composite network, which uses an LSTM to perform stable prediction of the daily mean runoff of a river basin. Weather and land type parameters are added to the input to account for the difference in surface-water infiltration capacity between different land types. At the same time, two prediction modes based on the LSTM, a direct prediction method and a differential prediction method, are used to improve the performance stability of the prediction model, so that the prediction result is more reliable.
In order to solve the technical problems in the prior art, the invention adopts the following technical scheme to implement:
a runoff stability prediction method based on an LSTM composite network comprises the following steps:
selecting hydrological, meteorological and land related data from a public database;
respectively establishing a direct prediction model and a differential prediction model according to the selected data;
respectively selecting respective prediction results from the direct prediction model and the differential prediction model;
combining the direct prediction result, the differential prediction result and the meteorological data to calculate and generate a calibration model;
and performing optimal calculation on the calibration model to obtain and output an optimal calibration model.
The optimal calibration model is obtained through the following steps:
carrying out data normalization processing on the direct prediction result, the differential prediction result and the meteorological data;
splicing the direct prediction result, the differential prediction result and the meteorological data to generate a specified format;
deleting part of useless columns in the data with the specified format to generate a supervision data format;
carrying out training set and test set division on data in a supervision data format;
respectively establishing a compiling, training and evaluating model according to the data of the training set and the test set;
performing prediction processing on the compiling, training and evaluating model to output NSE and RMSE;
judging whether the calibration model completes the tuning processing of all the parameters or not;
and (4) comparing the NSE and the RMSE in the tuning and optimizing calibration model to select the optimal calibration model for output.
The direct prediction model and the differential prediction model both adopt three-layer network structures, namely an input layer, an LSTM layer and an output layer.
Advantageous effects
The invention provides a method that stably predicts the daily mean runoff of a river basin by using two models and a final balancing of their prediction results. It greatly improves the worst-case performance of the prediction model, reduces the gap between the best and worst performance, improves the average performance, and makes the hydrological prediction model more stable and reliable, which is of great significance for pollutant control, flood control, drought resistance and the rational use of water resources.
Drawings
FIG. 1 is a data flow diagram of the overall architecture of the present invention;
FIG. 2 is a flow chart of the present invention for establishing a calibration model;
FIG. 3 is an overall process of the present invention for building a direct prediction model;
FIG. 4 is an overall process of the present invention for building a differential prediction model;
FIG. 5 is the data preprocessing process of the direct prediction model according to the present invention;
FIG. 6 is the data preprocessing process of the differential prediction model according to the present invention;
FIG. 7 is a schematic diagram of the basic model structure of the present invention;
FIG. 8 is the overall base-model tuning process of the present invention;
FIG. 9 is a schematic diagram of the calibration model structure of the present invention;
FIG. 10 is a flowchart of determining the final calibration model according to the present invention.
Detailed Description
The invention is described in detail below with reference to the attached drawing figures:
the invention discloses a method for stably predicting daily average runoff of a river basin by using LSTM. The addition of weather and land type parameters to the input takes into account the difference in surface water penetration capacity between different land types. Meanwhile, two prediction modes, namely a direct prediction method and a differential prediction method, are used based on the LSTM to improve the performance stability of the prediction model, so that the prediction result is more reliable. In hydrologic prediction, a traditional conceptual hydrologic model is generally used for prediction of a stationary sequence, but hydrologic data is a nonlinear sequence with high uncertainty and complexity, so the prediction effect of the traditional model is not satisfactory. In the modern prediction method, people use an artificial neural network to establish a hydrological prediction model, which is suitable for fitting a highly nonlinear relation in the hydrological prediction field. While the use of LSTM was just started and is still in the exploration phase. In order to improve the accuracy and stability of hydrologic prediction, the invention provides a method for establishing a multi-model combined prediction model by using LSTM. The method adopts a supervised learning mode to predict the daily average runoff of the jing river. The method comprises the steps of inputting relative daily average runoff data of a jing river basin day by day, meteorological data and land type data, and outputting the predicted relative daily average runoff data of the next day of a certain day.
As shown in FIG. 1, to improve the accuracy and reliability of the hydrological prediction model, the invention combines direct prediction and differential prediction and finally balances their results. That is, two basic models are first established, a direct prediction model and a differential prediction model, and a calibration model is then built by combining the prediction results of the two models with meteorological data to determine the final predicted daily mean runoff value.
Let the base model predictors be f_i(x) (i = 1, 2) and the calibration model predictor be F(x), where x is the input to the model. Let y1, y2 and Y denote the direct outputs of the direct prediction model, the differential prediction model and the calibration model respectively, so that y1 = f1(x), y2 = f2(x) and Y = F(x).
Let the training set be S and Sx be an input element of the training set.
The training data set contains a hydrological flow data set Sr_d, a hydrological flow difference data set Src_d, a meteorological data set Sw_d and a land type data set Sl_d, where the subscript d is the index of the day to which the data belong, days being numbered in date order, 0 < d < length(S) + 1, d an integer. It should be noted that the precipitation amount and the daily mean air temperature on day d in the meteorological data set of the training set are denoted ra_d and mt_d. It should be noted again that the hydrological flow difference data are obtained by subtracting the previous day's flow from each day's flow; the result is the hydrological flow difference of the current day, and all such values together form the hydrological flow difference data set, as in formula (1):
Src_d = Sr_d - Sr_(d-1)    (1)
both basic models are predicted using the LSTM timing dependent network. In the direct prediction model, since the daily average runoff volume of a certain day is affected by the hydrological conditions, meteorological conditions and land conditions of the previous days, the daily average runoff volume data of the day before the certain day and the meteorological and land data of the day and the day before the certain day are input as prediction reference conditions in the direct prediction model when predicting the daily average runoff volume of the jing river basin. Forecasting daily average of a certain day through hydrological, meteorological and land data of two daysRunoff, equations (2) and (3) representing the daily average runoff predicted by the direct prediction model on day d, where y is as follows1dThe predicted value of the daily average runoff of the direct prediction model on day d is as follows:
Sxd=Srd-1+Swd-1+Swd+Sld-1+Sld(2)
this also reflects the time interval dependence and memory of LSTM.
The differential prediction model focuses on the factors that drive the change in daily mean runoff between two adjacent days. Its inputs are the meteorological and land data of the current day and the previous day, and its output is the predicted difference in daily mean runoff between the two days; the predicted difference is then added to the observed daily mean runoff of the previous day to obtain the predicted daily mean runoff of the current day. Formulas (4) and (5) describe the differential prediction for day d, where y2_d is the differential prediction model's predicted runoff difference for day d:
Sx_d = Sw_(d-1) + Sw_d + Sl_(d-1) + Sl_d    (4)
f2(Sx_d) = y2_d    (5)
The direct output of the differential prediction model must further undergo an inverse difference operation to obtain the actual daily mean runoff prediction of the differential model, as in formula (6), where y2r_d denotes the differential model's actual daily mean runoff prediction for day d:
y2r_d = y2_d + Sr_(d-1)    (6)
Each of the two basic models therefore yields its own daily mean runoff prediction, and the LSTM network in the calibration model performs the final balancing of the two: its inputs are the daily mean runoff predicted by the direct prediction model, the daily mean runoff predicted by the differential prediction model and the meteorological data of the day, and its output is the final daily mean runoff prediction. Formulas (7) and (8) describe the calibration model's prediction of the daily mean runoff of day d, where Y_d is the calibration model's predicted daily mean runoff for day d:
Sx_d = y1_d + y2r_d + Sw_d    (7)
F(Sx_d) = Y_d    (8)
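To make the composition of formulas (1)-(8) concrete, the following sketch shows how the three trained models could be chained in Python with Keras-style models; the model objects (direct_model, diff_model, calib_model), array shapes and variable names are illustrative assumptions rather than part of the claimed method.

```python
import numpy as np

def composite_predict(direct_model, diff_model, calib_model,
                      x_direct, x_diff, prev_runoff, weather_today):
    """Chain the two base models and the calibration model (formulas (2)-(8)).

    x_direct      -- direct-model input of formula (2), shape (n, 1, features)
    x_diff        -- differential-model input of formula (4), shape (n, 1, features)
    prev_runoff   -- observed daily mean runoff of the previous day, shape (n, 1)
    weather_today -- meteorological features of day d, shape (n, k)
    All names and shapes are assumptions of this sketch.
    """
    y1 = direct_model.predict(x_direct)          # formula (3): f1(Sx_d) = y1_d
    y2 = diff_model.predict(x_diff)              # formula (5): f2(Sx_d) = y2_d
    y2r = y2 + prev_runoff                       # formula (6): inverse difference
    # Formula (7): calibration input = [direct prediction, restored differential
    # prediction, weather of the day], fed as a single time step.
    x_calib = np.concatenate([y1, y2r, weather_today], axis=-1)[:, np.newaxis, :]
    return calib_model.predict(x_calib)          # formula (8): F(Sx_d) = Y_d
```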
the method comprises the following concrete implementation steps:
1. Preprocessing and reading in of data sets
(1) Data acquisition and data cleansing
Weather data, land type data and flow data required by the project can be obtained through various public data websites.
The meteorological data include: year, month, day, daily mean temperature, daily maximum temperature, daily minimum temperature, precipitation from 20:00 to 20:00, small-pan evaporation and sunshine duration. The land type data comprise 13 land-use types: cultivated land, forest land, shrub land, sparse forest land, high-coverage grassland, medium-coverage grassland, low-coverage grassland, other forest land, lakes and reservoirs, rivers, towns, rural residential land and other construction land, measured in hectares. The flow data are daily mean runoff data. Continuous data segments must be found for training, testing and validation.
The raw data set must first be preprocessed, because some raw meteorological observations use special coded values: for example, in the precipitation field, 32700 indicates that there was essentially no precipitation; 32766 indicates a missing value; 32XXX indicates precipitation; 31XXX indicates snowfall; and 30XXX indicates both precipitation and snowfall. These coded values do not represent the actual amounts, so they must be converted into actual precipitation or snowfall values.
Then, in the differential prediction mode, a difference operation is also applied to the screened data: the daily mean runoff value of the previous row is subtracted from the daily mean runoff value of each row to give the difference value of the current row, see formula (1). The row containing the current day's runoff difference value is then spliced with the weather and land data of the original row. Data in the direct prediction mode do not need the difference operation.
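As a concrete illustration of the difference operation and splicing described above, the following pandas sketch applies formula (1) row by row; the DataFrame layout and column names ('runoff' plus weather/land columns) are assumptions for illustration.

```python
import pandas as pd

def add_runoff_difference(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the differential-mode preprocessing step.

    df is assumed to hold one row per day, sorted by date, with a 'runoff'
    column plus meteorological and land-type columns (names are illustrative).
    """
    out = df.copy()
    # Formula (1): Src_d = Sr_d - Sr_(d-1); the first row has no previous day.
    out["runoff_diff"] = out["runoff"].diff()
    # The difference value now sits alongside the weather/land columns of the
    # same row; the direct prediction mode skips this step entirely.
    return out.dropna(subset=["runoff_diff"]).reset_index(drop=True)
```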
(2) Data normalization
The data are normalized using the Z-Score operation. The Z-Score converts different classes of data into unitless Z-Score values, so that the data are put on a common standard, their comparability is improved and the influence of data dimensions is eliminated. The standard score, also called the Z-score, is the difference between a value and the mean divided by the standard deviation. The magnitude of the Z-value represents the distance between the original value and the population mean, measured in standard deviations, and thus reflects the relative standard distance of the value from the mean. Z is negative when the raw value is below the mean and positive when it is above the mean; the standard score is therefore a way of describing the relative position of a value within the distribution. Formula (9) is as follows, where x is an individual observed value, μ is the mean of the overall data and σ is the standard deviation of the overall data:
z = (x - μ) / σ    (9)
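A minimal sketch of the Z-Score operation of formula (9); computing the statistics on the training portion and reusing them for the test portion is an implementation assumption of this sketch, not a step stated above.

```python
import numpy as np

def z_score(data: np.ndarray, mean=None, std=None):
    """Formula (9): z = (x - mu) / sigma, applied column-wise.

    mean and std may be passed in so that test data reuse the training
    statistics (an assumption of this sketch).
    """
    mean = data.mean(axis=0) if mean is None else mean
    std = data.std(axis=0) if std is None else std
    return (data - mean) / std, mean, std

# Illustrative use:
# train_z, mu, sigma = z_score(train_array)
# test_z, _, _ = z_score(test_array, mean=mu, std=sigma)
```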
(3) Splicing into a supervised data form
The prepared direct runoff data set and differential runoff data set are spliced into a supervised data form according to their respective input and output contents. The input of the direct prediction mode is the daily mean runoff of the previous day together with the weather and land data of the previous day and the current day, and the output is the predicted daily mean runoff of the current day, see formulas (2) and (3). The input of the differential prediction mode is the meteorological and land data of the previous day and the current day, and the output is the predicted daily mean runoff difference of the current day, see formulas (4) and (5).
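The splicing into a supervised form can be sketched as below, pairing day d-1 with day d according to formulas (2)-(5); the column groupings and the mode switch are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def to_supervised(df: pd.DataFrame, weather_cols, land_cols, target_col,
                  mode: str = "direct"):
    """Build supervised (X, y) pairs for day d from days d-1 and d.

    mode='direct': X = [runoff(d-1), weather(d-1), weather(d), land(d-1), land(d)],
                   y = runoff(d)            -- formulas (2) and (3)
    mode='diff':   X = [weather(d-1), weather(d), land(d-1), land(d)],
                   y = runoff difference(d) -- formulas (4) and (5)
    Column names are assumptions of this sketch.
    """
    prev = df.iloc[:-1].reset_index(drop=True)
    cur = df.iloc[1:].reset_index(drop=True)
    feats = [prev[weather_cols].values, cur[weather_cols].values,
             prev[land_cols].values, cur[land_cols].values]
    if mode == "direct":
        feats.insert(0, prev[[target_col]].values)
    X = np.concatenate(feats, axis=1)
    y = cur[target_col].values
    return X, y
```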
(4) Partitioning a training set, a test set, and a validation set
After the data available to the model are prepared, the training set, test set and validation set are divided. The data volume used in this work is on the order of ten thousand records; in view of this relatively small data volume, the ratio of training set to test set is kept at about 7:3, and the same data are used for the validation set and the test set. This completes the data preprocessing. The data preprocessing process of the direct prediction model is shown in Fig. 5, and that of the differential prediction model is shown in Fig. 6.
(5) Reading in a data set
At this time, as a result of the data preprocessing process, the prepared direct prediction data set and differential prediction data set can be respectively read into the LSTM direct prediction model and the LSTM differential prediction model as a training set and a test set.
2. Establishing two base models and tuning of the base models
An LSTM network is used to build two basic models, a direct prediction model and a differential prediction model. The two underlying models differ only slightly in the processing of the input data set and the predicted results, and not in other ways. Therefore, the overall processes of establishing, compiling, training, tuning the hyper-parameters and determining the final model of the two basic models are consistent, and the unified description is provided below.
(1) Initial model definition, compilation and training
Two basic models are defined, each adopting a three-layer network structure: an input layer, a hidden layer (i.e. the LSTM layer) and an output layer. The initial model can be compiled using the compile method of the model module in the Keras framework, and then fitted to the training set using the fit method of the model module.
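A minimal Keras sketch of this three-layer structure with compile and fit; the hyper-parameter values shown here are placeholders to be tuned in the following steps, and the variable names are assumptions of this sketch.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def build_base_model(n_features: int, units: int = 64, dropout: float = 0.2):
    """Three-layer structure: input layer -> LSTM hidden layer -> output layer."""
    model = Sequential([
        LSTM(units, activation="tanh", dropout=dropout,
             input_shape=(1, n_features)),   # one time step of spliced features
        Dense(1),                            # predicted (differenced) daily runoff
    ])
    model.compile(optimizer="adam", loss="mae")
    return model

# Illustrative training call; X_train must be shaped (samples, time steps, features):
# model = build_base_model(n_features=X_train.shape[2])
# history = model.fit(X_train, y_train, epochs=80, batch_size=12,
#                     validation_data=(X_test, y_test), verbose=2)
```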
(2) Model prediction and outcome assessment
Prediction on the test set with a trained model can be performed using the predict method of the model module in the Keras framework. Note that when the differential base model is used for prediction, the raw output of the model is the predicted difference between two adjacent days. An inverse difference operation is therefore required: the predicted difference of the current day is added to the observed daily mean runoff of the previous day to obtain the predicted daily mean runoff of the current day, see formula (6).
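A small sketch of the inverse difference operation of formula (6) applied to the differential model's raw predictions; the array names are illustrative.

```python
import numpy as np

def inverse_difference(diff_predictions: np.ndarray,
                       prev_day_observed: np.ndarray) -> np.ndarray:
    """Formula (6): y2r_d = y2_d + Sr_(d-1).

    diff_predictions  -- raw model output (predicted day-to-day runoff change)
    prev_day_observed -- observed daily mean runoff of the previous day
    """
    return diff_predictions.ravel() + prev_day_observed.ravel()

# Illustrative use on a test set:
# y2 = diff_model.predict(X_test)
# y2r = inverse_difference(y2, observed_runoff_previous_day)
```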
After prediction, the deviation of the prediction result from the actual daily mean runoff must be evaluated. Three evaluation indices are used: NSE, RMSE and Loss. NSE (Nash-Sutcliffe efficiency coefficient) is commonly used to verify the quality of hydrological model simulation results; its expression is given in formula (11):
E = 1 - Σ_{i=1}^{n} (O_i - P_i)^2 / Σ_{i=1}^{n} (O_i - Ō)^2    (11)
where O_i denotes the i-th measured value, P_i the i-th predicted value, Ō the mean of the measured values, and n the number of measured and predicted values. E ranges from negative infinity to 1. An E close to 1 indicates good model quality and high reliability; an E close to 0 indicates that the simulation is close to the mean level of the observations, i.e. the overall result is credible but the process simulation error is large; an E far below 0 indicates that the model is not credible.
RMSE (root-mean-square error) is a commonly used measure of the difference between predicted and observed values: the root-mean-square deviation is the sample standard deviation of the differences between predicted and observed values and is used to measure the deviation of the predictions from the true values. Its expression is given in formula (12):
RMSE = sqrt( (1/m) Σ_{i=1}^{m} (y_i - ŷ_i)^2 )    (12)
where y_i is the i-th true value, ŷ_i the i-th predicted value, and m the total number of samples. The larger the RMSE, the worse the model; conversely, the smaller the RMSE, the smaller the deviation between the predicted and true values and the better the model.
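The NSE of formula (11) and the RMSE of formula (12) can be computed as in the following NumPy sketch (standard definitions of both indices).

```python
import numpy as np

def nse(observed, predicted) -> float:
    """Nash-Sutcliffe efficiency, formula (11)."""
    observed, predicted = np.asarray(observed, float), np.asarray(predicted, float)
    return 1.0 - np.sum((observed - predicted) ** 2) / \
        np.sum((observed - observed.mean()) ** 2)

def rmse(observed, predicted) -> float:
    """Root-mean-square error, formula (12)."""
    observed, predicted = np.asarray(observed, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))
```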
Loss, i.e. the value computed by the loss function, is mainly used to evaluate the loss during training and validation; the loss on the training set is 'loss' and the loss on the validation set is 'val_loss'. By recording the loss curves of the training and validation process, the loss trend of the model on the training and validation sets can be observed as the epoch count increases. The best case is that loss and val_loss both decrease gradually as the training rounds increase and eventually level off, i.e. converge. Overfitting means that as the training rounds increase, loss keeps decreasing while val_loss rises. The loss curves also show whether loss and val_loss keep oscillating without converging; if so, measures such as increasing the batch size or decreasing the learning rate are needed. The loss curve is therefore an important reference when tuning the model parameters.
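The loss and val_loss curves described above can be drawn from the history object returned by fit; a minimal matplotlib sketch, assuming validation data were passed to fit.

```python
import matplotlib.pyplot as plt

def plot_loss(history):
    """Plot training loss ('loss') and validation loss ('val_loss') against epoch."""
    plt.plot(history.history["loss"], label="loss")
    plt.plot(history.history["val_loss"], label="val_loss")
    plt.xlabel("epoch")
    plt.ylabel("MAE loss")
    plt.legend()
    plt.show()
```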
(3) Hyper-parametric tuning of models
To optimize the performance of the basic model, many hyper-parameters of the single-hidden-layer LSTM need to be tuned: units, batch_size, epochs, the activation function, the loss function, the optimizer, etc.
Units is the output dimension of the LSTM layer and also the number of neurons in this layer. If units is too small, the network's fitting capacity is poor and it underfits; a larger number of units increases the network's processing capacity, but too many units lets the network express an overly complex nonlinear relationship and causes overfitting, and beyond a certain point additional units no longer bring a noticeable improvement. To select the optimal units, a coarse search is first performed over a large interval [a, b]: the interval is divided into x equal parts of width (b-a)/x, and a, a+(b-a)/x, a+2(b-a)/x, ..., b-(b-a)/x, b are tried as values of units. The best-performing value is then kept, a new finer interval is placed around it, and the subdivision procedure is repeated; if the optimum cannot be determined after the first refinement, the interval can be subdivided several more times. Batch_size is the number of samples used for one gradient-descent parameter update. A batch_size larger than 1 lets the gradient step be computed over several samples, making the descent direction more accurate and stable and speeding up training; but if the batch size is too large, the descent direction becomes too uniform and the optimization easily falls into a local optimum. Too small a batch_size makes computation slow and costly and causes the objective function to oscillate violently, so the network does not converge easily. Finding a batch size suited to one's own model requires continual experimentation, and different models and data sets have different suitable batch sizes. Epoch is the number of passes over all the training data; the coarse-then-fine interval method used for units can also be used to select the optimal epoch count. If the number of epochs is too small, training is insufficient, the network does not converge and underfits; if it is too large, overfitting may occur, so that the trained model performs well on the training set but generalizes poorly to the test set.
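The coarse-then-fine interval search described above for units (and reusable for epochs) can be sketched as follows; the scoring function, interval bounds and the number of refinement rounds are illustrative assumptions.

```python
def coarse_to_fine_search(score_fn, low: int, high: int, parts: int = 5,
                          rounds: int = 2) -> int:
    """Coarse-to-fine search for an integer hyper-parameter such as units.

    score_fn(value) is assumed to train and evaluate a model with that value
    and return a score where higher is better (e.g. NSE on the test set).
    Each round divides [low, high] into equal parts, keeps the best grid
    point and narrows the interval around it.
    """
    best_value = low
    for _ in range(rounds):
        step = max((high - low) // parts, 1)
        candidates = list(range(low, high + 1, step))
        best_value = max(candidates, key=score_fn)
        low, high = max(low, best_value - step), min(high, best_value + step)
    return best_value

# Illustrative use:
# best_units = coarse_to_fine_search(lambda u: evaluate_with_units(u), 10, 200)
```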
Using an activation function introduces nonlinearity. In this work the default activation function tanh is used in the LSTM hidden layer, and experiments showed the best results when no activation function is added to the Dense output layer; the effect of an activation function therefore differs from application to application. The optimizer used in the invention is Adam, which adjusts the learning rate automatically and is suited to unstable objective functions; Adam performs well in many situations. The loss function used in the invention is MAE. The mean absolute error (MAE) is a loss function for regression models that represents the mean of the absolute differences between the target variable and the predicted variable; it measures the average magnitude of the errors in a set of predictions regardless of their direction. The MAE expression is given in formula (10), where y_i is the target variable, yp_i the predicted variable, and i indexes the n samples:
MAE = (1/n) Σ_{i=1}^{n} |y_i - yp_i|    (10)
Different combinations of optimizers and loss functions may produce different effects, so as many sets of experiments as possible should be run to cover more combinations and obtain the best result.
In the process of continuously adjusting the hyper-parameters, a plurality of different models are built, and each model is compiled, trained, predicted and evaluated.
(4) Determining two optimal base models
For the same parameter settings, multiple experiments should be carried out to obtain the best, worst and average results; a larger NSE coefficient and a smaller RMSE indicate a better prediction. By comparing the evaluation results of different models on the test set through continued hyper-parameter tuning experiments, the optimal parameters of the direct prediction model and the differential prediction model are finally determined.
Through experiments, the parameter values of the two basic models are finally determined as: units 80, batch size 12, epochs 80, dropout 0.2, hidden-layer activation function tanh, optimizer Adam, loss function MAE. The structure of the two basic models is shown in Fig. 7. This completes the tuning of the direct prediction model and the differential prediction model; the overall tuning process of the two basic models is shown in Fig. 8.
3. Establishing and tuning the calibration model, as shown in Fig. 10
After the base-model tuning of the direct prediction model and the differential prediction model is completed, the structures and hyper-parameters of the two basic models are finally determined. To make the prediction result more stable and reliable, the method determines the final predicted value by calibrating with the two results: the prediction results of the two basic models are trained again to obtain the final daily mean runoff prediction.
(1) Respective prediction results of the basic models and input of meteorological data
The inputs to this final calibration model are the direct prediction model's prediction for the day, the differential prediction model's prediction for the day, and part of the meteorological data for the day. The input and output of the calibration model are given in formulas (7) and (8).
(2) Calibration model hyper-parametric tuning and determination
After the data are read in, the model training procedure is the same as described above: normalization, splicing into a supervised data form and division into training, test and validation sets. After the data are prepared, tuning determines the optimal structure and hyper-parameters of the network: units 80, batch size 12, epochs 300, dropout 0.2, recurrent dropout 0, activation function tanh (the LSTM default), optimizer Adam. The structure of the resulting optimal calibration model is shown in Fig. 9.
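Using the structure and hyper-parameter values reported above (units 80, batch size 12, 300 epochs, dropout 0.2, recurrent dropout 0, tanh activation, Adam optimizer), the calibration model could be sketched in Keras as follows; the feature count and the use of the MAE loss (carried over from the base models) are assumptions of this sketch.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def build_calibration_model(n_features: int):
    """Calibration LSTM over [direct prediction, differential prediction, weather]."""
    model = Sequential([
        LSTM(80, activation="tanh", dropout=0.2, recurrent_dropout=0.0,
             input_shape=(1, n_features)),
        Dense(1),                     # calibrated daily mean runoff prediction
    ])
    # MAE loss is assumed here, carried over from the base models.
    model.compile(optimizer="adam", loss="mae")
    return model

# Illustrative training call with the values determined above:
# model = build_calibration_model(n_features=X_train.shape[2])
# model.fit(X_train, y_train, epochs=300, batch_size=12,
#           validation_data=(X_test, y_test))
```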
(3) Beneficial effects of calibration model
Experiments show that the minimum NSE of the calibration model is greatly improved compared with that before calibration, from 0.37 to 0.85, and the RMSE is reduced overall, lowering the prediction error of the model. The stability and accuracy of the hydrological prediction model are therefore greatly improved, which is an important advance over traditional methods.
We used the hydrological, meteorological and land data sets of the Jing River station covering 31 years (1956-1963, 1972-1987 and 2006-2012) to build the base models and the calibration model for predicting the daily mean runoff of the Jing River; the final hydrological prediction model reaches an average NSE of 0.89 with an average RMSE of about 0.006. The direct prediction model predicts sudden changes in runoff well, but its predictions during flat periods are larger than the actual runoff values; repeated tests show that its performance is not stable enough, with a large gap between the best and worst results. The differential prediction model generally has an NSE of about 0.5 when the runoff changes suddenly, and its predictions during gentle periods are smaller than the actual runoff. After the two are balanced, the gap between the calibration model's predicted and actual values during gentle periods is reduced, the model becomes more accurate, and its worst-case performance is improved, making the model more stable.
It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.
Claims (3)
1. A runoff stability prediction method based on an LSTM composite network is characterized by comprising the following steps:
selecting hydrological, meteorological and land related data from a public database;
respectively establishing a direct prediction model and a differential prediction model according to the selected data;
respectively selecting respective prediction results from the direct prediction model and the differential prediction model;
combining the direct prediction result, the differential prediction result and the meteorological data to calculate and generate a calibration model;
and performing optimal calculation on the calibration model to obtain and output an optimal calibration model.
2. The method for predicting the runoff stability based on the LSTM composite network as claimed in claim 1, wherein the optimal calibration model obtaining step comprises:
carrying out data normalization processing on the direct prediction result, the differential prediction result and the meteorological data;
splicing the direct prediction result, the differential prediction result and the meteorological data to generate a specified format;
deleting part of useless columns in the data with the specified format to generate a supervision data format;
carrying out training set and test set division on data in a supervision data format;
respectively establishing a compiling, training and evaluating model according to the data of the training set and the test set;
performing prediction processing on the compiling, training and evaluating model to output NSE and RMSE;
judging whether the calibration model completes the tuning processing of all the parameters or not;
and (4) comparing the NSE and the RMSE in the tuning and optimizing calibration model to select the optimal calibration model for output.
3. The method of claim 1, wherein the prediction method of the runoff stability based on the LSTM composite network is characterized in that,
the direct prediction model and the differential prediction model both adopt three-layer network structures, namely an input layer, an LSTM layer and an output layer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910987183.3A CN110969282A (en) | 2019-10-17 | 2019-10-17 | Runoff stability prediction method based on LSTM composite network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910987183.3A CN110969282A (en) | 2019-10-17 | 2019-10-17 | Runoff stability prediction method based on LSTM composite network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110969282A true CN110969282A (en) | 2020-04-07 |
Family
ID=70029690
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910987183.3A Pending CN110969282A (en) | 2019-10-17 | 2019-10-17 | Runoff stability prediction method based on LSTM composite network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110969282A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103337042A (en) * | 2013-03-04 | 2013-10-02 | 中国电力科学研究院 | Ground solar irradiation clearance model construction method based on two-way progress |
CN109272146A (en) * | 2018-08-23 | 2019-01-25 | 河海大学 | A kind of Forecasting Flood method corrected based on deep learning model and BP neural network |
CN109508810A (en) * | 2018-09-29 | 2019-03-22 | 天津大学 | A kind of system based on realization monthly average hydrology volume forecasting |
Non-Patent Citations (1)
Title |
---|
ZHOU Mo et al.: "Short-term power load forecasting method combining multiple algorithms and multiple models with online second learning" *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109508810A (en) * | 2018-09-29 | 2019-03-22 | 天津大学 | A kind of system based on realization monthly average hydrology volume forecasting |
CN111639748A (en) * | 2020-05-15 | 2020-09-08 | 武汉大学 | Watershed pollutant flux prediction method based on LSTM-BP space-time combination model |
CN111753965A (en) * | 2020-06-30 | 2020-10-09 | 长江水利委员会水文局 | Deep learning-based river flow automatic editing method and system |
CN112541839A (en) * | 2020-12-23 | 2021-03-23 | 四川大汇大数据服务有限公司 | Reservoir storage flow prediction method based on neural differential equation |
CN112989705A (en) * | 2021-03-30 | 2021-06-18 | 海尔数字科技(上海)有限公司 | Method and device for predicting reservoir entry flow value, electronic device and medium |
CN114529815A (en) * | 2022-02-10 | 2022-05-24 | 中山大学 | Deep learning-based traffic detection method, device, medium and terminal |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110969282A (en) | Runoff stability prediction method based on LSTM composite network | |
CN110084367B (en) | Soil moisture content prediction method based on LSTM deep learning model | |
CN107292098A (en) | Medium-and Long-Term Runoff Forecasting method based on early stage meteorological factor and data mining technology | |
CN101480143B (en) | Method for predicating single yield of crops in irrigated area | |
CN111665575B (en) | Medium-and-long-term rainfall grading coupling forecasting method and system based on statistical power | |
CN109840587A (en) | Reservoir reservoir inflow prediction technique based on deep learning | |
CN111767517B (en) | BiGRU multi-step prediction method, system and storage medium applied to flood prediction | |
CN111553394B (en) | Reservoir water level prediction method based on cyclic neural network and attention mechanism | |
CN111723523B (en) | Estuary surplus water level prediction method based on cascade neural network | |
CN102495937A (en) | Prediction method based on time sequence | |
CN116205310B (en) | Soil water content influence factor sensitive interval judging method based on interpretable integrated learning model | |
Samantaray et al. | Sediment assessment for a watershed in arid region via neural networks | |
CN112686481A (en) | Runoff forecasting method and processor | |
CN114861840B (en) | Multi-source precipitation data fusion method | |
CN115423163A (en) | Method and device for predicting short-term flood events of drainage basin and terminal equipment | |
CN109816167A (en) | Runoff Forecast method and Runoff Forecast device | |
CN113033081A (en) | Runoff simulation method and system based on SOM-BPNN model | |
CN114692981B (en) | Method and system for forecasting medium-long-term runoff based on Seq2Seq model | |
CN117787081A (en) | Hydrological model parameter uncertainty analysis method based on Morris and Sobol methods | |
Zhang et al. | Improved combined system and application to precipitation forecasting model | |
Clark et al. | Deep learning for monthly rainfall–runoff modelling: a large-sample comparison with conceptual models across Australia | |
CN117172037A (en) | Distributed hydrologic forecasting method, device, computer equipment and medium | |
Nizar et al. | Forecasting of temperature by using LSTM and bidirectional LSTM approach: case study in Semarang, Indonesia | |
CN116960962A (en) | Mid-long term area load prediction method for cross-area data fusion | |
CN115860165A (en) | Neural network basin rainfall runoff forecasting method and system considering initial loss |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20200407 |