CN113468803A

CN113468803A - Improved WOA-GRU-based flood flow prediction method and system

Info

Publication number: CN113468803A
Application number: CN202110642031.7A
Authority: CN
Inventors: 嵇春雷; 彭甜; 张楚; 孙娜; 赵环宇; 夏鑫; 孙伟; 李沂蔓; 王振
Original assignee: Huaiyin Institute of Technology
Current assignee: Hefei Jiuzhou Longteng Scientific And Technological Achievement Transformation Co ltd
Priority date: 2021-06-09
Filing date: 2021-06-09
Publication date: 2021-10-01
Anticipated expiration: 2041-06-09
Also published as: CN113468803B

Abstract

The invention discloses an improved WOA-GRU flood flow prediction method and system. The method comprises the following steps: (1) preprocessing pre-acquired original data and converting the processed data into matrix data in time order; (2) establishing a random forest model for feature selection, training on the related data and measuring feature importance; (3) normalizing the data and dividing the sample data into a training set and a test set in time order; (4) building a GRU model and initializing its parameters; (5) optimizing the number of hidden layer units and the learning rate of the GRU model with an improved WOA algorithm; and (6) establishing an improved WOA-GRU flood flow prediction model, predicting the flood flow with the test set data and the model, and outputting the error and the prediction result. Using the improved WOA-GRU model for flood flow prediction offers fast convergence, strong generalization capability and high prediction accuracy, making the method better suited to flood flow prediction.

Description

Improved WOA-GRU-based flood flow prediction method and system

Technical Field

The invention belongs to the field of flood flow prediction, and particularly relates to a flood flow prediction method and system based on improved WOA-GRU.

Background

Since ancient times, natural disasters have been among the greatest threats faced by people all over the world, and human societies have developed through a continual struggle against them. As the population of China keeps growing and land surfaces and forests are damaged by human activity, severe weather such as rainstorms and strong convection can trigger recurring flood disasters. Every flood disaster causes great damage to people's lives and property, so accurate flood flow prediction can greatly reduce flood losses and is of great significance to society as a whole.

Flood flow prediction is an active direction in present-day hydroinformatics and an important basis for decisions on flood control and disaster reduction scheduling. After years of development many flood flow prediction models have appeared, and by modeling approach they can be divided into physical-process-based models and data-driven models. A physical-process-based model computes its parameters from the physical characteristics of the local terrain, temperature and humidity, vegetation coverage, soil texture and the like. Such a model predicts accurately, but the calculation is laborious and the model only fits a fixed terrain, so it has to be rebuilt whenever the location changes. A data-driven model is a functional mapping, or a joint distribution, from input features to output features, obtained by training on historical data with intelligent methods such as machine learning. A large number of flood flow prediction methods based on BP, ELM, SVM and other models already exist, but they generally do not fully consider the complexity of flood formation, their model parameters are not set accurately enough, and their prediction accuracy needs to be improved.

Through the above analysis, the problem and defect of the prior art is as follows: because of the complexity of the flood formation process, current flood forecasting models have difficulty producing accurate and reliable results when forecasting hydrological flow sequences.

Disclosure of Invention

The purpose of the invention is as follows: the invention provides a flood flow prediction method and system based on improved WOA-GRU, solves the technical problem caused by the complexity of a flood forming process in the prior art, and improves the prediction accuracy of the flood flow.

The technical scheme is as follows: the invention provides a flood flow prediction method based on improved WOA-GRU, which specifically comprises the following steps:

(1) preprocessing flood data acquired in advance and environment data related to flood, and converting each processed data into matrix data according to time sequence;

(2) establishing a random forest model to perform feature selection on related environment data, taking flood data and the environment data related to the flood as input of the random forest model, training the environment data related to the flood, performing importance measurement, and selecting the environment data with high importance as an optimal feature set;

(3) normalization processing is carried out on the flood data and the screened optimal feature set, the flood data and the screened optimal feature set are mapped between 0 and 1, a sample data set is obtained, and the sample data set is divided into a training set and a testing set according to the time sequence;

(4) building a GRU model, and initializing parameters of the model, including batch processing size, model training times and dropout value;

(5) initializing WOA algorithm parameters, generating the population with a chaotic Tent initialization strategy, adding a nonlinear factor and an inertia weight to improve the position update, and adding a catfish effect strategy to improve the ability of the whale optimization algorithm to escape local optima; optimizing the number n of hidden layer units and the learning rate ε of the GRU model with the improved WOA algorithm;

(6) establishing an improved WOA-GRU flood flow prediction model, predicting the flood flow using the test set and the prediction model, and outputting the error and the prediction result.

Further, the environmental data related to flood in step (1) includes rainfall, temperature and humidity.

Further, the step (2) comprises the steps of:

(21) sampling the related environment data with replacement by a random extraction method, and dividing the data into sampled in-bag data and unsampled out-of-bag data;

(22) constructing a random forest model with N decision trees from the in-bag data, and running the out-of-bag data through the model to obtain an error ε₁;

(23) randomly changing the value of one relevant factor feature in the out-of-bag data, and running the changed data through the random forest model to obtain an error ε₂;

(24) calculating the importance A of the relevant factor, wherein the formula is as follows:

A = (1/N)·Σ_{i=1}^{N} (ε₂ - ε₁)

where N is the number of decision trees, and ε₁ and ε₂ are the errors obtained in the two runs;

(25) changing the features of the other factors in the out-of-bag data in turn to obtain the importance of all flood-related factors, ranking the factors, and using the factors with high importance as the input of the flood flow prediction model.

Further, the formula of the normalization process in step (3) is:

x_i' = (x_i - x_min) / (x_max - x_min), i = 1, 2, …, n

wherein x_i' is the normalized sample data, x_i is the sample data before normalization, x_min is the minimum value of the sample data, x_max is the maximum value of the sample data, and n is the number of sample data.

Further, the step (4) comprises the steps of:

(41) initializing the model parameters, including the batch size, the number of training iterations and the dropout value; the number of hidden layer units and the learning rate are determined through optimization by the improved whale algorithm;

(42) calculating the reset gate value r_t and the candidate hidden state h̃_t, where r_t controls whether h̃_t uses h_{t-1}; the calculation formulas are as follows:

r_t = σ(W_r·X_t + Q_r·h_{t-1} + b_r)

h̃_t = tanh(W_z·X_t + Q_z·(r_t ⊙ h_{t-1}) + b_z)

where σ is the gate activation function, for which the sigmoid function is selected; ⊙ denotes element-wise multiplication; W_r, W_z, Q_r and Q_z are connection weights; X_t is the input vector at time t; b_r and b_z are bias vectors; h_{t-1} is the state information at time t-1; and tanh is the hyperbolic tangent function, which serves as the activation function of the candidate state;

(43) calculating the update gate value u_t and the current state h_t, where u_t controls how much state information from the previous time step the current state h_t preserves and how much information it accepts from the candidate hidden state h̃_t; the calculation formula of the update gate is as follows:

u_t = σ(W_u·X_t + Q_u·h_{t-1} + b_u)

where σ is the activation function, W_u and Q_u are connection weights, X_t is the input vector at time t, b_u is a bias vector, and h_{t-1} is the state information at time t-1; the current state h_t is the output of the GRU model at time t.

Further, the step (5) comprises the steps of:

(51) initializing the population scale, the iteration times and the upper and lower limits of the whale position of a whale algorithm, and generating a population by using an initialization strategy of chaotic Tent mapping;

(52) calculating the fitness values of all individuals in the population, obtaining and recording the position vector of the current best whale individual, and checking whether it is the same as the best position vector of the previous iteration; if the best position vector has not improved within the specified number of iterations, entering (53); otherwise, entering (54);

(53) ranking the fitness values of all current individuals in descending order, and re-applying chaotic initialization to the positions of the last 50% of individuals, to improve the ability of the whale algorithm to escape local optima;

(54) adding a nonlinear factor a and an inertia weight ω to the basic whale algorithm to improve its optimization performance, wherein the calculation formulas are as follows:

wherein t is the current iteration number, T is the maximum iteration number, and λ is an adjustment coefficient, for which experiments show λ = 3 to be optimal;

(55) introducing parameters p and A, where p is a random number in [0, 1] and A = 2a·r₁ - a; calculating the values of p and A; if p ≤ 0.5 and |A| < 1, entering (56); if p ≤ 0.5 and |A| ≥ 1, entering (57); if p > 0.5, entering (58);

(56) performing a shrinking-encirclement iterative update of the whale individual position vector, wherein the update formula is as follows:

X(t+1) = ω·X_rand - A·|C·X_rand - X_t|

wherein X is the position of the individual, t is the current iteration number, C is a random number in [0, 2], and X_rand is a random individual in the whale population;

(57) performing a random-search predation iterative update of the whale individual position vector, wherein the update formula is as follows:

X(t+1) = ω·X_best - A·|C·X_best - X_t|

wherein X is the position of the individual, t is the current iteration number, C is a random number in the interval [0, 2], and X_best is the best individual in the current population;

(58) performing a spiral predation iterative update of the whale individual position vector, wherein the update formula is as follows:

X(t+1) = D_best·e^(b·l)·cos(2πl) + (1-ω)·X_best

wherein X is the position of the individual, D_best = |X_best - X_t| is the distance between the individual X and the best individual X_best before the position update, b is a constant, and l is a random number in the interval [-1, 1];

(59) adding 1 to the iteration number and judging whether the maximum iteration number of the algorithm has been reached; if so, ending; otherwise, entering (510);

(510) sending the solution output by the whale algorithm, namely the current optimal number of hidden layer units and learning rate of the GRU model, to the GRU model, calculating the training error, and returning to (52).
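By way of illustration only, the chaotic Tent initialization and the catfish effect re-initialization described in the steps above may be sketched in Python as follows. The skewed Tent map parameter `beta`, the seeds `x0`, the search bounds and all function names are assumptions made for the sketch and are not taken from the invention. The sketch also assumes that a larger fitness value is better, so that after sorting in descending order the last 50% of individuals are the ones re-drawn.

```python
def tent_sequence(length, x0=0.37, beta=0.7):
    """Chaotic sequence in [0, 1] from a skewed Tent map (x0 and beta are assumed)."""
    seq, x = [], x0
    for _ in range(length):
        x = x / beta if x < beta else (1.0 - x) / (1.0 - beta)
        seq.append(x)
    return seq


def tent_population(size, lower, upper, x0=0.37):
    """Map Tent-map values onto the search box [lower, upper] per dimension."""
    dim = len(lower)
    z = tent_sequence(size * dim, x0)
    return [
        [lower[d] + z[i * dim + d] * (upper[d] - lower[d]) for d in range(dim)]
        for i in range(size)
    ]


def catfish_reinit(population, fitness, lower, upper, x0=0.61):
    """Keep the better half and re-draw the rest from a fresh Tent sequence."""
    # Descending order of fitness: assumed "larger is better".
    order = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    keep = order[: len(order) // 2]
    fresh = tent_population(len(order) - len(keep), lower, upper, x0)
    return [population[i] for i in keep] + fresh


# Illustrative bounds: dimension 0 = hidden layer units, dimension 1 = learning rate.
lower, upper = [8.0, 0.0001], [128.0, 0.1]
population = tent_population(6, lower, upper)
shaken = catfish_reinit(population[:4], [1.0, 4.0, 2.0, 3.0], lower, upper)
```

In an optimization loop, `catfish_reinit` would only be called when the recorded best position has not improved for the specified number of iterations.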

Based on the same inventive concept, the invention also provides an improved WOA-GRU-based flood flow prediction system, which comprises a data acquisition module, a data preprocessing module, a feature extraction module, a parameter optimization module and a flow prediction module;

the data acquisition module is used for acquiring original data of flood and relevant factors, including flow, rainfall and temperature and humidity;

the data preprocessing module is used for preprocessing the acquired original data, cleaning abnormal values, completing missing values by using a sample mean value and converting the data into matrix data;

the feature extraction module is used for extracting features from the data of the relevant factors, calculating the importance of each factor's sample data through a random forest model, ranking the results, and selecting the factor data with high importance as the sample data set of the prediction module;

the parameter optimization module is used for optimizing the number of hidden layer units and the learning rate of the GRU model with the improved whale algorithm, and establishing a prediction model in which the GRU is optimized by the improved whale algorithm;

and the flow prediction module is used for inputting the sample data output by the feature extraction module into the established flood flow prediction model and obtaining the prediction result through calculation.

Beneficial effects: compared with the prior art, the invention has the following beneficial effects: 1. the training set and the test set are obtained by performing feature extraction on historical flow data and historical data of flood-related factors through a random forest model, which improves the effectiveness and reliability of the training data; 2. to address the tendency of the whale optimization algorithm to fall into local optima, the algorithm is improved by generating the population with a chaotic Tent initialization strategy, adding a nonlinear factor and an inertia weight, and adding a catfish effect strategy, which strengthens its ability to escape local optima, improves the efficiency of global search, and thereby improves the accuracy of flood prediction; 3. using the improved WOA-GRU model for flood flow prediction offers fast convergence, strong generalization capability and high prediction accuracy, making the method better suited to flood flow prediction.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a flow chart of the improved WOA-GRU prediction model algorithm provided by the present invention;

FIG. 3 is a comparison of true and predicted values obtained from simulations using the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings.

The invention provides a flood flow prediction method based on improved WOA-GRU (whale optimization algorithm and gated recurrent unit), which, as shown in FIG. 1, specifically comprises the following steps:

Step 1: collecting flood data (sample data) from a hydrological observation station and environment data related to flood, cleaning the collected data, eliminating abnormal values, filling missing values with the sample mean of the corresponding environmental factor, and converting the processed data into matrix data in time order.

The flood-related environmental data includes local rainfall, temperature and humidity. The sample mean used to fill in missing values is:

x̄_i = (1/n)·Σ_{t=1}^{n} f_i(t)

wherein x̄_i is the sample mean of the i-th factor, n is the number of samples, and f_i(t) is the value of the t-th data of the i-th factor.
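As a minimal sketch of the missing-value filling described in Step 1, the following Python fragment replaces missing entries of one factor with the mean of its observed values. The function name, the use of `None` or NaN to mark a missing entry, and the sample rainfall values are assumptions for illustration; the mean is taken over the observed entries only, which is also an assumption.

```python
import math


def fill_missing_with_mean(series):
    """Replace missing entries (None or NaN) with the mean of the observed values."""
    observed = [v for v in series if v is not None and not math.isnan(v)]
    if not observed:
        raise ValueError("series has no observed values")
    mean = sum(observed) / len(observed)
    return [mean if (v is None or math.isnan(v)) else v for v in series]


# Hypothetical rainfall readings with two gaps.
rainfall = [2.0, None, 4.0, float("nan"), 6.0]
filled = fill_missing_with_mean(rainfall)
```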

Step 2: establishing a random forest model to perform feature selection on related environment data, taking flood data and environment data related to flood as input of the random forest model, training the environment data related to flood, performing importance measurement, and selecting the environment data with high importance as an optimal feature set.

(2.1) sampling the related environment data with replacement by a random extraction method, and dividing the data into sampled in-bag data and unsampled out-of-bag data.

(2.2) constructing a random forest model with N decision trees from the in-bag data, and running the out-of-bag data through the model to obtain an error ε₁.

(2.3) randomly changing the value of one relevant factor feature in the out-of-bag data, and running the changed data through the random forest model to obtain an error ε₂.

(2.4) calculating the importance A of the relevant factor, wherein the formula is as follows:

A = (1/N)·Σ_{i=1}^{N} (ε₂ - ε₁)

where N is the number of decision trees, and ε₁ and ε₂ are the errors obtained in the two runs.

(2.5) changing the features of the other factors in the out-of-bag data in turn to obtain the importance of all flood-related factors, ranking the factors, and using the factors with high importance as the input of the flood flow prediction model.
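The importance measurement of sub-steps (2.2) to (2.5) may be illustrated by the following Python sketch. It assumes that the importance is the average, over the N trees, of the increase in out-of-bag error after one factor is randomly shuffled, and that the error is a mean squared error. The trees are represented by plain callables standing in for trained decision trees; these stand-ins, the toy data and all names are hypothetical.

```python
import random


def mse(model, X, y):
    """Mean squared error of a callable model on rows X with targets y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def factor_importance(trees, oob_sets, feature, rng):
    """Average increase in out-of-bag error after shuffling one feature column.

    trees    : list of callables, one per decision tree (hypothetical stand-ins)
    oob_sets : list of (X, y) out-of-bag data, one per tree
    """
    total = 0.0
    for tree, (X, y) in zip(trees, oob_sets):
        eps1 = mse(tree, X, y)                  # error on intact out-of-bag data
        column = [row[feature] for row in X]
        rng.shuffle(column)                     # randomly change the chosen factor
        X_perm = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, column)]
        eps2 = mse(tree, X_perm, y)             # error after the change
        total += eps2 - eps1
    return total / len(trees)


# Toy check: every "tree" only looks at feature 0, so feature 1 scores zero.
X = [[float(i), float(10 - i)] for i in range(8)]
y = [row[0] for row in X]
trees = [lambda row: row[0]] * 3
oob_sets = [(X, y)] * 3
rng = random.Random(0)
importance = [factor_importance(trees, oob_sets, f, rng) for f in (0, 1)]
```

Factors would then be ranked by their entries in `importance`, and the highest-ranked ones kept as model inputs.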

Step 3: normalizing the flood data and the screened optimal feature set, mapping them to between 0 and 1 to obtain a sample data set, and dividing the sample data into a training set and a test set in time order, wherein the ratio of the training set to the test set is 7:3.

The formula of the normalization process is:

x_i' = (x_i - x_min) / (x_max - x_min), i = 1, 2, …, n

wherein x_i' is the normalized sample data, x_i is the sample data before normalization, x_min is the minimum value of the sample data, x_max is the maximum value of the sample data, and n is the number of sample data.
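A short Python sketch of Step 3 is given below for illustration, assuming min-max scaling to the interval from 0 to 1 followed by a chronological 7:3 split. The function names and the sample flow values are hypothetical.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] using the series minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant series cannot be min-max normalized")
    return [(v - lo) / (hi - lo) for v in values]


def chronological_split(samples, train_parts=7, total_parts=10):
    """Split in time order without shuffling: earliest part trains, latest part tests."""
    cut = len(samples) * train_parts // total_parts
    return samples[:cut], samples[cut:]


# Hypothetical flow series, already in time order.
flow = [120.0, 150.0, 300.0, 210.0, 180.0, 90.0, 240.0, 270.0, 135.0, 165.0]
scaled = min_max_normalize(flow)
train, test = chronological_split(scaled)
```

Keeping the split chronological means the test set always lies later in time than the training set, as the step requires.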

Step 4: constructing a GRU model and initializing the parameters of the model.

(4.1) initializing the model parameters, including the batch size, the number of training iterations and the dropout value; the number of hidden layer units and the learning rate are determined through optimization by the improved whale algorithm.

(4.2) calculating the reset gate value r_t and the candidate hidden state h̃_t, where r_t controls whether h̃_t uses h_{t-1}; the calculation formulas are as follows:

r_t = σ(W_r·X_t + Q_r·h_{t-1} + b_r)

h̃_t = tanh(W_z·X_t + Q_z·(r_t ⊙ h_{t-1}) + b_z)

where σ is the gate activation function, for which the sigmoid function is selected; ⊙ denotes element-wise multiplication; W_r, W_z, Q_r and Q_z are connection weights; X_t is the input vector at time t; b_r and b_z are bias vectors; h_{t-1} is the state information at time t-1; and tanh is the hyperbolic tangent function, which serves as the activation function of the candidate state.

(4.3) calculating the update gate value u_t and the current state h_t, where u_t controls how much state information from the previous time step the current state h_t preserves and how much information it accepts from the candidate hidden state h̃_t; the calculation formula of the update gate is as follows:

u_t = σ(W_u·X_t + Q_u·h_{t-1} + b_u)

where σ is the activation function, W_u and Q_u are connection weights, X_t is the input vector at time t, b_u is a bias vector, and h_{t-1} is the state information at time t-1. The current state h_t is the output of the GRU model at time t.
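For illustration, a single GRU time step built from the reset gate and update gate described in Step 4 may be sketched in plain Python as follows. Two points are assumptions of the sketch rather than statements of the invention: the candidate state is computed as tanh(W_z·X_t + Q_z·(r_t ⊙ h_{t-1}) + b_z), and the current state is blended as h_t = (1 - u_t)·h_{t-1} + u_t·h̃_t, which is one common convention. The helper names and the zero-valued weights are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, x, Q, h, b):
    """Row-wise W·x + Q·h + b for list-of-lists matrices."""
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(q * hi for q, hi in zip(Q_row, h))
        + bi
        for W_row, Q_row, bi in zip(W, Q, b)
    ]


def gru_step(x, h_prev, params):
    """One GRU step; params = (W_r, Q_r, b_r, W_u, Q_u, b_u, W_z, Q_z, b_z)."""
    W_r, Q_r, b_r, W_u, Q_u, b_u, W_z, Q_z, b_z = params
    r = [sigmoid(v) for v in affine(W_r, x, Q_r, h_prev, b_r)]   # reset gate r_t
    u = [sigmoid(v) for v in affine(W_u, x, Q_u, h_prev, b_u)]   # update gate u_t
    gated = [ri * hi for ri, hi in zip(r, h_prev)]               # r_t ⊙ h_{t-1}
    h_cand = [math.tanh(v) for v in affine(W_z, x, Q_z, gated, b_z)]
    # Assumed blending convention: u_t weights the candidate state.
    return [(1 - ui) * hi + ui * ci for ui, hi, ci in zip(u, h_prev, h_cand)]


# Zero weights: both gates equal 0.5 and the candidate state is 0.
zeros_in = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]   # 2 hidden units, 3 inputs
zeros_hh = [[0.0, 0.0], [0.0, 0.0]]
zeros_b = [0.0, 0.0]
params = (zeros_in, zeros_hh, zeros_b) * 3
h = gru_step([0.3, 0.1, 0.5], [1.0, -2.0], params)
```

With all weights set to zero the step simply halves the previous state, which gives an easy sanity check of the gate arithmetic.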

Step 5: initializing the parameters of the WOA algorithm; to address the tendency of the whale optimization algorithm to fall into local optima, generating the population with a chaotic Tent initialization strategy, adding a nonlinear factor and an inertia weight to improve the position update, and adding a catfish effect strategy to improve the ability of the whale optimization algorithm to escape local optima; and optimizing the number n of hidden layer units and the learning rate ε of the GRU model with the improved whale algorithm. As shown in FIG. 2, the step specifically includes the following sub-steps:

(5.1) initializing the population size, the number of iterations and the upper and lower limits of the whale positions of the whale algorithm, and generating the population with an initialization strategy based on chaotic Tent mapping.

(5.2) calculating the fitness values of all individuals in the population, obtaining and recording the position vector of the current best whale individual, and checking whether it is the same as the best position vector of the previous iteration; if the recorded best position vector has not improved within the specified number of iterations, entering (5.3); otherwise, entering (5.4).

(5.3) ranking the fitness values of all current individuals in descending order, and re-applying chaotic initialization to the positions of the last 50% of individuals, to improve the ability of the whale algorithm to escape local optima.

(5.4) adding a nonlinear factor a and an inertia weight ω to the basic whale algorithm to improve its optimization performance, wherein the calculation formulas are as follows:

where t is the current iteration number, T is the maximum iteration number, and λ is an adjustment coefficient, for which experiments show λ = 3 to be optimal.

(5.5) introducing parameters p and A, where p is a random number in [0, 1] and A = 2a·r₁ - a; calculating the values of p and A; if p ≤ 0.5 and |A| < 1, entering (5.6); if p ≤ 0.5 and |A| ≥ 1, entering (5.7); if p > 0.5, entering (5.8).

(5.6) performing a shrinking-encirclement iterative update of the whale individual position vector, wherein the update formula is as follows:

X(t+1) = ω·X_rand - A·|C·X_rand - X_t|

wherein X is the position of the individual, t is the current iteration number, C is a random number in [0, 2], and X_rand is a random individual in the whale population.

(5.7) performing a random-search predation iterative update of the whale individual position vector, wherein the update formula is as follows:

X(t+1) = ω·X_best - A·|C·X_best - X_t|

wherein X is the position of the individual, t is the current iteration number, C is a random number in the interval [0, 2], and X_best is the best individual in the current population.

(5.8) performing a spiral predation iterative update of the whale individual position vector, wherein the update formula is as follows:

X(t+1) = D_best·e^(b·l)·cos(2πl) + (1-ω)·X_best

wherein X is the position of the individual, D_best = |X_best - X_t| is the distance between the individual X and the best individual X_best before the position update, b is a constant, and l is a random number in the interval [-1, 1].

(5.9) adding 1 to the iteration number and judging whether the maximum iteration number of the algorithm has been reached; if so, ending; otherwise, entering (5.10).

(5.10) sending the solution output by the whale algorithm, namely the current optimal number of hidden layer units and learning rate of the GRU model, to the GRU model, calculating the training error, and returning to (5.2).
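The branch selection of sub-step (5.5) and the position update formulas of sub-steps (5.6) to (5.8) may be sketched in Python as follows. The sketch applies each formula per dimension, which is an assumption, and uses one helper for the two formulas of the form ω·X_ref - A·|C·X_ref - X_t|, where the reference position X_ref is supplied by the caller as either X_rand or X_best. The function names and numeric values are hypothetical, and the nonlinear factor a and inertia weight ω are taken as given inputs.

```python
import math


def choose_branch(p, A):
    """Return the label of the sub-step selected by the values of p and A."""
    if p > 0.5:
        return "5.8"
    return "5.6" if abs(A) < 1 else "5.7"


def linear_update(x, x_ref, A, C, omega):
    """X(t+1) = ω·X_ref - A·|C·X_ref - X_t|, applied per dimension."""
    return [omega * r - A * abs(C * r - xi) for xi, r in zip(x, x_ref)]


def spiral_update(x, x_best, b, l, omega):
    """X(t+1) = D_best·e^(b·l)·cos(2πl) + (1-ω)·X_best, with D_best = |X_best - X_t|."""
    return [
        abs(bi - xi) * math.exp(b * l) * math.cos(2 * math.pi * l) + (1 - omega) * bi
        for xi, bi in zip(x, x_best)
    ]


# Hand-checkable values.
moved = linear_update([1.0], [2.0], A=0.5, C=1.0, omega=1.0)
spiralled = spiral_update([1.0], [3.0], b=1.0, l=0.0, omega=0.5)
```

In a full optimizer, `choose_branch` would be evaluated once per individual and iteration with freshly drawn p, r₁, C and l.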

Step 6: comparing the test samples with the output of the prediction model, calculating the root mean square error (RMSE) and the mean absolute percentage error (MAPE) between the predicted and actual flood flow, and evaluating the effectiveness of the improved WOA-GRU flood flow prediction method.

The specific formulas of the root mean square error (RMSE) and the mean absolute percentage error (MAPE) are:

RMSE = sqrt( (1/N)·Σ_{t=1}^{N} (y_ob(t) - y_pre(t))² )

MAPE = (1/N)·Σ_{t=1}^{N} |(y_ob(t) - y_pre(t)) / y_ob(t)|

wherein y_ob(t) is the true value of the t-th sample, y_pre(t) is the predicted value of the t-th sample, and N is the total number of samples.
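The two evaluation indexes may be computed as in the following Python sketch, which assumes the standard definitions of RMSE and of MAPE expressed as a fraction rather than a percentage. The function names and sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between true and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mape(observed, predicted):
    """Mean absolute percentage error as a fraction; true values must be non-zero."""
    n = len(observed)
    return sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n


# Hypothetical true and predicted flows.
y_ob = [100.0, 200.0, 400.0]
y_pre = [110.0, 180.0, 400.0]
rmse_value = rmse(y_ob, y_pre)
mape_value = mape(y_ob, y_pre)
```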

To demonstrate the effectiveness and the improvement of the proposed algorithm, 4048 data records from April 2009 to April 2020 from the Gaochang hydrological observation station in the Minjiang River basin are taken as an example; the station is located in Gaochang Town, Xuzhou District, Yibin City, Sichuan Province. The algorithm program is written in MATLAB, and the prediction model of the invention and four comparison prediction models are constructed: the BP neural network and GRU prediction models optimized by particle swarm optimization (PSO-BP and PSO-GRU), the BP neural network and GRU prediction models optimized by the whale algorithm (WOA-BP and WOA-GRU), and the GRU prediction model optimized by the improved whale algorithm of the invention (IWOA-GRU).

Each of the five prediction models, PSO-BP, WOA-BP, PSO-GRU, WOA-GRU and IWOA-GRU, is simulated 10 times. The maximum and minimum values of each performance index are selected, the average of each performance index over the 10 simulations is calculated, and the resulting statistics for each model are shown in Table 1.

Table 1. Performance index statistics of the model of the invention and the control group models

As is apparent from Table 1, the mean MAE value of IWOA-GRU is 293.8928, the mean RMSE value is 797.4514, and the mean MAPE value is 0.1034. The predictions obtained by optimizing the GRU with the improved whale algorithm are clearly better than those of the other prediction models on every index. This shows that the improvements raise the optimizing ability of the whale optimization algorithm and, in turn, the prediction accuracy of the GRU prediction model optimized by it.

For each model, the run with the minimum RMSE among the 10 simulations is selected, the ratio of the error to the true value is calculated for each sample, and the proportions of samples whose ratio is less than 0.2 and less than 0.1 are recorded as the 80% accuracy and the 90% accuracy respectively; the results are shown in Table 2.

Table 2. Accuracy statistics of the model of the invention and the control group models

As is apparent from Table 2, the prediction accuracy of IWOA-GRU is clearly higher than that of the control group models; the GRU prediction model optimized by the improved whale algorithm thus provides a more accurate method for flood flow prediction.
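The 80% and 90% accuracy figures described above may be computed as in the following Python sketch. It assumes that the accuracy is the share of samples whose relative error, |true - predicted| / |true|, is below the given threshold; the function name and sample values are hypothetical.

```python
def accuracy_within(observed, predicted, tolerance):
    """Share of samples whose relative error is strictly below the tolerance."""
    hits = sum(
        1 for o, p in zip(observed, predicted) if abs(o - p) / abs(o) < tolerance
    )
    return hits / len(observed)


# Hypothetical true and predicted flows with relative errors 0.05, 0.15, 0.30 and 0.
y_ob = [100.0, 100.0, 100.0, 100.0]
y_pre = [105.0, 115.0, 130.0, 100.0]
acc_80 = accuracy_within(y_ob, y_pre, 0.2)   # ratio below 0.2
acc_90 = accuracy_within(y_ob, y_pre, 0.1)   # ratio below 0.1
```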

Fig. 3 is a comparison graph of the predicted result and the actual value in the above embodiment, and it can be seen from the graph that the predicted result of the present invention can basically follow the actual value, can basically predict the future runoff, and can predict the arrival of flood in time.

Based on the same inventive concept, the present invention further provides an improved WOA-GRU flood flow prediction system, as shown in fig. 1, including five modules, which are respectively:

the data preprocessing module is used for preprocessing the acquired original data, cleaning abnormal values, completing missing values by using the sample mean value and converting the data into matrix data;

and the prediction module is used for inputting the data of the sample data output by the characteristic extraction module into the model by utilizing the established flood flow prediction model and obtaining a prediction result through calculation.

Those skilled in the art will appreciate that the flood flow prediction method of the present invention can be implemented by computer program instructions. These program instructions may be stored in memory and may be invoked for execution by a processor to perform the functions of the present invention. According to the present invention, there is also provided a computer-readable storage medium storing a computer program. The method for predicting flood flows according to the present invention can be designed as a related computer program, which can be executed and implemented by the computer readable storage medium.

It should be noted that the above-mentioned embodiments are illustrative rather than restrictive, and the present invention is not limited to the above-mentioned embodiments, and any other embodiments according to the principles of the present invention are also considered to be within the protection scope of the present invention.

Claims

1. A flood flow prediction method based on improved WOA-GRU is characterized by comprising the following steps:

2. The improved WOA-GRU flood flow prediction method according to claim 1, wherein the flood-related environmental data of step (1) includes rainfall, temperature, humidity.

3. The improved WOA-GRU flood flow prediction method according to claim 1, wherein the step (2) comprises the steps of:

4. The improved WOA-GRU flood flow prediction method according to claim 1, wherein the normalization process of step (3) is formulated as:

x_i' = (x_i - x_min) / (x_max - x_min), i = 1, 2, …, n

wherein x_i' is the normalized sample data, x_i is the sample data before normalization, x_min is the minimum value of the sample data, x_max is the maximum value of the sample data, and n is the number of sample data.

5. The improved WOA-GRU flood flow prediction method according to claim 1, wherein the step (4) comprises the steps of:

(42) calculating the reset gate value r_t and the candidate hidden state h̃_t, where r_t controls whether h̃_t uses h_{t-1}; the calculation formula is as follows:

r_t = σ(W_r·X_t + Q_r·h_{t-1} + b_r)

where σ is the activation function, for which the sigmoid function is selected;

the calculation formula of the update gate value u_t is as follows:

u_t = σ(W_u·X_t + Q_u·h_{t-1} + b_u)

6. The improved WOA-GRU flood flow prediction method according to claim 1, wherein said step (5) comprises the steps of:

X(t+1) = ω·X_rand - A·|C·X_rand - X_t|

X(t+1) = ω·X_best - A·|C·X_best - X_t|

X(t+1) = D_best·e^(b·l)·cos(2πl) + (1-ω)·X_best

7. An improved WOA-GRU based flood flow prediction system using the method of any of claims 1-6, comprising a data acquisition module, a data preprocessing module, a feature extraction module, a parameter optimization module, and a flow prediction module;