CN113468803B - Flood flow prediction method and system based on improved WOA-GRU

Info

Publication number: CN113468803B
Application number: CN202110642031.7A
Authority: CN
Inventors: 嵇春雷; 彭甜; 张楚; 孙娜; 赵环宇; 夏鑫; 孙伟; 李沂蔓; 王振
Original assignee: Huaiyin Institute of Technology
Current assignee: Hefei Jiuzhou Longteng Scientific And Technological Achievement Transformation Co ltd
Priority date: 2021-06-09
Filing date: 2021-06-09
Publication date: 2023-09-26
Anticipated expiration: 2041-06-09
Also published as: CN113468803A

Abstract

The invention discloses a flood flow prediction method and system based on an improved WOA-GRU. The method comprises the following steps: (1) preprocessing the pre-acquired original data and converting each item of processed data into matrix data in time order; (2) establishing a random forest model for feature selection, training on the related data and measuring feature importance; (3) normalizing the data and dividing the sample data into a training set and a test set in time order; (4) constructing a GRU model and initializing its parameters; (5) optimizing the number of hidden layer units and the learning rate of the GRU model with an improved WOA algorithm; (6) establishing the improved WOA-GRU flood flow prediction model, predicting the flood flow using the test set data and the model, and outputting the errors and prediction results. Predicting flood flow with the improved WOA-GRU model offers a high convergence rate, strong generalization capability and high prediction accuracy, making the model better suited to flood flow prediction.

Description

Flood flow prediction method and system based on improved WOA-GRU

Technical Field

The invention belongs to the field of flood flow prediction, and particularly relates to a flood flow prediction method and system based on an improved WOA-GRU, in which an improved whale optimization algorithm (WOA) tunes a gated recurrent unit (GRU) network.

Background

Natural disasters are among the greatest threats faced by people worldwide, and human society has developed gradually in the course of fighting them. Every flood can cause enormous loss of life and property. If flood flow can be predicted accurately, the losses caused by floods can be greatly reduced, which is of great significance to society as a whole.

Flood flow prediction is a current focus of hydroinformatics and an important basis for flood control and disaster reduction scheduling decisions. After years of development, many flood flow prediction models have emerged; according to the modeling process, they can be divided into physical-process-based models and data-driven models. Physical-process-based models are computed mainly from physical characteristics such as local terrain, temperature and humidity, vegetation coverage and soil texture. Models built this way have high prediction accuracy, but the computation is cumbersome, and a model applies only to a fixed terrain, so it must be rebuilt for a different location. Data-driven models learn a function mapping or joint distribution from input features to output features by training on historical data with intelligent methods such as machine learning. At present there are many flood flow prediction methods based on models such as BP, ELM and SVM, but most of them do not fully consider the complexity of flood formation, and their model parameters are not set accurately enough, so the prediction accuracy still needs to be improved.

From the above analysis, the problem and defect of the prior art is as follows: owing to the complexity of the flood formation process, current flood forecasting models have difficulty producing accurate and reliable forecasts of the hydrological flow sequence.

Disclosure of Invention

Purpose of the invention: the invention provides a flood flow prediction method and system based on an improved WOA-GRU, which solve the technical problems caused by the complexity of the flood formation process in the prior art and improve the accuracy of flood flow prediction.

Technical scheme: the invention provides a flood flow prediction method based on an improved WOA-GRU, which specifically comprises the following steps:

(1) Preprocessing the pre-acquired flood data and flood-related environmental data, and converting each item of processed data into matrix data in time order;

(2) Establishing a random forest model for feature selection on the related environmental data: taking the flood data and the flood-related environmental data as inputs of the random forest model, training on the flood-related environmental data, measuring importance, and selecting the environmental data of high importance as the optimal feature set;

(3) Normalizing the flood data and the selected optimal feature set, mapping them to between 0 and 1 to obtain a sample data set, and dividing the sample data set into a training set and a test set in time order;

(4) Constructing a GRU model and initializing its parameters, including the batch size, the number of training iterations and the dropout value;

(5) Initializing the WOA algorithm parameters, generating the population with a chaotic Tent initialization strategy, adding a nonlinear factor and an inertia weight to improve the position update, and adding a catfish effect strategy to improve the ability of the whale optimization algorithm to jump out of local optima; optimizing the number of hidden layer units n and the learning rate ε of the GRU model with the improved WOA algorithm;

(6) Establishing the improved WOA-GRU flood flow prediction model, predicting the flood flow using the test set and the prediction model, and outputting the errors and prediction results.

Further, the environmental data related to the flood in the step (1) includes rainfall, temperature and humidity.

Further, the step (2) includes the steps of:

(21) Sampling the related environmental data with replacement by random extraction, and dividing the data into sampled in-bag data and unsampled out-of-bag data;

(22) Constructing a random forest model with N decision trees from the in-bag data, and testing it on the out-of-bag data to obtain the error $\varepsilon_1$;

(23) Randomly changing the values of one relevant factor feature in the out-of-bag data, and testing with the random forest model again to obtain the error $\varepsilon_2$;

(24) Calculating the importance A of the relevant factor as follows:

$$A = \frac{1}{N}\sum_{i=1}^{N}\left(\varepsilon_2 - \varepsilon_1\right)$$

wherein N is the number of decision trees, and $\varepsilon_1$ and $\varepsilon_2$ are the errors obtained in the two runs;

(25) Continuing to change the features of the other factors in the out-of-bag data to obtain the importance of all flood-related factors, ranking the factors, and taking the factors of high importance as the inputs of the flood flow prediction model.

Further, the formula of the normalization processing in step (3) is as follows:

$$x_i^{*} = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}, \quad i = 1, 2, \ldots, n$$

wherein $x_i^{*}$ is the normalized sample data, $x_i$ is the sample data before normalization, $x_{\min}$ is the minimum value of the sample data, $x_{\max}$ is the maximum value of the sample data, and n is the number of sample data.

Further, the step (4) includes the steps of:

(41) Initializing the model parameters, including the batch size, the number of training iterations and the dropout value; the number of hidden layer units and the learning rate are determined by optimization with the improved whale algorithm;

(42) Calculating the value $r_t$ of the reset gate and the candidate hidden state $\tilde{h}_t$, where $r_t$ controls whether the calculation of $\tilde{h}_t$ uses $h_{t-1}$. The calculation formulas are as follows:

$$r_t = \sigma(W_r \cdot X_t + Q_r \cdot h_{t-1} + b_r)$$

$$\tilde{h}_t = \tanh\left(W_z \cdot X_t + Q_z \cdot (r_t \odot h_{t-1}) + b_z\right)$$

where σ is the activation function, for which the sigmoid function is selected; $W_r$, $W_z$, $Q_r$ and $Q_z$ are connection weights; $X_t$ is the input vector at time t; $b_r$ and $b_z$ are bias vectors; $h_{t-1}$ is the state information at time t-1; $\odot$ denotes element-wise multiplication; and tanh is the hyperbolic tangent function, which serves as an activation function;

(43) Calculating the value $u_t$ of the update gate and the current state $h_t$, where $u_t$ controls how much state information from the previous time step the current state $h_t$ retains and how much information it accepts from the candidate hidden state $\tilde{h}_t$. The calculation formula is as follows:

$$u_t = \sigma(W_u \cdot X_t + Q_u \cdot h_{t-1} + b_u)$$

wherein σ is the activation function, $W_u$ and $Q_u$ are connection weights, $X_t$ is the input vector at time t, $b_u$ is the bias vector, and $h_{t-1}$ is the state information at time t-1; the current state $h_t$ is the output of the GRU model at time t.

Further, the step (5) includes the steps of:

(51) Initializing the population size, the number of iterations, and the upper and lower bounds of the whale positions for the whale algorithm, and generating the population with a chaotic Tent map initialization strategy;

(52) Calculating the fitness values of all individuals in the population, obtaining and recording the position vector of the current optimal whale individual, and checking whether it is the same as the optimal position vector of the previous iteration; if the optimal position vector has not evolved within the specified number of iterations, entering (53); otherwise, entering (54);

(53) Sorting the fitness values of all current individuals in descending order and re-initializing the positions of the last 50% of individuals by chaotic initialization, to improve the ability of the whale algorithm to jump out of local optima;

(54) Adding a nonlinear factor a and an inertia weight ω to the basic whale algorithm to improve its optimization performance; the calculation formulas are respectively as follows:

wherein t is the current iteration number, T is the maximum iteration number, and λ is the adjustment coefficient; experiments show that λ = 3 gives the best results;

(55) Introducing parameters p and A, where p is a random number in [0,1] and $A = 2a \cdot r_1 - a$; calculating the values of p and A: if p ≤ 0.5 and |A| < 1, entering (56); if p ≤ 0.5 and |A| ≥ 1, entering (57); if p > 0.5, entering (58);

(56) Performing a shrinking-encirclement iterative update on the individual whale position vector, with the update formula:

$$X(t+1) = \omega \cdot X_{rand} - A \cdot \left|C \cdot X_{rand} - X_t\right|$$

wherein X is the position of the individual, t is the current iteration number, C is a random number in the interval [0,2], and $X_{rand}$ is a random individual in the whale population;

(57) Performing a random-search predation iterative update on the individual whale position vector, with the update formula:

$$X(t+1) = \omega \cdot X_{best} - A \cdot \left|C \cdot X_{best} - X_t\right|$$

wherein X is the position of the individual, t is the current iteration number, C is a random number in the interval [0,2], and $X_{best}$ represents the optimal individual in the current population;

(58) Performing a spiral predation iterative update on the individual whale position vector according to the following formula:

$$X(t+1) = D_{best} \cdot e^{bl} \cdot \cos(2\pi l) + (1-\omega) \cdot X_{best}$$

wherein X is the position of the individual, $D_{best} = \left|X_{best} - X_t\right|$ represents the distance between individual X and the optimal individual $X_{best}$ before the position update, b is a constant, and l is a random number in the interval [-1,1];

(59) Adding 1 to the iteration count and judging whether the maximum number of iterations of the algorithm has been reached; if so, ending; otherwise, entering (510);

(510) Sending the optimal solution output by the whale algorithm, namely the current optimal number of hidden layer units and learning rate of the GRU model, to the GRU model, calculating the training error, and returning to step (52).

Based on the same inventive concept, the invention also provides a flood flow prediction system based on the improved WOA-GRU, which comprises a data acquisition module, a data preprocessing module, a feature extraction module, a parameter optimization module and a flow prediction module;

the data acquisition module is used for acquiring the original data of the flood and its related factors, including flow, rainfall, temperature and humidity;

the data preprocessing module is used for preprocessing the acquired original data, removing abnormal values, filling in missing values with sample means, and converting the data into matrix data;

the feature extraction module is used for extracting features from the data of the related factors, calculating the importance of each item of sample data through a random forest model, ranking the items, and selecting the factor data of high importance as the sample data set of the prediction module;

the parameter optimization module is used for optimizing the number of hidden layer units and the learning rate of the GRU model with the improved whale algorithm, and for establishing the prediction model in which the GRU is optimized by the improved whale algorithm;

the flow prediction module is used for inputting the sample data output by the feature extraction module into the established flood flow prediction model and obtaining the prediction result through calculation.

Beneficial effects: compared with the prior art, the invention has the following beneficial effects: 1. the training set and the test set are obtained by extracting features from the historical flow data and the historical data of the flood-related factors with a random forest model, which improves the effectiveness and reliability of the training data; 2. to address the tendency of the whale optimization algorithm to fall into local optima, the invention improves the algorithm by generating the population with a chaotic Tent initialization strategy, adding a nonlinear factor and an inertia weight, and adding a catfish effect strategy, which strengthens the algorithm's ability to jump out of local optima, improves global search efficiency, and thereby improves the accuracy of flood prediction; 3. using the improved WOA-GRU model for flood flow prediction offers a high convergence rate, strong generalization capability and high prediction accuracy, making the model better suited to flood flow prediction.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a flow chart of the algorithm of the improved WOA-GRU prediction model provided by the invention;

FIG. 3 is a graph comparing actual values with predicted values obtained by simulation using the present invention.

Detailed Description

The invention is described in further detail below with reference to the accompanying drawings.

The invention provides a flood flow prediction method based on an improved WOA-GRU, which specifically comprises the following steps:

Step 1: Collect flood data (sample data) from a hydrological observation station together with environmental data related to the flood, clean the collected data, remove abnormal values, fill in missing values with the sample mean of the corresponding environmental factor, and convert each item of processed data into matrix data in time order.

The flood-related environmental data include local rainfall, temperature and humidity. The sample mean used to fill in missing values is:

$$\bar{x}_i = \frac{1}{n}\sum_{t=1}^{n} f_i(t)$$

wherein $\bar{x}_i$ is the sample mean of the i-th factor, n is the number of samples, and $f_i(t)$ is the value of the t-th data point of the i-th factor.
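As a rough illustration of the mean-filling part of Step 1, the following Python sketch replaces missing entries in one factor's series with that factor's sample mean. The function name `fill_missing_with_mean`, the use of `None`/NaN to mark missing values, and the toy rainfall numbers are all assumptions made for the example; the mean here is taken over the observed values only, which is one plausible reading of the formula above.

```python
import math

def fill_missing_with_mean(series):
    """Replace missing entries (None or NaN) with the mean of the observed values."""
    def is_missing(v):
        return v is None or math.isnan(v)

    observed = [v for v in series if not is_missing(v)]
    if not observed:
        raise ValueError("series has no observed values")
    mean = sum(observed) / len(observed)
    return [mean if is_missing(v) else v for v in series]

# Hypothetical rainfall readings with two gaps (illustrative values only).
rainfall = [2.0, None, 4.0, float("nan"), 6.0]
flow = [110.0, 130.0, 150.0, 170.0, 190.0]

rainfall_filled = fill_missing_with_mean(rainfall)

# Time-ordered matrix: one row per time step, one column per series.
matrix = [list(row) for row in zip(flow, rainfall_filled)]
```

Outlier removal is not shown; it would run before the fill so that abnormal values do not distort the mean.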

Step 2: Establish a random forest model for feature selection on the related environmental data: take the flood data and the flood-related environmental data as inputs of the random forest model, train on the flood-related environmental data, measure importance, and select the environmental data of high importance as the optimal feature set.

(2.1) Sample the related environmental data with replacement by random extraction, and divide the data into sampled in-bag data and unsampled out-of-bag data.

(2.2) Construct a random forest model with N decision trees from the in-bag data, and test it on the out-of-bag data to obtain the error $\varepsilon_1$.

(2.3) Randomly change the values of one relevant factor feature in the out-of-bag data, and test with the random forest model again to obtain the error $\varepsilon_2$.

(2.4) Calculate the importance of the relevant factor with the following formula:

$$A = \frac{1}{N}\sum_{i=1}^{N}\left(\varepsilon_2 - \varepsilon_1\right)$$

wherein N is the number of decision trees, and $\varepsilon_1$ and $\varepsilon_2$ are the errors obtained in the two runs.

(2.5) Continue to change the features of the other factors in the out-of-bag data to obtain the importance of all flood-related factors, rank the factors, and take the factors of high importance as the inputs of the flood flow prediction model.
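The importance measurement in steps (2.2) to (2.5) can be sketched in Python as follows. This is only an illustration under stated assumptions: the "trees" are hypothetical stand-in callables rather than trained decision trees, the error is taken to be mean squared error, and "randomly changing" a feature is implemented by shuffling its column in the out-of-bag rows. The names `mse` and `permutation_importance` are invented for the example.

```python
import random

def mse(model, rows, targets):
    """Mean squared error of `model` on the given rows."""
    return sum((model(r) - y) ** 2 for r, y in zip(rows, targets)) / len(rows)

def permutation_importance(trees, oob_sets, feature, rng):
    """Average increase in out-of-bag error when `feature` is shuffled.

    trees    : list of callables, each mapping a feature row to a prediction
    oob_sets : list of (rows, targets) pairs, the out-of-bag data of each tree
    """
    total = 0.0
    for tree, (rows, targets) in zip(trees, oob_sets):
        eps1 = mse(tree, rows, targets)          # error on untouched OOB data
        column = [r[feature] for r in rows]
        rng.shuffle(column)                      # randomly change one factor
        permuted = [r[:feature] + [v] + r[feature + 1:]
                    for r, v in zip(rows, column)]
        eps2 = mse(tree, permuted, targets)      # error after the change
        total += eps2 - eps1
    return total / len(trees)

rng = random.Random(0)
# Toy stand-ins: both "trees" predict flow from rainfall only (feature 0).
trees = [lambda r: 3.0 * r[0], lambda r: 3.0 * r[0]]
rows = [[1.0, 20.0], [2.0, 25.0], [4.0, 18.0], [7.0, 30.0]]  # [rainfall, temperature]
targets = [3.0 * r[0] for r in rows]
oob_sets = [(rows, targets), (rows[::-1], targets[::-1])]

importance = [permutation_importance(trees, oob_sets, f, rng) for f in range(2)]
ranking = sorted(range(2), key=lambda f: importance[f], reverse=True)
```

Because the toy models ignore temperature, shuffling that column leaves the error unchanged and its importance is zero, which is the behavior the ranking in step (2.5) relies on.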

Step 3: Normalize the flood data and the selected optimal feature set, mapping them to the range 0 to 1 to obtain a sample data set, and divide the sample data into a training set and a test set in time order, with a training-to-test ratio of 7:3.

The normalization formula is:

$$x_i^{*} = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}, \quad i = 1, 2, \ldots, n$$

wherein $x_i^{*}$ is the normalized sample data, $x_i$ is the sample data before normalization, $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the sample data, and n is the number of sample data.
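A minimal Python sketch of Step 3 follows, assuming min-max scaling (each value minus the series minimum, divided by the series range) as the mapping to the 0 to 1 interval, and a split that keeps the earliest 70% of samples for training. The function names and the flow values are illustrative assumptions.

```python
def min_max_normalize(values):
    """Map values linearly onto [0, 1] using the series minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [(v - lo) / (hi - lo) for v in values]

def chronological_split(samples, train_parts=7, test_parts=3):
    """Split time-ordered samples without shuffling: earliest part trains, latest part tests."""
    cut = len(samples) * train_parts // (train_parts + test_parts)
    return samples[:cut], samples[cut:]

# Hypothetical flow series in time order (illustrative values only).
flow = [120.0, 300.0, 210.0, 480.0, 150.0, 390.0, 270.0, 330.0, 180.0, 420.0]

scaled = min_max_normalize(flow)
train, test = chronological_split(scaled)
```

Keeping the split chronological means the test set always lies after the training set in time, so the model is evaluated on data it could not have seen during training.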

Step 4: Construct a GRU model and initialize the parameters of the model.

(4.1) Initialize the model parameters, including the batch size, the number of training iterations and the dropout value; the number of hidden layer units and the learning rate are determined by optimization with the improved whale algorithm.

(4.2) Calculate the value $r_t$ of the reset gate and the candidate hidden state $\tilde{h}_t$, where $r_t$ controls whether the calculation of $\tilde{h}_t$ uses $h_{t-1}$. The calculation formulas are as follows:

$$r_t = \sigma(W_r \cdot X_t + Q_r \cdot h_{t-1} + b_r)$$

$$\tilde{h}_t = \tanh\left(W_z \cdot X_t + Q_z \cdot (r_t \odot h_{t-1}) + b_z\right)$$

where σ is the activation function, for which the sigmoid function is selected; $W_r$, $W_z$, $Q_r$ and $Q_z$ are connection weights; $X_t$ is the input vector at time t; $b_r$ and $b_z$ are bias vectors; $h_{t-1}$ is the state information at time t-1; $\odot$ denotes element-wise multiplication; and tanh is the hyperbolic tangent function, which serves as an activation function.

(4.3) Calculate the value $u_t$ of the update gate and the current state $h_t$, where $u_t$ controls how much state information from the previous time step the current state $h_t$ retains and how much information it accepts from the candidate hidden state $\tilde{h}_t$. The calculation formula is as follows:

$$u_t = \sigma(W_u \cdot X_t + Q_u \cdot h_{t-1} + b_u)$$

wherein σ is the activation function, $W_u$ and $Q_u$ are connection weights, $X_t$ is the input vector at time t, $b_u$ is the bias vector, and $h_{t-1}$ is the state information at time t-1. The current state $h_t$ is the output of the GRU model at time t.
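To make the gate computations in (4.2) and (4.3) concrete, here is a small pure-Python sketch of a single GRU time step. Several points are assumptions rather than statements of the source: σ is taken to be the sigmoid function, the candidate state is computed as tanh(W_z·X_t + Q_z·(r_t ⊙ h_{t-1}) + b_z), and the state update uses the common convention h_t = (1 − u_t) ⊙ h_{t-1} + u_t ⊙ candidate, which the text does not spell out. The parameter dictionary `p` and the helper names are invented for the example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def gru_step(x_t, h_prev, p):
    """One GRU time step; `p` holds weights W_*, Q_* and biases b_*."""
    def affine(W, Q, b, h):
        wx, qh = matvec(W, x_t), matvec(Q, h)
        return [a + c + d for a, c, d in zip(wx, qh, b)]

    # Reset gate r_t decides how much of h_prev feeds the candidate state.
    r_t = [sigmoid(v) for v in affine(p["W_r"], p["Q_r"], p["b_r"], h_prev)]
    gated = [r * h for r, h in zip(r_t, h_prev)]
    h_cand = [math.tanh(v) for v in affine(p["W_z"], p["Q_z"], p["b_z"], gated)]

    # Update gate u_t blends the previous state with the candidate state.
    u_t = [sigmoid(v) for v in affine(p["W_u"], p["Q_u"], p["b_u"], h_prev)]
    # Assumed convention: u_t weights the candidate, (1 - u_t) the old state.
    return [(1.0 - u) * h + u * c for u, h, c in zip(u_t, h_prev, h_cand)]

# Toy parameters: input size 1, hidden size 2, all weights and biases zero.
zeros_w = [[0.0], [0.0]]
zeros_q = [[0.0, 0.0], [0.0, 0.0]]
zeros_b = [0.0, 0.0]
params = {k + s: v for k, v in (("W", zeros_w), ("Q", zeros_q), ("b", zeros_b))
          for s in ("_r", "_z", "_u")}

h_next = gru_step([1.0], [0.4, -0.2], params)
```

With all-zero parameters both gates equal 0.5 and the candidate state is 0, so the new state is simply half of the previous one; this makes the blending role of the update gate easy to check by hand.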

Step 5: Initialize the parameters of the WOA algorithm. To address the tendency of the whale optimization algorithm to fall into local optima, generate the population with a chaotic Tent initialization strategy, add a nonlinear factor and an inertia weight to improve the position update, and add a catfish effect strategy to improve the algorithm's ability to jump out of local optima. The improved whale algorithm is then used to optimize the number of hidden layer units n and the learning rate ε of the GRU model. As shown in Fig. 2, this specifically comprises the following steps:

(5.1) Initialize the population size, the number of iterations, and the upper and lower bounds of the whale positions for the whale algorithm, and generate the population with a chaotic Tent map initialization strategy.
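Chaotic Tent initialization in (5.1) might look like the Python sketch below. The text does not give the map's parameters, so the skewed Tent form (x/α below α, (1 − x)/(1 − α) above it) with α = 0.7, the seed value 0.37, and the search bounds for the two tuned quantities are all assumptions chosen for illustration. The skewed form is used because the classic slope-2 Tent map collapses to zero after a few dozen iterations in binary floating point.

```python
def tent_sequence(x0, length, alpha=0.7):
    """Skew Tent map: x -> x/alpha if x < alpha, else (1 - x)/(1 - alpha)."""
    xs, x = [], x0
    for _ in range(length):
        x = x / alpha if x < alpha else (1.0 - x) / (1.0 - alpha)
        xs.append(x)
    return xs

def tent_init_population(pop_size, lower, upper, x0=0.37):
    """Spread `pop_size` individuals over [lower, upper] using chaotic values in [0, 1]."""
    dim = len(lower)
    chaos = tent_sequence(x0, pop_size * dim)
    population = []
    for i in range(pop_size):
        row = chaos[i * dim:(i + 1) * dim]
        population.append([lo + c * (hi - lo)
                           for c, lo, hi in zip(row, lower, upper)])
    return population

# Hypothetical bounds: [hidden units, learning rate].
lower, upper = [8.0, 0.001], [128.0, 0.1]
population = tent_init_population(5, lower, upper)
```

Each individual is a candidate pair of GRU hyperparameters; the hidden-unit coordinate would be rounded to an integer before a model is built from it.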

(5.2) Calculate the fitness values of all individuals in the population, obtain and record the position vector of the current optimal whale individual, and check whether it is the same as the optimal position vector of the previous iteration; if the recorded optimal position vector has not evolved within the specified number of iterations, enter (5.3); otherwise enter (5.4).

(5.3) Sort the fitness values of all current individuals in descending order and re-initialize the positions of the last 50% of individuals by chaotic initialization, to improve the ability of the whale algorithm to jump out of local optima.
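The stagnation check of (5.2) and the catfish-style restart of (5.3) can be sketched as below. The sketch follows the text literally: individuals are sorted by fitness from large to small and the last half is replaced. Whether a larger fitness value is better is not stated in the text, so treating the first half as the survivors is an assumption, as are the names `stagnated`, `catfish_restart`, the `patience` parameter, and the `reinit` callable that stands in for chaotic re-initialization.

```python
def stagnated(best_history, patience):
    """True when the recorded best position is unchanged over `patience` iterations."""
    if len(best_history) <= patience:
        return False
    recent = best_history[-(patience + 1):]
    return all(b == recent[0] for b in recent)

def catfish_restart(population, fitness, reinit):
    """Sort descending by fitness and re-initialize the last 50% of individuals."""
    order = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    keep = order[: len(order) - len(order) // 2]
    survivors = [population[i] for i in keep]
    fresh = reinit(len(population) - len(survivors))  # hypothetical chaotic re-init
    return survivors + fresh

# Toy data: four one-dimensional individuals and their fitness values.
population = [[1.0], [2.0], [3.0], [4.0]]
fitness = [0.1, 0.9, 0.5, 0.7]

new_population = catfish_restart(population, fitness,
                                 reinit=lambda k: [[0.0] for _ in range(k)])
stuck = stagnated([[1, 2], [1, 2], [1, 2], [1, 2]], patience=3)
```

In a full optimizer, `catfish_restart` would only be called when `stagnated` returns true, so a population that is still improving is left untouched.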

(5.4) Add a nonlinear factor a and an inertia weight ω to the basic whale algorithm to improve its optimization performance; the calculation formulas are respectively as follows:

where t is the current iteration number, T is the maximum iteration number, and λ is the adjustment coefficient; experiments show that λ = 3 gives the best results.

(5.5) Introduce parameters p and A, where p is a random number in [0,1] and $A = 2a \cdot r_1 - a$; calculate the values of p and A: if p ≤ 0.5 and |A| < 1, enter (5.6); if p ≤ 0.5 and |A| ≥ 1, enter (5.7); if p > 0.5, enter (5.8).

(5.6) Perform a shrinking-encirclement iterative update on the individual whale position vector, with the update formula:

$$X(t+1) = \omega \cdot X_{rand} - A \cdot \left|C \cdot X_{rand} - X_t\right|$$

wherein X is the position of the individual, t is the current iteration number, C is a random number in the interval [0,2], and $X_{rand}$ is a random individual in the whale population.

(5.7) Perform a random-search predation iterative update on the individual whale position vector, with the update formula:

$$X(t+1) = \omega \cdot X_{best} - A \cdot \left|C \cdot X_{best} - X_t\right|$$

wherein X is the position of the individual, t is the current iteration number, C is a random number in the interval [0,2], and $X_{best}$ represents the optimal individual in the current population.

(5.8) Perform a spiral predation iterative update on the individual whale position vector according to the following formula:

$$X(t+1) = D_{best} \cdot e^{bl} \cdot \cos(2\pi l) + (1-\omega) \cdot X_{best}$$

wherein X is the position of the individual, $D_{best} = \left|X_{best} - X_t\right|$ represents the distance between individual X and the optimal individual $X_{best}$ before the position update, b is a constant, and l is a random number in the interval [-1,1].
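The branching in (5.5) to (5.8) can be sketched in Python as follows. The update formulas are transcribed as they appear in the text, including the pairing of the random individual with the |A| < 1 branch, which differs from the canonical whale optimization algorithm. The nonlinear factor `a` and inertia weight `omega` are taken as inputs because their formulas are not reproduced here. Treating r₁ as a uniform random number in [0,1], applying the update per coordinate, the default `b = 1.0`, and the function name are assumptions.

```python
import math
import random

def update_position(x, x_best, x_rand, a, omega, rng, b=1.0):
    """One position update following branches (5.5)-(5.8) as written in the text."""
    p = rng.random()                   # branch selector in [0, 1]
    r1 = rng.random()                  # assumed uniform in [0, 1]
    A = 2.0 * a * r1 - a
    C = 2.0 * rng.random()             # random number in [0, 2]
    if p <= 0.5 and abs(A) < 1.0:      # (5.6): formula with X_rand
        return [omega * xr - A * abs(C * xr - xi)
                for xi, xr in zip(x, x_rand)]
    if p <= 0.5:                       # (5.7): formula with X_best
        return [omega * xb - A * abs(C * xb - xi)
                for xi, xb in zip(x, x_best)]
    l = rng.uniform(-1.0, 1.0)         # (5.8): spiral update
    return [abs(xb - xi) * math.exp(b * l) * math.cos(2.0 * math.pi * l)
            + (1.0 - omega) * xb
            for xi, xb in zip(x, x_best)]

rng = random.Random(42)
moved = update_position([16.0, 0.01], [64.0, 0.005], [32.0, 0.05],
                        a=1.5, omega=0.8, rng=rng)
```

A real optimizer would clip the returned position back into the search bounds before evaluating its fitness; that step is omitted here.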

(5.9) Add 1 to the iteration count and judge whether the maximum number of iterations of the algorithm has been reached; if so, end; otherwise, enter (5.10).

(5.10) Send the optimal solution output by the whale algorithm, namely the current optimal number of hidden layer units and learning rate of the GRU model, to the GRU model, calculate the training error, and return to step (5.2).

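Step 6: Compare the test samples with the results of the prediction model of the invention, calculate the root mean square error (RMSE) and the mean absolute percentage error (MAPE) between the predicted and actual flood flow, and evaluate the effectiveness of the improved WOA-GRU flood flow prediction method.

The formulas of the mean absolute error (MAE), the root mean square error (RMSE) and the mean absolute percentage error (MAPE) are respectively:

$$\mathrm{MAE} = \frac{1}{N}\sum_{t=1}^{N}\left|y_{ob}(t) - y_{pre}(t)\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(y_{ob}(t) - y_{pre}(t)\right)^{2}}$$

$$\mathrm{MAPE} = \frac{1}{N}\sum_{t=1}^{N}\left|\frac{y_{ob}(t) - y_{pre}(t)}{y_{ob}(t)}\right|$$

wherein $y_{ob}(t)$ is the true value of the t-th sample, $y_{pre}(t)$ is the predicted value of the t-th sample, and N is the total number of samples.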
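The evaluation indices of Step 6 can be computed with a few lines of Python. The sketch below uses the standard definitions of MAE, RMSE and MAPE, with MAPE expressed as a fraction rather than a percentage (an assumption); the observed and predicted flows are made-up values for illustration.

```python
import math

def mae(y_obs, y_pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(y_obs, y_pred)) / len(y_obs)

def rmse(y_obs, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / len(y_obs))

def mape(y_obs, y_pred):
    """Mean absolute percentage error as a fraction; observed values must be nonzero."""
    return sum(abs((o - p) / o) for o, p in zip(y_obs, y_pred)) / len(y_obs)

# Hypothetical observed and predicted flows (illustrative values only).
y_obs = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]

scores = {"MAE": mae(y_obs, y_pred),
          "RMSE": rmse(y_obs, y_pred),
          "MAPE": mape(y_obs, y_pred)}
```

RMSE penalizes large misses more heavily than MAE, which matters for flood peaks, while MAPE puts errors on low and high flows on a comparable relative scale.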

To demonstrate the effectiveness and improvement of the proposed algorithm, 4048 data records from April 2009 to April 2020 at the Gaochang hydrological observation station in the Min River basin, located in Gaochang Town, Yibin, Sichuan Province, are taken as an example. The algorithm program is written in MATLAB, and the prediction model of the invention and four comparison prediction models are constructed: prediction models in which particle swarm optimization optimizes the BP neural network and the GRU (PSO-BP and PSO-GRU), prediction models in which the whale algorithm optimizes the BP neural network and the GRU (WOA-BP and WOA-GRU), and the prediction model in which the improved whale algorithm of the invention optimizes the GRU (IWOA-GRU).

The five prediction models PSO-BP, WOA-BP, PSO-GRU, WOA-GRU and IWOA-GRU are each simulated 10 times. For each model, the maximum and minimum of each performance index are selected and the average of each performance index over the 10 simulations is calculated. The resulting statistics are shown in Table 1.

Table 1: Performance index statistics of the model of the invention and the control models

As is apparent from Table 1, the average MAE of IWOA-GRU is 293.8928, the average RMSE is 797.4514, and the average MAPE is 0.1034. The predictions of the GRU optimized by the improved whale algorithm are clearly superior to those of the other prediction models on every index. The improved whale optimization method strengthens the optimizing ability of the whale optimization algorithm, and the prediction accuracy of the GRU prediction model optimized by it improves accordingly.

For each model, the run with the minimum RMSE among the 10 simulations is selected, the ratio of the error to the true value is calculated for each sample, and the proportions of samples whose ratio is less than 0.2 and less than 0.1 are recorded as the 80% accuracy and the 90% accuracy, respectively. The results are shown in Table 2.

Table 2: Accuracy statistics of the model of the invention and the control models

As is apparent from Table 2, the prediction accuracy of IWOA-GRU is clearly higher than that of the control models, which shows that optimizing the GRU prediction model with the improved whale algorithm also improves its prediction accuracy.
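The accuracy figures reported in Table 2 are proportions of samples whose relative error falls below a threshold. A minimal Python sketch of that calculation is given below; the function name `fraction_within` and the sample values are assumptions made for the example.

```python
def fraction_within(y_obs, y_pred, threshold):
    """Share of samples whose |error| / |true value| is below `threshold`."""
    hits = sum(1 for o, p in zip(y_obs, y_pred)
               if abs(p - o) / abs(o) < threshold)
    return hits / len(y_obs)

# Hypothetical observed and predicted flows (illustrative values only).
y_obs = [100.0, 200.0, 400.0, 500.0]
y_pred = [105.0, 230.0, 400.0, 650.0]

acc_80 = fraction_within(y_obs, y_pred, 0.2)  # relative error below 0.2
acc_90 = fraction_within(y_obs, y_pred, 0.1)  # relative error below 0.1
```

In this toy data the relative errors are 0.05, 0.15, 0 and 0.3, so three of four samples meet the 0.2 threshold and two of four meet the 0.1 threshold.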

Fig. 3 is a graph comparing the predicted result with the actual value in the above embodiment, and it can be seen from the graph that the predicted result of the present invention can basically follow the actual value, can basically predict the future runoff amount, and can predict the arrival of flood in time.

Based on the same inventive concept, the invention also provides a flood flow prediction system based on the improved WOA-GRU, as shown in Fig. 1, comprising five modules, namely:

the feature extraction module, which is used for extracting features from the data of the related factors, calculating the importance of each item of sample data through a random forest model, ranking the items, and selecting the factor data of high importance as the sample data set of the prediction module;

the parameter optimization module, which is used for optimizing the number of hidden layer units and the learning rate of the GRU model with the improved whale algorithm, and for establishing the prediction model in which the GRU is optimized by the improved whale algorithm;

and the prediction module, which is used for inputting the sample data output by the feature extraction module into the established flood flow prediction model and obtaining the prediction result through calculation.

Those skilled in the art will appreciate that the flood flow prediction method of the present invention is implemented by computer program instructions. These program instructions may be stored in a memory and executed by a processor to perform the functions of the present invention. There is also provided a computer-readable storage medium storing a computer program according to the present invention. The method of predicting flood flow according to the present invention may be devised as a related computer program, which may be executed and implemented by means of the computer readable storage medium.

It should be noted that the above embodiments illustrate rather than limit the invention. The invention is not limited to the embodiments described; variations that fall within the scope defined by the following claims are also considered to be within the scope of the invention.

Claims

1. An improved WOA-GRU-based flood flow prediction method, comprising the steps of:

(5) Initializing WOA algorithm parameters, generating a population by adopting a chaotic Tent initialization strategy, adding a nonlinear factor and inertia weight to improve position updating, and simultaneously adding a catfish effect strategy to improve the ability of a whale optimization algorithm to jump out of local optimization; optimizing the number n of hidden layer units and the learning rate epsilon of the GRU model by utilizing an improved WOA algorithm to form an improved WOA-GRU;

(6) An improved WOA-GRU flood flow prediction model is established, the flood flow is predicted by using the test set and the prediction model, and an error and a prediction result are output;

the step (5) comprises the following steps:

(53) Adding catfish effect strategies: ranking the fitness values of all the current individuals in order from large to small, and re-using chaos initialization for the positions of the last 50% of individuals to improve the ability of a whale algorithm to jump out of local optimum;

wherein t is the current iteration number, T is the maximum iteration number, λ is the adjustment coefficient, and n is the number of sample data;

$$X(t+1) = \omega \cdot X_{rand} - A \cdot \left|C \cdot X_{rand} - X_t\right|$$

$$X(t+1) = \omega \cdot X_{best} - A \cdot \left|C \cdot X_{best} - X_t\right|$$

$$X(t+1) = D_{best} \cdot e^{bl} \cdot \cos(2\pi l) + (1-\omega) \cdot X_{best}$$

wherein X is the position of the individual, $D_{best} = \left|X_{best} - X_t\right|$ represents the distance between individual X and the optimal individual $X_{best}$ before the position update, b is a constant, and l is a random number in the interval [-1,1];

2. The improved WOA-GRU based flood flow prediction method of claim 1, wherein the flood related environmental data of step (1) includes rainfall, temperature, humidity.

3. The improved WOA-GRU based flood flow prediction method according to claim 1, wherein step (2) comprises the steps of:

(24) The importance A of the relevant factor is calculated as follows:

4. The improved WOA-GRU based flood flow prediction method according to claim 1, wherein the normalization process of step (3) is formulated as:

5. The improved WOA-GRU based flood flow prediction method according to claim 1, wherein said step (4) comprises the steps of:

$$r_t = \sigma(W_r \cdot X_t + Q_r \cdot h_{t-1} + b_r)$$

$$u_t = \sigma(W_u \cdot X_t + Q_u \cdot h_{t-1} + b_u)$$

6. An improved WOA-GRU-based flood flow prediction system employing the method of any one of claims 1-5, comprising a data acquisition module, a data preprocessing module, a feature extraction module, a parameter optimization module, and a flow prediction module;