CN114819386A - Conv-Transformer-based flood forecasting method - Google Patents

Conv-Transformer-based flood forecasting method

Info

Publication number
CN114819386A
Authority
CN
China
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210532559.3A
Other languages
Chinese (zh)
Inventor
冯钧
王众沂
巫义锐
陆佳民
Current Assignee
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date
Filing date
Publication date
Application filed by Hohai University (HHU)
Priority to CN202210532559.3A
Publication of CN114819386A

Classifications

    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06N3/044: Recurrent networks, e.g. Hopfield networks
    • G06N3/045: Combinations of networks
    • G06N3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/08: Learning methods
    • G06Q50/265: Personal security, identity or safety


Abstract

The invention belongs to the technical field of data-driven flood forecasting and discloses a Conv-Transformer-based flood forecasting method. First, hydrological data of the large-scale basin under study are collected, and after preprocessing the collected hydrological historical data are input into the model. Second, the hydrological historical data undergo data cleaning, data transformation, data-set division and the like. Third, a Transformer-based flood forecasting model is constructed: a convolutional long short-term memory network performs relative position coding and extracts spatial information, improving the model's ability to learn long-term dependency information; the self-attention mechanism in the Transformer module captures the dynamic spatio-temporal correlation between hydrological elements by capturing the internal correlation of the hydrological sequences, and the multi-head attention mechanism lets the model learn long-term and short-term hydrological historical information simultaneously. Test data are then input to evaluate the model's forecasting performance; if the network performance does not meet the requirements, the parameters are adjusted until a satisfactory prediction result is reached. Finally, the model is analyzed against the evaluation criteria to complete the flood forecast. The beneficial effects of the invention are that flood-peak magnitude and flood trend can be forecast effectively, making the method an effective tool for real-time flood forecasting in large-scale drainage basins.

Description

Conv-Transformer-based flood forecasting method
Technical Field
The invention relates to the technical field of data-driven flood forecasting, in particular to a Conv-Transformer-based flood forecasting method.
Background
Flood forecasting is one of the important non-engineering measures for preventing flood disasters. Timely and effective flood early warning and forecasting helps people defend against floods and reduce flood damage, and is therefore an important application for disaster prevention and mitigation.
At present, flood forecasting generally adopts two approaches, hydrological models based on the runoff process and data-driven intelligent models, and the two complement each other in practical forecasting. Data-driven modeling largely ignores the physical mechanism of the hydrological process; it is a black-box method whose aim is to establish the optimal mathematical relationship between input and output data. Flood disasters caused by long-term rainfall require long-term dependency information to be considered. Unlike sudden flood disasters caused by short-term heavy rainfall, floods caused by long-term rainfall last longer and their peaks arrive later, so the long-term dependency characteristics of the hydrological data must be modeled properly rather than considering short-term information alone. Moreover, in flow-prediction tasks for large drainage basins, flood events caused by long-term rainfall are numerous. However, existing intelligent flood models still tend to lose long-term sequence information during network training. Considering long-term and short-term characteristics jointly when constructing the intelligent model, and in particular strengthening the learning of the easily lost long-term dependency information, improves the accuracy of flood forecasting for large-scale drainage basins.
To alleviate the difficulty of parallelizing the recurrent computation of recurrent neural networks, the Google team proposed the Transformer model in 2017. The model uses no recurrent unit structure; instead it uses a multi-head attention mechanism to learn the dependency relationships between word vectors, enabling efficient parallel processing. Recently, the Transformer model has been widely applied in fields such as computer vision, natural language processing and time-series prediction. The invention constructs a Transformer flood forecasting model in which a convolutional long short-term memory network performs relative position coding: the model first performs relative position coding and spatial information extraction through the convolutional long short-term memory network and a fully connected layer, and then mines the association relationships between all feature elements with a submodule based on the Transformer encoder, whose multi-head attention mechanism lets the model learn long-term and short-term hydrological historical information simultaneously, thereby achieving more accurate hydrological forecasting.
The Transformer in the natural language processing field abandons the loop structure of the recurrent neural network and resolves the relations within each input sequence, and between input and output sequences, relying solely on the self-attention mechanism; the original NLP Transformer applies sine-cosine absolute position coding to the input sentences. However, the linear transformation of absolute coding easily loses position information, makes the relative distance information between the various hydrological features hard to control, and hinders the extraction of hydrological spatial information. Therefore, by changing the coding method so that the data contain relative position information, the intelligent flood forecasting model can learn the long-term dependency and the spatial dependency of the hydrological sequence, improving forecasting accuracy. The invention adopts a convolutional long short-term memory network for relative position coding, extracting the spatial features of the hydrological data while preserving global information extraction.
Disclosure of Invention
The purpose of the invention is as follows: the invention aims to remedy the defects of the prior art and provide a Conv-Transformer-based flood forecasting method, which performs relative position coding through a convolutional long short-term memory network, extracts the spatial features of the hydrological data while preserving global information extraction, efficiently mines hydrological temporal information through a Transformer submodule, and improves the accuracy of flood forecasting.
The technical scheme is as follows: the invention relates to a Conv-Transformer-based flood forecasting method comprising the following steps:
step S1, collecting hydrological historical data of the large-scale basin;
step S2, preprocessing the collected hydrologic history data;
s3, carrying out relative position coding on the data subjected to data preprocessing through a long-term and short-term memory network based on convolution operation, and then carrying out time feature modeling on the hydrological historical data subjected to spatial information extraction through a Transformer submodule;
and step S4, testing the performance of the forecasting model through the forecast values obtained by the model at each stage for the basin, and analyzing the model output data against the evaluation criteria to complete the flood forecast.
Step S1 collects basin historical data. Further: when the flow data are collected, the historical flow data of the basin outlet section are generally collected, covering recent data from the survey station. The meteorological data in the basin mainly comprise attributes such as evaporation, rainfall, temperature and wind speed.
Step S2 performs the data preprocessing. Further:
step S2.1, preprocessing the ground rainfall measurement station data in step S2 comprises data cleaning, data transformation and normalization;
the normalization formula is as follows:
Figure BSA0000273487360000021
wherein X is a normalized value, X i Is an original value, X min Is the minimum value in the original sequence, X max Is the maximum value in the original sequence;
step S2.2, the preprocessing of the weather and flow attribute data in step S2 includes the construction of a two-dimensional matrix, and the specific operations are as follows: the column attributes formed by the two-dimensional matrix are historical runoff monitored by the hydrological station and various meteorological values detected by a plurality of meteorological monitoring stations; combining the runoff quantity and the meteorological value of each time period to obtain a final input two-dimensional matrix, and inputting the input matrix into the model subsequently;
s2.3, taking the first 80% of the data preprocessed in the step S2 as a model training set, and taking the last 20% of the data as a test set;
the drainage basin spatial geographic feature relationship is complex, and the drainage basin spatial geographic feature relationship needs to be researched from multiple angles, so as to fully mine the complex space-time features of the medium and small drainage basins, and the step S3 includes:
s3.1, performing relative position coding by using a long-term and short-term memory neural network based on convolution operation, extracting spatial information by perfecting the learning of global characteristic information, and improving the learning capability of a model to long-term dependency information, wherein the specific calculation method of the long-term and short-term memory network based on convolution operation in a relative position coding module is as follows:
f t =σ(W xf *x t +W hf *h t-1 +W cf ⊙C t-1 +b f )
i t =σ(W xi *x t +W hi *h t-1 +W ci ⊙C t-1 +b i )
C t =f t ⊙C t-1 +i t ⊙tanh(W xc *x t +W hc *h t-1 +b c )
o t =σ(W xo *x t +W ho *h t-1 +W co ⊙C t-1 +b o )
h t =o t ⊙tanh(C t )
wherein [ ] represents convolution operation, the [ ] represents Hadamard product, the σ represents activation function sigmoid function, and x t Represented by the input data of the neuronal cells at time t, h t Representing the state of the information passed to the next layer at time t, C t Representing the value of the information state of the neuronal cell at time t, f t 、i t And o t Respectively representing a forgetting gate, an input gate and an output gate;
Step S3.2, after the hidden-layer processing of the convolutional LSTM network, the output is passed through a fully connected layer, whose output serves as the relative position coding result, i.e. the input of the subsequent Transformer submodule. The fully connected layer is:

R_L(x_t) = ReLU(W_R h_t + b_R)

where ReLU is the activation function, W_R is a weight matrix, and b_R is a bias.
Step S3.3, constructing a Transformer submodule to extract temporal features. The module consists of a multi-head self-attention layer, a feed-forward network layer, residual connections and normalization operations, computed as follows:

Z = LayerNorm(X + MultiHead(X))
X_out = LayerNorm(Z + FFN(Z))

where X is the input matrix. The input first passes through the multi-head self-attention layer, completing the MultiHead(·) operation, then through the residual connection and layer normalization; finally the feed-forward network FFN(·) performs the dimension conversion, followed once more by residual connection and normalization. The detailed operation of the multi-head self-attention mechanism is as follows:
q_i = W_Q a_i
k_i = W_K a_i
v_i = W_V a_i

First, the input matrix X = {x_1, x_2, ..., x_N} is processed through the embedding layer to obtain a_i = W x_i (i = 1, 2, ..., N); then the three linear transformation weight matrices W_Q, W_K and W_V are applied to obtain q_i, k_i and v_i, the sub-vectors of the query Q, key K and value V required by the following calculations. For input x_1, the output b_1 of the multi-head self-attention module is computed as follows:
α_{1,i} = q_1 · k_i,  i = 1, 2, ..., N
α̂_{1,i} = Softmax(α_{1,i}) = exp(α_{1,i}) / Σ_{j=1}^{N} exp(α_{1,j})
b_1 = Σ_{i=1}^{N} α̂_{1,i} v_i

where α_{1,1}, α_{1,2}, ..., α_{1,N} are obtained as the vector dot products of q_1 with each k_i; the Softmax operation normalizes them into the weights α̂_{1,i}, which are finally multiplied with the corresponding v_i and summed to obtain b_1. The other b_i are computed analogously by substituting the corresponding index i, and after passing through the different heads the final output B = {b_1, b_2, ..., b_N} is obtained. The residual connection is the superposition of the input and the output of the multi-head self-attention layer. Layer normalization LayerNorm_{γ,β}(x) operates as follows:

μ = (1/H) Σ_{i=1}^{H} x_i
σ² = (1/H) Σ_{i=1}^{H} (x_i - μ)²
LayerNorm_{γ,β}(x) = γ ⊙ (x - μ) / √(σ² + ε) + β

where x represents the input of the neuron, H is the number of units in the layer, μ the mean, σ² the variance, ε a constant added to prevent a zero denominator, and γ, β are coefficients. The feed-forward neural network mainly comprises two fully connected layers and a ReLU activation function, operating as:

FeedForward(Z) = ReLU(Z W_1 + b_1) W_2 + b_2

where Z is the output of the multi-head self-attention layer, W_1 and W_2 are weight matrices, and b_1 and b_2 are biases;
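The Transformer encoder submodule can be sketched in NumPy as follows. The head count and toy dimensions are our simplifying assumptions, no output projection is applied after head concatenation, and the sketch follows the unscaled dot product written above (standard Transformers additionally scale the dot products by 1/√d of the head dimension).

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, gamma=1.0, beta=0.0, eps=1e-6):
    """LayerNorm_{γ,β}(x): normalize each row by its mean μ and variance σ²."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def multi_head_self_attention(A, Wq, Wk, Wv, n_heads):
    """b_i = Σ_j softmax(q_i · k_j) v_j per head; heads are concatenated."""
    N, d = A.shape
    dh = d // n_heads
    Q, K, V = A @ Wq, A @ Wk, A @ Wv        # q_i = W_Q a_i, etc., in row form
    heads = []
    for h in range(n_heads):
        s = slice(h * dh, (h + 1) * dh)
        alpha = softmax(Q[:, s] @ K[:, s].T)  # α̂ attention weights
        heads.append(alpha @ V[:, s])
    return np.concatenate(heads, axis=-1)

def encoder_block(X, p, n_heads=2):
    """Z = LayerNorm(X + MultiHead(X)); out = LayerNorm(Z + FFN(Z))."""
    Z = layer_norm(X + multi_head_self_attention(X, p["Wq"], p["Wk"], p["Wv"], n_heads))
    ffn = np.maximum(0.0, Z @ p["W1"] + p["b1"]) @ p["W2"] + p["b2"]  # ReLU feed-forward
    return layer_norm(Z + ffn)
```

The residual additions `X + ...` and `Z + ...` are the superposition of each sublayer's input and output described above.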
After the construction of the complete Conv-Transformer model, a series of experiments is performed to verify its feasibility. Step S4 includes:
s4.1, inputting the predicted value to test the performance of the prediction model, and judging the size change trend of the loss function value of the whole model until the loss function value is in a decreasing trend and tends to be gentle;
step S4.2, evaluating the performance of the model by using the data obtained by the test set, so as to improve the model by changing the parameters of the model, specifically, evaluating the flood forecasting result based on the attention mechanism by using three evaluation standards, wherein the three evaluation standards are average absolute error, decision coefficient and root mean square error, and the three evaluation standard formulas are as follows:
1) mean absolute error MAE:

MAE = (1/N) Σ_{m=1}^{N} | y_m - ŷ_m |

where y_m is the observed river flow of the m-th sample, ŷ_m is the predicted river flow of the m-th sample, and N is the number of test samples;

2) coefficient of determination R²:

R² = 1 - Σ_{m=1}^{N} (y_m - ŷ_m)² / Σ_{m=1}^{N} (y_m - ȳ)²

where y_m is the observed river flow of the m-th sample, ŷ_m is the predicted river flow of the m-th sample, ȳ is the mean of the observed river flows, and N is the number of test samples;

3) root mean square error RMSE:

RMSE = √[ (1/N) Σ_{m=1}^{N} (y_m - ŷ_m)² ]

where y_m is the observed river flow of the m-th sample, ŷ_m is the predicted river flow of the m-th sample, and N is the number of test samples;
and step S4.3, outputting the model's forecast result.
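The three evaluation criteria can be computed as follows (an illustrative sketch; y denotes the observed series, y_hat the forecast, and the function names are ours):

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs(y - y_hat)))

def rmse(y, y_hat):
    """Root mean square error."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

Lower MAE and RMSE and an R² closer to 1 indicate a better forecast.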
Beneficial effects: compared with the prior art, the invention has the following advantages.
The method uses a deep learning algorithm and adopts a Transformer flood forecasting approach in which a convolutional long short-term memory network performs the relative position coding, called the Conv-Transformer method for short. Compared with traditional methods, it strengthens the modeling of local context information in the hydrological sequence and improves flood forecasting accuracy. The model not only draws on the self-attention mechanism to strengthen the key feature information among the hydrological feature elements, but also attends to long-term and short-term information in the historical flood data simultaneously through the different heads of the multi-head attention mechanism, capturing time-series information more efficiently.
In addition, the invention adds the relative position coding of the convolutional long short-term memory neural network, which describes the relation between the information at each moment and the current information, captures long-term dependency information, accomplishes the acquisition of global information from local perception, and extracts the spatial features of the multidimensional hydrological data. The invention jointly considers the dynamic relevance of temporal and spatial features, improves the ability to learn long-term dependency information, and effectively improves the accuracy of flood forecasting.
Drawings
FIG. 1 is a flow chart of an experiment according to the present invention;
FIG. 2 is a schematic diagram of the convolutional long short-term memory network unit structure of the flood forecasting model of the present invention;
fig. 3 is a detailed block diagram of the present invention.
Detailed Description
The technical solution of the present invention is described in detail below, but the scope of the present invention is not limited to the embodiments.
As shown in fig. 1, a Conv-Transformer-based flood forecasting method of the present embodiment includes the following steps:
step S1, collecting the hydrological historical data of the Cuntan hydrological station in the Yangtze River basin.
When the flow data are collected, they are the data of the Cuntan basin flow station, namely the historical flow data of the outlet section, covering recent years of data from the survey station. The meteorological data in the basin are the data of 110 meteorological monitoring stations in the Yangtze River basin, obtained from the China National Meteorological Information Center, and comprise evaporation, rainfall, temperature and wind-speed values.
Step S2, preprocessing the collected flow data and meteorological data of the Cuntan hydrological station;
step S2.1, preprocessing the ground rainfall measurement station data in step S2 comprises data cleaning, data transformation and normalization;
the normalization formula is as follows:
Figure BSA0000273487360000061
wherein X is a normalized value, X i Is an original value, X min Is the minimum value in the original sequence, X max Is the maximum value in the original sequence;
the preprocessing of the weather and flow attribute data in the step S2.2 and the step S2 includes the construction of a two-dimensional matrix, and the specific operations are as follows: the column attributes of the two-dimensional matrix are historical runoff monitored by hydrographic monitoring stations in the cun-beach basin and various weather values detected by a plurality of weather monitoring stations; combining the runoff and meteorological values of each day in the data set to obtain a final input two-dimensional matrix, and inputting the input matrix into the model subsequently;
and S2.3, taking the first 80% of the data of the cun-beach watershed preprocessed in the step S2 as a model training set, and taking the last 20% of the data as a test set.
Step S3, performing relative position coding on the preprocessed Cuntan hydrological data through the convolutional long short-term memory network, and then performing temporal feature modeling through the Transformer submodule on the hydrological historical data after spatial information extraction;
s3.1, performing relative position coding by using a long-term and short-term memory neural network based on convolution operation, extracting spatial information by perfecting the learning of global characteristic information, and improving the learning capability of a model to long-term dependency information, wherein the specific calculation method of the long-term and short-term memory network based on convolution operation in a relative position coding module is as follows:
f t =σ(W xf *x t +W hf *h t-1 +W cf ⊙C t-1 +b f )
i t =σ(W xi *x t +W hi *h t-1 +W ci ⊙C t-1 +b i )
C t =f t ⊙C t-1 +i t ⊙tanh(W xc *x t +W hc *h t-1 +b c )
o t =σ(W xo *x t +W ho *h t-1 +W co ⊙C t-1 +b o )
h t =o t ⊙tanh(C t )
wherein [ ] represents convolution operation, the [ ] represents Hadamard product, the σ represents activation function sigmoid function, and x t Represented by the input data of the neuronal cells at time t, h t Representing the state of the information passed to the next layer at time t, C t Representing the value of the information state of the neuronal cell at time t, f t 、i t And o t Respectively representing a forgetting gate, an input gate and an output gate;
Step S3.2, after the hidden-layer processing of the convolutional LSTM network, the output is passed through a fully connected layer, whose output serves as the relative position coding result, i.e. the input of the subsequent Transformer submodule. The fully connected layer is:

R_L(x_t) = ReLU(W_R h_t + b_R)

where ReLU is the activation function, W_R is a weight matrix, and b_R is a bias.
Step S3.3, constructing a Transformer submodule to extract temporal features. The module consists of a multi-head self-attention layer, a feed-forward network layer, residual connections and normalization operations, computed as follows:

Z = LayerNorm(X + MultiHead(X))
X_out = LayerNorm(Z + FFN(Z))

where X is the input matrix. The input first passes through the multi-head self-attention layer, completing the MultiHead(·) operation, then through the residual connection and layer normalization; finally the feed-forward network FFN(·) performs the dimension conversion, followed once more by residual connection and normalization. The detailed operation of the multi-head self-attention mechanism is as follows:
q_i = W_Q a_i
k_i = W_K a_i
v_i = W_V a_i

First, the input matrix X = {x_1, x_2, ..., x_N} is processed through the embedding layer to obtain a_i = W x_i (i = 1, 2, ..., N); then the three linear transformation weight matrices W_Q, W_K and W_V are applied to obtain q_i, k_i and v_i, the sub-vectors of the query Q, key K and value V required by the following calculations. For input x_1, the output b_1 of the multi-head self-attention module is computed as follows:
α_{1,i} = q_1 · k_i,  i = 1, 2, ..., N
α̂_{1,i} = Softmax(α_{1,i}) = exp(α_{1,i}) / Σ_{j=1}^{N} exp(α_{1,j})
b_1 = Σ_{i=1}^{N} α̂_{1,i} v_i

where α_{1,1}, α_{1,2}, ..., α_{1,N} are obtained as the vector dot products of q_1 with each k_i; the Softmax operation normalizes them into the weights α̂_{1,i}, which are finally multiplied with the corresponding v_i and summed to obtain b_1. The other b_i are computed analogously by substituting the corresponding index i, and after passing through the different heads the final output B = {b_1, b_2, ..., b_N} is obtained. The residual connection is the superposition of the input and the output of the multi-head self-attention layer. Layer normalization LayerNorm_{γ,β}(x) operates as follows:

μ = (1/H) Σ_{i=1}^{H} x_i
σ² = (1/H) Σ_{i=1}^{H} (x_i - μ)²
LayerNorm_{γ,β}(x) = γ ⊙ (x - μ) / √(σ² + ε) + β

where x represents the input of the neuron, H is the number of units in the layer, μ the mean, σ² the variance, ε a constant added to prevent a zero denominator, and γ, β are coefficients. The feed-forward neural network mainly comprises two fully connected layers and a ReLU activation function, operating as:

FeedForward(Z) = ReLU(Z W_1 + b_1) W_2 + b_2

where Z is the output of the multi-head self-attention layer, W_1 and W_2 are weight matrices, and b_1 and b_2 are biases.
Step S4, testing the performance of the forecasting model through the forecast values obtained by the model at each stage for the Cuntan basin, and analyzing the model's forecast data against the evaluation criteria to complete the flood forecast.
Step S4.1, testing the performance of the forecasting model through the predicted values obtained at each stage on the Cuntan data set, and monitoring the trend of the model's loss value until the loss decreases and levels off;
Step S4.2, evaluating the model performance with the data obtained on the test set so as to improve the model by adjusting its parameters; specifically, the flood forecasting results of the attention-based model are evaluated with three criteria, namely mean absolute error, coefficient of determination and root mean square error, whose formulas are as follows:
1) mean absolute error MAE:
Figure BSA0000273487360000084
wherein the content of the first and second substances,
Figure BSA0000273487360000085
-the actual observed value of the flow of the cun-beach basin of the mth sample,
Figure BSA0000273487360000086
-the m-th sample river flow prediction value, N-the number of test samples;
2) coefficient of determination R²:

R² = 1 − [Σ_{m=1}^{N} (Q_m − Q̂_m)²] / [Σ_{m=1}^{N} (Q_m − Q̄)²]

wherein Q_m is the actual observed flow of the Cuntan basin for the m-th sample, Q̂_m is the predicted river flow for the m-th sample, Q̄ is the mean of the observed river flows over the test samples, and N is the number of test samples;
3) root mean square error RMSE:

RMSE = √[(1/N) Σ_{m=1}^{N} (Q_m − Q̂_m)²]

wherein Q_m is the actual observed flow of the Cuntan basin for the m-th sample, Q̂_m is the predicted river flow for the m-th sample, and N is the number of test samples;
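The three criteria can be computed directly from paired observed and predicted flow series; a minimal NumPy sketch follows. The flow values below are illustrative only and are not Cuntan basin data.

```python
import numpy as np

def mae(obs, pred):
    # Mean absolute error over N test samples
    return np.mean(np.abs(obs - pred))

def r2(obs, pred):
    # Coefficient of determination: 1 - SSE / total variance of observations
    return 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)

def rmse(obs, pred):
    # Root mean square error
    return np.sqrt(np.mean((obs - pred) ** 2))

# Illustrative flows (m^3/s); not real Cuntan observations
obs = np.array([1200.0, 1500.0, 1800.0, 1600.0])
pred = np.array([1150.0, 1550.0, 1750.0, 1650.0])
print(mae(obs, pred), rmse(obs, pred), round(r2(obs, pred), 4))
# → 50.0 50.0 0.9467
```

Lower MAE and RMSE and an R² closer to 1 indicate a better fit between forecast and observation, which is the basis on which the model parameters are adjusted in step S4.2.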
and S4.3, outputting a model prediction result.

Claims (5)

1. A Conv-Transformer-based flood forecasting method, characterized in that the method comprises the following steps:
step S1, collecting hydrological historical data of the large-scale basin;
step S2, preprocessing the collected hydrologic history data;
s3, carrying out relative position coding on the preprocessed data through a long short-term memory network based on convolution operation, and then carrying out temporal feature modeling on the hydrological historical data from which spatial information has been extracted, through a Transformer submodule;
and step S4, testing the performance of the forecasting model through the predicted values obtained for the Cuntan basin at each stage of the model, and analyzing the output data of the model against evaluation criteria to complete flood forecasting.
2. The Conv-Transformer-based flood forecasting method according to claim 1, wherein: the collected hydrological data comprise historical meteorological data within the basin and historical flow data at the outlet section of the basin, and the data preprocessing comprises data cleaning, data transformation and data set division.
3. The Conv-Transformer-based flood forecasting method according to claim 1, wherein:
step S2.1, the preprocessing of the hydrological historical data in step S2 comprises data cleaning, data transformation and normalization;
step S2.2, the preprocessing of the hydrological historical data in the step S2 comprises the construction of a two-dimensional input matrix;
and S2.3, taking the first 80% of the data preprocessed in the step S2 as a training set, and taking the last 20% of the data as a test set.
4. The Conv-Transformer-based flood forecasting method according to claim 1, wherein:
s3.1, performing relative position coding by using a long short-term memory neural network based on convolution operation, extracting spatial information by improving the learning of global feature information, and enhancing the model's ability to learn long-term dependency information;
s3.2, realizing linearization through a fully connected layer;
and S3.3, performing temporal feature modeling on the hydrological historical data from which spatial information has been extracted, through the multi-head self-attention mechanism of the Transformer submodule.
5. The Conv-Transformer-based flood forecasting method according to claim 1, wherein:
s4.1, inputting the predicted values to test the performance of the forecasting model, and monitoring the trend of the overall model's loss function value until it shows a decreasing trend and levels off;
s4.2, evaluating the performance of the model with the data obtained from the test set, so as to improve the model by adjusting its parameters; specifically, the attention-based flood forecasting results are evaluated with three criteria, namely mean absolute error, coefficient of determination and root mean square error;
and S4.3, outputting a model prediction result.
CN202210532559.3A 2022-05-19 2022-05-19 Conv-Transformer-based flood forecasting method Pending CN114819386A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210532559.3A CN114819386A (en) 2022-05-19 2022-05-19 Conv-Transformer-based flood forecasting method


Publications (1)

Publication Number Publication Date
CN114819386A true CN114819386A (en) 2022-07-29

Family

ID=82516176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210532559.3A Pending CN114819386A (en) 2022-05-19 2022-05-19 Conv-Transformer-based flood forecasting method

Country Status (1)

Country Link
CN (1) CN114819386A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117132606A (en) * 2023-10-24 2023-11-28 四川大学 Segmentation method for lung lesion image
CN117132606B (en) * 2023-10-24 2024-01-09 四川大学 Segmentation method for lung lesion image
CN117575873A (en) * 2024-01-15 2024-02-20 安徽大学 Flood warning method and system for comprehensive meteorological hydrologic sensitivity
CN117575873B (en) * 2024-01-15 2024-04-05 安徽大学 Flood warning method and system for comprehensive meteorological hydrologic sensitivity


Legal Events

Date Code Title Description
PB01 Publication
DD01 Delivery of document by public notice

Addressee: Wang Zhongyi

Document name: Notification on Qualification of Preliminary Examination of Invention Patent Application

Addressee: Wang Zhongyi

Document name: Notice of Publication of Patent Application for Invention
