CN117520784A - Groundwater level multi-step prediction method based on convolution attention long-short-term neural network - Google Patents

Groundwater level multi-step prediction method based on convolution attention long-short-term neural network

Info

Publication number
CN117520784A
Authority
CN
China
Prior art keywords
data
neural network
layer
attention
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311622198.2A
Other languages
Chinese (zh)
Inventor
兰涛
张黎明
孙均雨
秦广冲
刘鑫
张法兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Shipbuilding Group International Engineering Co ltd
Qingdao University of Technology
Original Assignee
China Shipbuilding Group International Engineering Co ltd
Qingdao University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Shipbuilding Group International Engineering Co ltd, Qingdao University of Technology filed Critical China Shipbuilding Group International Engineering Co ltd
Priority to CN202311622198.2A priority Critical patent/CN117520784A/en
Publication of CN117520784A publication Critical patent/CN117520784A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/10Pre-processing; Data cleansing
    • G06F18/15Statistical pre-processing, e.g. techniques for normalisation or restoring missing data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/0985Hyperparameter optimisation; Meta-learning; Learning-to-learn
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/01Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2123/00Data types
    • G06F2123/02Data types in the time domain, e.g. time-series data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a multi-step groundwater level prediction method based on a convolutional attention long short-term memory neural network, comprising the following steps: S10, acquiring monitoring data related to the groundwater level to obtain multi-feature monitoring time-series data; S20, constructing a multi-feature monitoring data sample tensor based on the time-series data; S30, building a convolutional attention long short-term memory neural network model; S40, training the model on the sample tensor to obtain an optimal model; S50, substituting the monitoring data to be predicted into the optimal model, and outputting multi-step prediction data through model calculation. The invention combines a convolutional neural network, an attention mechanism and a long short-term memory network into a hybrid neural network model, which can greatly improve the accuracy of long-horizon groundwater level prediction and provide more reliable data support for groundwater disaster decision making.

Description

Groundwater level multi-step prediction method based on convolution attention long-short-term neural network
Technical Field
The invention relates to the field of disaster prevention and control in civil and hydraulic engineering, in particular to deep-learning-based groundwater level prediction, and specifically to a multi-step groundwater level prediction method based on a convolutional attention long short-term memory neural network.
Background
Groundwater is an important factor affecting the safety of underground works and may cause various engineering disasters, such as soil liquefaction, settlement, or water ingress into tunnels, pipelines and other underground facilities. Groundwater may also induce geological hazards such as landslides and collapses. Accurate prediction of groundwater level changes therefore allows better prevention and management of potential engineering disasters.
Conventional groundwater level prediction methods generally rely on statistical or physical models, which are limited to a certain extent by data quality and model complexity; their prediction accuracy often cannot meet the demands of complex engineering environments.
With the development of deep learning, especially the rise of neural network architectures such as the convolutional neural network (CNN) and the long short-term memory network (LSTM), new methods have emerged for groundwater level prediction. These methods use neural networks to automatically capture temporal patterns and extract features from the data, and have shown good performance in some groundwater level prediction tasks.
However, groundwater level data are affected by many factors, such as weather, geology and hydrology, and exhibit high diversity and complexity, so multi-step prediction of groundwater levels remains challenging. Groundwater level data often contain long-term dependencies, requiring predictions far into the future, which increases the difficulty of prediction. The data may also suffer from missing values, noise or outliers, demanding effective data-processing methods. Because accurate prediction of the groundwater level is critical to engineering safety, a more effective, accurate and stable prediction method is needed to meet the requirements of engineering disaster prevention.
Disclosure of Invention
To address the shortcomings of the prior art, the invention establishes a multi-step groundwater level prediction method based on a convolutional attention long short-term memory neural network. The method combines a convolutional neural network, an attention mechanism and a long short-term memory network into a hybrid neural network model, and can greatly improve the accuracy of long-horizon groundwater level prediction.
In order to achieve the above purpose, the invention is realized by the following technical scheme:
The invention provides a multi-step groundwater level prediction method based on a convolutional attention long short-term memory neural network, which comprises the following steps: S10, acquiring monitoring data related to the groundwater level to obtain multi-feature monitoring time-series data; S20, constructing a multi-feature monitoring data sample tensor based on the time-series data; S30, building the convolutional attention long short-term memory neural network model; S40, training the model on the sample tensor to obtain an optimal model; S50, substituting the monitoring data to be predicted into the optimal model, and outputting multi-step prediction data through model calculation.
In a preferred embodiment, in step S10, the monitoring data are multi-sensor data acquired in time sequence, including: groundwater level height, rainfall, temperature sensor data, local reservoir drainage and local river water level height.
In a preferred embodiment, in step S20, the constructing a multi-feature monitoring data sample tensor includes:
s201, preprocessing multi-feature monitoring time sequence data, filling the vacant data, and deleting abnormal data;
s202, the multi-feature monitoring time sequence data is formed into a multi-feature monitoring time sequence data matrix X * Wherein each row represents a time series sample and each column represents a different feature;
s203, the multi-feature monitoring time sequence data matrix X * Each column of data X * i Carrying out normalization conversion to obtain a normalized sample matrix X;
s204, processing the monitoring time data, converting the date into a time stamp, and combining the time stamp into a normalized sample matrix X to serve as one column of the matrix;
s205, dividing the normalized sample matrix X into a training set, a verification set and a test set according to the number of lines;
s206, dividing the training set, the verification set and the test set matrix into three-dimensional tensors x_train, x_val and x_test respectively, wherein the dimensions of the three-dimensional tensors are as follows: batch size, time step, number of characteristic channels.
In a preferred embodiment, in step S203, each column of data X*_i of the multi-feature monitoring time-series data matrix X* is normalized using the following formula:

$$X_i^k = \frac{X_i^{*k} - \min\left(X_i^*\right)}{\max\left(X_i^*\right) - \min\left(X_i^*\right)}$$

where k is the sequence index of each column of data, X_i^{*k} denotes the value of the k-th time-series entry in the i-th column of X*, X_i^k is the corresponding normalized value and forms the i-th column of the normalized sample matrix X with value range [0, 1], min(X_i^*) and max(X_i^*) are respectively the minimum and maximum of the i-th column of X*, and each column of data X_i of the normalized sample matrix X represents one monitored feature i.
In a preferred embodiment, in step S204, the date is converted into a timestamp X_t using the following formula, and the timestamp X_t is incorporated into the normalized sample matrix X:

$$X_t = \frac{1}{2}\left(1 + \sin\frac{2\pi t}{365}\right)$$

where t represents the date (day of the year) and X_t represents the converted timestamp.
As a preferred embodiment, in step S30, the convolutional attention long short-term memory neural network model comprises: a convolution layer, a multi-head self-attention mechanism layer, a long short-term memory recurrent network layer and a fully connected layer.
As a preferred embodiment, the convolution layers include a one-dimensional convolution layer and a one-dimensional max-pooling layer, wherein:
the one-dimensional convolution layer is represented by the following formula:
where ReLU represents a modified linear element activation function, conv1d represents a one-dimensional convolution operation, W is the convolution kernel, b is the bias of the convolution layer, x cov Representing a three-dimensional tensor;
the one-dimensional maximum pooling layer is expressed by the following formula:
wherein MaxPool1d represents a one-dimensional max pooling operation, i represents the starting position of the pooling core on the input sequence, kernel_size represents the pooling core size, x pool Representing the output tensor of the one-dimensional convolution layer.
As a preferred embodiment, the multi-head self-attention mechanism layer is composed of a self-attention score layer and a multi-head attention layer, the multi-head attention layer being expressed by the following formula:

$$\mathrm{head}_i = \mathrm{Attention}(Q_i, K_i, V_i) = \mathrm{softmax}\left(\frac{Q_i K_i^{\top}}{\sqrt{d_k}}\right) V_i$$

where Q denotes the query, the representation used to compute the attention score; K denotes the key, the representation compared with the query; V denotes the value, the representation multiplied by the attention weights to produce the final output; and i denotes the i-th attention head of the multi-head self-attention mechanism. Q, K and V are obtained by linear transformations in the self-attention score layer as follows:

$$Q_i = X W_{Q_i},\quad K_i = X W_{K_i},\quad V_i = X W_{V_i}$$

where X is the input tensor of the self-attention score layer and W_{Q_i}, W_{K_i} and W_{V_i} are trainable parameter matrices.
As a preferred embodiment, the long short-term memory recurrent network layer comprises: a forget gate, an input gate, an output gate, a cell state and a hidden state; the layer processes data in the following steps:
(1) Calculating the values of a forgetting gate, an input gate and an output gate according to the hidden state of the previous time step and the input of the current time step in each time step;
(2) Determining information forgotten from the cell state using a forgetting gate;
(3) Updating the cell state using the input gate and the new candidate value, adding new information to the cell state;
(4) Determining information extracted from the cell state by using an output gate, and generating a hidden state of the current time step;
(5) The cell state and hidden state are transferred to the next layer of long-short-period recurrent neural network layer in the next time step or used to generate the final output;
(6) The output data of the long short-term memory recurrent network layer is fed into the fully connected layer, realizing feature combination and nonlinear transformation, and a three-dimensional tensor is output with dimensions: batch size, time step, number of predicted days.
In a preferred embodiment, in step S40, the training convolutional attention long-short neural network model includes the following steps:
s401, selecting the size of a time window according to requirements, and adjusting and evaluating the time window;
s402, training by using a convolutional attention long-short neural network model, and adjusting model parameters by using training set data and applying a back propagation algorithm to reduce a loss function;
s403, optimizing the model by adopting an Adam optimizer and a proper learning rate;
s404, monitoring model performance indexes, loss functions and accuracy, and evaluating model performance by using a verification set to avoid overfitting;
s405, dynamically adjusting the learning rate by adopting a cosine annealing strategy;
and S406, when the loss value on the verification set reaches the lowest point, saving the model parameters to obtain the optimal model.
As a preferred embodiment, the model performance index is measured by the mean square error (MSE) and the coefficient of determination R², and training is repeated until the model with optimal MSE and R² is obtained.
Compared with the prior art, the invention has the following beneficial effects. The multi-step groundwater level prediction method based on a convolutional attention long short-term memory neural network uses the convolutional neural network and attention mechanism of deep learning to analyze multi-sensor monitoring time-series data and extract features, then feeds the extracted data into a long short-term memory network for model training, realizing multi-step prediction of the groundwater level. By combining a one-dimensional convolutional neural network with an attention mechanism, the feature-extraction strengths of both are fully exploited, achieving accurate prediction of the groundwater level over a long future horizon. This technical innovation addresses the complexity of groundwater level monitoring and the need for long-horizon prediction, and can provide more reliable data support for groundwater-disaster-related decision making.
The invention is a multi-step groundwater level prediction method based on deep learning. Its particular advantages include at least one or more of the following:
(1) Based on an improved LSTM recurrent network layer, the invention realizes multi-step prediction of the groundwater level, enabling accurate prediction over a long future horizon.
(2) The invention collects multiple monitoring indicators of the physical field, such as groundwater level height, rainfall, temperature sensor data, local reservoir drainage and local river water level height. This comprehensive coverage copes with the high diversity and complexity of groundwater-level-related data and improves the model's prediction accuracy.
(3) The method overcomes the problem that groundwater level data may contain missing values, noise or outliers by adopting a suitable mean-imputation scheme, improving the model's prediction accuracy.
(4) The invention uses the CNN, attention and LSTM techniques of deep learning to analyze multi-sensor monitoring time-series data and extract features, realizing multi-step prediction of the groundwater level.
(5) Combining a one-dimensional convolutional neural network with an attention mechanism improves the accuracy of feature extraction and yields more accurate predictions of the groundwater level over multiple future days.
(6) The invention addresses the complexity of groundwater level monitoring and prediction, and its predictions can provide more reliable data support for decision making.
It should be understood that the implementation of any of the embodiments of the invention is not intended to simultaneously possess or achieve some or all of the above-described benefits.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It will be apparent to those skilled in the art from this disclosure that the drawings described below are merely exemplary and that other embodiments may be derived from the drawings provided without undue effort.
The structures, proportions, sizes and the like shown in this specification are provided only for illustration and description and are not intended to limit the scope of the invention, which is defined solely by the claims.
FIG. 1 is a schematic overall flow diagram of the multi-step groundwater level prediction method based on a convolutional attention long short-term memory neural network;
FIG. 2 is a schematic diagram of the structure of a long short-term memory (LSTM) cell according to the invention;
FIG. 3 is a schematic diagram of the convolutional attention long short-term memory network structure according to the invention;
FIG. 4 compares prediction results with actual values for (a) an ordinary double-layer LSTM multi-step prediction method and (b) the convolutional attention long short-term memory network prediction method.
Like or corresponding reference characters indicate like or corresponding parts throughout the several views.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the embodiments and the accompanying drawings. The exemplary embodiments of the present invention and their descriptions herein are for the purpose of explaining the present invention, but are not to be construed as limiting the invention.
In the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "connected," "secured," and the like are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communicated with the inside of two elements or the interaction relationship of the two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
It should be understood that the terms "comprises/comprising", "consisting of" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a product, apparatus, process or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element defined by the phrases "comprising/including" or "consisting of" does not exclude the presence of other like elements in the product, apparatus, process or method that includes it.
It is further understood that the terms "upper," "lower," "front," "rear," "left," "right," "top," "bottom," "inner," "outer," and the like are used in an orientation or positional relationship based on that shown in the drawings, merely to facilitate describing the present invention and to simplify the description, and do not indicate or imply that the devices, components, or structures referred to must have a particular orientation, be configured or operated in a particular orientation, and are not to be construed as limiting the present invention.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
In order to better understand the above technical solution, the following detailed description will refer to the accompanying drawings and specific embodiments.
The invention provides a multi-step groundwater level prediction method based on a convolutional attention long short-term memory neural network, whose flow is shown in FIG. 1 and which comprises the following steps: S10, acquiring monitoring data related to the groundwater level to obtain multi-feature monitoring time-series data; S20, constructing a multi-feature monitoring data sample tensor based on the time-series data; S30, building the convolutional attention long short-term memory neural network model; S40, training the model on the sample tensor to obtain an optimal model; S50, substituting the monitoring data to be predicted into the optimal model, and outputting multi-step prediction data through model calculation. The method uses the convolutional neural network and attention mechanism of deep learning to analyze multi-sensor time-series data and extract features, feeds the extracted data into a long short-term memory network for model training, and finally realizes multi-step prediction of the groundwater level, providing more reliable data support for groundwater disaster decision making.
Multi-step prediction means that, by substituting the monitoring data into the optimal model, data for multiple future days can be predicted and output. For example, given 30 days of monitoring data, the usual method uses the 30 days of data to predict day 31 only, i.e., a single day of groundwater level data. The multi-step prediction method instead uses the 30 days of data to predict days 31 to 40, i.e., 10 days of groundwater level data. When the monitored data are of low complexity and the accuracy of the final prediction remains within the required range, the proposed method can predict even more days.
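A minimal Python sketch of this sliding-window construction, with the 30-day input and 10-day horizon of the example above (all names here are illustrative, not from the patent):

```python
import numpy as np

def make_multistep_windows(series, window=30, horizon=10):
    """Slice a 1-D water-level series into (input window, multi-step target) pairs.

    Each sample uses `window` past days to predict the next `horizon` days,
    matching the 30-days-in / days-31-to-40-out example above.
    """
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        X.append(series[start : start + window])
        y.append(series[start + window : start + window + horizon])
    return np.asarray(X), np.asarray(y)

# Example: 365 days of synthetic levels -> 326 training pairs
levels = np.sin(np.linspace(0, 12 * np.pi, 365))
X, y = make_multistep_windows(levels)
print(X.shape, y.shape)  # (326, 30) (326, 10)
```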
In step S10, monitoring data related to the groundwater level are acquired to obtain multi-feature monitoring time-series data.
The groundwater-level-related monitoring data comprise a plurality of features, in particular including but not limited to: groundwater level height, rainfall, temperature sensor data, local reservoir drainage and local river water level height, all of which can be acquired as time series. These data are recorded in real time by routine monitoring instruments or sensors, each record pairing a time (such as a particular date) with a specific value, for example the rainfall in millimeters. Because the instruments record in real time, the time dimension can be divided more finely, such as per hour or per minute, as required by the model.
After the data acquisition is completed, multi-characteristic monitoring time sequence data are obtained through the monitoring data related to the underground water level and are arranged into a database for standby.
In step S20, a multi-feature monitoring data sample tensor is constructed based on the multi-feature monitoring time series data.
In this step, the data preprocessing and data segmentation are mainly performed on the multi-feature monitoring time sequence data, and specifically includes the following sub-steps:
in step S201, the multi-feature monitoring time series data is preprocessed. Due to the complexity of the data acquisition process, the acquired multi-feature monitoring time sequence data often have missing values, error values and extreme values. Entering erroneous data into the model affects the accuracy of the model, and therefore requires preprocessing of the erroneous data prior to modeling. In some embodiments, the error data may be replaced or deleted by mean fitting. It is easy to understand that the mean fitting method replaces the error data with the mean value of the related data according to the actual situation.
In step S202, the multi-feature monitoring time-series data, such as groundwater level height, rainfall, temperature sensor data, local reservoir drainage and local river water level height, are formed into a multi-feature monitoring time-series data matrix X* (i.e., a matrix of sample data). Each column of X* represents one monitored feature i and is denoted X*_i; X_i^{*k} denotes the value of the k-th time-series entry in the i-th column of X*, where k is the sequence index within each column.
In step S203, each column of data X*_i of the multi-feature monitoring time-series data matrix X* is normalized to obtain the normalized sample matrix X. Because of the diversity of the monitored features, the columns of the data set have different value ranges. Before use, the data are therefore normalized into dimensionless quantities so that different features share the same measurement scale; this also removes the adverse influence of erroneous data and facilitates subsequent model training and prediction.
In some embodiments, each column of data X*_i of the multi-feature monitoring time-series data matrix X* is normalized using the following formula:

$$X_i^k = \frac{X_i^{*k} - \min\left(X_i^*\right)}{\max\left(X_i^*\right) - \min\left(X_i^*\right)}$$

where k is the sequence index of each column of data, X_i^{*k} denotes the value of the k-th time-series entry in the i-th column of X*, X_i^k is the corresponding normalized value and forms the i-th column of the normalized sample matrix X with value range [0, 1], min(X_i^*) and max(X_i^*) are respectively the minimum and maximum of the i-th column of X*, and each column of data X_i of the normalized sample matrix X represents one monitored feature i.
Normalization, as one step of data preprocessing, converts the multi-feature monitoring time-series data into the interval [0, 1], which better suits the training of a deep learning model and accelerates convergence. When the sample data are measured on different scales, normalization removes the influence of dimension (units) on the result, making different indicators comparable.
After the normalization processing, each characteristic data sequence is converted into the same metric scale, and the data sequences are all mapped to [0,1], so that later model calculation is facilitated.
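A minimal NumPy sketch of this column-wise min-max normalization (the function name is illustrative):

```python
import numpy as np

def minmax_normalize(X_star):
    """Scale each column (feature) of the monitoring matrix X* to [0, 1]."""
    col_min = X_star.min(axis=0)   # min(X*_i) per column
    col_max = X_star.max(axis=0)   # max(X*_i) per column
    return (X_star - col_min) / (col_max - col_min)

X_star = np.array([[5.0, 100.0], [7.0, 300.0], [6.0, 200.0]])
print(minmax_normalize(X_star))
# [[0.  0. ]
#  [1.  1. ]
#  [0.5 0.5]]
```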
In step S204, the monitoring time data is processed to convert the date into a time stamp.
In some embodiments, the conversion is performed using the following formula:

$$X_t = \frac{1}{2}\left(1 + \sin\frac{2\pi t}{365}\right)$$

where t denotes the day of the year and X_t the converted timestamp; the obtained X_t is incorporated into the normalized sample matrix X as one of its columns, so that the time information can be conveniently retrieved during modeling.
Taking January 10 as an example, when t = 10, X_t ≈ 0.59. Date data are thus mapped to [0, 1] through the timestamp conversion, facilitating later model calculation.
Here, the monitoring time data refers to the date column: for example, a series of dates from January 1 to January 31, 31 rows in total, one day per row, arranged in date order.
Mapping the date into the range [0, 1] as a periodic feature helps capture seasonal trends in the groundwater level, and the sine function makes the converted result smoother.
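Under the sine mapping above, the conversion can be sketched as follows (the function name is illustrative):

```python
import numpy as np

def date_to_timestamp(t, period=365):
    """Map day-of-year t smoothly into [0, 1]; consistent with t = 10 -> ~0.59."""
    return 0.5 * (1.0 + np.sin(2.0 * np.pi * t / period))

print(round(float(date_to_timestamp(10)), 2))  # 0.59
```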
In step S205, the normalized sample matrix X is divided into a training set, a verification set, and a test set according to the number of rows, wherein 70% of the data is used as training set data, 10% of the data is used as verification set data, and 20% of the data is used as test set data. In machine learning, the training set, validation set, test set are three important parts of the data set for training, evaluating, and testing the performance of the machine learning model.
The training set is a data set used by the machine learning model for training and learning, and is used to train parameters of the model. The verification set is a data set for evaluating the performance of the model, and is used for adjusting parameters of the model in the training process, improving the performance of the model and avoiding over fitting or under fitting of the model. The test set is a data set for evaluating the final performance of the model, is not overlapped with the training set and the verification set, and judges whether the model is accurate or not. Generally, the training set is in a relatively large proportion, typically 60% -80% of the total data set, and the validation or test set is in a relatively small proportion, typically 10% -20% of the total data set.
In step S206, the training set, verification set and test set matrices are respectively converted into three-dimensional tensors x_train, x_val and x_test, which serve as the input tensors of the neural network model. Their dimensions are (batch size, time step, number of feature channels), the input dimensions expected by CNN, LSTM and similar models; the data are divided along these three dimensions so that they can be fed directly into the neural network for training. In some embodiments, the batch size is set to 32 according to the memory of the computer used for training. The time step is best set by the user; for example, a time step of 10 may be used. The number of feature channels equals the number of features of the normalized sample matrix X; with five data features, the number of feature channels is 5. The processed training unit x_train then takes the form of a tensor of shape (32, 10, 5).
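A sketch of the split-and-window step, assuming the 70/10/20 split and the (samples, time step, channels) layout described above (names are illustrative):

```python
import numpy as np

def window(matrix, time_step=10):
    """Stack overlapping windows: (rows, channels) -> (samples, time_step, channels)."""
    return np.stack([matrix[i : i + time_step]
                     for i in range(len(matrix) - time_step + 1)])

X = np.random.rand(1000, 5)            # normalized sample matrix, 5 features
x_train = window(X[:700])              # 70 % of rows for training
x_val   = window(X[700:800])           # 10 % for validation
x_test  = window(X[800:])              # 20 % for testing
print(x_train.shape)                   # (691, 10, 5); batched into size-32 chunks later
```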
In step S30, the convolutional attention long short-term memory neural network model is built. Referring to FIG. 3, the model comprises: a convolution layer, a multi-head self-attention mechanism layer, a long short-term memory recurrent network layer and a fully connected layer.
Specifically, the convolution layer is composed of a one-dimensional convolution layer and a one-dimensional maximum pooling layer.
The one-dimensional convolution layer is represented by the following formula:

$$x_{pool} = \mathrm{ReLU}\left(\mathrm{Conv1d}(x_{cov}; W) + b\right)$$

where ReLU denotes the rectified linear unit activation function, Conv1d denotes a one-dimensional convolution operation, W is the convolution kernel, b is the bias of the convolution layer, and x_cov is the three-dimensional tensor constructed in step S206.
The one-dimensional max-pooling layer is represented by the following formula:

$$\mathrm{MaxPool1d}(x_{pool})_i = \max\left(x_{pool}[\,i : i + \mathrm{kernel\_size}\,]\right)$$

where MaxPool1d denotes a one-dimensional max-pooling operation, i denotes the starting position of the pooling kernel on the input sequence, kernel_size denotes the pooling kernel size, and x_pool denotes the output tensor of the one-dimensional convolution layer (the preceding neural network layer).
At each position i of the input sequence, the one-dimensional max-pooling layer selects the maximum value of the subsequence from position i to i + kernel_size as the output. This effectively shortens the input sequence while preserving its most important features.
Through the one-dimensional convolution and pooling operations, the convolution layer realizes the following key functions: it captures local features from the input data, helping to identify patterns, structures and associated information; it reduces the data scale, and with it the number of network parameters and the computation, thereby alleviating overfitting; and it achieves translation invariance through weight sharing, detecting the same features at different positions and improving the network's robustness to positional changes in the input.
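A PyTorch sketch of such a convolution block; the filter count (64) and the kernel sizes are illustrative assumptions, not values fixed by the text:

```python
import torch
import torch.nn as nn

conv_block = nn.Sequential(
    nn.Conv1d(in_channels=5, out_channels=64, kernel_size=3, padding=1),
    nn.ReLU(),                      # ReLU(Conv1d(x; W) + b)
    nn.MaxPool1d(kernel_size=2),    # keeps the max of each length-2 window
)

x = torch.randn(32, 10, 5)          # (batch, time_step, channels)
x = x.permute(0, 2, 1)              # Conv1d expects (batch, channels, length)
out = conv_block(x)
print(out.shape)                    # torch.Size([32, 64, 5])
```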
Specifically, the multi-head self-attention mechanism layer is composed of a self-attention score layer and a multi-head attention layer. The multi-head attention layer is expressed by the following formula:

$$\mathrm{head}_i = \mathrm{Attention}(Q_i, K_i, V_i) = \mathrm{softmax}\left(\frac{Q_i K_i^{\top}}{\sqrt{d_k}}\right) V_i$$
In the formula, Q represents the query, the representation used to compute the attention score, i.e., what is being attended to. K represents the key, the representation of the input information that is compared with the query. V represents the value, the representation multiplied by the attention weights to produce the final output, i.e., what is to be output. Q, K and V are obtained by linear transformations in the self-attention score layer as follows, with i denoting the i-th attention head of the multi-head self-attention mechanism:

$$Q_i = X W_{Q_i},\quad K_i = X W_{K_i},\quad V_i = X W_{V_i}$$

where X is the input tensor of the self-attention score layer and W_{Q_i}, W_{K_i} and W_{V_i} are trainable parameter matrices.
$\sqrt{d_k}$ is a normalization factor used to scale the attention scores $Q_i K_i^{\top}$, ensuring they stay within a suitable range and aiding the training and stability of the model.
Attention mechanisms are usually applied in natural language processing and machine translation, but they are also highly effective in time-series prediction. Their core functions include optimizing the model's attention to input features at different time steps and realizing dynamic weight allocation, so that the model focuses more on information relevant to the current task. This helps resolve long-term dependencies in time-series data and overcomes the gradient problems faced by traditional RNNs. Furthermore, the attention mechanism offers an interpretation of model decisions, making clear which information is critical to a particular prediction, and it helps merge multimodal data so that different data sources can be integrated effectively according to task requirements.
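A PyTorch sketch of this multi-head self-attention step using the built-in nn.MultiheadAttention; the embedding width (64) and head count (4) are assumptions:

```python
import torch
import torch.nn as nn

# Internally, Q, K and V are linear projections of the same input X, and the
# scores are scaled by sqrt(d_k) before the softmax, as in the formula above.
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)

x = torch.randn(32, 5, 64)          # (batch, sequence, embed_dim)
out, weights = attn(x, x, x)        # self-attention: query = key = value = x
print(out.shape, weights.shape)     # torch.Size([32, 5, 64]) torch.Size([32, 5, 5])
```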
Specifically, the long short-term memory (LSTM) layer is a variant of the recurrent neural network (RNN) commonly used for processing sequence data; it handles long sequences effectively and alleviates problems such as vanishing and exploding gradients. The structure of an LSTM neuron cell is shown in FIG. 2.
In FIG. 2, x_t denotes the input at the current time step t, h_t the hidden state at the current time step, σ the sigmoid function, tanh the hyperbolic tangent function, which maps input values to the range [-1, 1], and × the element-wise multiplication operation.
An LSTM has three gates: the forget gate, the input gate and the output gate. This gating mechanism controls the flow of information, effectively avoiding interference from irrelevant information and the vanishing-gradient problem. In addition, the LSTM maintains a cell state and a hidden state. Its workflow is as follows:
(1) Calculating the values of a forgetting gate, an input gate and an output gate according to the hidden state of the previous time step and the input of the current time step in each time step;
(2) Determining information forgotten from the cell state using a forgetting gate;
(3) Updating the cell state using the input gate and the new candidate value, adding new information to the cell state;
(4) Determining information extracted from the cell state by using an output gate, and generating a hidden state of the current time step;
(5) The cell state and hidden state are transferred to the next layer LSTM or used to generate the final output in the next time step;
(6) The LSTM output data is fed into a fully connected layer, realizing the combination and nonlinear transformation of features.
The fully connected layer is a basic layer in a neural network, also known as a dense layer or multi-layer perceptron layer. Its main function is to connect every neuron of the previous layer to every neuron of the current layer, thereby realizing feature combination and nonlinear transformation.
The final output of the neural network model is a three-dimensional tensor with dimensions (batch size, time step, number of predicted days), for example (32, 10, 30).
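Putting the pieces together, a minimal sketch of the hybrid architecture might look as follows. All layer sizes are assumptions; note that the max-pooling in this sketch halves the time axis, so the output here is (32, 5, 30) rather than the text's example (32, 10, 30), and the exact shape depends on the pooling configuration:

```python
import torch
import torch.nn as nn

class ConvAttnLSTM(nn.Module):
    """Minimal sketch of the hybrid CNN + attention + LSTM model."""

    def __init__(self, n_features=5, hidden=64, horizon=30):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.lstm = nn.LSTM(hidden, hidden, num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden, horizon)

    def forward(self, x):                  # x: (batch, time, features)
        x = self.conv(x.permute(0, 2, 1))  # -> (batch, hidden, time')
        x = x.permute(0, 2, 1)             # -> (batch, time', hidden)
        x, _ = self.attn(x, x, x)          # multi-head self-attention
        x, _ = self.lstm(x)                # long short-term memory layers
        return self.fc(x)                  # -> (batch, time', horizon)

model = ConvAttnLSTM()
print(model(torch.randn(32, 10, 5)).shape)  # torch.Size([32, 5, 30])
```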
In step S40, the convolutional attention long short-term memory neural network model is trained on the multi-feature monitoring data sample tensor to obtain the optimal model. In this step, the built model is trained with the sample tensors while the model performance indicators, loss function and accuracy are monitored. Model performance is evaluated on the verification set, and early stopping is used to avoid overfitting.
When setting the hyperparameters, attention should be paid to: the time window size, batch size, number of features, number of LSTM hidden layers, number of predicted days, learning rate, and the patience value of early stopping. The time window size is chosen according to the problem at hand and needs to be tuned and evaluated. The patience value is the number of epochs the verification-set loss is allowed to go without improvement during training; training terminates once the verification loss has failed to decrease for more epochs than the patience value. Setting a patience value helps stop training when the model reaches its best performance, avoiding overfitting and saving computational resources, as sketched below.
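A minimal sketch of such an early-stopping loop; train_one_epoch_and_validate is a hypothetical stand-in for the real training and verification step:

```python
import random

def train_one_epoch_and_validate():
    """Hypothetical stand-in: trains one epoch, returns the verification-set loss."""
    return random.random()

best_loss, stale_epochs, patience = float("inf"), 0, 10
for epoch in range(200):
    val_loss = train_one_epoch_and_validate()
    if val_loss < best_loss:
        best_loss, stale_epochs = val_loss, 0
        # a real loop would save the model parameters here (step S406)
    else:
        stale_epochs += 1
        if stale_epochs > patience:
            break  # patience exceeded: stop training to avoid overfitting
```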
The step S40 specifically includes the following sub-steps:
in step S401, the specific size of the time window should be selected according to the requirement of the problem, that is, the size of the time window is selected according to the requirement, and then the time window is adjusted and evaluated;
in step S402, the multi-feature monitoring time series data is preprocessed and then trained by using an Attention-LSTM model. Model training is performed using the training set data using a back propagation algorithm. The back propagation algorithm is called BP algorithm for short, is suitable for a learning algorithm of a multi-layer neuron network, is based on a gradient descent method, and has strong function reproduction capability because the information processing capability is derived from multiple complexes of simple nonlinear functions.
Model parameters are continuously adjusted, and a loss function is reduced. The loss function is used to measure the degree of deviation of the predicted value from the true value of the model, and generally the smaller the better, the model parameters need to be adjusted to continually reduce the loss function in this step.
In step S403, the model is optimized using the Adam optimizer with an appropriate learning rate. Adam is an adaptive optimization algorithm that adjusts the learning rate according to historical gradient information. In practice, the learning rate can be tuned to the data scale and complexity; a typical range is [0.0001, 0.01].
In step S404, the model performance indicators, loss function and accuracy are monitored, and the verification set is used to evaluate the model and avoid overfitting. Model performance is measured by the mean square error (MSE) and the coefficient of determination R²; training is repeated until the model with optimal MSE and R² is obtained. The loss function measures the degree of deviation between predicted and true values; it is a non-negative real-valued function, and the smaller it is, the more robust the model. During tuning, the model is solved and evaluated by minimizing the loss function. Model accuracy refers to the fraction of correctly predicted samples among all samples of a given test set; it should be improved as much as possible during parameter tuning.
In step S405, a cosine annealing strategy dynamically adjusts the learning rate. Cosine annealing sets the learning rate to a cosine-shaped value between a specified maximum and minimum, so that it gradually decays from the maximum toward the minimum over the course of training. This smooth, dynamic adjustment effectively avoids local optima, provides adaptive learning-rate scheduling, and simplifies the training process.
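A PyTorch sketch of the Adam plus cosine-annealing setup; the learning-rate bounds follow the [0.0001, 0.01] range suggested above, and the module is a placeholder:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # placeholder for the full network
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=50, eta_min=0.0001  # cosine decay from 0.01 down to 0.0001
)
for epoch in range(50):
    optimizer.step()    # no-op here; a real loop would backpropagate a loss first
    scheduler.step()    # smoothly anneal the learning rate each epoch
print(scheduler.get_last_lr())  # close to eta_min after the full schedule
```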
In step S406, when the loss function value on the verification set reaches the minimum point, the model parameters are saved, and the optimal model is obtained.
And through the execution of the steps, obtaining an optimal prediction model for predicting the underground water level.
In step S50, the monitoring data for prediction is substituted into the optimal model, and the multi-step prediction data is outputted through model calculation.
When predicting the groundwater level, the input data follow the same structure as the training input: a three-dimensional tensor with dimensions (batch size, time step, number of feature channels). The predicted output is likewise a three-dimensional tensor, with dimensions (batch size, time step, number of predicted days).
The invention is a multi-step groundwater level prediction method based on a convolutional attention LSTM neural network model. It uses the CNN, attention and LSTM techniques of deep learning to analyze multi-sensor time-series data and extract features, thereby realizing multi-step prediction of the groundwater level. Combining a one-dimensional convolutional neural network with an attention mechanism improves the accuracy of feature extraction and yields more accurate predictions of the groundwater level over multiple future days. This technical innovation addresses the complexity of groundwater level monitoring and prediction and provides more reliable data support for decision making.
Referring to FIG. 4, this embodiment uses the water level data of Petrignano, Italy, as an example; the data features include the water level height, rainfall, temperature sensor data, local reservoir drainage and local river water level height of Petrignano from 2009 to 2016. The prediction horizon is the groundwater level over the next 30 days.
In FIG. 4, (a) compares the multi-step prediction of an ordinary double-layer LSTM with the true values, giving a mean square error (MSE) of 0.367 and a coefficient of determination R² of 0.710; (b) compares the prediction of the convolutional attention long short-term memory network with the true values, giving an MSE of 0.289 and an R² of 0.822. The proposed multi-step prediction method therefore reduces the prediction error, improves prediction accuracy, and achieves accurate prediction of the groundwater level over a long future horizon.
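The two reported metrics can be computed as follows (a minimal NumPy sketch with made-up example arrays):

```python
import numpy as np

def mse_and_r2(y_true, y_pred):
    """Mean square error and coefficient of determination R^2."""
    mse = float(np.mean((y_true - y_pred) ** 2))
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return mse, float(1.0 - ss_res / ss_tot)

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
print(mse_and_r2(y_true, y_pred))  # small MSE, R^2 close to 1
```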
While several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the invention. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
The foregoing description of embodiments of the invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the technical improvements in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (11)

1. A multi-step groundwater level prediction method based on a convolutional attention long short-term memory neural network, characterized by comprising the following steps:
S10, acquiring monitoring data related to the groundwater level to obtain multi-feature monitoring time-series data;
S20, constructing a multi-feature monitoring data sample tensor based on the multi-feature monitoring time-series data;
S30, building a convolutional attention long short-term memory neural network model;
S40, training the convolutional attention long short-term memory neural network model based on the multi-feature monitoring data sample tensor to obtain an optimal model;
S50, substituting the monitoring data to be predicted into the optimal model, and outputting multi-step prediction data through model calculation.
2. The prediction method according to claim 1, wherein in step S10, the monitoring data are multi-sensor data acquired in time sequence, including: groundwater level height, rainfall, temperature sensor data, local reservoir drainage and local river water level height.
3. The method according to claim 1, wherein in step S20, the constructing a multi-feature monitoring data sample tensor includes:
s201, preprocessing multi-feature monitoring time sequence data, filling the vacant data, and deleting abnormal data;
s202, the multi-feature monitoring time sequence data is formed into a multi-feature monitoring time sequence data matrix X * Wherein each row represents a time series sample and each column represents a different feature;
s203, the multi-feature monitoring time sequence data matrix X * Each column of data X * i Carrying out normalization conversion to obtain a normalized sample matrix X;
s204, processing the monitoring time data, converting the date into a time stamp, and combining the time stamp into a normalized sample matrix X to serve as one column of the matrix;
s205, dividing the normalized sample matrix X into a training set, a verification set and a test set according to the number of lines;
s206, dividing the training set, the verification set and the test set matrix into three-dimensional tensors x_train, x_val and x_test respectively, wherein the dimensions of the three-dimensional tensors are as follows: batch size, time step, number of characteristic channels.
4. The prediction method according to claim 3, wherein in step S203, each column of data X*_i of the multi-feature monitoring time-series data matrix X* is normalized using the following formula:

$$X_i^k = \frac{X_i^{*k} - \min\left(X_i^*\right)}{\max\left(X_i^*\right) - \min\left(X_i^*\right)}$$

where k is the sequence index of each column of data, X_i^{*k} denotes the value of the k-th time-series entry in the i-th column of X*, X_i^k is the corresponding normalized value and forms the i-th column of the normalized sample matrix X with value range [0, 1], min(X_i^*) and max(X_i^*) are respectively the minimum and maximum of the i-th column of X*, and each column of data X_i of the normalized sample matrix X represents one monitored feature i.
5. The prediction method according to claim 3, wherein in step S204, the date is converted into a timestamp X_t using the following formula, and the timestamp X_t is incorporated into the normalized sample matrix X:

$$X_t = \frac{1}{2}\left(1 + \sin\frac{2\pi t}{365}\right)$$

where t represents the date (day of the year) and X_t represents the converted timestamp.
6. The prediction method according to claim 1, wherein in step S30, the convolutional attention long short-term memory neural network model comprises: a convolution layer, a multi-head self-attention mechanism layer, a long short-term memory recurrent network layer and a fully connected layer.
7. The prediction method of claim 6, wherein the convolutional layers comprise a one-dimensional convolutional layer and a one-dimensional max-pooling layer, wherein:
the one-dimensional convolution layer is represented by the following formula:
where ReLU represents a modified linear element activation function, conv1d represents a one-dimensional convolution operation, W is the convolution kernel, b is the bias of the convolution layer, x cov Representing a three-dimensional tensor;
the one-dimensional maximum pooling layer is expressed by the following formula:
wherein MaxPool1d represents a one-dimensional max pooling operation, i represents the starting position of the pooling core on the input sequence, kernel_size represents the pooling core size, x pool Representing the output tensor of the one-dimensional convolution layer.
8. The prediction method according to claim 6, wherein the multi-head self-attention mechanism layer is composed of a self-attention score layer and a multi-head attention layer, the multi-head attention layer being expressed by the following formula:

$$\mathrm{head}_i = \mathrm{Attention}(Q_i, K_i, V_i) = \mathrm{softmax}\left(\frac{Q_i K_i^{\top}}{\sqrt{d_k}}\right) V_i$$

where Q denotes the query, the representation used to compute the attention score; K denotes the key, the representation compared with the query; V denotes the value, the representation multiplied by the attention weights to produce the final output; and i denotes the i-th attention head of the multi-head self-attention mechanism. Q, K and V are obtained by linear transformations in the self-attention score layer as follows:

$$Q_i = X W_{Q_i},\quad K_i = X W_{K_i},\quad V_i = X W_{V_i}$$

where X is the input tensor of the self-attention score layer and W_{Q_i}, W_{K_i} and W_{V_i} are trainable parameter matrices.
9. The prediction method according to claim 6, wherein the long short-term memory recurrent network layer comprises: a forget gate, an input gate, an output gate, a cell state and a hidden state; the layer processes data in the following steps:
(1) Calculating the values of a forgetting gate, an input gate and an output gate according to the hidden state of the previous time step and the input of the current time step in each time step;
(2) Determining information forgotten from the cell state using a forgetting gate;
(3) Updating the cell state using the input gate and the new candidate value, adding new information to the cell state;
(4) Determining information extracted from the cell state by using an output gate, and generating a hidden state of the current time step;
(5) The cell state and hidden state are transferred to the next layer of long-short-period recurrent neural network layer in the next time step or used to generate the final output;
(6) The output data of the long short-term memory recurrent network layer is fed into the fully connected layer, realizing feature combination and nonlinear transformation, and a three-dimensional tensor is output with dimensions: batch size, time step, number of predicted days.
10. The prediction method according to claim 1, wherein in step S40, the training the convolutional attention long-short neural network model includes the steps of:
s401, selecting the size of a time window according to requirements, and adjusting and evaluating the time window;
s402, training by using a convolutional attention long-short neural network model, and adjusting model parameters by using training set data and applying a back propagation algorithm to reduce a loss function;
s403, optimizing the model by adopting an Adam optimizer and a proper learning rate;
s404, monitoring model performance indexes, loss functions and accuracy, and evaluating model performance by using a verification set to avoid overfitting;
s405, dynamically adjusting the learning rate by adopting a cosine annealing strategy;
and S406, when the loss value on the verification set reaches the lowest point, saving the model parameters to obtain the optimal model.
11. The prediction method according to claim 10, wherein the model performance index is measured by the mean square error (MSE) and the coefficient of determination R², and training is repeated until the model with optimal MSE and R² is obtained.
CN202311622198.2A 2023-11-30 2023-11-30 Groundwater level multi-step prediction method based on convolution attention long-short-term neural network Pending CN117520784A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311622198.2A CN117520784A (en) 2023-11-30 2023-11-30 Groundwater level multi-step prediction method based on convolution attention long-short-term neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311622198.2A CN117520784A (en) 2023-11-30 2023-11-30 Groundwater level multi-step prediction method based on convolution attention long-short-term neural network

Publications (1)

Publication Number Publication Date
CN117520784A 2024-02-06

Family

ID=89749238

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311622198.2A Pending CN117520784A (en) 2023-11-30 2023-11-30 Groundwater level multi-step prediction method based on convolution attention long-short-term neural network

Country Status (1)

Country Link
CN (1) CN117520784A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117744504A (en) * 2024-02-20 2024-03-22 成都理工大学 Flood discharge atomization rain intensity analysis model building method and device


Similar Documents

Publication Publication Date Title
CN110263866B (en) Power consumer load interval prediction method based on deep learning
CN111401599B (en) Water level prediction method based on similarity search and LSTM neural network
CN112712209B (en) Reservoir warehousing flow prediction method and device, computer equipment and storage medium
CN111665575B (en) Medium-and-long-term rainfall grading coupling forecasting method and system based on statistical power
CN114330935B (en) New energy power prediction method and system based on multiple combination strategies integrated learning
CN117520784A (en) Groundwater level multi-step prediction method based on convolution attention long-short-term neural network
CN116128141B (en) Storm surge prediction method and device, storage medium and electronic equipment
CN115495991A (en) Rainfall interval prediction method based on time convolution network
CN111797917A (en) Method for selecting short-term similar days according to meteorological factors
CN116050652A (en) Runoff prediction method based on local attention enhancement model
JP2024077563A (en) Method, system and storage medium for forecasting ultra-short-term water levels of reservoirs taking into account the effects of dynamic water storage capacity
CN115222138A (en) Photovoltaic short-term power interval prediction method based on EEMD-LSTM microgrid
CN115238854A (en) Short-term load prediction method based on TCN-LSTM-AM
CN110852415B (en) Vegetation index prediction method, system and equipment based on neural network algorithm
CN116796189A (en) Aerosol extinction coefficient profile prediction method based on deep learning technology
Lu et al. Uncertainty quantification of machine learning models to improve streamflow prediction under changing climate and environmental conditions
CN116384538A (en) River basin runoff forecasting method, device and storage medium
CN116842358A (en) Soft measurement modeling method based on multi-scale convolution and self-adaptive feature fusion
CN116738192A (en) Digital twinning-based security data evaluation method and system
CN116561569A (en) Industrial power load identification method based on EO feature selection and AdaBoost algorithm
CN110009132A (en) A kind of short-term electric load fining prediction technique based on LSTM deep neural network
CN115860165A (en) Neural network basin rainfall runoff forecasting method and system considering initial loss
CN113011657A (en) Method and device for predicting typhoon water level, electronic equipment and storage medium
WO2022217568A1 (en) Daily precipitation forecast correction method coupled with bernoulli-gamma-gaussian distributions
CN117391221B (en) NDVI prediction integrated optimization method and system based on machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination