CN117786396A

CN117786396A - Short-term sea surface temperature prediction method and system based on CSA-ConvLSTM model

Info

Publication number: CN117786396A
Application number: CN202311814702.9A
Authority: CN
Inventors: 林连雷; 张宗伟; 高升; 王俊凯; 于航懿
Original assignee: Harbin Institute of Technology
Current assignee: Harbin Institute of Technology
Priority date: 2023-12-27
Filing date: 2023-12-27
Publication date: 2024-03-29

Abstract

The invention discloses a short-term sea surface temperature prediction method and system based on a CSA-ConvLSTM model, and belongs to the technical field of sea surface temperature prediction. The method comprises the following steps: acquiring a historical time series data set of sea surface temperature (SST); inputting the historical time series data set into a multi-layer CSA-ConvLSTM network model for training; and inputting the SST data to be used for prediction into the trained multi-layer CSA-ConvLSTM network model to obtain a sea surface temperature prediction result. Because the method uses the CSA-ConvLSTM network model to predict sea surface temperature, it fully exploits the spatial dependence of the SST data, weights each feature channel by its importance, and improves the accuracy of sea surface temperature prediction.

Description

Short-term sea surface temperature prediction method and system based on CSA-ConvLSTM model

Technical Field

The invention relates to the technical field of sea surface temperature prediction, in particular to a short-term sea surface temperature prediction method and system based on a CSA-ConvLSTM model.

Background

Existing sea surface temperature prediction methods fall mainly into two categories: numerical models and data-driven models. A numerical model is an ocean model built from a series of complex thermodynamic and physical equations. Its computational cost is huge, and ordinary hardware can hardly run it. A data-driven model learns the internal patterns of SST time series data to build a prediction model, and uses that model to predict future sea surface temperature. Common data-driven methods include linear regression, support vector machines and artificial neural networks. For small-scale data, these methods can mine the underlying patterns and achieve good prediction results. However, for large-scale SST data, such as temperature prediction over multi-point areas, conventional machine learning algorithms fall short.

With the rapid development of deep learning, the recurrent neural network (RNN) and its variants, such as LSTM and GRU, have performed well in sea surface temperature prediction. For example, Q. Zhang et al., in the paper "Prediction of Sea Surface Temperature Using Long Short-Term Memory", propose an LSTM-based SST prediction model. That model predicts temperature from the perspective of temporal dependence, but it ignores the spatial relationships between points. The sea surface of a sea area is a whole, and the temperature at any point is always subject to the thermodynamic influence of the surrounding points. Lin Z. et al., in the paper "Self-Attention ConvLSTM for Spatiotemporal Prediction" [J], propose SA-ConvLSTM. It combines self-attention with ConvLSTM to improve long-term prediction performance, and it achieves very good results in sea temperature and El Niño prediction. [14] combines an encoder-decoder model with ConvLSTM to propose the D2CL algorithm. In 2021, Facebook proposed Timeformer, a Transformer-based algorithm for spatiotemporal prediction, which also performs well in sea surface temperature prediction.

However, most existing deep learning methods for sea surface temperature prediction focus on the temporal features of the SST data and do not extract spatial features deeply enough. As a result, they cannot accurately capture how SST changes across space, and current prediction accuracy remains low.

Therefore, accurately capturing the spatial trends of sea surface temperature (SST) is a problem that needs to be solved by those skilled in the art.

Disclosure of Invention

In view of the above, the invention provides a short-term sea surface temperature prediction method based on a CSA-ConvLSTM model, intended to solve at least some of the technical problems described in the background art.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

the invention discloses a short-term sea surface temperature prediction method based on a CSA-ConvLSTM model, which comprises the following steps:

acquiring a sea surface temperature SST historical time sequence data set;

inputting the historical time sequence data set into a multi-layer CSA-ConvLSTM network model for training;

and inputting the SST data to be used for prediction into the trained multi-layer CSA-ConvLSTM network model to obtain a sea surface temperature prediction result.

Preferably, before the step of inputting the historical time sequence data set into the multi-layer CSA-ConvLSTM network model for training, the method further comprises the step of establishing the multi-layer CSA-ConvLSTM network model.

Preferably, in the step of establishing the multilayer CSA-ConvLSTM network model, the multilayer CSA-ConvLSTM network model specifically includes:

a plurality of interconnected single-layer CSA-ConvLSTM networks, wherein the output of the former layer CSA-ConvLSTM network is used as the input of the latter layer CSA-ConvLSTM network;

the single-layer CSA-ConvLSTM network comprises a ConvLSTM module and a CSA module connected to it. The CSA module comprises a channel attention sub-module and a spatial attention sub-module, and the output of the channel attention sub-module is used as the input of the spatial attention sub-module.

Preferably, the single layer CSA-ConvLSTM network is represented by the following expression:

wherein i_t denotes the memory gate (input gate) of the LSTM network at time t; σ denotes the Sigmoid activation function; W_xi denotes the memory-gate convolution kernel weight applied to the input x_t at time t; W_hi denotes the memory-gate convolution kernel weight applied to the hidden state h_{t-1} at time t-1; W_ci denotes the memory-gate transformation weight applied to the cell state c_{t-1} at time t-1; ∘ denotes element-wise multiplication of corresponding matrix elements; and b_i denotes the memory-gate bias term;

f_t denotes the forget gate of the LSTM network at time t; W_xf denotes the forget-gate convolution kernel weight applied to the input x_t at time t; W_hf denotes the forget-gate convolution kernel weight applied to the hidden state h_{t-1} at time t-1; W_cf denotes the forget-gate transformation weight applied to the cell state c_{t-1} at time t-1; and b_f denotes the forget-gate bias term;

c_t denotes the cell state of the LSTM network at time t; tanh denotes the tanh activation function; W_xc denotes the cell-state convolution kernel weight applied to the input x_t at time t; W_hc denotes the cell-state convolution kernel weight applied to the hidden state h_{t-1} at time t-1; and b_c denotes the cell-state bias term;

o_t denotes the output gate of the LSTM network at time t; W_xo denotes the output-gate convolution kernel weight applied to the input x_t at time t; W_ho denotes the output-gate convolution kernel weight applied to the hidden state h_{t-1} at time t-1; W_co denotes the output-gate transformation weight applied to the cell state c_t at time t; and b_o denotes the output-gate bias term;

h_t denotes the hidden state at time t, SA denotes the spatial attention calculation operation, and CA denotes the channel attention calculation operation.
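The cell equations can be written out as follows. This is a reconstruction consistent with the symbol definitions above, and it rests on two assumptions. First, it assumes the standard peephole ConvLSTM form, with * for convolution and ∘ for element-wise multiplication. Second, following the attention description in the detailed embodiment, it assumes that CA and SA each produce a weight map that rescales its input element-wise. The intermediate symbols \tilde{h}_t and \hat{h}_t are introduced only for illustration.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * x_t + W_{hi} * h_{t-1} + W_{ci} \circ c_{t-1} + b_i\right)\\
f_t &= \sigma\left(W_{xf} * x_t + W_{hf} * h_{t-1} + W_{cf} \circ c_{t-1} + b_f\right)\\
c_t &= f_t \circ c_{t-1} + i_t \circ \tanh\left(W_{xc} * x_t + W_{hc} * h_{t-1} + b_c\right)\\
o_t &= \sigma\left(W_{xo} * x_t + W_{ho} * h_{t-1} + W_{co} \circ c_t + b_o\right)\\
\tilde{h}_t &= o_t \circ \tanh(c_t)\\
\hat{h}_t &= \mathrm{CA}(\tilde{h}_t) \circ \tilde{h}_t\\
h_t &= \mathrm{SA}(\hat{h}_t) \circ \hat{h}_t
\end{aligned}
```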

Preferably, the spatial attention calculation operation SA and the channel attention calculation operation CA are given specifically by the following formulas:

$$\mathrm{SA} = \sigma\left(\left(f^{3\times3} + f^{5\times5} + f^{7\times7}\right) * \left[\mathrm{Avgpool}(h); \mathrm{Maxpool}(h)\right]\right)$$

wherein f^{3×3}, f^{5×5} and f^{7×7} denote 3×3, 5×5 and 7×7 convolution operations, respectively, and [·;·] denotes concatenation along the channel dimension;

$$\mathrm{CA} = \sigma\left(f^{1\times1}\left(\mathrm{Avgpool}(h) + \mathrm{Maxpool}(h)\right)\right)$$

wherein f^{1×1} denotes a 1×1 convolution operation, Avgpool denotes the average pooling operation, Maxpool denotes the max pooling operation, and h denotes the hidden state.

Preferably, acquiring the SST historical time series data set specifically includes:

dividing the sea surface into a plurality of grid cells according to spatial information, namely longitude and latitude, each grid cell serving as an SST observation point;

constructing an H × W matrix from all grid cells, wherein W denotes the number of grid cells along the longitude and H denotes the number of grid cells along the latitude;

observing the SST data over a plurality of days, and stacking the daily SST matrices into the SST historical time series data set.
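The following minimal NumPy sketch shows this data layout. It assumes a regular latitude/longitude grid with cell-centred coordinates, and the helper names `grid_axes` and `build_sst_dataset` and the example region are hypothetical. Rows follow latitude (H), columns follow longitude (W), and the daily H × W matrices are stacked along a leading time axis.

```python
import numpy as np

def grid_axes(lat_min, lat_max, lon_min, lon_max, step):
    """Hypothetical helper: centre coordinates of a regular lat/lon grid.
    H = len(lats) (latitude rows), W = len(lons) (longitude columns)."""
    lats = np.arange(lat_min, lat_max, step) + step / 2
    lons = np.arange(lon_min, lon_max, step) + step / 2
    return lats, lons

def build_sst_dataset(daily_grids, lats, lons):
    """Stack one H x W SST matrix per day into a (days, H, W) array."""
    H, W = len(lats), len(lons)
    stacked = np.stack([np.asarray(g, dtype=np.float32) for g in daily_grids])
    if stacked.shape[1:] != (H, W):
        raise ValueError(f"expected daily grids of shape {(H, W)}, got {stacked.shape[1:]}")
    return stacked

# Usage: a hypothetical 1-degree grid over a 4 x 5 degree box, 10 synthetic days.
lats, lons = grid_axes(30.0, 34.0, 120.0, 125.0, 1.0)
days = [np.full((len(lats), len(lons)), 20.0 + d) for d in range(10)]
sst = build_sst_dataset(days, lats, lons)
print(sst.shape)  # (10, 4, 5)
```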

Preferably, the step of inputting the historical time series data set into the multi-layer CSA-ConvLSTM network model for training further comprises using a teacher forcing mechanism to assist training. Specifically, the sea temperature data predicted for the previous day are replaced with the true values. The replacement proportion decays by 3% with each training round, falling gradually from 1 to 0. The sea temperature data after replacement are used as the input for predicting the next day, and this loop repeats until the prediction result for the last day is output.
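The decay schedule can be sketched as follows. The text does not say whether the 3% decay is linear or multiplicative. This sketch assumes a linear decrease of 0.03 per training round, clipped at 0, because a multiplicative decay would never actually reach 0. It also interprets the replacement proportion as the probability of feeding the true previous-day field instead of the prediction. Both function names are hypothetical.

```python
import random

def teacher_forcing_ratio(epoch, decay=0.03):
    """Replacement proportion for a given training round (round 0 -> 1.0).
    Assumes a linear decrease of `decay` per round, clipped at 0."""
    return max(0.0, 1.0 - decay * epoch)

def next_day_input(predicted, truth, ratio, rng=random):
    """With probability `ratio`, replace the previous-day prediction by the
    true field; the result is the input for predicting the next day."""
    return truth if rng.random() < ratio else predicted

print([round(teacher_forcing_ratio(e), 2) for e in (0, 10, 20, 33, 34)])
# [1.0, 0.7, 0.4, 0.01, 0.0]
```

Under this assumption, the ratio reaches 0 at round 34. After that point, training always feeds back the model's own predictions, as at inference time.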

Preferably, inputting the historical time series data set into the multi-layer CSA-ConvLSTM network model for training specifically comprises:

training the multi-layer CSA-ConvLSTM network model with a two-stage adaptive loss function training method:

in the first stage, RMSE is used as the loss function of the model to obtain the initial training weights;

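the loss function can be expressed by the following formula:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(Y_{true,k} - Y_{pre,k}\right)^2}$$

wherein Ytrue denotes the true value of the predicted variable, Ypre denotes the predicted result, and n denotes the number of predicted values;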

in the second stage, prediction is performed with the initial training weights, and the loss function weights are reassigned according to the prediction results to obtain the retraining weights;

the retraining weights are used as the final loss weights of the multi-layer CSA-ConvLSTM network model;

the final loss can be expressed by the following formula:

where δ denotes the weight assigned for retraining.

Preferably, the short-term sea surface temperature prediction method based on the CSA-ConvLSTM model further comprises the following steps:

evaluating the prediction result of the CSA-ConvLSTM model using the mean square error (MSE), the root mean square error (RMSE), the mean absolute error (MAE) and the mean absolute percentage error (MAPE) as evaluation indices.

The invention further discloses a short-term sea surface temperature prediction system based on the CSA-ConvLSTM model, which comprises a computer program for executing the short-term sea surface temperature prediction method based on the CSA-ConvLSTM model according to any one of claims 1-8.

Compared with the prior art, the short-term sea surface temperature prediction method and system based on the CSA-ConvLSTM model disclosed by the invention have the following beneficial effects:

by using the CSA-ConvLSTM network model to predict sea surface temperature, the invention fully exploits the spatial dependence of the SST data, weights each feature channel by its importance, and improves the accuracy of sea surface temperature prediction.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic overall flow chart of a method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the overall structure of a multilayer CSA-ConvLSTM model according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a single-layer CSA-ConvLSTM model according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a channel attention module according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a spatial attention module according to an embodiment of the present invention;

FIG. 6 is a flowchart of an adaptive loss function training method according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Example 1

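This embodiment discloses a short-term sea surface temperature prediction method based on a CSA-ConvLSTM model. As shown in FIG. 1, the method comprises the following steps:

acquiring an SST historical time series data set;

inputting the historical time series data set into a multi-layer CSA-ConvLSTM network model for training;

and inputting the SST data to be used for prediction into the trained multi-layer CSA-ConvLSTM network model to obtain a sea surface temperature prediction result.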

Each step of this embodiment is described further below:

acquiring a sea surface temperature SST historical time sequence data set:

the sea surface is typically divided into grid cells according to spatial information (i.e., longitude and latitude). Each grid cell is an SST observation point and holds one SST value per time step. Together, all grid cells form an H × W matrix T_i representing the SST at a specific time i, where W and H are the numbers of grid cells along the longitude and latitude, respectively. All matrices in the historical SST record form the time series T_1, T_2, …, T_t.

Inputting the historical time sequence data set into a multi-layer CSA-ConvLSTM network model for training:

the present embodiment performs model training through a historical segment of SST records, and uses the trained model to predict future SSTs.

For a given area, the historical data T_1, T_2, ..., T_u of u days are given, and the SST data of the next v days are inferred from them, formulated as:

$$T_{u+1}, \ldots, T_{u+v} = g(T_1, T_2, \ldots, T_u)$$

where g denotes the prediction network. For example, u = 30 and v = 7 means that the SST data of the next 7 days are predicted from the given 30 days of historical SST data.
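Training pairs for g can be cut from the stacked series with a sliding window. The sketch below assumes a stride of one day, which the text does not specify, and `make_windows` is a hypothetical helper.

```python
import numpy as np

def make_windows(sst, u=30, v=7):
    """Cut a (days, H, W) SST series into (input, target) pairs: u consecutive
    days as input and the following v days as target (stride of one day)."""
    n = sst.shape[0] - u - v + 1
    if n < 1:
        raise ValueError("series shorter than u + v days")
    inputs = np.stack([sst[s:s + u] for s in range(n)])
    targets = np.stack([sst[s + u:s + u + v] for s in range(n)])
    return inputs, targets

# Usage with a synthetic 40-day series on a 2 x 3 grid.
series = np.arange(40 * 2 * 3, dtype=np.float32).reshape(40, 2, 3)
x, y = make_windows(series, u=30, v=7)
print(x.shape, y.shape)  # (4, 30, 2, 3) (4, 7, 2, 3)
```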

During training, the historical data of N days are converted from two-dimensional planar data into three-dimensional data whose dimensions are the number of feature channels and the width and height of the feature map. The historical input data are then fed in sequence into the multi-layer CSA-ConvLSTM network for training, and the overall network structure is shown in FIG. 2. When the input reaches day T_n, the network outputs T_{n+1}. T_{n+1} is then fed back into the network as new input to infer the data for day T_{n+2}, and so on until the SST data for day T_{n+t} are obtained. To accelerate convergence and guide parameter optimization in the initial stage of training, teacher forcing is used to assist early training. The SST data for the next t days obtained by network inference are compared with the real data to compute the training loss. For this training loss, the two-stage adaptive RMSE loss function proposed in this embodiment is used.
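The recursive inference described above can be sketched as follows. Each predicted day is fed back as the next input, and during early training it may be replaced by the true day. The one-day model interface `step(frame, state) -> (next_frame, state)` is a hypothetical stand-in for the multi-layer CSA-ConvLSTM, and the per-step random replacement is an assumed reading of the teacher forcing rule.

```python
import numpy as np

def rollout(step, history, horizon, truth=None, tf_ratio=0.0, rng=None):
    """Recursive multi-day forecast.

    step(frame, state) -> (next_frame, state) is a hypothetical one-day model.
    The history days T_1..T_u warm up the state; each prediction is then fed
    back as the next input. If `truth` (the true future days) is given, the
    true previous day replaces the prediction with probability tf_ratio.
    """
    if rng is None:
        rng = np.random.default_rng()
    state, pred = None, None
    for frame in history:                 # consume T_1 .. T_u
        pred, state = step(frame, state)  # after the loop, pred is T_{u+1}
    outputs = [pred]
    for k in range(horizon - 1):          # produce T_{u+2} .. T_{u+horizon}
        use_truth = truth is not None and rng.random() < tf_ratio
        pred, state = step(truth[k] if use_truth else pred, state)
        outputs.append(pred)
    return np.stack(outputs)

# Usage with a toy model that predicts "yesterday + 1 degree".
toy = lambda frame, state: (frame + 1.0, state)
hist = [np.full((2, 2), float(d)) for d in range(3)]
print(rollout(toy, hist, horizon=3)[:, 0, 0])  # [3. 4. 5.]
```

With `tf_ratio=1.0` and true days 10, 20 and 30, the same toy model returns 3, 11 and 21. Each step then starts from the true previous day instead of its own forecast.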

The conventional ConvLSTM combines a CNN with an LSTM to extract spatial information from time series data. For precipitation, air temperature and similar variables, it achieves better prediction results than LSTM. The formula of the ConvLSTM cell is as follows:

c_t denotes the cell state of the LSTM network at time t; tanh denotes the tanh activation function; W_xc denotes the cell-state convolution kernel weight applied to the input x_t at time t; W_hc denotes the cell-state convolution kernel weight applied to the hidden state h_{t-1} at time t-1; and b_c denotes the cell-state bias term;

h_t denotes the hidden state at time t.
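A minimal NumPy sketch of one ConvLSTM cell step follows, assuming the standard peephole formulation whose symbols are defined in this document. The 'same' padding, the parameter dictionary layout and the random initialisation are assumptions made for illustration, and the code is not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv2d_same(x, w):
    """'Same'-padded 2-D convolution (cross-correlation, as in deep-learning
    frameworks). x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k."""
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    H, W = x.shape[1:]
    out = np.zeros((c_out, H, W))
    for i in range(k):
        for j in range(k):
            # contribution of kernel tap (i, j), summed over input channels
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + H, j:j + W])
    return out

def convlstm_step(x, h, c, p):
    """One peephole ConvLSTM step. p holds W_x* / W_h* conv kernels,
    element-wise peephole weights W_ci, W_cf, W_co of shape (C_hid, H, W)
    and biases b_* of shape (C_hid, 1, 1)."""
    i = sigmoid(conv2d_same(x, p["W_xi"]) + conv2d_same(h, p["W_hi"]) + p["W_ci"] * c + p["b_i"])
    f = sigmoid(conv2d_same(x, p["W_xf"]) + conv2d_same(h, p["W_hf"]) + p["W_cf"] * c + p["b_f"])
    c_new = f * c + i * np.tanh(conv2d_same(x, p["W_xc"]) + conv2d_same(h, p["W_hc"]) + p["b_c"])
    o = sigmoid(conv2d_same(x, p["W_xo"]) + conv2d_same(h, p["W_ho"]) + p["W_co"] * c_new + p["b_o"])
    return o * np.tanh(c_new), c_new     # (h_t, c_t)

def init_params(c_in, c_hid, H, W, k=3, seed=0):
    """Hypothetical random initialisation, for illustration only."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in "ifco":
        p[f"W_x{g}"] = rng.normal(0.0, 0.1, (c_hid, c_in, k, k))
        p[f"W_h{g}"] = rng.normal(0.0, 0.1, (c_hid, c_hid, k, k))
        p[f"b_{g}"] = np.zeros((c_hid, 1, 1))
    for g in "ifo":
        p[f"W_c{g}"] = rng.normal(0.0, 0.1, (c_hid, H, W))
    return p

# Usage: one SST channel on an 8 x 10 grid, 4 hidden channels.
x = np.random.default_rng(1).normal(size=(1, 8, 10))
p = init_params(c_in=1, c_hid=4, H=8, W=10)
h = c = np.zeros((4, 8, 10))
h, c = convlstm_step(x, h, c, p)
print(h.shape)  # (4, 8, 10)
```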

However, SST prediction involves strong spatial correlation, and the original ConvLSTM can hardly meet the required prediction accuracy. To mine the spatial information of sea surface temperature in depth, the sea temperature at a given point should be considered jointly with the temperatures at the surrounding points. To this end, this embodiment proposes a CSA self-attention module that is better suited to sea surface temperature prediction.

The structure of the single-layer CSA-ConvLSTM model is shown in FIG. 3, and the model is expressed as follows:

In the ConvLSTM computation, the two-dimensional SST spatial distribution is first converted into a 3-dimensional feature map with 1 channel, and after convolution this map grows to 64 or more channels. Extracting spatial features in depth in this way also introduces channel redundancy.

To make the model focus on the important feature channels and achieve adaptive feature optimization, this embodiment adds a channel attention sub-module and a spatial attention sub-module to the conventional ConvLSTM model.

FIG. 4 shows the structure of the channel attention sub-module. The input F of the channel attention sub-module is the ConvLSTM output H [c×H×W]. Global Max Pooling and Global Average Pooling are applied to the input feature map h over the feature-map size, giving h_m [c×1×1] and h_a [c×1×1]. These two outputs are added element-wise to obtain the aggregated feature h_c [c×1×1]. To strengthen the correlation between channels, an additional one-dimensional convolution is used to calculate the channel weights, and the channel correlation over different receptive fields is obtained by adjusting the convolution kernel size. Sigmoid activation then generates the final channel weight feature map h_c′. The channel weight feature map and the input feature F are multiplied element-wise to generate the input feature F′ required by the spatial attention module. The channel attention calculation operation can be expressed by the following formula:

$$\mathrm{CA} = \sigma\left(f^{1\times1}\left(\mathrm{Avgpool}(h) + \mathrm{Maxpool}(h)\right)\right)$$

wherein σ denotes the Sigmoid activation function, f^{1×1} denotes a 1×1 convolution, Avgpool denotes the average pooling operation, and Maxpool denotes the max pooling operation.
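The following NumPy sketch implements the channel attention operation CA as the formula writes it. It treats the 1×1 convolution on the pooled c×1×1 vector as a c×c matrix product. The one-dimensional convolution mentioned in the prose would correspond to a banded version of that matrix. The weight matrix `W1x1` is a hypothetical input, not a trained parameter.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(F, W1x1):
    """CA = sigmoid(f1x1(Avgpool(F) + Maxpool(F))); returns (weights, F').

    F: (C, H, W) ConvLSTM output. W1x1: (C, C) hypothetical 1x1-conv weights;
    a 1x1 convolution on a C x 1 x 1 tensor is a C x C matrix product."""
    h_a = F.mean(axis=(1, 2))                  # global average pooling -> (C,)
    h_m = F.max(axis=(1, 2))                   # global max pooling     -> (C,)
    h_c = h_a + h_m                            # element-wise aggregation
    weights = sigmoid(W1x1 @ h_c)              # channel weights in (0, 1)
    return weights, F * weights[:, None, None] # F': each channel rescaled

# Usage: 64 feature channels on a 16 x 16 grid, random weights.
rng = np.random.default_rng(0)
F = rng.normal(size=(64, 16, 16))
weights, F_prime = channel_attention(F, rng.normal(0.0, 0.1, (64, 64)))
print(weights.shape, F_prime.shape)  # (64,) (64, 16, 16)
```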

FIG. 5 is a schematic diagram of the spatial attention sub-module. Its input F′ is the output of the channel attention sub-module, with size H′ [c×H×W]. First, channel-wise Global Max Pooling and Global Average Pooling are applied to h to obtain the features h_m′ [1×H×W] and h_a′ [1×H×W]. These are then concatenated along the channel dimension to obtain the aggregated feature h_s′ [2×H×W]. To give the spatial attention module different receptive fields for extracting sufficient spatial features, three convolution kernels of different sizes are applied to h_s′ separately. The convolved features are added, and the spatial feature layer h_s [1×H×W] is obtained through the Sigmoid activation function. Finally, the input feature F′ and h_s are multiplied element-wise to generate the final feature F″. The spatial attention calculation operation is expressed as follows:

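$$\mathrm{SA} = \sigma\left(\left(f^{3\times3} + f^{5\times5} + f^{7\times7}\right) * \left[\mathrm{Avgpool}(h); \mathrm{Maxpool}(h)\right]\right)$$

wherein σ denotes the Sigmoid activation function; f^{3×3}, f^{5×5} and f^{7×7} denote 3×3, 5×5 and 7×7 convolution operations, respectively; [·;·] denotes concatenation along the channel dimension; Avgpool denotes the average pooling operation; and Maxpool denotes the max pooling operation.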
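A matching NumPy sketch of the spatial attention operation SA follows. The pooling is channel-wise, the two maps are concatenated into 2×H×W, three convolutions with 3×3, 5×5 and 7×7 kernels are summed, and a Sigmoid is applied. The kernels here are hypothetical random weights, and 'same' padding is assumed so that the map stays H×W.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv2d_same(x, w):
    """'Same'-padded 2-D cross-correlation. x: (C_in, H, W), w: (C_out, C_in, k, k)."""
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    H, W = x.shape[1:]
    out = np.zeros((c_out, H, W))
    for i in range(k):
        for j in range(k):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + H, j:j + W])
    return out

def spatial_attention(Fp, kernels):
    """SA = sigmoid((f3x3 + f5x5 + f7x7) * [Avgpool(F'); Maxpool(F')]).
    Fp: (C, H, W) output of the channel attention; kernels: hypothetical
    weights of shapes (1, 2, 3, 3), (1, 2, 5, 5) and (1, 2, 7, 7)."""
    h_a = Fp.mean(axis=0, keepdims=True)                 # channel-wise average -> (1, H, W)
    h_m = Fp.max(axis=0, keepdims=True)                  # channel-wise max     -> (1, H, W)
    h_s = np.concatenate([h_a, h_m], axis=0)             # concatenation        -> (2, H, W)
    summed = sum(conv2d_same(h_s, w) for w in kernels)   # three receptive fields
    weights = sigmoid(summed)                            # (1, H, W) spatial map
    return weights, Fp * weights                         # F'' = map applied to every channel

# Usage: 64 channels on a 16 x 16 grid.
rng = np.random.default_rng(0)
Fp = rng.normal(size=(64, 16, 16))
kernels = [rng.normal(0.0, 0.1, (1, 2, k, k)) for k in (3, 5, 7)]
weights, F_out = spatial_attention(Fp, kernels)
print(weights.shape, F_out.shape)  # (1, 16, 16) (64, 16, 16)
```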

The formula of the single layer CSA ConvLSTM network model is therefore shown below.

In time series prediction, MSE or RMSE is generally used to compute the loss uniformly over one prediction period. This approach ignores how the loss changes as the prediction horizon grows: the longer the horizon, the harder the prediction. Experimental observation showed that, over a prediction period of 1 to 7 days, the prediction errors for days 5, 6 and 7 are significantly larger than those for the earlier days, which confirms the above. Based on this characteristic, and to further improve the overall prediction accuracy of the model, the invention provides a two-stage adaptive loss function training method, consisting of a pre-training stage and an adaptive-loss adjustment training stage.

As shown in FIG. 6, the two-stage adaptive loss function training method splits the training process into two parts according to the loss function used. In the first stage, RMSE is used as the loss function to obtain the initial training weights. In the second stage, the loss weights for retraining are assigned according to the predictions made with the initial training weights, and adjustment training is carried out. The loss function of the overall training process can be expressed as follows.

where Ytrue denotes the true value of the predicted variable, Ypre denotes the predicted result, and δ denotes the weight assigned for retraining. Based on extensive experiments, the prediction loss of the pre-trained model is chosen as the basis for computing δ.

where RMSE′(i) is the RMSE of the pre-trained model's prediction on day i. In the second training stage, the loss function weights are reassigned according to the prediction performance obtained in pre-training. The prediction days with larger loss under the pre-trained model receive a larger share of the loss, which improves the training effect and yields higher prediction accuracy.
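The exact expressions for δ and for the final loss are assumed in the sketch below, which gives one plausible reading. Here δ_i is proportional to the pre-trained model's RMSE′(i) and normalised to average 1. The stage-two loss is the square root of the δ-weighted mean of the per-day squared errors, which reduces to the stage-one RMSE when every δ_i = 1. All function names are hypothetical.

```python
import numpy as np

def per_day_rmse(y_true, y_pred):
    """RMSE for each lead day; arrays shaped (samples, v, H, W)."""
    return np.sqrt(((y_true - y_pred) ** 2).mean(axis=(0, 2, 3)))

def adaptive_weights(rmse_pre):
    """Assumed form of delta: proportional to RMSE'(i), normalised to mean 1."""
    rmse_pre = np.asarray(rmse_pre, dtype=float)
    return rmse_pre * len(rmse_pre) / rmse_pre.sum()

def weighted_rmse_loss(y_true, y_pred, delta):
    """Assumed stage-two loss: sqrt(mean_i(delta_i * MSE_i)).
    With every delta_i = 1 this equals the plain stage-one RMSE."""
    mse_per_day = ((y_true - y_pred) ** 2).mean(axis=(0, 2, 3))
    return float(np.sqrt(np.mean(delta * mse_per_day)))

# Usage with synthetic data whose error grows with lead time.
rng = np.random.default_rng(0)
y_true = rng.normal(size=(8, 7, 4, 4))
scale = np.linspace(0.1, 0.7, 7)[None, :, None, None]
y_stage1 = y_true + scale * rng.normal(size=y_true.shape)   # stand-in for pre-trained output
delta = adaptive_weights(per_day_rmse(y_true, y_stage1))
print(delta.round(2))  # later lead days get larger weights
```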

The prediction performance of the algorithm is evaluated with MSE, RMSE, MAE and MAPE. The four indices are expressed by the following formulas, where Y_True and Y_Pre denote the SST true value and the predicted result, respectively.
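A minimal sketch of the four indices follows, assuming their usual definitions. MSE is the mean squared error and RMSE is its square root. MAE is the mean absolute error, and MAPE is the mean of |error / true value| expressed as a percentage. The function name `evaluate` is hypothetical.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """MSE, RMSE, MAE and MAPE (in %) over all grid cells and days.
    MAPE assumes no true value is zero; SST in degrees Celsius can be
    close to zero in polar seas, where MAPE becomes unstable."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(np.mean(np.abs(err / y_true)) * 100.0),
    }

# Usage: two grid values, each off by 1 degree.
print(evaluate([20.0, 25.0], [21.0, 24.0]))
```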

In this specification, the embodiments are described in a progressive manner, and each focuses on its differences from the others. For identical or similar parts, the embodiments may be cross-referenced. The device disclosed in the embodiment corresponds to the method disclosed in the embodiment, so its description is relatively brief; for relevant details, refer to the description of the method.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A short-term sea surface temperature prediction method based on a CSA-ConvLSTM model is characterized by comprising the following steps:

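acquiring a sea surface temperature SST historical time series data set;

inputting the historical time series data set into a multi-layer CSA-ConvLSTM network model for training;

and inputting the SST data to be used for prediction into the trained multi-layer CSA-ConvLSTM network model to obtain a sea surface temperature prediction result.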

2. The short-term sea surface temperature prediction method based on a CSA-ConvLSTM model according to claim 1, wherein before the step of inputting the historical time series data set into the multi-layer CSA-ConvLSTM network model for training, the method further comprises the step of establishing the multi-layer CSA-ConvLSTM network model.

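3. The short-term sea surface temperature prediction method based on a CSA-ConvLSTM model according to claim 2, wherein in the step of establishing a multi-layer CSA-ConvLSTM network model, the multi-layer CSA-ConvLSTM network model specifically comprises:

a plurality of interconnected single-layer CSA-ConvLSTM networks, wherein the output of each CSA-ConvLSTM layer is used as the input of the next layer;

the single-layer CSA-ConvLSTM network comprises a ConvLSTM module and a CSA module connected to it, wherein the CSA module comprises a channel attention sub-module and a spatial attention sub-module, and the output of the channel attention sub-module is used as the input of the spatial attention sub-module.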

4. A short-term sea surface temperature prediction method based on a CSA-ConvLSTM model according to claim 3, wherein the single-layer CSA-ConvLSTM network is represented by the following expression:

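wherein i_t denotes the memory gate (input gate) of the LSTM network at time t; σ denotes the Sigmoid activation function; W_xi denotes the memory-gate convolution kernel weight applied to the input x_t at time t; W_hi denotes the memory-gate convolution kernel weight applied to the hidden state h_{t-1} at time t-1; W_ci denotes the memory-gate transformation weight applied to the cell state c_{t-1} at time t-1; ∘ denotes element-wise multiplication of corresponding matrix elements; and b_i denotes the memory-gate bias term;

f_t denotes the forget gate of the LSTM network at time t; W_xf denotes the forget-gate convolution kernel weight applied to the input x_t at time t; W_hf denotes the forget-gate convolution kernel weight applied to the hidden state h_{t-1} at time t-1; W_cf denotes the forget-gate transformation weight applied to the cell state c_{t-1} at time t-1; and b_f denotes the forget-gate bias term;

c_t denotes the cell state of the LSTM network at time t; tanh denotes the tanh activation function; W_xc denotes the cell-state convolution kernel weight applied to the input x_t at time t; W_hc denotes the cell-state convolution kernel weight applied to the hidden state h_{t-1} at time t-1; and b_c denotes the cell-state bias term;

o_t denotes the output gate of the LSTM network at time t; W_xo denotes the output-gate convolution kernel weight applied to the input x_t at time t; W_ho denotes the output-gate convolution kernel weight applied to the hidden state h_{t-1} at time t-1; W_co denotes the output-gate transformation weight applied to the cell state c_t at time t; and b_o denotes the output-gate bias term;

h_t denotes the hidden state at time t, SA denotes the spatial attention calculation operation, and CA denotes the channel attention calculation operation.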

5. The short-term sea surface temperature prediction method based on a CSA-ConvLSTM model according to claim 4, wherein, for the hidden state h_t at time t, the spatial attention calculation operation SA and the channel attention calculation operation CA are given specifically by the following formulas:

$$\mathrm{SA} = \sigma\left(\left(f^{3\times3} + f^{5\times5} + f^{7\times7}\right) * \left[\mathrm{Avgpool}(h); \mathrm{Maxpool}(h)\right]\right)$$

wherein f^{3×3}, f^{5×5} and f^{7×7} denote 3×3, 5×5 and 7×7 convolution operations, respectively, and h denotes the hidden state;

$$\mathrm{CA} = \sigma\left(f^{1\times1}\left(\mathrm{Avgpool}(h) + \mathrm{Maxpool}(h)\right)\right)$$

wherein f^{1×1} denotes a 1×1 convolution operation, Avgpool denotes the average pooling operation, and Maxpool denotes the max pooling operation.

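6. The short-term sea surface temperature prediction method based on the CSA-ConvLSTM model according to claim 1, wherein acquiring the sea surface temperature SST historical time series data set specifically comprises:

dividing the sea surface into a plurality of grid cells according to spatial information, namely longitude and latitude, each grid cell serving as an SST observation point;

constructing an H × W matrix from all grid cells, wherein W denotes the number of grid cells along the longitude and H denotes the number of grid cells along the latitude;

observing the SST data over a plurality of days, and stacking the daily SST matrices into the SST historical time series data set.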

7. The short-term sea surface temperature prediction method based on the CSA-ConvLSTM model according to claim 1, wherein inputting the historical time series data set into the multi-layer CSA-ConvLSTM network model for training comprises using a teacher forcing mechanism to assist training. Specifically, the sea temperature data predicted for the previous day are replaced with the true values, with the replacement proportion decaying by 3% with each training round until it falls gradually from 1 to 0. The sea temperature data after replacement are used as the input for predicting the next day, and this loop repeats until the prediction result for the last day is output.

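8. The short-term sea surface temperature prediction method based on the CSA-ConvLSTM model according to claim 1, wherein the step of inputting the historical time series data set into a multi-layer CSA-ConvLSTM network model for training comprises training the multi-layer CSA-ConvLSTM network model with a two-stage adaptive loss function training method:

in the first stage, using the root mean square error as the loss function of the model to obtain the initial training weights;

in the second stage, predicting with the initial training weights and reassigning the loss function weights according to the prediction results to obtain the retraining weights;

and using the retraining weights as the final loss weights of the multi-layer CSA-ConvLSTM network model.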

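9. The short-term sea surface temperature prediction method based on the CSA-ConvLSTM model according to claim 1, further comprising:

evaluating the prediction result of the CSA-ConvLSTM model using the mean square error MSE, the root mean square error RMSE, the mean absolute error MAE and the mean absolute percentage error MAPE as evaluation indices.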

10. A short-term sea surface temperature prediction system based on a CSA-ConvLSTM model, characterized in that the system comprises a computer program for executing the short-term sea surface temperature prediction method based on a CSA-ConvLSTM model according to any of claims 1-8.