CN111144666A

CN111144666A - Ocean thermocline prediction method based on deep space-time residual error network

Info

Publication number: CN111144666A
Application number: CN202010000236.0A
Authority: CN
Inventors: 姜宇; 何丽莉; 白洪涛; 陈庆忠; 刘志涛; 李舞桂; 詹锦其; 欧阳丹彤
Original assignee: Jilin University
Current assignee: Jilin University
Priority date: 2020-01-02
Filing date: 2020-01-02
Publication date: 2020-05-12
Anticipated expiration: 2040-01-02
Also published as: CN111144666B

Abstract

The invention discloses an ocean thermocline prediction method based on a deep space-time residual network (ST-ResNet). Based on the particular properties of spatio-temporal data, the method uses a residual neural network framework to model the proximity, periodicity and trend of ocean thermocline evolution. A branch of residual convolution units is designed for each of these properties, and each unit models the spatial properties of ocean temperature. The outputs of the three residual networks are aggregated dynamically according to the data, with different weights assigned to different branches, to predict the final thermocline state of a specified sea area. Experiments on the sea area at 95°W-115°W, 9.5°N-9.5°S show that, compared with the traditional SVM method, under the same operating environment and for different data sizes in longitude, latitude and depth, the proposed ST-ResNet reduces the loss value by factors of 68, 11 and 9, and improves the running time by factors of 3, 35 and 53, respectively.

Description

Ocean thermocline prediction method based on deep space-time residual error network

Technical Field

The invention relates to the field of ocean temperature prediction, in particular to an ocean thermocline prediction method based on a deep space-time residual error network.

Background

In recent years, with the deepening of people's understanding of the sea, activities related to the sea, such as marine transportation, marine military activities, marine economy, marine environmental protection and development, are more and more abundant. Meanwhile, in order to match the smooth progress of the activities, more requirements are put forward on the development of marine science. Among them, the problem of how to predict the thermocline range of a specific sea area quickly and accurately is becoming more and more important.

The ocean thermocline is one of the key topics of physical oceanography and has many definitions. Hosoda S. et al. propose the following: a "thermocline" is a thin but distinct layer in a body of fluid in which temperature changes more rapidly with depth than it does in the layers above or below. In the ocean, the thermocline separates the upper mixed layer from the calm deep water; that is, it is the water layer in which the vertical gradient of seawater temperature is large.
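The gradient-based definition above can be illustrated with a minimal sketch. It assumes a single vertical temperature profile and a hypothetical threshold of 0.05 °C per metre; the function name, the threshold and the synthetic profile are illustrative choices and are not taken from the patent.

```python
import numpy as np

def thermocline_mask(depth_m, temp_c, min_gradient=0.05):
    """Flag depth levels whose |dT/dz| is at least min_gradient (deg C per metre)."""
    depth_m = np.asarray(depth_m, dtype=float)
    temp_c = np.asarray(temp_c, dtype=float)
    # Vertical temperature gradient; np.gradient accepts the depth coordinates,
    # so the grid does not have to be uniform.
    gradient = np.gradient(temp_c, depth_m)
    return np.abs(gradient) >= min_gradient

# Synthetic profile: warm mixed layer, sharp drop between 50 m and 100 m,
# then cold, nearly uniform deep water.
depth = np.arange(0.0, 305.0, 5.0)
temp = np.where(depth < 50, 28.0,
                np.where(depth < 100, 28.0 - 0.2 * (depth - 50), 18.0))
mask = thermocline_mask(depth, temp)
```

Levels inside the sharp temperature drop are flagged, while the mixed layer and the deep water are not.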

The thermocline has an important influence on human activities, including fishery production, weather forecasting and anti-submarine warfare. In fishery production, the thermocline limits the amount of nutrients brought to shallow depths by upwelling and strongly affects the fish catch of that year, so the depth at which nets are set in marine fishing must take the characteristics of the thermocline into account. In weather forecasting, hurricane forecasters need to consider not only the sea surface temperature but also the depth of the warm water above the thermocline. Water vapor evaporated from the ocean is the primary fuel of a hurricane, and the depth of the thermocline is a measure of the size of its "fuel tank", which helps predict the risk of hurricane formation. In anti-submarine warfare, the negative sound velocity gradient characteristic of the ocean thermocline can reflect active sonar and other acoustic signals, so the thermocline plays an important role in submarine warfare.

Predicting the evolution of the thermocline within a given ocean region is very challenging and is affected by two factors. (1) Spatial correlation: the prediction region is affected by nearby regions as well as distant ones. Similarly, a region r₂ may in turn affect other regions, both nearby and distant, and the prediction region may also affect itself. (2) Temporal correlation: the thermocline in a given sea area is affected by both the recent and the distant past. For example, the state of the thermocline on a given day affects the next day; the evolution of the thermocline may be similar over several consecutive weeks, repeating from week to week; and over a longer time scale of years the evolution may exhibit a certain trend. Because of these temporal and spatial properties, other deep neural network architectures such as recurrent neural networks (RNNs) and long short-term memory networks (LSTMs) are not suitable here. If a conventional RNN or LSTM is used directly and periodicity and trend are to be captured in the data features, the input must span a long time scale: with only the last two hours of data as input, periodicity is hard to capture and trend cannot be captured at all, whereas with the last three months of data as input the recurrent model becomes large and complex, is hard to train, and performs poorly. In addition, capturing the correlation between a distant region and a given region requires a relatively deep network, and once the network is deep, training becomes very complex and difficult.

Disclosure of Invention

The invention designs and develops an ocean thermocline prediction method based on a deep space-time residual network. One object is to model the spatial correlation between any two regions, whether adjacent or far apart, using a convolution-based residual network, and to use the residual structure to overcome the model degradation that arises from the SGD optimization difficulty caused by vanishing gradients in deep networks.

A further object of the invention is to model the temporal properties of ocean thermocline evolution, classifying them into three categories (proximity, periodicity and trend), so that the thermocline can be predicted better according to the different influence of each property.

The technical scheme provided by the invention is as follows:

An ocean thermocline prediction method based on a deep space-time residual network comprises the following steps:

Step one: acquire ocean temperature data for a target sea area over a designated time span, and preprocess the data;

Step two: divide the preprocessed ocean temperature data along the time axis into three time segments (the most recent time, a nearer time and a farther time), and use the three segments as input data of the prediction model;

Step three: in the deep space-time residual network, divide the data into a proximity attribute, a periodicity attribute and a trend attribute according to the determined time intervals, and feed the input data into the proximity, periodicity and trend attributes respectively;

Step four: determine the deep residual networks, which simultaneously capture the spatial dependence between nearby and distant sea areas; determine the weight of each residual network's output according to the assigned parameter matrices; and fuse the weighted outputs;

Step five: pass the fused output through an activation function to predict the ocean thermocline.

Preferably, in the third step, the proximity attribute, the periodicity attribute, and the trend attribute share a network structure having the same convolutional neural network and residual unit sequence, which includes: pre-convolution and residual unit.

Preferably, the parameters of the proximity attribute include: 4 time points, a time interval of 1 month, 7 residual units, a convolution kernel size of 3 × 3, a stride of 1, 1, 1, and 32 convolution kernels per layer;

the parameters of the periodicity attribute include: 3 time points, a time interval of 2 months, 6 residual units, a convolution kernel size of 3 × 3, a stride of 1, 1, 1, and 32 convolution kernels per layer; and

the parameters of the trend attribute include: 3 time points, a time interval of 4 months, 6 residual units, a convolution kernel size of 3 × 3, a stride of 1, 1, 1, and 32 convolution kernels per layer.
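The three parameter sets above can be collected in a small configuration object. This is a minimal sketch: the class and field names are hypothetical, the third parameter set is read as belonging to the trend attribute, and the kernel size is stored exactly as stated in the text (3 × 3).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BranchConfig:
    time_points: int        # number of key frames fed to the branch
    interval_months: int    # spacing between consecutive key frames
    residual_units: int
    kernel_size: tuple      # as written in the text
    stride: tuple
    kernels_per_layer: int

BRANCHES = {
    "proximity": BranchConfig(4, 1, 7, (3, 3), (1, 1, 1), 32),
    "periodicity": BranchConfig(3, 2, 6, (3, 3), (1, 1, 1), 32),
    "trend": BranchConfig(3, 4, 6, (3, 3), (1, 1, 1), 32),
}

def lookback_months(cfg):
    """Months of history a branch reaches back: time points times interval."""
    return cfg.time_points * cfg.interval_months

# Longest history any branch needs before a prediction can be made.
max_lookback = max(lookback_months(cfg) for cfg in BRANCHES.values())
```

With these values the trend branch reaches furthest back, which bounds the earliest month for which all three inputs are available.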

Preferably, in the first step, the ocean temperature data is preprocessed to obtain a data format of 32 × 26 × 20 × 20.

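Preferably, in the third step, in the proximity attribute, L residual units are stacked on the pre-convolution layer to obtain

$X_c^{(l+1)} = X_c^{(l)} + F(X_c^{(l)}; \theta^{(l)}), \quad l = 1, \ldots, L$

A convolution layer is added on top of the L-th residual unit, and the output of the proximity component is $X_c^{(L+2)}$.

In the periodicity attribute, L residual units are stacked on the pre-convolution layer to obtain

$X_p^{(l+1)} = X_p^{(l)} + F(X_p^{(l)}; \theta^{(l)}), \quad l = 1, \ldots, L$

In the trend attribute, L residual units are stacked on the pre-convolution layer to obtain

$X_q^{(l+1)} = X_q^{(l)} + F(X_q^{(l)}; \theta^{(l)}), \quad l = 1, \ldots, L$

where F is the residual function and $\theta^{(l)}$ denotes all learnable parameters of the l-th residual unit.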

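Preferably, in the fourth step, determining the weight of each residual network's output and fusing the weighted outputs comprises:

$X_{Res} = W_c \circ X_c^{(L+2)} + W_p \circ X_p^{(L+2)} + W_q \circ X_q^{(L+2)}$

where $X_c^{(L+2)}$, $X_p^{(L+2)}$ and $X_q^{(L+2)}$ are the outputs of the proximity, periodicity and trend components, $\circ$ is the Hadamard product, $W_c$ is the degree of influence of proximity, $W_p$ is the degree of influence of periodicity, and $W_q$ is the degree of influence of trend.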

Preferably, in the step one, the target region is 95°W to 115°W, 9.5°N to 9.5°S, with a resolution of 1°; and

the depth below the sea surface is 0-300 m, with a resolution of 5 m.

Preferably, in the step five, the activation function is a Tanh function.

Compared with the prior art, the invention has the following beneficial effects:

1. Most current research on the ocean thermocline addresses theoretical methods for calculating the thermocline and the study of its characteristic patterns. The invention instead proposes to predict the evolution of the ocean thermocline by predicting whether a given point in a given ocean region lies within the thermocline;

2. The invention adopts a deep space-time residual network. Traditional methods rely on statistical techniques to collect, sample, analyze and compute over large amounts of data. The invention instead builds residual neural networks on the TensorFlow framework, based on the properties of spatio-temporal data, to model the temporal properties of thermocline evolution (proximity, periodicity and trend); it dynamically fuses the outputs of the residual network branches, assigning different weights to different branches to further improve prediction, and uses convolutional neural networks to capture spatial correlation. Compared with the traditional SVM method, the proposed method is markedly better in both accuracy and efficiency.

Drawings

FIG. 1 is a diagram of the ST-ResNet model architecture according to the present invention.

Fig. 2 is a schematic diagram of a residual error unit structure according to the present invention.

FIG. 3 is a schematic diagram showing the comparison of the ST-ResNet and the SVM prediction result evaluation index MSE according to the present invention.

Fig. 4 is a schematic diagram of the actual result (square) and the predicted result (circle) in 2017 from month 1 to month 12.

Detailed Description

The present invention is further described in detail below with reference to the attached drawings so that those skilled in the art can implement the invention by referring to the description text.

As shown in FIG. 1, the invention provides an ocean thermocline prediction method based on a deep space-time residual network. To ensure training quality and improve accuracy, the method introduces deep residual networks, which address the problem that conventional networks become difficult to train as depth increases because of vanishing gradients and degrading accuracy. The invention extracts only certain key frames, for example the same time yesterday and the same time the day before; data from other times need not be input. In this way a few dozen key frames suffice to represent the periodicity and trend contained in several months of data, which greatly simplifies the network structure and greatly improves training quality and results. A deep convolutional neural network divides a region into grid cells and then applies a convolution over neighbouring cells to combine them into a single value. Moreover, the degree to which the thermocline is influenced by proximity, periodicity and trend varies with the ocean region selected. The method therefore dynamically aggregates the outputs of the three residual neural networks, assigning different weights to different branches, to predict the final thermocline state of a specified sea area.

To address the spatial and temporal correlation problems described above, the invention adopts the following solutions:

(1) Modeling the spatial correlation between any two regions in the ocean: the space-time residual network uses a convolution-based residual network to model the spatial correlation between any two regions, whether adjacent or distant; at the same time, the residual structure effectively overcomes the model degradation that arises from the SGD optimization difficulty caused by vanishing gradients in deep networks.

(2) Modeling the temporal properties of ocean thermocline evolution: the temporal properties are grouped into three categories, namely proximity, periodicity and trend. Proximity means that the evolution at adjacent times is closely related; periodicity means that the evolution follows a certain cyclic pattern within a time scale; trend means that the evolution tends to increase or decrease overall over a longer time scale. The space-time residual network models these three properties with three residual networks, respectively.

The method specifically comprises the following steps:

In the deep space-time residual network, the data are divided into a proximity attribute, a periodicity attribute and a trend attribute according to the determined time intervals, and the input data are fed into the proximity, periodicity and trend attributes respectively;

In this example, data from BOA_Argo are selected for training. BOA_Argo contains temperature and salinity data for ocean depths of 0-3000 m over recent years. The invention uses the temperature data for 2004 to 2017, with 12 data sets per year representing the ocean temperature of each of the 12 months. The selected region spans 95°W-115°W in longitude (resolution 1°), 9.5°N-9.5°S in latitude (resolution 1°) and 0-300 m below the sea surface (resolution 5 m), and exhibits pronounced thermocline characteristics. Accordingly, a single time point has a size of 26 × 20 × 20, and there are 168 time points in total. Preferably, of the first 156 time points, the first 9/10 are used as the training set and the last 1/10 as the validation set, and the last 12 time points are used as the test set.
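The split described above can be written out as index arithmetic. This is a sketch under one assumption that the text does not settle: 9/10 of 156 is not an integer, so the training count is rounded down here.

```python
import numpy as np

n_years, months_per_year = 14, 12            # 2004 through 2017
n_time_points = n_years * months_per_year    # 168 monthly snapshots

indices = np.arange(n_time_points)
test_idx = indices[-12:]                     # last 12 months (2017)
dev_idx = indices[:-12]                      # first 156 time points

# Assumption: round 156 * 9/10 = 140.4 down to 140 training points.
n_train = int(len(dev_idx) * 9 / 10)
train_idx, val_idx = dev_idx[:n_train], dev_idx[n_train:]
```

Under that rounding choice the split is 140 training, 16 validation and 12 test time points, kept in chronological order.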

The deep learning model is a deep space-time residual error network which can carry out deep learning on data streams with time attributes, the network divides the data into different segments according to different time intervals, the data are called as attributes, each attribute respectively passes through a pre-convolution unit and a residual error unit, then the attributes are combined in weight distribution, and finally a prediction result is obtained through activation of an activation function.

FIG. 1 shows the deep learning model, i.e., the architecture of the space-time residual network, which consists of three main attributes corresponding to proximity, periodicity and trend. First, the input data of each time point are processed by a convolution layer into the data format required by the residual units, namely 32 × 26 × 20 × 20. Next, the time axis is divided into three segments representing the most recent time, a nearer time and a farther time. The data of the intervals in each time segment are then fed into the three main attributes to model the three temporal properties: proximity, periodicity and trend. The three attributes share the same network structure, consisting of a convolutional neural network followed by a sequence of residual units; this structure captures the spatial dependence between nearby and distant sea areas. The outputs of the three components are fused according to the assigned parameter matrices, so that the residual result of each attribute receives a different weight. Finally, the aggregate is mapped to [-1, 1] by the tanh function, which can converge faster than the standard logistic function during backpropagation learning.

When predicting the ocean thermocline, the method sets up 3 attributes in the deep space-time residual network, corresponding to proximity, periodicity and trend; their parameters are shown in Table 1:

TABLE 1 parameters corresponding to three attributes

These three attributes share a network structure with the same convolutional neural network and residual unit sequence, i.e. it consists of two sub-parts: pre-convolution and residual unit.

The pre-convolution processes the input data through a convolution layer into the data format required by the residual units; given the size of a single time point and the number of convolution kernels in each residual network layer stated above, the data format is 32 × 26 × 20 × 20.

The target sea area is generally large. Intuitively, nearby sea areas may influence one another, which a convolutional neural network can handle effectively because it has a strong capacity to capture spatial structure hierarchically. Two spatial points that are far apart may also depend on each other. To capture the spatial dependence of any region, the invention therefore requires a multi-layer convolutional neural network, because a single convolution, limited by its kernel size, can only account for spatially nearby dependencies. Unlike classical convolutional neural networks, the invention uses no subsampling, only convolution.

The present invention finds that nodes in the high level feature map depend on nodes of the medium level feature map, which depend on all nodes in the lower level feature map (i.e. the input), which means that one convolution naturally captures dependencies near the space, and a stack of convolutions can even further capture dependencies within a specified sea area.

The proximity component of FIG. 1 uses the data frames of several recent time intervals to model temporal proximity dependence. Let the most recent segment be

$[X_{t-l_c}, X_{t-(l_c-1)}, \ldots, X_{t-1}]$

which is also called the proximity-dependent sequence. The invention first concatenates these frames along the time axis into a single tensor $X_c^{(0)}$, which is followed by a convolution (i.e., the pre-convolution shown in FIG. 1) as follows:

$X_c^{(1)} = f(W_c^{(1)} * X_c^{(0)} + b_c^{(1)})$

where $*$ denotes convolution; f is an activation function, preferably the tanh function in the present embodiment; and $W_c^{(1)}$ and $b_c^{(1)}$ are the learnable parameters of the proximity sequence.
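The pre-convolution step (stack the key frames, convolve, apply tanh) can be sketched in NumPy as follows. Several details are assumptions rather than statements of the patent: the frames are stacked along a channel axis, the kernel is taken as 3 × 3 × 3 with zero padding so that the 26 × 20 × 20 grid is preserved, and the weights are random placeholders.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv3d_same(x, weight, bias):
    """Stride-1, zero-padded 3D convolution (cross-correlation, as in most DL libraries).

    x: (C_in, D, H, W); weight: (C_out, C_in, k, k, k) with odd k; bias: (C_out,).
    Returns an array of shape (C_out, D, H, W).
    """
    k = weight.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad), (pad, pad)))
    # windows has shape (C_in, D, H, W, k, k, k): one k*k*k patch per grid cell.
    windows = sliding_window_view(xp, (k, k, k), axis=(1, 2, 3))
    out = np.einsum("cdhwijk,ocijk->odhw", windows, weight)
    return out + bias[:, None, None, None]

rng = np.random.default_rng(0)
frames = [rng.standard_normal((26, 20, 20)) for _ in range(4)]  # 4 key frames
x0 = np.stack(frames, axis=0)                   # stacked input, (4, 26, 20, 20)
w1 = rng.standard_normal((32, 4, 3, 3, 3)) * 0.05   # placeholder weights
b1 = np.zeros(32)
x1 = np.tanh(conv3d_same(x0, w1, b1))           # (32, 26, 20, 20)
```

The output has the 32 × 26 × 20 × 20 format that the residual units expect, and the tanh keeps every value within [-1, 1].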

As shown in FIG. 2, the residual unit (see Table 1 for its parameters) convolves the input, applies a ReLU activation, convolves again, and finally adds the input to produce the output.
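A residual unit in the order just described (convolution, ReLU, convolution, add the input) can be sketched as below. The helper names, the bias-free 3 × 3 × 3 convolution and the small tensor sizes are illustrative assumptions; the sketch is written in NumPy rather than the TensorFlow framework mentioned in the description.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv3d_same(x, weight):
    """Stride-1, zero-padded 3D convolution; x: (C, D, H, W), weight: (C_out, C, 3, 3, 3)."""
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1), (1, 1)))
    windows = sliding_window_view(xp, (3, 3, 3), axis=(1, 2, 3))
    return np.einsum("cdhwijk,ocijk->odhw", windows, weight)

def residual_unit(x, w_a, w_b):
    """conv -> ReLU -> conv, then add the input back (shortcut connection)."""
    h = conv3d_same(x, w_a)
    h = np.maximum(h, 0.0)          # ReLU
    h = conv3d_same(h, w_b)
    return x + h

def stack_units(x, params):
    """Apply residual units in sequence; params is a list of (w_a, w_b) pairs."""
    for w_a, w_b in params:
        x = residual_unit(x, w_a, w_b)
    return x

rng = np.random.default_rng(0)
channels, num_units = 8, 3          # small illustrative sizes, not the patent's 32 kernels
x = rng.standard_normal((channels, 6, 5, 5))
params = [(rng.standard_normal((channels, channels, 3, 3, 3)) * 0.05,
           rng.standard_normal((channels, channels, 3, 3, 3)) * 0.05)
          for _ in range(num_units)]
y = stack_units(x, params)
```

Because of the shortcut, a unit whose weights are all zero passes its input through unchanged, which is the property that keeps deep stacks trainable.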

On the one hand, even with activation functions (such as ReLU) and regularization techniques, very deep convolutional networks are harder to train effectively; on the other hand, the invention still needs a deep network to capture the spatial dependence across very large sea areas.

For ocean temperature data, assuming an input size of 32 x 32 and a fixed kernel size of 3 x 3 for convolution, if the invention were to model dependencies within a specified sea area (i.e., each node in the deep layer depends on all input nodes), more than 15 consecutive convolutional layers would be required; therefore, to address this issue, the present invention employs residual learning in the model, which has proven to be very effective for training ultra-deep neural networks of over 1000 layers.
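The layer count quoted above follows from receptive-field arithmetic: n stacked stride-1 convolutions with a kernel of width k see n(k - 1) + 1 input cells along each axis. The short sketch below, with hypothetical function names, finds the smallest n whose receptive field spans a 32-cell axis.

```python
def receptive_field(num_layers, kernel_size=3):
    """Receptive-field width of stacked stride-1 convolutions along one axis."""
    return num_layers * (kernel_size - 1) + 1

def layers_needed(input_size, kernel_size=3):
    """Smallest number of stacked layers whose receptive field spans input_size."""
    layers = 0
    while receptive_field(layers, kernel_size) < input_size:
        layers += 1
    return layers

needed = layers_needed(32)   # 2n + 1 >= 32 first holds at n = 16
```

Fifteen layers give a receptive field of 31 cells, one short of 32, which is consistent with the statement that more than 15 consecutive layers are needed. This counts receptive-field width only; cells near the border would need additional layers to reach the opposite edge.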

As shown in FIG. 1, in the ST-ResNet of the present invention, L residual units are stacked on the pre-convolution as follows:

$X_c^{(l+1)} = X_c^{(l)} + F(X_c^{(l)}; \theta^{(l)}), \quad l = 1, \ldots, L$

where F is the residual function (i.e., the combination of the residual and convolution, as shown in FIG. 2), and $\theta^{(l)}$ includes all learnable parameters of the l-th residual unit. The invention also tries adding Batch Normalization (BN) before the residual, and adds a convolutional layer (i.e., the convolution shown in FIG. 1) on top of the L-th residual unit. With 2 convolutions and L residual units, the output of the proximity component of FIG. 1 is $X_c^{(L+2)}$.

Likewise, using the above operations, the invention constructs the periodic component of FIG. 1. Assume the periodic segment contains $l_p$ time intervals with period p. The period-dependent sequence is then

$[X_{t-l_p \cdot p}, X_{t-(l_p-1) \cdot p}, \ldots, X_{t-p}]$

Using the convolution operation and L residual units (as shown in FIG. 1), L residual units are stacked on the pre-convolution:

$X_p^{(l+1)} = X_p^{(l)} + F(X_p^{(l)}; \theta^{(l)}), \quad l = 1, \ldots, L$

where F is the residual function (i.e., the combination of the residual and convolution, as shown in FIG. 2), and $\theta^{(l)}$ includes all learnable parameters of the l-th residual unit. The invention also tries adding Batch Normalization (BN) before the residual, and adds a convolutional layer (i.e., the convolution shown in FIG. 1) on top of the L-th residual unit. With 2 convolutions and L residual units, the output of the periodic component of FIG. 1 is $X_p^{(L+2)}$.

Using the above operations, the invention likewise constructs the trend component of FIG. 1. Assume $l_q$ is the length of the trend-dependent sequence, with period q. The trend-dependent sequence is then

$[X_{t-l_q \cdot q}, X_{t-(l_q-1) \cdot q}, \ldots, X_{t-q}]$

L residual units are stacked on the pre-convolution:

$X_q^{(l+1)} = X_q^{(l)} + F(X_q^{(l)}; \theta^{(l)}), \quad l = 1, \ldots, L$

where F is the residual function (i.e., the combination of the residual and convolution, see FIG. 2), and $\theta^{(l)}$ includes all learnable parameters of the l-th residual unit. The invention also tries adding Batch Normalization (BN) before the residual, and adds a convolutional layer (i.e., the convolution shown in FIG. 1) on top of the L-th residual unit. With 2 convolutions and L residual units, the output of the trend component of FIG. 1 is $X_q^{(L+2)}$.

Wherein p and q are time periods corresponding to the periodicity and the trend types, respectively; preferably, in this embodiment, p is equal to 3 and q is equal to 6.

The invention fuses the three components of FIG. 1 (i.e., proximity, periodicity and trend) as follows:

$X_{Res} = W_c \circ X_c^{(L+2)} + W_p \circ X_p^{(L+2)} + W_q \circ X_q^{(L+2)}$

where $X_c^{(L+2)}$, $X_p^{(L+2)}$ and $X_q^{(L+2)}$ are the outputs of the proximity, periodicity and trend components; $\circ$ is the Hadamard product (i.e., element-wise multiplication); and $W_c$, $W_p$ and $W_q$ are learnable parameters, i.e., weights, that reflect the degree of influence of proximity, periodicity and trend. A different weight is assigned to the residual result of each attribute to represent how strongly that attribute's time interval influences the data currently being predicted. This layer uses a convolution kernel with a stride of 1 in longitude, latitude and depth to give the different attributes different weights; in this embodiment the weights of the three attributes are 0.33648, 0.23492 and 0.19308, respectively.

The activation function in the invention is the tanh function, which maps the result to [-1, 1] as the final prediction result.
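The weighted fusion followed by tanh can be sketched as below. This assumes each weight tensor is filled with the single value reported for the embodiment; in training the weights would be learned, and the random branch outputs here are placeholders.

```python
import numpy as np

def fuse(x_c, x_p, x_q, w_c, w_p, w_q):
    """Weighted element-wise fusion of the three branch outputs, squashed by tanh."""
    # On NumPy arrays '*' is element-wise, i.e. the Hadamard product.
    fused = w_c * x_c + w_p * x_p + w_q * x_q
    return np.tanh(fused)

rng = np.random.default_rng(0)
shape = (26, 20, 20)
x_c, x_p, x_q = (rng.standard_normal(shape) for _ in range(3))

# Assumption: constant weight tensors using the values quoted for the embodiment.
w_c = np.full(shape, 0.33648)
w_p = np.full(shape, 0.23492)
w_q = np.full(shape, 0.19308)
pred = fuse(x_c, x_p, x_q, w_c, w_p, w_q)
```

Using full weight tensors rather than three scalars leaves room for the influence of each branch to vary from one grid cell to another.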

The following algorithm outlines the ST-ResNet training process. The invention first constructs training examples from the raw data, and ST-ResNet is then trained by backpropagation with Adam.

ST-ResNet training algorithm

Input: the ocean temperature observations $\{X_0, \ldots, X_{n-1}\}$; the lengths of the proximity, periodicity and trend sequences, $l_c$, $l_p$ and $l_q$; and the periods p and q.

Output: the learned ST-ResNet model.

First, the training examples are constructed. Let D be the set of training examples, initialized to the empty set. For every available time interval t (1 ≤ t ≤ n-1), the following loop is executed:

$S_c = [X_{t-l_c}, X_{t-(l_c-1)}, \ldots, X_{t-1}]$

$S_p = [X_{t-l_p \cdot p}, X_{t-(l_p-1) \cdot p}, \ldots, X_{t-p}]$

$S_q = [X_{t-l_q \cdot q}, X_{t-(l_q-1) \cdot q}, \ldots, X_{t-q}]$

where $X_t$ is the target value at time t; the training example $(\{S_c, S_p, S_q, E_t\}, X_t)$ is then added to D.

Next, the model is trained. All learnable parameters θ in ST-ResNet are initialized. The following loop is executed until the stopping criterion is met: randomly select a batch of examples $D_b$ from D, and update θ by minimizing the objective over $D_b$.
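The example-construction loop can be sketched with index lists standing in for the temperature tensors. Two points are assumptions: time steps without a full history are skipped (the text only bounds t by n-1), and the term $E_t$ is left out because the description does not define it. The argument values in the call combine the time-point counts stated earlier with p = 3 and q = 6, purely for illustration.

```python
def build_examples(n, l_c, l_p, l_q, p, q):
    """Return (S_c, S_p, S_q, target) index tuples for every usable time step t."""
    # Assumption: start at the first t for which all three sequences exist.
    first_t = max(l_c, l_p * p, l_q * q)
    examples = []
    for t in range(first_t, n):
        s_c = [t - i for i in range(l_c, 0, -1)]        # t-l_c, ..., t-1
        s_p = [t - i * p for i in range(l_p, 0, -1)]    # t-l_p*p, ..., t-p
        s_q = [t - i * q for i in range(l_q, 0, -1)]    # t-l_q*q, ..., t-q
        examples.append((s_c, s_p, s_q, t))
    return examples

# 156 development time points, as in the data split described earlier.
examples = build_examples(n=156, l_c=4, l_p=3, l_q=3, p=3, q=6)
```

Each tuple lists which monthly snapshots feed the proximity, periodicity and trend branches for a given target month; mini-batches would then be drawn at random from this list.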

The invention is further illustrated by the following specific examples.

Examples

During training and prediction, ST-ResNet is configured with 3 attributes, which use 15, 11 and 11 layers of residual units and time intervals of 1 month, 3 months and 6 months, respectively. Data from 2004 to 2016 are used as the training set, the data for 2017 are predicted, and the mean squared error (MSE) of the predictions is computed.

For comparison, the same prediction is carried out with a traditional SVM model using the libsvm tool. The comparison of prediction results is shown in FIG. 3, FIG. 4 and Table 3, and the comparison of training times is shown in Table 2.

TABLE 2 ST-ResNet vs. SVM training time

Table 3 Comparison of the prediction evaluation index MSE for ST-ResNet and SVM

Grid size	3×3×3	5×5×5	8×8×8	10×10×10	15×15×15
ST-ResNet	0.0000778	0.000186	0.00123	0.00198	0.0086
SVM	0.0148	0.0128	0.0138	0.0172	-

Because the SVM model takes too long to process the 15 × 15 × 15 grid data (an estimated run time of more than one month), the SVM model is considered unsuitable for large data sets and of no practical value there.

The comparison shows that the MSE of ST-ResNet is better than that of the SVM at every data size, in some cases by orders of magnitude, so ST-ResNet has a large accuracy advantage. As the data range expands, the training time of ST-ResNet grows far more slowly than that of the SVM, and overall ST-ResNet outperforms the traditional SVM prediction model. Specifically, experiments on the sea area at 95°W-115°W, 9.5°N-9.5°S show that, compared with the traditional SVM method, under the same operating environment and for different data sizes in longitude, latitude and depth, the proposed ST-ResNet reduces the loss value (MSE, mean squared error) by factors of 68, 11 and 9 and improves the running time by factors of 3, 35 and 53, respectively.
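The reported loss reductions can be checked against Table 3 with a few lines of arithmetic. The sketch assumes that the three factors refer to the 5×5×5, 8×8×8 and 10×10×10 grids; the text does not name the columns, but these are the ones whose ratios match.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between two arrays of equal shape."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# MSE values copied from Table 3 for grid sizes where both models were run.
st_resnet = {"5x5x5": 0.000186, "8x8x8": 0.00123, "10x10x10": 0.00198}
svm = {"5x5x5": 0.0128, "8x8x8": 0.0138, "10x10x10": 0.0172}

# Factor by which the SVM loss exceeds the ST-ResNet loss at each grid size.
ratios = {size: svm[size] / st_resnet[size] for size in st_resnet}
```

The ratios come out at roughly 68.8, 11.2 and 8.7, in line with the stated factors of 68, 11 and 9.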

While embodiments of the invention have been described above, it is not limited to the applications set forth in the description and the embodiments, which are fully applicable in various fields of endeavor to which the invention pertains, and further modifications may readily be made by those skilled in the art, it being understood that the invention is not limited to the details shown and described herein without departing from the general concept defined by the appended claims and their equivalents.

Claims

1. A marine thermocline prediction method based on a deep space-time residual error network is characterized by comprising the following steps:

2. The marine thermocline prediction method based on deep space-time residual network of claim 1 wherein in the third step, the proximity property, the periodicity property and the trend property share a network structure with the same convolutional neural network and residual unit sequence, which comprises: pre-convolution and residual unit.

3. The marine thermocline prediction method based on the deep space-time residual network as claimed in claim 1 or 2, wherein the parameters of the proximity property comprise: the number of time points is 4, the time interval is 1 month, the number of residual error units is 7, the size of a convolution kernel is 3 multiplied by 3, the stepping is 1, 1, 1, and the number of the convolution in each layer is 32;

4. The marine thermocline prediction method based on the deep space-time residual error network as claimed in claim 3, wherein in the first step, the marine temperature data is preprocessed to obtain a data format of 32 × 26 × 20 × 20.

5. The ocean thermocline prediction method based on the deep space-time residual error network as claimed in claim 1, wherein in the step three, L residual error units are stacked on the pre-convolution layer in the proximity property to obtain

6. The marine thermocline prediction method based on deep space-time residual error networks as claimed in claim 1, wherein in the fourth step, the process of determining the weight of the output result of each residual error network and fusing the weights comprises:

wherein the content of the first and second substances,

7. The marine thermocline prediction method based on the deep space-time residual error network as claimed in claim 1, wherein in the step one, the target region is 95 ° W-115 ° W, 9.5 ° N-9.5 ° S, and the precision is 1 °; and

8. The marine thermocline prediction method based on the deep space-time residual error network as claimed in claim 1, wherein in the step five, the activation function is a Tanh function.