CN116090509A - Multivariate spatiotemporal data generation method based on separation attention mechanism - Google Patents


Info

Publication number
CN116090509A
Authority
CN
China
Prior art keywords: time; space; attention; data; model
Legal status: Pending
Application number
CN202211574764.2A
Other languages
Chinese (zh)
Inventors: 林连雷 (Lin Lianlei), 王俊凯 (Wang Junkai)
Current Assignee
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Application filed by Harbin Institute of Technology
Priority: CN202211574764.2A
Publication: CN116090509A
Later priority application: CN202311298525.3A (published as CN117371487A)
Legal status: Pending

Classifications

    • G06N 3/0455 — Auto-encoder networks; encoder-decoder networks
    • G06N 3/0475 — Generative networks
    • G06N 3/08 — Learning methods
    • Y02A 90/10 — Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation


Abstract

The invention discloses a multivariate spatio-temporal data generation method based on a separation attention mechanism, which comprises constructing a multivariate spatio-temporal data generation network model. The model comprises an encoder and a decoder, both of which encode and decode spatio-temporal data with a separation attention mechanism consisting of a multi-head temporal attention unit, a multi-head channel attention unit and a multi-head spatial attention unit. A loss function is determined from homoscedastic uncertainty and Gaussian likelihood maximization; historical data and target data are obtained, and the multivariate spatio-temporal data generation network model is trained with the loss function; the trained model is then used to generate spatio-temporal data. The disclosed method automatically captures spatio-temporal context information and the coupling relations between channels, realizes spatial modeling and mapping on a regular grid, and finally reconstructs the complete spatio-temporal signal.

Description

Multivariate spatiotemporal data generation method based on separation attention mechanism
Technical Field
The invention relates to the technical field of spatio-temporal data mining, and in particular to a multivariate spatio-temporal data generation method based on a separation attention mechanism.
Background
Spatio-temporal data generation can be regarded as a spatio-temporal data prediction problem, a classical problem in the field of spatio-temporal data mining. Its essence is to discover the hidden spatio-temporal laws in historical spatio-temporal data, so as to accurately judge the future development of those laws and accurately generate the spatio-temporal data of a future period.
Compared with conventional data, spatio-temporal data exhibit correlation, periodicity and heterogeneity patterns. These three patterns manifest differently on different spatio-temporal data. For a multi-parameter spatio-temporal data generation task, one must comprehensively consider not only the characteristics the data show in the spatio-temporal dimensions, but also the hidden information contained among different parameters, such as mutual exclusion and coupling, in order to accurately generate the spatio-temporal data of all parameters.
At present, spatio-temporal data generation methods fall into two categories. One is numerical methods, which build predictive models from dynamical and thermodynamic equations and demand extremely high computational resources. The other is data-driven methods. Among these, machine-learning algorithms have been widely used for data generation, but they require domain experts to hand-craft features from the data based on experience; that is, their effect in practical applications depends largely on the effectiveness of the feature design, so such models are strongly limited and generalize poorly.
Deep learning methods based on CNN and RNN operators improve generation accuracy on the spatio-temporal data generation problem, but they are limited by the inherent inefficiency of their core operators in capturing long-range spatio-temporal dependencies, so such models still cannot meet the ever-increasing accuracy requirements of real application scenarios. Self-attention models largely overcome the limitations of the traditional models by replacing CNNs and RNNs, but they still struggle to generate multi-parameter spatio-temporal data synchronously. The reason is that the number of training parameters of classical attention models grows exponentially with the dimension of the spatio-temporal variables; constrained by computational timeliness and resources, such models are typically used to handle the generation of a single spatio-temporal field, which ignores the multi-field coupling information in the data. In addition, the default loss handling of single-task models also hinders their application to multivariate spatio-temporal generation tasks: the loss of a single-task model is obtained by simply summing the losses of the different channels, and at different learning stages the loss of each channel can sit at a different numerical scale, so the learning task of the channel with the larger scale dominates the whole learning process and seriously hampers the synchronous convergence of the global task.
Therefore, how to provide a method for generating spatio-temporal data, which can consider multi-dimensional hidden information between different parameters of spatio-temporal data and solve the problem of multi-channel loss processing, is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the present invention provides a multivariate spatio-temporal data generation method based on a separation attention mechanism. It designs a convolution-free network for the synchronous generation of multivariate spatio-temporal data that can capture relations across the channel, time and space dimensions, and provides a loss handling method, based on a Gaussian loss distribution assumption and homoscedastic uncertainty, that automatically learns the loss weight of each channel so as to realize synchronous convergence of the global task.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a method of generating multivariate spatiotemporal data based on a split attention mechanism, comprising:
constructing multivariable space-time data to generate a network model; the multivariate spatiotemporal data generating network model comprises a plurality of encoders and a plurality of decoders, wherein the encoders and the decoders adopt a separation attention mechanism for encoding and decoding, and the separation attention mechanism comprises a multi-head time attention unit, a multi-head channel attention unit and a multi-head space attention unit;
determining a loss function according to the homodyne uncertainty and the Gaussian distribution maximum likelihood estimation;
historical data and target data are obtained, and the multivariate space-time data generation network model is trained according to the loss function;
and generating a network model by using the trained multi-variable space-time data, and predicting the space-time data.
Preferably, the encoders and decoders are plural and equal in number; each encoder contains one separation attention mechanism and each decoder contains two.
Preferably, the multi-head time attention unit captures the time correlation of all time steps;
the multi-head channel attention unit is used for adaptively extracting a plurality of types of space-time data in grid points at each time step so as to realize heterogeneous information fusion;
the multi-head space attention unit is used for learning the unknown space correlation of the grid points according to the element attention result of each time step.
Preferably, the time length of the historical data used when training the multivariate spatiotemporal data generating network model is consistent with the time length of the historical data used when predicting by using the trained multivariate spatiotemporal data generating network model.
Preferably, when the historical space-time data is used as input, sliding window sampling and decomposition are performed first, and then flattening is performed to form a vector containing a spatial position index, a channel index and a time index.
Preferably, the encoder encodes with the separation attention mechanism as follows:
the multi-head temporal attention unit computes the code z_time^(l) from the previous encoder's coding result using temporal attention;
the multi-head channel attention unit computes the code z_channel^(l) from z_time^(l) using channel attention;
the multi-head spatial attention unit computes the code z_space^(l) from z_channel^(l) using spatial attention;
the code z_space^(l) is then residually connected with the coding result of the previous encoder before the final coding is performed.
Preferably, the expression of the loss function is:

L(W, σ1, σ2) = −log p(y1, y2 | f^W(x)) ∝ 1/(2σ1²) · ||y1 − f1^W(x)||² + 1/(2σ2²) · ||y2 − f2^W(x)||² + log σ1σ2

wherein p is a probability function, σ1 and σ2 are the weight relation factors of the regression loss functions of the two channels, log σ1σ2 is the regularization term on the weight relation factors σ1, σ2, f^W(x) is the output of the multivariate spatio-temporal data generation network model for input x and weights W, and y1, y2 are the true values corresponding to x.
To avoid problems caused by differences in dimension, in practical applications the square root of each loss is substituted into the loss function; with L1(W) = ||y1 − f1^W(x)||² and L2(W) = ||y2 − f2^W(x)||², the final expression of the loss function is:

L(W, σ1, σ2) = 1/(2σ1²) · √L1(W) + 1/(2σ2²) · √L2(W) + log σ1σ2
compared with the prior art, the invention discloses a meteorological and environmental space-time data depth generation network based on a self-attention mechanism, which is used for simulating the coupling relation and space-time background among multiple time sequence variables and further completing synchronous generation of multiple space-time fields. In particular, the present application exploits the triple attention to the mechanism of interaction of data in step-by-step extraction time, channels, and space; the method specifically comprises a coupling mechanism between variables in the same time and space, a spatial relationship in the same time and an influence mode in different time and space; meanwhile, a loss processing method capable of automatically learning the loss weight of each channel based on Gaussian loss distribution assumption and homodyne uncertainty is provided, and the loss processing method is used for balancing the learning progress of each space-time variable.
By the spatio-temporal data generation method disclosed by the invention, spatio-temporal context information and the coupling relations between channels can be captured automatically, spatial modeling and mapping on a regular grid are realized, and the complete spatio-temporal signal is finally reconstructed.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for generating spatiotemporal data provided by the invention;
FIG. 2 is a diagram of a network structure for generating multi-parameter spatio-temporal data depth based on an Encoder-Decoder framework;
FIG. 3 is a schematic diagram of a spatio-temporal data temporal sliding window process;
FIG. 4 is a graph of STDGN model performance based on RMSE indicators for different lead periods;
fig. 5 is a graph of STDGN model performance based on ACC metrics for different lead periods.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the invention discloses a multivariate spatio-temporal data generation method based on a separation attention mechanism, which, as shown in fig. 1, comprises the following steps:
constructing a multivariate spatio-temporal data generation network model; the model comprises a plurality of encoders and a plurality of decoders, which encode and decode with a separation attention mechanism consisting of a multi-head temporal attention unit, a multi-head channel attention unit and a multi-head spatial attention unit;
determining a loss function from homoscedastic uncertainty and Gaussian maximum likelihood estimation;
acquiring historical data and target data, and training the multivariate spatio-temporal data generation network model with the loss function;
and predicting spatio-temporal data with the trained multivariate spatio-temporal data generation network model.
Through the channel attention mechanism, the invention can capture complex relations among the multiple spatio-temporal fields, such as mutual exclusion and coupling, thereby realizing multivariate spatio-temporal data generation. When the correlation among the spatio-temporal variables is high, parameter sharing promotes the learning of the whole task; when the correlation is low, the tasks of the different channels act as 'noise' to each other.
The traditional loss simply sums the task losses of all channels; when the correlation among the spatio-temporal variables is low, the model preferentially learns the difficult tasks, which is not conducive to synchronous learning of all the generation tasks in the model. By applying the new loss function, the invention can give priority to the easy-to-learn tasks while still attending to the difficult ones, guaranteeing the overall performance of the model to the greatest extent.
The separation attention mechanism is a multi-head attention; by forming several subspaces, it lets the model attend to several different kinds of information. The method considers the spatially non-local and temporally continuous evolution characteristics of the spatio-temporal structure, fully extracts spatio-temporal structure information and channel coupling information with a self-attention mechanism, and proposes a separated multi-attention mechanism for the coupling between the spatio-temporal characteristics of single-parameter and multi-parameter data, so that the spatio-temporal evolution of the various field variables can be simulated more accurately.
The basic structure of the multi-attention mechanism in the application comprises a multi-head time attention unit, a multi-head channel attention unit and a multi-head space attention unit, wherein the multi-head time attention unit is used for capturing the time correlation of all time steps;
the multi-head channel attention unit is used for adaptively extracting related variables in grid points at each time step to realize heterogeneous information fusion, wherein the related variables comprise a plurality of types of space-time data;
a multi-head spatial attention unit for learning an unknown spatial correlation of the grid points based on the element attention results of each time step.
Further, there are multiple and equal numbers of encoders and decoders, and there is one split attention mechanism in the encoder and two split attention mechanisms in the decoder. The separation attention mechanism comprises a multi-head time attention unit, a multi-head channel attention unit and a multi-head space attention unit; the encoder and decoder of the present invention further encode and decode based on the multiple attention mechanism.
Typically, before encoding, the encoder performs sliding window sampling and decomposition on the input spatio-temporal data, and then flattens it into vectors containing spatial position indices, channel indices and time indices.
The time length of the historical data used when training the multivariate spatio-temporal data generation network model is consistent with that used when predicting with the trained model. In one embodiment, the input is the spatio-temporal data from day t−7 to day t, and the output is the spatio-temporal data from day t+1 to day t+N.
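Illustratively, the sliding-window pairing of history and target frames can be sketched as follows (a minimal NumPy sketch; the window lengths hist_len=8 and pred_len=3 are placeholders standing in for the t−7..t history and t+1..t+N target described above):

```python
import numpy as np

def sliding_windows(frames, hist_len=8, pred_len=3):
    """Split a (T, C, H, W) series into (history, target) pairs.

    Each sample uses `hist_len` past frames (e.g. t-7..t) as encoder
    input and the following `pred_len` frames (t+1..t+pred_len) as
    the decoder target, mirroring the sampling scheme in the text.
    """
    T = frames.shape[0]
    xs, ys = [], []
    for t in range(T - hist_len - pred_len + 1):
        xs.append(frames[t:t + hist_len])
        ys.append(frames[t + hist_len:t + hist_len + pred_len])
    return np.stack(xs), np.stack(ys)

series = np.random.rand(20, 2, 4, 4)           # T=20 frames, C=2 channels
x, y = sliding_windows(series)
print(x.shape, y.shape)                        # (10, 8, 2, 4, 4) (10, 3, 2, 4, 4)
```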
Specifically, the spatial distribution of the spatio-temporal data at a given moment can be regarded as the distribution of image pixels in a picture, and the number of variables at a given moment as the number of channels in the picture. F data frames of size H×W with C channels are obtained from the data set through sliding window sampling as input. Then all channels of each frame are decomposed into N non-overlapping patches of size P×P, where N = HW/P², such that the N patches span the spatial domain of the entire data frame. These patches are then flattened into vectors x_(p,c,t) ∈ R^(P²) containing spatial position indices, channel indices and time indices, where p = 1,...,N is the spatial position index, c = 1,...,C is the channel index, and t = 1,...,F is the time index.
In one embodiment, each vector is mapped into an embedded vector and its position is encoded, specifically according to the following formula:

z^(0)_(p,c,t) = E · x_(p,c,t) + e^pos_(p,c,t)

where E is a learnable embedding matrix and e^pos_(p,c,t) is a learnable positional embedding.
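The patch decomposition and embedding above can be sketched in NumPy as follows (a hedged sketch: the embedding matrix E and positional embedding e_pos are randomly initialized stand-ins for the learned parameters):

```python
import numpy as np

def patchify(frames, P):
    """Decompose (F, C, H, W) data frames into per-channel patch
    vectors x_(p,c,t) of length P*P, with N = H*W / P**2 patches
    spanning the spatial domain of each frame."""
    F, C, H, W = frames.shape
    N = (H // P) * (W // P)
    x = frames.reshape(F, C, H // P, P, W // P, P)
    x = x.transpose(0, 1, 2, 4, 3, 5).reshape(F, C, N, P * P)
    return x

rng = np.random.default_rng(0)
F, C, H, W, P, D = 8, 2, 32, 64, 8, 16
tokens = patchify(rng.random((F, C, H, W)), P)          # N = 32*64/8**2 = 32
E = rng.standard_normal((P * P, D)) * 0.02              # embedding matrix (stand-in)
e_pos = rng.standard_normal((F, C, tokens.shape[2], D)) * 0.02  # positional embedding (stand-in)
z0 = tokens @ E + e_pos                                 # z^(0)_(p,c,t)
print(tokens.shape, z0.shape)                           # (8, 2, 32, 64) (8, 2, 32, 16)
```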
Further, the encoder encodes with the separation attention mechanism as follows: the multi-head temporal attention unit computes the code z_time^(l) from the previous encoder's coding result using temporal attention; the multi-head channel attention unit computes the code z_channel^(l) from z_time^(l) using channel attention; the multi-head spatial attention unit computes the code z_space^(l) from z_channel^(l) using spatial attention; and z_space^(l) is residually connected with the coding result of the previous encoder before the final coding is performed.
In one embodiment, the multivariate spatio-temporal data generation model comprises a plurality of stacked encoders; the first encoder receives the vectors described above as input, encodes them with the separated multi-attention mechanism, and passes the encoded output to the next encoder.
When each multi-attention mechanism encodes, the q/k/v vectors are first computed from the input vector or the previous coding result, with the following formulas:

q^(l,a)_(p,c,t) = W_Q^(l,a) · LN(z^(l−1)_(p,c,t))
k^(l,a)_(p,c,t) = W_K^(l,a) · LN(z^(l−1)_(p,c,t))
v^(l,a)_(p,c,t) = W_V^(l,a) · LN(z^(l−1)_(p,c,t))

where LN(·) denotes LayerNorm, a = 1,...,A is the index over the attention heads, A is the total number of attention heads, and D_h = D/A is the per-head dimension.
Next, the temporal attention weights are computed from the q and k vectors:

α^(l,a),time_(p,c,t) = SM( (q^(l,a)_(p,c,t))^T/√D_h · [ k^(l,a)_(p,c,t') ]_(t'=1,...,F) )

where SM is the softmax activation function.
Then the v vectors are weighted and summed according to the computed temporal attention weights:

s^(l,a),time_(p,c,t) = Σ_(t'=1..F) α^(l,a),time_(p,c,t),t' · v^(l,a)_(p,c,t')

Further, the temporal code z_time^(l) is computed by concatenating and projecting the per-head outputs, and its result is fed to the channel attention calculation rather than being passed to the MLP:

z_time^(l)_(p,c,t) = W_O · [ s^(l,1),time_(p,c,t); ...; s^(l,A),time_(p,c,t) ] + z^(l−1)_(p,c,t)

In other words, new q/k/v vectors are obtained from z_time^(l), and the channel attention is computed to obtain z_channel^(l); the calculation formula for channel attention is:

α^(l,a),channel_(p,c,t) = SM( (q^(l,a)_(p,c,t))^T/√D_h · [ k^(l,a)_(p,c',t) ]_(c'=1,...,C) )

Similarly, the spatial attention is computed to obtain z_space^(l); the calculation formula for spatial attention is:

α^(l,a),space_(p,c,t) = SM( (q^(l,a)_(p,c,t))^T/√D_h · [ k^(l,a)_(p',c,t) ]_(p'=1,...,N) )

Finally, the vectors obtained from the attention heads are concatenated, projected with a residual connection, and input to the MLP to compute the final code z^(l)_(p,c,t) of the patch at encoding block l:

z^(l)_(p,c,t) = MLP( LN( z_space^(l)_(p,c,t) ) ) + z_space^(l)_(p,c,t)
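The time → channel → space ordering of the separated attention, with residual connections and a closing MLP, can be sketched in NumPy (single-head for brevity: the multi-head split into A heads of dimension D_h = D/A is omitted, and all weights are random stand-ins, not the patent's parameters):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def axis_attention(z, Wq, Wk, Wv, axis):
    """Single-head self-attention restricted to one axis of z.

    z has shape (T, C, N, D); `axis` selects which of time (0),
    channel (1) or space (2) the softmax runs over, the other two
    axes acting as batch dimensions -- the 'separation' idea.
    """
    q = layer_norm(z) @ Wq
    k = layer_norm(z) @ Wk
    v = layer_norm(z) @ Wv
    # move the attended axis next to the feature axis
    q, k, v = (np.moveaxis(m, axis, -2) for m in (q, k, v))
    Dh = q.shape[-1]
    att = softmax(q @ k.swapaxes(-1, -2) / np.sqrt(Dh))
    return np.moveaxis(att @ v, -2, axis)

def separated_attention_block(z, params):
    """One encoder block: time, then channel, then spatial attention,
    each with a residual connection, followed by a two-layer MLP."""
    for axis in (0, 1, 2):                     # time, channel, space
        Wq, Wk, Wv = params[axis]
        z = z + axis_attention(z, Wq, Wk, Wv, axis)
    W1, W2 = params['mlp']
    return z + np.maximum(layer_norm(z) @ W1, 0) @ W2

rng = np.random.default_rng(1)
T, C, N, D = 4, 2, 8, 16
z = rng.standard_normal((T, C, N, D))
params = {a: [rng.standard_normal((D, D)) * 0.1 for _ in range(3)]
          for a in (0, 1, 2)}
params['mlp'] = [rng.standard_normal((D, 4 * D)) * 0.1,
                 rng.standard_normal((4 * D, D)) * 0.1]
out = separated_attention_block(z, params)
print(out.shape)                               # (4, 2, 8, 16)
```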
at this time, all the inputted feature information is extracted by the Encoder and compressed into the last hidden layer.
For decoding, each decoder comprises two separation attention mechanisms, a first and a second in order from bottom to top. During decoding, the data frame at the last moment of the history period is used as the start signal and input to the first separation attention mechanism, which decodes it and extracts the query value. The Value and Key are extracted from the final code of the history period output by the encoder. The second separation attention mechanism then decodes the data frame of the first moment from the obtained Query, Value and Key. Next, the start signal and the first-moment data frame are combined, position-encoded, and input to the first separation attention mechanism again; the new query value obtained after decoding is input, together with the Values and Keys extracted from the coding result of the history period, to the second separation attention mechanism to decode the data frame of the next moment. This cycle continues until an end symbol is encountered.
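The autoregressive decoding loop described above can be sketched as follows (schematic only: `decoder_step` is a toy stand-in for the two separation-attention sub-layers, and `memory` stands for the encoder's final codes; both names are illustrative):

```python
import numpy as np

def decode_autoregressive(memory, start_frame, pred_len, decoder_step):
    """Schematic decoder loop: start from the last historical frame,
    generate one frame per step, and feed each generated frame back
    as input (no ground truth at inference time)."""
    generated = [start_frame]
    for _ in range(pred_len):
        nxt = decoder_step(memory, np.stack(generated))
        generated.append(nxt)
    return np.stack(generated[1:])             # drop the start signal

# toy decoder_step: persistence of the last frame, for illustration only
step = lambda mem, seq: seq[-1] + 0.0 * mem.mean()
hist = np.random.rand(8, 2, 4, 4)              # history period frames
out = decode_autoregressive(hist, hist[-1], pred_len=3, decoder_step=step)
print(out.shape)                               # (3, 2, 4, 4)
```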
In training, the generated result at each moment depends on the accuracy of the result generated at the previous moment, so to prevent the convergence difficulties caused by error accumulation, a teacher forcing mechanism is introduced to assist model learning and accelerate convergence. Specifically, in the invention, ground truth data frames are obtained by sliding window sampling of the data set and input to the first separation attention mechanism to operate with the teacher forcing mechanism; that is, through teacher forcing, some of the predicted frames generated during decoding are randomly replaced with ground truth data frames to correct intermediate erroneous data.
However, the conventional teacher forcing mechanism replaces with a fixed probability, and the model becomes fragile if the samples are non-uniform and vary greatly from one to another. The invention modifies the mechanism to accelerate model convergence: at each moment, a mask obeying a Bernoulli distribution is applied to the input of the next moment, so that the output of that moment is randomly replaced with the ground truth. To gradually reduce the model's dependence on the ground truth, the probability of the Bernoulli distribution decays to 0 over training time, so that by the end of training the generated result at each moment of the whole prediction period depends entirely on the output of the previous moment. Meanwhile, for parallel training, the Decoder takes the complete data frames of a period as input; therefore, a mask is required during training to hide the data frames after the current moment.
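The Bernoulli replacement and the causal mask can be sketched as follows (`p_truth` plays the role of the decaying Bernoulli probability; all names are illustrative, not the patent's):

```python
import numpy as np

def scheduled_inputs(ground_truth, predicted, p_truth, rng):
    """Bernoulli teacher forcing: at every time step, keep the
    ground-truth frame with probability `p_truth`, otherwise use the
    model's own previous prediction. `p_truth` would be decayed
    towards 0 over training."""
    T = ground_truth.shape[0]
    keep = rng.random(T) < p_truth             # Bernoulli(p_truth) per step
    keep = keep.reshape((T,) + (1,) * (ground_truth.ndim - 1))
    return np.where(keep, ground_truth, predicted)

def causal_mask(T):
    """Lower-triangular mask hiding frames after the current step,
    enabling parallel training on the full target sequence."""
    return np.tril(np.ones((T, T), dtype=bool))

rng = np.random.default_rng(0)
gt = np.ones((5, 2, 4, 4))
pred = np.zeros((5, 2, 4, 4))
mixed = scheduled_inputs(gt, pred, p_truth=1.0, rng=rng)
print(np.allclose(mixed, gt))                  # True: p=1 keeps all ground truth
print(causal_mask(3))
```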
In the prediction process, decoding proceeds as in training, but because the weights are fixed after training and ground truth data cannot be obtained in advance in practical applications, the input of each moment during decoding depends entirely on the output of the previous moment, without any participation of the ground truth.
In one embodiment, the multivariate spatio-temporal data generation model disclosed in the present invention can be implemented on an Encoder-Decoder framework, with a structure comprising four parts: input, output, Encoder and Decoder, as shown in fig. 2. Each encoder layer consists of two sub-layer connection structures: the first comprises a multi-head separated self-attention layer, a normalization layer and a residual connection, and the second comprises a feedforward fully connected sub-layer, a normalization layer and a residual connection. The orange area is the network decoder, formed by stacking several decoder layers; each decoder layer consists of three sub-layer connection structures, of which the first and second each comprise a multi-head separated self-attention layer, a normalization layer and a residual connection, and the third comprises a feedforward fully connected sub-layer, a normalization layer and a residual connection. The multi-head separated self-attention layer is the multi-attention mechanism comprising the multi-head temporal attention unit, the multi-head channel attention unit and the multi-head spatial attention unit.
The input end of the model can be designed to comprise an unfold layer, an embedding layer and a position encoder for the historical data frames, and a teacher forcing layer, an unfold layer, an embedding layer and a position encoder for the target data frames;
the output end can be designed to comprise a linear layer and a fold layer.
On the other hand, to balance the losses of the generation tasks of the different channels in a multi-channel generation setting, the method introduces homoscedastic uncertainty into the field of multi-parameter spatio-temporal data generation, converting the simple weighted-sum loss function into an uncertainty loss function. Specifically, assuming the loss of the spatio-temporal data generation task follows a Gaussian distribution, the loss function is derived from homoscedastic uncertainty and Gaussian likelihood maximization; its expression is:
L(W, σ1, σ2) = −log p(y1, y2 | f^W(x)) ∝ 1/(2σ1²) · ||y1 − f1^W(x)||² + 1/(2σ2²) · ||y2 − f2^W(x)||² + log σ1σ2

wherein p is a probability function, σ1 and σ2 are the weight relation factors of the regression loss functions of the two channels, log σ1σ2 is the regularization term on the weight relation factors σ1, σ2, f^W(x) is the output of the multivariate spatio-temporal data generation network model for input x and weights W, and y1, y2 are the true values corresponding to x.
This function uses the homoscedastic uncertainty as noise to optimize the weight of each channel's loss during learning. The specific derivation is as follows:
First, let f^W(x) be the output of a neural network with weights W on input x. For the regression task, the probability model can be defined as a Gaussian distribution whose mean is given by the model output:

p(y | f^W(x)) = N(f^W(x), σ²)
With an observation noise scalar, and to match the classification task, the model output is processed with a softmax function and sampled from the generated probability vector:

p(y | f^W(x)) = Softmax(f^W(x))
in the presence of a plurality of outputs y 1 ,...,y k In the case of (1), f is defined assuming independent and equal distribution among the tasks w (x) Is a sufficient statistic. The multi-task likelihood estimates are:
p(y 1 ,...,y k |f w (x))=p(y 1 |f w (x))...p(y k |f w (x))
In maximum likelihood inference, the log-likelihood of the model is maximized. Taking the regression task as an example, its log-likelihood estimate is:

log p(y | f^W(x)) ∝ −1/(2σ²) · ||y − f^W(x)||² − log σ
for a gaussian likelihood function, the present application defines σ as the noise observation parameter of the model for capturing the amount of noise in the model output.
Based on the above theoretical derivation, a loss function can be defined for two regression-based tasks:

$$L(W, \sigma_1, \sigma_2) = \frac{1}{2\sigma_1^2}\left\|y_1 - f_1^W(x)\right\|^2 + \frac{1}{2\sigma_2^2}\left\|y_2 - f_2^W(x)\right\|^2 + \log \sigma_1\sigma_2$$
Further, let L_1(W) = ‖y_1 − f_1^W(x)‖² and L_2(W) = ‖y_2 − f_2^W(x)‖². Minimizing the loss with respect to σ_1 and σ_2 can then be interpreted as adaptively learning the relative weights of L_1(W) and L_2(W) from the data. As σ_1 (the noise parameter of variable y_1) increases, the weight of L_1(W) decreases; conversely, as the noise decreases, the weight of the corresponding objective increases. The last term acts as a regularizer on the noise: without it, the model could effectively ignore the data by letting the noise grow without bound.
To avoid problems caused by differences in scale, in practical applications the square root of L_i(W) is substituted into the loss function, so the final loss can be defined as:

$$L(W, \sigma_1, \sigma_2) = \frac{1}{2\sigma_1^2}\sqrt{L_1(W)} + \frac{1}{2\sigma_2^2}\sqrt{L_2(W)} + \log \sigma_1\sigma_2$$
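As an illustration, the adaptive weighting above can be sketched in a few lines of Python. This is a minimal sketch of the homoscedastic-uncertainty weighting rule, not the actual STDGN training code; the function name and the choice to parameterize log σ (a common trick to keep σ positive) are our assumptions.

```python
import math

def uncertainty_weighted_loss(l1, l2, log_sigma1, log_sigma2):
    """Combine two channel losses using homoscedastic uncertainty.

    l1, l2       -- per-channel regression losses (e.g. squared errors)
    log_sigma1/2 -- log noise parameters; parameterizing log(sigma)
                    keeps sigma positive without explicit constraints
    """
    sigma1 = math.exp(log_sigma1)
    sigma2 = math.exp(log_sigma2)
    # 1 / (2 * sigma_i^2) down-weights noisier channels;
    # log(sigma1 * sigma2) = log_sigma1 + log_sigma2 regularizes the noise
    return (l1 / (2 * sigma1 ** 2)
            + l2 / (2 * sigma2 ** 2)
            + log_sigma1 + log_sigma2)
```

In a real training loop the two log σ values would be trainable parameters updated by the optimizer together with the network weights, so the channel weights adapt as the per-channel loss scales change.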
in order to verify the effectiveness of the spatio-temporal data generation method provided by the application in modeling complex multi-parameter spatio-temporal fields, the application uses ERA5 to analyze the data set for training and testing. Re-analyzing the dataset provides a best guess for the atmospheric conditions at any point in time by combining the predictive model with available observations. Since this original data set is very large (the amount of data for a single vertical level over the entire time period is almost 700 GB), we re-divide the data into lower resolutions. This is also a more realistic use case because the deep learning model still has difficulty handling very high resolution due to GPU memory limitations and I/O speeds. Therefore, we have chosen a resolution of 5.625 ° (32×64 grid points) for the data. Repartitioning is done by bilinear interpolation using xesmf Python packets. The grid uses a second power, as this is common in many deep learning architectures, where the image size is halved in the algorithm. In the processed dataset we use pressure in hundred Pa as the vertical coordinate instead of the physical height. The pressure at sea level is about 1000hPa, decreasing approximately exponentially with altitude. The 850hPa height is about 1.5km. The 500hPa height is about 5.5km. If the surface pressure is less than a given pressure level, such as at high altitudes, the pressure level value is interpolated. The dataset also includes a two-dimensional field for providing latitude and longitude values for each point. In particular, latitude information is important for certain neural network models to learn latitude-specific information (e.g., coriolis effects).
To apply the dataset to supervised learning, it must be processed with a sliding window. In the present application, the dataset is sampled with a history window of 7 days, a stride of 1 day, and future windows of 3, 5 and 7 days; the sliding-window process is shown in Fig. 3. The meteorological field is generally assumed not to change drastically within 6 hours, so downsampling the slid samples to a 6-hour resolution is an effective means of reducing computational overhead and improving efficiency.
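The sliding-window sampling described above can be sketched as follows. The function and parameter names are illustrative: `history`, `future` and `stride` are counted in time steps, and `step` implements the 6-hourly downsampling applied after sliding:

```python
def sliding_windows(series, history, future, stride=1, step=1):
    """Sample (history, target) pairs from a time series.

    series  -- sequence of time steps (e.g. hourly fields)
    history -- number of steps used as model input
    future  -- number of steps to generate
    stride  -- offset between consecutive samples
    step    -- temporal downsampling factor (e.g. 6 for 6-hourly data)
    """
    samples = []
    for start in range(0, len(series) - history - future + 1, stride):
        hist = series[start:start + history:step]
        targ = series[start + history:start + history + future:step]
        samples.append((hist, targ))
    return samples
```

With 12 time steps, `history=7` and `future=3`, three overlapping samples are produced, the first pairing steps 0-6 with targets 7-9.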
The application uses data from 2017 and 2018 as the test set and evaluates the performance of each model from multiple angles with several evaluation metrics. To prevent information leakage, the training and test sets must not overlap. The first test time point of the first sample after sliding-window processing is 00 UTC on 1 January 2017 plus the history window; this position on the time axis corresponds to the split point in the figure (i.e., for a seven-day prediction, the first test date starts at 00 UTC on 8 January 2017 and ends at 23 UTC on 14 January 2017), and the target end point of the last training sample is 23 UTC on 31 December 2016.
Regarding the evaluation metrics:
RMSE (Root Mean Square Error) represents the sample standard deviation of the differences between predicted and observed values (the residuals). RMSE is chosen here as the primary metric, since it is not only the loss function most commonly used in machine learning, but also an effective characterization of sample dispersion. We define it as the mean latitude-weighted RMSE over all predictions:
$$\mathrm{RMSE} = \frac{1}{N_{\mathrm{forecasts}}} \sum_{i}^{N_{\mathrm{forecasts}}} \sqrt{\frac{1}{N_{\mathrm{lat}} N_{\mathrm{lon}}} \sum_{j}^{N_{\mathrm{lat}}} \sum_{k}^{N_{\mathrm{lon}}} L(j)\left(f_{i,j,k} - t_{i,j,k}\right)^2}$$
where f is the model result, t is the ERA5 truth, and L(j) is the latitude weighting factor at the j-th latitude:
$$L(j) = \frac{\cos\left(\mathrm{lat}(j)\right)}{\frac{1}{N_{\mathrm{lat}}} \sum_{j'}^{N_{\mathrm{lat}}} \cos\left(\mathrm{lat}(j')\right)}$$
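Under these definitions, the latitude-weighted RMSE for a single forecast on a latitude-longitude grid can be sketched as follows (the function names are illustrative):

```python
import math

def lat_weights(lats_deg):
    """Latitude weights L(j) = cos(lat_j) / mean_j cos(lat_j)."""
    cosines = [math.cos(math.radians(lat)) for lat in lats_deg]
    mean_cos = sum(cosines) / len(cosines)
    return [c / mean_cos for c in cosines]

def lat_weighted_rmse(forecast, truth, lats_deg):
    """Latitude-weighted RMSE for one forecast on a lat x lon grid."""
    w = lat_weights(lats_deg)
    n = len(forecast) * len(forecast[0])
    sq = sum(w[j] * (forecast[j][k] - truth[j][k]) ** 2
             for j in range(len(forecast))
             for k in range(len(forecast[0])))
    return math.sqrt(sq / n)
```

By construction the weights average to 1, so a spatially uniform error of 1 yields an RMSE of exactly 1 regardless of the latitude band.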
for smooth fields such as Z500 and T850, the qualitative difference between the indices is small. For intermittent fields like precipitation, the choice of metrics is more important. Therefore, we add a latitude weighted Anomaly Correlation Coefficient (ACC) [45] as an auxiliary index to evaluate the performance of the spatiotemporal data generation model. Anomaly correlation coefficients (Anomaly Correlation Coefficient, ACC) are one of the most widely used metrics in spatial field verification, representing the correlation between the predicted anomalies and verification values and reference values (e.g., climate data). The Anomaly Correlation Coefficient (ACC) is defined as:
$$\mathrm{ACC} = \frac{\sum_{i,j,k} L(j)\, f'_{i,j,k}\, t'_{i,j,k}}{\sqrt{\sum_{i,j,k} L(j)\, {f'}^{2}_{i,j,k} \sum_{i,j,k} L(j)\, {t'}^{2}_{i,j,k}}}$$
where the prime denotes the difference from climatology. Climatology is defined here as:
$$\mathrm{climatology}_{j,k} = \frac{1}{N_{\mathrm{time}}} \sum_{i}^{N_{\mathrm{time}}} t_{i,j,k}$$
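Combining the anomaly definition with the latitude weights L(j), the ACC for one forecast can be sketched as follows (a simplified illustration in which the climatology and weights are assumed to be precomputed):

```python
import math

def acc(forecast, truth, clim, weights):
    """Latitude-weighted anomaly correlation on a lat x lon grid.

    Anomalies are deviations from the climatology `clim`; `weights`
    are the per-latitude factors L(j).
    """
    num = sq_f = sq_t = 0.0
    for j in range(len(forecast)):
        for k in range(len(forecast[0])):
            fa = forecast[j][k] - clim[j][k]  # forecast anomaly
            ta = truth[j][k] - clim[j][k]     # truth anomaly
            num += weights[j] * fa * ta
            sq_f += weights[j] * fa * fa
            sq_t += weights[j] * ta * ta
    return num / math.sqrt(sq_f * sq_t)
```

A forecast whose anomalies match the truth exactly scores 1, while anomalies of opposite sign score −1, which is why ACC is informative for intermittent fields where absolute errors alone can be misleading.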
algorithm efficiency may be defined as the time required for an algorithm to complete a task. The improvement of algorithm efficiency enables researchers to complete more and more complex scientific operations under the same time and economic conditions. In addition to serving as an index for measuring overall AI progress, the improvement in algorithm efficiency also speeds up future AI research. The amount of computation cannot be used alone to evaluate algorithm efficiency, but must be combined with hardware characteristics and access volume to make a comprehensive evaluation. The same algorithm may change its properties on different platforms, requiring case-specific analysis, and it is difficult to draw a conclusion of versatility. Thus, algorithm execution time on the same computing platform can be used to simply characterize algorithm efficiency.
The verification results are as follows:
ablation experiment: to evaluate the performance gain from the channel loss weighting approach presented herein, we compared the performance of the STDGN model when using the loss auto-weighting approach (WLOSS) with the simple loss summing approach (MLOSS), and see table 1 for detailed data. The model performance of WLOSS adopted in the space-time data generation task of four meteorological fields is greatly advanced than that of MLOSS adopted model by analysis from two evaluation angles of RMSE and ACC respectively. In the task of generating the TP space-time field, two types of model performances adopting different losses respectively generate the maximum 17.37% and 14.29% deviation on two evaluation indexes. The reason for this is that the losses of different field variables have different numerical scales at different stages of model training. When simple summation is used as the loss handling approach, the smaller scale losses are suppressed due to the larger numerical scale of some channel losses. When WLOSS is adopted as a loss processing method, the new loss function automatically weights the loss of each channel based on uncertainty of the homodyne, so that the loss of each field variable is unified to the same order of magnitude, the loss with small gradient is prevented from being taken away by the loss with large gradient, and the learned characteristics have better generalization capability.
TABLE 1 STDGN model comparison with different losses
Comprehensive performance comparison:
The STDGN model completes space-time data generation for the global meteorological fields with a lead time of 5 days; Table 2 compares the space-time data generation performance of STDGN and SOTA models on multiple meteorological fields at different lead times. In short-term prediction with a 3-day lead time, the IFS T42 model performs slightly ahead of STDGN because it uses finer grid data than the other models. In medium-to-long-term prediction with a 5-day lead time, the gap between STDGN and IFS T42 begins to shrink, and without using fine grid data STDGN surpasses IFS T42 on the generation tasks of all space-time fields, reflecting the unique advantage of our network in capturing long-range dependencies in time-series data. The performance of iterative generation models depends strongly on their ability to extract features from time-series data; among all iterative models, STDGN captures temporal patterns more accurately than the classical linear regression and convolutional neural network models, while the convolutional model trails the linear regression model because of its weaker temporal feature capture.
The performance of direct generation models mainly depends on accurately capturing the time-space mapping. Among all direct generation models, STDGN exceeds the SOTA model thanks to its excellent space-time feature extraction and accurate capture of channel coupling information, while the convolutional network model outperforms the linear regression model thanks to its strong spatial feature extraction. In summary, the STDGN model accurately captures both the space-time mapping and the coupling relations among multiple field variables, and its performance exceeds classical machine learning models and an advanced physical model. Note that the reference models used for comparison in the table are designed for a single space-time field and cannot generate multiple space-time fields synchronously.
TABLE 2 model RMSE
Robustness analysis:
RMSE is an effective evaluation metric for continuous space-time field generation; when the meteorological fields include an intermittent field such as precipitation, using ACC as an auxiliary metric is the more suitable choice. To fully evaluate the performance and robustness of the STDGN model in generating composite space-time fields at different lead times, Table 3 gives the evaluation results of all models under the ACC metric. Analysis of the table shows the following. The STDGN model outperforms all reference models on the T850 and TP generation tasks and is nearly on par with the physical model on the Z500 generation task, and the ACC-based and RMSE-based evaluations are essentially consistent. When the STDGN model predicts any composite field among the three meteorological fields Z500, T850 and TP, the ACC results are identical, indicating that the model is robust along the channel dimension and generalizes well to synchronous generation tasks of various composite space-time fields. When the STDGN model generates the three fields at different lead times in iterative and direct modes, the two modes show good consistency on all three meteorological fields: they achieve identical ACC scores on the T850 and TP tasks and differ by only 1.18% on the Z500 generation task at a 3-day lead time.
The consistent performance of the STDGN iterative and direct generation models across different lead times shows that the proposed deep generation network is robust along the time dimension: when completing meteorological field generation tasks at different lead times, the network acquires accurate short-, medium- and long-term prediction capability with a single training run, without training a separate model for each lead time.
TABLE 3 model ACC
Demonstration of space-time feature capturing capability:
To further demonstrate the STDGN model's ability to capture space-time features, the generation results of different models are compared for the Z500 geopotential field and the T850 temperature field. Inspecting the ERA5 time differences reveals several interesting features. First, the geopotential field and its differences are much smoother than the temperature field. In the tropics, the differences in both fields are also much smaller than in the extratropics, where the propagation of fronts causes rapid temperature changes. An interesting feature appears in the 6 h Z500 difference field in the tropics: alternating patterns characteristic of atmospheric tides. The 6-hour-lead meteorological field generated by the CNN model fails to capture these wave patterns, suggesting that it cannot capture the basic evolution of the atmospheric environment. In the 5-day-lead generation task, the CNN model produces a smooth field that does not match reality. Two factors may cause this: first, the CNN model cannot learn temporal information from the historical data, so the available information is insufficient to accurately generate a space-time field at a 5-day lead time; second, the atmosphere already exhibits some chaotic behavior at a 5-day lead time. The meteorological fields are then mutually coupled and mutually influential, and since the CNN model predicts from a single field, it cannot learn the coupling information between fields, so its output tends toward a smooth field.
In summary, the STDGN model has much smaller errors than the CNN model and accurately captures the propagation of tropical waves; even in the space-time data generation task with a 5-day lead time, it can still accurately simulate the atmospheric tide phenomenon.
In addition, the invention analyzes the STDGN model's generation of global accumulated precipitation at a 5-day lead time. In this space-time field, the STDGN model accurately characterizes the distribution of the global precipitation regions and correctly generates the global precipitation extremes 24 hours in advance, meaning the network can provide data support for the generation of extreme space-time events. Meanwhile, the results show that the STDGN model accurately captures the global precipitation trend within a 3-day lead time and can capture various basic physical phenomena, realizing dynamic simulation of various composite space-time fields. We also find that beyond a 3-day lead time, although the STDGN model has difficulty capturing the fine-scale evolution of global precipitation, the generated precipitation areas remain concentrated along tropical coasts, still consistent with the objective precipitation laws of the physical world. In conclusion, the STDGN model accurately generates intermittent fields such as global precipitation within a 24-hour lead time, simulates the precipitation trend of the space-time field in detail within a 36-hour lead time, and beyond 36 hours can only roughly simulate the space-time pattern.
Evaluation of the model's limiting performance:
For intermittent space-time fields, the STDGN model can only complete generation tasks with a lead time of up to 3 days. To explore the limiting performance, we trained STDGN models with lead times of up to 7 days for the two continuous space-time fields Z500 and T850; Figures 4-5 show how model performance varies with lead time under different metrics. Although these models are trained on the continuous fields separately, for lead times under 5 days their results on each field are essentially consistent with those of the composite model predicting the three variables Z500, T850 and TP, confirming that the model is robust along both the time and channel dimensions and suitable for multiple space-time field generation tasks. Among all reference models, Persistence and Weekly climatology are the two simplest baselines. Persistence uses the data at initialization as the generated output (i.e., "tomorrow's weather is today's weather"); Weekly climatology is computed from the training set (1979-2016) as an average over each of the 52 calendar weeks. Because Weekly climatology accounts for the seasonal cycle, it generally outperforms the Persistence method. This means a new computational model must beat both the Weekly climatology and Persistence baselines to be useful. For the generation tasks of the two space-time fields Z500 and T850, the STDGN model reaches or even exceeds the Weekly climatology model on all evaluation metrics at a 7-day lead time, meaning our model extends the lead time for generating multiple continuous space-time fields to 7 days.
From the standpoint of computational efficiency, on the same computing platform used in the present application, the physical model represented by the T42 model takes 275 seconds for a single prediction, whereas the STDGN model needs only 1.08 seconds for a single prediction at a 5-day lead time, 254.63 times faster than the physical model. Note that the computational cost of both deep learning and physical models is closely tied to data resolution: the STDGN model runs on grid data at 5.625° intervals, while the T42 model is not optimized to run at such a coarse resolution, so part of STDGN's efficiency advantage derives from the coarser grid data. However, the grid density alone cannot account for a 250-fold performance gap between the two. Overall, the STDGN model outperforms the leading physical model in accuracy and is far more computationally efficient.
In the present specification, the embodiments are described progressively; each embodiment focuses on its differences from the others, and identical or similar parts of the embodiments can be cross-referenced. Since the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief, and the relevant points can be found in the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (8)

1. A method for generating multivariate spatiotemporal data based on a split attention mechanism, comprising:
constructing multivariable space-time data to generate a network model; the multivariate spatiotemporal data generating network model comprises an encoder and a decoder, wherein the encoder and the decoder encode and decode spatiotemporal data by adopting a separation attention mechanism, and the separation attention mechanism comprises a multi-head time attention unit, a multi-head channel attention unit and a multi-head space attention unit;
determining a loss function according to the homodyne uncertainty and the Gaussian distribution maximum likelihood estimation;
historical data and target data are obtained, and the multivariate space-time data generation network model is trained according to the loss function;
and generating a network model by using the trained multivariable space-time data to generate space-time data.
2. The method of claim 1, wherein said encoder and said decoder are plural and equal in number, one of said split attention mechanisms being present in said encoder and two of said split attention mechanisms being present in said decoder.
3. The method for generating multivariate spatiotemporal data based on separation attention mechanism of claim 1, wherein,
the multi-head time attention unit captures the time correlation of all time steps;
the multi-head channel attention unit is used for adaptively extracting a plurality of types of space-time data in grid points at each time step so as to realize heterogeneous information fusion;
the multi-head space attention unit is used for learning the unknown space correlation of the grid points according to the element attention result of each time step.
4. The method for generating multivariate spatiotemporal data based on separation attention mechanism of claim 1, wherein the time length of the historical data used in training the multivariate spatiotemporal data generating network model is consistent with the time length of the historical data used in predicting by using the trained multivariate spatiotemporal data generating network model.
5. The method of generating multivariate spatiotemporal data based on split attention mechanisms of claim 1, wherein said historical data is sampled and decomposed by sliding window first and then flattened into vectors containing spatial position index, channel index and temporal index as input.
6. The method for generating multivariate spatiotemporal data based on a separation attention mechanism of claim 1, wherein the encoding by the encoder using the separation attention mechanism comprises the following steps:

the multi-head time attention unit computes a first encoding based on the encoding result of the previous encoder and time attention;

the multi-head channel attention unit computes a second encoding based on channel attention and the first encoding;

the multi-head space attention unit computes a third encoding based on space attention and the second encoding;

the third encoding is connected with the encoding result of the previous encoder through a residual connection, and the final encoding is then performed.
7. The method for generating multivariate spatiotemporal data based on separation attention mechanisms of claim 1, wherein the expression of the loss function is:

$$L(W, \sigma_1, \sigma_2) = -\log p\left(y_1, y_2 \mid f^W(x)\right) \propto \frac{1}{2\sigma_1^2}\left\|y_1 - f^W(x)\right\|^2 + \frac{1}{2\sigma_2^2}\left\|y_2 - f^W(x)\right\|^2 + \log \sigma_1\sigma_2$$

wherein p is a probability function; σ_1 and σ_2 are respectively the weight relation factors of the regression-task loss functions corresponding to the two channels; log σ_1σ_2 is a regularization term on the weight relation factors σ_1 and σ_2; f^W(x) is the output of the multivariate spatiotemporal data generation network model for input x and weights W; and y is the true value corresponding to x.
8. The method for generating multivariate spatiotemporal data based on separation attention mechanisms of claim 1, wherein the expression of the loss function is:

$$L(W, \sigma_1, \sigma_2) = \frac{1}{2\sigma_1^2}\sqrt{L_1(W)} + \frac{1}{2\sigma_2^2}\sqrt{L_2(W)} + \log \sigma_1\sigma_2, \quad L_i(W) = \left\|y_i - f_i^W(x)\right\|^2$$

wherein p is a probability function; σ_1 and σ_2 are respectively the weight relation factors of the regression-task loss functions corresponding to the two channels; log σ_1σ_2 is a regularization term on the weight relation factors σ_1 and σ_2; f^W(x) is the output of the multivariate spatiotemporal data generation network model for input x and weights W; and y is the true value corresponding to x.
CN202211574764.2A 2022-12-08 2022-12-08 Multivariate spatiotemporal data generation method based on separation attention mechanism Pending CN116090509A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211574764.2A CN116090509A (en) 2022-12-08 2022-12-08 Multivariate spatiotemporal data generation method based on separation attention mechanism
CN202311298525.3A CN117371487A (en) 2022-12-08 2023-10-09 Multivariate spatiotemporal data generation method based on separation attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211574764.2A CN116090509A (en) 2022-12-08 2022-12-08 Multivariate spatiotemporal data generation method based on separation attention mechanism

Publications (1)

Publication Number Publication Date
CN116090509A true CN116090509A (en) 2023-05-09

Family

ID=86212838

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202211574764.2A Pending CN116090509A (en) 2022-12-08 2022-12-08 Multivariate spatiotemporal data generation method based on separation attention mechanism
CN202311298525.3A Pending CN117371487A (en) 2022-12-08 2023-10-09 Multivariate spatiotemporal data generation method based on separation attention mechanism

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202311298525.3A Pending CN117371487A (en) 2022-12-08 2023-10-09 Multivariate spatiotemporal data generation method based on separation attention mechanism

Country Status (1)

Country Link
CN (2) CN116090509A (en)

Also Published As

Publication number Publication date
CN117371487A (en) 2024-01-09

Similar Documents

Publication Publication Date Title
CN114662788B (en) Seawater quality three-dimensional time-space sequence multi-parameter accurate prediction method and system
WO2014105260A1 (en) Method and system for fast tensor-vector multiplication
CN114460555B (en) Radar echo extrapolation method and device and storage medium
CN115222163A (en) Multi-factor medium-long term real-time forecasting method and system for harbor basin inlet waves and application
CN116822382B (en) Sea surface temperature prediction method and network based on space-time multiple characteristic diagram convolution
CN113379107A (en) Regional ionized layer TEC forecasting method based on LSTM and GCN
CN111898482A (en) Face prediction method based on progressive generation confrontation network
CN115307780A (en) Sea surface temperature prediction method, system and application based on time-space information interaction fusion
CN115359338A (en) Sea surface temperature prediction method and system based on hybrid learning model
CN116385928A (en) Space-time action detection method, equipment and medium based on self-adaptive decoder
CN111627055A (en) Scene depth completion method based on semantic segmentation
CN117350171B (en) Mesoscale vortex three-dimensional subsurface structure inversion method and system based on double-flow model
CN111505706B (en) Microseism P wave first arrival pickup method and device based on deep T-Net network
CN116699731B (en) Tropical cyclone path short-term forecasting method, system and storage medium
CN116152206A (en) Photovoltaic output power prediction method, terminal equipment and storage medium
CN116090509A (en) Multivariate spatiotemporal data generation method based on separation attention mechanism
CN114997490A (en) Construction method, prediction method, device and equipment of temperature profile prediction model
CN114663307B (en) Integrated image denoising system based on uncertainty network
CN110648030A (en) Method and device for predicting seawater temperature
Liu et al. Diverse Hyperspectral Remote Sensing Image Synthesis With Diffusion Models
CN114333069A (en) Object posture processing method, device, equipment and storage medium
CN113761395A (en) Trajectory generation model training method, trajectory generation method and apparatus
Zhao et al. Spatio-temporal Model Combining VMD and AM for Wind Speed Prediction
Zeng et al. Prediction of significant wave height based on gated recurrent unit and sequence-to-sequence networks in the Taiwan Strait
CN117493786B (en) Remote sensing data reconstruction method combining countermeasure generation network and graph neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20230509
