CN116051857A - Short-term precipitation prediction method improved using random masking and Transformer - Google Patents

Short-term precipitation prediction method improved using random masking and Transformer

Info

Publication number: CN116051857A
Application number: CN202310057412.8A
Authority: CN (China)
Legal status: Pending
Original language: Chinese (zh)
Inventors: 方巍 (Fang Wei); 齐媚涵 (Qi Meihan)
Assignee (original and current): Nanjing University of Information Science and Technology
Application filed by Nanjing University of Information Science and Technology

Classifications

    • G06V 10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G01W 1/14 Rainfall or precipitation gauges
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06V 10/765 Image or video recognition using classification, e.g. of video objects, using rules for partitioning the feature space
    • G06V 10/82 Image or video recognition using neural networks
    • Y02A 90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation


Abstract

The invention discloses a short-term precipitation prediction method improved using random masking and a Transformer, belonging to the field of precipitation prediction. The method comprises: S1, randomly masking a spatio-temporal sequence of images; S2, constructing a network model and inputting the masked, marked image sequence into the network for model training, where the network model comprises an encoder-decoder structure with UNet as the core model, a Swin Transformer module embedded in the encoder, and a SENet attention mechanism; S3, during model training, the input images produce a predicted value through forward propagation, after which the model is tuned backward against the loss function and continually fine-tuned, minimizing the loss and giving the model accurate predictive capability; S4, L1+L2 regularization is used during training to prevent overfitting. The method models the high-order non-stationarity in spatio-temporal sequences, learns short-term and long-term dependency information simultaneously, and improves the prediction accuracy of the model.

Description

Short-term precipitation prediction method improved using random masking and Transformer
Technical Field
The invention belongs to the field of precipitation prediction, and in particular relates to a short-term precipitation prediction method improved using random masking and a Transformer.
Background
With the continuous development of technology, short-term precipitation forecasting has become an important problem in weather forecasting. Its objective is to accurately and promptly predict the rainfall intensity of a local area over a relatively short horizon (0-6 hours), which plays a vital role in economics, agriculture, commerce, transportation, electric utilities, and other fields. Short-term precipitation prediction can be formulated as a spatio-temporal sequence prediction problem, which deep-learning-based image extrapolation addresses effectively: the future M frames are predicted from the previous N frames. This technique is widely applied in weather prediction, video prediction, traffic-flow prediction, and similar fields, but its prediction accuracy remains limited and cannot meet operational requirements. First, natural spatio-temporal processes exhibit high-order non-stationarity in many respects, such as the generation and dissipation of radar echoes in short-term precipitation forecasting; higher-order variations such as accumulation or distortion cause predicted images to blur. Second, when targets change rapidly, future images should be generated from nearby frames rather than distant ones, which requires the model to learn short-term information in the spatio-temporal sequence; conversely, when moving objects in a scene frequently become entangled, it is difficult to separate them when generating future frames, which requires the model to extract contextual information within images and long-term information across the image sequence.
Thus, modeling high-order non-stationarity in radar echo images and simultaneously learning short-term and long-term dependency information in image sequences is crucial for accurately predicting future precipitation intensities.
Spatio-temporal sequence prediction models fall mainly into three classes: models based on recurrent neural networks (RNN), models based on convolutional neural networks (CNN), and models based on the Transformer. RNN-based models learn the important information in a sequence and forget secondary information through gating mechanisms, and they have notable advantages in capturing long-term dependencies in spatio-temporal sequences. In 2015, Xingjian Shi et al. first combined convolution with LSTM and proposed Convolutional LSTM (ConvLSTM), a network that learns spatial and temporal features simultaneously. In 2016, Xingjian Shi et al. further proposed Trajectory GRU (TrajGRU) to address the location-invariance limitation of ConvLSTM's convolutional structure. In 2017, Yunbo Wang et al. proposed PredRNN, with a zigzag memory flow, to address ConvLSTM's shortcoming that memory states are independent at each time step. In 2018, Yunbo Wang et al. proposed PredRNN++, a deepened network using a new recurrent structure, Causal LSTM, together with a Gradient Highway Unit to prevent long-term gradients from vanishing. In 2019, Yunbo Wang et al. drew on the differencing idea of classical time-series prediction and proposed the MIM network to address the high-order non-stationarity of radar images. In 2021, Haixu Wu et al. decomposed physical motion into transient-variation and motion-trend components and proposed a new spatio-temporal prediction model, MotionRNN. However, recurrent models are sequential by nature, making backpropagation very time-consuming, and they struggle to meet the timeliness requirements of short-term precipitation forecasting.
Convolution-based encoder-decoder architectures have also achieved strong performance in short-term precipitation forecasting. In 2019, Shreya Agrawal et al. introduced the UNet network to the precipitation forecasting task, using convolution to capture spatial correlation and stacking multiple radar frames along the channel dimension to extract temporal correlation. In 2021, Kevin Trebing et al. proposed SmaAt-UNet, which matches UNet's performance with only a quarter of the trainable parameters. However, convolution extracts image features through local connections and is strongly limited in learning long-term dependencies in spatio-temporal sequences.
Transformers were originally proposed in natural language processing (NLP) but have been successfully introduced into many other fields thanks to their ability to extract long-term dependencies in sequences and their good parallelism. In 2021, Alexey Dosovitskiy et al. introduced the Transformer architecture to computer vision and proposed the ViT model; also in 2021, Ze Liu et al. proposed the Swin Transformer, which restricts self-attention computation to non-overlapping local windows while allowing cross-window connections through shifted windows, improving efficiency and reducing computation to some extent. However, Transformer-based models place high demands on hardware and are somewhat limited in learning short-term dependencies of spatio-temporal sequences, making direct application to short-term precipitation forecasting difficult.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a short-term precipitation prediction method improved using random masking and a Transformer.
The aim of the invention is achieved by the following technical scheme:
A method for short-term precipitation prediction improved using random masking and a Transformer, comprising the steps of:
S1, randomly masking a spatio-temporal sequence of images;
S2, constructing a network model, and inputting the masked, marked image sequence into the network for model training; the network model comprises an encoder-decoder structure with UNet as the core model, a Swin Transformer module embedded in the encoder, and a SENet attention mechanism;
S3, during model training, the input images produce a predicted value through forward propagation, after which the model is tuned backward against the loss function and continually fine-tuned, minimizing the loss and giving the model accurate predictive capability;
S4, L1+L2 regularization is used during training to prevent overfitting.
Further, in S1, the patches of the image sequence are masked randomly, the masked regions are then marked, and the marked image sequence is input into the network;
and in S1, training uses input images with a mask ratio of 75%, and a batch normalization operation is applied to the randomly masked input so that it follows a Gaussian distribution, stabilizing the training process.
Further, in S2, the encoder comprises double convolution operations, max pooling operations, a Swin Transformer module, and a SENet attention mechanism; the double convolution operation doubles the number of feature channels of the image, and max pooling halves the size of the feature map; four double convolution operations are interleaved with max pooling operations to learn short-term dependency information in the spatio-temporal sequence; a Swin Transformer module is embedded at the end of the encoder to learn long-term dependency information in the spatio-temporal sequence; and a SENet attention mechanism is introduced between the double convolution and max pooling operations of each layer to focus on important information in the channel dimension and suppress secondary information unimportant to the current task.
Further, in S2, the Swin Transformer module comprises Patch Partition, Linear Embedding, and Swin Transformer Block; first, the picture sequence is partitioned by the Patch Partition layer, dividing the feature map into several disjoint regions; then the channel data of each pixel is linearly transformed by the Linear Embedding layer; and finally feature extraction is performed by the Swin Transformer Block layer.
Further, the W-MSA module in the Swin Transformer Block is configured to restrict multi-head self-attention computation to each local window, while the SW-MSA module enables information to be passed between adjacent windows; the multi-head self-attention computation process is as follows:

Attention(Q, K, V) = SoftMax(QK^T / sqrt(d) + B)V (1)

head_i = Attention(QW_i^Q, KW_i^K, VW_i^V) (2)

MultiHead(Q, K, V) = Concat(head_1, ..., head_h)W^O (3)

where Q, K, and V are the query, key, and value vectors respectively; W_i^Q, W_i^K, and W_i^V are the corresponding projection matrices; d is the dimension of the query vector; and B is the relative position bias.
Further, the SENet attention mechanism comprises two operations, Squeeze and Excitation: a Squeeze operation is first applied to the feature map obtained by convolution to obtain a global feature per channel; an Excitation operation is then applied to the global features to learn the relationships among channels and the weight each channel carries; finally the obtained weights are multiplied with the initial feature map to produce the final features.
Further, in S3, two steps are performed during model training, forward propagation and backward tuning. Assume there are N training samples (x_i, y_i), where i ∈ [1, N]; the input is x_i, the ground-truth output is y_i, and the predicted output is o_i. The loss function is defined as the MSE, the Euclidean distance between the predicted value and the true value, as follows:

MSE = (1/N) Σ_{i=1}^{N} ||o_i - y_i||^2 (4)
Further, in S4, the L1 and L2 regularization terms are given by formulas (5) and (6), respectively:

L1(w) = α Σ_i |w_i| (5)

L2(w) = α Σ_i w_i^2 (6)

where α is a constant controlling the degree of regularization and w_i denotes the model weights. L1 regularization prevents overfitting by making the weight vector sparse during optimization; the L1 term is added to the loss function as a penalty, giving the final loss function shown in formula (7). Meanwhile, L2 regularization is deployed by setting the weight_decay parameter of the Adam optimizer:

Loss = (1/N) Σ_{i=1}^{N} ||o_i - y_i||^2 + α Σ_i |w_i| (7)
Further, during model training a learning-rate decay strategy is introduced; the decay process is shown in formula (8):

α = α_0 / (1 + decay_rate × epoch_i) (8)

where decay_rate is the decay coefficient, epoch_i is the i-th training epoch, and α_0 is the initial learning rate.
A short-term precipitation prediction system improved using random masking and Transformer, comprising:
an image processing module: for randomly masking the spatiotemporal sequence image;
model construction module: used to construct the network model and input the masked, marked spatio-temporal image sequence into the network for model training; the network model comprises an encoder-decoder structure with UNet as the core model, a Swin Transformer module embedded in the encoder, and a SENet attention mechanism;
and a prediction module: in the model training process, an input image obtains a predicted value through a forward propagation process, and then reverse tuning is performed according to a loss function to continuously fine tune the model, so that the loss function is minimized, and the accurate prediction capability of the model is realized;
optimization training module: L1+L2 regularization is used during training to prevent overfitting.
The invention has the beneficial effects that: a Swin Transformer basic module is embedded in the UNet model, a SENet attention module is introduced, and a randomly masked image sequence is used as input to model the high-order non-stationarity in the spatio-temporal sequence while learning short-term and long-term dependency information, improving the prediction accuracy of the model.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to those skilled in the art that other drawings can be obtained according to these drawings without inventive effort.
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a diagram of the network model architecture of the present invention;
FIG. 3 is a block diagram of the Swin Transformer module of the present invention;
FIG. 4 is a detail view of the Swin Transformer computation of the present invention;
fig. 5 is a diagram of the SENet attention mechanism of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, a method for short-term precipitation forecasting improved using random masking and Transformer comprises the following steps:
s1, randomly masking a space-time sequence image;
before inputting an image, firstly, randomly masking a patch of an image sequence, then marking a masking area, inputting the marked image sequence into a network, and reconstructing missing pixels by using the non-masked patch to train the ability of the network to model high-order non-stationarity. In order to accelerate the training speed of the model and improve the prediction precision of the model, the invention adopts a high-proportion masking scheme, and the input image with the masking rate of 75% is used for training, so that the optimal prediction effect can be achieved. In addition, a batch normalization operation is applied to the input image after the random masking, so that the input image is subjected to Gaussian distribution to stabilize the training process
S2, constructing a network model, inputting the space-time sequence images marked by the mask into a network for training, and extracting features;
As shown in fig. 2, the network model comprises an encoder-decoder structure with UNet as the core model; a Swin Transformer module is embedded in the encoder, and a SENet attention mechanism is introduced;
The encoder comprises double convolution operations, max pooling operations, a Swin Transformer module, and a SENet attention mechanism. The double convolution operation doubles the number of feature channels of the image, and max pooling halves the size of the feature map; four double convolution operations are interleaved with max pooling operations, exploiting the inherent locality of convolution to learn short-term dependency information in the spatio-temporal sequence. A Swin Transformer module is embedded at the end of the encoder to learn long-term dependency information in the spatio-temporal sequence; the resulting encoder combines the advantages of UNet and the Swin Transformer and can capture both short-term and long-term dependencies in spatio-temporal sequences. To further enhance the feature extraction capability of the encoder, a SENet attention mechanism is introduced between the double convolution and max pooling operations of each layer to focus on important information in the channel dimension and suppress secondary information unimportant to the current task.
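One encoder stage as described above (double convolution doubling the channel count, then 2x max pooling halving the spatial size) can be sketched as follows; the channel counts are illustrative assumptions, and the SE attention inserted between the two operations is omitted here for brevity:

```python
import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    """Two 3x3 conv layers mapping c_in to c_out channels."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.block(x)

class EncoderStage(nn.Module):
    """DoubleConv (channels x2) -> MaxPool (H, W halved)."""
    def __init__(self, c_in):
        super().__init__()
        self.conv = DoubleConv(c_in, c_in * 2)  # double the feature channels
        self.pool = nn.MaxPool2d(2)             # halve the spatial size
    def forward(self, x):
        return self.pool(self.conv(x))

x = torch.rand(1, 16, 64, 64)
y = EncoderStage(16)(x)
print(y.shape)  # torch.Size([1, 32, 32, 32])
```

Stacking four such stages reproduces the channel-doubling, size-halving progression described for the encoder.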
As shown in fig. 3, the Swin Transformer module comprises image partitioning (Patch Partition), linear mapping (Linear Embedding), and Swin Transformer Block. First, the picture sequence is partitioned by the Patch Partition layer, dividing the feature map into several disjoint regions; then the channel data of each pixel is linearly transformed by the Linear Embedding layer; finally feature extraction is performed by the Swin Transformer Block layer.
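The Patch Partition and Linear Embedding steps can be sketched as below; the 4x4 patch size and 96-dimensional embedding follow common Swin Transformer defaults and are assumptions, not values stated in the patent:

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Patch Partition + Linear Embedding: split the feature map into
    non-overlapping patches and linearly project each patch's channels."""
    def __init__(self, patch=4, c_in=1, dim=96):
        super().__init__()
        # a strided conv is the standard equivalent of partition + linear projection
        self.proj = nn.Conv2d(c_in, dim, kernel_size=patch, stride=patch)
    def forward(self, x):                    # x: (B, C, H, W)
        x = self.proj(x)                     # (B, dim, H/patch, W/patch)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, dim)

x = torch.rand(2, 1, 64, 64)
tokens = PatchEmbed()(x)
print(tokens.shape)  # torch.Size([2, 256, 96])
```

The resulting token sequence is what the Swin Transformer Block layer consumes for feature extraction.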
The W-MSA module in the Swin Transformer Block restricts multi-head self-attention computation to each local window, which effectively reduces the cost of self-attention, while the SW-MSA module allows information to pass between adjacent windows, achieving global modeling; the learning process of W-MSA and SW-MSA is shown in FIG. 4. Multi-head self-attention is computed as in formulas (1)-(3):

Attention(Q, K, V) = SoftMax(QK^T / sqrt(d) + B)V (1)

head_i = Attention(QW_i^Q, KW_i^K, VW_i^V) (2)

MultiHead(Q, K, V) = Concat(head_1, ..., head_h)W^O (3)

where Q, K, and V are the query, key, and value vectors respectively, W_i^Q, W_i^K, and W_i^V are the corresponding projection matrices, d is the dimension of the query vector, and B is the relative position bias.
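Formulas (1)-(3) restricted to one local window can be sketched as follows; the window size, head count, and the omission of the relative position bias B are simplifying assumptions:

```python
import torch
import torch.nn as nn

def window_attention(x, num_heads=4):
    """Multi-head self-attention within one window of tokens.
    x: (num_tokens, dim); dim must be divisible by num_heads."""
    n, dim = x.shape
    d = dim // num_heads
    qkv = nn.Linear(dim, 3 * dim, bias=False)(x)  # projections W^Q, W^K, W^V
    q, k, v = qkv.reshape(n, 3, num_heads, d).permute(1, 2, 0, 3)
    attn = (q @ k.transpose(-2, -1)) / d ** 0.5   # formula (1), bias B omitted
    attn = attn.softmax(dim=-1)
    heads = attn @ v                              # (num_heads, n, d)
    out = heads.transpose(0, 1).reshape(n, dim)   # Concat(head_1..head_h), formula (3)
    return nn.Linear(dim, dim, bias=False)(out)   # output projection W^O

tokens = torch.rand(49, 96)  # one 7x7 window of 96-dim tokens
y = window_attention(tokens)
print(y.shape)  # torch.Size([49, 96])
```

Because attention is computed only over the 49 tokens of the window rather than all tokens in the image, the quadratic cost of self-attention applies per window, which is the source of the efficiency gain the text describes.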
The SENet attention mechanism comprises two operations, Squeeze and Excitation. A Squeeze operation is first applied to the feature map obtained by convolution to obtain a global feature per channel; an Excitation operation is then applied to the global features to learn the relationships among channels and the weight each channel carries; finally the obtained weights are multiplied with the original feature map to produce the final features. The SENet attention mechanism lets the model focus on channel features carrying more information, suppresses unimportant channel features, and improves model performance; the overall structure is shown in fig. 5.
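A minimal sketch of the Squeeze and Excitation operations described above; the reduction ratio r=16 is the common SENet default and an assumption here:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, r=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)  # Squeeze: one global feature per channel
        self.excite = nn.Sequential(            # Excitation: learn per-channel weights
            nn.Linear(channels, channels // r), nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels), nn.Sigmoid(),
        )
    def forward(self, x):                       # x: (B, C, H, W)
        b, c, _, _ = x.shape
        w = self.squeeze(x).view(b, c)
        w = self.excite(w).view(b, c, 1, 1)
        return x * w                            # reweight the original feature map

x = torch.rand(2, 32, 16, 16)
y = SEBlock(32)(x)
print(y.shape)  # torch.Size([2, 32, 16, 16])
```

Since the sigmoid keeps each weight in (0, 1), the block can only attenuate channels, never amplify them, which matches the role of suppressing less informative channel features.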
S3, during model training, backward tuning is performed: the input images produce a predicted value through forward propagation, and the model is then continually fine-tuned by backward tuning against the loss function, minimizing the loss and giving the model accurate predictive capability;
the decoder part comprises double convolution operation, up-sampling and jump connection, wherein the up-sampling operation is realized by a bilinear interpolation method, and the jump connection part realizes the fusion of bottom layer position information and deep semantic information by splicing with the characteristic diagram of the current layer in the encoder.
During model training, two steps are performed, forward propagation and backward tuning. Assume there are N training samples (x_i, y_i), where i ∈ [1, N]; the input is x_i, the ground-truth output is y_i, and the predicted output is o_i. The loss function is defined as the MSE, the Euclidean distance between the predicted value and the true value, as shown in formula (4):

MSE = (1/N) Σ_{i=1}^{N} ||o_i - y_i||^2 (4)
The input images produce a predicted value through forward propagation; the model is then continually fine-tuned by backward tuning against the loss function, minimizing the loss so that o_i approaches y_i arbitrarily closely, thereby achieving the accurate predictive capability of the model.
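The forward propagation / backward tuning loop with the MSE loss of formula (4) can be sketched as below; the model and data here are stand-in placeholders for the full UNet-based network and the radar frames:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(1, 1, 3, padding=1)  # placeholder for the full UNet-based network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.rand(4, 1, 32, 32)  # masked input frames (placeholder data)
y = torch.rand(4, 1, 32, 32)  # ground-truth frames

losses = []
for _ in range(5):
    o = model(x)            # forward propagation -> predicted value o_i
    loss = loss_fn(o, y)    # formula (4): MSE between o_i and y_i
    optimizer.zero_grad()
    loss.backward()         # backward tuning
    optimizer.step()
    losses.append(loss.item())
print(losses)
```

Each iteration performs exactly the two steps the text names: a forward pass producing o, then a backward pass that adjusts the weights against the loss.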
S4, in order to prevent overfitting and enhance the generalization ability of the model, regularization is introduced; during training, L1+L2 regularization is used, with the L1 and L2 terms given by formulas (5) and (6), respectively:

L1(w) = α Σ_i |w_i| (5)

L2(w) = α Σ_i w_i^2 (6)

where α is a constant controlling the degree of regularization and w_i denotes the model weights. L1 regularization prevents overfitting by making the weight vector sparse during optimization, while L2 regularization, compared with L1, tends to penalize large-valued weight vectors. The L1 term is added to the loss function as a penalty, giving the final loss function shown in formula (7). Meanwhile, the invention deploys L2 regularization by setting the weight_decay parameter of the Adam optimizer; the penalty coefficient α of the L1 and L2 regularization is set to 0.0001.

Loss = (1/N) Σ_{i=1}^{N} ||o_i - y_i||^2 + α Σ_i |w_i| (7)
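Formulas (5)-(7) can be sketched as below: the L1 penalty is added to the loss explicitly, while L2 is applied through Adam's weight_decay, matching the α = 0.0001 stated above; the model is a stand-in placeholder:

```python
import torch
import torch.nn as nn

alpha = 1e-4
model = nn.Linear(8, 8)  # placeholder for the full network
# L2 regularization (formula 6) via the optimizer's weight_decay parameter
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=alpha)

x, y = torch.rand(16, 8), torch.rand(16, 8)
o = model(x)
mse = nn.functional.mse_loss(o, y)
# L1 regularization (formula 5) added as an explicit penalty -> formula (7)
l1 = alpha * sum(p.abs().sum() for p in model.parameters())
loss = mse + l1
loss.backward()
print(loss.item() >= mse.item())  # True: the penalty can only increase the loss
```

Note that Adam's weight_decay applies the L2 penalty inside the optimizer step, so it never appears in the reported loss value, whereas the explicit L1 term does.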
During model training, in order to control the update speed of the learning rate and let it oscillate near an optimal value so as to accelerate training, the invention also introduces a learning-rate decay strategy; the decay process is shown in formula (8):

α = α_0 / (1 + decay_rate × epoch_i) (8)

where decay_rate is the decay coefficient, epoch_i is the i-th training epoch, and α_0 is the initial learning rate. A larger learning rate speeds up model convergence, so a larger rate is set at the start of training to accelerate convergence; once training has progressed to a certain degree, an overly large learning rate may trap the model around a local optimum, while reducing the learning rate shrinks the convergence step so that model learning is better optimized.
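The decay schedule of formula (8) can be sketched as follows; the initial rate and decay coefficient are illustrative assumptions:

```python
def lr_schedule(epoch, lr0=1e-3, decay_rate=0.05):
    """Formula (8): inverse-time learning-rate decay."""
    return lr0 / (1.0 + decay_rate * epoch)

# the rate starts large for fast early convergence and shrinks as training proceeds
rates = [lr_schedule(e) for e in (0, 25, 50, 75, 100)]
print([round(r, 6) for r in rates])
```

The schedule is strictly decreasing, so early epochs take large steps and later epochs take progressively smaller ones, as the paragraph above describes.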
In the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims.

Claims (10)

1. A method for short-term precipitation prediction improved using random masking and Transformer, comprising the steps of:
s1, randomly masking a space-time sequence image;
s2, constructing a network model, and inputting the masked, marked spatio-temporal image sequence into the network for model training; the network model comprises an encoder-decoder structure with UNet as the core model, a Swin Transformer module embedded in the encoder, and a SENet attention mechanism;
s3, in the model training process, the input images obtain a predicted value through forward propagation; the model is then tuned backward against the loss function and continually fine-tuned, minimizing the loss and realizing the accurate predictive capability of the model;
s4, L1+L2 regularization is used in the training process to prevent overfitting.
2. The method for short-term precipitation prediction improved by using a random mask and a Transformer as claimed in claim 1, wherein in S1, patches of the image sequence are randomly masked, the masked areas are then marked, and the marked image sequence is input into the network;
and in S1, training is performed using an input image with a mask rate of 75%, and a batch normalization operation is applied to the randomly masked input image so that it follows a Gaussian distribution, stabilizing the training process.
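By way of illustration only, the random patch masking of S1 can be sketched as follows. This is a minimal NumPy sketch under the 75% mask rate stated above; the function name `random_mask_patches` and its parameters are illustrative, not part of the claimed method.

```python
import numpy as np

def random_mask_patches(frames, patch=4, mask_ratio=0.75, seed=0):
    """Randomly mask a fraction of non-overlapping patches in each frame.

    frames: array of shape (T, H, W); masked patches are zeroed, and a
    boolean mask marking the masked patches is returned alongside.
    """
    rng = np.random.default_rng(seed)
    t, h, w = frames.shape
    ph, pw = h // patch, w // patch          # patches per axis
    n = ph * pw                              # patches per frame
    masked = frames.copy()
    mask = np.zeros((t, ph, pw), dtype=bool)
    for i in range(t):
        # pick mask_ratio of the patch indices at random
        idx = rng.permutation(n)[: int(n * mask_ratio)]
        for j in idx:
            r, c = divmod(j, pw)
            masked[i, r*patch:(r+1)*patch, c*patch:(c+1)*patch] = 0.0
            mask[i, r, c] = True
    return masked, mask
```

In a full pipeline, the masked frames would then pass through batch normalization before entering the encoder, as claim 2 describes.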
3. The method for short-term precipitation prediction improved by using a random mask and a Transformer according to claim 1, wherein in S2 the encoder comprises double convolution operations, max pooling operations, a Swin Transformer module, and a SENet attention mechanism; the double convolution operation doubles the number of feature channels of the image, max pooling halves the size of the feature map, and four double convolution and max pooling operations are interleaved to learn short-term dependency information in the space-time sequence; a Swin Transformer module is embedded in the last part of the encoder to learn long-term dependency information in the space-time sequence; a SENet attention mechanism is introduced between the double convolution and max pooling operations of each layer to focus on important information in the channel dimension and suppress secondary information unimportant for the current task.
4. The method for short-term precipitation prediction improved by using a random mask and a Transformer as claimed in claim 1, wherein in S2, the Swin Transformer module comprises Patch Partition, Linear Embedding, and Swin Transformer Block layers; the picture sequence is first partitioned into blocks by the Patch Partition layer, which divides the feature map into a number of disjoint areas, the channel data of each pixel is then linearly transformed by the Linear Embedding layer, and finally features are extracted by the Swin Transformer Block layer.
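The Patch Partition step described above can be sketched as follows. This is a minimal NumPy illustration; `patch_partition` is a hypothetical name, and it assumes the image height and width are divisible by the patch size.

```python
import numpy as np

def patch_partition(img, p=4):
    """Split an (H, W, C) image into non-overlapping p*p patches and
    flatten each patch into a vector, as a Patch Partition layer does."""
    h, w, c = img.shape
    # group rows and columns into patch blocks, then bring the two
    # patch-grid axes to the front
    patches = img.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, p * p * c)   # (num_patches, p*p*C)
```

A Linear Embedding layer would then map each `p*p*C` vector to the Transformer's embedding dimension.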
5. The method for short-term precipitation prediction improved by using a random mask and a Transformer according to claim 4, wherein the W-MSA module in the Swin Transformer Block is configured to limit multi-head self-attention computation to each local window, the SW-MSA module is configured to enable information to be transferred between adjacent windows, and the multi-head self-attention computation process is as follows:
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V) (1)
Attention(Q, K, V) = softmax(QK^T/√d_k + B)V (2)
MultiHead(Q, K, V) = Concat(head_1, ..., head_h) (3)
wherein the physical meanings of Q, K and V are respectively the query vector, key vector and value vector, W^Q, W^K and W^V represent the convolution kernels, d_k represents the dimension of the query vector, and B represents the relative position offset.
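Equation (2), scaled dot-product attention with a relative position bias B, can be sketched for a single window and a single head as follows. This is an illustrative NumPy sketch with hypothetical names, not the patented implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(q, k, v, b):
    """Scaled dot-product attention with relative position bias B,
    as in equation (2): softmax(QK^T / sqrt(d_k) + B) V.

    q, k, v: (n, d_k) token matrices for one window; b: (n, n) bias.
    """
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k) + b   # (n, n) attention logits
    return softmax(scores) @ v            # (n, d_k) weighted values
```

Multi-head attention as in equation (3) would run this per head on projected inputs and concatenate the results.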
6. The method for short-term precipitation prediction improved by using a random mask and a Transformer according to claim 1, wherein the SENet attention mechanism comprises two operations, a Squeeze operation and an Excitation operation; the feature map obtained by convolution is first subjected to the Squeeze operation to obtain global features in the channel dimension, the Excitation operation is then performed on the obtained global features to learn the relations among the channels and the weights occupied by different channels, and finally the obtained weights are multiplied by the initial feature map to obtain the final features.
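The Squeeze and Excitation operations of claim 6 can be sketched as follows. This is a minimal NumPy illustration; the two fully connected weight matrices `w1` and `w2` and the reduction ratio they imply are assumptions, not the patent's parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(x, w1, w2):
    """Squeeze-and-Excitation channel reweighting.

    x:  feature map of shape (C, H, W)
    w1: (C//r, C) reduction weights; w2: (C, C//r) expansion weights.
    Squeeze: global average pool per channel. Excitation: two FC
    layers (ReLU then sigmoid) yield per-channel weights that
    rescale the input feature map.
    """
    s = x.mean(axis=(1, 2))                  # squeeze: (C,)
    e = sigmoid(w2 @ np.maximum(w1 @ s, 0))  # excitation: (C,)
    return x * e[:, None, None]              # reweight channels
```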
7. The method for short-term precipitation prediction improved by using a random mask and a Transformer according to claim 1, wherein in S3, two steps of forward propagation and reverse tuning are performed during model training; assuming N training samples (x_i, y_i), where i ∈ [1, N], the input is x = {x_1, x_2, ..., x_N}, the standard output is y = {y_1, y_2, ..., y_N}, and the predicted output is ŷ = {ŷ_1, ŷ_2, ..., ŷ_N}; the loss function is defined as the MSE, the Euclidean distance between the predicted value and the true value, as follows:
L_MSE = (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)² (4)
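A minimal, illustrative NumPy sketch of the MSE loss of equation (4):

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Mean squared error over N samples, as in equation (4)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)
```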
8. The method for short-term precipitation prediction improved by using a random mask and a Transformer according to claim 1, wherein in S4, the regularized expressions of L1 and L2 are shown in formulas (5) and (6), respectively:
L1(w) = α Σ_i |w_i| (5)
L2(w) = α Σ_i w_i² (6)
where α is a constant controlling the degree of regularization and w_i represents the weights; L1 regularization prevents overfitting by making the weight vector sparse during optimization; L1 regularization is added to the loss function as a penalty term, giving the final loss function shown in formula (7), while L2 regularization is deployed by setting the weight_decay parameter of the Adam optimizer;
L = L_MSE + α Σ_i |w_i| (7)
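The L1 penalty of formula (5) and the combined loss of formula (7) can be sketched as follows. This is an illustrative NumPy sketch; the α value is an assumption, and in practice the L2 term would be handled by the Adam optimizer's weight_decay parameter as stated above.

```python
import numpy as np

def l1_penalty(weights, alpha=1e-4):
    """L1 regularization term alpha * sum(|w_i|), as in formula (5)."""
    return alpha * sum(np.abs(w).sum() for w in weights)

def total_loss(mse, weights, alpha=1e-4):
    """Final loss of formula (7): MSE plus the L1 penalty.

    L2 regularization is applied separately (e.g. via the optimizer's
    weight_decay), so it does not appear here.
    """
    return mse + l1_penalty(weights, alpha)
```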
9. The method for short-term precipitation prediction improved by using a random mask and a Transformer as claimed in claim 8, wherein a learning rate decay strategy is introduced in the model training process, the decay process being as shown in formula (8):
α = α_0 / (1 + decay_rate × epoch_i) (8)
wherein decay_rate represents the decay coefficient, epoch_i represents the i-th training epoch, and α_0 represents the initial learning rate.
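A minimal sketch of the inverse-time decay schedule of formula (8); the function name and the default α_0 and decay_rate values are illustrative assumptions.

```python
def decayed_lr(epoch, alpha0=1e-3, decay_rate=0.05):
    """Learning rate for epoch i, as in formula (8):
    alpha = alpha0 / (1 + decay_rate * epoch_i)."""
    return alpha0 / (1.0 + decay_rate * epoch)
```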
10. A system for short-term precipitation prediction improved by using a random mask and a Transformer, comprising:
an image processing module: for randomly masking the space-time sequence image;
a model construction module: for constructing a network model and inputting the mask-marked space-time sequence images into the network for model training; the network model comprises an encoder-decoder structure with UNet as the core model, wherein a Swin Transformer module is embedded in the encoder and a SENet attention mechanism is introduced;
a prediction module: in the model training process, an input image obtains a predicted value through forward propagation, the model is then tuned in reverse according to the loss function and continuously fine-tuned so that the loss function is minimized, giving the model accurate prediction capability;
and an optimization training module: L1+L2 regularization is used during training to prevent overfitting.
CN202310057412.8A 2023-01-14 2023-01-14 Short-term precipitation prediction method improved by using random mask and Transformer Pending CN116051857A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310057412.8A CN116051857A (en) 2023-01-14 2023-01-14 Short-term precipitation prediction method improved by using random mask and Transformer

Publications (1)

Publication Number Publication Date
CN116051857A true CN116051857A (en) 2023-05-02

Family

ID=86127291

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310057412.8A Pending CN116051857A (en) 2023-01-14 2023-01-14 Short-term precipitation prediction method improved by using random mask and Transformer

Country Status (1)

Country Link
CN (1) CN116051857A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116432870A (en) * 2023-06-13 2023-07-14 齐鲁工业大学(山东省科学院) Urban flow prediction method
CN116432870B (en) * 2023-06-13 2023-10-10 齐鲁工业大学(山东省科学院) Urban flow prediction method
CN116719002A (en) * 2023-08-08 2023-09-08 北京弘象科技有限公司 Quantitative precipitation estimation method, quantitative precipitation estimation device, electronic equipment and computer storage medium
CN116719002B (en) * 2023-08-08 2023-10-27 北京弘象科技有限公司 Quantitative precipitation estimation method, quantitative precipitation estimation device, electronic equipment and computer storage medium
CN117096875A (en) * 2023-10-19 2023-11-21 国网江西省电力有限公司经济技术研究院 Short-term load prediction method and system based on ST-Transformer model
CN117096875B (en) * 2023-10-19 2024-03-12 国网江西省电力有限公司经济技术研究院 Short-term load prediction method and system based on Spatial-Temporal Transformer model
CN118033590A (en) * 2024-04-12 2024-05-14 南京信息工程大学 Short-term precipitation prediction method based on improved VIT neural network
CN118228004A (en) * 2024-05-22 2024-06-21 四川海太克科技有限责任公司 Space-time prediction method based on mask self-encoder
CN118228004B (en) * 2024-05-22 2024-07-19 四川海太克科技有限责任公司 Space-time prediction method based on mask self-encoder
CN118504779A (en) * 2024-07-16 2024-08-16 青岛阅海信息服务有限公司 Intelligent correction method for sea wave forecast

Similar Documents

Publication Publication Date Title
CN116051857A (en) Short-term precipitation prediction method improved by using random mask and Transformer
CN112418409A (en) Method for predicting time-space sequence of convolution long-short term memory network improved by using attention mechanism
Chen et al. AtICNet: semantic segmentation with atrous spatial pyramid pooling in image cascade network
Zhang Research on remote sensing image de‐haze based on GAN
CN115810149A (en) High-resolution remote sensing image building extraction method based on superpixel and image convolution
CN116596966A (en) Segmentation and tracking method based on attention and feature fusion
Xiao et al. Generative adversarial network with hybrid attention and compromised normalization for multi-scene image conversion
CN114677560A (en) Deep learning algorithm-based lane line detection method and computer system
Wen et al. A self-attention multi-scale convolutional neural network method for SAR image despeckling
Li et al. Two‐stage single image dehazing network using swin‐transformer
Ma et al. Db-rnn: A rnn for precipitation nowcasting deblurring
CN117392387A (en) Unsupervised domain adaptive segmentation method based on wavelet transformation and context relation
Li et al. FA-GAN: A feature attention GAN with fusion discriminator for non-homogeneous dehazing
CN116597144A (en) Image semantic segmentation method based on event camera
CN116863437A (en) Lane line detection model training method, device, equipment, medium and vehicle
CN116148864A (en) Radar echo extrapolation method based on DyConvGRU and Unet prediction refinement structure
WO2023206532A1 (en) Prediction method and apparatus, electronic device and computer-readable storage medium
Cao et al. Deep feature interactive aggregation network for single image deraining
CN114066750B (en) Self-encoder deblurring method based on domain transformation
CN113255459A (en) Image sequence-based lane line detection method
Zhou et al. Double recursive sparse self-attention based crowd counting in the cluttered background
Wang et al. An input sampling scheme to radar echo extrapolation for RNN-based models
CN114187331B (en) Unsupervised optical flow estimation method based on Transformer feature pyramid network
Pal et al. MAML-SR: Self-adaptive super-resolution networks via multi-scale optimized attention-aware meta-learning
CN114972444B (en) Target tracking method based on multi-head comparison network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination