CN116051857A - Short-term precipitation prediction method improved using random masking and Transformer - Google Patents

Short-term precipitation prediction method improved using random masking and Transformer

Info

Publication number: CN116051857A
Application number: CN202310057412.8A
Authority: CN (China)
Legal status: Pending
Original language: Chinese (zh)
Inventors: 方巍 (Fang Wei); 齐媚涵 (Qi Meihan)
Assignee (original and current): Nanjing University of Information Science and Technology
Application filed by Nanjing University of Information Science and Technology

Classifications

    • G06V 10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G01W 1/14 Rainfall or precipitation gauges
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06V 10/765 Image or video recognition using classification, e.g. of video objects, using rules for partitioning the feature space
    • G06V 10/82 Image or video recognition using neural networks
    • Y02A 90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation


Abstract

The invention discloses a short-term precipitation prediction method improved using random masking and a Transformer, belonging to the field of precipitation prediction. The method comprises: S1, randomly masking a spatio-temporal sequence of images; S2, constructing a network model and inputting the masked, marked image sequence into the network for model training, where the network model comprises an encoder-decoder structure with UNet as the core model, a Swin Transformer module embedded in the encoder, and a SENet attention mechanism; S3, during model training, the input images produce a predicted value through forward propagation, after which the model is tuned backward against the loss function and continually fine-tuned, minimizing the loss and giving the model accurate predictive capability; S4, L1+L2 regularization is used during training to prevent overfitting. The method models the high-order non-stationarity in spatio-temporal sequences, learns short-term and long-term dependency information simultaneously, and improves the prediction accuracy of the model.

Description

Short-term precipitation prediction method improved using random masking and Transformer
Technical Field
The invention belongs to the field of precipitation prediction, and in particular relates to a short-term precipitation prediction method improved using random masking and a Transformer.
Background
With the continuous development of technology, short-term precipitation forecasting has become an important problem in weather forecasting. Its objective is to accurately and promptly predict the rainfall intensity of a local area over a relatively short horizon (0-6 hours), which plays a vital role in economics, agriculture, commerce, transportation, electric utilities, and other fields. Short-term precipitation prediction can be formulated as a spatio-temporal sequence prediction problem, which deep-learning-based image extrapolation addresses effectively: the future M frames are predicted from the previous N frames. This technique is widely applied in weather prediction, video prediction, traffic-flow prediction, and similar fields, but its prediction accuracy remains limited and cannot meet operational requirements. First, natural spatio-temporal processes exhibit high-order non-stationarity in many respects, such as the generation and dissipation of radar echoes in short-term precipitation forecasting; higher-order variations such as accumulation or distortion cause predicted images to blur. Second, when targets change rapidly, future images should be generated from nearby frames rather than distant ones, which requires the model to learn short-term information in the spatio-temporal sequence; conversely, when moving objects in a scene frequently become entangled, it is difficult to separate them when generating future frames, which requires the model to extract contextual information within images and long-term information across the image sequence.
Thus, modeling high-order non-stationarity in radar echo images and simultaneously learning short-term and long-term dependency information in image sequences is crucial for accurately predicting future precipitation intensities.
Spatio-temporal sequence prediction models fall mainly into three classes: models based on recurrent neural networks (RNN), models based on convolutional neural networks (CNN), and models based on the Transformer. RNN-based models learn the important information in a sequence and forget secondary information through gating mechanisms, and they have notable advantages in capturing long-term dependencies in spatio-temporal sequences. In 2015, Xingjian Shi et al. first combined convolution with LSTM and proposed Convolutional LSTM (ConvLSTM), a network that learns spatial and temporal features simultaneously. In 2016, Xingjian Shi et al. further proposed Trajectory GRU (TrajGRU) to address the location-invariance limitation of ConvLSTM's convolutional structure. In 2017, Yunbo Wang et al. proposed PredRNN, with a zigzag memory flow, to address ConvLSTM's shortcoming that memory states are independent at each time step. In 2018, Yunbo Wang et al. proposed PredRNN++, a deepened network using a new recurrent structure, Causal LSTM, together with a Gradient Highway Unit to prevent long-term gradients from vanishing. In 2019, Yunbo Wang et al. drew on the differencing idea of classical time-series prediction and proposed the MIM network to address the high-order non-stationarity of radar images. In 2021, Haixu Wu et al. decomposed physical motion into transient-variation and motion-trend components and proposed a new spatio-temporal prediction model, MotionRNN. However, recurrent models are sequential by nature, making backpropagation very time-consuming, and they struggle to meet the timeliness requirements of short-term precipitation forecasting.
Convolution-based encoder-decoder architectures have also achieved strong performance in short-term precipitation forecasting. In 2019, Shreya Agrawal et al. introduced the UNet network to the precipitation forecasting task, using convolution to capture spatial correlation and stacking multiple radar frames along the channel dimension to extract temporal correlation. In 2021, Kevin Trebing et al. proposed SmaAt-UNet, which matches UNet's performance with only a quarter of the trainable parameters. However, convolution extracts image features through local connections and is strongly limited in learning long-term dependencies in spatio-temporal sequences.
Transformers were originally proposed in natural language processing (NLP) but have been successfully introduced into many other fields thanks to their ability to extract long-term dependencies in sequences and their good parallelism. In 2021, Alexey Dosovitskiy et al. introduced the Transformer architecture to computer vision and proposed the ViT model; also in 2021, Ze Liu et al. proposed the Swin Transformer, which restricts self-attention computation to non-overlapping local windows while allowing cross-window connections through shifted windows, improving efficiency and reducing computation to some extent. However, Transformer-based models place high demands on hardware and are somewhat limited in learning short-term dependencies of spatio-temporal sequences, making direct application to short-term precipitation forecasting difficult.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a short-term precipitation prediction method improved using random masking and a Transformer.
The aim of the invention is achieved by the following technical scheme:
A method for short-term precipitation prediction improved using random masking and a Transformer, comprising the steps of:
S1, randomly masking a spatio-temporal sequence of images;
S2, constructing a network model, and inputting the masked, marked image sequence into the network for model training; the network model comprises an encoder-decoder structure with UNet as the core model, a Swin Transformer module embedded in the encoder, and a SENet attention mechanism;
S3, during model training, the input images produce a predicted value through forward propagation, after which the model is tuned backward against the loss function and continually fine-tuned, minimizing the loss and giving the model accurate predictive capability;
S4, L1+L2 regularization is used during training to prevent overfitting.
Further, in S1, the patches of the image sequence are masked randomly, the masked regions are then marked, and the marked image sequence is input into the network;
and in S1, training uses input images with a mask ratio of 75%, and a batch normalization operation is applied to the randomly masked input so that it follows a Gaussian distribution, stabilizing the training process.
Further, in S2, the encoder comprises double convolution operations, max pooling operations, a Swin Transformer module, and a SENet attention mechanism; the double convolution operation doubles the number of feature channels of the image, and max pooling halves the size of the feature map; four double convolution operations are interleaved with max pooling operations to learn short-term dependency information in the spatio-temporal sequence; a Swin Transformer module is embedded at the end of the encoder to learn long-term dependency information in the spatio-temporal sequence; and a SENet attention mechanism is introduced between the double convolution and max pooling operations of each layer to focus on important information in the channel dimension and suppress secondary information unimportant to the current task.
Further, in S2, the Swin Transformer module comprises Patch Partition, Linear Embedding, and Swin Transformer Block; first, the picture sequence is partitioned by the Patch Partition layer, dividing the feature map into several disjoint regions; then the channel data of each pixel is linearly transformed by the Linear Embedding layer; and finally feature extraction is performed by the Swin Transformer Block layer.
Further, the W-MSA module in the Swin Transformer Block is configured to restrict multi-head self-attention computation to each local window, while the SW-MSA module enables information to be passed between adjacent windows; the multi-head self-attention computation process is as follows:

Attention(Q, K, V) = SoftMax(QK^T / sqrt(d) + B)V (1)

head_i = Attention(QW_i^Q, KW_i^K, VW_i^V) (2)

MultiHead(Q, K, V) = Concat(head_1, ..., head_h)W^O (3)

where Q, K, and V are the query, key, and value vectors respectively; W_i^Q, W_i^K, and W_i^V are the corresponding projection matrices; d is the dimension of the query vector; and B is the relative position bias.
Further, the SENet attention mechanism comprises two operations, Squeeze and Excitation: a Squeeze operation is first applied to the feature map obtained by convolution to obtain a global feature per channel; an Excitation operation is then applied to the global features to learn the relationships among channels and the weight each channel carries; finally the obtained weights are multiplied with the initial feature map to produce the final features.
Further, in S3, two steps are performed during model training, forward propagation and backward tuning. Assume there are N training samples (x_i, y_i), where i ∈ [1, N]; the input is x_i, the ground-truth output is y_i, and the predicted output is o_i. The loss function is defined as the MSE, the Euclidean distance between the predicted value and the true value, as follows:

MSE = (1/N) Σ_{i=1}^{N} ||o_i - y_i||^2 (4)
Further, in S4, the L1 and L2 regularization terms are given by formulas (5) and (6), respectively:

L1(w) = α Σ_i |w_i| (5)

L2(w) = α Σ_i w_i^2 (6)

where α is a constant controlling the degree of regularization and w_i denotes the model weights. L1 regularization prevents overfitting by making the weight vector sparse during optimization; the L1 term is added to the loss function as a penalty, giving the final loss function shown in formula (7). Meanwhile, L2 regularization is deployed by setting the weight_decay parameter of the Adam optimizer:

Loss = (1/N) Σ_{i=1}^{N} ||o_i - y_i||^2 + α Σ_i |w_i| (7)
Further, during model training a learning-rate decay strategy is introduced; the decay process is shown in formula (8):

α = α_0 / (1 + decay_rate × epoch_i) (8)

where decay_rate is the decay coefficient, epoch_i is the i-th training epoch, and α_0 is the initial learning rate.
A short-term precipitation prediction system improved using random masking and Transformer, comprising:
an image processing module: for randomly masking the spatiotemporal sequence image;
model construction module: used to construct the network model and input the masked, marked spatio-temporal image sequence into the network for model training; the network model comprises an encoder-decoder structure with UNet as the core model, a Swin Transformer module embedded in the encoder, and a SENet attention mechanism;
and a prediction module: in the model training process, an input image obtains a predicted value through a forward propagation process, and then reverse tuning is performed according to a loss function to continuously fine tune the model, so that the loss function is minimized, and the accurate prediction capability of the model is realized;
optimization training module: L1+L2 regularization is used during training to prevent overfitting.
The invention has the beneficial effects that: a Swin Transformer basic module is embedded in the UNet model, a SENet attention module is introduced, and a randomly masked image sequence is used as input to model the high-order non-stationarity in the spatio-temporal sequence while learning short-term and long-term dependency information, improving the prediction accuracy of the model.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to those skilled in the art that other drawings can be obtained according to these drawings without inventive effort.
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a diagram of the network model architecture of the present invention;
FIG. 3 is a block diagram of the Swin Transformer module of the present invention;
FIG. 4 is a detail view of the Swin Transformer computation of the present invention;
fig. 5 is a diagram of the SENet attention mechanism of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, a method for short-term precipitation forecasting improved using random masking and Transformer comprises the following steps:
s1, randomly masking a space-time sequence image;
before inputting an image, firstly, randomly masking a patch of an image sequence, then marking a masking area, inputting the marked image sequence into a network, and reconstructing missing pixels by using the non-masked patch to train the ability of the network to model high-order non-stationarity. In order to accelerate the training speed of the model and improve the prediction precision of the model, the invention adopts a high-proportion masking scheme, and the input image with the masking rate of 75% is used for training, so that the optimal prediction effect can be achieved. In addition, a batch normalization operation is applied to the input image after the random masking, so that the input image is subjected to Gaussian distribution to stabilize the training process
S2, constructing a network model, inputting the space-time sequence images marked by the mask into a network for training, and extracting features;
As shown in fig. 2, the network model comprises an encoder-decoder structure with UNet as the core model; a Swin Transformer module is embedded in the encoder, and a SENet attention mechanism is introduced;
The encoder comprises double convolution operations, max pooling operations, a Swin Transformer module, and a SENet attention mechanism. The double convolution operation doubles the number of feature channels of the image, and max pooling halves the size of the feature map; four double convolution operations are interleaved with max pooling operations, exploiting the inherent locality of convolution to learn short-term dependency information in the spatio-temporal sequence. A Swin Transformer module is embedded at the end of the encoder to learn long-term dependency information in the spatio-temporal sequence; the resulting encoder combines the advantages of UNet and the Swin Transformer and can capture both short-term and long-term dependencies in spatio-temporal sequences. To further enhance the feature extraction capability of the encoder, a SENet attention mechanism is introduced between the double convolution and max pooling operations of each layer to focus on important information in the channel dimension and suppress secondary information unimportant to the current task.
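One encoder stage as described above (double convolution doubling the channel count, then 2x max pooling halving the spatial size) can be sketched as follows; the channel counts are illustrative assumptions, and the SE attention inserted between the two operations is omitted here for brevity:

```python
import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    """Two 3x3 conv layers mapping c_in to c_out channels."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.block(x)

class EncoderStage(nn.Module):
    """DoubleConv (channels x2) -> MaxPool (H, W halved)."""
    def __init__(self, c_in):
        super().__init__()
        self.conv = DoubleConv(c_in, c_in * 2)  # double the feature channels
        self.pool = nn.MaxPool2d(2)             # halve the spatial size
    def forward(self, x):
        return self.pool(self.conv(x))

x = torch.rand(1, 16, 64, 64)
y = EncoderStage(16)(x)
print(y.shape)  # torch.Size([1, 32, 32, 32])
```

Stacking four such stages reproduces the channel-doubling, size-halving progression described for the encoder.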
As shown in fig. 3, the Swin Transformer module comprises image partitioning (Patch Partition), linear mapping (Linear Embedding), and Swin Transformer Block. First, the picture sequence is partitioned by the Patch Partition layer, dividing the feature map into several disjoint regions; then the channel data of each pixel is linearly transformed by the Linear Embedding layer; finally feature extraction is performed by the Swin Transformer Block layer.
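The Patch Partition and Linear Embedding steps can be sketched as below; the 4x4 patch size and 96-dimensional embedding follow common Swin Transformer defaults and are assumptions, not values stated in the patent:

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Patch Partition + Linear Embedding: split the feature map into
    non-overlapping patches and linearly project each patch's channels."""
    def __init__(self, patch=4, c_in=1, dim=96):
        super().__init__()
        # a strided conv is the standard equivalent of partition + linear projection
        self.proj = nn.Conv2d(c_in, dim, kernel_size=patch, stride=patch)
    def forward(self, x):                    # x: (B, C, H, W)
        x = self.proj(x)                     # (B, dim, H/patch, W/patch)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, dim)

x = torch.rand(2, 1, 64, 64)
tokens = PatchEmbed()(x)
print(tokens.shape)  # torch.Size([2, 256, 96])
```

The resulting token sequence is what the Swin Transformer Block layer consumes for feature extraction.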
The W-MSA module in the Swin Transformer Block restricts multi-head self-attention computation to each local window, which effectively reduces the cost of self-attention, while the SW-MSA module allows information to pass between adjacent windows, achieving global modeling; the learning process of W-MSA and SW-MSA is shown in FIG. 4. Multi-head self-attention is computed as in formulas (1)-(3):

Attention(Q, K, V) = SoftMax(QK^T / sqrt(d) + B)V (1)

head_i = Attention(QW_i^Q, KW_i^K, VW_i^V) (2)

MultiHead(Q, K, V) = Concat(head_1, ..., head_h)W^O (3)

where Q, K, and V are the query, key, and value vectors respectively, W_i^Q, W_i^K, and W_i^V are the corresponding projection matrices, d is the dimension of the query vector, and B is the relative position bias.
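Formulas (1)-(3) restricted to one local window can be sketched as follows; the window size, head count, and the omission of the relative position bias B are simplifying assumptions:

```python
import torch
import torch.nn as nn

def window_attention(x, num_heads=4):
    """Multi-head self-attention within one window of tokens.
    x: (num_tokens, dim); dim must be divisible by num_heads."""
    n, dim = x.shape
    d = dim // num_heads
    qkv = nn.Linear(dim, 3 * dim, bias=False)(x)  # projections W^Q, W^K, W^V
    q, k, v = qkv.reshape(n, 3, num_heads, d).permute(1, 2, 0, 3)
    attn = (q @ k.transpose(-2, -1)) / d ** 0.5   # formula (1), bias B omitted
    attn = attn.softmax(dim=-1)
    heads = attn @ v                              # (num_heads, n, d)
    out = heads.transpose(0, 1).reshape(n, dim)   # Concat(head_1..head_h), formula (3)
    return nn.Linear(dim, dim, bias=False)(out)   # output projection W^O

tokens = torch.rand(49, 96)  # one 7x7 window of 96-dim tokens
y = window_attention(tokens)
print(y.shape)  # torch.Size([49, 96])
```

Because attention is computed only over the 49 tokens of the window rather than all tokens in the image, the quadratic cost of self-attention applies per window, which is the source of the efficiency gain the text describes.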
The SENet attention mechanism comprises two operations, Squeeze and Excitation. A Squeeze operation is first applied to the feature map obtained by convolution to obtain a global feature per channel; an Excitation operation is then applied to the global features to learn the relationships among channels and the weight each channel carries; finally the obtained weights are multiplied with the original feature map to produce the final features. The SENet attention mechanism lets the model focus on channel features carrying more information, suppresses unimportant channel features, and improves model performance; the overall structure is shown in fig. 5.
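A minimal sketch of the Squeeze and Excitation operations described above; the reduction ratio r=16 is the common SENet default and an assumption here:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, r=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)  # Squeeze: one global feature per channel
        self.excite = nn.Sequential(            # Excitation: learn per-channel weights
            nn.Linear(channels, channels // r), nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels), nn.Sigmoid(),
        )
    def forward(self, x):                       # x: (B, C, H, W)
        b, c, _, _ = x.shape
        w = self.squeeze(x).view(b, c)
        w = self.excite(w).view(b, c, 1, 1)
        return x * w                            # reweight the original feature map

x = torch.rand(2, 32, 16, 16)
y = SEBlock(32)(x)
print(y.shape)  # torch.Size([2, 32, 16, 16])
```

Since the sigmoid keeps each weight in (0, 1), the block can only attenuate channels, never amplify them, which matches the role of suppressing less informative channel features.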
S3, during model training, backward tuning is performed: the input images produce a predicted value through forward propagation, and the model is then continually fine-tuned by backward tuning against the loss function, minimizing the loss and giving the model accurate predictive capability;
the decoder part comprises double convolution operation, up-sampling and jump connection, wherein the up-sampling operation is realized by a bilinear interpolation method, and the jump connection part realizes the fusion of bottom layer position information and deep semantic information by splicing with the characteristic diagram of the current layer in the encoder.
During model training, two steps are performed, forward propagation and backward tuning. Assume there are N training samples (x_i, y_i), where i ∈ [1, N]; the input is x_i, the ground-truth output is y_i, and the predicted output is o_i. The loss function is defined as the MSE, the Euclidean distance between the predicted value and the true value, as shown in formula (4):

MSE = (1/N) Σ_{i=1}^{N} ||o_i - y_i||^2 (4)
The input images produce a predicted value through forward propagation; the model is then continually fine-tuned by backward tuning against the loss function, minimizing the loss so that o_i approaches y_i arbitrarily closely, thereby achieving the accurate predictive capability of the model.
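The forward propagation / backward tuning loop with the MSE loss of formula (4) can be sketched as below; the model and data here are stand-in placeholders for the full UNet-based network and the radar frames:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(1, 1, 3, padding=1)  # placeholder for the full UNet-based network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.rand(4, 1, 32, 32)  # masked input frames (placeholder data)
y = torch.rand(4, 1, 32, 32)  # ground-truth frames

losses = []
for _ in range(5):
    o = model(x)            # forward propagation -> predicted value o_i
    loss = loss_fn(o, y)    # formula (4): MSE between o_i and y_i
    optimizer.zero_grad()
    loss.backward()         # backward tuning
    optimizer.step()
    losses.append(loss.item())
print(losses)
```

Each iteration performs exactly the two steps the text names: a forward pass producing o, then a backward pass that adjusts the weights against the loss.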
S4, in order to prevent overfitting and enhance the generalization ability of the model, regularization is introduced; during training, L1+L2 regularization is used, with the L1 and L2 terms given by formulas (5) and (6), respectively:

L1(w) = α Σ_i |w_i| (5)

L2(w) = α Σ_i w_i^2 (6)

where α is a constant controlling the degree of regularization and w_i denotes the model weights. L1 regularization prevents overfitting by making the weight vector sparse during optimization, while L2 regularization, compared with L1, tends to penalize large-valued weight vectors. The L1 term is added to the loss function as a penalty, giving the final loss function shown in formula (7). Meanwhile, the invention deploys L2 regularization by setting the weight_decay parameter of the Adam optimizer; the penalty coefficient α of the L1 and L2 regularization is set to 0.0001.

Loss = (1/N) Σ_{i=1}^{N} ||o_i - y_i||^2 + α Σ_i |w_i| (7)
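Formulas (5)-(7) can be sketched as below: the L1 penalty is added to the loss explicitly, while L2 is applied through Adam's weight_decay, matching the α = 0.0001 stated above; the model is a stand-in placeholder:

```python
import torch
import torch.nn as nn

alpha = 1e-4
model = nn.Linear(8, 8)  # placeholder for the full network
# L2 regularization (formula 6) via the optimizer's weight_decay parameter
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=alpha)

x, y = torch.rand(16, 8), torch.rand(16, 8)
o = model(x)
mse = nn.functional.mse_loss(o, y)
# L1 regularization (formula 5) added as an explicit penalty -> formula (7)
l1 = alpha * sum(p.abs().sum() for p in model.parameters())
loss = mse + l1
loss.backward()
print(loss.item() >= mse.item())  # True: the penalty can only increase the loss
```

Note that Adam's weight_decay applies the L2 penalty inside the optimizer step, so it never appears in the reported loss value, whereas the explicit L1 term does.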
During model training, in order to control the update speed of the learning rate and let it oscillate near an optimal value so as to accelerate training, the invention also introduces a learning-rate decay strategy; the decay process is shown in formula (8):

α = α_0 / (1 + decay_rate × epoch_i) (8)

where decay_rate is the decay coefficient, epoch_i is the i-th training epoch, and α_0 is the initial learning rate. A larger learning rate speeds up model convergence, so a larger rate is set at the start of training to accelerate convergence; once training has progressed to a certain degree, an overly large learning rate may trap the model around a local optimum, while reducing the learning rate shrinks the convergence step so that model learning is better optimized.
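The decay schedule of formula (8) can be sketched as follows; the initial rate and decay coefficient are illustrative assumptions:

```python
def lr_schedule(epoch, lr0=1e-3, decay_rate=0.05):
    """Formula (8): inverse-time learning-rate decay."""
    return lr0 / (1.0 + decay_rate * epoch)

# the rate starts large for fast early convergence and shrinks as training proceeds
rates = [lr_schedule(e) for e in (0, 25, 50, 75, 100)]
print([round(r, 6) for r in rates])
```

The schedule is strictly decreasing, so early epochs take large steps and later epochs take progressively smaller ones, as the paragraph above describes.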
In the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims.

Claims (10)

1. A method for short-term precipitation prediction improved using random masking and Transformer, comprising the steps of:
s1, randomly masking a space-time sequence image;
s2, constructing a network model, and inputting the masked, marked spatio-temporal image sequence into the network for model training; the network model comprises an encoder-decoder structure with UNet as the core model, a Swin Transformer module embedded in the encoder, and a SENet attention mechanism;
s3, in the model training process, the input images obtain a predicted value through forward propagation; the model is then tuned backward against the loss function and continually fine-tuned, minimizing the loss and realizing the accurate predictive capability of the model;
s4, L1+L2 regularization is used in the training process to prevent overfitting.
2. The method for short-term precipitation prediction improved by using a random mask and a Transformer as claimed in claim 1, wherein in S1, patches of the image sequence are randomly masked, the masked areas are then marked, and the marked image sequence is input into the network;
and in S1, training is performed using an input image with a mask rate of 75%, and a batch normalization operation is applied to the randomly masked input image so that it follows a Gaussian distribution, stabilizing the training process.
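By way of illustration only, the random patch masking of S1 can be sketched as follows. This is a minimal NumPy sketch under the 75% mask rate stated above; the function name `random_mask_patches` and its parameters are illustrative, not part of the claimed method.

```python
import numpy as np

def random_mask_patches(frames, patch=4, mask_ratio=0.75, seed=0):
    """Randomly mask a fraction of non-overlapping patches in each frame.

    frames: array of shape (T, H, W); masked patches are zeroed, and a
    boolean mask marking the masked patches is returned alongside.
    """
    rng = np.random.default_rng(seed)
    t, h, w = frames.shape
    ph, pw = h // patch, w // patch          # patches per axis
    n = ph * pw                              # patches per frame
    masked = frames.copy()
    mask = np.zeros((t, ph, pw), dtype=bool)
    for i in range(t):
        # pick mask_ratio of the patch indices at random
        idx = rng.permutation(n)[: int(n * mask_ratio)]
        for j in idx:
            r, c = divmod(j, pw)
            masked[i, r*patch:(r+1)*patch, c*patch:(c+1)*patch] = 0.0
            mask[i, r, c] = True
    return masked, mask
```

In a full pipeline, the masked frames would then pass through batch normalization before entering the encoder, as claim 2 describes.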
3. The method for short-term precipitation prediction improved by using a random mask and a Transformer according to claim 1, wherein in S2 the encoder comprises double convolution operations, max pooling operations, a Swin Transformer module, and a SENet attention mechanism; the double convolution operation doubles the number of feature channels of the image, max pooling halves the size of the feature map, and four double convolution and max pooling operations are interleaved to learn short-term dependency information in the space-time sequence; a Swin Transformer module is embedded in the last part of the encoder to learn long-term dependency information in the space-time sequence; a SENet attention mechanism is introduced between the double convolution and max pooling operations of each layer to focus on important information in the channel dimension and suppress secondary information unimportant for the current task.
4. The method for short-term precipitation prediction improved by using a random mask and a Transformer as claimed in claim 1, wherein in S2, the Swin Transformer module comprises Patch Partition, Linear Embedding, and Swin Transformer Block layers; the picture sequence is first partitioned into blocks by the Patch Partition layer, which divides the feature map into a number of disjoint areas, the channel data of each pixel is then linearly transformed by the Linear Embedding layer, and finally features are extracted by the Swin Transformer Block layer.
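The Patch Partition step described above can be sketched as follows. This is a minimal NumPy illustration; `patch_partition` is a hypothetical name, and it assumes the image height and width are divisible by the patch size.

```python
import numpy as np

def patch_partition(img, p=4):
    """Split an (H, W, C) image into non-overlapping p*p patches and
    flatten each patch into a vector, as a Patch Partition layer does."""
    h, w, c = img.shape
    # group rows and columns into patch blocks, then bring the two
    # patch-grid axes to the front
    patches = img.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, p * p * c)   # (num_patches, p*p*C)
```

A Linear Embedding layer would then map each `p*p*C` vector to the Transformer's embedding dimension.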
5. The method for short-term precipitation prediction improved by using a random mask and a Transformer according to claim 4, wherein the W-MSA module in the Swin Transformer Block is configured to limit multi-head self-attention computation to each local window, the SW-MSA module is configured to enable information to be transferred between adjacent windows, and the multi-head self-attention computation process is as follows:
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V) (1)
Attention(Q, K, V) = softmax(QK^T/√d_k + B)V (2)
MultiHead(Q, K, V) = Concat(head_1, ..., head_h) (3)
wherein the physical meanings of Q, K and V are respectively the query vector, key vector and value vector, W^Q, W^K and W^V represent the convolution kernels, d_k represents the dimension of the query vector, and B represents the relative position offset.
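Equation (2), scaled dot-product attention with a relative position bias B, can be sketched for a single window and a single head as follows. This is an illustrative NumPy sketch with hypothetical names, not the patented implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(q, k, v, b):
    """Scaled dot-product attention with relative position bias B,
    as in equation (2): softmax(QK^T / sqrt(d_k) + B) V.

    q, k, v: (n, d_k) token matrices for one window; b: (n, n) bias.
    """
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k) + b   # (n, n) attention logits
    return softmax(scores) @ v            # (n, d_k) weighted values
```

Multi-head attention as in equation (3) would run this per head on projected inputs and concatenate the results.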
6. The method for short-term precipitation prediction improved by using a random mask and a Transformer according to claim 1, wherein the SENet attention mechanism comprises two operations, a Squeeze operation and an Excitation operation; the feature map obtained by convolution is first subjected to the Squeeze operation to obtain global features in the channel dimension, the Excitation operation is then performed on the obtained global features to learn the relations among the channels and the weights occupied by different channels, and finally the obtained weights are multiplied by the initial feature map to obtain the final features.
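The Squeeze and Excitation operations of claim 6 can be sketched as follows. This is a minimal NumPy illustration; the two fully connected weight matrices `w1` and `w2` and the reduction ratio they imply are assumptions, not the patent's parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(x, w1, w2):
    """Squeeze-and-Excitation channel reweighting.

    x:  feature map of shape (C, H, W)
    w1: (C//r, C) reduction weights; w2: (C, C//r) expansion weights.
    Squeeze: global average pool per channel. Excitation: two FC
    layers (ReLU then sigmoid) yield per-channel weights that
    rescale the input feature map.
    """
    s = x.mean(axis=(1, 2))                  # squeeze: (C,)
    e = sigmoid(w2 @ np.maximum(w1 @ s, 0))  # excitation: (C,)
    return x * e[:, None, None]              # reweight channels
```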
7. The method for short-term precipitation prediction improved by using a random mask and a Transformer according to claim 1, wherein in S3, two steps of forward propagation and reverse tuning are performed during model training; assuming N training samples (x_i, y_i), where i ∈ [1, N], the input is x = {x_1, x_2, ..., x_N}, the standard output is y = {y_1, y_2, ..., y_N}, and the predicted output is ŷ = {ŷ_1, ŷ_2, ..., ŷ_N}; the loss function is defined as the MSE, the Euclidean distance between the predicted value and the true value, as follows:
L_MSE = (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)² (4)
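A minimal, illustrative NumPy sketch of the MSE loss of equation (4):

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Mean squared error over N samples, as in equation (4)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)
```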
8. The method for short-term precipitation prediction improved by using a random mask and a Transformer according to claim 1, wherein in S4, the regularized expressions of L1 and L2 are shown in formulas (5) and (6), respectively:
L1(w) = α Σ_i |w_i| (5)
L2(w) = α Σ_i w_i² (6)
where α is a constant controlling the degree of regularization and w_i represents the weights; L1 regularization prevents overfitting by making the weight vector sparse during optimization; L1 regularization is added to the loss function as a penalty term, giving the final loss function shown in formula (7), while L2 regularization is deployed by setting the weight_decay parameter of the Adam optimizer;
L = L_MSE + α Σ_i |w_i| (7)
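The L1 penalty of formula (5) and the combined loss of formula (7) can be sketched as follows. This is an illustrative NumPy sketch; the α value is an assumption, and in practice the L2 term would be handled by the Adam optimizer's weight_decay parameter as stated above.

```python
import numpy as np

def l1_penalty(weights, alpha=1e-4):
    """L1 regularization term alpha * sum(|w_i|), as in formula (5)."""
    return alpha * sum(np.abs(w).sum() for w in weights)

def total_loss(mse, weights, alpha=1e-4):
    """Final loss of formula (7): MSE plus the L1 penalty.

    L2 regularization is applied separately (e.g. via the optimizer's
    weight_decay), so it does not appear here.
    """
    return mse + l1_penalty(weights, alpha)
```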
9. The method for short-term precipitation prediction improved by using a random mask and a Transformer as claimed in claim 8, wherein a learning rate decay strategy is introduced in the model training process, the decay process being as shown in formula (8):
α = α_0 / (1 + decay_rate × epoch_i) (8)
wherein decay_rate represents the decay coefficient, epoch_i represents the i-th training epoch, and α_0 represents the initial learning rate.
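A minimal sketch of the inverse-time decay schedule of formula (8); the function name and the default α_0 and decay_rate values are illustrative assumptions.

```python
def decayed_lr(epoch, alpha0=1e-3, decay_rate=0.05):
    """Learning rate for epoch i, as in formula (8):
    alpha = alpha0 / (1 + decay_rate * epoch_i)."""
    return alpha0 / (1.0 + decay_rate * epoch)
```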
10. A system for short-term precipitation prediction improved by using a random mask and a Transformer, comprising:
an image processing module: for randomly masking the space-time sequence image;
a model construction module: for constructing a network model and inputting the mask-marked space-time sequence images into the network for model training; the network model comprises an encoder-decoder structure with UNet as the core model, wherein a Swin Transformer module is embedded in the encoder and a SENet attention mechanism is introduced;
a prediction module: in the model training process, an input image obtains a predicted value through forward propagation, the model is then tuned in reverse according to the loss function and continuously fine-tuned so that the loss function is minimized, giving the model accurate prediction capability;
and an optimization training module: L1+L2 regularization is used during training to prevent overfitting.
CN202310057412.8A 2023-01-14 2023-01-14 Short-term precipitation prediction method improved by using random mask and Transformer Pending CN116051857A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310057412.8A CN116051857A (en) 2023-01-14 2023-01-14 Short-term precipitation prediction method improved by using random mask and Transformer

Publications (1)

Publication Number Publication Date
CN116051857A true CN116051857A (en) 2023-05-02

Family

ID=86127291

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310057412.8A Pending CN116051857A (en) 2023-01-14 2023-01-14 Short-term precipitation prediction method improved by using random mask and Transformer

Country Status (1)

Country Link
CN (1) CN116051857A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116432870A (en) * 2023-06-13 2023-07-14 齐鲁工业大学(山东省科学院) Urban flow prediction method
CN116432870B (en) * 2023-06-13 2023-10-10 齐鲁工业大学(山东省科学院) Urban flow prediction method
CN116719002A (en) * 2023-08-08 2023-09-08 北京弘象科技有限公司 Quantitative precipitation estimation method, quantitative precipitation estimation device, electronic equipment and computer storage medium
CN116719002B (en) * 2023-08-08 2023-10-27 北京弘象科技有限公司 Quantitative precipitation estimation method, quantitative precipitation estimation device, electronic equipment and computer storage medium
CN117096875A (en) * 2023-10-19 2023-11-21 国网江西省电力有限公司经济技术研究院 Short-term load prediction method and system based on ST-Transformer model
CN117096875B (en) * 2023-10-19 2024-03-12 国网江西省电力有限公司经济技术研究院 Short-term load prediction method and system based on Spatial-Temporal Transformer model
CN118033590A (en) * 2024-04-12 2024-05-14 南京信息工程大学 Short-term precipitation prediction method based on improved VIT neural network
CN118228004A (en) * 2024-05-22 2024-06-21 四川海太克科技有限责任公司 Space-time prediction method based on mask self-encoder
CN118228004B (en) * 2024-05-22 2024-07-19 四川海太克科技有限责任公司 Space-time prediction method based on mask self-encoder
CN118504779A (en) * 2024-07-16 2024-08-16 青岛阅海信息服务有限公司 Intelligent correction method for sea wave forecast

Similar Documents

Publication Publication Date Title
CN116051857A (en) Short-term precipitation prediction method improved by using random mask and Transformer
CN112418409A (en) Method for predicting time-space sequence of convolution long-short term memory network improved by using attention mechanism
Chen et al. AtICNet: semantic segmentation with atrous spatial pyramid pooling in image cascade network
Zhang Research on remote sensing image de‐haze based on GAN
CN115810149A (en) High-resolution remote sensing image building extraction method based on superpixel and image convolution
CN116596966A (en) Segmentation and tracking method based on attention and feature fusion
Xiao et al. Generative adversarial network with hybrid attention and compromised normalization for multi-scene image conversion
CN114677560A (en) Deep learning algorithm-based lane line detection method and computer system
Wen et al. A self-attention multi-scale convolutional neural network method for SAR image despeckling
Li et al. Two‐stage single image dehazing network using swin‐transformer
Ma et al. Db-rnn: A rnn for precipitation nowcasting deblurring
CN117392387A (en) Unsupervised domain adaptive segmentation method based on wavelet transformation and context relation
Li et al. FA-GAN: A feature attention GAN with fusion discriminator for non-homogeneous dehazing
CN116597144A (en) Image semantic segmentation method based on event camera
CN116863437A (en) Lane line detection model training method, device, equipment, medium and vehicle
CN116148864A (en) Radar echo extrapolation method based on DyConvGRU and Unet prediction refinement structure
WO2023206532A1 (en) Prediction method and apparatus, electronic device and computer-readable storage medium
Cao et al. Deep feature interactive aggregation network for single image deraining
CN114066750B (en) Self-encoder deblurring method based on domain transformation
CN113255459A (en) Image sequence-based lane line detection method
Zhou et al. Double recursive sparse self-attention based crowd counting in the cluttered background
Wang et al. An input sampling scheme to radar echo extrapolation for RNN-based models
CN114187331B (en) Unsupervised optical flow estimation method based on Transformer feature pyramid network
Pal et al. MAML-SR: Self-adaptive super-resolution networks via multi-scale optimized attention-aware meta-learning
CN114972444B (en) Target tracking method based on multi-head comparison network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination