CN116912711A - Satellite cloud image prediction method based on space-time attention gate - Google Patents
- Publication number
- CN116912711A (application CN202310948132.6A)
- Authority
- CN
- China
- Prior art keywords
- image
- satellite cloud
- space
- prediction
- method based
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V20/13—Satellite images
- G06N3/0442—Recurrent networks, e.g. Hopfield networks, characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/048—Activation functions
- G06N3/08—Learning methods
- G06V10/40—Extraction of image or video features
- G06V10/761—Proximity, similarity or dissimilarity measures
- G06V10/82—Image or video recognition or understanding using neural networks
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention discloses a satellite cloud image prediction method based on a space-time attention gate, which comprises the following steps: (1) preprocessing a remote sensing image; (2) constructing a satellite cloud image prediction network model; (3) quantitatively evaluating satellite cloud image prediction performance using the mean square error MSE, mean absolute error MAE, structural similarity SSIM, and peak signal-to-noise ratio PSNR. The invention improves the feature extraction module in CrevNet by integrating the lightweight attention module SGE into the bidirectional self-encoder, which enhances the semantic information of cloud image features and reduces information loss during feature extraction without increasing the amount of computation. Compared with other prediction algorithms, the integrated 3D convolution and historical information module make fuller use of the spatio-temporal features with global and local dependence in the cloud image sequence, so that future cloud images can be predicted more accurately.
Description
Technical Field
The invention relates to the technical field of deep learning and remote sensing image processing, and in particular to a satellite cloud image prediction method based on a space-time attention gate.
Background
Cloud image prediction is essentially a spatio-temporal sequence prediction problem. Spatio-temporal sequence prediction based on deep learning emerged relatively late, and most current spatio-temporal sequence prediction algorithms are based on LSTM and its variants. Ranzato et al. first applied RNNs to the field of video prediction and proposed a strong baseline model for unsupervised feature learning using video data. Srivastava et al. integrated LSTM into an RNN framework, proposed FC-LSTM, and pointed out the feasibility of combining CNN with LSTM to improve prediction accuracy. Shi et al. combined a convolutional network with LSTM to propose ConvLSTM, suitable for image prediction, designed the EF (Encoding-Forecasting) prediction structure according to the flow of information in the network, and applied the prediction model to precipitation nowcasting with good results; Ballas et al. attempted to combine convolution with the GRU. Shi et al. further optimized the EF prediction structure, proposed ConvGRU, which is lighter than ConvLSTM, and introduced learnable convolutions to propose TrajGRU, making the algorithm better suited to predicting data with irregular motion characteristics such as precipitation and cloud movement; Wang Yunbo et al. were the first to separate the temporal and spatial information of spatio-temporal data within LSTM, added a spatio-temporal memory cell on the basis of ConvLSTM, and proposed ST-LSTM (SpatioTemporal-LSTM), which is more conducive to the efficient use of spatio-temporal sequence data.
Wang Yunbo et al. constructed a new LSTM structure, Causal LSTM, with dual memories in cascade, and alleviated the gradient vanishing problem with gradient highway units; Wang Yunbo et al. also incorporated 3D convolution into LSTM and, using the idea of attention, proposed a recall mechanism in the gating unit of LSTM, enabling the network to control memory cells over a period of time; Xu et al. proposed a GAN-LSTM algorithm for satellite cloud image prediction, effectively alleviating the problems of insufficient feature extraction of input data and low definition of the predicted images. How to capture long-term spatial dependence in the input spatio-temporal data is an important difficulty in spatio-temporal sequence prediction; in 2020, Lin et al. designed a Self-Attention Memory Module to replace the original gated memory unit in LSTM, which can effectively capture long-term spatial dependence in spatio-temporal data and thereby improve the accuracy of the predicted data, but the high computational complexity of self-attention makes the method difficult to apply to large-size image datasets. An ideal spatio-temporal sequence prediction algorithm has no information loss in the feature extraction process, low memory consumption, high computational efficiency, and high prediction accuracy, effectively solving the problems of insufficient memory and low computation speed in the operation of traditional spatio-temporal sequence algorithms. CrevNet is widely appreciated because it improves prediction accuracy while keeping the model lightweight, but when facing the particular problem of cloud image prediction, its feature extraction depth is insufficient and its prediction module is dated.
Disclosure of Invention
The invention aims to: the invention aims to provide a satellite cloud image prediction method based on the CrevNet model, which solves the problem in the prior art that the nonlinear motion trajectory of clouds is difficult to predict.
The technical scheme is as follows: the invention discloses a satellite cloud image prediction method based on a space-time attention gate, which comprises the following steps:
(1) Preprocessing the remote sensing image;
(2) Constructing a satellite cloud image prediction network model, comprising the following steps:
(21) Introducing a lightweight attention model SGE to obtain an SGE self-encoder for feature extraction;
(22) Inputting the extracted features into a reversible STA-GRU prediction model for prediction;
(23) Introducing the lightweight attention model SGE to obtain an SGE self-decoder for decoding the prediction information;
(3) Quantitatively evaluating the satellite cloud image prediction performance using the mean square error MSE, mean absolute error MAE, structural similarity SSIM, and peak signal-to-noise ratio PSNR.
Further, the step (1) specifically includes: the selected remote sensing images come from the Fengyun-4A (FY-4A) satellite cloud image dataset; and remote sensing data preprocessing is performed on the acquired dataset through the Satpy library to obtain satellite cloud images in PNG format.
Further, the step (21) specifically includes: obtaining image frames from the satellite cloud image sequence and reconstructing them into two groups of image data; alternately extracting features from the two groups of image data and then sequentially adding them; introducing the lightweight attention model SGE into the second layer of the three-layer convolution after the pixel shuffle operation to obtain the SGE self-encoder; and performing feature extraction on the input data; the formula is as follows:
Let the input image frame be $x$, reconstructed into two groups $x_1$, $x_2$; $x_1$ is fed into the two ports of the two-way SGE self-encoder, where convolution and activation operations perform feature extraction, and the result is added to $x_2$ as the new $\hat{x}_2$; the encoding operation is expressed as:
$$\hat{x}_2 = x_2 + \delta(x_1), \qquad \hat{x}_1 = x_1 + \delta(\hat{x}_2)$$
wherein $\hat{x}_1$ and $\hat{x}_2$ are the updated $x_1$, $x_2$, and $\delta$ is the composite operation of convolution and activation;
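The additive coupling described above can be sketched numerically. This is a minimal illustration of why the two-way self-encoder loses no information; the stand-in `delta` (a fixed linear map followed by tanh) and all array shapes are assumptions for demonstration, whereas the patent's δ is a learned convolution-and-activation stack.

```python
import numpy as np

def delta(x, w):
    """Stand-in for the conv+activation stack delta (a fixed linear
    map followed by tanh; the real model uses learned convolutions)."""
    return np.tanh(x @ w)

def encode(x1, x2, w):
    # Additive coupling: each branch is updated using only the other
    # branch, so the operation is exactly invertible.
    x2_hat = x2 + delta(x1, w)
    x1_hat = x1 + delta(x2_hat, w)
    return x1_hat, x2_hat

def decode(x1_hat, x2_hat, w):
    # Inverse: subtract the same quantities in reverse order.
    x1 = x1_hat - delta(x2_hat, w)
    x2 = x2_hat - delta(x1, w)
    return x1, x2

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
x1, x2 = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
y1, y2 = encode(x1, x2, w)
r1, r2 = decode(y1, y2, w)
print(np.allclose(r1, x1) and np.allclose(r2, x2))  # exact recovery
```

Because each branch is updated using only the other branch, decoding recovers the inputs exactly, which is the sense in which the bidirectional self-encoder avoids information loss during feature extraction.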
further, the step (22) specifically includes: the STA-GRU is used for replacing a prediction algorithm in the original model RPM, and an improved RPM formula is as follows:
G t =φ(W 2 *ReLU(W 1 *H * +b 1 )+b 2 )
wherein ,and->For two sets of features of a single image frame, phi is a sigmoid activation function, G t Is a soft attention model; the STA-GRU model is expressed as:
wherein σ is a sigmoid activation function representing a 3D convolution operation, and wherein, as well as, and, respectively, hadamard product and matrix product, softmax is a normalized exponential function, R t ,Z t , and Rt ',/>Respectively hidden state-> and />Reset gates, update gates, and candidate hidden states, W and b represent the weight and bias, O, of the corresponding data in each gate t and H* The output door and the fusion hidden state are respectively.
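The improved RPM gate $G_t$ above can be sketched as follows. In this hedged illustration the convolutions $W_1$, $W_2$ are reduced to per-pixel channel matrices (i.e. 1×1 convolutions) and all shapes are invented for demonstration; the patent's weights are learned convolution kernels.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_attention_gate(h, w1, b1, w2, b2):
    """G_t = sigmoid(W2 * ReLU(W1 * H* + b1) + b2), with the
    convolutions reduced to per-pixel channel matmuls for illustration."""
    z = np.maximum(h @ w1 + b1, 0.0)   # ReLU(W1 * H* + b1)
    return sigmoid(z @ w2 + b2)        # gate values strictly in (0, 1)

rng = np.random.default_rng(1)
h = rng.normal(size=(16, 16, 8))       # H*: fused hidden state (H, W, C)
w1, b1 = rng.normal(size=(8, 8)), np.zeros(8)
w2, b2 = rng.normal(size=(8, 8)), np.zeros(8)
g = soft_attention_gate(h, w1, b1, w2, b2)
print(g.shape)
```

Whatever the weights, the final sigmoid keeps every gate value strictly between 0 and 1, so $G_t$ acts as a soft mask over the features rather than a hard selection.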
Further, the step (23) specifically includes: introducing the lightweight attention model SGE into the second layer of the three-layer convolution after the pixel shuffle operation to obtain the SGE self-decoder; and performing the decoding operation on the prediction information; the formula is the inverse of the encoding operation:
$$x_1 = \hat{x}_1 - \delta(\hat{x}_2), \qquad x_2 = \hat{x}_2 - \delta(x_1)$$
further, the mean square error MSE in the step (3) is specifically:
wherein ,Yi Representing the predicted image of the i-th frame,representing the i-th frame real image.
Further, the mean absolute error MAE in the step (3) is specifically:
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|Y_i - \hat{Y}_i\right|$$
further, the structural similarity SSIM in the step (3) is specifically:
wherein ,is the brightness comparison of the real image and the predicted image, < >>Is contrast comparison +.>The structural comparison, α, β, γ is set to 1.
Further, the peak signal-to-noise ratio PSNR in the step (3) is specifically:
$$\mathrm{PSNR} = 10\log_{10}\!\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right)$$
wherein PSNR is an index for measuring image quality, MAX is the maximum pixel value of the image, and MSE is the mean square error.
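The pixel-wise evaluation indexes can be computed directly, as in the small sketch below (SSIM is omitted here because it requires windowed luminance, contrast, and structure statistics, typically taken from a library such as scikit-image). The toy arrays are assumptions for demonstration only.

```python
import numpy as np

def mse(pred, real):
    """Mean square error between predicted and real frames."""
    return float(np.mean((pred - real) ** 2))

def mae(pred, real):
    """Mean absolute error between predicted and real frames."""
    return float(np.mean(np.abs(pred - real)))

def psnr(pred, real, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher means better quality."""
    return float(10.0 * np.log10(max_val ** 2 / mse(pred, real)))

pred = np.array([[0.0, 2.0]])
real = np.array([[0.0, 0.0]])
print(mse(pred, real))  # 2.0
print(mae(pred, real))  # 1.0
print(psnr(pred, real))
```

With a two-pixel error of (0, 2), MSE averages the squared differences to 2.0 and MAE the absolute differences to 1.0; PSNR then follows directly from the MSE.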
The invention also provides a device comprising a memory, a processor, and a program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the satellite cloud image prediction method based on a space-time attention gate described above when executing the program.
The beneficial effects are that: compared with the prior art, the invention has the following notable advantages: the feature extraction module in CrevNet is improved, and the lightweight attention module SGE is integrated into the bidirectional self-encoder, which enhances the semantic information of cloud image features and reduces information loss during feature extraction without increasing the amount of computation; compared with other prediction algorithms, the integrated 3D convolution and historical information module make fuller use of the spatio-temporal features with global and local dependence in the cloud image sequence, so that future cloud images can be predicted more accurately.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of a predictive network model according to the present invention;
FIG. 3 is a schematic diagram of a light-weight attention module SGE according to the present invention;
FIG. 4 is a schematic diagram of a time-space gated attention cycle unit network STA-GRU according to the present invention;
- FIG. 5 is a comparison graph of the FY-4A satellite cloud image prediction results of the five comparison methods and the present invention;
- FIG. 6 is a frame-by-frame comparison line graph of the FY-4A satellite cloud image prediction results of the five comparison methods and the present invention.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings.
As shown in fig. 1, the embodiment of the invention provides a satellite cloud image prediction method based on a space-time attention gate, which comprises the following steps:
(1) Preprocessing the acquired remote sensing images; specifically: the selected remote sensing images come from the Fengyun-4A (FY-4A) satellite cloud image dataset, from which 2100 frames of cloud image samples of the Yangtze River Delta region from June to August 2021 are selected, with a time interval of 1 h; the ratio of the training set to the validation set is 8:2. The longitude and latitude range of the cloud image samples is 115°2′E to 123°5′E and 26°6′N to 35°2′N, the pixel size is 256×256, and the spatial resolution is 4 km; data extraction, geometric correction, radiometric calibration, and data normalization are performed on the acquired dataset through the Satpy library to obtain satellite cloud images in PNG format.
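The split and normalization described above can be sketched as follows; the random array merely stands in for the 2100 preprocessed 256×256 FY-4A frames (a demonstration assumption), since the real frames come from the Satpy preprocessing pipeline.

```python
import numpy as np

# Hypothetical stand-in for the 2100-frame FY-4A cloud-image sequence
# (256x256 frames at 1 h intervals); real data would come from Satpy.
frames = np.random.default_rng(2).integers(
    0, 256, size=(2100, 256, 256), dtype=np.uint8
)

# Normalize 8-bit pixels to [0, 1] for network input.
frames = frames.astype(np.float32) / 255.0

# 8:2 train/validation split, in temporal order.
split = int(len(frames) * 0.8)
train, val = frames[:split], frames[split:]
print(len(train), len(val))  # 1680 420
```

Splitting in temporal order (rather than shuffling) keeps the validation frames strictly later than the training frames, which matches how a forecasting model is actually used.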
(2) As shown in fig. 2, constructing a satellite cloud image prediction network model, which comprises the following steps:
(21) Introducing the lightweight attention model SGE as shown in FIG. 2 (a) to obtain the SGE self-encoder for feature extraction; specifically: image frames are obtained from the satellite cloud image sequence and reconstructed into two groups of image data; features are alternately extracted from the two groups of image data and then sequentially added; the lightweight attention model SGE is introduced into the second layer of the three-layer convolution after the pixel shuffle operation to obtain the SGE self-encoder, which performs feature extraction on the input data; the formula is as follows:
Let the input image frame be $x$, reconstructed into two groups $x_1$, $x_2$; $x_1$ is fed into the two ports of the two-way SGE self-encoder, where convolution and activation operations perform feature extraction, and the result is added to $x_2$ as the new $\hat{x}_2$; the encoding operation is expressed as:
$$\hat{x}_2 = x_2 + \delta(x_1), \qquad \hat{x}_1 = x_1 + \delta(\hat{x}_2)$$
wherein $\hat{x}_1$ and $\hat{x}_2$ are the updated $x_1$, $x_2$, and $\delta$ is the composite operation of convolution and activation;
(22) The extracted features are input into the reversible STA-GRU prediction model for prediction, as shown in FIG. 2 (b); STA-GRU replaces the prediction algorithm in the reversible prediction module (RPM) of the original model, and the improved RPM formula is as follows:
$$G_t = \phi(W_2 * \mathrm{ReLU}(W_1 * H^* + b_1) + b_2)$$
wherein $\phi$ is a sigmoid activation function and $G_t$ is a soft attention gate computed from the two sets of features of a single image frame. In order to make full use of the temporal and spatial information in the cloud image sequence, the STA-GRU model builds on the conventional prediction model ConvGRU and introduces a spatio-temporal memory unit $M_t$ (the blue circled portion of FIG. 4); a Historical Information Module (HIM) is also introduced into the model (the orange portion of FIG. 4), which makes full use of the historical hidden states $H_{t-\tau},\ldots,H_{t-1}$ and improves the sensitivity of the network to the overall motion of the target.
In the STA-GRU model, $\sigma$ is a sigmoid activation function, $*$ represents a 3D convolution operation, and $\odot$ and $\otimes$ denote the Hadamard product and the matrix product, respectively. In the historical information module HIM, the historical hidden states are multiplied by the Reset Gate and then passed through softmax to obtain a series of probabilities; these probabilities are multiplied one-to-one with the historical hidden states to obtain the overall Attention Scores, which are then multiplied by the hidden state of the previous moment and integrated by LayerNorm; this is the process of selecting the overall information of the past hidden states. $\tau$ is the number of historical hidden states participating in the Reset Gate, i.e., one frame is predicted from the historical image information of the previous $\tau$ frames. The model adds a new group of Reset and Update Gates to the original ConvGRU to guide the construction of the spatio-temporal memory unit $M_t$, which shares the pressure on the hidden state within the network and captures as much spatio-temporal motion information as possible. Finally, $H_t$ and $M_t$ are concatenated, reduced by a 1×1 convolution, and passed through the output gate $O_t$ to obtain the fused hidden state $H^*$.
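The HIM attention over the τ historical hidden states can be sketched as below. The scoring rule (a global dot product between each past state and the reset gate) and all tensor shapes are simplifying assumptions for illustration; the patent applies the reset gate, softmax, and LayerNorm with learned convolutional weights.

```python
import numpy as np

def softmax(z, axis=0):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def him_attention(history, reset_gate):
    """Weight tau historical hidden states by softmax scores derived
    from the reset gate, then sum them into one context state."""
    scores = (history * reset_gate).sum(axis=(1, 2, 3))  # one score per past state
    weights = softmax(scores)                            # sums to 1 over tau
    context = np.tensordot(weights, history, axes=1)     # weighted sum of states
    return context, weights

rng = np.random.default_rng(3)
tau = 4
history = rng.normal(size=(tau, 16, 16, 8))  # H_{t-tau}, ..., H_{t-1}
reset = rng.uniform(size=(16, 16, 8))        # reset gate values in [0, 1)
ctx, w = him_attention(history, reset)
print(ctx.shape)  # (16, 16, 8)
```

Because the softmax weights sum to 1, the context state is a convex combination of the past hidden states, which is how the module selects overall information from the previous τ frames.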
(23) As shown in fig. 2 (c), the lightweight attention model SGE is introduced to obtain the SGE self-decoder, which decodes the prediction information; specifically: the lightweight attention model SGE is introduced into the second layer of the three-layer convolution after the pixel shuffle operation to obtain the SGE self-decoder, which performs the decoding operation on the prediction information; the formula is the inverse of the encoding operation:
$$x_1 = \hat{x}_1 - \delta(\hat{x}_2), \qquad x_2 = \hat{x}_2 - \delta(x_1)$$
the model of the present invention uses an Adam optimizer for parameter optimization with an initial learning rate set to α=0.0005 and a batch size set to 8. When the Epoch is around 54, the loss function reaches convergence. Five satellite cloud image prediction algorithms were used as comparison methods, namely ConvLSTM method, convGRU method, predRNN method, predRNN++ method, cryvNet method. All experiments were performed in a Windows11 operating system, intel Core i5 12400F CPU, 32G memory, and NVIDIA GeForce RTX 3060 (12 GB) GPU environment.
(3) Quantitatively evaluating the satellite cloud image prediction performance using the mean square error MSE, mean absolute error MAE, structural similarity SSIM, and peak signal-to-noise ratio PSNR. The mean square error MSE is specifically:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(Y_i - \hat{Y}_i\right)^2$$
wherein $Y_i$ represents the predicted image of the i-th frame and $\hat{Y}_i$ represents the real image of the i-th frame.
The mean absolute error MAE is specifically:
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|Y_i - \hat{Y}_i\right|$$
the structural similarity SSIM specifically comprises the following steps:
wherein ,is the brightness comparison of the real image and the predicted image, < >>Is contrast comparison +.>The structural comparison, α, β, γ is set to 1.
The peak signal-to-noise ratio PSNR is specifically:
$$\mathrm{PSNR} = 10\log_{10}\!\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right)$$
wherein PSNR is an index for measuring image quality (the higher the decibel value, the better the image quality), MAX is the maximum pixel value of the image, and MSE is the mean square error.
As shown in FIG. 5, the visual comparison of the five comparison methods on the FY-4A satellite cloud images shows that ConvLSTM and ConvGRU, as the most traditional prediction models, give unsatisfactory predictions and cannot capture the motion trend of the cloud images well. PredRNN was the first to distinguish temporal from spatial information, using two LSTMs to process temporal and spatial information separately, so it captures the cloud image spatio-temporal information more effectively than ConvLSTM. PredRNN++ is similar to PredRNN, building a new prediction model through cascaded LSTMs while adding gradient highway units to further reduce the gradient vanishing problem, so its prediction accuracy is improved over PredRNN. CrevNet uses the same prediction module as PredRNN, but its bidirectional self-encoder and reversible prediction module greatly reduce information loss during feature extraction and sequence prediction and reduce memory occupation, making CrevNet's predictions stronger than PredRNN's. SmartCrevNet and CrevNet can both clearly predict the first two frames owing to the bidirectional self-encoder, but SmartCrevNet contains the improved bidirectional SGE self-encoder and the novel prediction model STA-GRU, giving it better long-term prediction capability; therefore, the prediction effect of SmartCrevNet on the last two frames is better than that of CrevNet.
Table 1 shows the quantitative comparison of the present invention with the five comparison methods on four video prediction evaluation indexes: mean square error (MSE), mean absolute error (MAE), structural similarity (SSIM), and peak signal-to-noise ratio (PSNR). It can be seen that SmartCrevNet achieves the best results on all four indicators. In addition, MSE and SSIM are selected to compare the predicted images frame by frame; the result is shown in FIG. 6.
Table 1:
to further analyze the effect of the novel module in SmartCrevNet, ablation experiments were performed on bi-directional self-encoders and reversible prediction modules, the results of which are shown in table 2. In order to improve the efficiency of bi-directional self-encoder feature extraction and enhance cloud image semantic features, an SGE, convolutional Block Attention Module (CBAM) and Squeeze-and-Excitation (SE) attention model was tried at the time of the experiment to conduct an ablation experiment. The lightest SGE achieves a better effect from the overall evaluation index. In addition, attempts are made to remove HIM, where each cloud frame predicted only considers the information of the previous frame, and not the historical cloud information. From the results, the model is superior to other models of ablation experiments on the whole, which illustrates the feasibility and effectiveness of the innovation module of the model
Table 2:
The embodiment of the invention also provides a device comprising a memory, a processor, and a program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the above satellite cloud image prediction method based on a space-time attention gate when executing the program.
Claims (10)
1. A satellite cloud image prediction method based on a space-time attention gate, characterized by comprising the following steps:
(1) Preprocessing the remote sensing image;
(2) Constructing a satellite cloud image prediction network model, comprising the following steps:
(21) Introducing a lightweight attention model SGE to obtain an SGE self-encoder for feature extraction;
(22) Inputting the extracted features into a reversible STA-GRU prediction model for prediction;
(23) Introducing the lightweight attention model SGE to obtain an SGE self-decoder for decoding the prediction information;
(3) Quantitatively evaluating the satellite cloud image prediction performance using the mean square error MSE, mean absolute error MAE, structural similarity SSIM, and peak signal-to-noise ratio PSNR.
2. The satellite cloud image prediction method based on a space-time attention gate according to claim 1, wherein the step (1) specifically comprises: the selected remote sensing images come from the Fengyun-4A (FY-4A) satellite cloud image dataset; and remote sensing data preprocessing is performed on the acquired dataset through the Satpy library to obtain satellite cloud images in PNG format.
3. The satellite cloud image prediction method based on a space-time attention gate according to claim 1, wherein the step (21) specifically comprises: obtaining image frames from the satellite cloud image sequence and reconstructing them into two groups of image data; alternately extracting features from the two groups of image data and then sequentially adding them; introducing the lightweight attention model SGE into the second layer of the three-layer convolution after the pixel shuffle operation to obtain the SGE self-encoder; and performing feature extraction on the input data; the formula is as follows:
Let the input image frame be $x$, reconstructed into two groups $x_1$, $x_2$; $x_1$ is fed into the two ports of the two-way SGE self-encoder, where convolution and activation operations perform feature extraction, and the result is added to $x_2$ as the new $\hat{x}_2$; the encoding operation is expressed as:
$$\hat{x}_2 = x_2 + \delta(x_1), \qquad \hat{x}_1 = x_1 + \delta(\hat{x}_2)$$
wherein $\hat{x}_1$ and $\hat{x}_2$ are the updated $x_1$, $x_2$, and $\delta$ is the composite operation of convolution and activation.
4. The satellite cloud image prediction method based on a space-time attention gate according to claim 1, wherein the step (22) specifically comprises: STA-GRU replaces the prediction algorithm in the reversible prediction module (RPM) of the original model; the improved RPM formula is as follows:
$$G_t = \phi(W_2 * \mathrm{ReLU}(W_1 * H^* + b_1) + b_2)$$
wherein $\phi$ is a sigmoid activation function and $G_t$ is a soft attention gate computed from the two sets of features of a single image frame. In the STA-GRU model, $\sigma$ is a sigmoid activation function, $*$ represents a 3D convolution operation, $\odot$ and $\otimes$ denote the Hadamard product and the matrix product respectively, softmax is the normalized exponential function, $R_t$, $Z_t$, $\tilde{H}_t$ and $R_t'$, $Z_t'$, $\tilde{M}_t$ are respectively the reset gates, update gates, and candidate hidden states of the hidden state $H_{t-1}$ and the memory $M_{t-1}$, $W$ and $b$ represent the weight and bias of the corresponding data in each gate, and $O_t$ and $H^*$ are respectively the output gate and the fused hidden state.
5. The satellite cloud image prediction method based on a space-time attention gate according to claim 1, wherein the step (23) specifically comprises: introducing the lightweight attention model SGE into the second layer of the three-layer convolution after the pixel shuffle operation to obtain the SGE self-decoder; and performing the decoding operation on the prediction information; the formula is the inverse of the encoding operation:
$$x_1 = \hat{x}_1 - \delta(\hat{x}_2), \qquad x_2 = \hat{x}_2 - \delta(x_1)$$
6. The satellite cloud image prediction method based on space-time attention gate according to claim 1, wherein the mean square error MSE in step (3) is specifically:

MSE = (1/N) Σ_{i=1}^{N} (Y_i − Ŷ_i)²

where Y_i represents the predicted image of the i-th frame, Ŷ_i represents the real image of the i-th frame, and N is the number of frames.
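The MSE criterion can be sketched as follows; this uses the standard pixel-averaged definition, which may differ from the patent's exact normalisation.

```python
import numpy as np

def mse(pred, real):
    """Mean squared error between predicted frames and real frames,
    averaged over all pixels (standard MSE)."""
    return float(np.mean((pred - real) ** 2))

pred = np.array([[0.0, 1.0], [2.0, 3.0]])
real = np.array([[0.0, 0.0], [2.0, 2.0]])
print(mse(pred, real))  # 0.5
```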
7. The satellite cloud image prediction method based on space-time attention gate according to claim 1, wherein the mean absolute error MAE in step (3) is specifically:

MAE = (1/N) Σ_{i=1}^{N} |Y_i − Ŷ_i|
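The MAE criterion can likewise be sketched in numpy; again, the standard pixel-averaged definition is assumed.

```python
import numpy as np

def mae(pred, real):
    """Mean absolute error between predicted and real frames, averaged
    over all pixels (standard MAE)."""
    return float(np.mean(np.abs(pred - real)))

pred = np.array([[0.0, 1.0], [2.0, 3.0]])
real = np.array([[0.5, 1.0], [2.0, 2.5]])
print(mae(pred, real))  # 0.25
```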
8. The satellite cloud image prediction method based on space-time attention gate according to claim 1, wherein the structural similarity SSIM in step (3) is specifically:

SSIM(Y, Ŷ) = l(Y, Ŷ)^α · c(Y, Ŷ)^β · s(Y, Ŷ)^γ

where l(Y, Ŷ) is the luminance comparison between the real image and the predicted image, c(Y, Ŷ) is the contrast comparison, and s(Y, Ŷ) is the structure comparison; α, β, and γ are set to 1.
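A single-window SSIM with α = β = γ = 1 can be sketched as follows. The standard stabilising constants C1, C2, and C3 = C2/2 are assumed; the patent (like most implementations) may instead compute SSIM over local sliding windows.

```python
import numpy as np

def ssim_global(y, y_hat, data_range=1.0):
    """Single-window SSIM: product of luminance, contrast, and structure
    comparisons with alpha = beta = gamma = 1."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    c3 = c2 / 2.0
    mu_y, mu_h = y.mean(), y_hat.mean()
    var_y, var_h = y.var(), y_hat.var()
    cov = ((y - mu_y) * (y_hat - mu_h)).mean()
    l = (2 * mu_y * mu_h + c1) / (mu_y**2 + mu_h**2 + c1)          # luminance
    c = (2 * np.sqrt(var_y * var_h) + c2) / (var_y + var_h + c2)   # contrast
    s = (cov + c3) / (np.sqrt(var_y * var_h) + c3)                 # structure
    return l * c * s

rng = np.random.default_rng(2)
img = rng.random((16, 16))
print(round(ssim_global(img, img), 6))  # 1.0 for identical images
```

SSIM equals 1 only when the predicted image matches the real image exactly, which makes it a useful complement to pixel-wise errors such as MSE.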
9. The satellite cloud image prediction method based on space-time attention gate according to claim 1, wherein the peak signal-to-noise ratio PSNR in step (3) is specifically:

PSNR = 10 · log10(MAX² / MSE)

where MAX is the maximum possible pixel value, PSNR is an index for measuring image quality, and MSE is the mean square error.
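The PSNR metric can be sketched as follows; MAX = 255 is assumed for 8-bit images, and identical images are conventionally assigned infinite PSNR since their MSE is zero.

```python
import numpy as np

def psnr(pred, real, max_val=255.0):
    """PSNR = 10 * log10(MAX^2 / MSE); MAX is the peak pixel value
    (255 for 8-bit images). Returns infinity for identical images."""
    err = np.mean((pred - real) ** 2)
    if err == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / err))

a = np.zeros((4, 4))
b = np.full((4, 4), 16.0)
print(psnr(a, b))  # 10 * log10(255^2 / 256) ≈ 24.05 dB
```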
10. An apparatus comprising a memory, a processor, and a program stored in the memory and executable on the processor, wherein the processor, when executing the program, performs the steps of the satellite cloud image prediction method based on space-time attention gate as claimed in any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310948132.6A CN116912711A (en) | 2023-07-31 | 2023-07-31 | Satellite cloud image prediction method based on space-time attention gate |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116912711A | 2023-10-20
Family
ID=88364666
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310948132.6A Pending CN116912711A (en) | 2023-07-31 | 2023-07-31 | Satellite cloud image prediction method based on space-time attention gate |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116912711A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117368881A (en) * | 2023-12-08 | 2024-01-09 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Multi-source data fusion long-sequence radar image prediction method and system |
CN117368881B (en) * | 2023-12-08 | 2024-03-26 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Multi-source data fusion long-sequence radar image prediction method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112669325B (en) | Video semantic segmentation method based on active learning | |
CN109064507B (en) | Multi-motion-stream deep convolution network model method for video prediction | |
CN110363716B (en) | High-quality reconstruction method for generating confrontation network composite degraded image based on conditions | |
CN112597883B (en) | Human skeleton action recognition method based on generalized graph convolution and reinforcement learning | |
CN110933429B (en) | Video compression sensing and reconstruction method and device based on deep neural network | |
CN111860386A (en) | Video semantic segmentation method based on ConvLSTM convolutional neural network | |
CN110363770B (en) | Training method and device for edge-guided infrared semantic segmentation model | |
CN114463218B (en) | Video deblurring method based on event data driving | |
CN111901532B (en) | Video stabilization method based on recurrent neural network iteration strategy | |
CN115390164B (en) | Radar echo extrapolation forecasting method and system | |
CN113378775B (en) | Video shadow detection and elimination method based on deep learning | |
CN116912711A (en) | Satellite cloud image prediction method based on space-time attention gate | |
CN112016406B (en) | Video key frame extraction method based on full convolution network | |
CN114418840A (en) | Image splicing positioning detection method based on attention mechanism | |
WO2024002211A1 (en) | Image processing method and related apparatus | |
CN110930378A (en) | Emphysema image processing method and system based on low data demand | |
CN116205962B (en) | Monocular depth estimation method and system based on complete context information | |
CN115147819A (en) | Driver fixation point prediction method based on fixation point prediction model | |
CN109377498B (en) | Interactive matting method based on cyclic neural network | |
CN115995002B (en) | Network construction method and urban scene real-time semantic segmentation method | |
CN116597144A (en) | Image semantic segmentation method based on event camera | |
CN116957921A (en) | Image rendering method, device, equipment and storage medium | |
CN116452472A (en) | Low-illumination image enhancement method based on semantic knowledge guidance | |
Zhang et al. | Image deblurring based on lightweight multi-information fusion network | |
CN115861068A (en) | Space-time hybrid video super-resolution method based on deformable attention |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||