CN117233724B

CN117233724B - Radar echo extrapolation method and device based on depth space-time attention network

Info

Publication number: CN117233724B
Application number: CN202311516391.8A
Authority: CN
Inventors: 唐卫; 胡骏楠; 渠寒花
Original assignee: Public Meteorological Service Center Of China Meteorological Administration National Early Warning Information Release Center
Current assignee: Public Meteorological Service Center Of China Meteorological Administration National Early Warning Information Release Center
Priority date: 2023-11-15
Filing date: 2023-11-15
Publication date: 2024-02-06
Anticipated expiration: 2043-11-15
Also published as: CN117233724A

Abstract

The invention provides a radar echo extrapolation method and a device based on a depth space-time attention network, which relate to the technical field of weather prediction and comprise the following steps: acquiring radar echo images with multiple frames of time continuity in a preset geographic area to obtain an original radar echo sequence; preprocessing an original radar echo sequence to obtain at least one target radar echo sequence; and processing at least one target radar echo sequence by using the target radar echo extrapolation model to obtain a predicted radar echo sequence of the preset geographic area in a future time period. In the method of the invention, the space-time attention model in the target radar echo extrapolation model comprises the following steps: at least one group of spatial attention modules and temporal attention modules adopting the same attention structure not only reduces the complexity of the model, but also can fully extract the spatial characteristics of each frame of radar echo image and the time sequence characteristics of multiple frames of radar echo images, so that the model has more accurate prediction capability and an accurate radar echo prediction result is obtained.

Description

Radar echo extrapolation method and device based on depth space-time attention network

Technical Field

The invention relates to the technical field of weather prediction, in particular to a radar echo extrapolation method and device based on a depth space-time attention network.

Background

The radar echo data plays a very important role in Chinese meteorological data, and the radar echo image data has the advantages of high precision, high resolution and the like, so that the rainfall condition in the atmosphere can be accurately reflected, and the precision and the accuracy of rainfall forecast are improved. The radar echo extrapolation technology is a method for predicting future radar echo according to historical time sequence radar echo data. In conclusion, the prediction quality of the radar echo directly influences the development of weather services. The efficient and accurate weather service also directly influences the development of aspects such as agricultural production, urban hydrology, disaster prevention preparation, prediction work and the like.

The traditional radar echo extrapolation method mainly predicts radar echo by researching the shape and movement trend of the echo. For example, the optical flow method, which predicts the future radar echo situation by calculating an optical flow field describing the motion situation between echo images using a plurality of consecutive echo images or two adjacent echo images. However, due to the complexity and diversity of weather phenomena, it is often difficult to achieve higher prediction accuracy by means of optical flow-based echo extrapolation methods alone.

Disclosure of Invention

The invention aims to provide a radar echo extrapolation method and device based on a depth space-time attention network, so as to solve the technical problem that the radar echo prediction result in the radar echo extrapolation method in the prior art is poor in accuracy.

In a first aspect, the present invention provides a method for radar echo extrapolation based on a depth spatio-temporal attention network, comprising: acquiring radar echo images with multiple frames of time continuity in a preset geographic area to obtain an original radar echo sequence; preprocessing the original radar echo sequence to obtain at least one target radar echo sequence; wherein the preprocessing comprises: normalization processing, outlier processing, missing value processing and electromagnetic echo interference processing; processing the at least one target radar echo sequence by utilizing a target radar echo extrapolation model to obtain a predicted radar echo sequence of the preset geographic area in a future time period; wherein the target radar echo extrapolation model comprises: an encoder, a spatiotemporal attention model, and a decoder; the encoder is used for extracting the visual characteristics of each frame of radar echo image in the at least one target radar echo sequence to obtain historical visual characteristics; the spatiotemporal attention model includes: at least one set of spatial attention module and temporal attention module employing the same attention structure, the spatiotemporal attention model being used to predict visual features of each frame of radar echo predicted image in the future time period from the historical visual features; the decoder is configured to convert visual features of each frame of radar echo predicted image within the future time period into the predicted radar echo sequence.

In an alternative embodiment, the spatiotemporal attention model further includes: the system comprises a first dimension remodelling module, a second dimension remodelling module and a third dimension remodelling module; the number of the first dimension remodelling modules is consistent with that of the space attention modules, and the output ends of the first dimension remodelling modules are connected with the input ends of the space attention modules; the first dimension reshaping module is used for reshaping the dimension of the input data intoOutputting the mixture after the reaction; wherein B represents batch size, T represents the number of radar echo images in each target radar echo sequence, C represents the number of channels, H represents the height, and W represents the width; the second dimension remodelling module is consistent with the time attention module in number, and the output end of the second dimension remodelling module is connected with the input end of the time attention module; the second dimension remodelling module is used for remodelling the dimension of the input data into +.>Outputting the mixture after the reaction; the input end of the third dimension remodelling module is connected with the output end of the target attention module; wherein the target attention module represents a last attention module in the spatiotemporal attention model; the third dimension remodelling module is used for remodelling the dimension of the input data into: And outputting the result.

In an alternative embodiment, the spatial attention module includes: a spatial attention sub-module combining global and local, and a channel attention sub-module based on global pooling and average pooling; the input data of the space attention module is processed by the space attention sub-module, and then the processing result of the space attention sub-module is input into the channel attention sub-moduleA module; or the input data of the space attention module is processed by the channel attention sub-module, and then the processing result of the channel attention sub-module is input into the space attention sub-module; the spatial weight matrix of the spatial attention sub-module is expressed as:the method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>Input data representing said spatial attention sub-module,/->Representing an activation function->Representation->The convolution is performed with the result that,representation->Is a deep normal convolution of>Indicating that the expansion ratio is +.>Is->Depth expansion convolution; the channel weight matrix of the channel attention sub-module is expressed as:the method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>Input data representing the channel attention sub-module,and->Weight matrix representing multi-layer perceptron, +.>Representing the averaged pooled characteristics of the input data of the channel attention sub-module on each channel independently,/for each channel >The input data representing the channel attention sub-module is characterized by being maximally pooled independently on each channel.

In an alternative embodiment, the number of output channels of the encoder and the number of input channels of the decoder are equal to the number of frames of radar echo images in the target radar echo sequence.

In an alternative embodiment, the method further comprises: acquiring a plurality of historical radar echo images of the preset geographic area in a historical time period; preprocessing a plurality of historical radar echo images to obtain a plurality of target radar echo images; dividing a plurality of target radar echo images according to a preset time interval to obtain a plurality of radar echo extrapolation samples; wherein each radar echo extrapolated sample comprises: a training radar echo sequence for inputting the encoder and a real radar echo sequence for calculating the loss; and training the initial radar echo extrapolation model by utilizing the radar echo extrapolation samples to obtain a target radar echo extrapolation model.

In an alternative embodiment, the loss function of the initial radar echo extrapolation model is expressed as:the method comprises the steps of carrying out a first treatment on the surface of the Wherein (1) >Representing a real radar echo sequence,/->Representing a predicted radar echo sequence; />；/>；；/>Representing +.>Frame radar echo image,/->Representing +.>Frame radar echo image,/->Representation ofThe%>Individual pixel values +.>Representation->The%>Individual pixel values +.>Representing the number of frames of radar echo images in a radar echo sequence; />And->Representing the super parameter.

In an alternative embodiment, the normalization process includes: determining a maximum pixel value and a minimum pixel value in a plurality of historical radar echo images; and scaling the pixel values in all radar echo images in equal proportion based on the maximum pixel value and the minimum pixel value.

In a second aspect, the present invention provides a radar echo extrapolation apparatus based on a depth spatio-temporal attention network, comprising: the acquisition module is used for acquiring radar echo images with multiple frames of time continuity in a preset geographic area to obtain an original radar echo sequence; the preprocessing module is used for preprocessing the original radar echo sequence to obtain at least one target radar echo sequence; wherein the preprocessing comprises: normalization processing, outlier processing, missing value processing and electromagnetic echo interference processing; the processing module is used for processing the at least one target radar echo sequence by utilizing a target radar echo extrapolation model to obtain a predicted radar echo sequence of the preset geographic area in a future time period; wherein the target radar echo extrapolation model comprises: an encoder, a spatiotemporal attention model, and a decoder; the encoder is used for extracting the visual characteristics of each frame of radar echo image in the at least one target radar echo sequence to obtain historical visual characteristics; the spatiotemporal attention model includes: at least one set of spatial attention module and temporal attention module employing the same attention structure, the spatiotemporal attention model being used to predict visual features of each frame of radar echo predicted image in the future time period from the historical visual features; the decoder is configured to convert visual features of each frame of radar echo predicted image within the future time period into the predicted radar echo sequence.

In a third aspect, the present invention provides an electronic device comprising a memory, a processor, the memory having stored thereon a computer program executable on the processor, when executing the computer program, implementing the steps of the depth spatio-temporal attention network based radar echo extrapolation method according to any one of the preceding embodiments.

In a fourth aspect, the present invention provides a computer readable storage medium storing computer instructions which, when executed by a processor, implement a depth spatiotemporal attention network based radar echo extrapolation method according to any one of the preceding embodiments.

The method of the invention uses a target radar echo extrapolation model when the radar echo extrapolation is performed, the model adopts a non-autoregressive coding-space-time attention-decoding frame, the problem of error accumulation can be effectively alleviated, and the space-time attention model in the target radar echo extrapolation model comprises: at least one group of spatial attention modules and temporal attention modules adopting the same attention structure not only reduces the complexity of the model, but also can fully extract the spatial characteristics of each frame of radar echo image and the time sequence characteristics of multiple frames of radar echo images, so that the model has more accurate prediction capability and an accurate radar echo prediction result is obtained.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a method for radar echo extrapolation based on a depth spatio-temporal attention network according to an embodiment of the present invention;

FIG. 2 is a block diagram of a target radar echo extrapolation model according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a data processing flow of a spatio-temporal attention model according to an embodiment of the present invention;

FIG. 4 is a feature diagram in a spatial attention module;

FIG. 5 is a feature diagram in a time attention module;

FIG. 6 is a functional block diagram of a radar echo extrapolation apparatus based on a depth spatio-temporal attention network according to an embodiment of the present invention;

fig. 7 is a schematic diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Some embodiments of the present invention are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.

In the conventional radar echo extrapolation method, an optical flow method calculates an optical flow field describing a motion situation between echo images by using a plurality of continuous echo images or two adjacent echo images, so as to predict a future radar echo situation. However, due to the complexity and diversity of weather phenomena, it is often difficult to achieve higher prediction accuracy by means of optical flow-based echo extrapolation methods alone.

With the development of deep learning, neural networks have achieved better performance than conventional methods. For example ConvLSTM, trajGRU, predRNN, phyDNet, MAU, however, there are still problems of error accumulation and poor prediction effect. In view of the above, the embodiments of the present invention provide a radar echo extrapolation method based on a deep space-time attention network, so as to alleviate the technical problems involved above.

Example 1

Fig. 1 is a flowchart of a radar echo extrapolation method based on a depth space-time attention network according to an embodiment of the present invention, as shown in fig. 1, where the method specifically includes the following steps:

step S102, acquiring radar echo images with multiple frames of time continuous in a preset geographic area, and obtaining an original radar echo sequence.

The known radar echo images can accurately reflect the precipitation condition in the atmosphere, so that the prediction of the precipitation condition of the preset geographic area is required, the embodiment of the invention firstly obtains the radar echo images with continuous multi-frame time in the area, and the radar echo images with continuous multi-frame time form the original radar echo sequence, so that the predicted radar echo sequence of the area in the future time period can be predicted based on the original radar echo sequence. The embodiment of the invention does not specifically limit the frame number of the radar echo image contained in the original radar echo sequence, and a user can set the frame number according to actual conditions, but the frame number is matched with the requirement of a target radar echo extrapolation model used in prediction below on input data.

Step S104, preprocessing the original radar echo sequence to obtain at least one target radar echo sequence.

In order to avoid the influence of noise data on the prediction accuracy of the radar echo sequence, preprocessing is carried out on the original radar echo sequence, wherein the preprocessing comprises: normalization processing, outlier processing, missing value processing and electromagnetic echo interference processing.

Specifically, the normalization processing is performed on the original radar echo sequence, that is, a maximum pixel value and a minimum pixel value are determined from the original radar echo sequence, and in the embodiment of the invention, the pixel value on the radar echo image represents the radar reflectivity of the geographic position corresponding to the pixel point. And then, scaling the pixel values of all radar echo images in the original radar echo sequence in equal proportion according to the determined maximum pixel value and the determined minimum pixel value.

The abnormal value processing is to remove the pixel values beyond the normal range after the normalization processing is carried out on the pixel values of all radar echo images, wherein the normal range can calculate the average pixel value P of all normalized pixel values, then the numerical range formed by P+/-P is used as the normal range, and P represents the preset threshold value. Alternatively, the standard deviation of the normalized pixel values is calculated, and the outliers are filtered based on the standard deviation. The embodiment of the invention does not limit the specific means for processing the abnormal value, and the user can set the abnormal value according to the actual requirement.

The missing value processing is processing of filling in missing data, and if the normal range of pixel values is 0-250 after the normalization processing, 255 can be used to identify missing data. That is, the missing value processing is used to mark missing data as data outside the normal range.

The electromagnetic echo interference processing aims at the problem of radial electromagnetic interference echo existing in the existing radar echo data, and clutter irrelevant to precipitation, such as radial interference clutter, surface object clutter and sea na echo, can be eliminated after the electromagnetic echo interference processing is carried out on the data after the missing value processing. The embodiment of the invention does not limit the electromagnetic echo interference processing method specifically, and a user can set the electromagnetic echo interference processing method specifically according to actual requirements, for example, the electromagnetic echo interference processing method is processed by using a radial data judgment (Radial Data Determine, RDD) algorithm. Alternatively, the electromagnetic echo interference process may be performed before the normalization process, that is, the electromagnetic echo interference process is performed first, and then the normalization process, the outlier process, and the missing value process are performed in this order.

In view of the fact that the size of the radar echo image varies according to the size of the preset geographical area, in order to facilitate the processing of the model, it is possible to perform image segmentation processing on the radar echo image larger than the preset size, that is, to segment the large-size radar echo image into a plurality of small-size radar echo images of a specified size, and to compose one target radar echo sequence from multiple frames of small-size radar echo images that are continuous in time, after performing the above preprocessing on the original radar echo sequence, thereby obtaining at least one target radar echo sequence.

And S106, processing at least one target radar echo sequence by using the target radar echo extrapolation model to obtain a predicted radar echo sequence of the preset geographic area in a future time period.

After obtaining at least one target radar echo sequence, the embodiment of the invention inputs the at least one target radar echo sequence into a pre-trained target radar echo extrapolation model, and the output of the model is a predicted radar echo sequence of a preset geographic area in a future time period. In the embodiment of the invention, the number of frames of the radar echo image in the original radar echo sequence in the current time period (abbreviated as first frame number) and the number of frames of the radar echo image in the predicted radar echo sequence in the future time period (abbreviated as second frame number) are fixed settings of the target radar echo extrapolation model, and are determined by training samples used in model training, wherein the first frame number and the second frame number can be equal or unequal, and a user can set according to actual requirements.

For example, in model training, where the training samples are radar echo sequences that predict future y frames using radar echo sequences of the previous x frames, an example of a training sample may be: the time interval of each frame of radar echo is 6 minutes, and the radar echo sequence of the preset geographic area 0:00 to 0:24 (x=5) is used to predict the radar echo sequence of the area 0:30 to 0:42 (y=3).

Wherein the target radar echo extrapolation model comprises: encoder, spatiotemporal attention model, and decoder.

The encoder is used for extracting the visual characteristics of each frame of radar echo image in at least one target radar echo sequence to obtain historical visual characteristics.

The spatiotemporal attention model includes: at least one set of spatial attention module and temporal attention module employing the same attention structure, the spatiotemporal attention model being used to predict visual features of each frame of radar echo predicted image over a future time period based on historical visual features.

The decoder is configured to convert visual features of each frame of radar echo predicted image over a future time period into a predicted radar echo sequence.

As shown in fig. 2, the target radar echo extrapolation model in the embodiment of the present invention includes: encoder, spatiotemporal attention model and decoder, optionally encoder and decoder are res net based codec and wherein each residual block consists of two convolutional layers and one residual connection. The residual connection adds the input signal directly to the output signal, spanning multiple convolutional layers, enabling information to flow more easily. This structure may make training easier and may allow deep networks to perform better than shallow networks.

In the existing space-time prediction model, a separate space-time attention module is provided, that is, one attention module is used to simultaneously realize the extraction of the time and space change characteristics, but the module is usually very complex, and the complexity and uncertainty of the model are increased. To reduce the complexity of the model, embodiments of the present invention decouple the spatial attention from the temporal attention, i.e. employ a separate spatial attention module for focusing on spatial information within an image frame and a separate temporal attention module for focusing on temporal information between successive image frames.

If one spatial attention module and one temporal attention module are used as a set of target modules, the space-time attention module in the embodiment of the invention needs to set at least one set of target modules, and the two attention modules are alternated to enable the model to focus on time sequence information while capturing spatial information. In addition, in the embodiment of the invention, the spatial attention module and the temporal attention module adopt the same attention structure, namely, different dimensions adopt the same attention mechanism, so that a unified frame can be formed, and limited connection among different dimensions can be established. The spatiotemporal attention model may capture the spatiotemporal variation trajectories inherent in the radar echo sequence. Moreover, the spatiotemporal attention model can reduce the complexity of the model compared with a standard prediction model under the same spatiotemporal feature input size. The embodiment of the invention does not specifically limit the group number of the target modules in the time-space attention model, and a user can set the group number according to actual requirements. Theoretically, the larger the number of groups, the higher the prediction accuracy of the model.

The method of the invention uses a target radar echo extrapolation model when carrying out radar echo extrapolation, the model adopts a non-autoregressive coding-space-time attention-decoding framework, the problem of error accumulation can be effectively alleviated, and the space-time attention model comprises: at least one group of spatial attention modules and temporal attention modules adopting the same attention structure not only reduces the complexity of the model, but also can fully extract the spatial characteristics of each frame of radar echo image and the time sequence characteristics of multiple frames of radar echo images, so that the model has more accurate prediction capability and an accurate radar echo prediction result is obtained.

In an alternative embodiment, the spatiotemporal attention model further includes: the system comprises a first dimension remodelling module, a second dimension remodelling module and a third dimension remodelling module.

The number of the first dimension remodelling modules is consistent with that of the space attention modules, and the output ends of the first dimension remodelling modules are connected with the input ends of the space attention modules.

The first dimension reshaping module is used for reshaping the dimension of the input data intoOutputting the mixture after the reaction; wherein B represents batch size, T represents the number of radar echo images in each target radar echo sequence, C represents the number of channels, H represents the height, and W represents the width. Alternatively, the channel number c=1, that is, a single channel image is adopted, and each pixel corresponds to a value of radar reflectivity.

The second dimension remodelling module is consistent with the time attention module in number, and the output end of the second dimension remodelling module is connected with the input end of the time attention module.

The second dimension remodelling module is used for remodelling the dimension of the input data intoAnd outputting the result.

The input end of the third dimension remodelling module is connected with the output end of the target attention module; wherein the target attention module represents the last attention module in the spatiotemporal attention model.

The third dimension remodelling module is used for remodelling the dimension of the input data into:and outputting the result.

In the embodiment of the invention, the time attention module and the space attention module both adopt a gating attention mechanism based on large-kernel convolution decomposition to extract features, but the attention dimensions of the two modules are different, and in order to match the data dimension, dimension reshaping operation is needed before entering the next module. In particular, the output of the spatial attention module is dimensionally reshaped as an input to the temporal attention module and vice versa. In practice, operators such as torch. Reshape may be used, and the data shape may be reshaped without changing its data.

Specifically, if a set of target radar echo sequences of size B is given Then each first dimension remodelling module in the model remodels the dimension of the input data to +.>And then output to a spatial attention module connected to the output end of the spatial attention module, so that each frame of radar echo image is regarded as a single sample for spatial attention, and only features at a single frame level are focused without considering time variation.

Each second dimension remodeling module in the model reshapes dimensions of the input data intoThen, the time stamp is output to the output endAn attention module, whereby for temporal attention, hidden representations from the spatial attention module are reshaped into a shape +.>And stacking multi-frame level features along the time axis, this will force a time attention module built on the convolution operation to learn the time evolution inside the data from the stack of multi-frame features.

Whether the last attention module in the spatiotemporal attention model (i.e., the target attention module) is the temporal attention module or the spatial attention module, its data dimension needs to be adjusted toAnd outputting the model back end for processing by a decoder at the model back end. Therefore, the third dimension remodelling module is used for remodelling the dimension of the input data into the dimension of the input data And then output.

FIG. 3 is a schematic diagram of a data processing flow of a spatiotemporal attention model according to an embodiment of the present invention, where the spatiotemporal attention model shown in FIG. 3 employs a set of spatial attention modules and temporal attention modules of the same attention structure. From the data dimensions in the spatial attention module and the temporal attention module, fig. 4 is a feature diagram in the spatial attention module, and fig. 5 is a feature diagram in the temporal attention module.

In the embodiment of the invention, the number of output channels of the encoder and the number of input channels of the decoder are equal to the number of frames of radar echo images in the target radar echo sequence. And further, the input and output dimension requirements of the space-time attention model can be matched. That is, if the dimension of the input data of the target radar echo extrapolation model isAnd if the number of radar echo images in each target radar echo sequence is T, the number of output channels of the encoder and the number of input channels of the decoder are also T.

In view of the fact that the spatial attention module and the temporal attention module use the same attention mechanism, the following embodiments of the present invention take the spatial attention module as an example, and specifically describe the constituent structure of the attention module. In an alternative embodiment, the spatial attention module comprises: a spatial attention sub-module that combines globally and locally, and a channel attention sub-module that is based on global pooling and average pooling.

The input data of the spatial attention module is processed by the spatial attention sub-module, and then the processing result of the spatial attention sub-module is input into the channel attention sub-module; or the input data of the spatial attention module is processed by the channel attention sub-module, and then the processing result of the channel attention sub-module is input into the spatial attention sub-module.

The spatial weight matrix of the spatial attention sub-module is expressed as:the method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>Input data representing a spatial attention sub-module, < +.>Representing an activation function->Representation->Convolution (S)/(S)>Representation->Is a deep normal convolution of>Indicating that the expansion ratio is +.>Is->Depth expansion convolution.

The global-local combined spatial attention module helps capture spatial information on different scales, thereby expanding the receptive field of the model. The global attention portion focuses on the entire input feature map, with the largest receptive field. The local attention portion may capture detailed information by focusing on a local region of the input feature map. The model can realize larger receptive field on the premise of keeping lower computational complexity by combining global and local attentiveness.

In the visual transformation task, large kernel convolutions can help the network achieve a larger effective acceptance field and higher shape deviation. This is because the visual transformation task typically requires capturing global information and shape information of the input image, rather than focusing only on local detail and texture information. The use of large-kernel convolution can effectively extract these global and shape information, thereby improving the performance of the model. However, the general target radar echo sequence is a 256×256 resolution time sequence, and considering that the problems of low calculation efficiency and large parameter amount exist in direct use of large kernel convolution, the embodiment of the invention decomposes the large kernel convolution into three parts, namely: depth convolution to capture local receptive fields in a single channel ) And II: depth expansion convolution (++) to establish a connection between distant receptive fields>) And III: a->Convolution (+)>). By a +.>Is a deep normal convolution and an expansion ratio d>Depth-expanded convolution can obtain a receptive field of size K x K, i.e. K represents the kernel size, then +.>Convolution can establish a connection between multiple channels.

That is, the spatial attention sub-module first performs a depth normal convolution and a depth expansion convolution on the input feature map (i.e., the input data above) X, respectively, then concatenates the two convolution results in the channel dimension, and then uses the channel dimension convolution. Thus, the spatial weight matrix may be expressed as:。

the channel weight matrix of the channel attention sub-module is expressed as:the method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>Input data representing a channel attention sub-module, < +.>And->Weight matrix representing multi-layer perceptron, +.>The feature of the input data representing the channel attention sub-module after being averaged and pooled independently on each channel,/for each channel>Input data representing the channel attention sub-module is characterized by being maximally pooled independently on each channel.

In channel attention, global max pooling and global average pooling operations may capture global information from different angles, which has an important role in deriving channel weights. The goal of channel attention is to assign a specific gravity to all channels of the input feature map (input data of the channel attention sub-module) so as to highlight features of important channels and suppress features of non-important channels. Global pooling generally refers to global maximum pooling. In channel attention, all elements of each channel in the input are maximized and then this is taken as the pooling result for that channel, thereby extracting the most significant information for each channel. Global maximization can capture the features with the highest response in each channel, providing a basis for channel weight allocation. The averaging pooling operation performs averaging pooling on each channel of the input feature map, thereby extracting average information for each channel. Averaging pooling can capture the average response within each channel, reflecting the importance of each channel as a whole.

Thus, the embodiment of the present invention first applies the average pooling and the maximum pooling independently on each channel to the input data Y, so that two one-dimensional feature vectors can be obtained:and->. Then both are input to a shared multi-layer perceptron MLP (weight matrix +.>And->) Finally, use the activation function->A Sigmoid function is typically employed to compress the final channel weights to between 0 and 1. Thus, the channel weight matrix can be expressed as:。

in the embodiment of the present invention, the spatial attention sub-modules and the channel attention sub-modules in the spatial attention module may be stacked in a specified order, which is not specifically limited in the embodiment of the present invention. If the input data of the spatial attention module is processed by the channel attention sub-module, and then the processing result of the channel attention sub-module is input into the spatial attention sub-module, the spatial attention characteristics output by the spatial attention moduleExpressed as: />Wherein, the method comprises the steps of, wherein,。

the embodiment of the invention decouples the channel attention sub-module and the space attention sub-module in the space attention module, so that the calculation complexity can be reduced, and a user can independently optimize the two attention mechanisms, so that the user can adjust the weights and the structures of the channel attention sub-module and the space attention sub-module according to the requirements of specific tasks.

In terms of time characteristics, if the characteristics obtained by the spatial attention module are deformed in terms of time and channel dimensions, and input data of the spatial attention module are processed by applying the same calculation mode, the attention characteristics combining time and space can be obtained.

In an alternative embodiment, the method of the present invention further comprises the steps of:

step S201, acquiring a plurality of historical radar echo images of a preset geographic area in a historical time period.

Step S202, preprocessing a plurality of historical radar echo images to obtain a plurality of target radar echo images.

If the radar echo extrapolation is to be performed for a preset geographic area, then the model is trained using historical radar echo images of that geographic area. In order to construct training samples of a model, a plurality of historical radar echo images of a preset geographic area in a historical time period are firstly obtained. For example, a radar echo image of a preset geographical area at a resolution of 1km×1km from 6 minutes to 6 months of 2021 is acquired.

Next, in order to facilitate training by the deep learning model, a plurality of historical radar echo images need to be preprocessed to obtain a plurality of target radar echo images. In the embodiment of the invention, the preprocessing comprises normalization processing, outlier processing, missing value processing and electromagnetic echo interference processing. Embodiments of pretreatment have been described above, and reference is made to the above for specific details.

Optionally, the normalization process specifically includes the following: determining a maximum pixel value and a minimum pixel value in a plurality of historical radar echo images; the pixel values in all radar echo images are scaled equally based on the maximum pixel value and the minimum pixel value. Wherein the operation of scaling in equal proportion can be expressed as:wherein min represents a minimum pixel value, max represents a maximum pixel value, x represents a pixel value before normalization processing, y represents a pixel value after normalization processing, 250 is a specific example value, represents an upper limit of an equal scaling result, and a user can adjust according to actual requirements. After the above formula processing, the normal data range of all elements is 0-250, and 255 can be used to identify missing data in the subsequent steps.

Step S203, dividing a plurality of target radar echo images according to a preset time interval to obtain a plurality of radar echo extrapolation samples; wherein each radar echo extrapolated sample comprises: a training radar echo sequence for the input encoder and a real radar echo sequence for the calculation of the loss.

And step S204, training the initial radar echo extrapolation model by utilizing a plurality of radar echo extrapolation samples to obtain a target radar echo extrapolation model.

After obtaining a plurality of target radar echo images in the historical time period, dividing the images to obtain a plurality of radar echo extrapolation samples, for example, if the data are divided at a preset time interval of 1 hour, taking 2021 month 1 day 00:00 to 01:00 data as one radar echo extrapolation sample, taking 2021 month 6 day 1 day 01:00 to 02:00 data as one radar echo extrapolation sample, and so on to obtain a plurality of radar echo extrapolation samples. And each radar echo extrapolated sample contains both a training radar echo sequence required for model prediction and a real radar echo sequence used for calculating loss as a reference.

Taking a radar echo extrapolation sample of 2021, 6, 1, 00:00, to 01:00 as an example, a radar echo image of 2021, 6, 1, 00:00, to 00:30 may be chosen as the training radar echo sequence, and a radar echo image of 2021, 6, 1, 00:36, to 01:00 as the real radar echo sequence, that is, the radar echo sequence of 00:36, to 01:00 needs to be predicted using the radar echo sequence of 00:00, to 00:30. Then in the model training process, based on the radar echo sequence from 00:00 to 00:30, the model can output a predicted radar echo sequence from 00:36 to 01:00, and the loss of the model can be calculated by using the real radar echo sequence from 00:36 to 01:00 as comparison. And training the initial radar echo extrapolation model through a large number of radar echo extrapolation samples to obtain the target radar echo extrapolation model.

In an alternative embodiment, the loss function of the initial radar echo extrapolation model is expressed as:the method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>Representing a real radar echo sequence,/->Representing a predicted radar echo sequence; />；/>；；/>Representing +.>Frame radar echo image,/->Representing +.>Frame radar echo image,/->Representation ofThe%>Individual pixel values +.>Representation->The%>Individual pixel values +.>Representing the number of frames of radar echo images in a radar echo sequence; />And->Representing the super parameter. />May be set to 1 or 2 for adjusting the sensitivity of the loss function.

In the embodiment of the invention, the loss function of the initial radar echo extrapolation model is obtained by absolute loss errorMean square loss error->And image gradient difference error->In view of the constitution, the MSE loss focuses on the mean square loss error, the MAE loss focuses on the absolute loss error, the MSE is more sensitive to the large error, and the penalty on the model is stronger, because the MSE squares the large error, and the MAE performs the operation on all errors simultaneously, so the sensitivity to abnormal values is lower, and the absolute loss error is used simultaneously>And mean square loss error->These two loss functions can enhance the robustness of the model and the stability of training.

And, in addition, the processing unit,the method is generally used for tasks such as image denoising in image processing, wherein the calculation mode in the image denoising task is to calculate the sum of absolute values of differences between predicted values and true values of each pixel point; />The calculation in the image reconstruction task is to calculate the sum of squares of the differences between the predicted value and the true value for each pixel, and therefore +.>Can be used forWhile enhancing the sharpness and smoothness of the generated image frames.

In addition, due to the imbalance in the dataset, the image gradient differential errorThe gradient information is taken into account in the loss calculation, and by calculating the differences between surrounding neighboring pixels, the lateral gradient and the longitudinal gradient are calculated separately, which makes the model more robust in processing unbalanced data sets. Therefore, use +.>，/>，/>The three loss functions are summed, so that convergence speed and abnormal value conditions can be considered, and model training is more robust.

Optionally, when the model is trained, an AdamW optimizer and OneCycle learning rate strategy are selected, the learning rate is set to be 1e-5 at the beginning of training, and the training learning rate is attenuated to be 0.5 times of the original training learning rate every 20 rounds. Model training is performed on NVIDIA GTX 2080 Ti graphics card, with an input sequence length (i.e., the number of radar echo images in the target radar echo sequence) of 20, a size of 256 x 256 per frame of images, and a channel number of 1. The model has better convergence and higher stability in the training process as can be seen by the loss change in the model training process.

In order to further prove the experimental effect of the method, after the training set is trained by the method, the radar echo of the test set is predicted, and the evaluation index results of various radar echo extrapolation methods can be seen, so that the method provided by the invention has the advantages that the predicted echo is optimal at MAE, MSE, SSIM, and PSNR and SHARP are also more advantageous.

In summary, the embodiment of the invention adopts the non-autoregressive structure of the encoder-attention-decoder, greatly relieves the problem of error accumulation in an autoregressive mode, improves the prediction efficiency, and the space-time attention model adopted by the invention can fully extract the spatial characteristics of each frame of radar echo and the time sequence characteristics of multi-frame radar echo, so that the model has more accurate prediction capability, and effectively improves the accuracy of radar echo prediction results.

Example two

The embodiment of the invention also provides a radar echo extrapolation device based on the depth space-time attention network, which is mainly used for executing the radar echo extrapolation method based on the depth space-time attention network provided by the embodiment, and the device provided by the embodiment of the invention is specifically introduced below.

Fig. 6 is a functional block diagram of a radar echo extrapolation apparatus based on a depth space-time attention network according to an embodiment of the present invention, where, as shown in fig. 6, the apparatus mainly includes: an acquisition module 10, a preprocessing module 20, a processing module 30, wherein:

the acquiring module 10 is configured to acquire multiple frames of time-continuous radar echo images of a preset geographic area, so as to obtain an original radar echo sequence.

The preprocessing module 20 is configured to preprocess the original radar echo sequence to obtain at least one target radar echo sequence; wherein the preprocessing comprises the following steps: normalization processing, outlier processing, missing value processing and electromagnetic echo interference processing.

The processing module 30 is configured to process at least one target radar echo sequence by using the target radar echo extrapolation model, so as to obtain a predicted radar echo sequence of the preset geographic area in a future time period.

The radar echo extrapolation device based on the depth space-time attention network provided by the embodiment of the invention uses a target radar echo extrapolation model when radar echo extrapolation is performed, the model adopts a non-autoregressive coding-space-time attention-decoding framework, the problem of error accumulation can be effectively relieved, and the space-time attention model comprises: at least one group of spatial attention modules and temporal attention modules adopting the same attention structure not only reduces the complexity of the model, but also can fully extract the spatial characteristics of each frame of radar echo image and the time sequence characteristics of multiple frames of radar echo images, so that the model has more accurate prediction capability and an accurate radar echo prediction result is obtained.

Optionally, the space-time attention model further includes: the system comprises a first dimension remodelling module, a second dimension remodelling module and a third dimension remodelling module.

The first dimension reshaping module is used for reshaping the dimension of the input data intoOutputting the mixture after the reaction; wherein B represents batch size, T represents the number of radar echo images in each target radar echo sequence, C represents the number of channels, H represents the height, and W represents the width.

Optionally, the spatial attention module comprises: a spatial attention sub-module that combines globally and locally, and a channel attention sub-module that is based on global pooling and average pooling.

The input data of the spatial attention module is processed by the spatial attention sub-module, and then the processing result of the spatial attention sub-module is input into the channel attention sub-module.

Or the input data of the spatial attention module is processed by the channel attention sub-module, and then the processing result of the channel attention sub-module is input into the spatial attention sub-module.

Optionally, the number of output channels of the encoder and the number of input channels of the decoder are equal to the number of frames of radar echo images in the target radar echo sequence.

Optionally, the apparatus is further configured to:

a plurality of historical radar echo images of a preset geographic area in a historical time period are obtained.

Preprocessing a plurality of historical radar echo images to obtain a plurality of target radar echo images.

Dividing a plurality of target radar echo images according to a preset time interval to obtain a plurality of radar echo extrapolation samples; wherein each radar echo extrapolated sample comprises: a training radar echo sequence for the input encoder and a real radar echo sequence for the calculation of the loss.

And training the initial radar echo extrapolation model by utilizing the plurality of radar echo extrapolation samples to obtain a target radar echo extrapolation model.

Alternatively, the loss function of the initial radar echo extrapolation model is expressed as:the method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>Representing a real radar echo sequence,/->Representing a predicted radar echo sequence; />；/>；；/>Representing +.>Frame radar echo image,/->Representing +.>Frame radar echo image,/->Representation ofThe%>Individual pixel values +.>Representation->The%>Individual pixel values +.>Representing the number of frames of radar echo images in a radar echo sequence; />And->Representing the super parameter.

Optionally, the normalization process includes:

a maximum pixel value and a minimum pixel value in a plurality of historical radar echo images are determined.

The pixel values in all radar echo images are scaled equally based on the maximum pixel value and the minimum pixel value.

Example III

Referring to fig. 7, an embodiment of the present invention provides an electronic device, including: a processor 60, a memory 61, a bus 62 and a communication interface 63, the processor 60, the communication interface 63 and the memory 61 being connected by the bus 62; the processor 60 is arranged to execute executable modules, such as computer programs, stored in the memory 61.

The memory 61 may include a high-speed random access memory (RAM, random Access Memory), and may further include a non-volatile memory (non-volatile memory), such as at least one magnetic disk memory. The communication connection between the system network element and at least one other network element is achieved via at least one communication interface 63 (which may be wired or wireless), and may use the internet, a wide area network, a local network, a metropolitan area network, etc.

Bus 62 may be an ISA bus, a PCI bus, an EISA bus, or the like. The buses may be classified as address buses, data buses, control buses, etc. For ease of illustration, only one bi-directional arrow is shown in FIG. 7, but not only one bus or type of bus.

The memory 61 is configured to store a program, and the processor 60 executes the program after receiving an execution instruction, and the method executed by the apparatus for defining a process disclosed in any of the foregoing embodiments of the present invention may be applied to the processor 60 or implemented by the processor 60.

The processor 60 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuitry in hardware or instructions in software in the processor 60. The processor 60 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; but may also be a digital signal processor (Digital Signal Processing, DSP for short), application specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), off-the-shelf programmable gate array (Field-Programmable Gate Array, FPGA for short), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components. The disclosed methods, steps, and logic blocks in the embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be embodied directly in the execution of a hardware decoding processor, or in the execution of a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory 61 and the processor 60 reads the information in the memory 61 and in combination with its hardware performs the steps of the method described above.

The embodiment of the invention provides a method and a device for radar echo extrapolation based on a depth space-time attention network, which comprise a computer readable storage medium storing non-volatile program codes executable by a processor, wherein the instructions included in the program codes can be used for executing the method described in the previous method embodiment, and specific implementation can be seen in the method embodiment and will not be repeated here.

In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.

It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.

In the description of the present invention, it should be noted that, directions or positional relationships indicated by terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., are directions or positional relationships based on those shown in the drawings, or are directions or positional relationships conventionally put in use of the inventive product, are merely for convenience of describing the present invention and simplifying the description, and are not indicative or implying that the apparatus or element to be referred to must have a specific direction, be constructed and operated in a specific direction, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance.

Furthermore, the terms "horizontal," "vertical," "overhang," and the like do not denote a requirement that the component be absolutely horizontal or overhang, but rather may be slightly inclined. As "horizontal" merely means that its direction is more horizontal than "vertical", and does not mean that the structure must be perfectly horizontal, but may be slightly inclined.

In the description of the present invention, it should also be noted that, unless explicitly specified and limited otherwise, the terms "disposed," "mounted," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims

1. A method for radar echo extrapolation based on a depth spatiotemporal attention network, comprising:

Acquiring radar echo images with multiple frames of time continuity in a preset geographic area to obtain an original radar echo sequence;

preprocessing the original radar echo sequence to obtain at least one target radar echo sequence; wherein the preprocessing comprises: normalization processing, outlier processing, missing value processing and electromagnetic echo interference processing;

processing the at least one target radar echo sequence by utilizing a target radar echo extrapolation model to obtain a predicted radar echo sequence of the preset geographic area in a future time period;

wherein the target radar echo extrapolation model comprises: an encoder, a spatiotemporal attention model, and a decoder;

the encoder is used for extracting the visual characteristics of each frame of radar echo image in the at least one target radar echo sequence to obtain historical visual characteristics;

the spatiotemporal attention model includes: at least one set of spatial attention module and temporal attention module employing the same attention structure, the spatiotemporal attention model being used to predict visual features of each frame of radar echo predicted image in the future time period from the historical visual features;

the decoder is used for converting visual characteristics of each frame of radar echo predicted image in the future time period into the predicted radar echo sequence;

Wherein the space-time attention model further comprises: the system comprises a first dimension remodelling module, a second dimension remodelling module and a third dimension remodelling module;

the number of the first dimension remodelling modules is consistent with that of the space attention modules, and the output ends of the first dimension remodelling modules are connected with the input ends of the space attention modules;

the first dimension reshaping module is used for reshaping the dimension of the input data intoOutputting the mixture after the reaction; wherein B represents batch size, T represents the number of radar echo images in each target radar echo sequence, C represents the number of channels, H represents the height, and W represents the width;

the second dimension remodelling module is consistent with the time attention module in number, and the output end of the second dimension remodelling module is connected with the input end of the time attention module;

the second dimension remolding module is used for remolding the dimension of the input data intoOutputting the mixture after the reaction;

the input end of the third dimension remodelling module is connected with the output end of the target attention module; wherein the target attention module represents a last attention module in the spatiotemporal attention model;

the third dimension remodelling module is used for remodelling the dimension of the input data into: And outputting the result.

2. The depth spatiotemporal attention network based radar echo extrapolation method in accordance with claim 1, wherein said spatial attention module includes: a spatial attention sub-module combining global and local, and a channel attention sub-module based on global pooling and average pooling;

the input data of the space attention module is processed by the space attention sub-module, and then the processing result of the space attention sub-module is input into the channel attention sub-module;

or the input data of the space attention module is processed by the channel attention sub-module, and then the processing result of the channel attention sub-module is input into the space attention sub-module;

the saidThe spatial weight matrix of the spatial attention sub-module is expressed as:the method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>Input data representing said spatial attention sub-module,/->Representing an activation function->Representation->Convolution (S)/(S)>Representation->Is a deep normal convolution of>Indicating that the expansion ratio is +.>Is->Depth expansion convolution;

the channel weight matrix of the channel attention sub-module is expressed as:the method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>Input data representing the channel attention sub-module, And->Weight matrix representing multi-layer perceptron, +.>Representing the averaged pooled characteristics of the input data of the channel attention sub-module on each channel independently,/for each channel>The input data representing the channel attention sub-module is characterized by being maximally pooled independently on each channel.

3. The depth spatio-temporal attention network based radar echo extrapolation method according to claim 1, wherein the number of output channels of the encoder and the number of input channels of the decoder are equal to the number of frames of radar echo images in the target radar echo sequence.

4. The depth spatiotemporal attention network based radar echo extrapolation method in accordance with claim 1, further comprising:

acquiring a plurality of historical radar echo images of the preset geographic area in a historical time period;

preprocessing a plurality of historical radar echo images to obtain a plurality of target radar echo images;

dividing a plurality of target radar echo images according to a preset time interval to obtain a plurality of radar echo extrapolation samples; wherein each radar echo extrapolated sample comprises: a training radar echo sequence for inputting the encoder and a real radar echo sequence for calculating the loss;

And training the initial radar echo extrapolation model by utilizing the radar echo extrapolation samples to obtain a target radar echo extrapolation model.

5. The depth spatio-temporal attention network based radar echo extrapolation method of claim 4 wherein the loss function of the initial radar echo extrapolation model is expressed as:the method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>Representing a real radar echo sequence,/->Representing a predicted radar echo sequence; />；；/>；/>Representing +.>Frame radar echo image,/->Representing +.>Frame radar echo image,/->Representation->The%>Individual pixel values +.>Representation->The%>Individual pixel values +.>Representing the number of frames of radar echo images in a radar echo sequence; />And->Representing the super parameter.

6. The depth spatiotemporal attention network based radar echo extrapolation method in accordance with claim 4, wherein said normalization process includes:

determining a maximum pixel value and a minimum pixel value in a plurality of historical radar echo images;

and scaling the pixel values in all radar echo images in equal proportion based on the maximum pixel value and the minimum pixel value.

7. A depth spatiotemporal attention network based radar echo extrapolation apparatus, comprising:

The acquisition module is used for acquiring radar echo images with multiple frames of time continuity in a preset geographic area to obtain an original radar echo sequence;

the preprocessing module is used for preprocessing the original radar echo sequence to obtain at least one target radar echo sequence; wherein the preprocessing comprises: normalization processing, outlier processing, missing value processing and electromagnetic echo interference processing;

the processing module is used for processing the at least one target radar echo sequence by utilizing a target radar echo extrapolation model to obtain a predicted radar echo sequence of the preset geographic area in a future time period;

8. An electronic device comprising a memory, a processor, the memory having stored thereon a computer program executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the depth spatiotemporal attention network based radar echo extrapolation method of any one of claims 1 to 6.

9. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the depth spatiotemporal attention network based radar echo extrapolation method of any one of claims 1 to 6.