CN115456314B

CN115456314B - Atmospheric pollutant space-time distribution prediction system and method

Info

Publication number: CN115456314B
Application number: CN202211408732.5A
Authority: CN
Inventors: 涂志华; 叶允明; 李旭涛
Original assignee: Harbin Institute Of Technology shenzhen Shenzhen Institute Of Science And Technology Innovation Harbin Institute Of Technology
Current assignee: Harbin Institute Of Technology shenzhen Shenzhen Institute Of Science And Technology Innovation Harbin Institute Of Technology
Priority date: 2022-11-11
Filing date: 2022-11-11
Publication date: 2023-03-24
Anticipated expiration: 2042-11-11
Also published as: CN115456314A

Abstract

The invention provides a system and a method for predicting the spatio-temporal distribution of atmospheric pollutants, relating to the field of environmental prediction. A feature extraction unit feeds the spatio-temporal distribution information into a sequential connection path and a jump connection path to extract features. The sequential connection path extracts continuous time-series features, yielding the continuous trend used to predict subsequent moments. The jump connection path links data from the same time point in different periods: by passing features along this path, the features of the same time point in the previous period are merged into the features of the current moment, so that the pronounced periodic pattern assists and corrects the prediction for the current moment. The fusion unit fuses the features extracted by the two paths, combining the continuous trend and the periodic variation of the PM2.5 spatio-temporal distribution, so that the prediction is more accurate and the method can be used to predict the PM2.5 spatio-temporal distribution over a longer horizon.

Description

Atmospheric pollutant space-time distribution prediction system and method

Technical Field

The invention relates to the technical field of environmental prediction, in particular to a system and a method for predicting the spatial and temporal distribution of atmospheric pollutants.

Background

PM2.5 is a common atmospheric pollutant that has attracted great concern in recent years because of haze problems. Aerosols (gaseous dispersion systems composed of solid or liquid particles suspended in a gaseous medium) are closely tied to human production and daily life, and accurately predicting aerosol content can help citizens and relevant organizations make appropriate decisions, thereby reducing the harm caused by pollution. However, predicting aerosols accurately is a great challenge.

Atmospheric aerosols are complex and changeable: they have many influencing factors, the relationships among those factors are complicated, and the data fluctuate sharply. Some conventional prediction methods use numerical simulation, which consumes a large amount of computing resources and is still insufficiently accurate. Other methods use machine learning models such as linear regression and random forests; although these models are small and computationally cheap, their limited modeling capacity means the prediction quality does not meet air-quality forecasting requirements. In recent years, spatio-temporal sequence prediction algorithms have developed steadily and have achieved strong results in other disciplines, such as precipitation prediction and traffic flow prediction. These algorithms can also be applied to PM2.5 spatio-temporal distribution prediction, but they were not designed for PM2.5 and do not make good use of the characteristics of its spatio-temporal distribution. They use only data from adjacent moments, so data that are not adjacent in time do not interact well. As a result, deeper features of the data are hard to exploit, accurate prediction is hard to achieve, the predictions deviate substantially from the true values, and the deviation grows as the prediction horizon lengthens. This hinders prediction of the PM2.5 distribution over longer horizons, and the predictive performance of such models needs to be improved.

Disclosure of Invention

The invention aims to solve the problem of how to make the prediction result of the atmospheric pollutants more accurate.

To solve the above problems, in one aspect, the present invention provides an atmospheric pollutant temporal-spatial distribution prediction system, including:

a feature extraction unit, configured to input the spatio-temporal distribution information of atmospheric pollutants over consecutive time periods into a sequential connection path and a jump connection path respectively for feature extraction, so as to obtain a sequential connection feature and a jump connection feature, wherein the sequential connection path reads the spatio-temporal distribution information in chronological order, and the jump connection path reads the spatio-temporal distribution information at intervals of a set jump period length;

a fusion unit, configured to fuse the sequential connection feature and the jump connection feature to obtain a fused feature;

and a decoding unit, configured to decode the fused feature to obtain a final prediction result.

Further, the feature extraction units on the sequential connection path and the jump connection path are ConvGRU units, and each ConvGRU unit combines a convolutional neural network and a recurrent neural network.

Further, the feature extraction and operation process in the ConvGRU unit is as follows:

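$$
\begin{aligned}
Z_t &= \sigma\left(W_{xz} * X_t + W_{hz} * H_{t-1}\right),\\
R_t &= \sigma\left(W_{xr} * X_t + W_{hr} * H_{t-1}\right),\\
\tilde{H}_t &= \mathrm{ReLU}\left(W_{xh} * X_t + R_t \circ \left(W_{hh} * H_{t-1}\right)\right),\\
H_t &= \left(1 - Z_t\right) \circ \tilde{H}_t + Z_t \circ H_{t-1},
\end{aligned}
$$

wherein "∗" represents a convolution operation, "∘" represents the Hadamard product, $Z_t$ is the update gate, $R_t$ is the reset gate, $H_{t-1}$ is the feature of the previous moment, $X_t$ is the input image at time t, $\tilde{H}_t$ represents the hidden-layer candidate value, and $H_t$ represents the feature of the current moment. $W_{xz}$, $W_{hz}$, $W_{xr}$, $W_{hr}$, $W_{xh}$ and $W_{hh}$ are the convolution kernels of the respective convolution operations, σ represents the Sigmoid activation function, and ReLU represents the one-sided suppression (rectified linear) activation function, whose output is the hidden-layer candidate value.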

Further, the fusion unit includes a convolutional layer and a self-attention layer, the self-attention layer including a channel self-attention mechanism and a spatial self-attention mechanism, the channel self-attention mechanism being superimposed with the spatial self-attention mechanism.

Further, the sequential connection feature and the jump connection feature are spliced, splicing results are input into the convolutional layer, the convolutional layer calculates the weighted sum of the splicing results to obtain transition features, and the transition features are input into the self-attention layer to obtain output results of the fusion unit.

Further, the channel self-attention mechanism in the self-attention layer convolves the transition feature, including:

$$
\mathrm{attened\_c} = \mathrm{ECA}(X) * X + X,
$$

$$
\mathrm{ECA}(X) = \sigma\left(\mathrm{C1D}_k\left(g(X)\right)\right),
$$

$$
g(X) = \frac{1}{WH} \sum_{i=1}^{W} \sum_{j=1}^{H} X_{ij},
$$

wherein attened_c represents the output of the channel self-attention mechanism, X is the transition feature $H^{Conv}_t$ obtained by the convolutional layer, "∗" denotes a convolution operation, σ denotes the Sigmoid activation function, $\mathrm{C1D}_k$ denotes a one-dimensional convolution across channels, k is the size of the convolution kernel, g(X) denotes global average pooling, W and H denote the length and width of the picture respectively, i and j index the ith row and jth column of the picture, and $X_{ij}$ is the feature at the ith row and jth column of the picture at the current time.

Further, the spatial self-attention mechanism in the self-attention layer convolves the output result of the channel self-attention mechanism, including:

$$
\mathrm{fusioned\_X} = \mathrm{Attention}\left(W_q * \mathrm{attened\_c},\ W_k * \mathrm{attened\_c},\ W_v * \mathrm{attened\_c}\right) * \mathrm{attened\_c} + \mathrm{attened\_c},
$$

wherein fusioned_X is the output of the fusion unit, $W_q$, $W_k$ and $W_v$ are each 3×3 convolution kernels, and attened_c is the output of the channel self-attention mechanism.

Further, the decoding unit is also configured to decode the sequential connection feature and the jump connection feature respectively, to obtain a sequential prediction result and a jump prediction result.

Further, gradient descent optimization is performed on the final prediction result, the sequential prediction result and the jump prediction result, and the optimization result is fed back to the feature extraction unit as a new input, so as to train and optimize the feature extraction unit and the fusion unit.

In another aspect, the present invention further provides a method for predicting the spatial-temporal distribution of atmospheric pollutants, including:

acquiring space-time distribution information of atmospheric pollutants in continuous time periods;

respectively inputting the space-time distribution information into a sequential connection path and a jump connection path to perform feature extraction, so as to obtain a sequential connection feature and a jump connection feature, wherein the sequential connection path sequentially reads the space-time distribution information according to a time sequence, and the jump connection path reads the space-time distribution information at intervals according to a set jump cycle length;

fusing the sequential connection feature and the jump connection feature to obtain a fused feature;

and decoding the fused feature to obtain a final prediction result.

Compared with the prior art, the invention has the following beneficial effects:

The invention provides a system and a method for predicting the spatio-temporal distribution of atmospheric pollutants. A feature extraction unit feeds the spatio-temporal distribution information into a sequential connection path and a jump connection path to extract features. The sequential connection path extracts continuous time-series features, yielding the continuous trend used to predict subsequent moments. The jump connection path links data from the same time point in different periods: by passing features along this path, the features of the same time point in the previous period are merged into the features of the current moment, so that the pronounced periodic pattern assists and corrects the prediction for the current moment. The fusion unit fuses the features extracted by the two paths, combining the continuous trend and the periodic variation of the PM2.5 spatio-temporal distribution, so that the prediction is more accurate and the method can be used to predict the PM2.5 spatio-temporal distribution over a longer horizon.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a block flow diagram illustrating an atmospheric pollutant spatial-temporal distribution prediction system in an embodiment of the present invention;

FIG. 2 is a schematic diagram showing a sequential connection path and a jump connection path in an embodiment of the present invention;

FIG. 3 is a schematic diagram showing the information processing flow of the feature extraction unit in the embodiment of the present invention;

FIG. 4 is a schematic diagram illustrating a process flow of information processing of the fusion unit in the embodiment of the present invention;

FIG. 5 is a flow chart illustrating a method for predicting the spatial-temporal distribution of atmospheric pollutants in an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 shows a block flow diagram of a prediction system of spatial-temporal distribution of atmospheric pollutants in an embodiment of the present invention, which includes:

A feature extraction unit, configured to input the spatio-temporal distribution information of atmospheric pollutants over consecutive time periods into a sequential connection path and a jump connection path respectively for feature extraction, so as to obtain a sequential connection feature and a jump connection feature, wherein the sequential connection path reads the spatio-temporal distribution information in chronological order, and the jump connection path reads the spatio-temporal distribution information at intervals of a set jump period length.

The spatio-temporal distribution of PM2.5 is markedly periodic: owing to factors such as human activity, the PM2.5 concentration tends to be high at certain times of day and low at others, and the pattern repeats from day to day. Existing spatio-temporal sequence prediction models all use sequentially connected paths, that is, data features are propagated continuously from one moment to the next. Because of this, images at non-adjacent moments cannot interact directly, so the periodic features of the data are difficult to model explicitly. Since the periodicity of PM2.5 is very pronounced, modeling it explicitly can clearly help make more accurate predictions. Therefore, to better model the periodicity of the PM2.5 distribution, two feature-propagation paths are designed for the model. One is the sequential connection path, in which the sequential connection feature at time t-1 is used to predict the sequential connection feature at time t. The other is the jump connection path, in which the jump connection feature at time t-T is used to predict the jump connection feature at time t, where T is the jump period length.

A fusion unit, configured to fuse the sequential connection feature and the jump connection feature to obtain a fused feature.

A decoding unit, configured to decode the fused feature to obtain a final prediction result. In addition, the decoding unit is also configured to decode the sequential connection feature and the jump connection feature respectively, to obtain a sequential prediction result and a jump prediction result.

In FIG. 1, $H^{seq}_{t-1}$ denotes the feature obtained by the sequential connection path at time t-1, and $H^{skip}_{t-T}$ denotes the feature obtained by the jump connection path at time t-T. $H^{seq}_t$ is the feature obtained by the sequential connection path at time t, and $H^{skip}_t$ is the feature obtained by the jump connection path at time t. $H_t$ is the fused feature at time t, which is decoded to output the prediction result for time t. The relationship between the two paths is further illustrated with reference to FIG. 2.

Take the PM2.5 reanalysis field data in the European Space Agency CAMS data as an example. The time interval of the data is 3 hours, so there are 8 pictures per day: the pictures of the first day are X1 to X8, those of the second day are X9 to X16, and so on. The sequential connection is the path indicated by the open arrows in the figure: the feature of X1 is passed to the moment of X2 and then to the moment of X3. The jump connection is the path indicated by the filled arrows in the figure: the feature of X1 is passed to the moment of X9 and then to the moment of X17, the feature of X2 is passed to the moment of X10 and then to the moment of X18, and so on.
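
The indexing in this example can be sketched in a few lines of Python. This is a minimal illustration, assuming 1-based frame numbers X1, X2, ... as in the text and a jump period of 8 frames (one day at 3-hour spacing); the function names are hypothetical and do not come from the patent.

```python
FRAMES_PER_DAY = 24 // 3  # 3-hour interval -> 8 frames per day


def sequential_predecessor(t):
    """Frame whose feature feeds frame t on the sequential path (None for X1)."""
    return t - 1 if t > 1 else None


def jump_predecessor(t, period=FRAMES_PER_DAY):
    """Frame whose feature feeds frame t on the jump path (None in the first period)."""
    return t - period if t > period else None


def jump_chain(start, last_frame, period=FRAMES_PER_DAY):
    """All frames linked to `start` by repeated jumps, up to `last_frame`."""
    return list(range(start, last_frame + 1, period))


print(jump_chain(1, 24))      # frames linked to X1 over three days
print(jump_predecessor(10))   # X10 receives the jump feature of X2
```

With a different data interval or a weekly period, only the `period` argument would change.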

The system takes as input the PM2.5 spatio-temporal distribution over a continuous period. In the feature extraction unit, the data pass through the jump connection path and the sequential connection path, each of which uses ConvGRU units to extract features. A gated fusion unit, which combines convolution with a self-attention mechanism, then fuses the two extracted features into the fused feature of the current moment. The jump connection feature, the sequential connection feature and the fused feature are each decoded to give three prediction results, which are optimized jointly by gradient descent to train the model in the system; the result decoded from the fused feature is taken as the final prediction result. The output is the PM2.5 spatio-temporal distribution over a subsequent continuous period.

The feature extraction unit feeds the spatio-temporal distribution information into the sequential connection path and the jump connection path to extract features. The sequential connection path extracts continuous time-series features, yielding the continuous trend used to predict subsequent moments. The jump connection path links data from the same time point in different periods: by passing features along this path, the features of the same time point in the previous period are merged into the features of the current moment, so that the pronounced periodic pattern assists and corrects the prediction for the current moment. The fusion unit fuses the features extracted by the two paths, combining the continuous trend and the periodic variation of the PM2.5 spatio-temporal distribution, so that the prediction is more accurate and the method can be used to predict the PM2.5 spatio-temporal distribution over a longer horizon.

The feature extraction units on the sequential connection path and the jump connection path are ConvGRU units, each of which combines a convolutional neural network and a recurrent neural network. ConvGRU has few model parameters and is easy to train, and it models spatio-temporal sequence data well. ConvLSTM, TrajGRU, PredRNN, MIM and similar units may also be used.

In an embodiment of the present invention, as shown in fig. 3, the feature extraction operation process in the ConvGRU unit is as follows:

$$
\begin{aligned}
Z_t &= \sigma\left(W_{xz} * X_t + W_{hz} * H_{t-1}\right),\\
R_t &= \sigma\left(W_{xr} * X_t + W_{hr} * H_{t-1}\right),\\
\tilde{H}_t &= \mathrm{ReLU}\left(W_{xh} * X_t + R_t \circ \left(W_{hh} * H_{t-1}\right)\right),\\
H_t &= \left(1 - Z_t\right) \circ \tilde{H}_t + Z_t \circ H_{t-1},
\end{aligned}
$$

wherein "∗" represents a convolution operation and "∘" represents the Hadamard product. $Z_t$ is the update gate, which decides how much of the previous moment's feature $H_{t-1}$ is carried into the current neuron. $R_t$ is the reset gate, which decides how much of the previous moment's feature $H_{t-1}$ is forgotten at the current moment. $X_t$ is the input image at time t, and $H_{t-1}$ is the feature of the previous moment. The ConvGRU uses the update gate and the reset gate to fuse these two sources of information and obtain the feature $H_t$ of the current moment. $\tilde{H}_t$ represents the hidden-layer candidate value, and $H_t$ represents the feature of the current moment.

$W_{xz}$, $W_{hz}$, $W_{xr}$, $W_{hr}$, $W_{xh}$ and $W_{hh}$ are the convolution kernels of the respective convolution operations, each a 3×3 weight matrix: $W_{xz}$ denotes the parameters that generate $Z_t$ by convolution from $X_t$, $W_{xr}$ denotes the parameters that generate $R_t$ by convolution from $X_t$, and so on. σ represents the Sigmoid activation function, which normalizes values to the interval from 0 to 1. ReLU represents the one-sided suppression (rectified linear) activation function, whose output is the hidden-layer candidate value.
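
As a rough illustration of one ConvGRU step, the sketch below uses plain Python lists for a single-channel feature map. It assumes the commonly used ConvGRU gating form (reset gate applied to the convolved previous feature, update gate weighting the previous feature), zero padding at the borders, and a hypothetical weight dictionary `w` keyed by the kernel subscripts; none of these implementation details are stated in the source.

```python
import math


def conv3x3_same(img, kernel):
    """Zero-padded 3x3 cross-correlation of a 2D list with a 3x3 kernel."""
    height, width = len(img), len(img[0])
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < height and 0 <= x < width:
                        out[i][j] += kernel[di + 1][dj + 1] * img[y][x]
    return out


def elementwise(fn, *grids):
    """Apply fn position by position across equally shaped 2D lists."""
    return [[fn(*vals) for vals in zip(*rows)] for rows in zip(*grids)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def convgru_step(x_t, h_prev, w):
    """One ConvGRU update for a single channel (assumed gating form)."""
    # Update gate Z_t and reset gate R_t, both squashed to (0, 1).
    z_t = elementwise(lambda a, b: sigmoid(a + b),
                      conv3x3_same(x_t, w["xz"]), conv3x3_same(h_prev, w["hz"]))
    r_t = elementwise(lambda a, b: sigmoid(a + b),
                      conv3x3_same(x_t, w["xr"]), conv3x3_same(h_prev, w["hr"]))
    # Hidden-layer candidate: ReLU of input term plus gated previous-feature term.
    h_cand = elementwise(lambda a, r, b: max(0.0, a + r * b),
                         conv3x3_same(x_t, w["xh"]), r_t,
                         conv3x3_same(h_prev, w["hh"]))
    # Blend candidate and previous feature with the update gate.
    return elementwise(lambda z, c, h: (1.0 - z) * c + z * h, z_t, h_cand, h_prev)
```

A practical model would use multi-channel tensors and a deep learning framework; the list-based version only makes the data flow of the four equations explicit.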

In one embodiment of the present invention, the fusion unit includes a convolutional layer and a self-attention layer. The self-attention layer includes a channel self-attention mechanism and a spatial self-attention mechanism and is formed by stacking the two. As shown in FIG. 4, this allows the two features extracted by the jump connection path and the sequential connection path to be fused more effectively.

The sequential connection feature and the jump connection feature are spliced (concatenated), and the splicing result is input into the convolutional layer, which computes multiple weighted sums of the splicing result to obtain the transition feature; the transition feature is then input into the self-attention layer to obtain the output of the fusion unit. Suppose that at the current moment the sequential connection path yields the feature $H^{seq}_t$ and the jump connection path yields the feature $H^{skip}_t$, and that the two have the same shape. Splicing them along the channel dimension gives $[H^{seq}_t, H^{skip}_t]$, which is input into the convolutional layer, composed of multiple 1×1 convolution kernels. A 1×1 convolution kernel computes a weighted sum of the data across channels; with multiple kernels there are multiple weighted sums, so multiple combined features of $[H^{seq}_t, H^{skip}_t]$ can be extracted.
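
The splice-then-1×1-convolution step can be illustrated with a small list-based sketch. It assumes each feature is a list of 2D channels and that the 1×1 kernels are given as a hypothetical `weights` table with one row of per-channel weights per output channel; bias terms are omitted for brevity.

```python
def concat_channels(h_seq, h_skip):
    """Splice two feature maps (lists of 2D channels) along the channel axis."""
    return h_seq + h_skip


def conv1x1(channels, weights):
    """1x1 convolution: each output channel is a weighted sum of input channels.

    channels: list of C_in grids (H x W nested lists)
    weights:  list of C_out rows, each holding C_in scalar weights
    """
    height, width = len(channels[0]), len(channels[0][0])
    return [
        [
            [sum(w * ch[i][j] for w, ch in zip(row, channels)) for j in range(width)]
            for i in range(height)
        ]
        for row in weights
    ]


# Two single-channel 1x2 features, mixed by two 1x1 kernels.
spliced = concat_channels([[[1.0, 2.0]]], [[[10.0, 20.0]]])
transition = conv1x1(spliced, [[0.5, 0.5], [1.0, -1.0]])
print(transition)
```

Each row of `weights` plays the role of one 1×1 kernel, so the number of rows sets the number of channels in the transition feature.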

The transition feature $H^{Conv}_t$ output by the convolutional layer is input into the self-attention layer, which stacks a channel self-attention mechanism and a spatial self-attention mechanism. The channel self-attention mechanism aims to increase the interaction between channels and extract higher-order features. Specifically, the ECA channel self-attention mechanism is used. ECA replaces the fully connected layers of the common channel self-attention mechanism SENet with a one-dimensional convolution across channels, considers only local cross-channel interaction, and adaptively determines the receptive field (kernel size) of the one-dimensional convolution from the number of channels, making it a lightweight and efficient form of channel attention.

In an embodiment of the present invention, the channel self-attention mechanism in the self-attention layer convolves the transition feature, and the formula of the ECA channel self-attention mechanism is:

$$
\mathrm{ECA}(X) = \sigma\left(\mathrm{C1D}_k\left(g(X)\right)\right),
$$

$$
g(X) = \frac{1}{WH} \sum_{i=1}^{W} \sum_{j=1}^{H} X_{ij},
$$

where "∗" denotes a convolution operation, σ denotes the Sigmoid activation function, $\mathrm{C1D}_k$ denotes a one-dimensional convolution across channels, and k is the size of the convolution kernel (k = 5 in this embodiment). g(X) denotes global average pooling (GAP), W and H denote the length and width of the picture respectively, i and j index the ith row and jth column of the picture, and $X_{ij}$ is the feature at the ith row and jth column of the picture at the current time.

In summary, the channel self-attention layer substitutes the result of the preceding step into the ECA channel self-attention formula and superimposes the channel attention result on the original feature. The overall formula is:

$$
\mathrm{attened\_c} = \mathrm{ECA}(X) * X + X,
$$

wherein attened_c represents the output of the channel self-attention mechanism, and X is the transition feature $H^{Conv}_t$ obtained by the convolutional layer.
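
The channel attention step can be sketched as follows. This is a simplified illustration, assuming the feature X is a list of 2D channels, that the one-dimensional convolution is zero-padded at the ends of the channel axis, and that ECA(X) ∗ X means scaling each channel by its gate value (the usual ECA behaviour); the function names are hypothetical.

```python
import math


def global_average_pool(x):
    """g(X): mean over all spatial positions, one value per channel."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]


def conv1d_same(values, kernel):
    """Zero-padded 1D cross-correlation across the channel axis (odd kernel size)."""
    half = len(kernel) // 2
    out = []
    for c in range(len(values)):
        acc = 0.0
        for offset, weight in enumerate(kernel):
            idx = c + offset - half
            if 0 <= idx < len(values):
                acc += weight * values[idx]
        out.append(acc)
    return out


def eca_channel_attention(x, kernel):
    """attened_c = ECA(X) * X + X, with ECA(X) = sigmoid(C1D_k(g(X)))."""
    scores = conv1d_same(global_average_pool(x), kernel)
    gates = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    # Scale each channel by its gate and add the residual (original feature).
    return [
        [[gate * v + v for v in row] for row in ch]
        for gate, ch in zip(gates, x)
    ]
```

Passing a kernel of length 5 corresponds to the k = 5 setting mentioned in this embodiment.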

In one embodiment of the invention, the spatial self-attention mechanism is stacked on the channel self-attention mechanism. The spatial self-attention mechanism adopts the conventional QKV implementation: the output attened_c of the channel self-attention mechanism is convolved with three different 3×3 convolution kernels to obtain Q, K and V, which are substituted into the spatial self-attention formula Attention(Q, K, V). The result, abbreviated attened_st, is superimposed on the input feature in the same way as in the channel attention step. The spatial self-attention mechanism in the self-attention layer thus operates on the output of the channel self-attention mechanism, as expressed by the formula:

$$
\mathrm{fusioned\_X} = \mathrm{Attention}\left(W_q * \mathrm{attened\_c},\ W_k * \mathrm{attened\_c},\ W_v * \mathrm{attened\_c}\right) * \mathrm{attened\_c} + \mathrm{attened\_c},
$$

wherein fusioned_X is the output of the fusion unit, and $W_q$, $W_k$ and $W_v$ are the three 3×3 convolution kernels.
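
A single-channel sketch of the spatial self-attention step is given below. It assumes softmax dot-product attention over all spatial positions, zero-padded 3×3 convolutions for Q, K and V, and that the attention result is multiplied element-wise with attened_c before the residual addition; the source does not spell out these details, so the code is illustrative only and its function names are hypothetical.

```python
import math


def conv3x3_same(img, kernel):
    """Zero-padded 3x3 cross-correlation of a 2D list with a 3x3 kernel."""
    height, width = len(img), len(img[0])
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < height and 0 <= x < width:
                        out[i][j] += kernel[di + 1][dj + 1] * img[y][x]
    return out


def flatten(grid):
    return [v for row in grid for v in row]


def spatial_self_attention(attened_c, w_q, w_k, w_v):
    """fusioned_X = Attention(Q, K, V) * attened_c + attened_c (single channel)."""
    height, width = len(attened_c), len(attened_c[0])
    q = flatten(conv3x3_same(attened_c, w_q))
    k = flatten(conv3x3_same(attened_c, w_k))
    v = flatten(conv3x3_same(attened_c, w_v))
    x = flatten(attened_c)
    out = []
    for q_i, x_i in zip(q, x):
        scores = [q_i * k_j for k_j in k]      # feature dimension is 1 here
        peak = max(scores)                     # subtract max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        attened_st = sum(e / total * v_j for e, v_j in zip(exps, v))
        out.append(attened_st * x_i + x_i)     # superimpose on the input feature
    return [out[r * width:(r + 1) * width] for r in range(height)]
```

Because every position attends to every other position, the cost grows with the square of the number of pixels, which is one reason such layers are usually applied to downsampled feature maps.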

In an embodiment of the present invention, gradient descent optimization is performed on the final prediction result, the sequential prediction result and the jump prediction result, and the optimization result is fed back to the feature extraction unit as a new input, so as to train and optimize the feature extraction unit and the fusion unit. To train the model better, a multi-task learning mechanism is used: a prediction result can be obtained from the sequential connection path, from the jump connection path, and from the final fusion unit. The three prediction results come from different feature propagation paths in the system and draw on different features. All three are valid and differ only in accuracy, with the fusion unit giving the most accurate one. A gradient descent algorithm is applied to the different results to update the parameters on the different paths. Although only the prediction from the fusion unit is finally output, the multi-task mechanism performs gradient descent on the outputs of all three paths together; continuously training and updating the two paths improves the training of the fusion unit and helps it learn its parameters more quickly.
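
One way to picture the joint objective is as a sum of one loss per prediction. The sketch below assumes a mean squared error loss and equal weights for the three terms; the source states only that the three outputs are optimized together by gradient descent, so both choices, and the function names, are illustrative assumptions.

```python
def mse(pred, target):
    """Mean squared error between two equally shaped 2D grids."""
    diffs = [(p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)


def joint_loss(fused_pred, seq_pred, jump_pred, target, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three per-path losses used for one gradient step."""
    losses = (mse(fused_pred, target), mse(seq_pred, target), mse(jump_pred, target))
    return sum(w * l for w, l in zip(weights, losses))


target = [[0.0, 0.0]]
print(joint_loss([[1.0, 1.0]], [[2.0, 0.0]], [[0.0, 0.0]], target))
```

Minimizing this single scalar sends gradients through both paths and the fusion unit at once, which matches the multi-task training idea described above.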

The system exploits the temporal periodicity of the PM2.5 distribution: it builds a jump connection path on the basis of a ConvLSTM model and uses a fusion unit based on convolution and self-attention to combine the PM2.5 spatial distributions at the same phase of different periods with those at the adjacent preceding moments. By modeling periodicity in this way it obtains more accurate predictions, and its accuracy advantage over other spatio-temporal sequence prediction models becomes more pronounced as the prediction horizon lengthens.

Fig. 5 is a flowchart of a method for predicting the spatial-temporal distribution of atmospheric pollutants according to an embodiment of the present invention, which is combined with fig. 1, and includes:

Step 1: acquire the spatio-temporal distribution information of atmospheric pollutants over consecutive time periods.

Step 2: input the spatio-temporal distribution information into a sequential connection path and a jump connection path respectively for feature extraction, so as to obtain a sequential connection feature and a jump connection feature, wherein the sequential connection path reads the information in chronological order and the jump connection path reads it at intervals of a set jump period length.

Step 3: fuse the sequential connection feature and the jump connection feature to obtain a fused feature.

Step 4: decode the fused feature to obtain a final prediction result.
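
Steps 1 to 4 can be outlined as a short loop over the input frames. The sketch below is schematic: the recurrent cells, the fusion function and the decoder are passed in as placeholder callables, frames are indexed from 0, and the initial state `h0` is an assumption rather than something specified in the source.

```python
def run_two_paths(frames, period, cell_seq, cell_skip, fuse, decode, h0=0.0):
    """Run the sequential and jump paths over `frames` and decode the fused feature."""
    h_seq, h_skip, preds = [], [], []
    for t, x in enumerate(frames):
        # Sequential path: previous state comes from time t-1.
        prev_seq = h_seq[t - 1] if t >= 1 else h0
        # Jump path: previous state comes from time t-T (one period earlier).
        prev_skip = h_skip[t - period] if t >= period else h0
        h_seq.append(cell_seq(x, prev_seq))
        h_skip.append(cell_skip(x, prev_skip))
        preds.append(decode(fuse(h_seq[t], h_skip[t])))
    return preds


# Toy run with scalar "frames": each cell simply adds input and previous state.
add = lambda x, h: x + h
preds = run_two_paths([1, 2, 3, 4], period=2, cell_seq=add, cell_skip=add,
                      fuse=lambda a, b: (a, b), decode=lambda f: f)
print(preds)
```

In the toy run the first element of each pair accumulates every frame, while the second accumulates only frames one period apart, which mirrors the difference between the two paths.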

The method is applicable to nationwide PM2.5 spatio-temporal distribution prediction. The sequential connection path extracts continuous time-series features, yielding the continuous trend used to predict subsequent moments. The jump connection path links data from the same time point in different periods: by passing features along this path, the features of the same time point in the previous period are merged into the features of the current moment, so that the pronounced periodic pattern assists and corrects the prediction for the current moment. The features extracted by the two paths are fused by combining the continuous trend and the periodic variation of the PM2.5 spatio-temporal distribution, so that the prediction is more accurate and the method can be used to predict the PM2.5 spatio-temporal distribution over a longer horizon. When the current moment is predicted through the two paths, the model has access to the trend of the PM2.5 concentration distribution over the preceding continuous period, to the PM2.5 concentration level at the same time point in each period, and to the trend of PM2.5 at that same time point over the preceding consecutive days. In addition, PM2.5 is closely tied to human production and daily life, and accurately predicting its content can help citizens, organizations and other relevant bodies make appropriate decisions, thereby reducing the harm caused by pollution.

In an embodiment of the present invention, after the time-space distribution information is respectively input to the sequential connection path and the jump connection path to perform feature extraction, and the sequential connection feature and the jump connection feature are obtained, the method for predicting the time-space distribution of the atmospheric pollutants further includes:

decoding the sequential connection feature and the jump connection feature respectively to obtain a sequential prediction result and a jump prediction result. For short-term prediction, either of these two results can also serve as the final prediction; to predict the PM2.5 spatio-temporal distribution over a longer horizon, however, the two must be combined.

In an embodiment of the present invention, after the decoding the fused features to obtain a final prediction result, the method for predicting the spatial-temporal distribution of the atmospheric pollutants further includes:

performing gradient descent optimization on the final prediction result, the sequential prediction result and the jump prediction result, so as to optimize the feature extraction algorithm and the feature fusion algorithm. The model is trained with the Adam optimizer under a multi-task learning mechanism. During training, the sequential prediction result and the jump prediction result output by the two connection paths are used to train the feature extraction models in the two paths and also to train the fusion model in the fusion unit. Each path and the fusion unit are thus kept updated and optimized, and the improvement of each path carries over to the fusion unit, so that the fusion unit trains better and its fusion model learns faster.

Although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. An atmospheric pollutant spatio-temporal distribution prediction system, comprising:

the decoding unit is used for decoding the fusion characteristics to obtain a final prediction result;

the fusion unit comprises a convolution layer and a self-attention layer, wherein the self-attention layer comprises a channel self-attention mechanism and a space self-attention mechanism, and the channel self-attention mechanism is superposed with the space self-attention mechanism;

and splicing the sequential connection characteristic and the jump connection characteristic, inputting a splicing result into the convolutional layer, calculating a weighted sum of a plurality of splicing results by the convolutional layer to obtain a transition characteristic, and inputting the transition characteristic into the self-attention layer to obtain an output result of the fusion unit.
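The splicing and weighted-sum step can be sketched as a channel concatenation followed by a 1×1 convolution, which computes a per-pixel weighted sum over the concatenated channels. The 1×1 kernel size, the tensor shapes and the function name are assumptions for illustration; the claim does not fix them, and the self-attention layer is omitted here.

```python
import numpy as np


def fuse_concat_conv(seq_feat, jump_feat, weight):
    """Concatenate two (C, H, W) feature maps and mix channels with a 1x1 convolution.

    weight has shape (C_out, 2*C); each output channel is a weighted sum of
    the concatenated input channels at the same pixel.
    """
    stacked = np.concatenate([seq_feat, jump_feat], axis=0)   # (2C, H, W)
    return np.einsum("oc,chw->ohw", weight, stacked)          # (C_out, H, W)


rng = np.random.default_rng(0)
seq_feat = rng.random((8, 16, 16))    # sequential connection feature (toy values)
jump_feat = rng.random((8, 16, 16))   # jump connection feature (toy values)
weight = rng.random((8, 16))          # hypothetical learned 1x1 kernel
transition = fuse_concat_conv(seq_feat, jump_feat, weight)
print(transition.shape)  # (8, 16, 16)
```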

2. The atmospheric pollutant spatial-temporal distribution prediction system of claim 1, wherein the feature extraction units on the sequential connection path and the jump connection path are ConvGRU units, each ConvGRU unit comprising a convolutional neural network and a recurrent neural network.

3. The atmospheric pollutant spatial-temporal distribution prediction system of claim 2, wherein the feature extraction operation process in the ConvGRU unit is as follows:

[formula image not reproduced in the source text]

wherein "∗" represents a convolution operation, "∘" represents a Hadamard product, $Z_t$ is the update gate, $R_t$ is the reset gate, $H_{t-1}$ is the feature at the previous time, $X_t$ is the input image at time t, $\tilde{H}_t$ is the hidden layer candidate value, $H_t$ is the feature at the current time, $W_{xz}$, $W_{hz}$, $W_{xr}$, $W_{hr}$, $W_{xh}$ and $W_{hh}$ are the convolution kernels of the respective convolution operations, σ represents the Sigmoid activation function, and ReLU represents a one-sided suppression activation function whose output is the hidden layer candidate value.
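The formula for this claim is not legible in the text above, so the following is only a sketch of the commonly used ConvGRU update, written with the symbols the claim defines. The exact placement of the reset gate, the absence of bias terms, the single-channel feature maps and the 3×3 kernels are assumptions; `conv2d_same` and `convgru_step` are hypothetical helper names.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_same(x, kernel):
    """Single-channel 'same' 2D cross-correlation with zero padding (odd kernel size)."""
    kh, kw = kernel.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    windows = sliding_window_view(padded, (kh, kw))      # (H, W, kh, kw)
    return np.einsum("hwij,ij->hw", windows, kernel)


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def convgru_step(x_t, h_prev, k):
    """One ConvGRU update; k maps names such as 'xz' to 2D kernels (W_xz, ...)."""
    z = sigmoid(conv2d_same(x_t, k["xz"]) + conv2d_same(h_prev, k["hz"]))  # update gate Z_t
    r = sigmoid(conv2d_same(x_t, k["xr"]) + conv2d_same(h_prev, k["hr"]))  # reset gate R_t
    # Candidate hidden state: ReLU of the input term plus the reset-gated history term.
    cand = np.maximum(0.0, conv2d_same(x_t, k["xh"]) + r * conv2d_same(h_prev, k["hh"]))
    # Blend the candidate with the previous state using the update gate.
    return (1.0 - z) * cand + z * h_prev


rng = np.random.default_rng(0)
kernels = {name: rng.normal(scale=0.1, size=(3, 3))
           for name in ("xz", "hz", "xr", "hr", "xh", "hh")}
h = np.zeros((8, 8))
for x in rng.random((5, 8, 8)):       # five toy 8x8 input frames
    h = convgru_step(x, h, kernels)
print(h.shape)  # (8, 8)
```

Running the same step over frames taken at consecutive times corresponds to the sequential connection path; running it over frames taken one period apart corresponds to the jump connection path.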

4. The atmospheric pollutant spatial-temporal distribution prediction system of claim 1, wherein the channel self-attention mechanism in the self-attention layer convolves the transition features, comprising:

$$\mathrm{attended\_c} = \mathrm{ECA}(X) \ast X + X,$$

$$\mathrm{ECA}(X) = \sigma\big(\mathrm{C1D}_k(g(X))\big),$$

$$g(X) = \frac{1}{WH}\sum_{i=1}^{W}\sum_{j=1}^{H} X_{ij},$$

wherein $\mathrm{attended\_c}$ represents the output result of the channel self-attention mechanism, $X$ represents the transition feature obtained by the convolution layer, "∗" represents a convolution operation, σ represents the Sigmoid activation function, $\mathrm{C1D}_k$ denotes a one-dimensional convolution over the channels, $k$ denotes the size of the convolution kernel, $g(X)$ denotes global average pooling, $W$ and $H$ denote the length and width of the picture, respectively, $i$ and $j$ denote the $i$-th row and $j$-th column of the picture, and $X_{ij}$ denotes the feature at the $i$-th row and $j$-th column of the picture at the current time.
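The channel self-attention step can be sketched as follows: pool each channel to one number, run a small one-dimensional convolution across neighboring channels, squash with a sigmoid, and use the result to reweight the channels before adding the input back. The sketch treats ECA(X) ∗ X as a per-channel scaling, as is usual for efficient channel attention; that reading, the zero padding and the example kernel values are assumptions.

```python
import numpy as np


def eca_attention(x, kernel):
    """x: (C, H, W) features; kernel: 1D weights of odd length k applied across channels."""
    pooled = x.mean(axis=(1, 2))                         # g(X): one value per channel
    k = kernel.shape[0]
    padded = np.pad(pooled, k // 2)                      # zero-pad so the output keeps C entries
    # 1D convolution over neighboring channels (cross-correlation, as in deep-learning libraries).
    mixed = np.correlate(padded, kernel, mode="valid")
    gate = 1.0 / (1.0 + np.exp(-mixed))                  # sigmoid: per-channel weight in (0, 1)
    return gate[:, None, None] * x + x                   # reweight channels, then residual add


rng = np.random.default_rng(0)
x = rng.random((8, 16, 16))                              # toy transition feature
attended_c = eca_attention(x, kernel=np.array([0.25, 0.5, 0.25]))
print(attended_c.shape)  # (8, 16, 16)
```

Because the gate lies strictly between 0 and 1, every output value lies between the input value and twice the input value for non-negative inputs.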

5. The atmospheric pollutant spatial-temporal distribution prediction system of claim 4, wherein the spatial self-attention mechanism in the self-attention layer convolves the output of the channel self-attention mechanism, comprising:

wherein fused_X represents the output result of the fusion unit, W_q, W_k and W_v are each 3×3 convolution kernels, and attended_c represents the output of the channel self-attention mechanism.

6. The atmospheric pollutant spatial-temporal distribution prediction system of claim 1, wherein the decoding unit is further configured to decode the sequential connection feature and the jump connection feature to obtain a sequential prediction result and a jump prediction result, respectively.

7. The atmospheric pollutant spatial-temporal distribution prediction system according to claim 6, wherein gradient descent optimization is performed on the obtained final prediction result, sequential prediction result and jump prediction result, the optimization result is updated to the feature extraction unit as a new input, and optimization training is performed on the feature extraction unit and the fusion unit.

8. A method for predicting the space-time distribution of atmospheric pollutants is characterized by comprising the following steps:

inputting the space-time distribution information into a sequential connection path and a jump connection path respectively to perform feature extraction to obtain sequential connection features and jump connection features, wherein the sequential connection path reads the space-time distribution information in sequence according to a time sequence, and the jump connection path reads the space-time distribution information at intervals according to a set jump period length;

decoding the fusion features to obtain a final prediction result;

and splicing the sequential connection features and the jump connection features, inputting splicing results into a convolutional layer, calculating the weighted sum of a plurality of splicing results by the convolutional layer to obtain transition features, and inputting the transition features into a self-attention layer to obtain the fusion features.