CN113762351B

CN113762351B - Air quality prediction method based on deep transition network

Info

Publication number: CN113762351B
Application number: CN202110923976.6A
Authority: CN
Inventors: 欧阳继红; 杨智尧; 王艺蒙; 曲延非; 李嘉寅; 毕夏旭; 王兵
Original assignee: Jilin University
Current assignee: Jilin University
Priority date: 2021-08-12
Filing date: 2021-08-12
Publication date: 2023-12-05
Anticipated expiration: 2041-08-12
Also published as: CN113762351A

Abstract

The invention discloses an air quality prediction method based on a deep transition network. It provides an air quality prediction model (AI-DTN) based on auxiliary information and the deep transition network. The model comprises two transition networks running in opposite directions, which extract feature information from the two time-sequence directions respectively to strengthen feature extraction. Each transition network in the AI-DTN consists of an auxiliary-information-fused gated recurrent unit (AI-GRU), which extracts spatial features, and existing transition gated recurrent units (T-GRU), which extract temporal features. Of the two auxiliary-information gates in the AI-GRU, one controls the degree to which auxiliary information flows into the gated recurrent unit and the other controls the degree of fusion between PM2.5 and the auxiliary information; this gating mechanism avoids mutual interference during information fusion.

Description

Air quality prediction method based on deep transition network

Technical Field

The invention relates to the technical field of data processing, in particular to an air quality prediction method based on a deep transition network.

Background

Many factors affect air quality: pollutants such as NO and CO, automobile exhaust, industrial emissions, and weather information such as wind speed, wind direction and rainfall. These are collectively referred to as auxiliary information. Using this auxiliary information for air quality prediction is difficult for two reasons. First, it is hard to obtain all of it accurately. For pollution information, real-time automobile exhaust and industrial emission data are difficult to obtain in full; for weather information, forecasts deviate from actual conditions, so using forecast information would cause errors to accumulate. Current air quality prediction therefore mostly uses pollutant information such as NO and CO together with past weather information. Second, complex interactions occur among pollutants and between pollutants and meteorological information, and these interactions cannot all be modeled completely. Making good use of the auxiliary information that is available, and identifying the different roles that each kind of auxiliary information plays in PM2.5 prediction, is therefore of great importance.

Most current air quality prediction models use a CNN to extract spatial features among the input features and an RNN to extract temporal features, or apply an attention mechanism directly to extract spatial features and then use an RNN or LSTM network to extract temporal features. However, these methods are not sufficiently capable of extracting latent features when auxiliary information is given.

Disclosure of Invention

Aiming at the defects of the prior art, the invention aims to provide an air quality prediction method based on a deep transition network so as to improve the accuracy of air quality prediction.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

An air quality prediction method based on a deep transition network, comprising the following specific steps:

S1, acquiring air quality time series data and preprocessing it;

S2, performing air quality prediction using the air quality prediction model AI-DTN, which is based on auxiliary information and a deep transition network:

the air quality prediction model AI-DTN consists of a forward deep transition network, a reverse deep transition network and a fully connected layer; the forward and reverse deep transition networks extract spatial and temporal features, the results of the two transition networks are then concatenated, and the fully connected layer finally produces the output;

the depth of each deep transition network is L; the first layer of the deep transition network is the gated recurrent unit AI-GRU, which extracts the spatial features of the input; the second to L-th layers of the deep transition network consist of transition gated recurrent units T-GRU, and the output of the L-th layer T-GRU at time t is the input of the first-layer AI-GRU at time t+1;

the detailed calculation process of the air quality prediction model AI-DTN is as follows:

the input of the model is divided into two parts: the first part is the PM2.5 time series over a historical time window of size q, denoted $X_t = \{x_{t-q+1}, \ldots, x_t\}$, where $X_t$ is a matrix of dimension $1 \times q$; the second part is the time series of auxiliary information over a historical time window of size q, denoted $A_t = \{a_{t-q+1}, \ldots, a_t\}$, where $A_t$ is a matrix of dimension $n \times q$, each of $a_{t-q+1}, \ldots, a_t$ is an $n \times 1$ matrix, and n is the number of features in the auxiliary information;

in the forward deep transition network, $X_t$ and $A_t$ are first input into the AI-GRU to obtain the hidden state of the first layer of the deep transition network:

where L denotes the number of layers of the deep transition network; this hidden state is a weighted fusion of the PM2.5 information and the auxiliary information and represents the spatial feature information at time t; the hidden state is then passed to the T-GRU of the next layer within the same time step, whose hidden state is as follows:

where i denotes the current network depth; the T-GRU takes as input only the hidden state of the previous layer, and the hidden state of the last-layer T-GRU is passed to the AI-GRU of the next time step as input;

likewise, the reverse deep transition network performs reverse feature extraction on the two time series $X_t$ and $A_t$, yielding hidden states that represent the reverse time-sequence information:

the hidden states of the forward and reverse deep transition networks are then concatenated in time-step order:

where the semicolon (;) denotes the concatenation operation; $E_t$ now contains the spatial and temporal feature information extracted by the deep transition networks from both the forward and the reverse time sequences; finally, $E_t$ is input to the fully connected layer to obtain the final prediction:

$Y_t = W E_t + b$;

where W is the parameter matrix and b is the bias term.

Further, in step S1, the specific preprocessing procedure is:

S1.1, missing value handling: missing values in the original air quality time series data are filled using the Lagrange interpolation method;

S1.2, normalization: min-max normalization is applied as a linear transformation to the data after missing value handling, so that the resulting values are mapped into [0, 1].

Further, in step S2, for time step t, the hidden state $h_t$ of the AI-GRU network is calculated as follows:

where ⊙ denotes element-wise multiplication; $h_t$ uses the update gate $z_t$ of the current time step to select and combine information from the hidden state $h_{t-1}$ of the previous time step and the candidate hidden state of the current time step;

$z_t$ is the update gate, with values in (0, 1): the closer the value is to 0, the more historical information is discarded and the less new information is added at the current time step; the closer the value is to 1, the less information from past time steps is discarded and the more new information is added at the current time step; the update gate $z_t$ is calculated as follows:

$z_t = \sigma(W_{xz} x_t + W_{hz} h_{t-1} + W_{az} a_t)$;

$W_{xz}$, $W_{az}$ and $W_{hz}$ are weight matrices; the candidate hidden state of the current time step uses the gating mechanism to selectively add the PM2.5 information $x_t$ of the current time step, the auxiliary information $a_t$ and the hidden state $h_{t-1}$ of the previous time step into the AI-GRU; the candidate hidden state is calculated as follows:

$r_t$ denotes the reset gate, $l_t$ the gate of the linear transformation, $g_t$ the gate of the auxiliary information, $p_t$ the gate controlling the degree of fusion of the auxiliary information and the PM2.5 information, and $H(x)$ the linear transformation of PM2.5; the tanh activation function scales the data to [-1, 1], and the linearly transformed information is finally added to obtain the candidate hidden state; $r_t$, $l_t$, $g_t$, $p_t$ and $H(x)$ are calculated as follows:

$r_t = \sigma(W_{xr} x_t + W_{hr} h_{t-1})$ (7);

$l_t = \sigma(W_{xl} x_t + W_{hl} h_{t-1})$ (8);

$g_t = \sigma(W_{ag} a_t + W_{hg} h_{t-1})$ (9);

$p_t = \sigma(W_{ap} a_t + W_{hp} h_{t-1})$ (10);

$H(x_t) = W_x x_t$ (11);

in the above formulas, $W_{xr}$, $W_{hr}$, $W_{xl}$, $W_{hl}$, $W_{ag}$, $W_{hg}$, $W_{ap}$, $W_{hp}$ and $W_x$ are weight matrices; $r_t$ is the reset gate, which controls the historical information; in the calculation of the candidate hidden state, $r_t$ is multiplied element-wise with $h_{t-1}$, which contains all historical information up to the previous time step; $r_t$ takes values in (0, 1): the closer the value is to 0, the less historical information flows into the AI-GRU, and the closer it is to 1, the more historical information flows in, so that historical information irrelevant to the prediction can be discarded in time;

$g_t$ and $p_t$ apply a nonlinear transformation to $a_t$ and $h_{t-1}$; $g_t$ extracts the auxiliary information useful for PM2.5 and controls the degree to which auxiliary information flows into the AI-GRU; it takes values in (0, 1): the closer the value is to 0, the less auxiliary information flows into the AI-GRU, and the closer it is to 1, the more auxiliary information flows in; $p_t$ fuses the auxiliary information with the PM2.5 information and controls the degree of this fusion; it takes values in (0, 1): the closer the value is to 0, the smaller the degree of fusion, and the closer it is to 1, the larger the degree of fusion;

$l_t$ is the gate of the linear transformation $H(x)$ and controls the degree to which the linearly transformed PM2.5 information flows into the AI-GRU; it takes values in (0, 1): the closer the value is to 0, the less PM2.5 information flows into the AI-GRU, and the closer it is to 1, the more PM2.5 information flows in; $H(x)$ is a linear transformation of the PM2.5 information alone, which makes the AI-GRU pay more attention to the PM2.5 information;

through the reset gate $r_t$, the auxiliary-information gate $g_t$, the fusion gate $p_t$ and the linear-transformation gate $l_t$, the AI-GRU effectively controls the degree to which each kind of auxiliary information influences the air quality prediction; at the same time, the gating mechanism selectively adds the auxiliary information, PM2.5 information and historical information that contribute positively to the prediction into the AI-GRU, and discards information irrelevant to the prediction in time.
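The gate formulas above translate almost directly into array code. The NumPy sketch below evaluates the update gate and formulas (7) to (11) for a single time step. The hidden size `d`, the random initialization, and the names `ai_gru_gates` and `init_weights` are assumptions made for this illustration; the candidate hidden state is deliberately left out because its formula is not reproduced in this text.

```python
import numpy as np

def sigmoid(v):
    """Logistic function; maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-v))

def init_weights(n_aux, d, seed=0):
    """Random weights for illustration only (hypothetical hidden size d)."""
    rng = np.random.default_rng(seed)
    shapes = {
        "xz": (d, 1), "hz": (d, d), "az": (d, n_aux),  # update gate
        "xr": (d, 1), "hr": (d, d),                    # reset gate, formula (7)
        "xl": (d, 1), "hl": (d, d),                    # linear-transformation gate, (8)
        "ag": (d, n_aux), "hg": (d, d),                # auxiliary-information gate, (9)
        "ap": (d, n_aux), "hp": (d, d),                # fusion gate, (10)
        "x": (d, 1),                                   # linear transformation H, (11)
    }
    return {name: rng.normal(scale=0.1, size=shape) for name, shape in shapes.items()}

def ai_gru_gates(x_t, a_t, h_prev, W):
    """Evaluate z_t, r_t, l_t, g_t, p_t and H(x_t) for one time step.

    x_t: PM2.5 value as a length-1 vector, a_t: length-n auxiliary vector,
    h_prev: length-d hidden state of the previous time step.
    """
    z = sigmoid(W["xz"] @ x_t + W["hz"] @ h_prev + W["az"] @ a_t)
    r = sigmoid(W["xr"] @ x_t + W["hr"] @ h_prev)
    l = sigmoid(W["xl"] @ x_t + W["hl"] @ h_prev)
    g = sigmoid(W["ag"] @ a_t + W["hg"] @ h_prev)
    p = sigmoid(W["ap"] @ a_t + W["hp"] @ h_prev)
    H = W["x"] @ x_t
    return z, r, l, g, p, H
```

Because every gate passes through the sigmoid, each component lies strictly between 0 and 1, which is what allows the gates to act as soft switches on the information they multiply.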

Further, the hidden state of the T-GRU is calculated as follows:

where ⊙ denotes element-wise multiplication and i denotes the depth of the current transition network; $z_t$ is the update gate, with values in (0, 1): the closer the value is to 0, the more historical information is discarded and the less new information the current network layer adds; the closer the value is to 1, the less information from the previous network layer is discarded and the more new information the current network layer adds; it is calculated as follows:

the candidate hidden state of the T-GRU is obtained by using the reset gate $r_t$ to process the hidden state of the previous network layer, and is calculated as follows:

$r_t$ is the reset gate, which controls the historical information; it takes values in (0, 1): the closer the value is to 0, the less historical information flows into the T-GRU, and the closer it is to 1, the more historical information flows in, so that historical information irrelevant to the prediction can be cleared in time; it is calculated as follows:

the T-GRU receives only the hidden state passed from the previous layer within the same time step, so it can learn the particular nonlinear relationship between consecutive hidden states and thereby obtain deeper state representations.

The beneficial effects of the invention are as follows: in order to extract deep spatial and temporal features of air quality data, the invention provides an air quality prediction model (AI-DTN) based on auxiliary information and deep transition networks. The model comprises two transition networks running in opposite directions, which extract feature information from the two time-sequence directions respectively to strengthen feature extraction. Each transition network in the AI-DTN consists of an auxiliary-information-fused gated recurrent unit (AI-GRU), which extracts spatial features, and existing transition gated recurrent units (T-GRU), which extract temporal features. Of the two auxiliary-information gates in the AI-GRU, one controls the degree to which auxiliary information flows into the gated recurrent unit and the other controls the degree of fusion between PM2.5 and the auxiliary information; this gating mechanism avoids mutual interference during information fusion.

Drawings

FIG. 1 is a schematic diagram of a process according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the structure of an air quality prediction model AI-DTN according to an embodiment of the invention;

FIG. 3 is a block diagram of an AI-GRU in accordance with an embodiment of the invention;

FIG. 4 is a block diagram of a T-GRU in an embodiment of the invention.

Detailed Description

The present invention will be further described with reference to the accompanying drawings, and it should be noted that, on the premise of the present technical solution, the present embodiment provides a detailed implementation manner and a specific operation procedure, but the protection scope of the present invention is not limited to the present embodiment.

The embodiment provides an air quality prediction method based on a deep transition network, as shown in fig. 1, which specifically comprises the following steps:

S1, acquiring air quality time series data and preprocessing it, wherein the preprocessing comprises:

S1.1, missing value handling:

The original air quality time series data contain a large amount of incomplete, inconsistent, abnormal and biased data, and such problematic data affect the accuracy of air quality prediction. Data preprocessing is therefore indispensable, and handling the missing values of the data set is one of its most common tasks.

Missing value handling falls into two categories: deleting the records with missing data, and interpolating the data. The biggest limitation of the former is that it trades a reduction in historical data for completeness, which wastes a large amount of data; especially when the data set is small, deleting records may directly affect the objectivity and accuracy of the analysis results. This embodiment therefore handles missing values using the Lagrange interpolation method.

The definition of Lagrange interpolation is as follows:

For a polynomial function, suppose k+1 value points $(x_0, y_0), \ldots, (x_k, y_k)$ are given, where $x_j$ is the position of the independent variable and $y_j$ is the value of the function at that position.

Assuming that any two $x_j$ are distinct, the Lagrange interpolation polynomial obtained from the Lagrange interpolation formula is:

$L(x) = \sum_{j=0}^{k} y_j \, l_j(x)$

where $l_j(x)$ is the Lagrange basis polynomial (or interpolation basis function), whose expression is:

$l_j(x) = \prod_{i=0,\, i \neq j}^{k} \frac{x - x_i}{x_j - x_i}$

The Lagrange basis polynomial $l_j(x)$ takes the value 1 at $x_j$ and the value 0 at every other point $x_i$, $i \neq j$.
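The definition above can be sketched in NumPy as follows. `lagrange_interpolate` evaluates the interpolation polynomial from the basis polynomials; `fill_missing` applies it to a series with NaN gaps. The choice of using up to `k` valid neighbours on each side of a gap is an assumption of this sketch, since the embodiment does not state how the support points are selected.

```python
import numpy as np

def lagrange_interpolate(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs[j], ys[j]) at x."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    total = 0.0
    for j in range(len(xs)):
        others = np.delete(xs, j)
        # Basis polynomial l_j(x): equals 1 at xs[j] and 0 at every other node.
        basis = np.prod((x - others) / (xs[j] - others))
        total += ys[j] * basis
    return total

def fill_missing(series, k=2):
    """Fill NaN entries from up to k valid neighbours on each side.

    The neighbour count k is a hypothetical parameter, not taken from the patent.
    """
    values = np.asarray(series, dtype=float)
    filled = values.copy()
    valid = np.flatnonzero(~np.isnan(values))
    for idx in np.flatnonzero(np.isnan(values)):
        left = valid[valid < idx][-k:]
        right = valid[valid > idx][:k]
        support = np.concatenate([left, right])
        filled[idx] = lagrange_interpolate(support, values[support], idx)
    return filled
```

Keeping the support small is a deliberate choice in this sketch: high-degree interpolation polynomials over many points tend to oscillate, so a few neighbouring observations are usually a safer basis for filling a single gap.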

S1.2, normalization:

Data normalization is a basic task before an air quality model is built. Different features often have different dimensions and units, which affects the results of data analysis; to eliminate the influence of dimension among features, the data must be normalized so that the data indicators become comparable. After normalization, all features are on the same order of magnitude and are suitable for comprehensive comparison and evaluation. The normalization method adopted in this embodiment is min-max normalization, a linear transformation of the data after missing value handling that maps the resulting values into [0, 1]. The transformation function is as follows:

$x^* = \frac{x - \min}{\max - \min}$

where $x^*$ is the normalized data, x is the data after missing value handling, and max and min are the maximum and minimum values of the data after missing value handling.
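A minimal NumPy sketch of this min-max transformation is given below. It normalizes each feature column separately and returns the column minima and maxima so that predictions can be mapped back to the original scale. The function names and the guard for constant columns are additions for this illustration and are not part of the described method.

```python
import numpy as np

def min_max_normalize(data):
    """Scale each column of `data` linearly into [0, 1].

    Returns the scaled array plus the per-column minimum and maximum.
    """
    data = np.asarray(data, dtype=float)
    lo = data.min(axis=0)
    hi = data.max(axis=0)
    # Added safeguard: a constant column would otherwise divide by zero.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (data - lo) / span, lo, hi

def min_max_restore(scaled, lo, hi):
    """Invert the transformation to recover values on the original scale."""
    return np.asarray(scaled, dtype=float) * (hi - lo) + lo
```

In practice the minimum and maximum would normally be computed on the training portion of the series only and then reused for validation and test data, so that no information from later observations leaks into the scaling.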

S2, air quality prediction:

To address the problem of how to distinguish the contribution of different auxiliary information to PM2.5 prediction, this embodiment provides AI-DTN (Auxiliary Information-Deep Transition Network), an air quality prediction model based on auxiliary information and a deep transition network.

The air quality prediction model AI-DTN consists of a forward deep transition network, a reverse deep transition network and a fully connected layer. The forward and reverse deep transition networks extract spatial and temporal features, the results of the two transition networks are then concatenated, and the fully connected layer finally produces the output. The model structure is shown in FIG. 2.

The core of the AI-DTN model is its two deep transition networks, one forward and one reverse, each of depth L. For a feed-forward neural network, network depth refers to the number of nonlinear layers between input and output, whereas for a recurrent neural network (RNN), network depth refers to the number of nonlinear layers within one time step.

The first layer of the deep transition network is the gated recurrent unit AI-GRU, which extracts the spatial features of the input. The second to L-th layers consist of transition gated recurrent units (T-GRU). The T-GRU is an important component of the deep transition network and can extract deeper information from the hidden state of the recurrent neural network. The output of the L-th layer T-GRU at time t is the input of the first-layer AI-GRU at time t+1.
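The layer ordering described above can be sketched as a short loop. The code only illustrates the data flow of one deep transition network: `first_cell` stands in for the AI-GRU and `transition_cells` for the T-GRU layers 2 to L. Both are passed in as plain callables, and their names and signatures are assumptions for this example rather than the cells of the invention.

```python
import numpy as np

def run_deep_transition(xs, aux, first_cell, transition_cells, h0):
    """Run one deep transition network over a window of q time steps.

    xs:  array of shape (q, 1) with PM2.5 values
    aux: array of shape (q, n) with auxiliary features
    first_cell(x_t, a_t, h) -> h   plays the role of the AI-GRU (layer 1)
    each cell in transition_cells maps h -> h (layers 2..L, the T-GRU role)
    """
    h = h0
    outputs = []
    for x_t, a_t in zip(xs, aux):
        h = first_cell(x_t, a_t, h)      # layer 1 consumes the inputs
        for cell in transition_cells:    # deeper layers see only the state below
            h = cell(h)
        outputs.append(h)                # L-th layer state, reused at time t+1
    return np.stack(outputs)
```

For the reverse network the same function could be called on `xs[::-1]` and `aux[::-1]`, which is one straightforward way to obtain the reverse time-sequence direction.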

AI-GRU and T-GRU are described in further detail below:

1. Auxiliary-information-fused gated recurrent unit AI-GRU

To extract deeper spatial features and to distinguish how important different auxiliary information is for PM2.5 prediction, this embodiment provides the AI-GRU (Auxiliary Information-GRU), a gated recurrent unit fused with auxiliary information. The AI-GRU is inspired by AGDT (reference: Liang Y, Meng F, Zhang J, et al. A novel aspect-guided deep transition model for aspect based sentiment analysis [J]. arXiv preprint arXiv:1909.00324, 2019.). It is a recurrent neural network unit that uses the gating characteristics of the GRU and adds gates for auxiliary information on top of them. The AI-GRU takes PM2.5 information as input and uses auxiliary information to assist the air quality prediction; it fuses the two kinds of information while controlling both the degree to which the auxiliary information is input and the degree to which the PM2.5 information and the auxiliary information are fused. Through its gating mechanism, the AI-GRU dynamically adjusts the weight of each kind of auxiliary information and thereby identifies how important different auxiliary information is for PM2.5 prediction.

The output structure of the AI-GRU is the same as that of the GRU, but the input structure differs: the AI-GRU adds an input for auxiliary information. The AI-GRU combines $x_t$ and $a_t$ of the current time step with the hidden state $h_{t-1}$ of the previous time step to obtain the hidden state $h_t$ of the current time step, which contains information from all previous time steps. The hidden state $h_t$ is the output of the AI-GRU. The structure of the AI-GRU is shown in FIG. 3.

For time step t, the hidden state $h_t$ of the AI-GRU network is given by formula (4).

Here ⊙ denotes element-wise multiplication; $h_t$ uses the update gate $z_t$ of the current time step to select and combine information from the hidden state $h_{t-1}$ of the previous time step and the candidate hidden state of the current time step.

$z_t$ is the update gate, with values in (0, 1): the closer the value is to 0, the more historical information is discarded and the less new information is added at the current time step; the closer the value is to 1, the less information from past time steps is discarded and the more new information is added at the current time step. The update gate $z_t$ is given by formula (5):

$z_t = \sigma(W_{xz} x_t + W_{hz} h_{t-1} + W_{az} a_t)$ (5);

$W_{xz}$, $W_{az}$ and $W_{hz}$ are weight matrices whose values are updated automatically by gradient descent during training. The candidate hidden state of the current time step is, simply put, what is used to update the new hidden state. It uses the gating mechanism to selectively add the PM2.5 information $x_t$ of the current time step, the auxiliary information $a_t$ and the hidden state $h_{t-1}$ of the previous time step into the AI-GRU;

The candidate hidden state is given by formula (6):

$r_t$ denotes the reset gate, given by formula (7). $l_t$ denotes the gate of the linear transformation, given by formula (8). $g_t$ denotes the gate of the auxiliary information, given by formula (9). $p_t$ denotes the gate controlling the degree of fusion of the auxiliary information and the PM2.5 information, given by formula (10). $H(x)$ denotes the linear transformation of PM2.5, given by formula (11). The tanh activation function scales the data to [-1, 1], and the linearly transformed information is finally added to obtain the candidate hidden state.
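As a reading aid, the combination described here has the shape of the usual GRU interpolation between the previous state and the candidate state. The block below is a sketch under that assumption, not the formula of the original filing: the symbol $\tilde{h}_t$ for the candidate hidden state and the placement of $z_t$ on the candidate term are assumptions, chosen to agree with the statement that a value of $z_t$ closer to 1 admits more newly added information.

```latex
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
```

For example, in a component where $z_t = 0.25$, $h_{t-1} = 0.8$ and $\tilde{h}_t = 0.4$, this form gives $h_t = 0.75 \cdot 0.8 + 0.25 \cdot 0.4 = 0.7$, so the new state stays close to the previous one.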

$r_t = \sigma(W_{xr} x_t + W_{hr} h_{t-1})$ (7);

$l_t = \sigma(W_{xl} x_t + W_{hl} h_{t-1})$ (8);

$g_t = \sigma(W_{ag} a_t + W_{hg} h_{t-1})$ (9);

$p_t = \sigma(W_{ap} a_t + W_{hp} h_{t-1})$ (10);

$H(x_t) = W_x x_t$ (11);

In the above formulas, $W_{xr}$, $W_{hr}$, $W_{xl}$, $W_{hl}$, $W_{ag}$, $W_{hg}$, $W_{ap}$, $W_{hp}$ and $W_x$ are weight matrices whose values are updated automatically by gradient descent during training. $r_t$ is the reset gate, which controls the historical information. In the calculation of the candidate hidden state, $r_t$ is multiplied element-wise with $h_{t-1}$, which contains all historical information up to the previous time step. $r_t$ takes values in (0, 1): the closer the value is to 0, the less historical information flows into the AI-GRU, and the closer it is to 1, the more historical information flows in, so that historical information irrelevant to the prediction can be discarded in time.

$g_t$ and $p_t$ apply a nonlinear transformation to $a_t$ and $h_{t-1}$. $g_t$ extracts the auxiliary information useful for PM2.5 and controls the degree to which auxiliary information flows into the AI-GRU. It takes values in (0, 1): the closer the value is to 0, the less auxiliary information flows into the AI-GRU, and the closer it is to 1, the more auxiliary information flows in. $p_t$ fuses the auxiliary information with the PM2.5 information and controls the degree of this fusion. It takes values in (0, 1): the closer the value is to 0, the smaller the degree of fusion, and the closer it is to 1, the larger the degree of fusion.

$l_t$ is the gate of the linear transformation $H(x)$ and controls the degree to which the linearly transformed PM2.5 information flows into the AI-GRU. It takes values in (0, 1): the closer the value is to 0, the less PM2.5 information flows into the AI-GRU, and the closer it is to 1, the more PM2.5 information flows in. $H(x)$ is a linear transformation of the PM2.5 information alone, which makes the AI-GRU pay more attention to the PM2.5 information.

Through the reset gate $r_t$, the auxiliary-information gate $g_t$, the fusion gate $p_t$ and the linear-transformation gate $l_t$, the AI-GRU effectively controls the degree to which each kind of auxiliary information influences the air quality prediction. At the same time, the gating mechanism selectively adds the auxiliary information, PM2.5 information and historical information that contribute positively to the prediction into the AI-GRU, and discards information irrelevant to the prediction in time.

2. Transition gated recurrent unit T-GRU

The transition gated recurrent unit (T-GRU) (reference: Pascanu R, Gulcehre C, Cho K, et al. How to construct deep recurrent neural networks [J]. arXiv preprint arXiv:1312.6026, 2013.) is an important component of the deep transition network; T-GRUs are typically used once the depth of the transition network is greater than 2. The input of the T-GRU is only the hidden state of the previous layer at the same time step, and its output is the hidden state of the current network layer at the same time step. The structure of the T-GRU is shown in FIG. 4.

The hidden state of the T-GRU is given by formula (12).

Here ⊙ denotes element-wise multiplication and i denotes the depth of the current transition network. $z_t$ is the update gate, with values in (0, 1): the closer the value is to 0, the more historical information is discarded and the less new information the current network layer adds; the closer the value is to 1, the less information from the previous network layer is discarded and the more new information the current network layer adds. It is given by formula (13).

The candidate hidden state of the T-GRU is obtained by using the reset gate $r_t$ to process the hidden state of the previous network layer, and is given by formula (14).

$r_t$ is the reset gate, which controls the historical information. It takes values in (0, 1): the closer the value is to 0, the less historical information flows into the T-GRU, and the closer it is to 1, the more historical information flows in, so that historical information irrelevant to the prediction can be cleared in time. It is given by formula (15).
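Formulas (12) to (15) are not reproduced in this text, so the NumPy sketch below falls back on a form of the T-GRU that is common in the deep transition literature: both gates and the candidate state are computed from the hidden state of the layer below only. The exact expressions, the weight names `W_z`, `W_r`, `W_h` and the placement of the update gate on the candidate term are assumptions of this sketch and may differ from the patented formulas.

```python
import numpy as np

def sigmoid(v):
    """Logistic function; maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-v))

def t_gru_step(h_below, W_z, W_r, W_h):
    """One transition step at a fixed time step (assumed form, see lead-in).

    h_below: hidden state of the previous layer, shape (d,)
    W_z, W_r, W_h: weight matrices of shape (d, d)
    """
    z = sigmoid(W_z @ h_below)                  # update gate
    r = sigmoid(W_r @ h_below)                  # reset gate
    candidate = np.tanh(r * (W_h @ h_below))    # candidate hidden state
    # Interpolate between the state from the layer below and the candidate.
    return (1.0 - z) * h_below + z * candidate
```

Note that the step takes no external input: unlike the AI-GRU, it never sees $x_t$ or $a_t$, which matches the description that the T-GRU receives only the hidden state of the previous layer.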

Further, the detailed calculation process of the air quality prediction model AI-DTN is as follows:

The input of the model is divided into two parts. The first part is the PM2.5 time series over a historical time window of size q, denoted $X_t = \{x_{t-q+1}, \ldots, x_t\}$, where $X_t$ is a matrix of dimension $1 \times q$. The second part is the time series of auxiliary information over a historical time window of size q, denoted $A_t = \{a_{t-q+1}, \ldots, a_t\}$, where $A_t$ is a matrix of dimension $n \times q$, each of $a_{t-q+1}, \ldots, a_t$ is an $n \times 1$ matrix, and n is the number of features in the auxiliary information.
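The construction of the two input matrices can be sketched with a sliding window over the preprocessed series. The function name `make_windows` and the layout of the raw arrays (one row per time step) are assumptions for this illustration.

```python
import numpy as np

def make_windows(pm25, aux, q):
    """Build X_t (1 x q) and A_t (n x q) for every t with a full history.

    pm25: sequence of length T with PM2.5 values
    aux:  array of shape (T, n) with auxiliary features per time step
    Returns X of shape (T-q+1, 1, q) and A of shape (T-q+1, n, q).
    """
    pm25 = np.asarray(pm25, dtype=float)
    aux = np.asarray(aux, dtype=float)
    T = len(pm25)
    # Window for time t covers indices t-q+1 .. t inclusive.
    X = np.stack([pm25[t - q + 1 : t + 1][None, :] for t in range(q - 1, T)])
    A = np.stack([aux[t - q + 1 : t + 1].T for t in range(q - 1, T)])
    return X, A
```

Each column of a window in `A` corresponds to one $a_j$, an $n \times 1$ vector of auxiliary features, and the matching column in `X` holds the PM2.5 value of the same time step.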

Since the two deep transition networks work on the same principle, the calculation process is described below using the forward deep transition network as an example. First, $X_t$ and $A_t$ are input into the AI-GRU to obtain the hidden state of the first layer of the deep transition network:

where L denotes the number of layers of the deep transition network; this hidden state is a weighted fusion of the PM2.5 information and the auxiliary information and represents the spatial feature information at time t; the hidden state is then passed to the T-GRU of the next layer within the same time step, whose hidden state is as follows:

where i denotes the current network depth; the T-GRU takes as input only the hidden state of the previous layer, and the hidden state of the last-layer T-GRU is passed to the AI-GRU of the next time step as input.

Likewise, the reverse deep transition network works on the same principle as the forward one; it performs reverse feature extraction on the two time series $X_t$ and $A_t$ to obtain hidden states representing the reverse time-sequence information. The hidden states of the forward and reverse deep transition networks are then concatenated in time-step order:

where the semicolon (;) denotes the concatenation operation. $E_t$ now contains the spatial and temporal feature information extracted by the deep transition networks from both the forward and the reverse time sequences. Finally, $E_t$ is input to the fully connected layer to obtain the final prediction:

$Y_t = W E_t + b$ (19);

where W is the parameter matrix and b is the bias term.
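The final concatenation and output step can be sketched as follows. The function name `predict` and the vector shapes are assumptions for the example; the forward and reverse hidden states are taken as given, for instance from the last layer of each deep transition network.

```python
import numpy as np

def predict(h_forward, h_reverse, W, b):
    """Concatenate both directions and apply the fully connected layer.

    h_forward, h_reverse: hidden states of shape (d,)
    W: parameter matrix of shape (out_dim, 2 * d); b: bias of shape (out_dim,)
    """
    E_t = np.concatenate([h_forward, h_reverse])  # the ";" operation
    return W @ E_t + b                            # Y_t = W E_t + b
```

Because $E_t$ stacks two hidden states of size d, the parameter matrix needs 2d columns; a single output row yields one predicted PM2.5 value per time step.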

Various modifications and variations of the present invention will be apparent to those skilled in the art in light of the foregoing teachings and are intended to be included within the scope of the following claims.

Claims

1. The air quality prediction method based on the deep transition network is characterized by comprising the following specific processes of:

s1, acquiring air quality time sequence data and preprocessing;

the input of the model is divided into two parts, the first part being a PM2.5 time series representing a historical time window size q, denoted X _t ＝{x _t-q+1 ，...，x _t }，X _t Is a matrix of dimension 1*q, and the second part is a time sequence representing auxiliary information of historical time window size q, denoted as A _t ＝{a _t-q+1 ，...，a _t }，A _t Is a matrix with dimension of n x q, A _t Each a of (a) _t-q+1 ，…，a _t Are all a matrix of n 1, where n represents the number of features in the auxiliary information;

likewise, a reverse deep transition network is used to perform reverse feature extraction on the two time series X_t and A_t to obtain the hidden states representing reverse time-series information;

then, the hidden states of the forward and reverse deep transition networks are concatenated in time order:

wherein the symbol ";" denotes the concatenation operation; at this point, E_t contains the spatial feature information and temporal feature information extracted from the forward and reverse time series by the deep transition networks; finally, E_t is input to a fully connected layer to obtain the final prediction output:

Y_t = W * E_t + b;

where W is the parameter matrix and b is the bias term.

2. The method according to claim 1, wherein in step S1, the specific process of preprocessing is:

3. The method according to claim 1, wherein in step S2, for time step t, the hidden state h_t of the AI-GRU network is calculated by the following formula:

wherein h_t uses the update gate z_t of the current time step to select and combine information from the hidden state h_{t-1} of the previous time step and the candidate hidden state of the current time step;

z_t is the update gate, with a value range of (0, 1): the closer the value is to 0, the more historical information is discarded and the less information is newly added at the current time step; the closer the value is to 1, the less information from the past time step is discarded and the more information is newly added at the current time step; the update gate z_t is calculated by the following formula:

z_t = σ(W_xz x_t + W_hz h_{t-1} + W_az a_t);

W_xz, W_hz, and W_az each represent weights; the candidate hidden state of the current time step uses the gating mechanism to selectively add the PM2.5 information x_t of the current time step, the auxiliary information a_t, and the hidden state h_{t-1} of the previous time step into the AI-GRU; the candidate hidden state is calculated by the following formula:

r_t represents the reset gate, l_t represents the gate of the linear transformation, g_t represents the gate of the auxiliary information, p_t represents the gate for the degree of fusion of auxiliary information and PM2.5 information, and H(x_t) represents a linear transformation of PM2.5; the candidate hidden state scales the data to [-1, 1] through the tanh activation function, and finally the linearly transformed information is added to obtain the candidate hidden state; r_t, l_t, g_t, p_t, and H(x_t) are calculated as follows:

r_t = σ(W_xr x_t + W_hr h_{t-1}) (7);

l_t = σ(W_xl x_t + W_hl h_{t-1}) (8);

g_t = σ(W_ag a_t + W_hg h_{t-1}) (9);

p_t = σ(W_ap a_t + W_hp h_{t-1}) (10);

H(x_t) = W_x x_t (11);

g_t and p_t are nonlinear transformations of a_t and h_{t-1}; g_t extracts the auxiliary information useful for PM2.5 and controls the degree to which auxiliary information flows into the AI-GRU, with a value range of (0, 1): the closer the value is to 0, the less auxiliary information flows into the AI-GRU, and the closer the value is to 1, the more auxiliary information flows into the AI-GRU; p_t is the gate that fuses the auxiliary information with the PM2.5 information and controls the degree of fusion between them, with a value range of (0, 1): the closer the value is to 0, the smaller the degree of fusion of auxiliary information and PM2.5 information, and the closer the value is to 1, the larger the degree of fusion of auxiliary information and PM2.5 information;

l_t is the gate of the linear transformation H(x_t) and controls how much linearly transformed PM2.5 information flows into the AI-GRU, with a value range of (0, 1): the closer the value is to 0, the less PM2.5 information flows into the AI-GRU, and the closer the value is to 1, the more PM2.5 information flows into the AI-GRU; H(x_t) is a linear transformation of the PM2.5 information alone, so that the AI-GRU focuses more on the PM2.5 information;

through the reset gate r_t, the auxiliary-information gate g_t, the gate p_t for the degree of fusion of auxiliary information and PM2.5 information, and the linear-transformation gate l_t, the AI-GRU effectively controls the degree of influence of each kind of auxiliary information on air quality prediction; meanwhile, the gating mechanism selectively adds the auxiliary information, PM2.5 information, and historical information that have a positive effect on prediction into the AI-GRU, and promptly discards information irrelevant to prediction.

4. The method of claim 1, wherein the hidden state of the T-GRU is calculated by the following formula:

wherein ⊙ denotes element-wise multiplication and i represents the depth of the current transition network; the update gate has a value range of (0, 1): the closer the value is to 0, the more historical information is discarded and the less information is newly added at the current network layer; the closer the value is to 1, the less information from the previous network layer is discarded and the more information is newly added at the current network layer; the update gate is calculated by the following formula:

the candidate hidden state of the T-GRU is obtained by applying the reset gate to the hidden state of the previous network layer, and is calculated by the following formula:

the reset gate controls the historical information, with a value range of (0, 1): the closer the value is to 0, the less historical information flows into the T-GRU, and the closer the value is to 1, the more historical information flows into the T-GRU, so that historical information irrelevant to prediction can be cleared in time; the reset gate is calculated by the following formula:

the T-GRU receives only the hidden state passed down from the layer above within the same time step, so that it can learn the particular nonlinear relationship between consecutive hidden states and thereby obtain a deeper state representation.