CN113762351A - Air quality prediction method based on deep transition network - Google Patents

Air quality prediction method based on deep transition network

Info

Publication number
CN113762351A
CN113762351A (application CN202110923976.6A; granted as CN113762351B)
Authority
CN
China
Prior art keywords
information
gru
value
gating
auxiliary information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110923976.6A
Other languages
Chinese (zh)
Other versions
CN113762351B (en)
Inventor
欧阳继红
杨智尧
王艺蒙
曲延非
李嘉寅
毕夏旭
王兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin University
Original Assignee
Jilin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jilin University filed Critical Jilin University
Priority to CN202110923976.6A priority Critical patent/CN113762351B/en
Publication of CN113762351A publication Critical patent/CN113762351A/en
Application granted granted Critical
Publication of CN113762351B publication Critical patent/CN113762351B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"


Abstract

The invention discloses an air quality prediction method based on a deep transition network. In order to extract deep spatial and temporal features of air quality data, it provides an air quality prediction model (AI-DTN) based on auxiliary information and a deep transition network. The AI-DTN comprises two transition networks running in opposite directions, which extract feature information from the forward and reverse time-series directions respectively so as to enhance feature extraction. Each transition network in the AI-DTN consists of a gated recurrent unit AI-GRU, which fuses auxiliary information to extract spatial features, and an existing transition gated recurrent unit T-GRU, which extracts temporal features. Of the two gating mechanisms specific to the AI-GRU, one controls the degree to which auxiliary information flows into the gated recurrent unit and the other controls the degree to which PM2.5 and the auxiliary information are fused, so that mutual interference during information fusion can be avoided.

Description

Air quality prediction method based on deep transition network
Technical Field
The invention relates to the technical field of data processing, in particular to an air quality prediction method based on a deep transition network.
Background
There are many factors affecting air quality, such as pollutants (NO, CO, automobile exhaust, industrial emissions) and meteorological information (wind speed, wind direction, rainfall), which are collectively referred to as auxiliary information. However, using auxiliary information to predict air quality presents difficulties. First, it is hard to acquire all of this information accurately: real-time vehicle exhaust and industrial emission data are difficult to obtain in full, and meteorological forecasts carry a certain deviation, so using them would cause error accumulation. Therefore, current air quality prediction mostly relies on pollutant information such as NO and CO together with past meteorological information. Second, complex interactions occur between pollutants, and between pollutants and meteorological information, and there is no way to fully model and cover all these interactions. Given the auxiliary information that is readily available, finding the different roles each piece of auxiliary information plays in PM2.5 prediction is therefore of great significance.
Most current air quality prediction models use a CNN to extract spatial features among the inputs and an RNN to extract temporal features, or apply an attention mechanism directly to extract spatial features followed by an RNN or LSTM network to extract temporal features. However, these methods are insufficient in their ability to extract the latent features carried by the auxiliary information.
Disclosure of Invention
Aiming at the defects of the prior art, the invention aims to provide an air quality prediction method based on a deep transition network so as to improve the accuracy of air quality prediction.
In order to achieve the purpose, the invention adopts the following technical scheme:
a deep transition network-based air quality prediction method comprises the following specific processes:
s1, acquiring air quality time sequence data and preprocessing the data;
s2, adopting an air quality prediction model AI-DTN based on auxiliary information and a deep transition network to predict the air quality:
the AI-DTN comprises a forward deep transition network, a backward deep transition network and a fully connected layer; the forward and backward deep transition networks extract the spatial and temporal features, the results of the two transition networks are spliced together, and the fully connected layer outputs the result;
the depth of each deep transition network is L; the first layer of the deep transition network is a gated recurrent unit AI-GRU, which extracts the spatial features of the input; the second to L-th layers of the deep transition network consist of transition gated recurrent units T-GRU, and the output of the L-th layer T-GRU at time t is the input of the first-layer AI-GRU at time t+1;
the detailed calculation process of the air quality prediction model AI-DTN is as follows:
the input to the model is divided into two parts, the first part being a PM2.5 time series representing a historical time window size q, denoted Xt={xt-q+1,...,xt},XtIs a matrix with dimension 1 x q, and the second part is a time sequence of auxiliary information representing a historical time window of size q, denoted At={at-q+1,…,at},AtIs a matrix of dimension n x q, AtEach of at-q+1,...,atAll are a matrix of n x 1, where n represents the number of features in the auxiliary information;
in the forward deep transition network, X is first puttAnd AtInputting the data into AI-GRU to obtain the hidden state of the first layer of the deep transition network
Figure BDA0003208489950000031
Figure BDA0003208489950000032
Wherein L represents the number of layers of the deep transition network, and the hidden state is weighted and fused with PM2.5 information and auxiliary information and represents spatial characteristic information at the time t; and then introducing the hidden state into the T-GRU of the next layer of the time step, wherein the hidden state is as follows:
Figure BDA0003208489950000033
wherein i represents the current network depth, the T-GRU only takes the hidden state of the AI-GRU of the upper layer as input, and the hidden state of the T-GRU of the last layer is taken as input to be transmitted to the AI-GRU of the next time step;
similarly, using reverse deep transition network, for XtAnd AtThe two time sequences are reversely extracted to obtain a hidden state representing the information of the time sequence in the reverse order
Figure BDA0003208489950000034
Then, splicing the hidden states of the forward and reverse deep transition networks together according to the time sequence:
Figure BDA0003208489950000035
wherein, the first and second connecting parts are connected with each other; representing a splicing operation; at this time, EtThe spatial feature information and the time feature information which are extracted through a deep transition network and comprise a forward time sequence and a reverse time sequence are included; finally, E istInputting the data into a full connection layer for final prediction to obtain final output:
Yt=W*Et+b;
where x represents the matrix multiplication, W is the parameter matrix, and b is the bias term.
Further, in step S1, the preprocessing specifically includes:
s1.1, missing value processing: processing missing values of the ordinal data of the original air quality based on a Lagrange interpolation method;
s1.2, normalization: and adopting a min-max standardized normalization method to perform linear transformation on the data subjected to missing value processing, so that the result value is mapped between [0 and 1 ].
Further, in step S2, for time step t, the hidden state h_t of the AI-GRU network is given by:
h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t    (4);
where ⊙ denotes element-wise multiplication; through the update gate z_t of the current time step, h_t selects and combines information from the hidden state h_{t-1} of the previous time step and the candidate hidden state h̃_t of the current time step;
z_t is the update gate, with value range (0, 1): the closer its value is to 0, the more historical information is retained and the less new information the current time step adds; the closer to 1, the more information from past time steps is discarded and the more new information the current time step adds; the update gate z_t is given by:
z_t = σ(W_xz x_t + W_hz h_{t-1} + W_az a_t)    (5);
W_xz, W_az and W_hz denote weights;
h̃_t is the candidate hidden state of the current time step; through the gating mechanism it selectively adds the PM2.5 information x_t of the current time step, the auxiliary information a_t and the hidden state h_{t-1} of the previous time step into the AI-GRU; the candidate hidden state h̃_t is given by:
h̃_t = tanh(W_xh x_t + r_t ⊙ (W_hh h_{t-1}) + p_t ⊙ g_t ⊙ (W_ah a_t)) + l_t ⊙ H(x_t)    (6);
where r_t denotes the reset gate, l_t the gate on the linear transformation, g_t the gate on the auxiliary information, p_t the gate on the degree of fusion between the auxiliary information and the PM2.5 information, and H(x) a linear transformation of the PM2.5 information; the tanh activation function scales the data to [-1, 1], and the linearly transformed information is then added to obtain h̃_t; r_t, l_t, g_t, p_t and H(x) are computed as follows:
r_t = σ(W_xr x_t + W_hr h_{t-1})    (7);
l_t = σ(W_xl x_t + W_hl h_{t-1})    (8);
g_t = σ(W_ag a_t + W_hg h_{t-1})    (9);
p_t = σ(W_ap a_t + W_hp h_{t-1})    (10);
H(x_t) = W_x x_t    (11);
in the above formulas, W_xr, W_hr, W_xl, W_hl, W_ag, W_hg, W_ap, W_hp and W_x denote weights; r_t denotes the reset gate, which controls the historical information; in the computation of h̃_t, r_t and h_{t-1} undergo element-wise multiplication; h_{t-1} contains all the historical information up to the previous time step, and the value range of r_t is (0, 1): the closer the value is to 0, the less historical information flows into the AI-GRU; the closer to 1, the more historical information flows in; in this way, historical information irrelevant to the prediction can be discarded in time;
g_t and p_t apply nonlinear transformations to a_t and h_{t-1}; g_t serves to extract the auxiliary information useful for PM2.5 and controls the degree to which auxiliary information flows into the AI-GRU, with value range (0, 1): the closer the value is to 0, the less auxiliary information flows into the AI-GRU; the closer to 1, the more auxiliary information flows in; p_t is the gate that fuses the auxiliary information with the PM2.5 information and controls the degree of their fusion, with value range (0, 1): the closer the value is to 0, the smaller the degree of fusion between the inflowing auxiliary information and the PM2.5 information; the closer to 1, the larger the degree of fusion;
l_t is the gate on the linear transformation H(x) and controls the degree to which the linearly transformed PM2.5 information flows into the AI-GRU, with value range (0, 1): the closer the value is to 0, the less PM2.5 information flows into the AI-GRU; the closer to 1, the more PM2.5 information flows in; H(x) is a linear transformation of the PM2.5 information, whose effect is to let the AI-GRU focus its attention more on the PM2.5 information itself;
through the reset gate r_t, the auxiliary-information gate g_t, the fusion gate p_t and the linear-transformation gate l_t, the AI-GRU effectively controls the degree to which the various kinds of auxiliary information influence the air quality prediction; meanwhile, the gating mechanism selectively adds the auxiliary information, PM2.5 information and historical information that contribute positively to the prediction into the AI-GRU, and discards information irrelevant to the prediction in time.
Further, the hidden state h_t^i of the T-GRU is given by:
h_t^i = (1 - z_t) ⊙ h_t^{i-1} + z_t ⊙ h̃_t^i    (12);
where ⊙ denotes element-wise multiplication and i denotes the depth of the current transition network; z_t is the update gate, with value range (0, 1): the closer its value is to 0, the more information from the previous network layer is retained and the less new information the current network layer adds; the closer to 1, the more information from the previous network layer is discarded and the more new information the current network layer adds; it is computed as:
z_t = σ(W_hz h_t^{i-1})    (13);
h̃_t^i is the candidate hidden state of the T-GRU, obtained by processing the hidden state h_t^{i-1} of the previous network layer through the reset gate r_t; it is computed as:
h̃_t^i = tanh(r_t ⊙ (W_hh h_t^{i-1}))    (14);
r_t denotes the reset gate, which controls the historical information; its value range is (0, 1): the closer the value is to 0, the less historical information flows into the T-GRU; the closer to 1, the more historical information flows in; in this way, historical information irrelevant to the prediction can be removed in time; it is computed as:
r_t = σ(W_hr h_t^{i-1})    (15).
The T-GRU receives only the hidden state passed from the previous layer within the same time step, so it can learn the particular nonlinear relations between consecutive hidden states and thus obtain deeper state representations.
The invention has the following beneficial effects: in order to extract the deep spatial and temporal features of air quality data, the invention provides an air quality prediction model (AI-DTN) based on auxiliary information and a deep transition network, which comprises two transition networks running in opposite directions and extracts feature information from the forward and reverse time-series directions respectively so as to enhance feature extraction. Each transition network in the AI-DTN consists of a gated recurrent unit AI-GRU, which fuses auxiliary information to extract spatial features, and an existing transition gated recurrent unit T-GRU, which extracts temporal features. Of the two gating mechanisms specific to the AI-GRU, one controls the degree to which auxiliary information flows into the gated recurrent unit and the other controls the degree to which PM2.5 and the auxiliary information are fused, so that mutual interference during information fusion can be avoided.
Drawings
FIG. 1 is a schematic technical route of a method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an air quality prediction model AI-DTN according to an embodiment of the present invention;
FIG. 3 is a block diagram of an AI-GRU according to an embodiment of the present invention;
FIG. 4 is a block diagram of a T-GRU in an embodiment of the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings. The embodiments are based on the technical solution and provide detailed implementations and specific operation procedures, but the protection scope of the present invention is not limited to these embodiments.
The embodiment provides an air quality prediction method based on a deep transition network, as shown in fig. 1, the specific process is as follows:
s1, acquiring air quality time sequence data and preprocessing the data, wherein the preprocessing process comprises the following steps:
s1.1, missing value processing:
the original air quality time series data contains a large amount of incomplete, inconsistent, abnormal and deviated data, and the accuracy of air quality prediction is influenced by the problem data. Therefore, data preprocessing is indispensable, and the common task is missing value processing of the data set.
Data missing value processing can be divided into two categories. One is to delete missing data and the other is to interpolate data. The greatest limitation of the former is that the completeness of data is achieved by replacing less historical data, which causes a great deal of waste of resources, and especially under the condition that the data set is small, the objectivity and accuracy of an analysis result can be directly influenced by deleting records. Therefore, the present embodiment performs missing value processing based on the lagrange interpolation method.
The definition of Lagrange interpolation is as follows:
For a polynomial function, suppose k+1 value points are given: (x_0, y_0), ..., (x_k, y_k), where x_j is the position of the argument and y_j is the value of the function at that position.
Assuming the x_j are pairwise distinct, the Lagrange interpolation polynomial obtained by applying the Lagrange interpolation formula is:
L(x) = Σ_{j=0}^{k} y_j l_j(x)
where l_j(x) is the Lagrange basis polynomial (interpolation basis function), expressed as:
l_j(x) = Π_{i=0, i≠j}^{k} (x - x_i) / (x_j - x_i)
The Lagrange basis polynomial l_j(x) takes the value 1 at x_j and the value 0 at every other point x_i, i ≠ j.
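A minimal sketch of missing-value filling with Lagrange interpolation, assuming NaN marks a missing reading and interpolating over up to k valid neighbours on each side of the gap (the window size k and the function names are illustrative, not from the patent):

```python
import math

def lagrange_basis(xs, j, x):
    """Lagrange basis polynomial l_j evaluated at x: 1 at xs[j], 0 at the other xs."""
    lj = 1.0
    for i, xi in enumerate(xs):
        if i != j:
            lj *= (x - xi) / (xs[j] - xi)
    return lj

def lagrange_interpolate(xs, ys, x):
    """Evaluate the Lagrange interpolation polynomial through (xs, ys) at x."""
    return sum(yj * lagrange_basis(xs, j, x) for j, yj in enumerate(ys))

def fill_missing(series, k=5):
    """Fill NaN entries of a time series using Lagrange interpolation over
    up to k valid neighbours on each side of each missing position."""
    out = list(series)
    for idx, v in enumerate(out):
        if not math.isnan(v):
            continue
        xs, ys = [], []
        for j in range(max(0, idx - k), min(len(out), idx + k + 1)):
            if j != idx and not math.isnan(out[j]):
                xs.append(j)
                ys.append(out[j])
        if xs:
            out[idx] = lagrange_interpolate(xs, ys, idx)
    return out
```

A small neighbourhood is used deliberately: high-degree Lagrange polynomials over many points oscillate (Runge's phenomenon), so interpolating from a handful of nearby readings is the usual practice.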
S1.2, normalization
Data normalization is basic preparatory work before building the air quality model. Different features often have different dimensions and units, which affects the results of data analysis; to eliminate the dimensional influence between features, the data must be normalized so that the data indicators become comparable. After the raw data are normalized, all features are on the same order of magnitude, which is suitable for comprehensive comparison and evaluation. The normalization method adopted in this embodiment is min-max normalization, a linear transformation of the data after missing-value processing that maps the result values into [0, 1]. The transfer function is as follows:
x* = (x - min) / (max - min)
where x* denotes the normalized data, x the original data, max the maximum value of the data after missing-value processing, and min the minimum value of the data after missing-value processing.
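The transfer function maps directly to code; a minimal sketch (the function name is illustrative):

```python
def min_max_normalize(values):
    """Linearly map values into [0, 1]: x* = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

In practice min and max are computed on the training split only and reused for validation and test data, so that no information leaks from the future into the normalization.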
S2, air quality prediction:
to solve the problem of how to distinguish the prediction degree of different Auxiliary Information on PM2.5, the present embodiment proposes an AI-DTN (adaptive Information-Deep Transition Network) that is an air quality prediction model based on Auxiliary Information and a Deep Transition Network.
The air quality prediction model AI-DTN is composed of a forward deep transition network, a backward deep transition network and a fully connected layer. First, the forward and backward deep transition networks extract the spatial and temporal features; then the results of the two transition networks are spliced together; finally the fully connected layer outputs the result. The model structure is shown in FIG. 2.
The AI-DTN is mainly constructed from the forward and backward deep transition networks, each of depth L. For a feed-forward neural network, the depth refers to the number of nonlinear layers between input and output; for an RNN, the depth refers to the number of nonlinear layers within one time step.
The first layer of the deep transition network is the gated recurrent unit AI-GRU, which extracts the spatial features of the input. The second to L-th layers consist of transition gated recurrent units (T-GRUs), an important component of deep transition networks that can extract information from deeper hidden states in the recurrent neural network. The output of the L-th layer T-GRU at time t is the input of the first-layer AI-GRU at time t+1.
AI-GRU and T-GRU will be described in further detail below:
1. Auxiliary-information-fused gated recurrent unit AI-GRU
In order to extract deeper spatial features and thus distinguish the importance of different auxiliary information for PM2.5 prediction, this embodiment proposes the AI-GRU (Auxiliary Information-GRU), a gated recurrent unit that fuses auxiliary information. The AI-GRU is inspired by AGDT (reference: Liang Y, Meng F, Zhang J, et al. A novel aspect-guided deep transition model for aspect based sentiment analysis [J]. arXiv preprint arXiv:1909.00324, 2019.); it is a recurrent neural network unit that retains the gating characteristics of the GRU and adds auxiliary-information gating on top of them. The AI-GRU takes not only the PM2.5 information as input but also uses the auxiliary information to assist the air quality prediction; it fuses the two kinds of information while controlling both the degree to which the auxiliary information is admitted and the degree to which the PM2.5 information and the auxiliary information are fused. Through its gating mechanism, the AI-GRU can dynamically adjust the weight of each piece of auxiliary information and thus discover the importance of different auxiliary information for PM2.5 prediction.
The output structure of the AI-GRU is the same as that of the GRU, while the input structure differs: the AI-GRU adds the auxiliary information as an input. The AI-GRU combines x_t and a_t from the current time step with the hidden state h_{t-1} of the previous time step to obtain the hidden state h_t of the current time step, which contains information about all previous time steps. The hidden state h_t of the AI-GRU is also its output. The structure of the AI-GRU is shown in FIG. 3.
For time step t, the hidden state h_t of the AI-GRU network is computed by formula (4):
h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t    (4);
where ⊙ denotes element-wise multiplication; through the update gate z_t of the current time step, h_t selects and combines information from the hidden state h_{t-1} of the previous time step and the candidate hidden state h̃_t of the current time step.
z_t is the update gate, with value range (0, 1): the closer its value is to 0, the more historical information is retained and the less new information the current time step adds; the closer to 1, the more information from past time steps is discarded and the more new information the current time step adds. The update gate z_t is given by formula (5):
z_t = σ(W_xz x_t + W_hz h_{t-1} + W_az a_t)    (5);
W_xz, W_az and W_hz denote weights, which are learned automatically during training through gradient descent.
h̃_t is the candidate hidden state of the current time step, used to update the new hidden state.
Through the gating mechanism, the candidate hidden state selectively adds the PM2.5 information x_t of the current time step, the auxiliary information a_t and the hidden state h_{t-1} of the previous time step into the AI-GRU.
The candidate hidden state h̃_t is given by formula (6):
h̃_t = tanh(W_xh x_t + r_t ⊙ (W_hh h_{t-1}) + p_t ⊙ g_t ⊙ (W_ah a_t)) + l_t ⊙ H(x_t)    (6);
r_t denotes the reset gate, computed by formula (7). l_t denotes the gate on the linear transformation, computed by formula (8). g_t denotes the gate on the auxiliary information, computed by formula (9). p_t denotes the gate on the degree of fusion between the auxiliary information and the PM2.5 information, computed by formula (10). H(x) denotes the linear transformation of the PM2.5 information, computed by formula (11).
The tanh activation function scales the data to [-1, 1], and the linearly transformed information is then added to obtain the result h̃_t.
r_t = σ(W_xr x_t + W_hr h_{t-1})    (7);
l_t = σ(W_xl x_t + W_hl h_{t-1})    (8);
g_t = σ(W_ag a_t + W_hg h_{t-1})    (9);
p_t = σ(W_ap a_t + W_hp h_{t-1})    (10);
H(x_t) = W_x x_t    (11);
In the above formulas, W_xr, W_hr, W_xl, W_hl, W_ag, W_hg, W_ap, W_hp and W_x denote weights, learned automatically during training through gradient descent. r_t denotes the reset gate, which controls the historical information. In the computation of h̃_t, r_t and h_{t-1} undergo element-wise multiplication; h_{t-1} contains all the historical information up to the previous time step, and the value range of r_t is (0, 1): the closer the value is to 0, the less historical information flows into the AI-GRU; the closer to 1, the more historical information flows in. In this way, historical information irrelevant to the prediction can be discarded in time.
g_t and p_t apply nonlinear transformations to a_t and h_{t-1}. g_t serves to extract the auxiliary information useful for PM2.5 and controls the degree to which auxiliary information flows into the AI-GRU; its value range is (0, 1): the closer the value is to 0, the less auxiliary information flows into the AI-GRU; the closer to 1, the more flows in. p_t fuses the auxiliary information with the PM2.5 information and controls the degree of their fusion; its value range is (0, 1): the closer the value is to 0, the smaller the degree of fusion between the inflowing auxiliary information and the PM2.5 information; the closer to 1, the larger the degree of fusion.
l_t is the gate on the linear transformation H(x); it controls the degree to which the linearly transformed PM2.5 information flows into the AI-GRU, with value range (0, 1): the closer the value is to 0, the less PM2.5 information flows into the AI-GRU; the closer to 1, the more flows in. H(x) is a linear transformation of the PM2.5 information, whose effect is to let the AI-GRU focus its attention more on the PM2.5 information itself.
Through the reset gate r_t, the auxiliary-information gate g_t, the fusion gate p_t and the linear-transformation gate l_t, the AI-GRU effectively controls the degree to which the various kinds of auxiliary information influence the air quality prediction. Meanwhile, the gating mechanism selectively adds the auxiliary information, PM2.5 information and historical information that contribute positively to the prediction into the AI-GRU, and discards information irrelevant to the prediction in time.
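The gates above can be collected into a single step function. The NumPy sketch below is an assumption-laden illustration, not the patented implementation: the exact combination of the gated terms inside the candidate-state tanh follows the textual description (reset-gated history, g_t- and p_t-gated auxiliary information, l_t-gated linear transform added after the tanh) rather than the unavailable patent figures, and all weight names and initializations are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class AIGRUCell:
    """One AI-GRU step following formulas (4)-(11); the combination of gated
    terms in the candidate state is an assumption, and weights are random
    stand-ins for trained parameters."""

    def __init__(self, n_aux, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        w = lambda r, c: rng.standard_normal((r, c)) * 0.1
        d, n = n_hidden, n_aux
        self.W_xz, self.W_hz, self.W_az = w(d, 1), w(d, d), w(d, n)  # update gate
        self.W_xr, self.W_hr = w(d, 1), w(d, d)                      # reset gate
        self.W_xl, self.W_hl = w(d, 1), w(d, d)                      # linear-transform gate
        self.W_ag, self.W_hg = w(d, n), w(d, d)                      # auxiliary-info gate
        self.W_ap, self.W_hp = w(d, n), w(d, d)                      # fusion gate
        self.W_xh, self.W_hh, self.W_ah = w(d, 1), w(d, d), w(d, n)  # candidate state
        self.W_x = w(d, 1)                                           # H(x) transform

    def step(self, x_t, a_t, h_prev):
        x = np.array([x_t])                                    # scalar PM2.5 input
        z = sigmoid(self.W_xz @ x + self.W_hz @ h_prev + self.W_az @ a_t)  # (5)
        r = sigmoid(self.W_xr @ x + self.W_hr @ h_prev)                    # (7)
        l = sigmoid(self.W_xl @ x + self.W_hl @ h_prev)                    # (8)
        g = sigmoid(self.W_ag @ a_t + self.W_hg @ h_prev)                  # (9)
        p = sigmoid(self.W_ap @ a_t + self.W_hp @ h_prev)                  # (10)
        Hx = self.W_x @ x                                                  # (11)
        cand = np.tanh(self.W_xh @ x + r * (self.W_hh @ h_prev)
                       + p * g * (self.W_ah @ a_t)) + l * Hx   # candidate, assumed form
        return (1 - z) * h_prev + z * cand                                 # (4)
```

Note how g and p act only on the auxiliary path and l only on the PM2.5 path, which is how the two kinds of information can be throttled independently without interfering with each other.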
2. Transition gated recurrent unit
The transition gated recurrent unit (Transition GRU, T-GRU) (ref: Pascanu R, Gulcehre C, Cho K, et al. How to construct deep recurrent neural networks [J]. arXiv preprint arXiv:1312.6026, 2013.) is an important component of the deep transition network; T-GRUs come into play once the transition network is more than one layer deep. The only input of the T-GRU is the hidden state h_t^{i-1} of the previous layer at the same time step, and its output is the hidden state h_t^i of the current network layer at the same time step. The structure of the T-GRU is shown in FIG. 4.
The hidden state h_t^i of the T-GRU is computed as shown in formula (12):

h_t^i = (1 − z_t) ⊙ h_t^{i−1} + z_t ⊙ h̃_t^i (12)
Here ⊙ denotes element-wise multiplication and i denotes the current depth within the transition network. z_t is the update gate with value range (0, 1): the closer its value is to 0, the more of the previous layer's information is retained and the less new information the current network layer adds; the closer it is to 1, the more of the previous layer's information is replaced by information newly produced by the current layer. Its calculation is shown in formula (13):

z_t = σ(W_z h_t^{i−1}) (13)

h̃_t^i is the candidate hidden state of the T-GRU, obtained by applying the reset gate r_t to the hidden state h_t^{i−1} of the previous network layer, as shown in formula (14):

h̃_t^i = tanh(W_h (r_t ⊙ h_t^{i−1})) (14)

r_t denotes the reset gate, which controls the historical information. Its value range is (0, 1): the closer the value is to 0, the less historical information flows into the T-GRU; the closer to 1, the more flows in, so historical information irrelevant to the prediction can be removed in time. Its calculation is shown in formula (15):

r_t = σ(W_r h_t^{i−1}) (15)
Because the T-GRU receives only the hidden state passed from the previous layer within the same time step, it can learn the particular nonlinear relations between consecutive hidden states and thereby obtain deeper state representations.
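One T-GRU step can be sketched directly from equations (12)-(15). The weight names `Wz`, `Wr`, `Wh` are placeholders, since the patent does not name the transition-layer weight matrices:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def t_gru_step(h_below, Wz, Wr, Wh):
    """One T-GRU step: the only input is the hidden state of the previous
    layer at the same time step."""
    z = sigmoid(Wz @ h_below)                  # update gate, eq. (13)
    r = sigmoid(Wr @ h_below)                  # reset gate, eq. (15)
    h_cand = np.tanh(Wh @ (r * h_below))       # candidate state, eq. (14)
    return (1.0 - z) * h_below + z * h_cand    # blended state, eq. (12)
```

Because the output is an element-wise convex combination of the incoming state and a tanh-bounded candidate, each component stays within max(|input|, 1).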
Further, the detailed calculation process of the air quality prediction model AI-DTN is as follows:
The input of the model is divided into two parts. The first part is a PM2.5 time series over a historical window of size q, denoted X_t = {x_{t−q+1}, ..., x_t}; X_t is a matrix of dimension 1 × q. The second part is the auxiliary-information time series over the same window, denoted A_t = {a_{t−q+1}, ..., a_t}; A_t is a matrix of dimension n × q, and each a_{t−q+1}, ..., a_t in A_t is an n × 1 vector, where n is the number of features in the auxiliary information.
Since the forward and reverse deep transition networks follow the same principle, the calculation process is described below using the forward network as an example. First, X_t and A_t are fed into the AI-GRU to obtain the hidden state of the first layer of the deep transition network:

h_t^1 = AI-GRU(x_t, a_t, h_{t−1}^L) (16)

where L denotes the number of layers of the deep transition network. This hidden state fuses, with learned weights, the PM2.5 information and the auxiliary information, and represents the spatial feature information at time t. The hidden state is then passed into the T-GRU of the next layer within the same time step:

h_t^i = T-GRU(h_t^{i−1}), i = 2, ..., L (17)
where i denotes the current network depth; the T-GRU takes only the hidden state of the previous layer as input, and the hidden state of the last-layer T-GRU is passed as input to the AI-GRU of the next time step.
Similarly, the reverse deep transition network follows the same principle as the forward one: it performs reverse feature extraction on the two time series X_t and A_t, yielding a hidden state ←h_t that represents the reverse-order time-series information. Then the hidden states of the forward and reverse deep transition networks are spliced together in time order:

E_t = [→h_t ; ←h_t] (18)
where ';' denotes the splicing (concatenation) operation. At this point E_t contains both the spatial feature information and the temporal feature information extracted by the forward and reverse deep transition networks. Finally, E_t is fed into a fully connected layer for the final prediction, giving the final output:
Y_t = W * E_t + b (19);

where * denotes matrix multiplication, W is the parameter matrix, and b is the bias term.
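The overall data flow of AI-DTN (the first-layer cell fed by x_t, a_t and the previous step's top-layer state; a T-GRU stack within each step; forward and reverse passes concatenated into a fully connected layer) can be sketched with simple tanh layers standing in for the actual AI-GRU and T-GRU cells. All weights here are random placeholders; only the wiring follows the description above:

```python
import numpy as np

def dtn_direction(X, A, L, d, rng):
    """One direction of the deep transition network. At each time step, layer 1
    consumes (x_t, a_t) plus the layer-L state of the previous step; layers
    2..L each take the state of the layer below at the same step."""
    q = X.shape[1]
    Wx = 0.1 * rng.standard_normal((d, 1 + A.shape[0] + d))
    Wt = [0.1 * rng.standard_normal((d, d)) for _ in range(L - 1)]
    h_top = np.zeros(d)                        # layer-L state of the previous step
    for t in range(q):
        inp = np.concatenate([X[:, t], A[:, t], h_top])
        h = np.tanh(Wx @ inp)                  # stands in for the AI-GRU
        for W in Wt:                           # stands in for the T-GRU stack
            h = np.tanh(W @ h)
        h_top = h
    return h_top

def ai_dtn_predict(X, A, L=3, d=8, seed=0):
    """Forward + reverse pass, concatenation, then the fully connected layer
    Y_t = W E_t + b (eq. 19); weights are random placeholders."""
    rng = np.random.default_rng(seed)
    fwd = dtn_direction(X, A, L, d, rng)
    bwd = dtn_direction(X[:, ::-1], A[:, ::-1], L, d, rng)
    E = np.concatenate([fwd, bwd])             # splicing operation ';'
    W = 0.1 * rng.standard_normal((1, 2 * d))
    b = rng.standard_normal(1)
    return W @ E + b
```

With X of shape 1 × q and A of shape n × q, the prediction is a single value, matching the one-step PM2.5 output of the model.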
Various corresponding changes and modifications can be made by those skilled in the art according to the above technical solutions and concepts, and all such changes and modifications should be included in the scope of the present invention as claimed.

Claims (4)

1. A deep transition network-based air quality prediction method is characterized by comprising the following specific processes:
s1, acquiring air quality time sequence data and preprocessing the data;
s2, adopting an air quality prediction model AI-DTN based on auxiliary information and a deep transition network to predict the air quality:
the air quality prediction model AI-DTN consists of a forward deep transition network, a reverse deep transition network and a fully connected layer; the forward and reverse deep transition networks extract spatial features and temporal features, the results of the two networks are spliced together, and the fully connected layer outputs the result;
the depth of each deep transition network is L; the first layer of the deep transition network is a gated recurrent unit AI-GRU, which extracts the spatial features of the input; the second through L-th layers of the deep transition network consist of transition gated recurrent units T-GRU, and the output of the L-th-layer T-GRU at time t is the input of the first-layer AI-GRU at time t+1;
the detailed calculation process of the air quality prediction model AI-DTN is as follows:
the input of the model is divided into two parts: the first part is a PM2.5 time series over a historical window of size q, denoted X_t = {x_{t−q+1}, ..., x_t}, where X_t is a matrix of dimension 1 × q; the second part is the auxiliary-information time series over the same window, denoted A_t = {a_{t−q+1}, ..., a_t}, where A_t is a matrix of dimension n × q and each a_{t−q+1}, ..., a_t in A_t is an n × 1 vector, n being the number of features in the auxiliary information;
in the forward deep transition network, X_t and A_t are first fed into the AI-GRU to obtain the hidden state of the first layer of the deep transition network:

h_t^1 = AI-GRU(x_t, a_t, h_{t−1}^L)

where L denotes the number of layers of the deep transition network; this hidden state fuses, with learned weights, the PM2.5 information and the auxiliary information and represents the spatial feature information at time t; the hidden state is then passed into the T-GRU of the next layer within the same time step:

h_t^i = T-GRU(h_t^{i−1}), i = 2, ..., L

where i denotes the current network depth; the T-GRU takes only the hidden state of the previous layer as input, and the hidden state of the last-layer T-GRU is passed as input to the AI-GRU of the next time step;
similarly, the reverse deep transition network performs reverse feature extraction on the two time series X_t and A_t, yielding a hidden state ←h_t representing the reverse-order time-series information; the hidden states of the forward and reverse deep transition networks are then spliced together in time order:

E_t = [→h_t ; ←h_t]

where ';' denotes the splicing operation; at this point E_t contains the spatial feature information and the temporal feature information extracted by the forward and reverse deep transition networks; finally, E_t is fed into a fully connected layer for the final prediction, giving the final output:

Y_t = W * E_t + b;

where * denotes matrix multiplication, W is the parameter matrix, and b is the bias term.
2. The method according to claim 1, wherein in step S1, the preprocessing includes:
s1.1, missing value processing: performing missing value processing on the original air quality time sequence data based on a Lagrange interpolation method;
s1.2, normalization: applying min-max normalization to linearly transform the data after missing-value processing, so that the resulting values are mapped into [0, 1].
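Steps S1.1 and S1.2 can be sketched as follows. The claim names only Lagrange interpolation and min-max normalization, so the neighbourhood size `k` used to pick interpolation nodes is an assumption:

```python
import numpy as np

def fill_missing_lagrange(series, k=2):
    """Fill NaNs by Lagrange interpolation over up to 2*k nearest known
    points (the window choice is an assumption; the claim only names the
    method)."""
    s = np.asarray(series, dtype=float).copy()
    for i in np.flatnonzero(np.isnan(s)):
        idx = np.flatnonzero(~np.isnan(s))                  # currently known points
        near = idx[np.argsort(np.abs(idx - i))][: 2 * k]    # nearest known points
        val = 0.0
        for j in near:                                      # Lagrange basis at position i
            lj = 1.0
            for m in near:
                if m != j:
                    lj *= (i - m) / (j - m)
            val += s[j] * lj
        s[i] = val
    return s

def min_max(series):
    """Min-max normalization mapping values into [0, 1]."""
    s = np.asarray(series, dtype=float)
    return (s - s.min()) / (s.max() - s.min())
```

For a series with a single gap between two known points, the fill reduces to linear interpolation through those points.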
3. The method according to claim 1, wherein in step S2, for time step t, the hidden state h_t of the AI-GRU network is computed as follows:

h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t

where ⊙ denotes element-wise multiplication; through the update gate z_t of the current time step, h_t selects and combines the hidden state h_{t−1} of the previous time step and the candidate hidden state h̃_t of the current time step;
z_t is the update gate with value range (0, 1): the closer its value is to 0, the more historical information is retained and the less new information is added at the current time step; the closer it is to 1, the more historical information is discarded and the more new information the current time step adds; the update gate z_t is computed as follows:

z_t = σ(W_{xz} x_t + W_{hz} h_{t−1} + W_{az} a_t);
W_{xz}, W_{az}, W_{hz} each denote weights; h̃_t is the candidate hidden state of the current time step; through the gating mechanism, h̃_t selectively adds the PM2.5 information x_t of the current time step, the auxiliary information a_t and the hidden state h_{t−1} of the previous time step into the AI-GRU; the candidate hidden state h̃_t is computed as follows:

h̃_t = tanh(W_{xh} x_t + r_t ⊙ (W_{hh} h_{t−1}) + g_t ⊙ p_t ⊙ (W_{ah} a_t)) + l_t ⊙ H(x_t)
r_t denotes the reset gate, l_t the gate on the linear transformation, g_t the auxiliary-information gate, p_t the gate on the degree of fusion between the auxiliary information and the PM2.5 information, and H(x) the linear transformation of PM2.5; the tanh activation function scales the data into [−1, 1], and the linearly transformed information is then added to obtain the result h̃_t; r_t, l_t, g_t, p_t and H(x) are computed as follows:
r_t = σ(W_{xr} x_t + W_{hr} h_{t−1}) (7);
l_t = σ(W_{xl} x_t + W_{hl} h_{t−1}) (8);
g_t = σ(W_{ag} a_t + W_{hg} h_{t−1}) (9);
p_t = σ(W_{ap} a_t + W_{hp} h_{t−1}) (10);
H(x_t) = W_x x_t (11);
in the above formulas, W_{xr}, W_{hr}, W_{xl}, W_{hl}, W_{ag}, W_{hg}, W_{ap}, W_{hp} and W_x each denote weights; r_t denotes the reset gate, which controls the historical information; in the calculation of h̃_t, r_t and h_{t−1} undergo element-wise multiplication; h_{t−1} contains all the historical information up to the previous time step, and the value range of r_t is (0, 1): the closer the value is to 0, the less historical information flows into the AI-GRU; the closer it is to 1, the more historical information flows in, so historical information irrelevant to the prediction can be discarded in time;
g_t and p_t are nonlinear transformations of a_t and h_{t−1}; the role of g_t is to extract the auxiliary information useful for PM2.5: it controls the degree to which auxiliary information flows into the AI-GRU, with value range (0, 1); the closer the value is to 0, the less auxiliary information flows into the AI-GRU, and the closer it is to 1, the more flows in; the role of p_t is to fuse the auxiliary information with the PM2.5 information: it controls the degree of fusion between the two, with value range (0, 1); the closer the value is to 0, the weaker the fusion of the auxiliary information with the PM2.5 information, and the closer it is to 1, the stronger the fusion;
l_t is the gate on the linear transformation H(x): it controls how much linearly transformed PM2.5 information flows into the AI-GRU, with value range (0, 1); the closer the value is to 0, the less PM2.5 information flows into the AI-GRU, and the closer it is to 1, the more flows in; H(x) is a linear transformation of the PM2.5 information whose purpose is to let the AI-GRU focus on PM2.5 and place more attention on the PM2.5 information;
through the reset gate r_t, the auxiliary-information gate g_t, the fusion gate p_t and the linear-transformation gate l_t, the AI-GRU effectively controls the degree to which each kind of auxiliary information influences the air quality prediction; meanwhile, the gating mechanism selectively adds the auxiliary information, PM2.5 information and historical information that contribute positively to the prediction into the AI-GRU, and discards information irrelevant to the prediction in a timely manner.
4. The method of claim 1, wherein the hidden state h_t^i of the T-GRU is computed as follows:

h_t^i = (1 − z_t) ⊙ h_t^{i−1} + z_t ⊙ h̃_t^i

where ⊙ denotes element-wise multiplication and i denotes the current depth within the transition network; z_t is the update gate with value range (0, 1): the closer its value is to 0, the more of the previous layer's information is retained and the less new information the current network layer adds; the closer it is to 1, the more of the previous layer's information is replaced by information newly produced by the current layer; it is computed as follows:

z_t = σ(W_z h_t^{i−1})

h̃_t^i is the candidate hidden state of the T-GRU, obtained by applying the reset gate r_t to the hidden state h_t^{i−1} of the previous network layer, as follows:

h̃_t^i = tanh(W_h (r_t ⊙ h_t^{i−1}))

r_t denotes the reset gate, which controls the historical information; the value range of r_t is (0, 1): the closer the value is to 0, the less historical information flows into the T-GRU; the closer it is to 1, the more flows in, so historical information irrelevant to the prediction can be removed in time; it is computed as follows:

r_t = σ(W_r h_t^{i−1})

since the T-GRU receives only the hidden state passed from the previous layer within the same time step, it can learn the particular nonlinear relations between consecutive hidden states and thereby obtain deeper state representations.
CN202110923976.6A 2021-08-12 2021-08-12 Air quality prediction method based on deep transition network Active CN113762351B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110923976.6A CN113762351B (en) 2021-08-12 2021-08-12 Air quality prediction method based on deep transition network


Publications (2)

Publication Number Publication Date
CN113762351A true CN113762351A (en) 2021-12-07
CN113762351B CN113762351B (en) 2023-12-05


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103714261A (en) * 2014-01-14 2014-04-09 吉林大学 Intelligent auxiliary medical treatment decision supporting method of two-stage mixed model
US20180336884A1 (en) * 2017-05-19 2018-11-22 Baidu Usa Llc Cold fusing sequence-to-sequence models with language models
CN111275168A (en) * 2020-01-17 2020-06-12 南京信息工程大学 Air quality prediction method of bidirectional gating circulation unit based on convolution full connection
CN112085163A (en) * 2020-08-26 2020-12-15 哈尔滨工程大学 Air quality prediction method based on attention enhancement graph convolutional neural network AGC and gated cyclic unit GRU
CN113095550A (en) * 2021-03-26 2021-07-09 北京工业大学 Air quality prediction method based on variational recursive network and self-attention mechanism


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
从筱卿: "Research on a hybrid stock index prediction model based on deep learning", CNKI
林靖皓; 秦亮曦; 苏永秀; 秦川: "Mango yield prediction based on a self-attention bidirectional gated recurrent unit and convolutional neural network", Journal of Computer Applications, no. 1
牛哲文; 余泽远; 李波; 唐文虎: "Short-term wind power prediction model based on deep gated recurrent unit neural networks", Electric Power Automation Equipment, no. 05



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant