CN113762351A - Air quality prediction method based on deep transition network - Google Patents

Air quality prediction method based on deep transition network

Info

Publication number
CN113762351A
CN113762351A (application CN202110923976.6A; granted as CN113762351B)
Authority
CN
China
Prior art keywords
information
gru
value
gating
auxiliary information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110923976.6A
Other languages
Chinese (zh)
Other versions
CN113762351B (en)
Inventor
欧阳继红
杨智尧
王艺蒙
曲延非
李嘉寅
毕夏旭
王兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin University
Original Assignee
Jilin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jilin University filed Critical Jilin University
Priority to CN202110923976.6A priority Critical patent/CN113762351B/en
Publication of CN113762351A publication Critical patent/CN113762351A/en
Application granted granted Critical
Publication of CN113762351B publication Critical patent/CN113762351B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"


Abstract

The invention discloses an air quality prediction method based on a deep transition network. In order to extract deep spatial and temporal features of air quality data, it provides an air quality prediction model (AI-DTN) based on auxiliary information and a deep transition network. The AI-DTN comprises two transition networks running in opposite directions, which extract feature information from the forward and reverse time-series directions respectively so as to enhance feature extraction. Each transition network in the AI-DTN consists of a gated recurrent unit AI-GRU, which fuses auxiliary information to extract spatial features, and an existing transition gated recurrent unit T-GRU, which extracts temporal features. Of the two gating mechanisms specific to the AI-GRU, one controls the degree to which auxiliary information flows into the gated recurrent unit and the other controls the degree to which PM2.5 and the auxiliary information are fused, so that mutual interference during information fusion can be avoided.

Description

Air quality prediction method based on deep transition network
Technical Field
The invention relates to the technical field of data processing, in particular to an air quality prediction method based on a deep transition network.
Background
There are many factors affecting air quality, such as pollutants (NO, CO, automobile exhaust, industrial emissions) and meteorological information (wind speed, wind direction, rainfall), which are collectively referred to as auxiliary information. However, using auxiliary information to predict air quality presents difficulties. First, it is hard to acquire all of this information accurately: real-time vehicle exhaust and industrial emission data are difficult to obtain in full, and meteorological forecasts carry a certain deviation, so using them would cause error accumulation. Therefore, current air quality prediction mostly relies on pollutant information such as NO and CO together with past meteorological information. Second, complex interactions occur between pollutants, and between pollutants and meteorological information, and there is no way to fully model and cover all these interactions. Given the auxiliary information that is readily available, finding the different roles each piece of auxiliary information plays in PM2.5 prediction is therefore of great significance.
Most current air quality prediction models use a CNN to extract spatial features among the inputs and an RNN to extract temporal features, or apply an attention mechanism directly to extract spatial features followed by an RNN or LSTM network to extract temporal features. However, these methods are insufficient in their ability to extract the latent features carried by the auxiliary information.
Disclosure of Invention
Aiming at the defects of the prior art, the invention aims to provide an air quality prediction method based on a deep transition network so as to improve the accuracy of air quality prediction.
In order to achieve the purpose, the invention adopts the following technical scheme:
a deep transition network-based air quality prediction method comprises the following specific processes:
s1, acquiring air quality time sequence data and preprocessing the data;
s2, adopting an air quality prediction model AI-DTN based on auxiliary information and a deep transition network to predict the air quality:
the AI-DTN comprises a forward deep transition network, a backward deep transition network and a fully connected layer; the forward and backward deep transition networks extract the spatial and temporal features, the results of the two transition networks are spliced together, and the fully connected layer outputs the result;
the depth of each deep transition network is L; the first layer of the deep transition network is a gated recurrent unit AI-GRU, which extracts the spatial features of the input; the second to L-th layers of the deep transition network consist of transition gated recurrent units T-GRU, and the output of the L-th layer T-GRU at time t is the input of the first-layer AI-GRU at time t+1;
the detailed calculation process of the air quality prediction model AI-DTN is as follows:
the input to the model is divided into two parts, the first part being a PM2.5 time series representing a historical time window size q, denoted Xt={xt-q+1,...,xt},XtIs a matrix with dimension 1 x q, and the second part is a time sequence of auxiliary information representing a historical time window of size q, denoted At={at-q+1,…,at},AtIs a matrix of dimension n x q, AtEach of at-q+1,...,atAll are a matrix of n x 1, where n represents the number of features in the auxiliary information;
in the forward deep transition network, X is first puttAnd AtInputting the data into AI-GRU to obtain the hidden state of the first layer of the deep transition network
Figure BDA0003208489950000031
Figure BDA0003208489950000032
Wherein L represents the number of layers of the deep transition network, and the hidden state is weighted and fused with PM2.5 information and auxiliary information and represents spatial characteristic information at the time t; and then introducing the hidden state into the T-GRU of the next layer of the time step, wherein the hidden state is as follows:
Figure BDA0003208489950000033
wherein i represents the current network depth, the T-GRU only takes the hidden state of the AI-GRU of the upper layer as input, and the hidden state of the T-GRU of the last layer is taken as input to be transmitted to the AI-GRU of the next time step;
similarly, using reverse deep transition network, for XtAnd AtThe two time sequences are reversely extracted to obtain a hidden state representing the information of the time sequence in the reverse order
Figure BDA0003208489950000034
Then, splicing the hidden states of the forward and reverse deep transition networks together according to the time sequence:
Figure BDA0003208489950000035
wherein, the first and second connecting parts are connected with each other; representing a splicing operation; at this time, EtThe spatial feature information and the time feature information which are extracted through a deep transition network and comprise a forward time sequence and a reverse time sequence are included; finally, E istInputting the data into a full connection layer for final prediction to obtain final output:
Yt=W*Et+b;
where x represents the matrix multiplication, W is the parameter matrix, and b is the bias term.
Further, in step S1, the preprocessing specifically includes:
s1.1, missing value processing: processing missing values of the ordinal data of the original air quality based on a Lagrange interpolation method;
s1.2, normalization: and adopting a min-max standardized normalization method to perform linear transformation on the data subjected to missing value processing, so that the result value is mapped between [0 and 1 ].
Further, in step S2, for time step t, the hidden state h_t of the AI-GRU network is given by:
h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t    (4);
where ⊙ denotes element-wise multiplication; through the update gate z_t of the current time step, h_t selects and combines information from the hidden state h_{t-1} of the previous time step and the candidate hidden state h̃_t of the current time step;
z_t is the update gate, with value range (0, 1): the closer its value is to 0, the more historical information is retained and the less new information the current time step adds; the closer to 1, the more information from past time steps is discarded and the more new information the current time step adds; the update gate z_t is given by:
z_t = σ(W_xz x_t + W_hz h_{t-1} + W_az a_t)    (5);
W_xz, W_az and W_hz denote weights;
h̃_t is the candidate hidden state of the current time step; through the gating mechanism it selectively adds the PM2.5 information x_t of the current time step, the auxiliary information a_t and the hidden state h_{t-1} of the previous time step into the AI-GRU; the candidate hidden state h̃_t is given by:
h̃_t = tanh(W_xh x_t + r_t ⊙ (W_hh h_{t-1}) + p_t ⊙ g_t ⊙ (W_ah a_t)) + l_t ⊙ H(x_t)    (6);
where r_t denotes the reset gate, l_t the gate on the linear transformation, g_t the gate on the auxiliary information, p_t the gate on the degree of fusion between the auxiliary information and the PM2.5 information, and H(x) a linear transformation of the PM2.5 information; the tanh activation function scales the data to [-1, 1], and the linearly transformed information is then added to obtain h̃_t; r_t, l_t, g_t, p_t and H(x) are computed as follows:
r_t = σ(W_xr x_t + W_hr h_{t-1})    (7);
l_t = σ(W_xl x_t + W_hl h_{t-1})    (8);
g_t = σ(W_ag a_t + W_hg h_{t-1})    (9);
p_t = σ(W_ap a_t + W_hp h_{t-1})    (10);
H(x_t) = W_x x_t    (11);
in the above formulas, W_xr, W_hr, W_xl, W_hl, W_ag, W_hg, W_ap, W_hp and W_x denote weights; r_t denotes the reset gate, which controls the historical information; in the computation of h̃_t, r_t and h_{t-1} undergo element-wise multiplication; h_{t-1} contains all the historical information up to the previous time step, and the value range of r_t is (0, 1): the closer the value is to 0, the less historical information flows into the AI-GRU; the closer to 1, the more historical information flows in; in this way, historical information irrelevant to the prediction can be discarded in time;
g_t and p_t apply nonlinear transformations to a_t and h_{t-1}; g_t serves to extract the auxiliary information useful for PM2.5 and controls the degree to which auxiliary information flows into the AI-GRU, with value range (0, 1): the closer the value is to 0, the less auxiliary information flows into the AI-GRU; the closer to 1, the more auxiliary information flows in; p_t is the gate that fuses the auxiliary information with the PM2.5 information and controls the degree of their fusion, with value range (0, 1): the closer the value is to 0, the smaller the degree of fusion between the inflowing auxiliary information and the PM2.5 information; the closer to 1, the larger the degree of fusion;
l_t is the gate on the linear transformation H(x) and controls the degree to which the linearly transformed PM2.5 information flows into the AI-GRU, with value range (0, 1): the closer the value is to 0, the less PM2.5 information flows into the AI-GRU; the closer to 1, the more PM2.5 information flows in; H(x) is a linear transformation of the PM2.5 information, whose effect is to let the AI-GRU focus its attention more on the PM2.5 information itself;
through the reset gate r_t, the auxiliary-information gate g_t, the fusion gate p_t and the linear-transformation gate l_t, the AI-GRU effectively controls the degree to which the various kinds of auxiliary information influence the air quality prediction; meanwhile, the gating mechanism selectively adds the auxiliary information, PM2.5 information and historical information that contribute positively to the prediction into the AI-GRU, and discards information irrelevant to the prediction in time.
Further, the hidden state h_t^i of the T-GRU is given by:
h_t^i = (1 - z_t) ⊙ h_t^{i-1} + z_t ⊙ h̃_t^i    (12);
where ⊙ denotes element-wise multiplication and i denotes the depth of the current transition network; z_t is the update gate, with value range (0, 1): the closer its value is to 0, the more information from the previous network layer is retained and the less new information the current network layer adds; the closer to 1, the more information from the previous network layer is discarded and the more new information the current network layer adds; it is computed as:
z_t = σ(W_hz h_t^{i-1})    (13);
h̃_t^i is the candidate hidden state of the T-GRU, obtained by processing the hidden state h_t^{i-1} of the previous network layer through the reset gate r_t; it is computed as:
h̃_t^i = tanh(r_t ⊙ (W_hh h_t^{i-1}))    (14);
r_t denotes the reset gate, which controls the historical information; its value range is (0, 1): the closer the value is to 0, the less historical information flows into the T-GRU; the closer to 1, the more historical information flows in; in this way, historical information irrelevant to the prediction can be removed in time; it is computed as:
r_t = σ(W_hr h_t^{i-1})    (15).
The T-GRU receives only the hidden state passed from the previous layer within the same time step, so it can learn the particular nonlinear relations between consecutive hidden states and thus obtain deeper state representations.
The invention has the following beneficial effects: in order to extract the deep spatial and temporal features of air quality data, the invention provides an air quality prediction model (AI-DTN) based on auxiliary information and a deep transition network, which comprises two transition networks running in opposite directions and extracts feature information from the forward and reverse time-series directions respectively so as to enhance feature extraction. Each transition network in the AI-DTN consists of a gated recurrent unit AI-GRU, which fuses auxiliary information to extract spatial features, and an existing transition gated recurrent unit T-GRU, which extracts temporal features. Of the two gating mechanisms specific to the AI-GRU, one controls the degree to which auxiliary information flows into the gated recurrent unit and the other controls the degree to which PM2.5 and the auxiliary information are fused, so that mutual interference during information fusion can be avoided.
Drawings
FIG. 1 is a schematic technical route of a method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an air quality prediction model AI-DTN according to an embodiment of the present invention;
FIG. 3 is a block diagram of an AI-GRU according to an embodiment of the present invention;
FIG. 4 is a block diagram of a T-GRU in an embodiment of the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings. The embodiments are based on the technical solution and provide detailed implementations and specific operation procedures, but the protection scope of the present invention is not limited to these embodiments.
The embodiment provides an air quality prediction method based on a deep transition network, as shown in fig. 1, the specific process is as follows:
s1, acquiring air quality time sequence data and preprocessing the data, wherein the preprocessing process comprises the following steps:
s1.1, missing value processing:
the original air quality time series data contains a large amount of incomplete, inconsistent, abnormal and deviated data, and the accuracy of air quality prediction is influenced by the problem data. Therefore, data preprocessing is indispensable, and the common task is missing value processing of the data set.
Data missing value processing can be divided into two categories. One is to delete missing data and the other is to interpolate data. The greatest limitation of the former is that the completeness of data is achieved by replacing less historical data, which causes a great deal of waste of resources, and especially under the condition that the data set is small, the objectivity and accuracy of an analysis result can be directly influenced by deleting records. Therefore, the present embodiment performs missing value processing based on the lagrange interpolation method.
The definition of Lagrange interpolation is as follows:
For a polynomial function, suppose k+1 value points are given: (x_0, y_0), ..., (x_k, y_k), where x_j is the position of the argument and y_j is the value of the function at that position.
Assuming the x_j are pairwise distinct, the Lagrange interpolation polynomial obtained by applying the Lagrange interpolation formula is:
L(x) = Σ_{j=0}^{k} y_j l_j(x)
where l_j(x) is the Lagrange basis polynomial (interpolation basis function), expressed as:
l_j(x) = Π_{i=0, i≠j}^{k} (x - x_i) / (x_j - x_i)
The Lagrange basis polynomial l_j(x) takes the value 1 at x_j and the value 0 at every other point x_i, i ≠ j.
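A minimal sketch of missing-value filling with Lagrange interpolation, assuming NaN marks a missing reading and interpolating over up to k valid neighbours on each side of the gap (the window size k and the function names are illustrative, not from the patent):

```python
import math

def lagrange_basis(xs, j, x):
    """Lagrange basis polynomial l_j evaluated at x: 1 at xs[j], 0 at the other xs."""
    lj = 1.0
    for i, xi in enumerate(xs):
        if i != j:
            lj *= (x - xi) / (xs[j] - xi)
    return lj

def lagrange_interpolate(xs, ys, x):
    """Evaluate the Lagrange interpolation polynomial through (xs, ys) at x."""
    return sum(yj * lagrange_basis(xs, j, x) for j, yj in enumerate(ys))

def fill_missing(series, k=5):
    """Fill NaN entries of a time series using Lagrange interpolation over
    up to k valid neighbours on each side of each missing position."""
    out = list(series)
    for idx, v in enumerate(out):
        if not math.isnan(v):
            continue
        xs, ys = [], []
        for j in range(max(0, idx - k), min(len(out), idx + k + 1)):
            if j != idx and not math.isnan(out[j]):
                xs.append(j)
                ys.append(out[j])
        if xs:
            out[idx] = lagrange_interpolate(xs, ys, idx)
    return out
```

A small neighbourhood is used deliberately: high-degree Lagrange polynomials over many points oscillate (Runge's phenomenon), so interpolating from a handful of nearby readings is the usual practice.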
S1.2, normalization
Data normalization is basic preparatory work before building the air quality model. Different features often have different dimensions and units, which affects the results of data analysis; to eliminate the dimensional influence between features, the data must be normalized so that the data indicators become comparable. After the raw data are normalized, all features are on the same order of magnitude, which is suitable for comprehensive comparison and evaluation. The normalization method adopted in this embodiment is min-max normalization, a linear transformation of the data after missing-value processing that maps the result values into [0, 1]. The transfer function is as follows:
x* = (x - min) / (max - min)
where x* denotes the normalized data, x the original data, max the maximum value of the data after missing-value processing, and min the minimum value of the data after missing-value processing.
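The transfer function maps directly to code; a minimal sketch (the function name is illustrative):

```python
def min_max_normalize(values):
    """Linearly map values into [0, 1]: x* = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

In practice min and max are computed on the training split only and reused for validation and test data, so that no information leaks from the future into the normalization.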
S2, air quality prediction:
to solve the problem of how to distinguish the prediction degree of different Auxiliary Information on PM2.5, the present embodiment proposes an AI-DTN (adaptive Information-Deep Transition Network) that is an air quality prediction model based on Auxiliary Information and a Deep Transition Network.
The air quality prediction model AI-DTN is composed of a forward deep transition network, a backward deep transition network and a fully connected layer. First, the forward and backward deep transition networks extract the spatial and temporal features; then the results of the two transition networks are spliced together; finally the fully connected layer outputs the result. The model structure is shown in FIG. 2.
The AI-DTN is mainly constructed from the forward and backward deep transition networks, each of depth L. For a feed-forward neural network, the depth refers to the number of nonlinear layers between input and output; for an RNN, the depth refers to the number of nonlinear layers within one time step.
The first layer of the deep transition network is the gated recurrent unit AI-GRU, which extracts the spatial features of the input. The second to L-th layers consist of transition gated recurrent units (T-GRUs), an important component of deep transition networks that can extract information from deeper hidden states in the recurrent neural network. The output of the L-th layer T-GRU at time t is the input of the first-layer AI-GRU at time t+1.
AI-GRU and T-GRU will be described in further detail below:
1. Auxiliary-information-fused gated recurrent unit AI-GRU
In order to extract deeper spatial features and thus distinguish the importance of different auxiliary information for PM2.5 prediction, this embodiment proposes the AI-GRU (Auxiliary Information-GRU), a gated recurrent unit that fuses auxiliary information. The AI-GRU is inspired by AGDT (reference: Liang Y, Meng F, Zhang J, et al. A novel aspect-guided deep transition model for aspect based sentiment analysis [J]. arXiv preprint arXiv:1909.00324, 2019.); it is a recurrent neural network unit that retains the gating characteristics of the GRU and adds auxiliary-information gating on top of them. The AI-GRU takes not only the PM2.5 information as input but also uses the auxiliary information to assist the air quality prediction; it fuses the two kinds of information while controlling both the degree to which the auxiliary information is admitted and the degree to which the PM2.5 information and the auxiliary information are fused. Through its gating mechanism, the AI-GRU can dynamically adjust the weight of each piece of auxiliary information and thus discover the importance of different auxiliary information for PM2.5 prediction.
The output structure of the AI-GRU is the same as that of the GRU, while the input structure differs: the AI-GRU adds the auxiliary information as an input. The AI-GRU combines x_t and a_t from the current time step with the hidden state h_{t-1} of the previous time step to obtain the hidden state h_t of the current time step, which contains information about all previous time steps. The hidden state h_t of the AI-GRU is also its output. The structure of the AI-GRU is shown in FIG. 3.
For time step t, the hidden state h_t of the AI-GRU network is computed by formula (4):
h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t    (4);
where ⊙ denotes element-wise multiplication; through the update gate z_t of the current time step, h_t selects and combines information from the hidden state h_{t-1} of the previous time step and the candidate hidden state h̃_t of the current time step.
z_t is the update gate, with value range (0, 1): the closer its value is to 0, the more historical information is retained and the less new information the current time step adds; the closer to 1, the more information from past time steps is discarded and the more new information the current time step adds. The update gate z_t is given by formula (5):
z_t = σ(W_xz x_t + W_hz h_{t-1} + W_az a_t)    (5);
W_xz, W_az and W_hz denote weights, which are learned automatically during training through gradient descent.
h̃_t is the candidate hidden state of the current time step, used to update the new hidden state.
Through the gating mechanism, the candidate hidden state selectively adds the PM2.5 information x_t of the current time step, the auxiliary information a_t and the hidden state h_{t-1} of the previous time step into the AI-GRU.
The candidate hidden state h̃_t is given by formula (6):
h̃_t = tanh(W_xh x_t + r_t ⊙ (W_hh h_{t-1}) + p_t ⊙ g_t ⊙ (W_ah a_t)) + l_t ⊙ H(x_t)    (6);
r_t denotes the reset gate, computed by formula (7). l_t denotes the gate on the linear transformation, computed by formula (8). g_t denotes the gate on the auxiliary information, computed by formula (9). p_t denotes the gate on the degree of fusion between the auxiliary information and the PM2.5 information, computed by formula (10). H(x) denotes the linear transformation of the PM2.5 information, computed by formula (11).
The tanh activation function scales the data to [-1, 1], and the linearly transformed information is then added to obtain the result h̃_t.
r_t = σ(W_xr x_t + W_hr h_{t-1})    (7);
l_t = σ(W_xl x_t + W_hl h_{t-1})    (8);
g_t = σ(W_ag a_t + W_hg h_{t-1})    (9);
p_t = σ(W_ap a_t + W_hp h_{t-1})    (10);
H(x_t) = W_x x_t    (11);
In the above formulas, W_xr, W_hr, W_xl, W_hl, W_ag, W_hg, W_ap, W_hp and W_x denote weights, learned automatically during training through gradient descent. r_t denotes the reset gate, which controls the historical information. In the computation of h̃_t, r_t and h_{t-1} undergo element-wise multiplication; h_{t-1} contains all the historical information up to the previous time step, and the value range of r_t is (0, 1): the closer the value is to 0, the less historical information flows into the AI-GRU; the closer to 1, the more historical information flows in. In this way, historical information irrelevant to the prediction can be discarded in time.
g_t and p_t apply nonlinear transformations to a_t and h_{t-1}. g_t serves to extract the auxiliary information useful for PM2.5 and controls the degree to which auxiliary information flows into the AI-GRU; its value range is (0, 1): the closer the value is to 0, the less auxiliary information flows into the AI-GRU; the closer to 1, the more flows in. p_t fuses the auxiliary information with the PM2.5 information and controls the degree of their fusion; its value range is (0, 1): the closer the value is to 0, the smaller the degree of fusion between the inflowing auxiliary information and the PM2.5 information; the closer to 1, the larger the degree of fusion.
l_t is the gate on the linear transformation H(x); it controls the degree to which the linearly transformed PM2.5 information flows into the AI-GRU, with value range (0, 1): the closer the value is to 0, the less PM2.5 information flows into the AI-GRU; the closer to 1, the more flows in. H(x) is a linear transformation of the PM2.5 information, whose effect is to let the AI-GRU focus its attention more on the PM2.5 information itself.
Through the reset gate r_t, the auxiliary-information gate g_t, the fusion gate p_t and the linear-transformation gate l_t, the AI-GRU effectively controls the degree to which the various kinds of auxiliary information influence the air quality prediction. Meanwhile, the gating mechanism selectively adds the auxiliary information, PM2.5 information and historical information that contribute positively to the prediction into the AI-GRU, and discards information irrelevant to the prediction in time.
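The gates above can be collected into a single step function. The NumPy sketch below is an assumption-laden illustration, not the patented implementation: the exact combination of the gated terms inside the candidate-state tanh follows the textual description (reset-gated history, g_t- and p_t-gated auxiliary information, l_t-gated linear transform added after the tanh) rather than the unavailable patent figures, and all weight names and initializations are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class AIGRUCell:
    """One AI-GRU step following formulas (4)-(11); the combination of gated
    terms in the candidate state is an assumption, and weights are random
    stand-ins for trained parameters."""

    def __init__(self, n_aux, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        w = lambda r, c: rng.standard_normal((r, c)) * 0.1
        d, n = n_hidden, n_aux
        self.W_xz, self.W_hz, self.W_az = w(d, 1), w(d, d), w(d, n)  # update gate
        self.W_xr, self.W_hr = w(d, 1), w(d, d)                      # reset gate
        self.W_xl, self.W_hl = w(d, 1), w(d, d)                      # linear-transform gate
        self.W_ag, self.W_hg = w(d, n), w(d, d)                      # auxiliary-info gate
        self.W_ap, self.W_hp = w(d, n), w(d, d)                      # fusion gate
        self.W_xh, self.W_hh, self.W_ah = w(d, 1), w(d, d), w(d, n)  # candidate state
        self.W_x = w(d, 1)                                           # H(x) transform

    def step(self, x_t, a_t, h_prev):
        x = np.array([x_t])                                    # scalar PM2.5 input
        z = sigmoid(self.W_xz @ x + self.W_hz @ h_prev + self.W_az @ a_t)  # (5)
        r = sigmoid(self.W_xr @ x + self.W_hr @ h_prev)                    # (7)
        l = sigmoid(self.W_xl @ x + self.W_hl @ h_prev)                    # (8)
        g = sigmoid(self.W_ag @ a_t + self.W_hg @ h_prev)                  # (9)
        p = sigmoid(self.W_ap @ a_t + self.W_hp @ h_prev)                  # (10)
        Hx = self.W_x @ x                                                  # (11)
        cand = np.tanh(self.W_xh @ x + r * (self.W_hh @ h_prev)
                       + p * g * (self.W_ah @ a_t)) + l * Hx   # candidate, assumed form
        return (1 - z) * h_prev + z * cand                                 # (4)
```

Note how g and p act only on the auxiliary path and l only on the PM2.5 path, which is how the two kinds of information can be throttled independently without interfering with each other.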
2. Transition gated recurrent unit
The transition gated recurrent unit (Transition GRU, T-GRU) (ref: Pascanu R, Gulcehre C, Cho K, et al. How to construct deep recurrent neural networks [J]. arXiv preprint arXiv:1312.6026, 2013.) is an important component of the deep transition network; T-GRUs come into play once the transition network is more than one layer deep. The only input of the T-GRU is the hidden state h_t^{i-1} of the previous layer at the same time step, and its output is the hidden state h_t^i of the current network layer at the same time step. The structure of the T-GRU is shown in FIG. 4.
The hidden state h_t^i of the T-GRU is computed as shown in formula (12):

h_t^i = (1 − z_t) ⊙ h_t^{i−1} + z_t ⊙ h̃_t^i (12)
Here ⊙ denotes element-wise multiplication and i denotes the current depth within the transition network. z_t is the update gate with value range (0, 1): the closer its value is to 0, the more of the previous layer's information is retained and the less new information the current network layer adds; the closer it is to 1, the more of the previous layer's information is replaced by information newly produced by the current layer. Its calculation is shown in formula (13):

z_t = σ(W_z h_t^{i−1}) (13)

h̃_t^i is the candidate hidden state of the T-GRU, obtained by applying the reset gate r_t to the hidden state h_t^{i−1} of the previous network layer, as shown in formula (14):

h̃_t^i = tanh(W_h (r_t ⊙ h_t^{i−1})) (14)

r_t denotes the reset gate, which controls the historical information. Its value range is (0, 1): the closer the value is to 0, the less historical information flows into the T-GRU; the closer to 1, the more flows in, so historical information irrelevant to the prediction can be removed in time. Its calculation is shown in formula (15):

r_t = σ(W_r h_t^{i−1}) (15)
Because the T-GRU receives only the hidden state passed from the previous layer within the same time step, it can learn the particular nonlinear relations between consecutive hidden states and thereby obtain deeper state representations.
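One T-GRU step can be sketched directly from equations (12)-(15). The weight names `Wz`, `Wr`, `Wh` are placeholders, since the patent does not name the transition-layer weight matrices:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def t_gru_step(h_below, Wz, Wr, Wh):
    """One T-GRU step: the only input is the hidden state of the previous
    layer at the same time step."""
    z = sigmoid(Wz @ h_below)                  # update gate, eq. (13)
    r = sigmoid(Wr @ h_below)                  # reset gate, eq. (15)
    h_cand = np.tanh(Wh @ (r * h_below))       # candidate state, eq. (14)
    return (1.0 - z) * h_below + z * h_cand    # blended state, eq. (12)
```

Because the output is an element-wise convex combination of the incoming state and a tanh-bounded candidate, each component stays within max(|input|, 1).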
Further, the detailed calculation process of the air quality prediction model AI-DTN is as follows:
The input of the model is divided into two parts. The first part is a PM2.5 time series over a historical window of size q, denoted X_t = {x_{t−q+1}, ..., x_t}; X_t is a matrix of dimension 1 × q. The second part is the auxiliary-information time series over the same window, denoted A_t = {a_{t−q+1}, ..., a_t}; A_t is a matrix of dimension n × q, and each a_{t−q+1}, ..., a_t in A_t is an n × 1 vector, where n is the number of features in the auxiliary information.
Since the forward and reverse deep transition networks follow the same principle, the calculation process is described below using the forward network as an example. First, X_t and A_t are fed into the AI-GRU to obtain the hidden state of the first layer of the deep transition network:

h_t^1 = AI-GRU(x_t, a_t, h_{t−1}^L) (16)

where L denotes the number of layers of the deep transition network. This hidden state fuses, with learned weights, the PM2.5 information and the auxiliary information, and represents the spatial feature information at time t. The hidden state is then passed into the T-GRU of the next layer within the same time step:

h_t^i = T-GRU(h_t^{i−1}), i = 2, ..., L (17)
where i denotes the current network depth; the T-GRU takes only the hidden state of the previous layer as input, and the hidden state of the last-layer T-GRU is passed as input to the AI-GRU of the next time step.
Similarly, the reverse deep transition network follows the same principle as the forward one: it performs reverse feature extraction on the two time series X_t and A_t, yielding a hidden state ←h_t that represents the reverse-order time-series information. Then the hidden states of the forward and reverse deep transition networks are spliced together in time order:

E_t = [→h_t ; ←h_t] (18)
where ';' denotes the splicing (concatenation) operation. At this point E_t contains both the spatial feature information and the temporal feature information extracted by the forward and reverse deep transition networks. Finally, E_t is fed into a fully connected layer for the final prediction, giving the final output:
Y_t = W * E_t + b (19);

where * denotes matrix multiplication, W is the parameter matrix, and b is the bias term.
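The overall data flow of AI-DTN (the first-layer cell fed by x_t, a_t and the previous step's top-layer state; a T-GRU stack within each step; forward and reverse passes concatenated into a fully connected layer) can be sketched with simple tanh layers standing in for the actual AI-GRU and T-GRU cells. All weights here are random placeholders; only the wiring follows the description above:

```python
import numpy as np

def dtn_direction(X, A, L, d, rng):
    """One direction of the deep transition network. At each time step, layer 1
    consumes (x_t, a_t) plus the layer-L state of the previous step; layers
    2..L each take the state of the layer below at the same step."""
    q = X.shape[1]
    Wx = 0.1 * rng.standard_normal((d, 1 + A.shape[0] + d))
    Wt = [0.1 * rng.standard_normal((d, d)) for _ in range(L - 1)]
    h_top = np.zeros(d)                        # layer-L state of the previous step
    for t in range(q):
        inp = np.concatenate([X[:, t], A[:, t], h_top])
        h = np.tanh(Wx @ inp)                  # stands in for the AI-GRU
        for W in Wt:                           # stands in for the T-GRU stack
            h = np.tanh(W @ h)
        h_top = h
    return h_top

def ai_dtn_predict(X, A, L=3, d=8, seed=0):
    """Forward + reverse pass, concatenation, then the fully connected layer
    Y_t = W E_t + b (eq. 19); weights are random placeholders."""
    rng = np.random.default_rng(seed)
    fwd = dtn_direction(X, A, L, d, rng)
    bwd = dtn_direction(X[:, ::-1], A[:, ::-1], L, d, rng)
    E = np.concatenate([fwd, bwd])             # splicing operation ';'
    W = 0.1 * rng.standard_normal((1, 2 * d))
    b = rng.standard_normal(1)
    return W @ E + b
```

With X of shape 1 × q and A of shape n × q, the prediction is a single value, matching the one-step PM2.5 output of the model.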
Various corresponding changes and modifications can be made by those skilled in the art according to the above technical solutions and concepts, and all such changes and modifications should be included in the scope of the present invention as claimed.

Claims (4)

1. A deep transition network-based air quality prediction method is characterized by comprising the following specific processes:
s1, acquiring air quality time sequence data and preprocessing the data;
s2, adopting an air quality prediction model AI-DTN based on auxiliary information and a deep transition network to predict the air quality:
the air quality prediction model AI-DTN consists of a forward deep transition network, a reverse deep transition network and a fully connected layer; the forward and reverse deep transition networks extract spatial features and temporal features, the results of the two networks are spliced together, and the fully connected layer outputs the result;
the depth of each deep transition network is L; the first layer of the deep transition network is a gated recurrent unit AI-GRU, which extracts the spatial features of the input; the second through L-th layers of the deep transition network consist of transition gated recurrent units T-GRU, and the output of the L-th-layer T-GRU at time t is the input of the first-layer AI-GRU at time t+1;
the detailed calculation process of the air quality prediction model AI-DTN is as follows:
the input of the model is divided into two parts: the first part is a PM2.5 time series over a historical window of size q, denoted X_t = {x_{t−q+1}, ..., x_t}, where X_t is a matrix of dimension 1 × q; the second part is the auxiliary-information time series over the same window, denoted A_t = {a_{t−q+1}, ..., a_t}, where A_t is a matrix of dimension n × q and each a_{t−q+1}, ..., a_t in A_t is an n × 1 vector, n being the number of features in the auxiliary information;
in the forward deep transition network, X_t and A_t are first fed into the AI-GRU to obtain the hidden state of the first layer of the deep transition network:

h_t^1 = AI-GRU(x_t, a_t, h_{t−1}^L)

where L denotes the number of layers of the deep transition network; this hidden state fuses, with learned weights, the PM2.5 information and the auxiliary information and represents the spatial feature information at time t; the hidden state is then passed into the T-GRU of the next layer within the same time step:

h_t^i = T-GRU(h_t^{i−1}), i = 2, ..., L

where i denotes the current network depth; the T-GRU takes only the hidden state of the previous layer as input, and the hidden state of the last-layer T-GRU is passed as input to the AI-GRU of the next time step;
similarly, the reverse deep transition network performs reverse feature extraction on the two time series X_t and A_t, yielding a hidden state ←h_t representing the reverse-order time-series information; the hidden states of the forward and reverse deep transition networks are then spliced together in time order:

E_t = [→h_t ; ←h_t]

where ';' denotes the splicing operation; at this point E_t contains the spatial feature information and the temporal feature information extracted by the forward and reverse deep transition networks; finally, E_t is fed into a fully connected layer for the final prediction, giving the final output:

Y_t = W * E_t + b;

where * denotes matrix multiplication, W is the parameter matrix, and b is the bias term.
2. The method according to claim 1, wherein in step S1, the preprocessing includes:
s1.1, missing value processing: performing missing value processing on the original air quality time sequence data based on a Lagrange interpolation method;
s1.2, normalization: applying min-max normalization to linearly transform the data after missing-value processing, so that the resulting values are mapped into [0, 1].
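Steps S1.1 and S1.2 can be sketched as follows. The claim names only Lagrange interpolation and min-max normalization, so the neighbourhood size `k` used to pick interpolation nodes is an assumption:

```python
import numpy as np

def fill_missing_lagrange(series, k=2):
    """Fill NaNs by Lagrange interpolation over up to 2*k nearest known
    points (the window choice is an assumption; the claim only names the
    method)."""
    s = np.asarray(series, dtype=float).copy()
    for i in np.flatnonzero(np.isnan(s)):
        idx = np.flatnonzero(~np.isnan(s))                  # currently known points
        near = idx[np.argsort(np.abs(idx - i))][: 2 * k]    # nearest known points
        val = 0.0
        for j in near:                                      # Lagrange basis at position i
            lj = 1.0
            for m in near:
                if m != j:
                    lj *= (i - m) / (j - m)
            val += s[j] * lj
        s[i] = val
    return s

def min_max(series):
    """Min-max normalization mapping values into [0, 1]."""
    s = np.asarray(series, dtype=float)
    return (s - s.min()) / (s.max() - s.min())
```

For a series with a single gap between two known points, the fill reduces to linear interpolation through those points.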
3. The method according to claim 1, wherein in step S2, for time step t, the hidden state h_t of the AI-GRU network is computed as follows:

h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t

where ⊙ denotes element-wise multiplication; through the update gate z_t of the current time step, h_t selects and combines the hidden state h_{t−1} of the previous time step and the candidate hidden state h̃_t of the current time step;
z_t is the update gate with value range (0, 1): the closer its value is to 0, the more historical information is retained and the less new information is added at the current time step; the closer it is to 1, the more historical information is discarded and the more new information the current time step adds; the update gate z_t is computed as follows:

z_t = σ(W_{xz} x_t + W_{hz} h_{t−1} + W_{az} a_t);
W_{xz}, W_{az}, W_{hz} each denote weights; h̃_t is the candidate hidden state of the current time step; through the gating mechanism, h̃_t selectively adds the PM2.5 information x_t of the current time step, the auxiliary information a_t and the hidden state h_{t−1} of the previous time step into the AI-GRU; the candidate hidden state h̃_t is computed as follows:

h̃_t = tanh(W_{xh} x_t + r_t ⊙ (W_{hh} h_{t−1}) + g_t ⊙ p_t ⊙ (W_{ah} a_t)) + l_t ⊙ H(x_t)
r_t denotes the reset gate, l_t the gate on the linear transformation, g_t the auxiliary-information gate, p_t the gate on the degree of fusion between the auxiliary information and the PM2.5 information, and H(x) the linear transformation of PM2.5; the tanh activation function scales the data into [−1, 1], and the linearly transformed information is then added to obtain the result h̃_t; r_t, l_t, g_t, p_t and H(x) are computed as follows:
r_t = σ(W_{xr} x_t + W_{hr} h_{t−1}) (7);
l_t = σ(W_{xl} x_t + W_{hl} h_{t−1}) (8);
g_t = σ(W_{ag} a_t + W_{hg} h_{t−1}) (9);
p_t = σ(W_{ap} a_t + W_{hp} h_{t−1}) (10);
H(x_t) = W_x x_t (11);
in the above formulas, W_{xr}, W_{hr}, W_{xl}, W_{hl}, W_{ag}, W_{hg}, W_{ap}, W_{hp} and W_x each denote weights; r_t denotes the reset gate, which controls the historical information; in the calculation of h̃_t, r_t and h_{t−1} undergo element-wise multiplication; h_{t−1} contains all the historical information up to the previous time step, and the value range of r_t is (0, 1): the closer the value is to 0, the less historical information flows into the AI-GRU; the closer it is to 1, the more historical information flows in, so historical information irrelevant to the prediction can be discarded in time;
g_t and p_t are nonlinear transformations of a_t and h_{t−1}; the role of g_t is to extract the auxiliary information useful for PM2.5: it controls the degree to which auxiliary information flows into the AI-GRU, with value range (0, 1); the closer the value is to 0, the less auxiliary information flows into the AI-GRU, and the closer it is to 1, the more flows in; the role of p_t is to fuse the auxiliary information with the PM2.5 information: it controls the degree of fusion between the two, with value range (0, 1); the closer the value is to 0, the weaker the fusion of the auxiliary information with the PM2.5 information, and the closer it is to 1, the stronger the fusion;
l_t is the gate on the linear transformation H(x): it controls how much linearly transformed PM2.5 information flows into the AI-GRU, with value range (0, 1); the closer the value is to 0, the less PM2.5 information flows into the AI-GRU, and the closer it is to 1, the more flows in; H(x) is a linear transformation of the PM2.5 information whose purpose is to let the AI-GRU focus on PM2.5 and place more attention on the PM2.5 information;
through the reset gate r_t, the auxiliary-information gate g_t, the fusion gate p_t and the linear-transformation gate l_t, the AI-GRU effectively controls the degree to which each kind of auxiliary information influences the air quality prediction; meanwhile, the gating mechanism selectively adds the auxiliary information, PM2.5 information and historical information that contribute positively to the prediction into the AI-GRU, and discards information irrelevant to the prediction in a timely manner.
4. The method of claim 1, wherein the hidden state h_t^i of the T-GRU is computed as follows:

h_t^i = (1 − z_t) ⊙ h_t^{i−1} + z_t ⊙ h̃_t^i

where ⊙ denotes element-wise multiplication and i denotes the current depth within the transition network; z_t is the update gate with value range (0, 1): the closer its value is to 0, the more of the previous layer's information is retained and the less new information the current network layer adds; the closer it is to 1, the more of the previous layer's information is replaced by information newly produced by the current layer; it is computed as follows:

z_t = σ(W_z h_t^{i−1})

h̃_t^i is the candidate hidden state of the T-GRU, obtained by applying the reset gate r_t to the hidden state h_t^{i−1} of the previous network layer, as follows:

h̃_t^i = tanh(W_h (r_t ⊙ h_t^{i−1}))

r_t denotes the reset gate, which controls the historical information; the value range of r_t is (0, 1): the closer the value is to 0, the less historical information flows into the T-GRU; the closer it is to 1, the more flows in, so historical information irrelevant to the prediction can be removed in time; it is computed as follows:

r_t = σ(W_r h_t^{i−1})

since the T-GRU receives only the hidden state passed from the previous layer within the same time step, it can learn the particular nonlinear relations between consecutive hidden states and thereby obtain deeper state representations.
CN202110923976.6A 2021-08-12 2021-08-12 Air quality prediction method based on deep transition network Active CN113762351B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110923976.6A CN113762351B (en) 2021-08-12 2021-08-12 Air quality prediction method based on deep transition network


Publications (2)

Publication Number Publication Date
CN113762351A true CN113762351A (en) 2021-12-07
CN113762351B CN113762351B (en) 2023-12-05


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103714261A (en) * 2014-01-14 2014-04-09 吉林大学 Intelligent auxiliary medical treatment decision supporting method of two-stage mixed model
US20180336884A1 (en) * 2017-05-19 2018-11-22 Baidu Usa Llc Cold fusing sequence-to-sequence models with language models
CN111275168A (en) * 2020-01-17 2020-06-12 南京信息工程大学 Air quality prediction method of bidirectional gating circulation unit based on convolution full connection
CN112085163A (en) * 2020-08-26 2020-12-15 哈尔滨工程大学 Air quality prediction method based on attention enhancement graph convolutional neural network AGC and gated cyclic unit GRU
CN113095550A (en) * 2021-03-26 2021-07-09 北京工业大学 Air quality prediction method based on variational recursive network and self-attention mechanism


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
从筱卿: "Research on a hybrid stock index prediction model based on deep learning", CNKI
林靖皓; 秦亮曦; 苏永秀; 秦川: "Mango yield prediction based on a self-attention bidirectional gated recurrent unit and convolutional neural network", Journal of Computer Applications, no. 1
牛哲文; 余泽远; 李波; 唐文虎: "Short-term wind power prediction model based on deep gated recurrent unit neural networks", Electric Power Automation Equipment, no. 05



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant