CN116431999A - PM2.5 concentration prediction method based on self-adaptive principal component analysis and neural network - Google Patents


Info

Publication number
CN116431999A
Authority
CN
China
Prior art keywords
prediction
variable
concentration
neural network
principal component
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310695985.3A
Other languages
Chinese (zh)
Inventor
姚婷
李振莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University of Technology
Original Assignee
Hunan University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University of Technology filed Critical Hunan University of Technology
Priority to CN202310695985.3A priority Critical patent/CN116431999A/en
Publication of CN116431999A publication Critical patent/CN116431999A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions

Abstract

The invention provides a PM2.5 concentration prediction method based on adaptive principal component analysis (sPCA) and a neural network, comprising the following steps: acquiring historical data on the factors that influence PM2.5 concentration and constructing a full sample of predictor variables; extracting predictive factors from each category of influencing factors using sPCA and identifying the factors that help predict the daily average PM2.5 concentration; and feeding the dimension-reduced influencing factors into a neural network mixed-frequency data model (ANN-U-MIDAS), which predicts the daily average PM2.5 concentration from the raw mixed-frequency data. sPCA extracts the useful information and reduces the dimension of the input layer. ANN-U-MIDAS accepts raw mixed-frequency data directly as input to an artificial neural network, which avoids the drawbacks of preprocessing the data to a common frequency, allows nonlinear relationships among variables to be identified in mixed-frequency data analysis, and improves the model's fit and predictive ability.

Description

PM2.5 concentration prediction method based on self-adaptive principal component analysis and neural network
Technical Field
The invention relates to the technical field of air quality prediction, in particular to a PM2.5 concentration prediction method based on self-adaptive principal component analysis and a neural network.
Background
With economic development and accelerating urbanization, rising emissions of various pollutants have strongly affected the atmospheric environment, and air pollution has become one of the major problems facing society. PM2.5, a principal atmospheric pollutant, harms public health, travel and environmental quality, and because it travels long distances and remains in the air for a long time it is often used to assess air quality. Accurate prediction of the daily average PM2.5 concentration therefore plays a vital role in pollution forecasting, respiratory disease prevention and atmospheric environmental management.
Current methods for predicting the daily average PM2.5 concentration fall into two main categories: traditional statistical methods and deep-learning neural network methods. Early work mainly used traditional statistical methods such as the grey model, linear regression and principal component regression. Although these models have been applied to pollutant prediction, most are designed for linear data and predict linear data well, whereas the daily average PM2.5 concentration changes nonlinearly and abruptly and forms a relatively complex nonlinear system, so traditional linear statistical methods produce large errors when predicting PM2.5 concentration. In recent years, as neural networks have proven effective on nonlinear problems, they have increasingly been used as prediction models for pollutant concentration. However, the initial parameter settings of a single prediction model strongly affect its accuracy, and to improve accuracy, hybrid models that combine a single neural network model with other methods have increasingly been proposed. Although hybrid models predict well, their parameters must be determined through repeated experiments, which is time-consuming and laborious, and the parameters so determined are not necessarily optimal.
Disclosure of Invention
The purpose of the invention is to address the shortcomings described in the background by providing a PM2.5 concentration prediction method in which adaptive principal component analysis (sPCA) extracts useful information, identifies the variables that help predict the daily average PM2.5 concentration, and reduces the dimension of the input layer, while the neural network mixed-frequency data model (ANN-U-MIDAS) reduces computational complexity, avoids the drawbacks of preprocessing data to a common frequency, and identifies nonlinear relationships among variables in mixed-frequency data analysis, thereby improving the model's fit and predictive ability.
In order to achieve the above object, the present invention provides a PM2.5 concentration prediction method based on adaptive principal component analysis and neural network, comprising:
S1, acquiring historical data on the factors that influence PM2.5 concentration and constructing a full sample of predictor variables;
S2, extracting predictive factors from each category of influencing factors using adaptive principal component analysis, and identifying the influencing factors that help predict the daily average PM2.5 concentration, comprising:
S21, forming a set of scaled predictor variables, wherein each scaling coefficient is the slope of the predictive regression on the corresponding standardized predictor variable;
S22, extracting diffusion indexes from the scaled predictor variables as predictive factors for the daily average PM2.5 concentration;
S3, feeding the dimension-reduced influencing factors into a neural network mixed-frequency data model and predicting the daily average PM2.5 concentration from the raw mixed-frequency data, comprising:
S31, frequency-aligning each predictor variable so that it has the same frequency as the output variable;
S32, multiplying every frequency-aligned vector entering the hidden layer by the hidden-layer weights, adding the hidden-layer bias, and applying the sigmoid transfer function to obtain the output of each hidden-layer node;
S33, passing the outputs $a_t$ of all hidden-layer nodes to the output layer, multiplying $a_t$ by the output-layer weights $\gamma$, adding the output-layer bias $\gamma_0$, and applying the output-layer transfer function $g(\cdot)$ to obtain the final predicted daily average PM2.5 concentration:
$$\hat{y}_t = g\left(\gamma^{\top} a_t + \gamma_0\right)$$
where $\gamma$ is the output-layer weight vector, $\gamma_0$ is the output-layer bias, and $g(\cdot)$ is the output-layer transfer function.
Further, the influencing factors in S1 include the concentrations of SO₂, NO₂, CO, O₃ and other pollutant gases; maximum temperature, precipitation, minimum temperature and wind force; and per-capita gross regional product, per-capita number of industrial enterprises and number of motor vehicles.
Further, in S21 a set of scaled predictor variables $(\beta_1 X_{1,t}, \ldots, \beta_N X_{N,t})$ is formed, where the scaling coefficient $\beta_i$ is the slope of the predictive regression on the $i$-th standardized predictor variable, so that variables with strong predictive power are extracted to predict the daily average PM2.5 concentration:
$$y_{t+1} = \alpha_i + \beta_i X_{i,t} + \varepsilon_{i,t+1}, \quad i = 1, \ldots, N$$
where $N$ is the total number of candidate predictor variables, $y_{t+1}$ is the daily average PM2.5 concentration in the regression from time $t$ to time $t+1$, $X_{i,t}$ is the $i$-th predictor variable at time $t$, $\varepsilon_{i,t+1}$ is an error term with zero mean, and $\alpha_i$ is the intercept of the regression on the $i$-th predictor variable.
Further, in S22 the diffusion indexes are extracted from the scaled predictor variables $(\beta_1 X_{1,t}, \ldots, \beta_N X_{N,t})$ as predictive factors for the daily average PM2.5 concentration, and enter the prediction in the following form:
$$y_{t+1} = \theta^{\top} F_t + u_{t+1}$$
where $F_t$ is a $K$-dimensional vector of sPCA diffusion indexes, $K$ is determined by the adjusted $R^2$, $R^2$ is the goodness of fit, which measures how well the predicted values fit the true values, $\theta$ is the $K$-dimensional parameter vector to be estimated, and $u_{t+1}$ is an idiosyncratic noise term.
Further, a predictive regression is run on the lagged value of each predictor variable, and the predictive power of the variable is assessed from its scaling coefficient $\beta_i$: a large scaling coefficient indicates strong predictive power and a small one indicates weak predictive power. The predictor variables with strong predictive power are used as inputs to the neural network mixed-frequency data model.
Further, the predictor variables with strong predictive power include SO₂, NO₂, CO, wind force, per-capita number of industrial enterprises and number of motor vehicles.
Further, in S31 each predictor variable is frequency-aligned to obtain a vector with the same frequency as the output variable:
$$X_t^{(m)} = \left(x_t^{(m)}, x_{t-1/m}^{(m)}, \ldots, x_{t-(p-1)/m}^{(m)}\right)^{\top}$$
where $x_t^{(m)}$ is the high-frequency raw input variable, $y_t$ is the low-frequency target output variable, $m$ is the frequency mismatch between $x_t^{(m)}$ and $y_t$, and $p$ is the given maximum lag order.
Further, in S32 every frequency-aligned vector entering the hidden layer is multiplied by the hidden-layer weights $w_j$, the hidden-layer bias $b_j$ is added, and the sigmoid transfer function $f(\cdot)$ is applied to obtain the output $a_{j,t}$ of each hidden-layer node:
$$a_{j,t} = f\left(w_j^{\top} X^{(m)}_{t-h/m} + b_j\right)$$
where $w_j$ is the hidden-layer weight vector of node $j$, $b_j$ is the corresponding element of the hidden-layer bias vector, $X^{(m)}_{t-h/m}$ is the frequency-aligned input vector, $h$ is the high-frequency prediction horizon associated with the high-frequency variable, and $f(\cdot)$ is the sigmoid transfer function, taken to be the hyperbolic tangent.
The scheme of the invention has the following beneficial effects:
The PM2.5 concentration prediction method based on adaptive principal component analysis and a neural network uses adaptive principal component analysis (sPCA) to extract common factors from each category of influencing factors, identifies the factors that help predict the daily average PM2.5 concentration, extracts the useful information, and reduces the dimension of the input layer. The daily average PM2.5 concentration is predicted with a neural network mixed-frequency data model (ANN-U-MIDAS), which accepts raw mixed-frequency data directly as input to an artificial neural network. This avoids the drawbacks of preprocessing the data to a common frequency, allows nonlinear relationships among variables to be identified in mixed-frequency data analysis, and improves the model's fit and predictive ability.
Other advantageous effects of the present invention will be described in detail in the detailed description section which follows.
Drawings
FIG. 1 is a flow chart of the steps of the present invention.
Detailed Description
Other advantages and effects of the present disclosure will become readily apparent to those skilled in the art from the following disclosure, which describes embodiments of the present disclosure by way of specific examples. It will be apparent that the described embodiments are merely some, but not all embodiments of the present disclosure. The disclosure may be embodied or practiced in other different specific embodiments, and details within the subject specification may be modified or changed from various points of view and applications without departing from the spirit of the disclosure. It should be noted that the following embodiments and features in the embodiments may be combined with each other without conflict. All other embodiments, which can be made by one of ordinary skill in the art without inventive effort, based on the embodiments in this disclosure are intended to be within the scope of this disclosure.
It is noted that various aspects of the embodiments are described below within the scope of the following claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the present disclosure, one skilled in the art will appreciate that one aspect described herein may be implemented independently of any other aspect, and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. In addition, such apparatus may be implemented and/or such methods practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.
It should also be noted that the illustrations provided in the following embodiments merely illustrate the basic concepts of the disclosure by way of illustration, and only the components related to the disclosure are shown in the drawings and are not drawn according to the number, shape and size of the components in actual implementation, and the form, number and proportion of the components in actual implementation may be arbitrarily changed, and the layout of the components may be more complicated. In addition, in the following description, specific details are provided in order to provide a thorough understanding of the examples. However, it will be understood by those skilled in the art that the aspects may be practiced without these specific details.
As shown in FIG. 1, an embodiment of the present invention provides a PM2.5 concentration prediction method based on adaptive principal component analysis and a neural network. The method uses high-frequency daily data, namely the concentrations of SO₂, NO₂, CO, O₃ and other pollutant gases together with maximum temperature, precipitation, minimum temperature and wind force, and low-frequency monthly data, namely per-capita gross regional product, per-capita number of industrial enterprises and number of motor vehicles, to predict changes in the daily average PM2.5 concentration.
Predicting the daily average PM2.5 concentration from influencing factors drawn from as many sources and aspects as possible improves the fit and prediction accuracy for the output variable and strengthens the interpretability of the results. However, introducing many influencing factors into the model creates a high-dimensional mixed-frequency data analysis problem. If all the influencing factors are retained, the model contains many redundant variables and features, which causes multicollinearity, increases the estimation burden and biases the estimates, thereby reducing the model's generalization ability and prediction accuracy. In this case the high-dimensional mixed-frequency variables need to be reduced in dimension. Principal component analysis (PCA) is the most widely used dimension reduction method. It converts a large number of variables into orthogonal components so that the original data can be replaced by a small number of principal components. Although PCA is useful for reducing a large number of predictor variables to a small number of combinations, a recognized drawback is that it ignores the prediction target entirely.
Based on this, the present embodiment adopts adaptive principal component analysis (sPCA), which scales each predictor variable according to its power to predict the target, placing more weight on the predictors that matter more for the target. In contrast, PCA gives equal weight to all predictors. PCA can condense the information in a large number of predictors into a few variables and filter out idiosyncratic noise, but as an unsupervised learning technique it ignores the prediction target. If one predictor is noisier than another, it inevitably has a disproportionate effect on the weights of the variables. sPCA corrects this defect by placing less weight on noisy predictors.
In this embodiment, therefore, sPCA not only filters out most of the idiosyncratic noise contained in individual predictors but also extracts the most significant common component from all candidate variables. Before the diffusion indexes are constructed, each predictor is evaluated by its ability to predict the daily average PM2.5 concentration, which identifies the variables that contribute to the prediction.
Specifically, in this embodiment, after the historical data on the influencing factors have been obtained and the full sample of predictor variables constructed, sPCA extracts the diffusion indexes in two steps:
First, a set of scaled predictor variables $(\beta_1 X_{1,t}, \ldots, \beta_N X_{N,t})$ is formed, where the scaling coefficient $\beta_i$ is the slope of the predictive regression on the $i$-th standardized predictor variable, so that variables with strong predictive power are extracted to predict the daily average PM2.5 concentration:
$$y_{t+1} = \alpha_i + \beta_i X_{i,t} + \varepsilon_{i,t+1}, \quad i = 1, \ldots, N$$
where $N$ is the total number of candidate predictor variables, $y_{t+1}$ is the daily average PM2.5 concentration in the regression from time $t$ to time $t+1$, $X_{i,t}$ is the $i$-th predictor variable at time $t$, $\alpha_i$ is the intercept, and $\varepsilon_{i,t+1}$ is an error term with zero mean.
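The scaling step described above can be sketched in Python with NumPy. This is a minimal illustration rather than the patented implementation: the function name `scale_predictors`, the array layout (rows are time points, columns are predictors) and the synthetic demo data are all assumptions made for the example.

```python
import numpy as np

def scale_predictors(X, y):
    """Scale each standardized predictor by its predictive-regression slope.

    X : (T, N) array, predictor i at time t in X[t, i]   (layout is an assumption)
    y : (T,) array, target series (e.g. daily average PM2.5)
    Returns (beta, scaled) where beta[i] is the OLS slope of y[t+1] on the
    standardized X[t, i] and scaled[:, i] = beta[i] * standardized X[:, i].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize each predictor
    x_lag, y_lead = Xs[:-1], y[1:]                 # pair X at time t with y at t+1
    xc = x_lag - x_lag.mean(axis=0)
    yc = y_lead - y_lead.mean()
    # Slope of a univariate regression with intercept: cov(x, y) / var(x)
    beta = (xc * yc[:, None]).sum(axis=0) / (xc ** 2).sum(axis=0)
    return beta, Xs * beta

# Synthetic demo: the target depends only on the first predictor, with slope 2.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 3))
Z = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)
y_demo = np.empty(60)
y_demo[0] = 1.0
y_demo[1:] = 1.0 + 2.0 * Z[:-1, 0]
beta_demo, scaled_demo = scale_predictors(X_demo, y_demo)
```

Because each regression has a single predictor plus an intercept, the slope reduces to a covariance divided by a variance, so no matrix solver is needed.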
Second, the sPCA diffusion indexes are extracted from the scaled predictor panel $(\beta_1 X_{1,t}, \ldots, \beta_N X_{N,t})$ as new predictive factors for the daily average PM2.5 concentration. The extracted sPCA diffusion indexes enter the prediction in the following form:
$$y_{t+1} = \theta^{\top} F_t + u_{t+1}$$
where $F_t$ is a $K$-dimensional vector of sPCA diffusion indexes, $K$ is determined by the adjusted $R^2$, $\theta$ is the $K$-dimensional parameter vector to be estimated, and $u_{t+1}$ is an idiosyncratic noise term.
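One way to extract the diffusion indexes and pick $K$ by adjusted $R^2$ is sketched below. It assumes ordinary PCA (via singular value decomposition) applied to the already scaled panel, and it fits the one-step-ahead regression with an intercept; both choices, along with the function names and the synthetic data, are assumptions of this sketch and are not specified in the source.

```python
import numpy as np

def extract_factors(scaled, k):
    """First k principal components of the scaled predictor panel, shape (T, k)."""
    centered = scaled - scaled.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

def adjusted_r2(y, F):
    """Adjusted R^2 of an OLS regression of y on F (intercept included)."""
    T, k = F.shape
    A = np.column_stack([np.ones(T), F])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    return 1.0 - (1.0 - r2) * (T - 1) / (T - k - 1)

def choose_k(scaled, y, k_max):
    """Pick the number of factors whose one-step-ahead regression
    (y at t+1 on factors at t) has the highest adjusted R^2."""
    scores = {k: adjusted_r2(y[1:], extract_factors(scaled, k)[:-1])
              for k in range(1, k_max + 1)}
    return max(scores, key=scores.get)

# Synthetic demo panel standing in for the scaled predictors.
rng = np.random.default_rng(1)
panel_demo = rng.normal(size=(80, 5))
target_demo = rng.normal(size=80)
factors_demo = extract_factors(panel_demo, 2)
k_demo = choose_k(panel_demo, target_demo, k_max=4)
```

The principal components are mutually orthogonal by construction, which is what removes the multicollinearity among the original influencing factors.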
A predictive regression is run on the lagged value of each predictor variable, and its predictive power is assessed from the scaling coefficient $\beta_i$: a large scaling coefficient means a large weight and strong predictive power, while a small scaling coefficient means a small weight and weak predictive power.
The predictor variables are thus divided into those with weak and those with strong predictive power, and the latter are used as inputs to ANN-U-MIDAS. The dimension-reduced influencing factors are fed into ANN-U-MIDAS, which models the raw mixed-frequency data directly and predicts the daily average PM2.5 concentration.
Taking a three-layer neural network as an example, it generally consists of an input layer of $n$ input neurons, a hidden layer of $q$ hidden neurons, and an output layer. The hyperbolic tangent sigmoid is used as the activation function in the hidden layer, and the identity function is used as the transfer function in the output layer. The connection weights and biases are optimized with a nonlinear optimization algorithm based on standard gradient descent.
Let $x_t^{(m)}$ be the high-frequency raw input variable and $y_t$ the low-frequency target output variable, and let $m$ denote the frequency mismatch between $x_t^{(m)}$ and $y_t$. To predict the low-frequency variable $y_t$, $x_t^{(m)}$ must be frequency-aligned according to a given maximum lag order $p$. Specifically:
Each predictor variable $x_t^{(m)}$ is frequency-aligned to obtain a vector with the same frequency as the output variable $y_t$, denoted $X_t^{(m)}$:
$$X_t^{(m)} = \left(x_t^{(m)}, x_{t-1/m}^{(m)}, \ldots, x_{t-(p-1)/m}^{(m)}\right)^{\top}$$
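Frequency alignment of this kind amounts to stacking the most recent high-frequency observations into one row per low-frequency period. The sketch below shows one plausible reading; the function name `align_frequency`, the convention that the high-frequency series holds exactly `m` observations per low-frequency period, and the handling of the horizon `h` are assumptions of the example.

```python
import numpy as np

def align_frequency(x_high, m, p, h=0):
    """Stack p high-frequency lags per low-frequency period.

    x_high : 1-D array with m observations per low-frequency period
    m      : frequency mismatch (high-frequency observations per period)
    p      : maximum lag order (number of lags kept)
    h      : high-frequency prediction horizon (observations skipped
             at the end of each period); 0 means use data up to period end
    Returns an array with one row per usable low-frequency period,
    most recent observation first.
    """
    x_high = np.asarray(x_high, dtype=float)
    n_periods = len(x_high) // m
    rows = []
    for t in range(1, n_periods + 1):
        end = t * m - h            # one past the last usable observation
        start = end - p
        if start < 0:              # not enough history yet for this period
            continue
        rows.append(x_high[start:end][::-1])
    return np.array(rows)

# Demo: 12 high-frequency points, 3 per low-frequency period, 4 lags.
aligned_demo = align_frequency(np.arange(12), m=3, p=4)
aligned_h1_demo = align_frequency(np.arange(12), m=3, p=4, h=1)
```

Periods without enough history to fill all `p` lags are dropped, so the aligned matrix can have fewer rows than there are low-frequency periods.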
The sigmoid transfer function $f(\cdot)$ is applied to the inner product of the frequency-aligned vector and the hidden-layer weights $w_j$ plus the hidden-layer bias $b_j$, giving the outputs $a_{j,t}$ of the $q$ hidden-layer nodes:
$$a_{j,t} = f\left(w_j^{\top} X^{(m)}_{t-h/m} + b_j\right), \quad j = 1, \ldots, q$$
where $w_j$ is the hidden-layer weight vector of node $j$, $b_j$ is the corresponding element of the hidden-layer bias vector, $X^{(m)}_{t-h/m}$ is the frequency-aligned input vector, $h$ is the high-frequency prediction horizon associated with the high-frequency variable $x_t^{(m)}$, and $f(\cdot)$ is the sigmoid transfer function, taken to be the hyperbolic tangent, whose specific form is:
$$f(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$$
S33, the hidden-layer outputs $a_t = (a_{1,t}, \ldots, a_{q,t})^{\top}$ are passed to the output layer, which outputs the final predicted daily average PM2.5 concentration:
$$\hat{y}_t = g\left(\gamma^{\top} a_t + \gamma_0\right)$$
where $\gamma$ is the output-layer weight vector, $\gamma_0$ is the output-layer bias, and $g(\cdot)$ is the output-layer transfer function, typically the identity function.
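The forward pass through the hidden and output layers can be sketched as follows. The hyperbolic tangent is written out explicitly to match the description, and the output layer uses the identity function; the function names, array shapes and random demo weights are assumptions made for illustration.

```python
import numpy as np

def tanh_sigmoid(z):
    """Hyperbolic tangent written out as (e^z - e^-z) / (e^z + e^-z)."""
    ez, emz = np.exp(z), np.exp(-z)
    return (ez - emz) / (ez + emz)

def forward(X_aligned, W, b, gamma, gamma0):
    """Three-layer forward pass with an identity output layer.

    X_aligned : (T, n) frequency-aligned inputs, one row per period
    W, b      : hidden-layer weights (n, q) and bias (q,)
    gamma     : output-layer weights (q,)
    gamma0    : output-layer bias (scalar)
    """
    hidden = tanh_sigmoid(X_aligned @ W + b)   # hidden-node outputs, (T, q)
    return hidden @ gamma + gamma0             # identity transfer at the output

# Demo with 8 inputs (e.g. 8 lags) and 5 hidden neurons.
rng = np.random.default_rng(2)
X_fwd = rng.normal(size=(4, 8))
W_fwd = rng.normal(scale=0.1, size=(8, 5))
b_fwd = rng.normal(scale=0.1, size=5)
y_hat_fwd = forward(X_fwd, W_fwd, b_fwd, rng.normal(size=5), 0.3)
y_hat_zero = forward(X_fwd, W_fwd, b_fwd, np.zeros(5), 0.3)
```

In practice `np.tanh` computes the same function and is numerically safer for large inputs; the explicit form is kept here only to mirror the formula.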
The effects of the present invention are further illustrated by the following specific examples:
355 valid daily observations for XX city from May 19 to May 19, 2019 were acquired, comprising high-frequency daily data on air quality indexes (including the concentrations of other pollutant gases) and weather conditions for the corresponding dates; the predictor variables also include low-frequency monthly data on per-capita gross regional product, per-capita number of industrial enterprises and number of motor vehicles. The high-frequency daily data comprise historical concentrations of SO₂, NO₂, CO, O₃ and other pollutant gases and meteorological data on maximum temperature, precipitation, minimum temperature and wind force. Together these form the full sample of predictor variables.
According to the adaptive principal component analysis, SO₂, NO₂, CO, wind force, per-capita number of industrial enterprises and number of motor vehicles have strong predictive power, while O₃, maximum temperature, precipitation, minimum temperature and per-capita gross regional product have weak predictive power. SO₂, NO₂, CO, wind force, per-capita number of industrial enterprises and number of motor vehicles are therefore used as inputs to ANN-U-MIDAS.
SO₂, NO₂, CO, wind force, per-capita number of industrial enterprises and number of motor vehicles are selected as predictor variables, and the output variable is the daily average PM2.5 concentration. The optimal number of hidden-layer neurons and the maximum lag orders, which govern the frequency alignment of the predictor and output variables, are determined by time-series cross-validation (TSCV). On this basis, the number of hidden-layer neurons is set to 5, the maximum lag order of the high-frequency predictors to 8, and that of the low-frequency predictors to 2. In addition, an iteration limit is imposed on the gradient descent algorithm: the maximum number of iterations does not exceed 108.
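The source does not say which time-series cross-validation scheme was used. A common variant is the expanding window, in which each fold trains on all observations before a block of held-out points, so that no future data leak into training. The sketch below illustrates that variant only; the function name and the fold sizes are assumptions.

```python
import numpy as np

def expanding_window_splits(n_obs, n_folds, test_size):
    """Yield (train_idx, test_idx) pairs for expanding-window validation.

    Each fold trains on every observation before its test block, so the
    training set grows from fold to fold and never includes future data.
    """
    first_train_end = n_obs - n_folds * test_size
    if first_train_end <= 0:
        raise ValueError("not enough observations for the requested folds")
    for k in range(n_folds):
        train_end = first_train_end + k * test_size
        yield np.arange(train_end), np.arange(train_end, train_end + test_size)

# Demo: 355 daily observations, 5 folds of 7 days each (sizes are illustrative).
splits_demo = list(expanding_window_splits(n_obs=355, n_folds=5, test_size=7))
```

Candidate settings, such as the number of hidden neurons or the lag orders, would each be scored by their average out-of-sample error across these folds, and the best-scoring setting kept.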
The detailed solution procedure of the gradient descent algorithm is as follows:
Let $s = 0$, where $s$ denotes the iteration number, and randomly select a set of initial values for all parameters of the neural network: $w$, $b$, $\gamma$, $\gamma_0$;
Compute the error signal $E^{(s)}$ from the input high-frequency explanatory variables, the current parameter values and the given error function:
[Formula SMS_92: definition of the error function]
where $T$ denotes the length of the low-frequency time series, $y_t$ denotes the true value, and $\hat{y}_t^{(s)}$ denotes the predicted value output by the ANN-U-MIDAS model at iteration $s$.
Propagate the error signal $E^{(s)}$ back to the individual neurons, compute its partial derivatives with respect to the parameters $w$, $b$, $\gamma$, $\gamma_0$, namely $\partial E^{(s)}/\partial w$, $\partial E^{(s)}/\partial b$, $\partial E^{(s)}/\partial \gamma$, $\partial E^{(s)}/\partial \gamma_0$, and update the parameters of each layer iteratively according to the iteration formula. The gradient formulas for the parameters are:
[Formulas SMS_105 to SMS_107: gradient expressions for each parameter]
where $\eta$ denotes the learning rate.
Set $s \leftarrow s + 1$ and repeat the above steps until a convergence condition is met, then stop and output the estimates of all parameters. The convergence conditions are: 1) the error signal $E^{(s)}$ reaches a given threshold; 2) the total number of iterations $s$ reaches the given maximum number of iterations.
Following these steps, the procedure terminates with the number of iterations not exceeding 108.
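The procedure above can be sketched as a full-batch gradient descent loop. Because the error function and gradient formulas are not recoverable from the source text, the sketch assumes a half mean-squared-error loss and derives the gradients for a tanh hidden layer with an identity output; the function name, learning rate, tolerance and synthetic data are likewise assumptions.

```python
import numpy as np

def train(X, y, q=5, lr=0.05, max_iter=2000, tol=1e-6, seed=0):
    """Full-batch gradient descent for a tanh-hidden, identity-output network.

    Assumed loss: E = 0.5 * mean((y_hat - y)^2). Stops when E <= tol or
    after max_iter iterations. Returns the parameters and the loss history.
    """
    rng = np.random.default_rng(seed)
    T, n = X.shape
    # Random initial values for all parameters
    W = rng.normal(scale=0.1, size=(n, q))
    b = rng.normal(scale=0.1, size=q)
    gamma = rng.normal(scale=0.1, size=q)
    gamma0 = float(rng.normal(scale=0.1))
    losses = []
    for _ in range(max_iter):
        A = np.tanh(X @ W + b)                 # hidden-layer outputs
        err = A @ gamma + gamma0 - y           # prediction error
        loss = 0.5 * np.mean(err ** 2)         # error signal E
        losses.append(loss)
        if loss <= tol:                        # condition 1: error threshold
            break
        # Back-propagate the error signal
        d_out = err / T
        d_hid = np.outer(d_out, gamma) * (1.0 - A ** 2)   # tanh'(z) = 1 - tanh(z)^2
        g_W, g_b = X.T @ d_hid, d_hid.sum(axis=0)
        g_gamma, g_gamma0 = A.T @ d_out, d_out.sum()
        # Gradient step with learning rate lr
        W -= lr * g_W
        b -= lr * g_b
        gamma -= lr * g_gamma
        gamma0 -= lr * g_gamma0
    return (W, b, gamma, gamma0), losses      # condition 2: max_iter reached

# Synthetic demo data with a nonlinear relationship.
rng_demo = np.random.default_rng(3)
X_train = rng_demo.normal(size=(200, 3))
y_train = np.tanh(X_train[:, 0]) + 0.5 * X_train[:, 1]
params_demo, losses_demo = train(X_train, y_train)
```

The two exits from the loop correspond to the two convergence conditions in the text: the error reaching a threshold, or the iteration count reaching its maximum.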
This model was used to produce forecasts with a step length of 7, giving the predicted daily average PM2.5 concentration in XX city from May 20, 2019 to May 26, 2019, as shown in Table 1.
Table 1: Predicted daily average PM2.5 concentration in XX city
[Table 1 is an image in the source (Figure SMS_111).]
In summary, the PM2.5 concentration prediction method provided in this embodiment first uses adaptive principal component analysis (sPCA) to extract common factors from each category of influencing factors, identifies the factors that help predict the daily average PM2.5 concentration, extracts the useful information, and reduces the dimension of the input layer. It then predicts the daily average PM2.5 concentration with a neural network mixed-frequency data model (ANN-U-MIDAS), which accepts raw mixed-frequency data directly as input to an artificial neural network. This avoids the drawbacks of preprocessing the data to a common frequency, allows nonlinear relationships among variables to be identified in mixed-frequency data analysis, and improves the model's fit and predictive ability.
Based on the same inventive concept, the present embodiment also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the foregoing PM2.5 concentration prediction method based on adaptive principal component analysis and neural network.
The computer readable medium includes, but is not limited to, any type of disk including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks, ROM, RAM, EPROM (Erasable Programmable Read-Only Memory), EEPROMs, flash Memory, magnetic cards, or optical cards. That is, a readable medium includes any medium that stores or transmits information in a form readable by a device (e.g., a computer).
The computer readable storage medium provided in this embodiment has the same inventive concept and the same advantages as the aforementioned method, and is not described here again.
While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that various modifications and adaptations can be made without departing from the principles of the present invention, and such modifications and adaptations are intended to be comprehended within the scope of the present invention.

Claims (7)

1. A PM2.5 concentration prediction method based on adaptive principal component analysis and a neural network, characterized by comprising the following steps:
S1, acquiring historical data on the factors influencing PM2.5 concentration and constructing a total sample of prediction variables;
S2, extracting prediction factors for each type of influencing factor by adaptive principal component analysis and identifying the influencing factors that help predict the PM2.5 daily average concentration, comprising:
S21, forming a set of scaled prediction variables, wherein each scaling coefficient is the slope of a predictive regression on the corresponding standardized prediction variable;
S22, extracting diffusion indexes from the scaled prediction variables as prediction factors of the PM2.5 daily average concentration;
S3, inputting the dimensionality-reduced influencing factors into a neural network mixed-frequency data model and predicting the PM2.5 daily average concentration from the original mixed-frequency data, comprising:
S31, performing frequency alignment on each prediction variable so that it has the same frequency as the output variable;
S32, multiplying all frequency-aligned vectors entering the hidden layer by the hidden layer weights, adding the hidden layer bias, and obtaining the calculation result of each hidden layer node under the action of a sigmoid transfer function;
S33, bringing the calculation results of all hidden layer nodes [Figure QLYQS_1] into the output layer, multiplying the calculation results of all hidden layer nodes [Figure QLYQS_2] by the output layer weight [Figure QLYQS_3], adding the output layer bias [Figure QLYQS_4], and, under the action of the output layer transfer function [Figure QLYQS_5], obtaining the final output result of the PM2.5 daily average concentration:
[Figure QLYQS_6]
wherein [Figure QLYQS_7] is the output layer weight vector, [Figure QLYQS_8] is the output layer bias, and [Figure QLYQS_9] is the output layer transfer function.
2. The method for predicting PM2.5 concentration based on adaptive principal component analysis and neural network of claim 1, wherein the influencing factors in S1 include the pollutant gases SO₂, NO₂, CO and O₃; the maximum temperature, precipitation, minimum temperature and wind force; and the per capita gross regional product, the per capita total number of regional industrial enterprises, and the number of motor vehicles.
3. The method for predicting PM2.5 concentration based on adaptive principal component analysis and neural network of claim 2, wherein a set of scaled prediction variables [Figure QLYQS_10] is formed in S21, the scaling coefficient [Figure QLYQS_11] being the slope of the predictive regression on the i-th standardized prediction variable, so that variables with strong predictive power are extracted to predict the PM2.5 daily average concentration:
[Figure QLYQS_12]
where N is the total number of potential prediction variables, [Figure QLYQS_13] represents the PM2.5 daily average concentration in the regression from time t to time t+1, [Figure QLYQS_14] is the i-th prediction variable at time t, [Figure QLYQS_15] is an error term with a mean of zero, and [Figure QLYQS_16] is the intercept term for the i-th prediction variable.
4. The method for predicting PM2.5 concentration based on adaptive principal component analysis and neural network of claim 3, wherein in S22 the diffusion index is extracted from [Figure QLYQS_17] as the prediction factor of the PM2.5 daily average concentration, the extracted diffusion index taking the form:
[Figure QLYQS_18]
wherein [Figure QLYQS_19] is a K-dimensional vector representing the diffusion indexes of the adaptive principal component analysis, K is determined by the modified [Figure QLYQS_20], [Figure QLYQS_21] denotes the goodness of fit, which measures how well the predicted values fit the true values, [Figure QLYQS_22] is the K-dimensional parameter vector to be estimated, and [Figure QLYQS_23] is a heterogeneous noise term.
5. The method for predicting PM2.5 concentration based on adaptive principal component analysis and neural network of claim 4, wherein a predictive regression is performed on the lagged value of each prediction variable and predictive power is evaluated by the scaling coefficient [Figure QLYQS_24]: a prediction variable with a large scaling coefficient has strong predictive power, one with a small scaling coefficient has weak predictive power, and the prediction variables with strong predictive power are used as input values of the neural network mixed-frequency data model.
6. The method for predicting PM2.5 concentration based on adaptive principal component analysis and neural network of claim 5, wherein the prediction variables with strong predictive power include SO₂, NO₂, CO, wind force, the per capita total number of regional industrial enterprises, and the number of motor vehicles.
7. The method for predicting PM2.5 concentration based on adaptive principal component analysis and neural network of claim 6, wherein in S31 each prediction variable is frequency aligned so that it has the same frequency as the output variable:
[Figure QLYQS_25]
wherein [Figure QLYQS_26] is the high-frequency original input variable, [Figure QLYQS_27] is the low-frequency target output variable, and [Figure QLYQS_28] represents the frequency mismatch between [Figure QLYQS_29] and [Figure QLYQS_30], [Figure QLYQS_31];
in S32, all frequency-aligned vectors entering the hidden layer are multiplied by the hidden layer weights [Figure QLYQS_32], the hidden layer bias [Figure QLYQS_33] is added, and, under the action of the sigmoid transfer function [Figure QLYQS_34], the calculation result [Figure QLYQS_35] of each hidden layer node is obtained:
[Figure QLYQS_36]
wherein [Figure QLYQS_37] is the weight vector of the hidden layer, [Figure QLYQS_38] is the bias vector of the hidden layer, [Figure QLYQS_39] is the high-frequency prediction horizon related to the high-frequency variable, and [Figure QLYQS_40] denotes the sigmoid transfer function, implemented with the hyperbolic tangent function.
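
The steps recited in claims 1 to 7 can be illustrated with a minimal Python/NumPy sketch. It is not the patented implementation: the formula images are not reproduced in this text, so the notation and all function names (`scaled_pca`, `frequency_align`, `forward`) are hypothetical. The sketch further assumes that the number of diffusion indexes K is fixed by hand rather than chosen by the modified goodness-of-fit criterion, that frequency alignment is a plain reshape of a high-frequency series into one row per low-frequency period, that the output layer transfer function is the identity, and that the network weights are random (training is omitted).

```python
import numpy as np


def scaled_pca(X, y, k):
    """S21-S22 sketch: scale each standardized predictor by its predictive
    regression slope, then extract k principal components (diffusion indexes).
    X has shape (T, N), y has shape (T,)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardize every prediction variable (zero mean, unit variance).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # Predictive regression of y[t+1] on predictor i at time t (with intercept);
    # the OLS slope is the scaling coefficient of predictor i.
    x_lag, y_lead = Xs[:-1], y[1:]
    xc = x_lag - x_lag.mean(axis=0)
    yc = y_lead - y_lead.mean()
    beta = (xc * yc[:, None]).sum(axis=0) / (xc ** 2).sum(axis=0)
    # Scaled predictors: strong predictors get large weight, weak ones shrink.
    scaled = Xs * beta
    # Principal components of the scaled predictors via SVD.
    U, s, _ = np.linalg.svd(scaled - scaled.mean(axis=0), full_matrices=False)
    return U[:, :k] * s[:k], beta


def frequency_align(x_high, m):
    """S31 sketch (assumed form): reshape a high-frequency series with m
    observations per low-frequency period into a (T, m) matrix."""
    x_high = np.asarray(x_high)
    T = len(x_high) // m
    return x_high[:T * m].reshape(T, m)


def forward(Z, W_h, b_h, w_o, b_o):
    """S32-S33 sketch: tanh hidden layer, then a linear output layer
    (identity output transfer function is an assumption)."""
    H = np.tanh(Z @ W_h + b_h)   # calculation result of each hidden node
    return H @ w_o + b_o         # predicted PM2.5 daily average concentration


# Synthetic demonstration data: only the first predictor drives y[t+1].
rng = np.random.default_rng(0)
T, N, K, HIDDEN = 120, 5, 2, 4
X = rng.standard_normal((T, N))
y = np.empty(T)
y[0] = 0.0
y[1:] = 2.0 * X[:-1, 0] + 0.1 * rng.standard_normal(T - 1)

factors, beta = scaled_pca(X, y, K)

# Untrained, randomly initialized network weights (training is omitted).
W_h = 0.1 * rng.standard_normal((K, HIDDEN))
b_h = np.zeros(HIDDEN)
w_o = 0.1 * rng.standard_normal(HIDDEN)
b_o = 0.0
prediction = forward(factors, W_h, b_h, w_o, b_o)
```

In this synthetic setup the first predictor receives the largest scaling coefficient in absolute value, which mirrors the screening described in claim 5, where variables with large scaling coefficients are treated as having strong predictive power.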
CN202310695985.3A 2023-06-13 2023-06-13 PM2.5 concentration prediction method based on self-adaptive principal component analysis and neural network Pending CN116431999A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310695985.3A CN116431999A (en) 2023-06-13 2023-06-13 PM2.5 concentration prediction method based on self-adaptive principal component analysis and neural network

Publications (1)

Publication Number Publication Date
CN116431999A true CN116431999A (en) 2023-07-14

Family

ID=87087633

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310695985.3A Pending CN116431999A (en) 2023-06-13 2023-06-13 PM2.5 concentration prediction method based on self-adaptive principal component analysis and neural network

Country Status (1)

Country Link
CN (1) CN116431999A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11080789B1 (en) * 2011-11-14 2021-08-03 Economic Alchemy LLC Methods and systems to quantify and index correlation risk in financial markets and risk management contracts thereon
CN114781538A (en) * 2022-05-07 2022-07-22 东莞理工学院 Air quality prediction method and system of GA-BP neural network coupling decision tree

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
NOURI A: ""Prediction of PM 2.5 Concentrations Using Principal Component Analysis and Artificial Neural Network Techniques: A Case Study: Urmia, Iran"", 《ENVIRONMENTAL ENGINEERING SCIENCE》, pages 1 - 10 *
XU Q: ""An artificial neural network for mixed frequency data"", 《EXPERT SYSTEMS WITH APPLICATIONS》, pages 1 - 13 *
YANGLI GUO: ""Oil price volatility predictability: New evidence from a scaled PCA approach"", 《ENERGY ECONOMICS》, pages 1 - 9 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination