CN116431999A - PM2.5 concentration prediction method based on self-adaptive principal component analysis and neural network - Google Patents
- Publication number: CN116431999A
- Application number: CN202310695985.3A
- Authority: CN (China)
- Prior art keywords: prediction, variable, concentration, neural network, principal component
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06F18/20: Electric digital data processing; Pattern recognition; Analysing
- G06F18/2135: Electric digital data processing; Pattern recognition; Feature extraction, e.g. by transforming the feature space, based on approximation criteria, e.g. principal component analysis
- G06N3/048: Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology; Activation functions
Abstract
The invention provides a PM2.5 concentration prediction method based on adaptive principal component analysis and a neural network, comprising the following steps: acquiring historical data on the factors that influence PM2.5 concentration and constructing a total sample of predictor variables; extracting a predictive factor from each type of influencing factor by adaptive principal component analysis (sPCA) and identifying the influencing factors that help predict the PM2.5 daily average concentration; and inputting the dimension-reduced influencing factors into a neural network mixed-frequency data model (ANN-U-MIDAS), which predicts the PM2.5 daily average concentration from the original mixed-frequency data. The invention uses sPCA to extract useful information and reduce the dimension of the input layer. With ANN-U-MIDAS, the original mixed-frequency data can be fed directly into an artificial neural network to predict the PM2.5 daily average concentration, which avoids the drawbacks of preprocessing the data to a common frequency, allows nonlinear relationships among variables to be identified in mixed-frequency data analysis, and improves the fitting performance and predictive ability of the model.
Description
Technical Field
The invention relates to the technical field of air quality prediction, in particular to a PM2.5 concentration prediction method based on self-adaptive principal component analysis and a neural network.
Background
With economic development and accelerating urbanization, rising emissions of various pollutants have greatly affected the atmospheric environment, and air pollution has become one of the major problems facing society. PM2.5, a main atmospheric pollutant, harms public health, travel, and environmental quality, and because it travels long distances and stays in the air for a long time, it is often used to assess air quality. Accurately predicting the PM2.5 daily average concentration therefore plays a vital role in pollution forecasting, respiratory disease prevention, and atmospheric environmental management.
Current methods for predicting the PM2.5 daily average concentration fall into two main types: traditional statistical methods and deep-learning neural network methods. Early prediction work mainly used traditional statistical methods such as the grey model, linear regression, and principal component regression. Although these models have been applied to pollutant prediction, most are designed for linear data and predict linear data well, whereas the PM2.5 daily average concentration changes nonlinearly and abruptly and forms a relatively complex nonlinear system, so traditional linear statistical methods produce large errors when predicting it. In recent years, as neural networks have proved effective on nonlinear problems, they have increasingly been used as prediction models for pollutant concentration. However, the initial parameter settings of a single prediction model strongly affect its accuracy, and to improve accuracy, hybrid models, that is, models that combine a single neural network model with other methods, have increasingly been proposed. Although hybrid models predict well, their parameters must be determined through repeated experiments, which is time-consuming and laborious, and the parameters so determined are not necessarily optimal.
Disclosure of Invention
The purpose of the invention is as follows: in view of the deficiencies described in the background, a PM2.5 concentration prediction method is provided in which adaptive principal component analysis (sPCA) extracts useful information, identifies the variables that help predict the PM2.5 daily average concentration, and reduces the dimension of the input layer, and in which a neural network mixed-frequency data model (ANN-U-MIDAS) reduces computational complexity, avoids the drawbacks of preprocessing the data to a common frequency, and identifies nonlinear relationships among variables in mixed-frequency data analysis, thereby improving the fitting performance and predictive ability of the model.
To achieve the above object, the present invention provides a PM2.5 concentration prediction method based on adaptive principal component analysis and a neural network, comprising:
S1, acquiring historical data on the factors that influence PM2.5 concentration and constructing a total sample of predictor variables;
S2, extracting a predictive factor from each type of influencing factor by adaptive principal component analysis and identifying the influencing factors that help predict the PM2.5 daily average concentration, comprising:
S21, forming a set of scaled predictor variables, wherein each scaling coefficient is the slope of a predictive regression on the corresponding standardized predictor variable;
S22, extracting a diffusion index from the scaled predictor variables as a predictive factor for the PM2.5 daily average concentration;
S3, inputting the dimension-reduced influencing factors into a neural network mixed-frequency data model and predicting the PM2.5 daily average concentration from the original mixed-frequency data, comprising:
S31, frequency-aligning each predictor variable so that it has the same frequency as the output variable;
S32, multiplying all frequency-aligned vectors entering the hidden layer by the hidden-layer weights, adding the hidden-layer bias, and applying a sigmoid transfer function to obtain the result of each hidden-layer node;
S33, passing the hidden-layer node results $z_t$ to the output layer, where they are multiplied by the output-layer weights $v$, the output-layer bias $c$ is added, and the output-layer transfer function $f$ is applied to obtain the final predicted PM2.5 daily average concentration:
$$\hat{y}_t = f\left(v^\top z_t + c\right)$$
where $v$ is the output-layer weight vector, $c$ is the output-layer bias, and $f$ is the output-layer transfer function.
Further, the influencing factors in S1 include the pollutant gases SO2, NO2, CO, and O3; the maximum temperature, precipitation, minimum temperature, and wind force; and the per-capita gross regional product, the per-capita number of industrial enterprises, and the number of motor vehicles.
Further, in S21 a set of scaled predictor variables $(\beta_1 X_{1,t}, \dots, \beta_N X_{N,t})$ is formed, where the scaling coefficient $\beta_i$ is the slope of a predictive regression on the $i$-th standardized predictor variable, so that variables with strong predictive power are extracted to predict the PM2.5 daily average concentration:
$$y_{t+1} = \alpha_i + \beta_i X_{i,t} + \varepsilon_{t+1}, \quad i = 1, \dots, N$$
where $N$ is the total number of potential predictor variables, $y_{t+1}$ is the PM2.5 daily average concentration regressed from time $t$ to time $t+1$, $X_{i,t}$ is the $i$-th predictor variable at time $t$, $\varepsilon_{t+1}$ is an error term with zero mean, and $\alpha_i$ is the intercept term for the $i$-th predictor variable.
Further, in S22 the diffusion index is extracted from $(\beta_1 X_{1,t}, \dots, \beta_N X_{N,t})$ as a predictive factor for the PM2.5 daily average concentration, in the following form:
$$\beta_i X_{i,t} = \lambda_i^\top f_t + e_{i,t}, \quad i = 1, \dots, N$$
where $f_t$ is a $K$-dimensional vector representing the diffusion index of the adaptive principal component analysis, $K$ is determined by the adjusted $R^2$ (with $R^2$ denoting the goodness of fit, which measures how well the predicted values fit the true values), $\lambda_i$ is the $K$-dimensional parameter to be estimated, and $e_{i,t}$ is an idiosyncratic noise term.
Further, a predictive regression is run on the lagged value of each predictor variable, and its predictive power is assessed by the scaling coefficient $\beta_i$: a large scaling coefficient indicates strong predictive power and a small scaling coefficient indicates weak predictive power. The predictor variables with strong predictive power are used as the input values of the neural network mixed-frequency data model.
Further, the predictor variables with strong predictive power include SO2, NO2, CO, wind force, the per-capita number of industrial enterprises, and the number of motor vehicles.
Further, in S31 each predictor variable is frequency-aligned so that it has the same frequency as the output variable:
$$x_t^{(m)} \;\longrightarrow\; x_t = \left(x^{(m)}_{t-h/m},\; x^{(m)}_{t-h/m-1/m},\; \dots,\; x^{(m)}_{t-h/m-(p-1)/m}\right)^\top$$
where $x_t^{(m)}$ is the high-frequency original input variable, $y_t$ is the low-frequency target output variable, $m$ is the frequency mismatch between $x_t^{(m)}$ and $y_t$, $h$ is the high-frequency prediction horizon, and $p$ is the maximum lag order.
Further, in S32 all frequency-aligned vectors $x_t$ entering the hidden layer are multiplied by the hidden-layer weights $w_j$, the hidden-layer bias $b_j$ is added, and the sigmoid transfer function $g$ is applied to obtain the result $z_{j,t}$ of each hidden-layer node:
$$z_{j,t} = g\left(w_j^\top x_t + b_j\right)$$
where $w_j$ is the weight vector of the hidden layer, $b_j$ is the bias of the hidden layer, $h$ is the high-frequency prediction horizon associated with the high-frequency variable, and $g$ is the sigmoid transfer function, for which the hyperbolic tangent function is used.
The scheme of the invention has the following beneficial effects:
The PM2.5 concentration prediction method based on adaptive principal component analysis and a neural network uses adaptive principal component analysis (sPCA) to extract common factors from each type of influencing factor, identifies the influencing factors that help predict the PM2.5 daily average concentration, extracts useful information, and reduces the dimension of the input layer. The PM2.5 daily average concentration is then predicted with a neural network mixed-frequency data model (ANN-U-MIDAS), which accepts the original mixed-frequency data directly as input to an artificial neural network, avoids the drawbacks of preprocessing the data to a common frequency, identifies nonlinear relationships among variables in mixed-frequency data analysis, and improves the fitting performance and predictive ability of the model.
Other advantageous effects of the present invention will be described in detail in the detailed description section which follows.
Drawings
FIG. 1 is a flow chart of the steps of the present invention.
Detailed Description
Other advantages and effects of the present disclosure will become readily apparent to those skilled in the art from the following disclosure, which describes embodiments of the present disclosure by way of specific examples. It will be apparent that the described embodiments are merely some, but not all embodiments of the present disclosure. The disclosure may be embodied or practiced in other different specific embodiments, and details within the subject specification may be modified or changed from various points of view and applications without departing from the spirit of the disclosure. It should be noted that the following embodiments and features in the embodiments may be combined with each other without conflict. All other embodiments, which can be made by one of ordinary skill in the art without inventive effort, based on the embodiments in this disclosure are intended to be within the scope of this disclosure.
It is noted that various aspects of the embodiments are described below within the scope of the following claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the present disclosure, one skilled in the art will appreciate that one aspect described herein may be implemented independently of any other aspect, and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. In addition, such apparatus may be implemented and/or such methods practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.
It should also be noted that the illustrations provided in the following embodiments merely illustrate the basic concepts of the disclosure by way of illustration, and only the components related to the disclosure are shown in the drawings and are not drawn according to the number, shape and size of the components in actual implementation, and the form, number and proportion of the components in actual implementation may be arbitrarily changed, and the layout of the components may be more complicated. In addition, in the following description, specific details are provided in order to provide a thorough understanding of the examples. However, it will be understood by those skilled in the art that the aspects may be practiced without these specific details.
As shown in FIG. 1, an embodiment of the present invention provides a PM2.5 concentration prediction method based on adaptive principal component analysis and a neural network. It uses high-frequency daily data (the other pollutant gases SO2, NO2, CO, and O3, together with maximum temperature, precipitation, minimum temperature, and wind force) and low-frequency monthly data (per-capita gross regional product, per-capita number of industrial enterprises, and number of motor vehicles) to predict changes in the PM2.5 daily average concentration.
To predict the PM2.5 daily average concentration, influencing factors from many sources and aspects are considered as comprehensively and systematically as possible, which improves the fitting and prediction accuracy for the output variable and makes the analysis results easier to interpret. However, introducing many influencing factors into the model creates a high-dimensional mixed-frequency data analysis problem. If all influencing factors are retained, the model contains many redundant variables and features, which causes multicollinearity, increases the estimation burden, and biases the estimates, thereby reducing the generalization ability and prediction accuracy of the model. The high-dimensional mixed-frequency variables therefore need dimension reduction. Principal component analysis (PCA) is the most widely used dimension-reduction method. It converts a large number of variables into orthogonal components so that the original data can be represented by a small number of principal components. Although PCA is useful for reducing a large number of predictor variables to a small number of combinations, a recognized drawback is that it ignores the prediction target entirely.
For this reason, the present embodiment adopts adaptive principal component analysis (sPCA), which scales each predictor variable according to its power to predict the target, placing more weight on the predictor variables that matter more for the target. In contrast, PCA gives equal weight to all predictor variables. PCA is an unsupervised learning technique: it can condense the information in a large number of predictor variables into a few variables and filter out idiosyncratic noise, but it ignores the prediction target. If one predictor variable is noisier than the others, it inevitably has a disproportionate effect on the component weights. sPCA corrects this defect by placing less weight on noisy predictor variables.
In this embodiment, therefore, sPCA not only filters out most of the idiosyncratic noise contained in individual predictor variables but also extracts the most important common variation from all potential variables. By evaluating each predictor variable on its ability to predict the PM2.5 daily average concentration before the diffusion index is constructed, the variables that help predict the PM2.5 daily average concentration can be identified.
Specifically, in this embodiment, after the historical data on the influencing factors have been obtained and the total sample of predictor variables has been constructed, sPCA extracts the diffusion index in two steps.
First, a set of scaled predictor variables $(\beta_1 X_{1,t}, \dots, \beta_N X_{N,t})$ is formed, where the scaling coefficient $\beta_i$ is the slope of a predictive regression on the $i$-th standardized predictor variable, so that variables with strong predictive power are extracted to predict the PM2.5 daily average concentration:
$$y_{t+1} = \alpha_i + \beta_i X_{i,t} + \varepsilon_{t+1}, \quad i = 1, \dots, N$$
where $N$ is the total number of potential predictor variables, $y_{t+1}$ is the PM2.5 daily average concentration regressed from time $t$ to time $t+1$, $X_{i,t}$ is the $i$-th predictor variable at time $t$, and $\varepsilon_{t+1}$ is an error term with zero mean.
Second, the sPCA diffusion index is extracted from the scaled panel $(\beta_1 X_{1,t}, \dots, \beta_N X_{N,t})$ as a new predictive factor for the PM2.5 daily average concentration. The extracted sPCA diffusion index has the following form:
$$\beta_i X_{i,t} = \lambda_i^\top f_t + e_{i,t}, \quad i = 1, \dots, N$$
where $f_t$ is a $K$-dimensional vector representing the diffusion index of sPCA, $K$ is determined by the adjusted $R^2$, $\lambda_i$ is the $K$-dimensional parameter to be estimated, and $e_{i,t}$ is an idiosyncratic noise term.
With the diffusion index extracted, a predictive regression is run on the lagged value of each predictor variable, and its predictive power is assessed by the scaling coefficient $\beta_i$: a large scaling coefficient means a large weight and strong predictive power, and a small scaling coefficient means a small weight and weak predictive power.
The predictor variables are thus divided into those with weak predictive power and those with strong predictive power, and the variables with strong predictive power are used as the input values of ANN-U-MIDAS. The dimension-reduced influencing factors are input into ANN-U-MIDAS, which models the original mixed-frequency data directly to predict the PM2.5 daily average concentration.
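The two sPCA steps above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the implementation of the embodiment: the function name `spca_factors` is hypothetical, the number of factors is passed in directly rather than selected by the adjusted R², and computing the principal components through a singular value decomposition is an assumption about how the diffusion index would be extracted. Every predictor column is assumed to be non-constant.

```python
import numpy as np

def spca_factors(X, y, n_factors):
    """Hypothetical helper. X: (T, N) predictors, y: (T,) target.

    Returns the N scaling coefficients and a (T, n_factors) factor matrix.
    """
    # Standardize each predictor (assumes no constant columns).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)

    # Step 1: slope of the predictive regression y_{t+1} = a_i + b_i * X_{i,t} + e,
    # computed column by column from centered data.
    x_lag, y_lead = Xs[:-1], y[1:]
    xc = x_lag - x_lag.mean(axis=0)
    yc = y_lead - y_lead.mean()
    betas = (xc * yc[:, None]).sum(axis=0) / (xc ** 2).sum(axis=0)

    # Scale every standardized predictor by its own slope.
    scaled = Xs * betas

    # Step 2: principal components of the scaled panel via SVD.
    scaled_c = scaled - scaled.mean(axis=0)
    U, S, _ = np.linalg.svd(scaled_c, full_matrices=False)
    factors = U[:, :n_factors] * S[:n_factors]
    return betas, factors
```

A predictor that carries little information about the next-period target receives a slope near zero, so it contributes little variance to the scaled panel and therefore little to the leading components.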
Taking a three-layer neural network as an example, it generally comprises an input layer composed of $n$ input neurons, a hidden layer composed of $q$ hidden neurons, and an output layer. The hyperbolic tangent sigmoid is used as the activation function in the hidden layer, and the identity function is used as the transfer function in the output layer. In addition, a standard gradient-based nonlinear optimization algorithm is used to optimize the connection weights and biases.
Let $x_t^{(m)}$ be the high-frequency original input variable and $y_t$ the low-frequency target output variable, and let $m$ denote the frequency mismatch between $x_t^{(m)}$ and $y_t$. To predict the low-frequency variable $y_t$, $x_t^{(m)}$ must be frequency-aligned according to a given maximum lag order $p$. Specifically:
S31, each predictor variable $x_t^{(m)}$ is frequency-aligned to obtain a vector with the same frequency as the output variable $y_t$:
$$x_t^{(m)} \;\longrightarrow\; x_t = \left(x^{(m)}_{t-h/m},\; x^{(m)}_{t-h/m-1/m},\; \dots,\; x^{(m)}_{t-h/m-(p-1)/m}\right)^\top$$
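Frequency alignment as described in S31 can be illustrated with a short NumPy sketch. It assumes exactly `m` high-frequency observations per low-frequency period and stacks the most recent `n_lags` of them, newest first, into one row per low-frequency period. The function name `align_frequency`, the argument names, and the choice to fill rows with NaN when not enough history is available are all assumptions made for illustration.

```python
import numpy as np

def align_frequency(x_high, m, n_lags, h=0):
    """Hypothetical helper: stack high-frequency lags per low-frequency period.

    x_high : 1-D sequence of high-frequency observations
    m      : number of high-frequency observations per low-frequency period
    n_lags : maximum lag order (length of each aligned vector)
    h      : horizon in high-frequency steps (0 uses data up to period end)
    """
    x_high = np.asarray(x_high, dtype=float)
    n_low = len(x_high) // m
    rows = []
    for t in range(n_low):
        newest = (t + 1) * m - 1 - h          # newest usable observation
        idx = newest - np.arange(n_lags)      # newest first, then older lags
        if idx.min() < 0:
            rows.append(np.full(n_lags, np.nan))  # not enough history yet
        else:
            rows.append(x_high[idx])
    return np.array(rows)

# Twelve observations, three per low-frequency period, four lags.
aligned = align_frequency(np.arange(12), m=3, n_lags=4)
print(aligned[1])  # [5. 4. 3. 2.]
```

Each row can then be fed to the network together with the low-frequency target for the same period, so no averaging or interpolation to a common frequency is needed.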
S32, the sigmoid transfer function $g$ is applied to the inner product of the frequency-aligned vector $x_t$ and the hidden-layer weights $w_j$ plus the hidden-layer bias $b_j$ to obtain the $j$-th hidden-layer node $z_{j,t}$:
$$z_{j,t} = g\left(w_j^\top x_t + b_j\right), \quad j = 1, \dots, q$$
where $w_j$ is the weight vector of the hidden layer, $b_j$ is the bias of the hidden layer, $h$ is the high-frequency prediction horizon associated with the high-frequency variable $x_t^{(m)}$, and $g$ is the sigmoid transfer function, for which the hyperbolic tangent function is used, in the following specific form:
$$g(u) = \tanh(u) = \frac{e^{u} - e^{-u}}{e^{u} + e^{-u}}$$
S33, the hidden-layer results $z_t = (z_{1,t}, \dots, z_{q,t})^\top$ are passed to the output layer, which outputs the final predicted PM2.5 daily average concentration:
$$\hat{y}_t = f\left(v^\top z_t + c\right)$$
where $v$ is the output-layer weight vector, $c$ is the output-layer bias, and $f$ is the output-layer transfer function, for which the identity function is typically used.
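Steps S32 and S33 together amount to a forward pass through a single-hidden-layer network with a hyperbolic tangent hidden layer and an identity output. A minimal NumPy sketch follows; the function name `forward` and the array shapes are assumptions chosen for illustration, and the frequency-aligned inputs of all predictors are assumed to be concatenated into one row per time step.

```python
import numpy as np

def forward(x_aligned, W_hidden, b_hidden, w_out, b_out):
    """Hypothetical forward pass.

    x_aligned : (T, d) frequency-aligned inputs, one row per time step
    W_hidden  : (q, d) hidden-layer weights, b_hidden : (q,) hidden biases
    w_out     : (q,) output-layer weights,   b_out    : scalar output bias
    """
    # S32: weighted sum plus bias, passed through tanh, for every hidden node.
    hidden = np.tanh(x_aligned @ W_hidden.T + b_hidden)
    # S33: identity transfer function at the output layer.
    y_hat = hidden @ w_out + b_out
    return hidden, y_hat
```

With the identity output function the network is linear in the hidden-layer results, so all nonlinearity between the predictors and the predicted concentration comes from the hidden layer.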
The effects of the present invention are further illustrated by the following specific examples:
355 groups of valid air quality data for XX city, covering the period from May 19 to May 19, 2019, were acquired, comprising high-frequency daily data on the concentrations of the other pollutant gases and the weather conditions on the corresponding dates; the predictor variables also include low-frequency monthly data on the per-capita gross regional product, the per-capita number of industrial enterprises, and the number of motor vehicles. The high-frequency daily data comprise historical concentration data for the pollutant gases SO2, NO2, CO, and O3 and meteorological data for the maximum temperature, precipitation, minimum temperature, and wind force. Together these form the total sample of predictor variables.
Adaptive principal component analysis shows that SO2, NO2, CO, wind force, the per-capita number of industrial enterprises, and the number of motor vehicles have strong predictive power, whereas O3, the maximum temperature, precipitation, the minimum temperature, and the per-capita gross regional product have weak predictive power. SO2, NO2, CO, wind force, the per-capita number of industrial enterprises, and the number of motor vehicles are therefore used as the input values of ANN-U-MIDAS.
SO2, NO2, CO, wind force, the per-capita number of industrial enterprises, and the number of motor vehicles are selected as the predictor variables, and the output variable is the PM2.5 daily average concentration. The optimal number of hidden-layer neurons and the maximum lag orders used to frequency-align the predictor variables with the output variable are determined by time-series cross-validation (TSCV). According to the calculation, the number of hidden-layer neurons is set to 5, the maximum lag order of the high-frequency predictor variables is 8, and the maximum lag order of the low-frequency predictor variables is 2. In addition, an iteration stopping condition is specified for the gradient descent algorithm: the maximum number of iterations does not exceed 108.
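The text names time-series cross-validation but does not specify the splitting scheme. One common choice, shown here purely as an assumption, is an expanding window: each fold trains on all observations up to a cut-off and validates on the block that follows, so the validation data always lie after the training data in time. The function name `expanding_window_splits` and its parameters are hypothetical.

```python
def expanding_window_splits(n_obs, n_folds, min_train):
    """Yield (train_indices, validation_indices) in time order.

    n_obs     : total number of observations
    n_folds   : number of validation blocks
    min_train : size of the first training window
    """
    fold_size = (n_obs - min_train) // n_folds
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        train_idx = list(range(train_end))
        val_idx = list(range(train_end, train_end + fold_size))
        yield train_idx, val_idx

# Ten observations, three folds, first training window of four.
for train_idx, val_idx in expanding_window_splits(10, 3, 4):
    print(len(train_idx), val_idx)
# 4 [4, 5]
# 6 [6, 7]
# 8 [8, 9]
```

Candidate settings such as the number of hidden neurons or the maximum lag order would then be compared by their average validation error across the folds.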
The detailed solving process of the gradient descent algorithm is as follows:
Let $s = 0$, where $s$ denotes the number of iterations, and randomly select a set of initial values for all parameters of the neural network, $w$, $b$, $v$, and $c$.
Calculate the error signal $E^{(s)}$ from the input high-frequency explanatory variables, the current parameter values, and the given error function, where $T$ is the length of the low-frequency time series, $y_t$ is the true value, and $\hat{y}_t^{(s)}$ is the predicted value output by the ANN-U-MIDAS model at iteration $s$.
Propagate the error signal $E^{(s)}$ back to the individual neurons and compute its partial derivatives with respect to the parameters $w$, $b$, $v$, and $c$: $\partial E^{(s)}/\partial w$, $\partial E^{(s)}/\partial b$, $\partial E^{(s)}/\partial v$, and $\partial E^{(s)}/\partial c$.
Iteratively update the parameters of each layer according to the iteration formula $\theta^{(s+1)} = \theta^{(s)} - \eta \, \partial E^{(s)}/\partial \theta^{(s)}$, where $\theta \in \{w, b, v, c\}$ and $\eta$ is the learning rate.
Set $s \leftarrow s + 1$ and repeat the above steps until a convergence condition is met, then stop and output the estimates of all parameters. The convergence conditions are: 1) the error signal $E^{(s)}$ reaches a given threshold; 2) the total number of iterations $s$ reaches a given maximum number of iterations.
Following the above steps, the maximum number of iterations finally obtained does not exceed 108.
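The iteration described above can be sketched as full-batch gradient descent with backpropagation for the single-hidden-layer network. The sketch makes several assumptions that the text does not fix: the error function is taken to be half the mean squared error, the initial values are small Gaussian draws, and the function name `train_network` and the default learning rate, tolerance, and iteration limit are illustrative only.

```python
import numpy as np

def train_network(x, y, n_hidden=5, lr=0.05, max_iter=2000, tol=1e-6, seed=0):
    """Hypothetical trainer. x: (T, d) aligned inputs, y: (T,) targets."""
    rng = np.random.default_rng(seed)
    T, d = x.shape
    # Randomly chosen initial values for all parameters.
    W = rng.normal(scale=0.1, size=(n_hidden, d))
    b = rng.normal(scale=0.1, size=n_hidden)
    v = rng.normal(scale=0.1, size=n_hidden)
    c = float(rng.normal(scale=0.1))
    losses = []
    for _ in range(max_iter):                 # stop rule 2: iteration limit
        z = np.tanh(x @ W.T + b)              # hidden layer, shape (T, q)
        err = z @ v + c - y                   # prediction minus truth
        loss = 0.5 * np.mean(err ** 2)        # assumed error function
        losses.append(loss)
        if loss < tol:                        # stop rule 1: error threshold
            break
        # Backpropagate the error signal to obtain the partial derivatives.
        g_out = err / T
        grad_v = z.T @ g_out
        grad_c = g_out.sum()
        g_hid = np.outer(g_out, v) * (1.0 - z ** 2)   # tanh'(u) = 1 - tanh(u)^2
        grad_W = g_hid.T @ x
        grad_b = g_hid.sum(axis=0)
        # Gradient step with learning rate lr on every parameter.
        W -= lr * grad_W
        b -= lr * grad_b
        v -= lr * grad_v
        c -= lr * grad_c
    return (W, b, v, c), losses
```

The returned list of losses makes it easy to check that the error signal is decreasing and to see which of the two stopping rules ended the run.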
A 7-step-ahead prediction was performed with this model, giving the predicted PM2.5 daily average concentrations in XX city from May 20, 2019 to May 26, 2019 shown in Table 1.
Table 1: prediction result of PM2.5 daily average concentration in XX City
In summary, the PM2.5 concentration prediction method provided in this embodiment first uses adaptive principal component analysis (sPCA) to extract common factors from each type of influencing factor, identifies the influencing factors that help predict the PM2.5 daily average concentration, extracts useful information, and reduces the dimension of the input layer. It then predicts the PM2.5 daily average concentration with a neural network mixed-frequency data model (ANN-U-MIDAS), which accepts the original mixed-frequency data directly as input to an artificial neural network, avoids the drawbacks of preprocessing the data to a common frequency, identifies nonlinear relationships among variables in mixed-frequency data analysis, and improves the fitting performance and predictive ability of the model.
Based on the same inventive concept, the present embodiment also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the foregoing PM2.5 concentration prediction method based on adaptive principal component analysis and neural network.
The computer readable medium includes, but is not limited to, any type of disk including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks, ROM, RAM, EPROM (Erasable Programmable Read-Only Memory), EEPROMs, flash Memory, magnetic cards, or optical cards. That is, a readable medium includes any medium that stores or transmits information in a form readable by a device (e.g., a computer).
The computer readable storage medium provided in this embodiment has the same inventive concept and the same advantages as the aforementioned method, and is not described here again.
While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that various modifications and adaptations can be made without departing from the principles of the present invention, and such modifications and adaptations are intended to be comprehended within the scope of the present invention.
Claims (7)
1. A PM2.5 concentration prediction method based on adaptive principal component analysis and a neural network, characterized by comprising the following steps:
S1, acquiring historical data on the influencing factors of PM2.5 concentration and constructing a total sample of predictive variables;
S2, extracting predictive factors from each type of influencing factor by adaptive principal component analysis, and identifying the influencing factors that help predict the PM2.5 daily average concentration, comprising:
S21, forming a set of scaled predictive variables, wherein each scaling factor is the slope of a predictive regression on the corresponding standardized predictive variable;
S22, extracting diffusion indices from the scaled predictive variables as predictive factors of the PM2.5 daily average concentration;
S3, inputting the dimension-reduced influencing factors into a neural network mixed-frequency data model, and predicting the PM2.5 daily average concentration from the original mixed-frequency data, comprising:
S31, performing frequency alignment on each predictive variable so that it has the same frequency as the output variable;
S32, multiplying all frequency-aligned vectors entering the hidden layer by the hidden layer weights, adding the hidden layer bias, and obtaining the calculation result of each hidden layer node under the action of a sigmoid transfer function;
S33, passing the calculation results $h_{j,t}$ of all hidden layer nodes into the output layer, multiplying them by the output layer weights $w_j^{o}$, adding the output layer bias $b^{o}$, and obtaining the final output result for the PM2.5 daily average concentration under the action of the output layer transfer function $g$:
$$\hat{y}_t = g\Big(\sum_{j=1}^{J} w_j^{o}\, h_{j,t} + b^{o}\Big)$$
where $J$ is the number of hidden layer nodes.
2. The method for predicting PM2.5 concentration based on adaptive principal component analysis and neural network of claim 1, wherein the influencing factors in S1 include the pollutant gases SO2, NO2, CO and O3; the maximum temperature, precipitation, minimum temperature and wind force; and the per capita gross regional product, the per capita total number of regional industrial enterprises, and the number of motor vehicles.
3. The method for predicting PM2.5 concentration based on adaptive principal component analysis and neural network of claim 2, wherein in S21 a set of scaled predictive variables $(\beta_1 X_{1,t}, \ldots, \beta_N X_{N,t})$ is formed, the scaling factor $\beta_i$ being the slope of the predictive regression on the i-th standardized predictive variable, so that variables with strong predictive power are extracted to predict the PM2.5 daily average concentration:
$$y_{t+1} = \alpha_i + \beta_i X_{i,t} + \varepsilon_{t+1}, \quad i = 1, \ldots, N$$
where $N$ is the total number of potential predictive variables, $y_{t+1}$ is the PM2.5 daily average concentration at time t+1, regressed on the predictive variable at time t, $X_{i,t}$ is the i-th predictive variable at time t, $\varepsilon_{t+1}$ is an error term with mean zero, and $\alpha_i$ is the intercept term of the i-th predictive variable.
4. The method for predicting PM2.5 concentration based on adaptive principal component analysis and neural network as claimed in claim 3, wherein in S22 diffusion indices are extracted from the scaled predictive variables $\beta_i X_{i,t}$ (the i-th predictive variable multiplied by its scaling factor) as predictive factors of the PM2.5 daily average concentration, in the following form:
$$\beta_i X_{i,t} = \lambda_i^{\top} F_t + e_{i,t}$$
where $F_t$ is a K-dimensional vector of the diffusion indices of the adaptive principal component analysis, $K$ is determined by the adjusted $R^2$, $R^2$ denotes the goodness of fit, measuring how well the predicted values fit the true values, $\lambda_i$ is the K-dimensional parameter vector to be estimated, and $e_{i,t}$ is a heterogeneous noise term.
5. The method for predicting PM2.5 concentration based on adaptive principal component analysis and neural network according to claim 4, wherein
a predictive regression is performed on the lagged value of each predictive variable, and the predictive power of each variable is evaluated by its scaling coefficient: a variable with a large scaling coefficient has strong predictive power, a variable with a small scaling coefficient has weak predictive power, and the predictive variables with strong predictive power are used as the input values of the neural network mixed-frequency data model.
6. The method for predicting PM2.5 concentration based on adaptive principal component analysis and neural network as claimed in claim 5, wherein the predictive variables with strong predictive power include SO2, NO2, CO, wind force, the per capita total number of regional industrial enterprises, and the number of motor vehicles.
7. The method for predicting PM2.5 concentration based on adaptive principal component analysis and neural network of claim 6, wherein in S31 each predictive variable is frequency-aligned so that it has the same frequency as the output variable:
$$x_t^{(H)} \;\longrightarrow\; \mathbf{x}_t = \big(x_{t}^{(H)},\, x_{t-1/m}^{(H)},\, \ldots,\, x_{t-(m-1)/m}^{(H)}\big)^{\top}$$
where $x^{(H)}$ is the high-frequency original input variable, $y^{(L)}$ is the low-frequency target output variable, and $m$ denotes the frequency mismatch between $x^{(H)}$ and $y^{(L)}$, that is, the number of high-frequency observations in one low-frequency period;
in S32, all frequency-aligned vectors entering the hidden layer are multiplied by the hidden layer weights $w_j^{h}$, the hidden layer bias $b_j^{h}$ is added, and under the action of the sigmoid transfer function $f$ the calculation result of each hidden layer node is obtained:
$$h_{j,t} = f\big(w_j^{h\top}\mathbf{x}_t + b_j^{h}\big)$$
where
$$f(z) = \frac{1}{1 + e^{-z}}.$$
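Steps S31 to S33 of the claims can be illustrated with a short forward-pass sketch. It is a hedged illustration only: the function names `frequency_align` and `forward`, the array shapes, the most-recent-first ordering of the aligned observations, and the identity output transfer function are all assumptions, since the text does not fix them. Training of the weights is not shown.

```python
import numpy as np

def frequency_align(x_high, m):
    """Stack the m high-frequency observations of each low-frequency
    period into one row (hypothetical helper, most recent value first)."""
    x_high = np.asarray(x_high, dtype=float)
    if x_high.size % m != 0:
        raise ValueError("length of x_high must be a multiple of m")
    return x_high.reshape(-1, m)[:, ::-1]

def sigmoid(z):
    """Sigmoid transfer function used by the hidden layer."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(aligned, W_hidden, b_hidden, w_out, b_out,
            out_transfer=lambda z: z):
    """One forward pass of a single-hidden-layer network.

    aligned:  (T, m) frequency-aligned inputs
    W_hidden: (m, J) hidden layer weights, b_hidden: (J,) hidden bias
    w_out:    (J,) output layer weights,  b_out: scalar output bias
    out_transfer: output transfer function (identity assumed here)
    """
    hidden = sigmoid(aligned @ W_hidden + b_hidden)  # S32: (T, J)
    return out_transfer(hidden @ w_out + b_out)      # S33: (T,)
```

For example, `frequency_align(np.arange(6), 3)` turns six high-frequency values into two rows of three, one row per low-frequency period, which can then be passed to `forward` together with the layer weights.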
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310695985.3A CN116431999A (en) | 2023-06-13 | 2023-06-13 | PM2.5 concentration prediction method based on self-adaptive principal component analysis and neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116431999A (en) | 2023-07-14 |
Family
ID=87087633
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310695985.3A Pending CN116431999A (en) | 2023-06-13 | 2023-06-13 | PM2.5 concentration prediction method based on self-adaptive principal component analysis and neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116431999A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11080789B1 (en) * | 2011-11-14 | 2021-08-03 | Economic Alchemy LLC | Methods and systems to quantify and index correlation risk in financial markets and risk management contracts thereon |
CN114781538A (en) * | 2022-05-07 | 2022-07-22 | 东莞理工学院 | Air quality prediction method and system of GA-BP neural network coupling decision tree |
Non-Patent Citations (3)
Title |
---|
NOURI A: "Prediction of PM 2.5 Concentrations Using Principal Component Analysis and Artificial Neural Network Techniques: A Case Study: Urmia, Iran", 《ENVIRONMENTAL ENGINEERING SCIENCE》, pages 1 - 10 * |
XU Q: "An artificial neural network for mixed frequency data", 《EXPERT SYSTEMS WITH APPLICATIONS》, pages 1 - 13 * |
YANGLI GUO: "Oil price volatility predictability: New evidence from a scaled PCA approach", 《ENERGY ECONOMICS》, pages 1 - 9 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||