CN108491953B - PM2.5 prediction and early warning method and system based on nonlinear theory - Google Patents

PM2.5 prediction and early warning method and system based on nonlinear theory

Info

Publication number
CN108491953B
CN108491953B (application CN201810095420.0A)
Authority
CN
China
Prior art keywords
prediction
train
training
model
predict
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810095420.0A
Other languages
Chinese (zh)
Other versions
CN108491953A (en)
Inventor
尹建光
彭飞
谢连科
臧玉魏
马新刚
韩悦
刘辉
王坤
巩泉泉
窦丹丹
张国英
李方伟
李佳煜
郭本祥
闫文晶
崔翔宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
Electric Power Research Institute of State Grid Jilin Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN201810095420.0A priority Critical patent/CN108491953B/en
Publication of CN108491953A publication Critical patent/CN108491953A/en
Application granted granted Critical
Publication of CN108491953B publication Critical patent/CN108491953B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/18: Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Operations Research (AREA)
  • Human Resources & Organizations (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Economics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Evolutionary Biology (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Algebra (AREA)
  • Development Economics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a PM2.5 prediction and early warning method and system based on a nonlinear theory. The method comprises a model training step and a model prediction step: the PM2.5 concentration time-series data are divided into two groups, used respectively as a training time-series data set and a test time-series data set; S-level wavelet decomposition and time-frequency analysis are performed on the training set, expanding the one-dimensional series into high-dimensional information and extracting the implicit information of the PM2.5 historical data to obtain a training time-series index data set; a prediction model is then constructed and trained; MLRC-LSSVR model prediction is performed on the test set, and variance analysis of the prediction result yields the upper bound of a confidence interval as the final prediction. The invention provides adjustable model parameters, and by changing these parameters the method adapts to PM2.5 concentration prediction and early warning in different regions.

Description

PM2.5 prediction and early warning method and system based on nonlinear theory
Technical Field
The invention relates to the field of air quality prediction and early warning, in particular to a PM2.5 prediction and early warning method and system based on a nonlinear theory.
Background
The main component of haze is PM2.5, i.e., particulate matter with an aerodynamic diameter of 2.5 μm or less, which forms a colloidal mixture in air. The factors influencing PM2.5 are complex, and its concentration varies nonlinearly.
At present, atmospheric pollutant concentration prediction methods fall mainly into two categories: statistical models and deterministic (numerical) models. A statistical model is generally a correlation model between air quality and its influencing factors established from historical data; it demands relatively little input data, but its prediction accuracy is low, it can hardly reflect regional air quality, and it cannot give reasonable explanations of pollution causes and sources. A numerical model, based on atmospheric dynamics at different scales, couples the atmospheric physical and chemical processes into a multi-scale pollutant diffusion model and forecasts the trend and dynamic spatial distribution of pollutant concentrations by computer.
In view of the high cost of numerical prediction, its many uncertain factors, and its demanding modeling process and data requirements, much research has turned to statistical models as the main means of predicting atmospheric pollutant concentrations, in particular to improvements of single-site statistical prediction. Many researchers have combined traditional statistical methods with neural network models, autoregressive moving average models and multiple linear regression models to obtain more satisfactory prediction results.
From a methodological point of view, the autoregressive moving average model and the multiple linear regression model are linear methods and have difficulty accurately capturing some nonlinear relationships, as several case studies have shown. As a nonlinear mapping method, the neural network model, through multilayer perception, predicts fine particulate matter concentrations well. However, neural networks generally learn slowly, their parameters are hard to set, they easily fall into local optima, their generalization ability is poor, and their prediction efficiency is low. The support vector machine (SVM) overcomes the drawbacks of long training time, poor generalization and susceptibility to local minima of neural networks. Its single-step prediction performance is good, but in multi-step prediction each step requires the output of the previous prediction as its input; in this iteration the earlier prediction affects the prediction at the next time point, errors accumulate step by step, and the prediction quality gradually deteriorates.
In summary, the prior art lacks an effective solution to the PM2.5 prediction problem.
Disclosure of Invention
To overcome the deficiencies of the prior art, the invention provides a PM2.5 prediction and early warning method based on a nonlinear theory; the method exposes adjustable model parameters and, by changing them, adapts to PM2.5 concentration prediction and early warning in different regions.
A PM2.5 prediction and early warning method based on a nonlinear theory comprises the following steps:
a model training step and a model prediction step;
dividing the PM2.5 concentration time-series data into two groups, used respectively as a training time-series data set and a test time-series data set;
performing S-level wavelet decomposition and time-frequency analysis on the data of the training time-series data set, expanding the one-dimensional information into high-dimensional information and extracting the implicit information of the PM2.5 historical data to obtain a training time-series index data set;
then constructing a prediction model of nonlinear least squares support vector regression based on multi-stage residual correction (MLRC-LSSVR);
training the MLRC-LSSVR model;
and performing MLRC-LSSVR model prediction on the test time-series data set, and performing variance analysis on the model prediction result to obtain the upper bound of a confidence interval as the final prediction result.
Further, the adjustable parameters of the prediction model are: the number of wavelet decomposition levels s and the regression parameters of the least squares support vector machine, including the kernel function parameters and the regularization parameter γ; these can be obtained by optimization methods such as a genetic algorithm.
Further, the multi-stage residual correction based nonlinear least squares support vector regression (MLRC-LSSVR) prediction model is described as follows:
training input: a training data set (X_train, Y_train) ∈ R^((n-1)×2), where X_train = [x_1, x_2, ..., x_(n-1)]^T, Y_train = [x_2, x_3, ..., x_n]^T, and x_i is the i-th value of the PM2.5 concentration time series;
prediction output: the predicted concentration ŷ_(n+1) of the PM2.5 pollutant at time n+1.
Further, the model training step comprises:
Step 1: apply a coifN wavelet transform to X_train in the training data set to obtain the m-level high-dimensional input training matrix X'_train = {X'_train,1, X'_train,2, ..., X'_train,n-1}, where X'_train,i = (A_m,i, D_1,i, ..., D_m,i), i = 1, 2, ..., n-1, A and D denoting the wavelet approximation and detail components, and construct the LSSVR model training data set (X'_train, Y_train) ∈ R^((n-1)×(m+2));
Step 2: train an LSSVR model on the training data set (X'_train, Y_train); during training, the Simplex method, chosen for its higher search efficiency, and 10-fold cross validation are used to optimally search the key Gaussian-kernel parameters of the LSSVR, giving the LSSVR training output Y'_train;
Step 3: calculate the R² correlation coefficient R²(Y'_train, Y_train) between the LSSVR training output Y'_train and Y_train;
Step 4: if R²(Y'_train, Y_train) is less than a predetermined R² correlation-coefficient threshold, calculate the training residual vector, construct the residual training data set (X'_train, Y_train = Y_train - Y'_train), and repeat Step 2 and Step 3 until the model satisfies the R² threshold; the MLRC-LSSVR prediction model is thus constructed, and the additional k-1 LSSVR residual prediction models realize online synchronous correction of the prediction residual, where k is the number of MLRC-LSSVR prediction model stages.
Further, the working steps of the model prediction process are described as follows:
Step 1: reconstruct the prediction data set at time n, X_predict = {X_train, x_n}, x_n being the PM2.5 value at time n; apply coifN wavelet decomposition to X_predict to obtain the high-dimensional input prediction vector at time n, X'_predict = (A_m,predict, D_1,predict, ..., D_m,predict);
Step 2: feed the high-dimensional prediction vector X'_predict into the MLRC-LSSVR prediction model to obtain the MLRC-LSSVR multi-stage prediction output {Y'_predict, RC_1,predict, ..., RC_k-1,predict} and hence Y_predict = Y'_predict + Σ_{j=1}^{k-1} RC_j,predict, where RC_j,predict is the prediction output of the j-th LSSVR residual prediction model;
Step 3: based on the central limit theorem, apply linear smoothing and bias correction and perform variance estimation on the residuals (RC_k-1,train, RC_k-1,predict) to obtain the corresponding upper prediction confidence bound YP_predict = Y_predict + RCP_k-1,predict, where RCP_k-1,predict is the 97%-confidence variance estimate of the (k-1)-level residual;
repeating the model prediction process of Steps 1-3 realizes online prediction of the PM2.5 concentration and estimation of its upper confidence limit.
In addition, as the PM2.5 concentration time series is continuously updated, and in order to eliminate redundant long-term historical steady-state bias information, the constructed MLRC-LSSVR prediction model can periodically repeat the training process on the data accumulated over each time interval, improving the effectiveness of online prediction.
A PM2.5 prediction and early warning system based on a nonlinear theory comprises:
the data processing unit is used for dividing the PM2.5 concentration time-series data into a training time-series data set and a test time-series data set;
the wavelet decomposition unit is used for performing S-level wavelet decomposition and time-frequency analysis on the data of the training time-series data set, expanding the one-dimensional information into high-dimensional information and extracting the implicit information of the PM2.5 historical data to obtain a training time-series index data set;
the support vector regression prediction unit is used for constructing a prediction model of nonlinear least squares support vector regression based on multi-stage residual correction (MLRC-LSSVR), training the MLRC-LSSVR model, performing MLRC-LSSVR model prediction on the test time-series data set, and performing variance analysis on the model prediction result to obtain the upper bound of a confidence interval as the final prediction result.
Compared with the prior art, the invention has the following beneficial effects:
the proposed multi-stage residual correction avoids the cumulative effect of errors and improves prediction accuracy; performing variance analysis on the prediction result addresses the uncertainty of the prediction; and the model exposes adjustable parameters, so that by changing these parameters the method adapts to PM2.5 concentration prediction and early warning in different regions.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
FIG. 1 is a data processing flow diagram of the present invention;
fig. 2 is a schematic diagram of wavelet decomposition.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As noted in the background, the prior art produces inaccurate PM2.5 predictions; to solve this technical problem, the present application provides a PM2.5 prediction and early warning method based on a nonlinear theory.
In a typical embodiment of the present application, as shown in FIG. 1, a PM2.5 prediction and early warning method based on a nonlinear theory is provided; its specific steps are as follows:
Step 1: for the PM2.5 time-series data, perform time-frequency analysis using wavelet decomposition, expand the one-dimensional information into high-dimensional information, and extract the implicit information of the PM2.5 historical data (trend, randomness, periodicity, and the like).
Step 2: construct a nonlinear least squares support vector regression prediction model based on self-adaptive multi-stage residual correction (AMLRC-LSSVR); this step comprises parameter optimization and regression prediction, the specific operations of which are detailed in the AMLRC-LSSVR description below.
Step 3: perform variance analysis on the model prediction result to obtain the upper bound of the confidence interval as the final prediction result.
The adjustable parameters can be tuned through the parameter optimization unit, which improves the general adaptability of the model to different regions. The adjustable parameters of the model are: the number of wavelet decomposition levels s, chosen mainly by experience so that the approximation component A is a smoothed version of the trend; and the regression parameters of the least squares support vector machine, comprising the kernel function parameters and the regularization parameter γ.
When selecting a kernel function for a practical problem, two approaches are common: first, pre-select a kernel function using expert prior knowledge; second, use cross validation, i.e., try different kernel functions and take the one with the smallest generalization error as the best kernel, the minimum generalization error serving as the selection criterion. The detailed operation is given in the description of the specific training process.
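For illustration only, and not as part of the claimed method, the following minimal Python sketch shows the cross-validation approach to kernel selection described above. It uses scikit-learn's KernelRidge as a convenient stand-in for LSSVR (the two formulations are closely related), and the candidate kernels, grid values and toy PM2.5 series are illustrative assumptions rather than settings taken from this disclosure.

```python
# Cross-validation kernel selection sketch (illustrative only).
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
pm25 = 60 + 25 * np.sin(np.arange(400) / 24) + rng.normal(0, 5, 400)  # toy series

# one-step-ahead training pairs: x_i -> x_{i+1}
X = pm25[:-1].reshape(-1, 1)
y = pm25[1:]

param_grid = [
    {"kernel": ["rbf"], "gamma": [1e-4, 1e-3, 1e-2], "alpha": [1e-3, 1e-2, 1e-1]},
    {"kernel": ["polynomial"], "degree": [2, 3], "alpha": [1e-3, 1e-2, 1e-1]},
]
search = GridSearchCV(KernelRidge(), param_grid, cv=10,
                      scoring="neg_mean_squared_error")
search.fit(X, y)
print("selected kernel and parameters:", search.best_params_)
```

The kernel and parameter combination with the smallest cross-validated error plays the role of the "minimum generalization error" criterion mentioned above.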
(1) Wavelet decomposition and feature extraction
Wavelet decomposition represents a signal by scaling and translating oscillatory waveforms of finite length or fast decay, and extracts information from the signal (the study data) through a transform that is local in both time and frequency, greatly extending the applicability of the Fourier transform. A family of wavelet basis functions is generated from a mother wavelet that oscillates and decays rapidly to zero:
ψ_a,τ(x) = (1/√a) ψ((x - τ)/a)    (1)
where ψ_a,τ(x) is the wavelet basis function, x is the PM2.5 time-series data, τ is the translation parameter and a is the scale parameter.
In practical engineering applications, because computers sample discretely, the discrete wavelet transform is mostly used; it gives the transform WT_f(p, q) of a signal f(x) and the corresponding reconstruction formula:
WT_f(p, q) = 2^(-p/2) ∫ f(x) ψ*(2^(-p) x - q) dx    (2)
f(x) = C Σ_p Σ_q WT_f(p, q) ψ_p,q(x)    (3)
where p and q are the scale factor and translation factor, respectively; ψ*(x) is the complex conjugate of ψ(x); and C is a constant independent of the signal.
For an understanding of wavelet analysis, a three-level decomposition of a signal S is illustrated in FIG. 2.
In signal analysis, different wavelet basis functions give noticeably different results, so a reasonable wavelet basis must be selected to obtain high-accuracy predictions. There is currently no definitive engineering standard for choosing the wavelet basis; wavelets are chosen by experience or according to the purpose of the signal processing, generally trading off support length, vanishing moments and regularity. Considering that wavelet decomposition is applied here to feature extraction and prediction of the PM2.5 concentration time series, where real-time performance and time-frequency localization matter, a comprehensive comparison of wavelet-basis properties shows a clear advantage for the coifN wavelet: for a given number of vanishing moments it decomposes the original signal effectively with fewer levels, its support and filter lengths are shorter, and the wavelet decomposition requires less computation, so it meets the signal-processing requirements while reducing computation and improving online prediction efficiency.
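For illustration only, the following sketch shows one way to implement the coifN wavelet feature extraction described above with the PyWavelets library: the series is decomposed to m levels and each component is reconstructed back to the original length, so that time i is represented by the vector (A_m,i, D_1,i, ..., D_m,i). The wavelet order ("coif3"), the level m = 3, the windowing-free whole-series decomposition and the toy series are illustrative assumptions.

```python
# coifN wavelet feature extraction sketch (illustrative only).
import numpy as np
import pywt

def wavelet_features(series, wavelet="coif3", level=3):
    """Return an array of shape (len(series), level + 1) whose columns are the
    approximation A_level and the details D_level ... D_1, each reconstructed
    to the length of the original series."""
    coeffs = pywt.wavedec(series, wavelet, level=level)
    components = []
    for k in range(len(coeffs)):
        # keep only the k-th coefficient band, zero the rest, and reconstruct
        kept = [c if i == k else np.zeros_like(c) for i, c in enumerate(coeffs)]
        components.append(pywt.waverec(kept, wavelet)[: len(series)])
    return np.column_stack(components)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pm25 = 60 + 25 * np.sin(np.arange(256) / 24) + rng.normal(0, 5, 256)
    X_high_dim = wavelet_features(pm25, wavelet="coif3", level=3)
    print(X_high_dim.shape)   # (256, 4): columns A_3, D_3, D_2, D_1
```

Each row of the resulting matrix is the high-dimensional representation of one time step, which is what the LSSVR stage below takes as input.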
(2) Least Squares Support Vector Regression (LSSVR)
Least squares support vector regression (LSSVR) is a modeling method based on statistical learning theory; it trains fast, generalizes well and has a strong ability to fit nonlinear functions. LSSVR is an important branch of support vector regression (SVR): like SVR, it solves a convex quadratic optimization problem with a globally unique solution, maps the input space to a high-dimensional feature space through a nonlinear mapping φ(x), and finds the optimal regression function in that feature space.
LSSVR is a variant of the SVR algorithm: Suykens converted the inequality constraints into equality constraints and the loss function from a sum of errors into a sum of squared errors, so the solution procedure changes from a convex quadratic optimization problem into solving a system of linear equations, and the number of variables to be solved drops from 2n+1 to n+1, n being the number of training samples; LSSVR is therefore easier to solve and faster to train than SVR. Let the training data set be {(x_i, y_i)}, i = 1, ..., n, with input x_i ∈ R^d and output y_i ∈ R. Then LSSVR can be expressed as:
min_{w,b,e} J(w, e) = (1/2) w^T w + (γ/2) Σ_{i=1}^{n} e_i²    (4)
s.t. y_i = w^T φ(x_i) + b + e_i,  i = 1, ..., n    (5)
where φ(x) is the nonlinear mapping from the input space to the high-dimensional feature space; w is the weight vector characterizing the complexity of the model; e = [e_1, e_2, ..., e_n]^T is the error vector; and γ ∈ R⁺ is the regularization parameter.
To solve this constrained optimization problem, the Lagrange function and dual optimization are introduced, converting it into the unconstrained optimization problem of equation (6).
L(w, b, e, α) = J(w, e) - Σ_{i=1}^{n} α_i [w^T φ(x_i) + b + e_i - y_i]    (6)
where α = [α_1, ..., α_n] is the vector of Lagrange multipliers. Taking the partial derivatives of L with respect to w, b, e_i and α_i and setting them to zero eliminates w and e_i, giving the following system of equations:
| 0   L^T        | | b |   | 0 |
| L   K + I_n/γ  | | α | = | y |    (7)
where y = [y_1, ..., y_n]^T; α = [α_1, ..., α_n]^T; L = [1, ..., 1]^T is an n×1 vector; I_n is the n×n identity matrix; K_ij = κ(x_i, x_j) = φ(x_i)^T φ(x_j), i, j = 1, ..., n; and κ(x_i, x_j) is the kernel function. The kernel parameters are optimized with a genetic algorithm to obtain the best result.
According to the algorithm given by Suykens, the LSSVR model prediction function is finally obtained as:
y(x) = Σ_{i=1}^{n} α_i κ(x, x_i) + b    (8)
where the Lagrange multipliers α_i and the bias b are constants obtained by the statistical regression on the PM2.5 time-series data.
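For illustration only, the following NumPy sketch implements LSSVR exactly as written in equations (7) and (8): the (n+1)×(n+1) linear system is solved for the bias b and the multipliers α, and predictions use the kernel expansion. The Gaussian kernel width σ and the regularization parameter γ are illustrative defaults, not values from this disclosure.

```python
# LSSVR via equations (7)-(8) (illustrative only).
import numpy as np

def rbf_kernel(A, B, sigma=20.0):
    # Gaussian kernel matrix K_ij = exp(-||a_i - b_j||^2 / (2 sigma^2))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvr_fit(X, y, gamma=10.0, sigma=20.0):
    # Solve the linear system of equation (7) for the bias b and multipliers alpha.
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                      # top row    [0, L^T]
    A[1:, 0] = 1.0                      # left column L
    A[1:, 1:] = K + np.eye(n) / gamma   # K + I_n / gamma
    rhs = np.concatenate(([0.0], np.asarray(y, dtype=float)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]              # alpha, b

def lssvr_predict(X_train, alpha, b, X_new, sigma=20.0):
    # Equation (8): y(x) = sum_i alpha_i * kappa(x, x_i) + b
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b
```

These two helpers are reused in the multi-stage training and prediction sketches further below.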
(3) Constructing a nonlinear least squares support vector regression (AMLRC-LSSVR) prediction model based on multi-stage residual correction
The multi-stage residual correction based nonlinear least squares support vector regression (MLRC-LSSVR) prediction model can be described as follows:
training input: a training data set (X_train, Y_train) ∈ R^((n-1)×2), where X_train = [x_1, x_2, ..., x_(n-1)]^T, Y_train = [x_2, x_3, ..., x_n]^T, and x_i is the i-th value of the PM2.5 time series;
prediction output: the predicted concentration ŷ_(n+1) of the PM2.5 pollutant at time n+1.
The working principle of the method mainly comprises a model training process and a model prediction process.
The working steps of the model training process are described as follows:
Step 1: apply a coifN wavelet transform to X_train in the training data set to obtain the m-level high-dimensional input training matrix X'_train = {X'_train,1, X'_train,2, ..., X'_train,n-1}, where X'_train,i = (A_m,i, D_1,i, ..., D_m,i) is the wavelet-decomposed representation of the i-th PM2.5 time-series value, A and D being the wavelet approximation and detail components, i = 1, 2, ..., n-1; construct the LSSVR model training data set (X'_train, Y_train) ∈ R^((n-1)×(m+2)).
Step 2: train an LSSVR model on the training data set (X'_train, Y_train); during training, the Simplex method, chosen for its higher search efficiency, and 10-fold cross validation are used to optimally search the key Gaussian-kernel parameters of the LSSVR, giving the LSSVR training output Y'_train.
Step 3: calculate the R² correlation coefficient R²(Y'_train, Y_train) between the LSSVR training output Y'_train and Y_train.
Step 4: if R²(Y'_train, Y_train) is less than the predetermined R² correlation-coefficient threshold, calculate the training residual vector, construct the residual training data set (X'_train, Y_train = Y_train - Y'_train), and repeat Step 2 and Step 3 until the model satisfies the R² threshold; the MLRC-LSSVR prediction model is thus constructed, and the additional k-1 LSSVR residual prediction models realize online synchronous correction of the prediction residual, where k is the number of MLRC-LSSVR prediction model stages (an illustrative sketch of this training loop is given below).
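For illustration only, the following sketch implements the multi-stage residual-correction training loop of Steps 1-4, reusing wavelet_features() and lssvr_fit()/lssvr_predict() from the earlier sketches. The R² threshold, kernel width, regularization parameter and the cap on the number of stages are illustrative, and the Simplex (Nelder-Mead) search with 10-fold cross validation for the kernel parameters is omitted here for brevity.

```python
# MLRC-LSSVR training loop sketch (illustrative only; assumes the helpers
# wavelet_features, lssvr_fit and lssvr_predict defined above are in scope).
import numpy as np

def r2_score(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def train_mlrc_lssvr(pm25, level=3, r2_threshold=0.95, max_stages=5,
                     gamma=10.0, sigma=20.0):
    # Step 1: wavelet features of x_1..x_{n-1}; targets are x_2..x_n
    X_hd = wavelet_features(pm25[:-1], level=level)
    y = np.asarray(pm25[1:], dtype=float)

    stages, target = [], y.copy()
    for _ in range(max_stages):
        alpha, b = lssvr_fit(X_hd, target, gamma=gamma, sigma=sigma)   # Step 2
        fitted = lssvr_predict(X_hd, alpha, b, X_hd, sigma=sigma)
        stages.append((alpha, b))
        if r2_score(target, fitted) >= r2_threshold:                   # Step 3
            break
        target = target - fitted        # Step 4: the next stage fits the residual
    return X_hd, stages

def mlrc_predict(X_train_hd, stages, X_new_hd, sigma=20.0):
    # Sum of the base prediction and the k-1 residual corrections
    return sum(lssvr_predict(X_train_hd, a, b, X_new_hd, sigma=sigma)
               for a, b in stages)
```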
The working steps of the model prediction process are described as follows:
Step 1: reconstruct the prediction data set at time n, X_predict = {X_train, x_n}, x_n being the PM2.5 value at time n; apply coifN wavelet decomposition to X_predict to obtain the high-dimensional input prediction vector at time n, X'_predict = (A_m,predict, D_1,predict, ..., D_m,predict).
Step 2: feed the high-dimensional prediction vector X'_predict into the MLRC-LSSVR prediction model to obtain the MLRC-LSSVR multi-stage prediction output {Y'_predict, RC_1,predict, ..., RC_k-1,predict} and hence Y_predict = Y'_predict + Σ_{j=1}^{k-1} RC_j,predict, where RC_j,predict is the prediction output of the j-th LSSVR residual prediction model.
Step 3: based on the central limit theorem, apply linear smoothing and bias correction and perform variance estimation on the residuals (RC_k-1,train, RC_k-1,predict) to obtain the corresponding upper prediction confidence bound YP_predict = Y_predict + RCP_k-1,predict, where RCP_k-1,predict is the 97%-confidence variance estimate of the (k-1)-level residual.
Repeating the model prediction process of Steps 1-3 realizes online prediction of the PM2.5 concentration and estimation of its upper confidence limit. In addition, as the PM2.5 concentration time series is continuously updated, and in order to eliminate redundant long-term historical steady-state bias information, the constructed AMLRC-LSSVR prediction model can periodically repeat the training process on the data accumulated over each time interval, improving the effectiveness of online prediction. (An illustrative sketch of this prediction procedure is given below.)
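For illustration only, the following sketch strings the earlier helpers together into the prediction procedure of Steps 1-3. The 97% upper bound is computed here as a one-sided normal quantile of the training residuals of the full multi-stage fit, which is one plausible reading of the variance estimation based on the central limit theorem; the exact procedure used in the disclosure may differ.

```python
# MLRC-LSSVR prediction with a 97% upper confidence bound (illustrative only;
# assumes wavelet_features, train_mlrc_lssvr and mlrc_predict are in scope).
import numpy as np
from scipy.stats import norm

def mlrc_forecast_with_bound(pm25, level=3):
    X_hd, stages = train_mlrc_lssvr(pm25, level=level)

    # Step 1: wavelet features including the newest observation x_n
    x_new_hd = wavelet_features(pm25, level=level)[-1:, :]

    # Step 2: multi-stage prediction for time n+1
    y_hat = mlrc_predict(X_hd, stages, x_new_hd)[0]

    # Step 3: residuals of the multi-stage fit -> one-sided 97% upper bound
    y = np.asarray(pm25[1:], dtype=float)
    resid = y - mlrc_predict(X_hd, stages, X_hd)
    upper = y_hat + norm.ppf(0.97) * resid.std(ddof=1)
    return y_hat, upper

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pm25 = 60 + 25 * np.sin(np.arange(256) / 24) + rng.normal(0, 5, 256)
    print(mlrc_forecast_with_bound(pm25))
```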
The system comprises a data processing unit (dividing the data into a training set and a test set), a wavelet decomposition unit, a support vector regression prediction unit (including kernel function optimization, residual calculation, prediction, and so on), and the like; it provides the adjustable parameters of the model (choice of wavelet basis function, number of decomposition levels, and so on) and, by changing these adjustable parameters, adapts to PM2.5 concentration prediction and early warning in different regions.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (4)

1. A PM2.5 prediction and early warning method based on a nonlinear theory is characterized by comprising the following steps:
a model training step and a model prediction step;
dividing the PM2.5 concentration time-series data into two groups, used respectively as a training time-series data set and a test time-series data set;
performing S-level wavelet decomposition and time-frequency analysis on the data of the training time-series data set, expanding the one-dimensional information into high-dimensional information and extracting the implicit information of the PM2.5 historical data to obtain a training time-series index data set;
then constructing a prediction model of nonlinear least squares support vector regression based on multi-stage residual correction (MLRC-LSSVR);
training the MLRC-LSSVR model, wherein the model training step comprises:
Step 1: applying a coifN wavelet transform to X_train in the training data set to obtain the m-level high-dimensional input training matrix X'_train = {X'_train,1, X'_train,2, ..., X'_train,n-1}, where X'_train,i = (A_m,i, D_1,i, ..., D_m,i), i = 1, 2, ..., n-1, and constructing the LSSVR model training data set (X'_train, Y_train) ∈ R^((n-1)×(m+2));
Step 2: training an LSSVR model on the training data set (X'_train, Y_train), wherein the Simplex method, chosen for its higher search efficiency, and 10-fold cross validation are adopted in the training process to optimally search the key Gaussian-kernel parameters of the LSSVR, obtaining the LSSVR training output Y'_train;
Step 3: calculating the R² correlation coefficient R²(Y'_train, Y_train) between the LSSVR training output Y'_train and Y_train;
Step 4: if R²(Y'_train, Y_train) is less than the predetermined R² correlation-coefficient threshold, calculating the training residual vector, constructing the residual training data set (X'_train, Y_train = Y_train - Y'_train), and repeating Step 2 and Step 3 until the model satisfies the R² threshold, thereby constructing the MLRC-LSSVR prediction model, in which the additional k-1 LSSVR residual prediction models realize online synchronous correction of the prediction residual, k being the number of MLRC-LSSVR prediction model stages;
performing MLRC-LSSVR model prediction on the test time-series data set, and performing variance analysis on the model prediction result to obtain the upper bound of a confidence interval as the final prediction result, wherein the working steps of the model prediction process are described as follows:
Step 1: reconstructing the prediction data set at time n, X_predict = {X_train, x_n}, x_n being the PM2.5 value at time n, and performing coifN wavelet decomposition on X_predict to obtain the high-dimensional input prediction vector at time n, X'_predict = (A_m,predict, D_1,predict, ..., D_m,predict);
Step 2: inputting the high-dimensional prediction vector X'_predict into the MLRC-LSSVR prediction model to obtain the MLRC-LSSVR multi-stage prediction output {Y'_predict, RC_1,predict, ..., RC_k-1,predict} and thereby Y_predict = Y'_predict + Σ_{j=1}^{k-1} RC_j,predict, where RC_j,predict is the prediction output of the j-th LSSVR residual prediction model;
Step 3: based on the central limit theorem, applying linear smoothing and bias correction and performing variance estimation on the residuals (RC_k-1,train, RC_k-1,predict) to obtain the corresponding upper prediction confidence bound YP_predict = Y_predict + RCP_k-1,predict, where RCP_k-1,predict is the 97%-confidence variance estimate of the (k-1)-level residual;
repeating the model prediction process of Steps 1-3 to realize online prediction of the PM2.5 concentration and estimation of its upper confidence limit.
2. The PM2.5 prediction and early warning method based on the nonlinear theory as claimed in claim 1, wherein the adjustable parameters of the prediction model are: the number of wavelet decomposition levels s and the regression parameters of the least squares support vector machine, including the kernel function parameters and the regularization parameter γ, which can be obtained by optimization with a genetic algorithm.
3. The PM2.5 prediction and early warning method based on the nonlinear theory as claimed in claim 1, wherein the multi-stage residual correction based nonlinear least squares support vector regression MLRC-LSSVR prediction model is described as follows:
training input: a training data set (X_train, Y_train) ∈ R^((n-1)×2), where X_train = [x_1, x_2, ..., x_(n-1)]^T and Y_train = [x_2, x_3, ..., x_n]^T, x_i being the i-th value of the PM2.5 time series;
prediction output: the predicted concentration ŷ_(n+1) of the PM2.5 pollutant at time n+1.
4. A PM2.5 prediction and early warning system based on a nonlinear theory is characterized by comprising
a data processing unit, used for dividing the PM2.5 concentration time-series data into a training time-series data set and a test time-series data set;
a wavelet decomposition unit, used for performing S-level wavelet decomposition and time-frequency analysis on the data of the training time-series data set, expanding the one-dimensional information into high-dimensional information and extracting the implicit information of the PM2.5 historical data to obtain a training time-series index data set;
a support vector regression prediction unit, used for constructing a prediction model of nonlinear least squares support vector regression based on multi-stage residual correction (MLRC-LSSVR) and training the MLRC-LSSVR model, wherein the model training step comprises:
Step 1: applying a coifN wavelet transform to X_train in the training data set to obtain the m-level high-dimensional input training matrix X'_train = {X'_train,1, X'_train,2, ..., X'_train,n-1}, where X'_train,i = (A_m,i, D_1,i, ..., D_m,i), i = 1, 2, ..., n-1, and constructing the LSSVR model training data set (X'_train, Y_train) ∈ R^((n-1)×(m+2));
Step 2: training an LSSVR model on the training data set (X'_train, Y_train), wherein the Simplex method, chosen for its higher search efficiency, and 10-fold cross validation are adopted in the training process to optimally search the key Gaussian-kernel parameters of the LSSVR, obtaining the LSSVR training output Y'_train;
Step 3: calculating the R² correlation coefficient R²(Y'_train, Y_train) between the LSSVR training output Y'_train and Y_train;
Step 4: if R²(Y'_train, Y_train) is less than the predetermined R² correlation-coefficient threshold, calculating the training residual vector, constructing the residual training data set (X'_train, Y_train = Y_train - Y'_train), and repeating Step 2 and Step 3 until the model satisfies the R² threshold, thereby constructing the MLRC-LSSVR prediction model, in which the additional k-1 LSSVR residual prediction models realize online synchronous correction of the prediction residual, k being the number of MLRC-LSSVR prediction model stages;
performing MLRC-LSSVR model prediction on the test time-series data set, and performing variance analysis on the model prediction result to obtain the upper bound of a confidence interval as the final prediction result, wherein the working steps of the model prediction process are described as follows:
Step 1: reconstructing the prediction data set at time n, X_predict = {X_train, x_n}, x_n being the PM2.5 value at time n, and performing coifN wavelet decomposition on X_predict to obtain the high-dimensional input prediction vector at time n, X'_predict = (A_m,predict, D_1,predict, ..., D_m,predict);
Step 2: inputting the high-dimensional prediction vector X'_predict into the MLRC-LSSVR prediction model to obtain the MLRC-LSSVR multi-stage prediction output {Y'_predict, RC_1,predict, ..., RC_k-1,predict} and thereby Y_predict = Y'_predict + Σ_{j=1}^{k-1} RC_j,predict, where RC_j,predict is the prediction output of the j-th LSSVR residual prediction model;
Step 3: based on the central limit theorem, applying linear smoothing and bias correction and performing variance estimation on the residuals (RC_k-1,train, RC_k-1,predict) to obtain the corresponding upper prediction confidence bound YP_predict = Y_predict + RCP_k-1,predict, where RCP_k-1,predict is the 97%-confidence variance estimate of the (k-1)-level residual;
repeating the model prediction process of Steps 1-3 to realize online prediction of the PM2.5 concentration and estimation of its upper confidence limit.
CN201810095420.0A 2018-01-31 2018-01-31 PM2.5 prediction and early warning method and system based on nonlinear theory Active CN108491953B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810095420.0A CN108491953B (en) 2018-01-31 2018-01-31 PM2.5 prediction and early warning method and system based on nonlinear theory

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810095420.0A CN108491953B (en) 2018-01-31 2018-01-31 PM2.5 prediction and early warning method and system based on nonlinear theory

Publications (2)

Publication Number Publication Date
CN108491953A CN108491953A (en) 2018-09-04
CN108491953B true CN108491953B (en) 2022-02-25

Family

ID=63343976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810095420.0A Active CN108491953B (en) 2018-01-31 2018-01-31 PM2.5 prediction and early warning method and system based on nonlinear theory

Country Status (1)

Country Link
CN (1) CN108491953B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858696A (en) * 2019-01-30 2019-06-07 山东万钢信息科技有限公司 A kind of city environmental pollution prediction technique and system
DE102019207059A1 (en) * 2019-05-15 2020-11-19 Siemens Aktiengesellschaft Method for validating system parameters of an energy system, method for operating an energy system and energy management system for an energy system
CN110992101A (en) * 2019-12-05 2020-04-10 中国铁道科学研究院集团有限公司电子计算技术研究所 Station advertisement media resource value and income prediction regression method and prediction model
CN111598156A (en) * 2020-05-14 2020-08-28 北京工业大学 PM based on multi-source heterogeneous data fusion2.5Prediction model
CN113532263B (en) * 2021-06-09 2022-09-20 厦门大学 Joint angle prediction method for flexible sensor time sequence performance change

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104008278A (en) * 2014-05-14 2014-08-27 昆明理工大学 PM2.5 concentration prediction method based on feature vectors and least square support vector machine
CN105184012A (en) * 2015-09-28 2015-12-23 宁波大学 Method for predicting PM2.5 concentration of regional air
CN106599520A (en) * 2016-12-31 2017-04-26 中国科学技术大学 LSTM-RNN model-based air pollutant concentration forecast method
CN107609718A (en) * 2017-10-18 2018-01-19 仲恺农业工程学院 The Forecasting Methodology and system of dissolved oxygen in a kind of breeding water body

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10726954B2 (en) * 2015-04-22 2020-07-28 Reciprocal Labs Corporation Predictive modeling of respiratory disease risk and events
US11195125B2 (en) * 2016-04-27 2021-12-07 International Business Machines Corporation Pollution prediction

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104008278A (en) * 2014-05-14 2014-08-27 昆明理工大学 PM2.5 concentration prediction method based on feature vectors and least square support vector machine
CN105184012A (en) * 2015-09-28 2015-12-23 宁波大学 Method for predicting PM2.5 concentration of regional air
CN106599520A (en) * 2016-12-31 2017-04-26 中国科学技术大学 LSTM-RNN model-based air pollutant concentration forecast method
CN107609718A (en) * 2017-10-18 2018-01-19 仲恺农业工程学院 The Forecasting Methodology and system of dissolved oxygen in a kind of breeding water body

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"基于小波分解与自适应多级残差修正的最小二乘支持向量回归预测模型的PM2.5浓度预测";尹建光 等;《环境科学学报》;20180301;第38卷(第5期);第2090-2098页 *
"基于支持向量回归的PM2.5浓度实时预报";朱亚杰 等;《测绘科学》;20151215;第41卷(第1期);第12-17,22页 *

Also Published As

Publication number Publication date
CN108491953A (en) 2018-09-04

Similar Documents

Publication Publication Date Title
CN108491953B (en) PM2.5 prediction and early warning method and system based on nonlinear theory
Olu-Ajayi et al. Building energy consumption prediction for residential buildings using deep learning and other machine learning techniques
AU2009353036B2 (en) Systems and methods for the quantitative estimate of production-forecast uncertainty
WO2016101628A1 (en) Data processing method and device in data modeling
Abbasi et al. Results uncertainty of support vector machine and hybrid of wavelet transform‐support vector machine models for solid waste generation forecasting
CN106649658B (en) Recommendation system and method for user role non-difference treatment and data sparsity
JP2020194560A (en) Causal relationship analyzing method and electronic device
Zeng et al. Predicting vacant parking space availability: A DWT-Bi-LSTM model
Gramacy et al. Parameter space exploration with Gaussian process trees
Howard et al. Multifidelity deep operator networks
Jemai et al. FBWN: An architecture of fast beta wavelet networks for image classification
CN108595414A (en) Heavy metal-polluted soil enterprise pollution source discrimination based on source remittance space variable reasoning
CN103413038A (en) Vector quantization based long-term intuitionistic fuzzy time series prediction method
CN115407038A (en) Urban water supply pipe network water quality monitoring method based on water quality early warning point site selection
CN116107279A (en) Flow industrial energy consumption multi-objective optimization method based on attention depth neural network
Garai et al. An MRA Based MLR Model for Forecasting Indian Annual Rainfall Using Large Scale Climate Indices
CN109670695B (en) Outlier data mining-based mechanical product machining procedure abnormity parallel detection method
Wang et al. Recognizing groundwater DNAPL contaminant source and aquifer parameters using parallel heuristic search strategy based on Bayesian approach
Chaolong et al. Study of railway track irregularity standard deviation time series based on data mining and linear model
CN105911016A (en) Non-linear modeling method for spectral properties of crude oil
CN116307206A (en) Natural gas flow prediction method based on segmented graph convolution and time attention mechanism
Xing et al. A Decomposition‐Ensemble Approach with Denoising Strategy for PM2. 5 Concentration Forecasting
CN104361409A (en) Irrigation control method and system based on crop drought combined prediction model
Gugnani et al. A deep learning model for air quality forecasting based on 1d convolution and bilstm
Ahmadi et al. Development of Wavelet-Kstar Algorithm Hybrid Model for the Monthly Precipitation Prediction (Case Study: Synoptic Station of Ahvaz)

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 250003 No. 2000, Wang Yue Road, Shizhong District, Ji'nan, Shandong

Patentee after: ELECTRIC POWER RESEARCH INSTITUTE OF STATE GRID SHANDONG ELECTRIC POWER Co.

Patentee after: STATE GRID CORPORATION OF CHINA

Address before: 250003 No. 2000, Wang Yue Road, Shizhong District, Ji'nan, Shandong

Patentee before: ELECTRIC POWER RESEARCH INSTITUTE OF STATE GRID SHANDONG ELECTRIC POWER Co.

Patentee before: State Grid Corporation of China

CP01 Change in the name or title of a patent holder
TR01 Transfer of patent right

Effective date of registration: 20221128

Address after: 250003 No. 2000, Wang Yue Road, Shizhong District, Ji'nan, Shandong

Patentee after: ELECTRIC POWER RESEARCH INSTITUTE OF STATE GRID SHANDONG ELECTRIC POWER Co.

Patentee after: JILIN PROVINCE ELECTRIC POWER RESEARCH INSTITUTE OF JILIN ELECTRIC POWER Co.,Ltd.

Patentee after: STATE GRID CORPORATION OF CHINA

Address before: 250003 No. 2000, Wang Yue Road, Shizhong District, Ji'nan, Shandong

Patentee before: ELECTRIC POWER RESEARCH INSTITUTE OF STATE GRID SHANDONG ELECTRIC POWER Co.

Patentee before: STATE GRID CORPORATION OF CHINA

TR01 Transfer of patent right