CN111340288A - Urban air quality time sequence prediction method considering space-time correlation - Google Patents

Urban air quality time sequence prediction method considering space-time correlation

Info

Publication number
CN111340288A
Authority
CN
China
Prior art keywords
time
correlation
space
data
air quality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010114790.1A
Other languages
Chinese (zh)
Other versions
CN111340288B (en)
Inventor
关庆锋
吕建军
姚尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Mojin Creative Technology Co ltd
Original Assignee
Wuhan Mojin Creative Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Mojin Creative Technology Co ltd filed Critical Wuhan Mojin Creative Technology Co ltd
Priority to CN202010114790.1A priority Critical patent/CN111340288B/en
Publication of CN111340288A publication Critical patent/CN111340288A/en
Application granted granted Critical
Publication of CN111340288B publication Critical patent/CN111340288B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2477Temporal data queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/283Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Quality & Reliability (AREA)
  • Human Resources & Organizations (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Development Economics (AREA)
  • Computational Linguistics (AREA)
  • Game Theory and Decision Science (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an urban air quality time sequence prediction method considering space-time correlation, which introduces singular spectrum analysis to predict time sequences of PM2.5 monitoring data and meteorological feature data, designs a space-time correlation cube to adaptively select the first K important space neighborhood site features, superposes the time sequence prediction result and the first K important space neighborhood features to construct a sample feature set, and finally completes fitting of final results under different time scales by using a random forest algorithm. The coupling model provided by the invention can effectively take the space-time correlation among different space sites into account, thereby improving the time sequence prediction effect and stability of a single site in an urban space environment under different time scales and providing a reference basis for urban atmosphere management decision.

Description

Urban air quality time sequence prediction method considering space-time correlation
Technical Field
The invention relates to the field of atmospheric environment management and monitoring, in particular to a time sequence prediction method for urban air quality considering space-time correlation, which is a time sequence prediction method for urban air quality in future time periods with different time scales on the basis of considering space-time correlation.
Background
Air pollution is a major environmental health problem. Air pollution caused by haze, dust, inhalable fine particles and the like constantly harms the living environment of urban residents, and sensitive groups such as the elderly, children and pregnant women are affected most. Air pollution also causes further serious environmental problems, such as acid rain, climate change, water pollution and ecosystem degradation. Therefore, to better support decision-making by government departments and to guide public life services, there is an urgent need for a method that continuously predicts urban air quality in future periods while taking space-time correlation into account. The most common air quality prediction methods in the prior art are empirical inference methods and parametric statistical models. Empirical inference methods summarize experience and identify trends from meteorological features or historical air quality data, and then predict the future trend of air quality based on subjective guidance and calculation results. Such methods are fast, simple to use and well suited to static environments, but their overall prediction accuracy is low and they respond poorly when air quality fluctuates sharply. To further improve prediction accuracy, more objective and effective parametric statistical models are widely used, such as classification, clustering, regression and filtering methods, as well as ensemble statistical methods built on these models. These methods have a simple model structure and can achieve high fitting accuracy in a local experimental area, but they require a large amount of observation data for training.
Moreover, given the combined action and transmission processes among different influencing factors, parametric statistical models still struggle to fully simulate the nonlinear variation of air quality, even though they are computationally efficient and can uncover latent relationships in the data. Meanwhile, deep learning has provided many new approaches to air pollutant concentration prediction; typical examples include the back propagation neural network (BPNN), the radial basis function neural network (RBFNN) and the recurrent neural network (RNN). An RNN can dynamically capture the temporal information contained in input sequences of different lengths, but it is limited by the vanishing gradient problem and cannot effectively learn from very long input sequences. The long short-term memory (LSTM) network, built on the RNN, effectively remedies this defect and is widely used in time series prediction. However, although deep learning has excellent data mining performance, its model structure and parameter tuning are complicated and it requires a large amount of observation data for training, which leads to high complexity and computational cost.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an urban air quality time sequence prediction method taking space-time correlation into consideration for overcoming the defects in the existing method, which can fully take space-time correlation among different prediction positions into consideration and continuously predict the air quality of a specific position in an urban space range in a future time period.
The invention is realized by the following steps: the invention provides an urban air quality time sequence prediction method considering space-time correlation, which comprises the following steps:
S1) collecting the historical records of the air quality monitoring stations set up within the urban area, and matching the collected records to form time series records of multiple feature variables at the different air quality stations;
S2) preprocessing the time series records acquired in step S1) to form meteorological data with a complete time sequence;
S3) inputting the historical meteorological data of the site to be predicted into a singular spectrum analysis model to obtain prediction data for that site; using the constructed space-time correlation cube, extracting the prediction data of the first K sites most strongly correlated with the site to be predicted at the time to be predicted as auxiliary site data;
S4) combining the auxiliary site data extracted through the space-time correlation cube with the prediction data of the site to be predicted obtained from the singular spectrum analysis model to form an input feature set, and feeding the input feature set into a random forest model, which outputs the final prediction result for the site to be predicted at the time to be predicted.
Further, the data matching in step S1) refers to matching the data acquired by the air quality monitoring stations according to acquisition time, acquisition station and data category, and normalizing the data, so as to obtain the meteorological monitoring data of each station at different times.
Further, the collected historical time period record data of the air quality monitoring stations set in the urban space range comprises PM2.5 monitoring data and meteorological feature data.
Further, the data preprocessing comprises outlier screening, culling, interpolation and missing value filling.
Further, the interpolation method selects IDW inverse distance weight interpolation, and corresponding meteorological information of the air quality monitoring station position at the corresponding moment is obtained through interpolation.
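The IDW step can be illustrated with a minimal sketch. The patent names only the method, so the planar coordinates, the distance exponent of 2 and the function name `idw_interpolate` below are assumptions made for illustration.

```python
import math

def idw_interpolate(known_points, target, power=2.0):
    """Inverse-distance-weighted estimate at `target`.

    known_points: list of ((x, y), value) pairs from weather stations.
    target: (x, y) position of an air quality monitoring station.
    power: distance exponent (2 is a common default, assumed here).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in known_points:
        dist = math.hypot(x - target[0], y - target[1])
        if dist == 0.0:
            # The target coincides with a weather station: use its reading.
            return value
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical temperature readings at three weather stations.
weather = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0), ((0.0, 2.0), 30.0)]
temp_mid = idw_interpolate(weather, (1.0, 0.0))
```

Nearer weather stations dominate the estimate; real station coordinates would normally be projected to a planar system before distances are computed.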
Further, missing value filling uses a random forest algorithm to model the correlation between the meteorological features and the PM2.5 index at the same moment, so that either can be estimated from the other. At a missing moment only the meteorological features (temperature, humidity, pressure, wind speed, wind direction and the like) are available and the PM2.5 value is absent; the random forest therefore learns the relation between the meteorological features and PM2.5 from complete records, and the PM2.5 value at the missing moment is estimated from the meteorological features at that moment.
Further, a space-time correlation cube is constructed from historical data. The cube holds the correlation strength between each pair of sites at different moments, and is used to adaptively extract the first K neighborhood sites most strongly correlated with the site to be predicted at the moment to be predicted;
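A minimal sketch of this missing value filling, assuming scikit-learn's `RandomForestRegressor` as the random forest implementation. The data are synthetic and the hyperparameters are illustrative; the patent specifies neither.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic hourly records for one station: temperature, humidity, wind speed.
weather = rng.uniform(0.0, 1.0, size=(200, 3))
# Synthetic PM2.5 that depends on the weather features plus a little noise.
pm25 = 50.0 + 40.0 * weather[:, 1] - 30.0 * weather[:, 2] + rng.normal(0.0, 1.0, 200)

# Pretend 20 PM2.5 readings are missing.
missing = np.zeros(200, dtype=bool)
missing[rng.choice(200, size=20, replace=False)] = True
pm25_observed = pm25.copy()
pm25_observed[missing] = np.nan

# Fit weather -> PM2.5 on the complete rows, then estimate the missing rows.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(weather[~missing], pm25_observed[~missing])
pm25_filled = pm25_observed.copy()
pm25_filled[missing] = model.predict(weather[missing])
```

Only the rows with a missing PM2.5 value are replaced; observed values are left untouched.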
the concrete steps of the space-time correlation cube are as follows:
firstly, historical data is utilized to construct a correlation matrix between different sites at different moments in a set time period, and each value of the matrix represents the correlation strength between every two sites;
the correlation strength is calculated as follows:
$$K_s = \frac{\mathrm{Cov}\big(S(i,t),\, S(j,t)\big)}{\sigma_{S(i,t)}\,\sigma_{S(j,t)}}$$
wherein $K_s$ represents the strength of correlation between different stations at a certain time, $\mathrm{Cov}(\cdot)$ represents the covariance between variables, $S(i,t)$ and $S(j,t)$ represent the air quality records of stations i and j at time t, and $\sigma_{S(i,t)}$ and $\sigma_{S(j,t)}$ represent the standard deviations of the corresponding variables;
the space-time correlation cube measures the strength of the correlation between different stations at different moments through a correlation coefficient, so that the correlation among all air quality monitoring stations at all moments in the set time period is obtained and the cube is constructed;
in actual prediction, the space-time correlation cube determines the priority of the auxiliary stations to be introduced according to the time and position to be predicted; when the air quality index of a certain station at a future time needs to be predicted, the correlations between stations at the corresponding time are extracted from the cube;
the neighborhood sites are then sorted by correlation strength and added one by one, in order of importance, to a random forest algorithm for training, and the optimal feature number K is selected according to the model accuracy.
The model accuracy is evaluated by comparing the predicted values with the true values, using the R² goodness of fit as the comparison index.
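The cube construction described above can be sketched as follows. The sketch assumes that a "moment" is an hour of the day and that the correlation for each hour is computed across days; this is one reading of the text, not something the patent states explicitly. The function name `build_correlation_cube` and the synthetic data are hypothetical.

```python
import numpy as np

def build_correlation_cube(pm25, hours_per_day=24):
    """Build an (hours, stations, stations) array of Pearson correlations.

    pm25: array of shape (n_days * hours_per_day, n_stations), hourly readings.
    cube[h, i, j] is the correlation between stations i and j computed
    over all days at hour-of-day h (an assumed grouping).
    """
    n_hours, n_stations = pm25.shape
    n_days = n_hours // hours_per_day
    by_day = pm25[: n_days * hours_per_day].reshape(n_days, hours_per_day, n_stations)
    cube = np.empty((hours_per_day, n_stations, n_stations))
    for h in range(hours_per_day):
        # rowvar=False: columns are stations, rows are daily observations.
        cube[h] = np.corrcoef(by_day[:, h, :], rowvar=False)
    return cube

# Four synthetic stations sharing a common signal, 30 days of hourly data.
rng = np.random.default_rng(0)
base = rng.normal(size=(30 * 24, 1))
noise = rng.normal(size=(30 * 24, 4))
series = base + 0.5 * noise
cube = build_correlation_cube(series)
```

Each slice `cube[h]` is a symmetric station-by-station matrix, so the row for the site to be predicted gives the ranking of its candidate auxiliary sites at that hour.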
Compared with the prior art, the air quality prediction method designed by the invention has the beneficial effects that:
(1) the invention designs the space-time correlation cube based on the correlation strength, and can fully consider the space-time correlation among different air quality monitoring sites;
(2) the singular spectrum analysis-random forest coupling model designed by the invention strengthens the prediction capability and effectively improves the accuracy and stability of the air quality prediction model while considering both space-time correlation and spatial heterogeneity; the singular spectrum analysis algorithm predicts the air quality in a future period from historical data, the space-time correlation cube screens the air quality prediction results of each site's neighborhood sites, and the selected neighborhood predictions are used as auxiliary variables to improve the prediction accuracy for the site;
(3) the invention provides a practical and reliable scientific method for continuously predicting air quality at multiple time scales in future periods at specific urban positions, namely at different air quality monitoring stations.
Drawings
FIG. 1 is an overall method flow diagram of an urban air quality time sequence prediction method taking into account temporal-spatial correlation in accordance with the present invention;
FIG. 2 is a multi-dimensional meteorological feature obtained after data processing is completed by an urban air quality time sequence prediction method considering space-time correlation according to the invention;
FIG. 3 is a singular spectral analysis model application in an urban air quality time sequence prediction method of the present invention taking into account spatio-temporal correlations;
FIG. 4 is a spatiotemporal correlation cube designed by the urban air quality time sequence prediction method considering spatiotemporal correlation according to the present invention, for selection and addition of neighborhood significant features;
FIG. 5 is a diagram illustrating the prediction effect of the city air quality time sequence prediction method considering the space-time correlation in different time scales in an example experiment.
Detailed Description
The invention will be further described with reference to the following figures and examples, which are given by way of illustration only and are not to be construed as limiting the present patent.
The invention provides a method framework shown in figure 1, which introduces a space-time correlation cube to extract space-time information, and designs a singular spectrum analysis and random forest coupling model to accurately fit the air quality in the future stage. The specific implementation mode is as follows:
Step 1: constructing an air quality feature data set. For example, for a certain city, PM2.5 monitoring data and meteorological feature data (including temperature, humidity, wind speed, pressure, wind direction and the like) are collected from the city's 35 air quality monitoring stations built before 2018, covering January 1, 2017 to January 1, 2018; the data are matched by station position and acquisition time to obtain time series of the different features at the same coordinates with different timestamps;
Further, step 1 comprises the following steps:
step 1.1: collecting the PM2.5 monitoring data and meteorological feature data (including temperature, wind speed, pressure, humidity, wind direction and the like) recorded from January 1, 2017 to January 1, 2018 by the 35 air quality monitoring sites set up within the urban area;
step 1.2: and carrying out data matching on the acquired time sequence data according to the category, the space station position and the acquisition time of the data to form time sequence recording data of the multi-feature variable at different air quality stations.
Data matching refers to standardizing the data acquired by the air quality monitoring stations according to acquisition time, acquisition station and data category (PM2.5, temperature, humidity, pressure, wind speed and the like); the processed data are the meteorological monitoring data of each station (stations 1-35) at different moments (January 2017 to January 2018).
Step 2: data cleaning and preprocessing. The relevant experimental data (PM2.5 monitoring data and meteorological feature data) are preprocessed, including outlier screening and removal, spatial interpolation and missing value filling, to finally form historical records with a complete time sequence for the air quality monitoring stations;
The matched time series records are preprocessed because some moments are missing from the acquired series, and the air quality at those moments must be filled in.
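A minimal sketch of the data matching step, assuming raw records arrive as (station, timestamp, category, value) tuples. The record layout, the station IDs and the choice of z-score standardization are illustrative assumptions; the patent does not fix a format or a scaling rule.

```python
from collections import defaultdict

# Hypothetical raw records: (station_id, timestamp, category, value).
records = [
    ("A01", "2017-01-01 00:00", "PM2.5", 80.0),
    ("A01", "2017-01-01 00:00", "temperature", -2.0),
    ("A01", "2017-01-01 01:00", "PM2.5", 90.0),
    ("A01", "2017-01-01 01:00", "temperature", -3.0),
    ("A02", "2017-01-01 00:00", "PM2.5", 60.0),
]

def match_records(rows):
    """Group raw rows into station -> timestamp -> {category: value}."""
    table = defaultdict(lambda: defaultdict(dict))
    for station, timestamp, category, value in rows:
        table[station][timestamp][category] = value
    return table

def standardize(values):
    """Z-score standardization (assumed scaling) of one feature series."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std if std > 0 else 0.0 for v in values]

matched = match_records(records)
a01_times = sorted(matched["A01"])
a01_pm25 = [matched["A01"][t]["PM2.5"] for t in a01_times]
a01_pm25_z = standardize(a01_pm25)
```

The result is one time-ordered series per station and feature, which is the shape the later preprocessing and prediction steps operate on.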
Further, the step 2 comprises the following steps:
step 2.1: in the urban area of this embodiment, the positions of the air quality monitoring stations and the meteorological monitoring stations do not coincide, so the meteorological features must be interpolated to obtain the meteorological information at the position of each air quality monitoring station at the corresponding moment; inverse distance weighting (IDW) interpolation is chosen as the interpolation method;
step 2.2: in long time series of monitoring data, records are often lost owing to uncontrollable factors, while the time series prediction algorithm requires complete records, so missing values must be filled. Unlike the mean smoothing used in previous research, the method uses a random forest algorithm to model the correlation between the meteorological features and the PM2.5 index at the same moment, so that a missing PM2.5 value can be estimated from the meteorological feature data at the missing moment; the multivariate feature information obtained after interpolation and missing value filling is shown in FIG. 2;
Step 3: time series prediction, i.e. predicting the air quality in a future period from historical data. The multidimensional time series produced by data preprocessing are fed into a singular spectrum analysis model. Singular spectrum analysis is a time series prediction algorithm that predicts time series data step by step and yields multi-dimensional feature predictions for a future period. Its principle is as follows: a sliding window scans the historical time series to construct a trajectory matrix; the trajectory matrix is decomposed and reconstructed to extract components of the series such as the long-term trend signal, periodic signals and noise; and these important components are analyzed and extrapolated. In simplified terms, the process can be viewed as a multiple regression with the historical data features as the input X and the future predicted values as the output Y.
Further, a detailed algorithm process of the singular spectrum analysis is shown in fig. 3, and the specific steps are as follows:
step 3.1: assuming that the number of time sequence features to be predicted is k, each type of feature comprises n time records, and a feature matrix with the size of n x k is formed;
step 3.2: traversing the time series with a time window of size l to form a plurality of feature vectors of length l, which together form a feature matrix Xlw (of size l × w); the matrices of all feature types are then embedded, i.e. stacked, to form a feature cube of dimension l × w × k, where the k dimension indexes the different feature types, such as temperature, humidity, wind speed and pressure.
Step 3.3: calculating the eigenvalues and eigenvectors of the feature matrix Xlw of each type of time series feature, i.e. of each slice of the feature cube along the k dimension, and sorting the eigenvalues from largest to smallest, i.e. by importance. The purpose of sorting is to obtain the most important factors of each feature type. This is the core of singular spectrum analysis: more important eigenvalues receive larger weights and less important ones smaller weights, which suppresses interference and improves accuracy.
Step 3.4: and reconstructing a track matrix, namely superposing the eigenvectors corresponding to each sequence eigenvalue of each type of time sequence characteristic, and reconstructing the track matrix to form a new characteristic matrix.
Step 3.5: the time sequence is decomposed into characteristic vectors such as periods, non-periods, random factors and the like through singular spectrum analysis, and the further deduction of the time sequence process can be completed through combination and calculation of different characteristics.
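Steps 3.1 to 3.5 can be sketched for a single feature series as below. This is basic univariate singular spectrum analysis with the standard recurrent forecasting rule. The patent does not state which forecasting rule it uses, so that choice, the function name `ssa_forecast` and the parameter values are assumptions.

```python
import numpy as np

def ssa_forecast(series, window, rank, steps):
    """Basic SSA: embed, decompose, reconstruct, then forecast recurrently."""
    series = np.asarray(series, dtype=float)
    n = series.size
    k = n - window + 1
    # Embedding: each column is one lagged window -> trajectory matrix (window x k).
    traj = np.column_stack([series[i:i + window] for i in range(k)])
    # Decomposition: singular values are returned from largest to smallest.
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    u_r = u[:, :rank]
    # Reconstruction: keep the leading components, then average anti-diagonals.
    approx = (u_r * s[:rank]) @ vt[:rank]
    recon = np.array([approx[::-1].diagonal(i - window + 1).mean() for i in range(n)])
    # Recurrent forecast: linear recurrence coefficients from the kept vectors.
    last = u_r[-1, :]
    coeffs = (u_r[:-1, :] @ last) / (1.0 - float(last @ last))
    extended = list(recon)
    for _ in range(steps):
        extended.append(float(np.dot(coeffs, extended[-(window - 1):])))
    return recon, np.array(extended[n:])

# A noise-free daily cycle sampled hourly for five days.
t = np.arange(120)
signal = np.sin(2 * np.pi * t / 24)
recon, future = ssa_forecast(signal, window=48, rank=2, steps=12)
```

A pure sinusoid has a rank-2 trajectory matrix, so two components reproduce it and extrapolate the next 12 hours almost exactly. Real PM2.5 series need more components, and the window and rank would have to be tuned.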
Step 4: next, a space-time correlation cube constructed from historical data is used to extract the air quality predictions of neighborhood sites as auxiliary data. The cube holds the correlation strength between sites at different moments, and through it the first K neighborhood sites most strongly correlated with the site to be predicted at the moment to be predicted can be extracted adaptively;
further, the concrete steps of the spatio-temporal correlation cube are as follows:
step 4.1: taking the example experiment of the present invention as an example, as shown in fig. 4-a, a space-time correlation cube (with a dimension of 35 x 24) was constructed by measuring the correlation strength between 35 air quality sites at corresponding different times (0:00-23:00) based on historical data. Through the cube, the priority of the auxiliary sites needing to be introduced can be determined according to the predicted time and spatial position in actual prediction, and the specific number of introduced auxiliary features is adaptively determined through prediction accuracy.
The correlation strength is calculated as follows:
$$K_s = \frac{\mathrm{Cov}\big(S(i,t),\, S(j,t)\big)}{\sigma_{S(i,t)}\,\sigma_{S(j,t)}}$$
wherein $K_s$ represents the strength of correlation between different stations at a certain time, $\mathrm{Cov}(\cdot)$ represents the covariance between variables, $S(i,t)$ and $S(j,t)$ represent the air quality records of stations i and j at time t, and $\sigma_{S(i,t)}$ and $\sigma_{S(j,t)}$ represent the standard deviations of the corresponding variables;
step 4.2: the space-time correlation cube measures the strength of the correlation between different stations at different moments through the correlation coefficient, so that the correlation among all air quality monitoring stations at every hour of the day (24 moments) is obtained and the cube is constructed;
step 4.3: through the space-time correlation cube, the priority of the auxiliary sites to be introduced can be determined according to the time and position of the period to be predicted. Taking 0:00 as an example, suppose the air quality index of a certain site at that future time needs to be predicted: the correlations between sites at that time are first extracted from the cube, as shown in fig. 4-B, where color encodes correlation strength, warmer colors indicating stronger correlation and cooler colors weaker correlation. The magnitude of the correlation gives the order of influence of the other sites on the site to be predicted at that moment, and the features of the neighborhood sites are added in this order.
Step 4.4: unlike similar research that adds all sites when adding neighborhood site features, the method designs an adaptive selection mechanism for the Top-K important neighborhood features. Specifically, each time the spatial site correlations at the corresponding moment have been extracted from the space-time correlation cube, the neighborhood sites are sorted by correlation strength and added one by one, in order of importance, to a random forest algorithm for training, and the optimal feature number K is selected according to the model accuracy. For example, as shown in fig. 4-C, if K = 5 is selected from the accuracy curve, the 5 neighborhood sites most strongly correlated with each site are retained and added as auxiliary features.
Step 5: the auxiliary site data extracted through the space-time correlation cube are coupled with the time series prediction data obtained from the singular spectrum analysis model and fed into a random forest algorithm for training, which constructs a multi-dimensional feature-air quality correlation model and completes the final air quality fitting for the site at the time to be predicted. Coupling refers to combining two originally unrelated models so that together they improve prediction accuracy. The space-time correlation cube helps the site to be predicted select auxiliary site data according to site correlation strength (measured by a correlation coefficient among the historical data of different sites).
For example, to predict the air quality at station A01 at 12:00 on January 5, 2020, the historical meteorological data of the station are first fed into the singular spectrum analysis model to obtain the station's predicted data. Not all historical data are input: a progressive input window is used, for example the preceding 72 hours, and both the input window size and the prediction window are adjustable parameters. The space-time correlation cube is then used to extract, as auxiliary features, the predicted data of the first K stations most strongly correlated with the station at that time of day (12:00), the selection being based on the correlation among the stations' historical data at that time. K is determined by accuracy; for example, if the prediction accuracy is highest when the first 5 important stations are introduced, K is set to 5. After the first K neighboring stations have been selected, their data are stacked with the station's own singular spectrum analysis predictions to form a feature set, which is fed into the random forest model to obtain the final prediction result.
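The Top-K selection mechanism can be sketched as a ranking step followed by an incremental search over K. The helper names (`rank_neighbors`, `choose_k`, `r2_score`) are hypothetical, and the accuracy curve below is a stand-in. In practice `evaluate` would train a random forest on the chosen neighbors and return its R² on validation data.

```python
import numpy as np

def rank_neighbors(corr_slice, target):
    """Order the other stations by correlation with `target`, strongest first."""
    order = np.argsort(-corr_slice[target], kind="stable")
    return [int(j) for j in order if j != target]

def r2_score(y_true, y_pred):
    """R^2 goodness of fit between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def choose_k(ranked, evaluate):
    """Add neighbors one at a time and keep the K with the best score.

    evaluate(sites) is supplied by the caller: it should train a model
    with those neighbor sites and return a validation score.
    """
    scores = [evaluate(ranked[:k]) for k in range(1, len(ranked) + 1)]
    return int(np.argmax(scores)) + 1, scores

# Hypothetical 4-station correlation slice for one hour of the day.
corr_0h = np.array([
    [1.0, 0.9, 0.2, 0.6],
    [0.9, 1.0, 0.3, 0.5],
    [0.2, 0.3, 1.0, 0.1],
    [0.6, 0.5, 0.1, 1.0],
])
ranked = rank_neighbors(corr_0h, target=0)
# Stand-in accuracy curve keyed by the number of neighbors added.
fake_scores = {1: 0.80, 2: 0.86, 3: 0.84}
best_k, scores = choose_k(ranked, lambda sites: fake_scores[len(sites)])
```

The sketch ranks by the raw correlation value; whether strongly negative correlations should also count as "strong" is not addressed in the text.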
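The coupling in this example can be sketched as follows, assuming scikit-learn's `RandomForestRegressor`. The arrays stand in for singular spectrum analysis forecasts and for the Top-K order taken from the correlation cube; all values, station indices and hyperparameters are synthetic or assumed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n_samples, n_stations, top_k = 300, 6, 3

# Stand-ins for SSA forecasts: one column per station, one row per time step.
ssa_pred = rng.normal(50.0, 10.0, size=(n_samples, n_stations))
target = 0
neighbors = [2, 4, 1]  # assumed Top-K order from the correlation cube

# Synthetic "observed" PM2.5 at the target station, driven by its own
# forecast and by its strongest neighbor.
truth = 0.7 * ssa_pred[:, target] + 0.3 * ssa_pred[:, 2] + rng.normal(0.0, 1.0, n_samples)

# Feature set = target station forecast + forecasts of the K auxiliary stations.
features = np.column_stack([ssa_pred[:, target], ssa_pred[:, neighbors[:top_k]]])

# Train on the first 240 time steps and predict the remaining 60.
split = 240
forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(features[:split], truth[:split])
final_pred = forest.predict(features[split:])
```

The random forest here only refines forecasts that already exist, so the quality of the singular spectrum analysis stage bounds what the coupled model can achieve.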
In an example experiment, to demonstrate the multi-scale time-window prediction capability of the model of the present invention, the prediction time window was increased progressively from 1 to 12 hours. Fig. 5 shows the prediction results at the different time scales, where red denotes predicted values and black denotes observed values.
To further demonstrate the effect of the coupled model of the present invention, several different models were verified and compared on the same data set in this embodiment. The comparison models are the autoregressive integrated moving average model (ARIMA), ordinary singular spectrum analysis (SSA), a long short-term memory neural network (LSTM), and a one-dimensional convolutional neural network (CNN); the comparison results are shown in Table 1.
TABLE 1 (provided as images in the original publication: Figure BDA0002391144370000111 and Figure BDA0002391144370000121)
The results show that the model of the present invention (STSR) achieves good prediction performance at all prediction time scales, with better fitting results than the other algorithms at the same time scale. Notably, at longer prediction horizons the accuracy of the other models drops sharply, whereas the present model remains stable, which indicates good reliability and generalization performance.
The method introduces singular spectrum analysis to perform time series prediction of PM2.5 monitoring data and meteorological feature data, designs a space-time correlation cube to adaptively select the features of the first K important spatial neighborhood sites, superposes the time series prediction result with those features to construct a sample feature set, and finally uses a random forest algorithm to fit the final results at different time scales. The coupled model provided by the invention combines singular spectrum analysis, the space-time correlation cube, and the random forest model to perform the prediction, and effectively accounts for the space-time correlation among different spatial sites, thereby improving the time sequence prediction effect and stability for a single site in the urban space environment at different time scales and providing a reference basis for urban atmosphere management decisions. The prediction output time window of the invention is adjustable; in the experiment given as an example, the next 1 to 12 hours are predicted.
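As background for the singular spectrum analysis component, the following is a minimal sketch of the standard SSA decomposition and reconstruction (embedding, singular value decomposition, diagonal averaging) using NumPy. It is not the forecasting procedure of the invention, which the text does not detail; the function name, window length, and component count are illustrative assumptions.

```python
import numpy as np


def ssa_reconstruct(series, window, n_components):
    """Reconstruct a series from its leading SSA components."""
    x = np.asarray(series, dtype=float)
    n = x.size
    k = n - window + 1
    # Trajectory (Hankel) matrix: column j holds x[j : j + window].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    r = min(n_components, s.size)
    # Low-rank approximation built from the r largest singular triples.
    approx = (u[:, :r] * s[:r]) @ vt[:r, :]
    # Diagonal averaging maps the matrix back to a series of length n.
    recon = np.zeros(n)
    counts = np.zeros(n)
    for i in range(window):
        for j in range(k):
            recon[i + j] += approx[i, j]
            counts[i + j] += 1
    return recon / counts


# Hypothetical hourly series: a constant level plus one daily cycle.
t = np.arange(48)
series = 50 + 10 * np.sin(2 * np.pi * t / 24)
smooth = ssa_reconstruct(series, window=12, n_components=3)
print(np.allclose(smooth, series))  # True: this series has a rank-3 trajectory matrix
```

On real monitoring data, keeping only the leading components separates trend and periodic structure from noise; a forecast would then be produced by extending the retained components, which this sketch does not attempt.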
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (7)

1. An urban air quality time sequence prediction method considering space-time correlation, characterized by comprising the following steps:
S1) collecting historical time period record data of the air quality monitoring stations set in the urban space range, and carrying out data matching on the collected historical time period record data to form time sequence record data of multiple feature variables at the different air quality stations;
S2) carrying out data preprocessing on the time sequence record data acquired in step S1) to finally form meteorological data with a complete time sequence;
S3) inputting the meteorological data of the site to be predicted at historical times into a singular spectrum analysis model to obtain prediction data of the site to be predicted; and extracting, by means of the constructed space-time correlation cube, the prediction data of the first K sites most strongly correlated with the site to be predicted at the time to be predicted as auxiliary site data;
S4) coupling the auxiliary site data extracted through the space-time correlation cube with the prediction data of the site to be predicted obtained through the singular spectrum analysis model to jointly form an input feature set, putting the input feature set into a random forest model, and predicting through the random forest model to obtain the final prediction result of the site to be predicted at the time to be predicted.
2. The urban air quality time sequence prediction method considering space-time correlation according to claim 1, characterized in that: the data matching in step S1) refers to matching the data acquired by the air quality monitoring stations according to acquisition time, acquisition station, and station type, and normalizing the data, the resulting data being the meteorological monitoring data of each station at different times.
3. The urban air quality time sequence prediction method considering space-time correlation according to claim 1, characterized in that: the collected historical time period record data of the air quality monitoring station set in the urban space range comprises PM2.5 monitoring data and meteorological characteristic data.
4. The urban air quality time sequence prediction method considering space-time correlation according to claim 1, characterized in that: the data preprocessing comprises outlier screening and removal, interpolation, and missing value filling.
5. The urban air quality time sequence prediction method considering space-time correlation according to claim 4, characterized in that: the interpolation uses inverse distance weighting (IDW), by which the corresponding meteorological information at the position of the air quality monitoring station at the corresponding time is obtained.
6. The urban air quality time sequence prediction method considering space-time correlation according to claim 4, characterized in that: the missing value filling models the correlation between the meteorological features and the PM2.5 index at the same time by using a random forest algorithm, and realizes mutual inference between the meteorological features and the PM2.5 value according to the correlation model between them.
7. The urban air quality time sequence prediction method considering space-time correlation according to claim 1, characterized in that: a space-time correlation cube is constructed according to historical data, the space-time correlation cube being the correlation strength between sites at different times, and the first K important neighborhood sites most strongly correlated with the site to be predicted at its position are adaptively extracted through the space-time correlation cube;
the space-time correlation cube is constructed by the following specific steps:
firstly, historical data are used to construct a correlation matrix between the different sites at each time within a set time period, each value of the matrix representing the correlation strength between a pair of sites;
the correlation strength calculation formula is as follows:
$$K_s = \frac{\mathrm{Cov}\big(S(i,t),\,S(j,t)\big)}{\sigma_{S(i,t)}\,\sigma_{S(j,t)}}$$
wherein $K_s$ represents the strength of correlation between different stations at a certain time, $\mathrm{Cov}(\cdot)$ represents the covariance between variables, $S(i,t)$ and $S(j,t)$ represent the corresponding air quality records of stations $i$ and $j$ at time $t$, and $\sigma_{S(i,t)}$ and $\sigma_{S(j,t)}$ represent the standard deviations of the corresponding variables;
the space-time correlation cube measures the strength of the correlation between different spatial stations at different times through an autocorrelation coefficient, so that the correlations among all the air quality monitoring stations at all times within the set time period are obtained and the space-time correlation cube is constructed;
in actual prediction, the priority of the auxiliary stations to be introduced is determined through the space-time correlation cube according to the time and position to be predicted; that is, when the air quality index of a certain station at a future time needs to be predicted, the correlations between stations corresponding to that time are extracted from the space-time correlation cube;
and the neighborhood sites are sorted by correlation strength, added one by one in order of importance to the random forest algorithm for training, and the optimal feature number K is selected according to model accuracy.
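The construction and use of the space-time correlation cube described in claim 7 can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: the cube is indexed by hour of day, the correlation is the ratio of covariance to the product of standard deviations, constant series are treated as uncorrelated, and `score_fn` is a hypothetical callback standing in for training and validating the random forest with a given set of neighbors. None of these names come from the patent.

```python
from math import sqrt
from typing import Callable, Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Covariance of a and b divided by the product of their standard deviations."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return cov / sqrt(var_a * var_b)


def build_cube(history: Dict[str, Dict[int, List[float]]]):
    """history[station][hour] lists the records of that hour on successive days.
    Returns cube[hour][station_i][station_j] = correlation strength."""
    stations = sorted(history)
    hours = sorted(history[stations[0]])
    return {
        t: {i: {j: pearson(history[i][t], history[j][t]) for j in stations}
            for i in stations}
        for t in hours
    }


def ranked_neighbors(cube, hour: int, target: str) -> List[str]:
    """Other stations sorted by correlation with the target at the given hour."""
    row = cube[hour][target]
    return sorted((s for s in row if s != target), key=lambda s: row[s], reverse=True)


def choose_k(ranked: List[str], score_fn: Callable[[List[str]], float]) -> int:
    """Add neighbors one by one and keep the K with the best score."""
    best_k, best_score = 0, score_fn([])
    for k in range(1, len(ranked) + 1):
        score = score_fn(ranked[:k])
        if score > best_score:
            best_k, best_score = k, score
    return best_k


# Hypothetical 12:00 records on four days for three stations.
history = {
    "A01": {12: [10.0, 20.0, 30.0, 40.0]},
    "A02": {12: [12.0, 22.0, 29.0, 41.0]},
    "A03": {12: [40.0, 30.0, 20.0, 10.0]},
}
cube = build_cube(history)
ranked = ranked_neighbors(cube, hour=12, target="A01")
print(ranked)  # ['A02', 'A03']

# Toy validation scores by number of neighbors, in place of a trained model.
toy_score = lambda subset: {0: 0.50, 1: 0.80, 2: 0.70}[len(subset)]
print(choose_k(ranked, toy_score))  # 1
```

Sorting by signed correlation, as here, ranks strongly anti-correlated stations last; whether the method instead ranks by absolute value is not stated in the text.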
CN202010114790.1A 2020-02-25 2020-02-25 Urban air quality time sequence prediction method considering time-space correlation Active CN111340288B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010114790.1A CN111340288B (en) 2020-02-25 2020-02-25 Urban air quality time sequence prediction method considering time-space correlation

Publications (2)

Publication Number Publication Date
CN111340288A true CN111340288A (en) 2020-06-26
CN111340288B CN111340288B (en) 2024-04-05

Family

ID=71187079

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010114790.1A Active CN111340288B (en) 2020-02-25 2020-02-25 Urban air quality time sequence prediction method considering time-space correlation

Country Status (1)

Country Link
CN (1) CN111340288B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160125307A1 (en) * 2013-06-05 2016-05-05 Yu Zheng Air quality inference using multiple data sources
CN106407633A (en) * 2015-07-30 2017-02-15 中国科学院遥感与数字地球研究所 Method and system for estimating ground PM2.5 based on space-time regression Kriging model
CN106447072A (en) * 2016-08-01 2017-02-22 中国卫星海上测控部 Explicit genetic algorithm and singular spectrum analysis-based meteorological and hydrological element forecast method
CN106971547A (en) * 2017-05-18 2017-07-21 福州大学 A kind of Short-time Traffic Flow Forecasting Methods for considering temporal correlation
CN107133398A (en) * 2017-04-28 2017-09-05 河海大学 A kind of river ethic Forecasting Methodology based on complex network
CN107423861A (en) * 2017-08-09 2017-12-01 北京工业大学 Air Quality Forecast method based on iterative learning
CN107563565A (en) * 2017-09-14 2018-01-09 广西大学 A kind of short-term photovoltaic for considering Meteorology Factor Change decomposes Forecasting Methodology
CN108053071A (en) * 2017-12-21 2018-05-18 宇星科技发展(深圳)有限公司 Regional air pollutant concentration Forecasting Methodology, terminal and readable storage medium storing program for executing
CN108701274A (en) * 2017-05-24 2018-10-23 北京质享科技有限公司 A kind of small scale air quality index prediction technique in city and system
CN109492822A (en) * 2018-11-24 2019-03-19 上海师范大学 Air pollutant concentration time-space domain interaction prediction method
CN109902863A (en) * 2019-02-15 2019-06-18 浙江财经大学 A kind of wind speed forecasting method and device based on multifactor temporal correlation
CN110210681A (en) * 2019-06-11 2019-09-06 西安电子科技大学 A kind of prediction technique of the monitoring station PM2.5 value based on distance
CN110598953A (en) * 2019-09-23 2019-12-20 哈尔滨工程大学 Space-time correlation air quality prediction method
CN110610258A (en) * 2019-08-20 2019-12-24 中国地质大学(武汉) Urban air quality refined estimation method and device fusing multi-source space-time data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GAO, Y et al.: "Prediction of vertical PM2.5 concentrations alongside an elevated expressway by using the neural network hybrid model and generalized additive model", Frontiers of Earth Science, vol. 11, no. 2, 23 June 2017 (2017-06-23) *
张怡文; 敖希琴; 时培俊; 郭傲东; 费久龙; 陈家丽: "BP neural network PM2.5 prediction model based on the Pearson correlation index", Journal of Qingdao University (Natural Science Edition), no. 02, 15 May 2017 (2017-05-15) *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183625A (en) * 2020-09-28 2021-01-05 武汉大学 PM2.5 high-precision time-space prediction method based on deep learning
CN113077357B (en) * 2021-03-29 2023-11-28 国网湖南省电力有限公司 Power time sequence data anomaly detection method and filling method thereof
CN112801423A (en) * 2021-03-29 2021-05-14 北京英视睿达科技有限公司 Method and device for identifying abnormity of air quality monitoring data and storage medium
CN113077357A (en) * 2021-03-29 2021-07-06 国网湖南省电力有限公司 Power time sequence data abnormity detection method and filling method thereof
CN112801423B (en) * 2021-03-29 2021-07-20 北京英视睿达科技有限公司 Method and device for identifying abnormity of air quality monitoring data and storage medium
US20220316734A1 (en) * 2021-04-14 2022-10-06 Jiangnan University Deep Spatial-Temporal Similarity Method for Air Quality Prediction
CN113077097A (en) * 2021-04-14 2021-07-06 江南大学 Air quality prediction method based on deep space-time similarity
WO2022217839A1 (en) * 2021-04-14 2022-10-20 江南大学 Air quality prediction method based on deep spatiotemporal similarity
CN113077097B (en) * 2021-04-14 2023-08-25 江南大学 Air quality prediction method based on depth space-time similarity
US11512864B2 (en) * 2021-04-14 2022-11-29 Jiangnan University Deep spatial-temporal similarity method for air quality prediction
CN113610286A (en) * 2021-07-27 2021-11-05 中国地质大学(武汉) PM2.5 concentration prediction method and device accounting for spatio-temporal correlations and meteorological factors
CN113610286B (en) * 2021-07-27 2024-03-29 中国地质大学(武汉) PM2.5 concentration prediction method and device taking into account space-time correlation and meteorological factors
CN113610243A (en) * 2021-08-12 2021-11-05 中节能天融科技有限公司 Atmospheric pollutant tracing method based on coupled machine learning and correlation analysis
CN113610243B (en) * 2021-08-12 2023-10-13 中节能天融科技有限公司 Atmospheric pollutant tracing method based on coupled machine learning and correlation analysis
CN117332906A (en) * 2023-12-01 2024-01-02 山东大学 Machine learning-based three-dimensional space-time grid air quality prediction method and system
CN117332906B (en) * 2023-12-01 2024-03-15 山东大学 Machine learning-based three-dimensional space-time grid air quality prediction method and system
CN117540193A (en) * 2024-01-10 2024-02-09 飞特质科(北京)计量检测技术有限公司 Extraction method of characteristic data of fan conduction interference test curve

Also Published As

Publication number Publication date
CN111340288B (en) 2024-04-05

Similar Documents

Publication Publication Date Title
CN111340288A (en) Urban air quality time sequence prediction method considering space-time correlation
CN109508360B (en) Geographical multivariate stream data space-time autocorrelation analysis method based on cellular automaton
CN110852515B (en) Water quality index prediction method based on mixed long-time and short-time memory neural network
CN112949828B (en) Graph convolution neural network traffic prediction method and system based on graph learning
CN104036360B (en) User data processing system and processing method based on magcard attendance behaviors
CN112785066B (en) Global wild fire season space-time prediction method based on convolution-recurrent neural network
CN105678428A (en) Criminal suspicion probability prediction method and system
CN108600965B (en) Passenger flow data prediction method based on guest position information
CN110909928B (en) Energy load short-term prediction method and device, computer equipment and storage medium
CN111079999A (en) Flood disaster susceptibility prediction method based on CNN and SVM
CN112232543A (en) Multi-site prediction method based on graph convolution network
CN116128141B (en) Storm surge prediction method and device, storage medium and electronic equipment
CN112285376A (en) Wind speed prediction method based on CNN-LSTM
CN110533100A (en) A method of CME detection and tracking is carried out based on machine learning
CN113516304A (en) Space-time joint prediction method and device for regional pollutants based on space-time graph network
CN115099450A (en) Family carbon emission monitoring and accounting platform based on fusion model
CN116796168A (en) CNN-BiLSTM high-altitude multi-factor power transmission line audible noise prediction method based on multi-head attention mechanism
CN116681176A (en) Traffic flow prediction method based on clustering and heterogeneous graph neural network
CN114169364A (en) Electroencephalogram emotion recognition method based on space-time diagram model
CN116525135B (en) Method for predicting epidemic situation development situation by space-time model based on meteorological factors
US20240029556A1 (en) Short-term traffic flow prediction method based on causal gated-low-pass graph convolutional network
CN117194954A (en) Mixed deep learning water quality prediction method based on time-frequency feature extraction
Noor et al. Prediction map of rainfall classification using random forest and inverse distance weighted (IDW)
CN115796361A (en) Wind speed interval prediction method and device for ground stage of overhead line engineering
CN113616209B (en) Method for screening schizophrenic patients based on space-time attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant