CN111340288B - Urban air quality time sequence prediction method considering time-space correlation - Google Patents

Urban air quality time sequence prediction method considering time-space correlation

Info

Publication number
CN111340288B
CN111340288B CN202010114790.1A CN202010114790A CN111340288B CN 111340288 B CN111340288 B CN 111340288B CN 202010114790 A CN202010114790 A CN 202010114790A CN 111340288 B CN111340288 B CN 111340288B
Authority
CN
China
Prior art keywords
time
correlation
data
space
air quality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010114790.1A
Other languages
Chinese (zh)
Other versions
CN111340288A (en)
Inventor
关庆锋
吕建军
姚尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Mojin Creative Technology Co ltd
Original Assignee
Wuhan Mojin Creative Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Mojin Creative Technology Co ltd
Priority to CN202010114790.1A
Publication of CN111340288A
Application granted
Publication of CN111340288B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2477 Temporal data queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Abstract

The invention discloses an urban air quality time series prediction method that takes space-time correlation into account. The method introduces singular spectrum analysis to perform time series prediction on PM2.5 monitoring data and meteorological feature data, designs a space-time correlation cube to adaptively select the features of the top K most important spatially neighboring stations, stacks the time series prediction results with the top K neighboring features to build a sample feature set, and finally uses a random forest algorithm to fit the final results at different time scales. The coupled model effectively accounts for the space-time correlation between stations at different locations, thereby improving the accuracy and stability of single-station time series prediction in the urban environment at different time scales and providing a reference for urban atmospheric management decisions.

Description

Urban air quality time sequence prediction method considering time-space correlation
Technical Field
The invention relates to the field of atmospheric environment management and monitoring, in particular to a time sequence prediction method for urban air quality, which takes time-space correlation into consideration.
Background
Air pollution is a major environmental health problem. Pollution caused by haze, dust, inhalable fine particles and the like continually harms the living environment of urban residents, and it has a greater effect on the elderly, children, pregnant women and other sensitive groups. Air pollution can also cause more serious environmental problems such as acid rain, climate change, water pollution and ecosystem degradation. Therefore, to better support decision-making by government departments and to guide public services, a method is urgently needed that continuously predicts urban air quality over future periods while taking space-time correlation into account. The most common traditional air quality prediction methods are empirical inference methods and parametric statistical models. Empirical inference summarizes experience and identifies trends from meteorological features or historical air quality records, and then predicts the air quality trend of a future period from subjective judgment and calculation results. Such methods are fast to compute, simple to use and well suited to static environments, but their overall prediction accuracy is low and they respond poorly when air quality fluctuates sharply. To further improve prediction accuracy, more objective and effective parametric statistical models have been widely applied, such as classification, clustering, regression and filtering methods, and ensemble statistical methods built on them. These models have a relatively simple structure and can achieve high fitting accuracy in local experimental areas, but they require a large amount of observation data for training.
As for the combined action and transmission processes among different influencing factors, even though parametric statistical models are computationally efficient and can uncover latent relationships in data, they still have difficulty fully simulating the nonlinear variation of air quality. Meanwhile, deep learning has provided many new methods for air pollutant concentration prediction; typical examples include back-propagation neural networks (BPNNs), radial basis function neural networks (RBFNNs) and recurrent neural networks (RNNs). An RNN can dynamically capture the temporal information contained in input sequences of different lengths, but it is limited by the vanishing gradient problem and therefore cannot learn effectively from overly long input sequences. The long short-term memory network (LSTM), proposed on the basis of the RNN, effectively overcomes this defect and is widely used in time series prediction. However, although deep learning has excellent data mining performance, its model structure and parameter tuning process are overly complex and it requires a large amount of observation data for training, which leads to high computational cost.
Disclosure of Invention
The technical problem to be solved by the invention is to provide, in view of the defects of existing methods, an urban air quality time series prediction method that fully considers the space-time correlation among different prediction positions and continuously predicts the air quality at a specific position within the urban area over future periods.
The invention is realized as follows. It provides an urban air quality time series prediction method considering space-time correlation, comprising the following steps:
S1) acquiring historical records from the air quality monitoring stations set up within the urban area, and performing data matching on the acquired records to form time series records of multiple feature variables at the different air quality stations;
S2) preprocessing the time series records acquired in step S1) to form meteorological data with a complete time series;
S3) inputting the meteorological data of the station to be predicted at historical moments into a singular spectrum analysis model to obtain the prediction data of the station to be predicted; and, using the constructed space-time correlation cube, extracting the prediction data of the top K stations most strongly correlated with the station to be predicted at the moment to be predicted, as auxiliary station data;
S4) coupling the auxiliary station data extracted through the space-time correlation cube with the prediction data of the station to be predicted obtained from the singular spectrum analysis model to form an input feature set, and inputting the feature set into a random forest model, whose prediction is the final result for the station to be predicted at the moment to be predicted.
Further, the data matching in step S1) means matching the data collected by the air quality monitoring stations according to acquisition time, acquisition station and data category, so that the resulting data are the monitoring data of each station at different moments.
Further, the historical records acquired from the air quality monitoring stations set up within the urban area comprise PM2.5 monitoring data and meteorological feature data.
Further, the data preprocessing comprises outlier screening and removal, interpolation, and missing value filling.
Further, the interpolation uses inverse distance weighting (IDW), which yields the meteorological information at the position of each air quality monitoring station at the corresponding moment.
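The IDW step can be illustrated with a minimal sketch. The function name `idw_interpolate`, the planar coordinates, the Euclidean distance and the default power of 2 are assumptions made for illustration; the patent does not specify them.

```python
import numpy as np

def idw_interpolate(station_xy, station_values, query_xy, power=2.0, eps=1e-12):
    """Inverse-distance-weighted estimate at each query point (hypothetical helper)."""
    station_xy = np.asarray(station_xy, dtype=float)          # (m, 2) weather stations
    station_values = np.asarray(station_values, dtype=float)  # (m,) one feature, one moment
    query_xy = np.asarray(query_xy, dtype=float)              # (q, 2) air quality stations
    # Pairwise Euclidean distances, shape (q, m); planar coordinates are assumed.
    dist = np.linalg.norm(query_xy[:, None, :] - station_xy[None, :, :], axis=2)
    # Closer stations get larger weights; eps avoids division by zero.
    weights = 1.0 / np.maximum(dist, eps) ** power
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ station_values
```

Here each air quality station receives a weighted average of the weather readings, with the weight falling off as the inverse square of distance. A real deployment would likely use geodesic distances and might restrict the average to the nearest few weather stations.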
Further, missing value filling uses a random forest algorithm to model the correlation between the meteorological features and the PM2.5 index at the same moment, so that each can be inferred from the other through this correlation model. At a moment with a missing record, the meteorological features (temperature, humidity, pressure, wind speed, wind direction and the like) are available and only the PM2.5 value is missing. The invention therefore models the relationship between the meteorological features and PM2.5 with a random forest and uses the meteorological features at the missing moment to estimate the PM2.5 value at that moment.
Further, a space-time correlation cube is constructed from historical data. The cube holds the correlation strength between stations at different moments, and it is used to adaptively extract the top K important neighboring stations most strongly correlated with the station to be predicted at the moment to be predicted;
the specific steps for the space-time correlation cube are as follows:
first, correlation matrices among the stations at different moments within a set time period are constructed from historical data, where each value of a matrix represents the correlation strength between two stations;
the correlation strength is calculated as follows:
$$K_s = \frac{\operatorname{cov}\left(S(i,t),\, S(j,t)\right)}{\sigma_{S(i,t)}\,\sigma_{S(j,t)}}$$
where K_s is the correlation strength between two stations at a given moment, cov(·) is the covariance between variables, S(i,t) and S(j,t) are the air quality records of stations i and j at moment t, and σ is the standard deviation of the corresponding variable;
the space-time correlation cube measures the correlation strength between different stations at different moments by means of these correlation coefficients, which gives the correlation between all air quality monitoring stations at all moments within the set time period, and the cube is built from these correlations;
in actual prediction, the cube determines the priority of the auxiliary stations to be introduced according to the time and position of the prediction period: to predict the air quality index of a station at a future moment, the correlations between stations at the corresponding moment are first extracted from the cube;
the neighboring stations are then sorted by correlation strength and added one by one, in order of importance, to a random forest algorithm for training, and the optimal feature number K is selected according to model accuracy.
Model accuracy is assessed by comparing predicted values with true values, using the R² goodness of fit as the comparison index.
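A minimal sketch of how such a cube could be built and queried is given below. It assumes the history is arranged as a `(n_days, n_hours, n_stations)` array and that the correlation at hour t is taken across days; the patent does not state the sample over which the covariance is computed, and the names `build_correlation_cube` and `rank_neighbors` are hypothetical.

```python
import numpy as np

def build_correlation_cube(records):
    """records: (n_days, n_hours, n_stations) -> cube (n_stations, n_stations, n_hours)."""
    records = np.asarray(records, dtype=float)
    n_days, n_hours, n_stations = records.shape
    cube = np.empty((n_stations, n_stations, n_hours))
    for t in range(n_hours):
        # Rows are the daily observations at hour t, columns are stations,
        # so corrcoef returns the station-by-station Pearson correlation matrix.
        cube[:, :, t] = np.corrcoef(records[:, t, :], rowvar=False)
    return cube

def rank_neighbors(cube, station, hour):
    """Indices of the other stations, most strongly correlated first."""
    order = np.argsort(-cube[station, :, hour], kind="stable")
    return order[order != station]
```

Taking the first K entries of `rank_neighbors(cube, station, hour)` gives the candidate auxiliary stations for that station and hour.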
Compared with the prior art, the air quality prediction method has the following beneficial effects:
(1) The invention designs a space-time correlation cube based on correlation strength, which fully considers the space-time correlation among different air quality monitoring stations;
(2) The coupled singular spectrum analysis and random forest model designed by the invention strengthens prediction capability and effectively improves the accuracy and stability of the air quality prediction model while accounting for space-time correlation and spatial heterogeneity; the singular spectrum algorithm predicts the air quality of the future period from historical data, and the space-time correlation cube screens the air quality predictions of each station's neighboring stations, with the selected predictions used as auxiliary variables to improve the prediction accuracy for the station;
(3) The invention provides a practical and reliable scientific method for continuously predicting air quality over future periods at air quality monitoring stations at specific positions in a city.
Drawings
FIG. 1 is the overall flow chart of the urban air quality time series prediction method considering space-time correlation according to the invention;
FIG. 2 shows the multi-dimensional meteorological features obtained after data processing in the method;
FIG. 3 shows the application of the singular spectrum analysis model in the method;
FIG. 4 shows the space-time correlation cube designed in the method, which is used to select and add important neighborhood features;
FIG. 5 shows the prediction effect of the method at different time scales in the example experiment.
Detailed Description
The invention is further described below with reference to the accompanying drawings and examples, which are given by way of illustration only and are not to be construed as limiting the present patent.
The framework of the method is shown in FIG. 1: a space-time correlation cube is introduced to extract space-time information, and a coupled singular spectrum analysis and random forest model is designed to accurately fit air quality over future periods. The specific implementation is as follows:
Step 1: Construct the air quality feature data set. Taking one city as an example, PM2.5 monitoring data and meteorological feature data (including temperature, humidity, wind speed, pressure, wind direction and the like) recorded from January 1, 2017 to January 1, 2018 are acquired for the 35 urban air quality monitoring stations built in that city before 2018. The data are matched by station position and acquisition time, giving time series of the different features at the same coordinates and different timestamps;
Further, step 1 comprises the following steps:
Step 1.1: collect the historical records, comprising PM2.5 monitoring data and meteorological feature data (temperature, wind speed, pressure, humidity, wind direction and the like), of the 35 air quality monitoring stations from January 1, 2017 to January 1, 2018;
Step 1.2: match the acquired time series data by category, station position and acquisition time to form time series records of multiple feature variables at the different air quality stations.
Data matching here means organizing the data collected by the air quality monitoring stations by acquisition time, acquisition station and category (PM2.5, temperature, humidity, pressure, wind speed and the like), so that the processed data are the monitoring data of each station (stations 1 to 35) at each moment (each hour from January 1, 2017 to January 1, 2018).
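The matching in step 1.2 can be pictured as grouping raw readings into one row per station and timestamp. The sketch below assumes each raw reading is a `(station, timestamp, category, value)` tuple with ISO-formatted timestamps; this record layout and the name `match_records` are illustrative assumptions, not part of the patent.

```python
from collections import defaultdict

def match_records(raw_records):
    """Group (station, timestamp, category, value) readings into one row per station and time."""
    table = defaultdict(dict)
    for station, timestamp, category, value in raw_records:
        table[(station, timestamp)][category] = value
    # Sort by station, then time, so that each station forms a contiguous time series.
    # ISO-formatted timestamps sort chronologically as plain strings.
    return {key: table[key] for key in sorted(table)}
```

Rows that lack a category after matching (for example a missing `pm25` entry) are the gaps that the preprocessing in step 2 has to fill.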
Step 2: data cleaning and preprocessing, namely performing data preprocessing on related experimental data (PM 2.5 monitoring data and meteorological characteristic data), wherein the data preprocessing comprises the steps of outlier screening, removing, spatial interpolation, filling of missing values and the like, and finally forming historical record data of a plurality of air quality monitoring stations with complete time sequence;
and carrying out data preprocessing on the matched time sequence record data, wherein the acquired time sequence data has partial time loss, and the air quality at the time of the loss needs to be filled by the preprocessing.
Further, the step 2 includes the following steps:
step 2.1: because the positions of the urban air quality monitoring stations and the weather characteristic monitoring stations are not coincident in the urban space range of the embodiment, interpolation is needed to be carried out on the weather characteristics to accurately acquire the corresponding weather information of the positions of the air quality monitoring stations at the corresponding moment, the interpolation method is selected as IDW inverse distance weight interpolation, and the corresponding information can be acquired through interpolation;
step 2.2: for long-time-sequence meteorological feature monitoring data, record deletion is often caused by uncontrollable factors, and a time sequence prediction algorithm requires the integrity of time sequence records, so that missing value filling is necessary, unlike a mean value smoothing method adopted in the prior study, the method models the correlation between the meteorological features at the same moment and PM2.5 indexes by using a random forest algorithm, so that missing values of PM2.5 can be estimated according to the meteorological feature data at the missing moment, and the multivariate feature information obtained after interpolation and missing value filling is shown in fig. 2;
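The missing value filling of step 2.2 might look like the following sketch, which uses scikit-learn's `RandomForestRegressor`. The function name `fill_missing_pm25`, the use of NaN to mark gaps, and the hyperparameters are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fill_missing_pm25(weather, pm25, n_estimators=100, random_state=0):
    """Fill NaN entries of pm25 with a forest trained on the rows where pm25 is observed."""
    weather = np.asarray(weather, dtype=float)   # (n_moments, n_weather_features)
    pm25 = np.asarray(pm25, dtype=float).copy()  # (n_moments,), NaN marks a gap
    missing = np.isnan(pm25)
    if missing.any() and (~missing).any():
        model = RandomForestRegressor(n_estimators=n_estimators, random_state=random_state)
        # Learn the weather -> PM2.5 relationship at moments with complete records.
        model.fit(weather[~missing], pm25[~missing])
        # Estimate PM2.5 at the missing moments from their weather features alone.
        pm25[missing] = model.predict(weather[missing])
    return pm25
```

Because a forest prediction is an average of training targets, the filled values always stay within the range of the observed PM2.5 values, which is one practical difference from extrapolating smoothers.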
Step 3: Time series prediction, that is, predicting the air quality of a future period from historical data. The multidimensional time series formed by data preprocessing is input into a singular spectrum analysis model. Singular spectrum analysis is a time series prediction algorithm that can predict time series data progressively, and the model yields predicted values of the multidimensional features for the future period. In principle, a sliding window scans the historical data of the time series to construct a historical trajectory matrix; the trajectory matrix is decomposed and reconstructed to extract features that represent different components of the time series, such as the long-term trend signal, periodic signals and noise signals. By analyzing these important features, prediction can be reduced to a multiple regression in which the historical data features are the input X features and the future predicted values are the output Y variables.
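The regression framing at the end of step 3 (history as X, future value as Y) can be sketched with a sliding window. The helper name `make_supervised` and the `window` and `horizon` parameters are hypothetical; they stand in for the adjustable input and prediction windows described in the text.

```python
import numpy as np

def make_supervised(series, window, horizon=1):
    """Split a 1-D series into (X, y): `window` past values -> the value `horizon` steps ahead."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - window - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for this window and horizon")
    # Row i holds the `window` consecutive values starting at position i.
    X = np.stack([series[i:i + window] for i in range(n_samples)])
    # The target lies `horizon` steps after the last value in the window.
    start = window + horizon - 1
    y = series[start:start + n_samples]
    return X, y
```

For an hourly series, `make_supervised(series, window=72, horizon=12)` would pair each 72-hour history with the value 12 hours after it.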
Further, as shown in FIG. 3, the detailed steps of the singular spectrum analysis algorithm are as follows:
Step 3.1: assume there are k time series features to be predicted, each containing n time records, forming a feature matrix of size n × k;
Step 3.2: traverse the time series data with a time window of size l to form a number of feature vectors of length l, which make up a feature matrix Xlw; embed the matrices of every feature class and stack them into a feature cube of dimension l × w × k. Embedding here means stacking the feature matrices Xlw of the different feature classes into an l × w × k feature cube, where the k dimension indexes the different features such as temperature, humidity, wind speed and pressure.
Step 3.3: calculate the eigenvalues and eigenvectors of the feature matrix Xlw of each class of time series feature, that is, of each slice of the feature cube along the k dimension, and sort the eigenvalues from largest to smallest according to their importance. The purpose of sorting is to obtain the most important components of each feature class. This is the core of the singular spectrum analysis principle: more important eigenvalues receive larger weights and less important ones receive smaller weights, which suppresses interference and improves accuracy.
Step 3.4: reconstruct the trajectory matrix by superposing the eigenvectors corresponding to each order of eigenvalue for each class of time series feature, forming a new feature matrix.
Step 3.5: singular spectrum analysis thus decomposes the time series into components such as periodic, non-periodic and random factors, and the time series can be extrapolated further by combining and computing with these components.
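The decomposition and reconstruction in steps 3.2 to 3.4 can be sketched for a single feature as below. This is a basic univariate SSA, not the patent's implementation: it uses a singular value decomposition of the trajectory matrix (equivalent to the eigen-decomposition of the matrix times its transpose), it omits the forecasting step, and the name `ssa_decompose` is hypothetical.

```python
import numpy as np

def ssa_decompose(series, window):
    """Return the elementary reconstructed components, shape (min(window, w), n)."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    w = n - window + 1  # number of lagged vectors
    # Trajectory (Hankel) matrix of shape (window, w): column j is x[j : j + window].
    traj = np.column_stack([x[j:j + window] for j in range(w)])
    # Singular values come back sorted from largest to smallest.
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    components = np.empty((len(s), n))
    for r in range(len(s)):
        elementary = s[r] * np.outer(u[:, r], vt[r])
        # Diagonal averaging: entries with the same i + j belong to the same time step.
        flipped = elementary[::-1]
        components[r] = [flipped.diagonal(d).mean() for d in range(-window + 1, w)]
    return components
```

Summing all components recovers the original series, while summing only the leading ones keeps the trend and periodic parts and drops most of the noise. In the l × w × k notation above, `window` plays the role of l and one call handles one of the k features.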
Step 4: Use the space-time correlation cube constructed from historical data to extract the air quality predictions of neighboring stations as auxiliary data. The cube holds the correlation strength between stations at different moments, and it adaptively extracts the top K important neighboring stations most strongly correlated with the station to be predicted at the moment to be predicted;
Further, the specific steps for the space-time correlation cube are as follows:
Step 4.1: in the example experiment of the invention, as shown in FIG. 4-A, a space-time correlation cube of dimension 35 × 35 × 24 is constructed from historical data by measuring the correlation strengths among the 35 air quality stations at each of the 24 moments of the day (0:00 to 23:00). With this cube, the priority of the auxiliary stations to be introduced in actual prediction can be determined from the time and spatial position of the prediction, and the number of auxiliary features to introduce is determined adaptively from the prediction accuracy.
The correlation strength is calculated as follows:
$$K_s = \frac{\operatorname{cov}\left(S(i,t),\, S(j,t)\right)}{\sigma_{S(i,t)}\,\sigma_{S(j,t)}}$$
where K_s is the correlation strength between two stations at a given moment, cov(·) is the covariance between variables, S(i,t) and S(j,t) are the air quality records of stations i and j at moment t, and σ is the standard deviation of the corresponding variable;
Step 4.2: the space-time correlation cube measures the correlation strength between different stations at different moments by means of these correlation coefficients, which gives the correlation between all air quality monitoring stations at each of the 24 moments of the day, and the cube is built from these correlations;
Step 4.3: with the space-time correlation cube, the priority of the auxiliary stations to be introduced in actual prediction can be determined from the time and position of the prediction period. Taking 0:00 as an example, to predict the air quality index of a station at that moment in the future, the correlations between stations at that moment are first extracted from the cube, as shown in FIG. 4-B, where color indicates correlation strength: warmer colors indicate stronger correlation and cooler colors indicate weaker correlation. The magnitude of the correlation gives the order in which the other stations influence the station to be predicted at that moment, and the neighboring station features are added in this order.
Step 4.4: unlike similar studies, the method does not add all stations as neighborhood features but uses an adaptive mechanism for selecting the Top-K important neighborhood features. Each time the station correlations at the corresponding moment are extracted from the space-time correlation cube, the neighboring stations are sorted by correlation strength and added one by one, in order of importance, to a random forest algorithm for training, and the optimal feature number K is selected according to model accuracy. For example, as shown in FIG. 4-C, if K is selected as 5 according to the accuracy curve, the 5 neighboring stations most strongly correlated with each station are retained and added as auxiliary features.
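The adaptive Top-K selection of step 4.4 could be sketched as a loop that adds ranked neighbors one at a time and keeps the K with the best R². The chronological 70/30 split, the forest size and the name `select_top_k` are assumptions; the patent only states that K is chosen by model accuracy.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

def select_top_k(own_features, neighbor_features, ranked_neighbors, target,
                 max_k=None, train_fraction=0.7, n_estimators=50, random_state=0):
    """Add neighbors in ranked order and return (best_k, {k: validation R^2})."""
    own = np.asarray(own_features, dtype=float)         # (n_samples, n_own_features)
    neigh = np.asarray(neighbor_features, dtype=float)  # (n_samples, n_stations)
    y = np.asarray(target, dtype=float)
    ranked = list(ranked_neighbors)                     # most correlated station first
    max_k = len(ranked) if max_k is None else min(max_k, len(ranked))
    split = int(len(y) * train_fraction)                # earlier samples train, later validate
    scores = {}
    for k in range(1, max_k + 1):
        X = np.hstack([own, neigh[:, ranked[:k]]])
        model = RandomForestRegressor(n_estimators=n_estimators, random_state=random_state)
        model.fit(X[:split], y[:split])
        scores[k] = r2_score(y[split:], model.predict(X[split:]))
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

The returned `scores` dictionary corresponds to the kind of accuracy curve described for FIG. 4-C, from which a value such as K = 5 would be read off.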
Step 5: Couple the auxiliary station data extracted through the space-time correlation cube with the time series prediction data obtained from the singular spectrum analysis model, and input them into a random forest algorithm for training to build a correlation model between the multidimensional features and air quality, completing the final air quality fit for the station at the moment to be predicted. Coupling means combining two otherwise independent models so that together they improve prediction accuracy. The space-time correlation cube helps the station to be predicted select auxiliary station data according to the correlation strength between stations (measured by correlation coefficients between the historical records of different stations).
For example, to predict the air quality of station A01 at 12:00 on January 5, 2020, the meteorological data are first input into the singular spectrum analysis model to predict the meteorological data at the future moment. Not all of the data are input at once: a progressive input time window is used, for example the preceding 72 hours each time, and the window size is an adjustable parameter, as is the prediction time window. The space-time correlation cube is then used to extract, as auxiliary features, the prediction data of the top K stations most strongly correlated with station A01 at that moment. The selection is based on the correlation among the historical data at that moment (12:00), and K is determined by accuracy; for example, if accuracy is highest when the top 5 important stations are introduced, K is kept at 5. After the top K neighboring stations are selected, their data are stacked with the prediction data obtained for the station by singular spectrum analysis to form the input feature set, which is input into the random forest model, and the random forest model produces the final prediction.
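The stacking of station predictions described in this example can be sketched as follows. The array layout `(n_samples, n_stations, n_features)` for the singular spectrum analysis outputs and the name `build_feature_set` are assumptions for illustration; the resulting matrix is what would be passed to the random forest.

```python
import numpy as np

def build_feature_set(station_predictions, cube, target, hour, k):
    """Stack the target station's predicted features with those of its top-k neighbors."""
    preds = np.asarray(station_predictions, dtype=float)  # (n_samples, n_stations, n_features)
    cube = np.asarray(cube, dtype=float)                  # (n_stations, n_stations, n_hours)
    # Rank the other stations by correlation with the target at the given hour.
    order = np.argsort(-cube[target, :, hour], kind="stable")
    neighbors = order[order != target][:k]
    own = preds[:, target, :]
    aux = preds[:, neighbors, :].reshape(len(preds), -1)  # flatten neighbor features per sample
    return np.hstack([own, aux]), neighbors
```

For the 12:00 example, the call would use `hour=12` and the index of station A01 as `target`, with `k` taken from the accuracy-based selection.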
In the example experiment, to demonstrate the multi-scale prediction capability of the model, prediction windows of 1 to 12 hours were used progressively. FIG. 5 shows the prediction effect at different time scales, where red is the predicted value and black is the observed value.
To further demonstrate the effect of the coupled model, several different models were selected in this embodiment for comparison on the same data set. The comparison models include the autoregressive integrated moving average model (ARIMA), ordinary singular spectrum analysis (SSA), the long short-term memory network (LSTM) and a one-dimensional convolutional neural network (CNN). The comparison results are shown in Table 1.
TABLE 1
The results show that the proposed model (STSR) performs well at different prediction time scales, and its fitting accuracy is higher than that of the other algorithms for the same period. Notably, at longer prediction horizons the accuracy of the other models drops sharply while the accuracy of the proposed model remains stable, which indicates good reliability and generalization.
In summary, the method introduces singular spectrum analysis to perform time series prediction on PM2.5 monitoring data and meteorological feature data, designs a space-time correlation cube to adaptively select the features of the top K important spatially neighboring stations, stacks the time series prediction results with the top K neighboring features to build a sample feature set, and finally uses a random forest algorithm to fit the final results at different time scales. The coupled model combines singular spectrum analysis, the space-time correlation cube and the random forest model to carry out the prediction, so it effectively accounts for the space-time correlation between stations at different locations, improves the accuracy and stability of single-station time series prediction in the urban environment at different time scales, and provides a reference for urban atmospheric management decisions. The prediction output window is adjustable; in the experiment described here, predictions were made for 1 to 12 hours ahead.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (6)

1. A time sequence prediction method for urban air quality taking time-space correlation into consideration is characterized by comprising the following steps:
S1) acquiring historical time period record data of air quality monitoring stations set up in a city space range, and performing data matching on the acquired historical time period record data to form time sequence record data of multi-characteristic variables at different air quality stations;
S2) carrying out data preprocessing on the time sequence record data acquired in the step S1), and finally forming historical record data of a plurality of air quality monitoring stations with complete time sequence;
S3) inputting meteorological data of historical moments of the station to be predicted into a singular spectrum analysis model to obtain prediction data for the station to be predicted, wherein the specific steps are as follows:
assuming there are k classes of time sequence features to be predicted, each class containing n time records, forming a feature matrix of size n × k;
traversing the time sequence data with a time window of size l to form a plurality of feature vectors of length l, which make up a feature matrix X_{l×w}; embedding the matrices of each class of feature, that is, stacking the feature matrices X_{l×w} of the different feature classes to form a feature cube of dimension l × w × k, where the k dimension indexes the different feature classes;
calculating the eigenvalues and eigenvectors of the feature matrix X_{l×w} of each class of time sequence feature, namely for each slice of the feature cube along the k dimension, and sorting the eigenvalues from large to small according to their importance;
reconstructing the trajectory matrix: superposing, for each class of time sequence feature, the eigenvectors corresponding to the eigenvalues in the same rank position, and reconstructing the trajectory matrix to form a new feature matrix;
decomposing the time sequence, through singular spectrum analysis, into features representing its different components, and then predicting the important features to extrapolate the time sequence process;
the method for constructing the space-time correlation cube according to the historical data comprises the following specific steps:
firstly, constructing correlation matrixes among different sites at different moments in a set time period by utilizing historical data, wherein each value of the matrixes represents correlation strength among the sites;
the calculation formula of the correlation strength is as follows:
$$K_s = \frac{\operatorname{cov}\big(S(i,t),\, S(j,t)\big)}{\sigma_{S(i,t)}\,\sigma_{S(j,t)}}$$
where K_s represents the correlation strength between different stations at a certain moment, cov() represents the covariance between variables, S(i, t) and S(j, t) represent the air quality records corresponding to stations i and j at moment t, and σ represents the standard deviation of the corresponding variable;
the space-time correlation cube measures the correlation strength between different space stations at different moments through the autocorrelation coefficients, so that the correlation between all air quality monitoring stations at all moments in a set time period is obtained, and the space-time correlation cube is constructed according to the correlation;
the method for extracting the predicted data of the first K neighborhood stations with the strongest correlation with the stations to be predicted at the moment to be predicted by using the constructed space-time correlation cube as auxiliary station data comprises the following steps:
in actual prediction, the space-time correlation cube determines the priority of the auxiliary stations to be introduced according to the time and location of the prediction: when the air quality index of a station at a future moment is to be predicted, the correlations between stations corresponding to that moment are extracted from the space-time correlation cube, the neighborhood stations are sorted by correlation strength, and the features of the top K most important spatial neighborhood stations are selected;
S4) coupling the auxiliary station data extracted through the space-time correlation cube with the prediction data for the station to be predicted obtained from the singular spectrum analysis model to form an input feature set, and feeding the input feature set into a random forest model, which outputs the final prediction result of the station to be predicted at the moment to be predicted;
for the top K important spatial neighborhood station features, the features are added one by one in order of importance and fed into the random forest algorithm for training; the optimal number of features K is selected according to the model accuracy.
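The construction of the space-time correlation cube and the selection of the top K neighborhood stations described in claim 1 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the correlation at moment t is computed over a trailing window of readings (the claim does not fix the sample used), stations are ranked by the absolute value of the correlation, and `score_fn` is a hypothetical callable standing in for training and validating the random forest on a candidate set of neighbors.

```python
from math import sqrt

def pearson(a, b):
    """cov(a, b) / (std(a) * std(b)); returns 0.0 if either series is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def correlation_cube(records, window):
    """records[s][t] is the reading of station s at moment t.
    cube[t][i][j] is the correlation of stations i and j over the `window`
    readings ending at t; cube[t] is None while the window is incomplete."""
    n_sites, n_times = len(records), len(records[0])
    cube = []
    for t in range(n_times):
        if t < window - 1:
            cube.append(None)
            continue
        lo = t - window + 1
        cube.append([[pearson(records[i][lo:t + 1], records[j][lo:t + 1])
                      for j in range(n_sites)] for i in range(n_sites)])
    return cube

def top_k_neighbours(cube, t, target, k):
    """Indices of the k stations most strongly correlated with `target` at moment t."""
    row = cube[t][target]
    others = [j for j in range(len(row)) if j != target]
    return sorted(others, key=lambda j: abs(row[j]), reverse=True)[:k]

def choose_k(ranked, score_fn):
    """Add ranked neighbours one at a time and keep the best-scoring prefix length."""
    best_k, best_score = 0, score_fn([])
    for k in range(1, len(ranked) + 1):
        score = score_fn(ranked[:k])
        if score > best_score:
            best_k, best_score = k, score
    return best_k

# Toy readings for four stations (illustrative numbers, not measured data).
records = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10.5], [5, 3, 4, 1, 2], [1, 1, 2, 1, 3]]
cube = correlation_cube(records, window=5)
ranked = top_k_neighbours(cube, t=4, target=0, k=3)
```

In practice `score_fn` would return a validation accuracy of the random forest trained with the given neighbors added to the feature set, so `choose_k` mirrors the one-by-one feature addition in the claim.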
2. The urban air quality time sequence prediction method considering space-time correlation according to claim 1, wherein: the data matching in the step S1) refers to matching the data acquired by the air quality monitoring stations according to the acquisition time, the acquisition stations and the category to which the data belong, and the data are normalized data, and the acquired data are weather monitoring data of each station at different moments.
3. The urban air quality time sequence prediction method considering space-time correlation according to claim 1, wherein: the collected historical period record data of the air quality monitoring station established in the urban space range comprises PM2.5 monitoring data and meteorological characteristic data.
4. The urban air quality time sequence prediction method considering space-time correlation according to claim 1, wherein: the data preprocessing comprises outlier screening and elimination, interpolation, and missing value filling.
5. The urban air quality time sequence prediction method considering space-time correlation according to claim 4, wherein: the interpolation method selects IDW inverse distance weight interpolation, and obtains corresponding weather information of the air quality monitoring site position at the corresponding moment through interpolation.
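The inverse distance weighting (IDW) interpolation named in claim 5 can be sketched as below. The sketch assumes planar (projected) coordinates and a distance exponent of 2, neither of which is specified in the claim; the function name `idw_interpolate` and the sample values are illustrative.

```python
from math import hypot

def idw_interpolate(stations, target, power=2.0):
    """Estimate a value at `target` from (coordinate, value) pairs.
    stations: list of ((x, y), value); target: (x, y) in the same planar units."""
    num = den = 0.0
    for (x, y), value in stations:
        d = hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # the target coincides with an observation point
        weight = 1.0 / d ** power  # closer observations weigh more
        num += weight * value
        den += weight
    return num / den

# Two hypothetical weather stations and an air quality site between them.
weather = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
midpoint_value = idw_interpolate(weather, (1.0, 0.0))
```

Here each air quality monitoring site would be passed as `target` for every moment, with the weather stations' readings at that moment as `stations`.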
6. The urban air quality time sequence prediction method considering space-time correlation according to claim 4, wherein: the missing value filling models the correlation between the meteorological features and the PM2.5 index at the same moment by using a random forest algorithm, and missing meteorological features and PM2.5 values are inferred from each other according to this correlation model.
CN202010114790.1A 2020-02-25 2020-02-25 Urban air quality time sequence prediction method considering time-space correlation Active CN111340288B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010114790.1A CN111340288B (en) 2020-02-25 2020-02-25 Urban air quality time sequence prediction method considering time-space correlation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010114790.1A CN111340288B (en) 2020-02-25 2020-02-25 Urban air quality time sequence prediction method considering time-space correlation

Publications (2)

Publication Number Publication Date
CN111340288A CN111340288A (en) 2020-06-26
CN111340288B true CN111340288B (en) 2024-04-05

Family

ID=71187079

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010114790.1A Active CN111340288B (en) 2020-02-25 2020-02-25 Urban air quality time sequence prediction method considering time-space correlation

Country Status (1)

Country Link
CN (1) CN111340288B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183625A (en) * 2020-09-28 2021-01-05 武汉大学 High-precision PM2.5 time-space prediction method based on deep learning
CN112801423B (en) * 2021-03-29 2021-07-20 北京英视睿达科技有限公司 Method and device for identifying abnormity of air quality monitoring data and storage medium
CN113077357B (en) * 2021-03-29 2023-11-28 国网湖南省电力有限公司 Power time sequence data anomaly detection method and filling method thereof
CN113077097B (en) * 2021-04-14 2023-08-25 江南大学 Air quality prediction method based on depth space-time similarity
US11512864B2 (en) * 2021-04-14 2022-11-29 Jiangnan University Deep spatial-temporal similarity method for air quality prediction
CN113610286B (en) * 2021-07-27 2024-03-29 中国地质大学(武汉) PM2.5 concentration prediction method and device taking into account space-time correlation and meteorological factors
CN113610243B (en) * 2021-08-12 2023-10-13 中节能天融科技有限公司 Atmospheric pollutant tracing method based on coupled machine learning and correlation analysis
CN117332906B (en) * 2023-12-01 2024-03-15 山东大学 Machine learning-based three-dimensional space-time grid air quality prediction method and system
CN117540193A (en) * 2024-01-10 2024-02-09 飞特质科(北京)计量检测技术有限公司 Extraction method of characteristic data of fan conduction interference test curve

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106407633A (en) * 2015-07-30 2017-02-15 中国科学院遥感与数字地球研究所 Method and system for estimating ground PM2.5 based on space-time regression Kriging model
CN106447072A (en) * 2016-08-01 2017-02-22 中国卫星海上测控部 Explicit genetic algorithm and singular spectrum analysis-based meteorological and hydrological element forecast method
CN106971547A (en) * 2017-05-18 2017-07-21 福州大学 A kind of Short-time Traffic Flow Forecasting Methods for considering temporal correlation
CN107133398A (en) * 2017-04-28 2017-09-05 河海大学 A kind of river ethic Forecasting Methodology based on complex network
CN107423861A (en) * 2017-08-09 2017-12-01 北京工业大学 Air Quality Forecast method based on iterative learning
CN107563565A (en) * 2017-09-14 2018-01-09 广西大学 A kind of short-term photovoltaic for considering Meteorology Factor Change decomposes Forecasting Methodology
CN108053071A (en) * 2017-12-21 2018-05-18 宇星科技发展(深圳)有限公司 Regional air pollutant concentration Forecasting Methodology, terminal and readable storage medium storing program for executing
CN108701274A (en) * 2017-05-24 2018-10-23 北京质享科技有限公司 A kind of small scale air quality index prediction technique in city and system
CN109492822A (en) * 2018-11-24 2019-03-19 上海师范大学 Air pollutant concentration time-space domain interaction prediction method
CN109902863A (en) * 2019-02-15 2019-06-18 浙江财经大学 A kind of wind speed forecasting method and device based on multifactor temporal correlation
CN110210681A (en) * 2019-06-11 2019-09-06 西安电子科技大学 A kind of prediction technique of the monitoring station PM2.5 value based on distance
CN110598953A (en) * 2019-09-23 2019-12-20 哈尔滨工程大学 Space-time correlation air quality prediction method
CN110610258A (en) * 2019-08-20 2019-12-24 中国地质大学(武汉) Urban air quality refined estimation method and device fusing multi-source space-time data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105493109B (en) * 2013-06-05 2018-01-30 微软技术许可有限责任公司 Inferred using the air quality of multiple data sources

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Prediction of vertical PM2.5 concentrations alongside an elevated expressway by using the neural network hybrid model and generalized additive model; Gao, Y et al.; 《FRONTIERS OF EARTH SCIENCE》; 20170623; Vol. 11 (No. 2); full text *
BP neural network PM2.5 prediction model based on the Pearson correlation index; 张怡文; 敖希琴; 时培俊; 郭傲东; 费久龙; 陈家丽; Journal of Qingdao University (Natural Science Edition); 20170515 (No. 02); full text *

Also Published As

Publication number Publication date
CN111340288A (en) 2020-06-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant