WO2018214060A1

WO2018214060A1 - Small-scale air quality index prediction method and system for city

Info

Publication number: WO2018214060A1
Application number: PCT/CN2017/085715
Authority: WO
Inventors: 王绍鑫; 陈矿; 吴建东; 曹袭亚; 林爱德华·罗伯特
Original assignee: 北京质享科技有限公司
Priority date: 2017-05-24
Filing date: 2017-05-24
Publication date: 2018-11-29
Also published as: CN108701274B; CN108701274A

Abstract

The present invention discloses a small-scale air quality index prediction method and system for a city. The method comprises: firstly dividing a city area into a grid of multiple locations to undergo prediction; acquiring historical data related to each model and creating the following models on the basis of the historical data: a time prediction model respectively corresponding to a prediction relating to a current moment and predictions relating to each moment in a future period of time, a spatial prediction model for performing air quality predictions with respect to locations at specified coordinates, a dynamic prediction model for characterizing a relationship of traffic data and data of a geographical location of interest to an air quality index, and an indoor and outdoor prediction model for characterizing a relationship of an indoor air quality index to an outdoor air quality index; when performing prediction, performing coordinated training on the created time prediction model, the spatial prediction model, the dynamic prediction model and the indoor and outdoor prediction model with respect to any one of the locations to undergo prediction at any real-time moment, so as to merge prediction results of all of the models and obtain air quality index predicted values of each of the locations to undergo prediction at a corresponding current moment, and at each moment in the future period of time.

Description

Method and system for predicting urban small-scale air quality index

Technical field

The invention relates to the technical field of air quality index prediction, and in particular to an urban small-scale air quality index prediction method and system based on machine learning algorithms.

Background technique

With the advance of urbanization and industrialization, environmental pollution has become increasingly serious. In recent years, widespread and severe air pollution has directly threatened people's health and hindered sustainable social and economic development. At present, most regions provide only city-level air quality index forecasts, not forecasts for specific geographical locations within a city. For urban residents, accurate and well-founded air quality predictions help them plan work and daily life, adjust how they travel, and take appropriate protective measures, reducing the harm air pollutants do to the body and improving the overall health of society.

AQI (air quality index) is a dimensionless indicator that quantitatively describes air quality and is currently the most widely used indicator for measuring air quality. According to the national standard HJ633-2012, AQI is calculated from the concentrations of several pollutants through a functional relationship. These pollutants include sulfur dioxide (SO₂), nitrogen dioxide (NO₂), nitrogen monoxide (NO), carbon monoxide (CO), ozone (O₃), and the suspended particulate matter PM2.5 and PM10. The value of AQI ranges from 0 to 500. A larger value indicates more serious air pollution.
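The functional relationship mentioned above is, in HJ 633-2012, a piecewise-linear interpolation per pollutant followed by taking the maximum sub-index. The following is a minimal sketch of that idea for one pollutant only. The function names are illustrative, and the 24-hour PM2.5 breakpoints listed are an assumption that should be checked against the standard before use.

```python
# Sketch of a piecewise-linear sub-index (IAQI) calculation.
# ASSUMPTION: 24-hour PM2.5 breakpoints; verify against HJ 633-2012.
PM25_BREAKPOINTS = [0, 35, 75, 115, 150, 250, 350, 500]   # ug/m^3
IAQI_LEVELS = [0, 50, 100, 150, 200, 300, 400, 500]


def iaqi(concentration, breakpoints=PM25_BREAKPOINTS, levels=IAQI_LEVELS):
    """Linearly interpolate the sub-index between the enclosing breakpoints."""
    if concentration < breakpoints[0]:
        raise ValueError("concentration below the lowest breakpoint")
    if concentration >= breakpoints[-1]:
        return float(levels[-1])  # saturate at the top of the scale
    for i in range(len(breakpoints) - 1):
        lo, hi = breakpoints[i], breakpoints[i + 1]
        if lo <= concentration < hi:
            slope = (levels[i + 1] - levels[i]) / (hi - lo)
            return slope * (concentration - lo) + levels[i]


def aqi(sub_indices):
    """The overall AQI is the largest of the pollutant sub-indices."""
    return max(sub_indices)
```

For example, a PM2.5 concentration of 55 lies between the breakpoints 35 and 75, so the sub-index is interpolated between 50 and 100.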

The current methods for AQI prediction are as follows:

1. Atmospheric dispersion modeling: This approach simulates the transport, diffusion and migration of atmospheric pollutants, and predicts the temporal and spatial distribution of a pollutant's concentration under different pollution source conditions, meteorological conditions and underlying surface conditions. The model is a simplified mathematical description of the migration and diffusion of pollutants in the lower atmosphere, and its form varies with the modeling theory, the migration and diffusion processes, and the objects being described. A typical representative is SPRINTARS (Spectral Radiation-Transport Model for Aerosol Species), developed by Kyushu University, Japan. It is a global-scale numerical model that simulates the effects of atmospheric suspended particulate matter on the climate system and on air pollution, and it studies the aerosols present in the troposphere on the basis of the coupled atmosphere-ocean model MIROC. Such methods are scientifically grounded but have the following disadvantages. First, they consider the diffusion of pollutants mainly in terms of macroscopic atmospheric circulation, so it is difficult to resolve the specific climate conditions of key areas such as cities. The climatic conditions of a given area change with the season, the time of day, and even human factors; for example, the emission and accumulation of pollutants differ significantly before and after a new chemical plant is built in an area. It is therefore difficult to make accurate predictions for specific areas. Second, these methods require a large amount of data collection and computation: at a minimum, extensive pollution source information and satellite meteorological information must be collected, and high-performance hardware must be provided for data processing. They are costly and highly specialized, and are not suitable for ordinary users.

2. Statistical models based on historical data, such as linear regression, artificial neural networks. Such methods typically only use local data collected near the air quality monitoring base station to predict local air quality indices near the base station. The disadvantage of this type of method is that only the vicinity of the base station is considered, and the prediction model cannot be established for a location without a base station. On the other hand, since only geographical local information is considered, the spatial diffusion process of pollutants is rarely considered. Therefore, not only the prediction models in different locations may have huge differences, but also the accuracy of predictions is difficult to maintain at a high level.

Summary of the invention

The technical problem to be solved by the present invention is to use a cooperative training algorithm that combines several prediction methods to predict the air quality index at every geographical point in a city, not only at air quality monitoring base stations, and to improve prediction accuracy while keeping computational complexity low.

The technical solution adopted by the invention is: a city small-scale air quality index prediction method, comprising:

S1, dividing the urban area into a grid-shaped area to be predicted, in which the grid intersection points are the locations where the air quality index is to be predicted;

S2, obtaining historical monitoring data of each air quality monitoring base station, and establishing a historical database;

S3, based on the time-series monitoring data of each base station in the historical database, establishing time prediction models corresponding to the prediction for the current time and to the prediction for each time in a future period;

S4, based on the monitoring data of all air quality monitoring base stations at the same time in the historical database, using a two-dimensional linear interpolation method to establish a spatial prediction model for air quality prediction at specified coordinates;

S5, obtaining traffic data and geographic point-of-interest data for each to-be-predicted location and each air quality monitoring base station, together with the air quality index data of each to-be-predicted location and each air quality monitoring base station at the corresponding times;

based on the acquired data, establishing a dynamic prediction model that characterizes the relationship of traffic data and geographic point-of-interest data to the air quality index;

S6, obtaining indoor air quality index shared by the user, user living environment data, and air quality index data of the corresponding location, and establishing an indoor and outdoor prediction model for characterizing the relationship between the indoor air quality index and the outdoor air quality index;

S7, for any to-be-predicted location at any real-time moment at which the air quality index is to be predicted, cooperatively training the established time prediction model, spatial prediction model, dynamic prediction model and indoor and outdoor prediction model so as to fuse the prediction results of all models and obtain the predicted air quality index of each to-be-predicted location at the current moment and at each moment in the future period.

In the present invention, the time prediction model corresponding to the current-time prediction characterizes the relationship between the historical monitoring data and the current monitoring data. The time prediction models corresponding to future-time prediction characterize the relationship between the historical and current monitoring data and the monitoring data at each future moment; depending on the specified span of the future period, there are multiple such models, one for each moment.

The spatial prediction model characterizes the relationship between the real-time monitoring data of each known location or base station and the air quality index data of the to-be-predicted point whose real-time monitoring data is unknown.

The invention realizes the air quality prediction of the location outside the base station through the establishment of each model and the fusion of the prediction results of each model at the time of prediction, and the prediction result integrates the influence of various related factors, and the accuracy is higher.

Further, the present invention also includes:

S8, real-time assessment of the accuracy of air quality index prediction results, including:

S81, using the K-fold cross-validation algorithm to evaluate the accuracy of the current time prediction result:

S811, assuming that the number of base stations is N, dividing all base stations into K parts, numbered 1, 2, ..., k, k+1, ..., K, each containing c=N/K base stations;

S812, removing the k-th part from the K parts and keeping the base stations of the remaining K-1 parts as known data;

S813, based on the data of the known K-1 parts, obtaining the predicted AQI at the current time of each base station in the removed k-th part, recorded as

S814, obtaining the measured AQI values y₁, y₂, ..., y_c of each base station in the k-th part; the accuracy of the current-time prediction is described by the following indicator η_k:

S815, traversing k from 1 to K, repeating steps S812 to S814 for each k, and then obtaining the accuracy index η of the prediction system at the current time as:

The closer η is to 1, the higher the accuracy of the current time prediction of the system;

S82, assessing the accuracy of the predicted results for a period of time:

Assume that the predicted values for all base stations at a specified future time are

and that the actual measured values of the base stations at that time are z₁, z₂, ..., z_N; the prediction accuracy of the prediction system for future moments is then:

The closer ψ is to 1, the higher the accuracy of the system's prediction of future moments.
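The K-fold procedure of steps S811 to S815 can be sketched as follows. The formulas for η_k and η are not reproduced in the text above, so the metric used here (one minus the mean relative error, averaged over the folds) is purely an assumption chosen because it approaches 1 as predictions improve. The function names and the stand-in predictor `mean_of_known` are likewise hypothetical.

```python
def relative_accuracy(predicted, measured):
    """ASSUMED metric: 1 - mean relative error (measured values must be nonzero)."""
    errors = [abs(p - m) / m for p, m in zip(predicted, measured)]
    return 1.0 - sum(errors) / len(errors)


def k_fold_accuracy(stations, k_folds, predict):
    """stations: list of (coords, aqi); predict(known, coords) -> predicted AQI."""
    fold_size = len(stations) // k_folds          # c = N / K
    scores = []
    for k in range(k_folds):
        # S812: hold out the k-th part, keep the other K-1 parts as known data.
        held_out = stations[k * fold_size:(k + 1) * fold_size]
        known = stations[:k * fold_size] + stations[(k + 1) * fold_size:]
        # S813: predict the held-out stations from the known ones.
        predicted = [predict(known, coords) for coords, _ in held_out]
        # S814: compare with the measured values to get the fold score.
        measured = [aqi for _, aqi in held_out]
        scores.append(relative_accuracy(predicted, measured))
    # S815: combine the fold scores (a plain average is assumed here).
    return sum(scores) / k_folds


def mean_of_known(known, coords):
    """Hypothetical stand-in predictor: average AQI of the known stations."""
    return sum(aqi for _, aqi in known) / len(known)
```

If N is not divisible by K, the leftover stations in this sketch are never held out and always count as known data.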

Preferably, the monitoring data collected by the air quality monitoring base stations in the present invention include date and time, base station name, base station latitude and longitude, AQI data, temperature, air pressure, wind, humidity, and weather type data. Where historical data are missing, they can be filled in by local time-series interpolation.

Preferably, in step S3, a multivariate linear regression method is used to establish a temporal prediction model corresponding to the current time prediction and the future time prediction.

The step of establishing a temporal prediction model based on the historical database in step S3 includes the steps of:

S31, specifying the length l₁ of the historical sequence and the prediction period, that is, the length l₂ of the future sequence; with the data at the current time denoted x_n, the historical sequence is

The future sequence is

Extracting all consecutive sequences of l₁+1+l₂ hours from the historical database to form an initial training data set S₁;

S32, establishing l₂+1 multiple linear regression models, corresponding respectively to the prediction for the current time and for each moment in the next l₂ hours, each expressed as:

Y₁ = β₀ + β₁X₁ + β₂X₂ + ... + β_pX_p

where β_i are the regression coefficients, X_i are the model input data, and Y₁ is the air quality index at the time to be predicted;

For the current-time prediction, the model input data are the l₁ hours of historical AQI data and the temperature, air pressure, wind, humidity and weather type data for the current hour;

For the prediction of each moment in the next l₂ hours, the model input data are the current-time AQI data, the l₁ hours of historical AQI data, and the current temperature, air pressure, wind, humidity and weather type data.

Training each multiple linear regression model on the sequences in the initial training data set S₁ yields the regression coefficients of each model, and thus the initial multiple linear regression models, that is, the initial time prediction models.
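Fitting the l₂+1 regression models of step S32 can be sketched with an ordinary least-squares solver. The source does not say how the coefficients are estimated, so least squares via NumPy is an assumption, and the function names and the `datasets` layout (one pair of inputs and targets per prediction horizon) are illustrative.

```python
import numpy as np


def fit_linear_model(features, targets):
    """Least-squares fit of Y = b0 + b1*X1 + ... + bp*Xp; returns [b0, ..., bp]."""
    features = np.asarray(features, dtype=float)
    # Prepend a column of ones so that the first coefficient is the intercept b0.
    design = np.column_stack([np.ones(len(features)), features])
    coefficients, *_ = np.linalg.lstsq(design, np.asarray(targets, float), rcond=None)
    return coefficients


def predict_linear(coefficients, feature_row):
    """Evaluate the fitted regression for one row of input data."""
    return float(coefficients[0] + np.dot(coefficients[1:], feature_row))


def fit_time_models(datasets):
    """datasets[h] = (inputs, targets) for horizon h = 0..l2; returns l2+1 models.

    Horizon 0 is the current-time model, whose inputs exclude the current AQI.
    """
    return [fit_linear_model(inputs, targets) for inputs, targets in datasets]
```

Each horizon gets its own coefficient vector, so a six-hour prediction period yields seven independent models.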

The present invention may encode the weather type data numerically, for example 0 for sunny days, 1 for cloudy or overcast days, 2 for rainy days, and so on. Other existing data processing and representation methods can also be employed.

Preferably, in step S4 of the present invention, a two-dimensional linear interpolation method is used to establish a spatial prediction model for air quality prediction at a specified coordinate, including:

S41, acquiring the real-time monitoring data of all locations with a known air quality index at the same time, together with the latitude and longitude of those locations, to form the training data set S₂ of the spatial prediction model;

S42, letting the coordinates of the location to be predicted be (x, y), the spatial prediction model for air quality index prediction at that location is expressed as:

Y₂ = griddata(x, y, S₂)

The input of the model is S₂, the output of the model is the air quality index of the location to be predicted, and griddata() denotes the two-dimensional interpolation function.

The initial training data set of the spatial prediction model contains only the air quality index at the base station.
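The spatial prediction model can be sketched with SciPy's `scipy.interpolate.griddata`, used here as a stand-in for the griddata() function named above. The source does not identify which library its griddata() comes from, so this mapping is an assumption, as are the wrapper name `spatial_predict` and the row layout of the training set.

```python
import numpy as np
from scipy.interpolate import griddata


def spatial_predict(x, y, training_set):
    """training_set: rows of (x_i, y_i, aqi_i) at locations with known AQI."""
    data = np.asarray(training_set, dtype=float)
    points, values = data[:, :2], data[:, 2]
    # Two-dimensional linear interpolation over a triangulation of the points.
    result = griddata(points, values, np.array([[x, y]]), method="linear")
    return float(result[0])   # NaN if (x, y) lies outside the convex hull
```

Linear interpolation is only defined inside the convex hull of the known locations, so grid points outside the outermost base stations would need a separate rule.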

Preferably, in step S5, the traffic data include the lengths of the free-flowing, slow, and congested road segments around each to-be-predicted location and each air quality monitoring base station.

The geographic point-of-interest data include the distribution of geographic object entities within a set radius around each to-be-predicted location and each air quality monitoring base station; the geographic object entity types include schools, banks, restaurants, and gas stations. Other geographic object entities may also be included and are not listed exhaustively here.

Preferably, step S5 of the present invention establishes a dynamic prediction model by using a multiple linear regression method, including the steps of:

S51, for each time point in the historical database, obtaining the traffic data and geographic point-of-interest data within a given radius of each base station, together with the air quality index monitoring data of that base station at that time, to establish an initial training set S₃; the traffic data include the proportions of the total road length taken up by free-flowing, slow, and congested segments, defined as T₁, T₂, T₃, and the geographic point-of-interest data include the numbers of points of interest of each type within the given radius of the base station, defined as T₄, T₅, ..., T_q;

S52, establishing a dynamic prediction model, expressed as:

Y₃ = α₀ + α₁T₁ + α₂T₂ + α₃T₃ + α₄T₄ + ... + α_qT_q

where α_i are the regression coefficients, the model inputs are the traffic data and geographic point-of-interest data within the given radius of the to-be-predicted location at the specified time, and the model output Y₃ is the air quality index of the point to be predicted.

The initial training data S₃ of the dynamic prediction model contain only the data at the base stations in the historical database. The regression coefficients of the dynamic prediction model are obtained by training on the training set before each prediction, which gives the corresponding dynamic prediction model; that model is then used to obtain the air quality index at the current and future moments. Traffic data for future times can already be predicted by the prior art, so when the present invention performs dynamic prediction for a future time, the input data can directly use traffic predictions produced by the prior art.
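A minimal sketch of the dynamic prediction model follows: road lengths are converted to the proportions T₁ to T₃, point-of-interest counts supply T₄ to T_q, and the coefficients are estimated by least squares. The estimation method, the function names, and the handling of locations with no roads in range are assumptions made for illustration.

```python
import numpy as np


def dynamic_features(free_km, slow_km, congested_km, poi_counts):
    """Build [T1, T2, T3, T4, ..., Tq] from road lengths and POI counts."""
    total = free_km + slow_km + congested_km
    if total == 0:
        ratios = [0.0, 0.0, 0.0]       # no roads inside the radius
    else:
        ratios = [free_km / total, slow_km / total, congested_km / total]
    return ratios + [float(count) for count in poi_counts]


def fit_dynamic_model(feature_rows, aqi_values):
    """Least-squares estimate of [a0, a1, ..., aq]."""
    rows = np.asarray(feature_rows, dtype=float)
    design = np.column_stack([np.ones(len(rows)), rows])
    # T1 + T2 + T3 = 1 makes the ratios collinear with the intercept; lstsq
    # then returns the minimum-norm solution, whose predictions remain valid
    # for inputs whose ratios also sum to 1.
    alpha, *_ = np.linalg.lstsq(design, np.asarray(aqi_values, float), rcond=None)
    return alpha


def predict_dynamic(alpha, features):
    """Evaluate Y3 = a0 + a1*T1 + ... + aq*Tq for one feature vector."""
    return float(alpha[0] + np.dot(alpha[1:], features))
```

Because the three proportions always sum to 1, the individual coefficients are not uniquely determined; only the predictions are, which is all the fusion step needs.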

According to the data analysis report of the indoor air quality survey released by the Department of Electronic Engineering of Tsinghua University [2], indoor and outdoor air quality are related numerically in several different ways. The relationship depends on a variety of conditions: the type of building environment, the floor, the distance from the main road, whether the central air conditioning is on, whether windows are open for ventilation, whether an air purifier is running, and so on all affect the relationship between the indoor and outdoor air quality indices.

Preferably, in step S6 of the present invention, the regression tree algorithm is used to establish an indoor and outdoor prediction model, including the steps:

S61, acquiring the air quality index data monitored by each base station at a plurality of specified moments in the historical database, together with the indoor air quality index data and the indoor air quality index related data shared by users at the corresponding moments; the indoor air quality index related data include the type of building environment, the floor, the distance from the main road, whether the central air conditioning is on, whether windows are open for ventilation, and whether an air purifier is running; establishing an initial training set S₄ of the indoor and outdoor prediction model based on the acquired data;

S62, establishing an indoor and outdoor prediction model, expressed as:

Y₄ = RT(M, S₄)

The model inputs are the indoor air quality index M shared by the user at the time to be predicted and the indoor air quality index related data, and the model output Y₄ is the air quality index of the to-be-predicted location at the time to be predicted.

In the indoor and outdoor prediction model, when the indoor air quality index related data in the training data differ, that is, when the conditions affecting the indoor and outdoor air quality indices differ, the model coefficients of the regression tree RT() also differ. The present invention therefore trains on input and output data collected under the same conditions, obtaining for each set of conditions a regression tree model and coefficients that characterize the relationship between the indoor and outdoor air quality indices; these are then applied to predict the air quality index of to-be-predicted locations under the same conditions. When predicting a future time for a to-be-predicted location, the input data may be model input data for that future time obtained using the prior art.

In the actual forecast, if the indoor air quality index related data at the corresponding time cannot be obtained, the indoor and outdoor prediction model is established as follows:

Y₄ = M / 60%

According to the statistical relationship between indoor and outdoor air quality published by the US Environmental Protection Agency [1], indoor air quality is about 60% of outdoor air quality.
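The choice between the regression tree and the 60% fallback can be sketched as follows. The argument `tree_predict` is a hypothetical callable standing in for the trained regression tree RT(), and the function and constant names are illustrative.

```python
INDOOR_TO_OUTDOOR_RATIO = 0.60   # indoor AQI is about 60% of outdoor AQI


def outdoor_from_indoor(indoor_aqi, conditions=None, tree_predict=None):
    """Return Y4 from the user-shared indoor index M.

    tree_predict(indoor_aqi, conditions) is a hypothetical stand-in for the
    trained regression tree; it is used only when condition data exist.
    """
    if conditions is not None and tree_predict is not None:
        return tree_predict(indoor_aqi, conditions)
    # Fallback when the related data are unavailable: Y4 = M / 60%.
    return indoor_aqi / INDOOR_TO_OUTDOOR_RATIO
```

With the fallback, a shared indoor index of 60 corresponds to an estimated outdoor index of about 100.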

Preferably, step S7 of the present invention performs collaborative training on the established time prediction model, spatial prediction model, dynamic prediction model and indoor and outdoor prediction model, including:

S71, denoting the time prediction model, the spatial prediction model, the dynamic prediction model, and the indoor and outdoor prediction model as predictors F₁, F₂, F₃, and F₄, respectively, and the training sets of the predictors as L₁, L₂, L₃, L₄, initializing the training sets to:

L₁ = S₁, L₂ = S₂, L₃ = S₃, L₄ = S₄;

and initializing the weight vector of the predictors' prediction results to [w₁, w₂, w₃, w₄], where the four weights sum to 1;

S72, training F₁, F₂, F₃, F₄ on the training sets L₁, L₂, L₃, L₄, respectively;

S73, acquiring the model input data of each predictor for each to-be-predicted location at the time to be predicted, and using the four trained predictors to calculate, for each to-be-predicted location, the predicted values at the time to be predicted, recorded as:

Y₁ = F₁(x, y)

Y₂ = F₂(x, y)

Y₃ = F₃(x, y)

Y₄ = F₄(x, y)

S74. For each location to be predicted, the AQI fusion value at the time to be predicted is:

S75, defining a deviation threshold R_th for the prediction results, and calculating the sum of the deviations of the prediction results of the four predictors:

S76, for each location to be predicted, comparing the calculated R_{x,y} with the deviation threshold R_th; if:

then exiting the loop and taking Y₀ as the predicted air quality index of each to-be-predicted location at the time to be predicted; otherwise, going to step S77;

S77, from all the to-be-predicted locations, selecting n locations in ascending order of their R_{x,y}, recorded as:

S = {(x₁, y₁), (x₂, y₂), ..., (x_n, y_n)};

S78, updating the training set of each predictor to L₁ = {L₁, S}, L₂ = {L₂, S}, L₃ = {L₃, S}, with L₄ remaining the current S₄; returning to step S72 and repeating steps S72 to S78 until the condition in step S76 is satisfied.

Then the Y₀ obtained when the condition is satisfied is used as the predicted air quality index of each to-be-predicted location at the time to be predicted.

It can be seen from the above that the present invention performs at least one round of training for each prediction. During the iterative training, after each round is completed the training data sets of the models are updated for the next round, so that subsequent rounds give more accurate predictions. The data newly added to each training data set are the data for the to-be-predicted locations at which, in the previous round, the sum of the deviations between the predictors' results and the cooperative training result was smallest. For the spatial prediction model, the newly added training data are the coordinates and the AQI obtained for those locations in the previous round; for the dynamic prediction model, the newly added training data are the historical air quality index data, traffic data and geographic point-of-interest data at those locations; and so on.
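The loop of steps S72 to S78 can be sketched as below. The formulas for the fusion value, the deviation and the exit test are not reproduced in the text above, so three assumptions are made and labeled in the code: the fusion value is the weighted sum of the four predictions, the deviation is the sum of absolute differences from the fusion value, and the loop exits when every location's deviation is at or below R_th. The predictor interface (`fit`, `predict`), the `max_rounds` guard and the toy `MeanPredictor` are hypothetical.

```python
def fuse(predictions, weights):
    """ASSUMED fusion rule: Y0 = sum(w_i * Y_i), with the weights summing to 1."""
    return sum(w * p for w, p in zip(weights, predictions))


def deviation(predictions, fused):
    """ASSUMED deviation R_{x,y}: sum over predictors of |Y_i - Y0|."""
    return sum(abs(p - fused) for p in predictions)


def co_train(predictors, training_sets, locations, weights, r_th, n_add, max_rounds=20):
    """Run steps S72 to S78; returns {(x, y): Y0}.

    predictors[i] must offer fit(rows) and predict(x, y); training_sets[i] is a
    list of (x, y, aqi) rows and is extended in place. The last predictor plays
    the role of F4, whose training set is never extended. max_rounds is a
    safety guard added here; it is not part of the source.
    """
    fused, added = {}, set()
    for _ in range(max_rounds):
        for model, rows in zip(predictors, training_sets):
            model.fit(rows)                                      # S72
        scores = {}
        for loc in locations:
            preds = [m.predict(*loc) for m in predictors]        # S73
            fused[loc] = fuse(preds, weights)                    # S74
            scores[loc] = deviation(preds, fused[loc])           # S75
        if all(r <= r_th for r in scores.values()):              # S76 (assumed test)
            break
        # S77: pick the n_add locations with the smallest deviation. Skipping
        # locations already added is a choice made here to avoid duplicate rows.
        candidates = [loc for loc in locations if loc not in added]
        chosen = sorted(candidates, key=lambda loc: scores[loc])[:n_add]
        added.update(chosen)
        for rows in training_sets[:-1]:                          # S78
            rows.extend((x, y, fused[(x, y)]) for x, y in chosen)
    return fused


class MeanPredictor:
    """Toy predictor for illustration: mean AQI of its training rows plus a bias."""

    def __init__(self, bias=0.0):
        self.bias, self.mean = bias, 0.0

    def fit(self, rows):
        self.mean = sum(row[2] for row in rows) / len(rows)

    def predict(self, x, y):
        return self.mean + self.bias
```

Locations are passed as (x, y) tuples so that they can be used as dictionary keys. In a real system each predictor would take its own model inputs at (x, y) rather than the coordinates alone.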

Further, if the predictor F₄ cannot produce an AQI prediction value in step S73, the AQI fusion value is calculated using the following formula:

The invention also provides a city small-scale air quality index prediction system, comprising:

The area dividing module divides the urban area into a grid-shaped area to be predicted, in which the grid intersection points are the locations where the air quality index is to be predicted;

The historical monitoring data acquisition module acquires historical monitoring data of the air quality monitoring base station and establishes a historical database; the historical monitoring data includes AQI data, weather data, and weather type data;

A time prediction model building module, which establishes a time prediction model based on a historical database;

The spatial prediction model building module acquires real-time monitoring data of each air quality monitoring base station and establishes a spatial prediction model;

The dynamic prediction model establishing module acquires traffic data and geographic interest point data of each to-be-predicted location and air quality monitoring base station, and establishes a dynamic prediction model;

The indoor and outdoor prediction model building module acquires the indoor air quality index shared by the user and establishes an indoor and outdoor prediction model;

The collaborative training module cooperatively trains the established time prediction model, spatial prediction model, dynamic prediction model and indoor and outdoor prediction model so as to fuse the prediction results of all models and obtain the predicted air quality index of every to-be-predicted location at the current time and over a future period.

Beneficial effect

Compared with the prior art, the urban small-scale air quality prediction method provided by the invention has the following advantages:

1. It can more accurately predict the current and future air quality index of any location in the city to provide accurate air quality predictions;

2. The invention combines multiple data sources and multiple prediction models, avoiding the limitations of a single prediction model and ensuring the accuracy of the model;

3. The invention builds the prediction models separately and combines them only at the final cooperative training stage, which reduces the overall computational complexity and shortens the calculation time.

Drawings

Figure 1 is a schematic flow chart of the method of the present invention.

Detailed description

The technical solutions of the present invention will be further specifically described below by way of embodiments and with reference to the accompanying drawings. It should be noted that the present invention should not be limited to the specific embodiments described below. In addition, a detailed description of some well-known techniques is omitted for the sake of brevity.

The urban small-scale air quality index prediction method of the invention comprises the steps of:

In the present invention, the time prediction model corresponding to the current-time prediction characterizes the relationship between the historical monitoring data and the current monitoring data. The time prediction models corresponding to future-time prediction characterize the relationship between the historical and current monitoring data and the monitoring data at each future moment; depending on the specified span of the future period, there are multiple such models, one for each moment.

Example

Figure 1 is a flow chart of the present invention. As shown in Figure 1, the present invention uses a cooperative training algorithm with multiple prediction models to predict the air quality index. The prediction models, the cooperative training algorithm, and the final accuracy evaluation for air quality prediction are described in detail below.

First, a square grid is built over the area to be predicted. In this embodiment, the area to be predicted is the area within Beijing's Fifth Ring Road, and each grid cell covers one square kilometer. The grid intersections are the locations where the air quality index is to be predicted. The number of air quality monitoring base stations is denoted N; in this embodiment, there are 36 air quality monitoring base stations in Beijing.
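Generating the grid intersections can be sketched as follows. The source does not say how the grid is laid out in coordinates, so this sketch assumes a latitude and longitude bounding box and a flat-earth approximation in which one degree of latitude is about 111.32 km; the function name and parameters are illustrative.

```python
import math

KM_PER_DEG_LAT = 111.32   # approximate length of one degree of latitude


def build_grid(lat_min, lat_max, lon_min, lon_max, spacing_km=1.0):
    """Return (lat, lon) grid intersections spaced roughly spacing_km apart."""
    mid_lat = math.radians((lat_min + lat_max) / 2)
    d_lat = spacing_km / KM_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    d_lon = spacing_km / (KM_PER_DEG_LAT * math.cos(mid_lat))
    n_lat = int((lat_max - lat_min) / d_lat) + 1
    n_lon = int((lon_max - lon_min) / d_lon) + 1
    return [(lat_min + i * d_lat, lon_min + j * d_lon)
            for i in range(n_lat) for j in range(n_lon)]
```

For a city-sized area the approximation error is small compared with the one-kilometer spacing; a projected coordinate system would be the more careful choice.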

Construction of time prediction model F₁ in step S3

Obtain the data of the air quality monitoring base stations and establish a historical database containing date and time, base station name, base station latitude and longitude, AQI data, temperature, air pressure, wind, humidity, and weather type. In this embodiment, the sampling interval of the historical data is preferably 1 hour. To keep the training samples complete, missing historical data are filled in by local time-series interpolation.

Based on the historical database, a unified time-series prediction model is established for the air quality monitoring base stations and is used to predict the air quality index of a specified location at a given future time. This step includes the following substeps:

Specify the length of the historical sequence and the prediction period. The data at the current time are denoted x_n, the length of the historical sequence is L₁, and the historical sequence is denoted

The length of the future sequence is L₂, and the future sequence is denoted

Preferably, both the length of the historical sequence and the length of the prediction period are set to 6. That is, at any time, the preceding 6 hours of historical data are used to predict the air quality index for the next 6 hours. All consecutive sequences of L₁+1+L₂ hours extracted from the historical database then constitute the training data set S₁.
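Extracting the training windows can be sketched with a simple sliding window over an hourly series. The function name and the split of each window into history, current value and future values are illustrative, and the series is assumed to be gap-free because missing hours have already been interpolated.

```python
def extract_windows(series, history_len=6, future_len=6):
    """Return every run of history_len + 1 + future_len consecutive hourly values.

    Each window is split into (history, current, future). Missing hours are
    assumed to have been interpolated beforehand, so `series` has no gaps.
    """
    width = history_len + 1 + future_len
    windows = []
    for start in range(len(series) - width + 1):
        chunk = series[start:start + width]
        history = chunk[:history_len]
        current = chunk[history_len]
        future = chunk[history_len + 1:]
        windows.append((history, current, future))
    return windows
```

With both lengths set to 6, each window spans 13 consecutive hours, and a series shorter than that yields no windows.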

Multiple linear regression models are used to predict the current time and the next 6 hours. One model is established for each predicted time point, giving seven time prediction models in total. For the current-time prediction, the input data drawn from S₁ are the 6 hours of historical AQI data and the temperature, air pressure, wind, humidity, and weather type of the previous hour. For the prediction of the next 6 hours, the input data drawn from S₁ are the current-time AQI, the 6 hours of historical AQI data, and the current-time temperature, air pressure, wind, humidity, and weather type. The output of each multiple linear regression model is the AQI at the time point to be predicted. The multiple linear regression models can be written in the following form:

Y₁ = β₀ + β₁X₁ + β₂X₂ + ... + β_pX_p (1)

where β_i are the regression coefficients, X_i are the input data, and Y₁ is the air quality index of the point to be predicted.

Construction of spatial prediction model F₂ in step S4

Obtain the real-time data of all base stations at the same time, including date and time, base station name, base station latitude and longitude, and AQI data.

The spatial prediction model uses a two-dimensional linear interpolation algorithm. The input data S₂ are the latitude, longitude, and AQI of the base stations or grid points whose AQI values are known. The spatial prediction model can be expressed as:

Y₂ = griddata(x, y, S₂) (2)

where x, y are the coordinates of the point to be predicted, S₂ is the input data, that is, the training set, and Y₂ is the air quality index of the point to be predicted. The griddata function is an existing interpolation function.

The initial training data S₂ of the spatial prediction model contain only the data at the base stations. When the training set is updated, the data added are the previous-round prediction results for the to-be-predicted locations whose prediction deviation was smallest in that round.

Construction of dynamic prediction model F ₃ in step S5

Obtain the traffic data and geographic point-of-interest data within a given radius of all base stations and grid points to be predicted. The traffic data includes the lengths of unblocked, slow, and congested road segments, converted into proportional data; the geographic point-of-interest data includes the distribution of various types of geographic object entities within a given radius of the designated location, such as the number of schools, banks, restaurants, gas stations, etc.

The dynamic prediction model is established using a multiple linear regression model. The input data is the traffic data and geographic point-of-interest data, and the output data is the AQI data. The model form is as follows:

Y₃ = α₀ + α₁T₁ + α₂T₂ + α₃T₃ + α₄T₄ + ... + α_q T_q (3)

where α_i is the regression coefficient, T₁, T₂, T₃ are the proportions of unblocked, slow, and congested road segments, T₄, T₅, ..., T_q are the numbers of geographic points of interest of each type, and Y₃ is the air quality index of the point to be predicted. The initial training data S₃ of the dynamic prediction model contains only the data at the base stations.
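By way of non-limiting illustration, the inputs T₁ to T_q of formula (3) could be assembled and evaluated as sketched below. The function names, the dictionary keys for the point-of-interest types, and the coefficient values used in the usage note are hypothetical.

```python
def traffic_proportions(smooth_len, slow_len, congested_len):
    """Convert road-segment lengths into the proportions T1, T2, T3."""
    total = smooth_len + slow_len + congested_len
    if total == 0:
        return 0.0, 0.0, 0.0                  # no road data within the radius
    return smooth_len / total, slow_len / total, congested_len / total


def dynamic_features(road_lengths, poi_counts,
                     poi_types=("school", "bank", "restaurant", "gas_station")):
    """Build [T1, T2, T3, T4, ..., Tq] for one location.

    road_lengths: (smooth, slow, congested) lengths within the given radius.
    poi_counts:   mapping from point-of-interest type to its count.
    """
    t1, t2, t3 = traffic_proportions(*road_lengths)
    return [t1, t2, t3] + [float(poi_counts.get(k, 0)) for k in poi_types]


def predict_dynamic_aqi(alpha, features):
    """Evaluate Y3 = alpha_0 + alpha_1 * T_1 + ... + alpha_q * T_q."""
    return alpha[0] + sum(a * t for a, t in zip(alpha[1:], features))
```

For example, `dynamic_features((6.0, 3.0, 1.0), {"school": 2})` yields proportions of 0.6, 0.3, and 0.1 followed by the four counts. Because T₁ + T₂ + T₃ = 1 whenever road data is present, these three inputs are collinear with the constant term α₀, so a fitting routine would need to tolerate a rank-deficient design or omit one of the proportions.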

Construction of indoor and outdoor prediction model F ₄ in step S6

Obtain the indoor air quality index shared by users. The indoor air quality index is measured by an air quality sensor mounted on an air purifier that is compatible with the software system. The data set shared by all users is denoted S₄ and serves as the training data for this model.

According to the data analysis report of the indoor air quality survey released by the Department of Electronic Engineering of Tsinghua University, indoor air quality and outdoor air quality are related numerically in various ways, depending on a variety of conditions: the type of building environment, the floor, the distance from the main road, whether the central air conditioning is turned on, whether the window is open for ventilation, whether the air purifier is turned on, etc. A regression tree algorithm is used to fit the relationship between the indoor and outdoor air quality indexes under each category. In general, the indoor and outdoor prediction model can be expressed as

Y₄ = RT(M, S₄) (4-1)

where RT is the regression tree algorithm, M is the indoor air quality index measured by the sensor, and Y₄ is the outdoor air quality index to be predicted. When the combination of condition states in S₄ differs, the coefficients of the regression tree also differ. Therefore, a corresponding prediction model is obtained for each combination of conditions in the historical data and is used for prediction under that combination of conditions.
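By way of non-limiting illustration, fitting one model per combination of conditions can be sketched as follows. For brevity the sketch uses a regression tree of depth one (a single split on the indoor index M), which is a simplification assumed here and not the tree of the embodiment; the names `fit_stump`, `fit_by_condition`, and `predict_outdoor` are hypothetical.

```python
from collections import defaultdict


def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_stump(pairs):
    """Depth-1 regression tree on (indoor, outdoor) pairs.

    Returns (threshold, mean_outdoor_below, mean_outdoor_above).
    """
    pairs = sorted(pairs)
    outdoor = [o for _, o in pairs]
    overall = sum(outdoor) / len(outdoor)
    best_cost, best = sse(outdoor), (float("inf"), overall, overall)
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue                      # cannot split between equal indoor values
        cost = sse(outdoor[:i]) + sse(outdoor[i:])
        if cost < best_cost:
            threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
            low = sum(outdoor[:i]) / i
            high = sum(outdoor[i:]) / (len(pairs) - i)
            best_cost, best = cost, (threshold, low, high)
    return best


def fit_by_condition(records):
    """records: (condition_tuple, indoor, outdoor); one stump per combination."""
    groups = defaultdict(list)
    for cond, indoor, outdoor in records:
        groups[cond].append((indoor, outdoor))
    return {cond: fit_stump(pairs) for cond, pairs in groups.items()}


def predict_outdoor(models, cond, indoor):
    """Predict the outdoor index for one indoor reading under `cond`."""
    threshold, low, high = models[cond]
    return low if indoor <= threshold else high
```

Keeping a separate model per condition combination mirrors the statement that the fitted tree differs whenever the combination of condition states differs.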

If the training data S₄ is missing, or the measured data lacks the type of building environment, the floor, the distance from the main road, whether the central air conditioner is turned on, whether the window is open for ventilation, whether the air purifier is turned on, etc., the indoor and outdoor prediction model is obtained as follows. According to the statistical relationship between indoor and outdoor air quality published by the US Environmental Protection Agency [1], indoor air quality is about 60% of outdoor air quality, namely:

Y₄ = M / 60% (4-2)

where M is the indoor air quality index measured by the sensor and Y₄ is the outdoor air quality index to be predicted.
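By way of non-limiting illustration, the choice between formula (4-1) and the fallback formula (4-2) can be sketched as follows. The field names in `REQUIRED_FIELDS`, the function name, and the representation of the fitted models as callables keyed by condition combination are assumptions made for this sketch.

```python
REQUIRED_FIELDS = ("building_type", "floor", "distance_to_main_road",
                   "central_ac_on", "window_open", "purifier_on")


def estimate_outdoor_aqi(indoor_aqi, conditions=None, models=None, ratio=0.6):
    """Estimate the outdoor index Y4 from the indoor index M.

    conditions: mapping with the REQUIRED_FIELDS, or None if unknown.
    models:     mapping from a condition tuple to a fitted callable, or None.
    ratio:      assumed indoor/outdoor ratio used by fallback formula (4-2).
    """
    if conditions and models:
        if all(conditions.get(f) is not None for f in REQUIRED_FIELDS):
            key = tuple(conditions[f] for f in REQUIRED_FIELDS)
            model = models.get(key)
            if model is not None:
                return model(indoor_aqi)          # formula (4-1)
    return indoor_aqi / ratio                     # formula (4-2)
```

As a worked example of formula (4-2), an indoor reading of M = 60 with no condition data gives Y₄ = 60 / 0.6 = 100.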

Cooperative training algorithm in step S7

After the above four prediction models are established, a cooperative training algorithm is adopted to fuse the calculation results of the models. At the same time, the four models may each be updated to a different degree. The cooperative training algorithm is a semi-supervised learning algorithm whose main purpose is to make efficient use of a small amount of labeled data and a large amount of unlabeled data to train the predictors. This embodiment uses a simplified version of the cooperative training algorithm. The specific implementation steps are as follows:

L ₁ =S ₁ , L ₂ =S ₂ , L ₃ =S ₃ , L ₄ =S ₄ ;

Y ₁ =F ₁ (x,y)

Y ₂ =F ₂ (x,y)

Y ₃ =F ₃ (x,y)

Y ₄ =F ₄ (x,y)

S={(x ₁ , y ₁ ), (x ₂ , y ₂ ), ..., (x _n , y _n )};

If the corresponding AQI prediction value cannot be obtained by the predictor F₄ in step S73, the AQI fusion value is calculated using the following formula:
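By way of non-limiting illustration, the overall loop of the simplified cooperative training algorithm can be sketched as follows. The sketch rests on several assumptions that are not taken from the text: the fusion value is assumed to be the weighted sum of the available predictions with the weights renormalised when a predictor returns no value, the deviation is assumed to be the sum of absolute differences from the fusion value, the loop is assumed to end when every location is within the threshold, and `max_rounds` is a safety cap added for the sketch. All function names are hypothetical.

```python
def fuse(predictions, weights):
    """Assumed fusion: weighted mean over predictors that returned a value."""
    available = [(p, w) for p, w in zip(predictions, weights) if p is not None]
    total = sum(w for _, w in available)
    if total == 0:
        raise ValueError("no predictor produced a value")
    return sum(p * w for p, w in available) / total


def deviation(predictions, fused):
    """Assumed deviation: sum of absolute differences from the fused value."""
    return sum(abs(p - fused) for p in predictions if p is not None)


def co_train(trainers, training_sets, locations, weights, r_th, n, max_rounds=20):
    """trainers[i](training_set) returns a predictor: location -> AQI or None."""
    sets = [list(s) for s in training_sets]            # L1..L4 start as S1..S4
    added, fused = set(), {}
    for _ in range(max_rounds):
        predictors = [train(s) for train, s in zip(trainers, sets)]
        preds = {loc: [f(loc) for f in predictors] for loc in locations}
        fused = {loc: fuse(p, weights) for loc, p in preds.items()}
        r = {loc: deviation(preds[loc], fused[loc]) for loc in locations}
        if all(v <= r_th for v in r.values()):
            break                                      # predictors agree closely enough
        # pick the n not-yet-added locations with the smallest deviation
        chosen = sorted((loc for loc in locations if loc not in added), key=r.get)[:n]
        if not chosen:
            break
        added.update(chosen)
        for i in range(3):                             # L4 is not extended with S
            sets[i].extend((loc, fused[loc]) for loc in chosen)
    return fused
```

The fused values of the selected locations act as additional labeled samples for the first three predictors in the next round, which is how unlabeled grid points gradually enter the training sets.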

Step S81 evaluates the accuracy of the current time prediction system

For the AQI prediction of each grid point at the current time, the accuracy of the cooperative training algorithm is calculated by cross-validation. The specific implementation steps are as follows:

S811, using the K-fold cross-validation method, all N base stations are randomly divided into K parts, numbered sequentially 1, 2, ..., k, k+1, ..., K, each part having c = N/K base stations. Preferably, in this embodiment, K is taken as 18, so that each part contains c = N/K = 36/18 = 2 base stations;

S812, the k-th part is removed from the K parts; the measured values of the base stations in this part and the predicted values of the grid points where those base stations are located will be used to calculate accuracy in the subsequent steps, and the base stations in the remaining K-1 parts serve as known data;

S813, based on the data of the remaining K-1 parts, step S7 is performed to obtain the AQI prediction value at the current time of each base station in the removed k-th part, and these predicted values are recorded;

S814, the measured AQI values of the base stations in the k-th part are obtained as y₁, y₂, ..., y_c, and the accuracy of the current time prediction by the prediction system when the k-th part is removed may be described by the following indicator η_k:

S815, k is traversed from 1 to K, and the accuracy index η of the prediction system at the current time is obtained as follows:

The closer η is to 1, the higher the accuracy of the current time prediction of the system.
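By way of non-limiting illustration, steps S811 to S815 can be sketched as follows. The indicator used here (one minus the mean relative error, averaged over the K parts) is an assumption chosen only because it approaches 1 as predictions approach the measured values; it is not asserted to be the indicator η_k of the embodiment. The sketch also assumes that N is divisible by K and that measured values are positive. The function names and the `predict_from` callback are hypothetical.

```python
import random


def split_into_parts(stations, k, seed=0):
    """Randomly divide the stations into k parts of c = N / k stations each."""
    shuffled = list(stations)
    random.Random(seed).shuffle(shuffled)
    c = len(shuffled) // k
    return [shuffled[i * c:(i + 1) * c] for i in range(k)]


def part_accuracy(measured, predicted):
    """ASSUMED indicator: 1 - mean relative error over one held-out part."""
    errors = [abs(p - y) / y for y, p in zip(measured, predicted)]
    return 1.0 - sum(errors) / len(errors)


def cross_validate(stations, measured, predict_from, k):
    """predict_from(known_stations, target) -> predicted AQI at the target.

    measured: mapping from station to its measured AQI at the current time.
    Returns the mean of the per-part indicators.
    """
    scores = []
    for held_out in split_into_parts(stations, k):
        known = [s for s in stations if s not in held_out]     # K-1 parts
        predicted = [predict_from(known, s) for s in held_out]
        scores.append(part_accuracy([measured[s] for s in held_out], predicted))
    return sum(scores) / len(scores)
```

With N = 36 stations and K = 18, `split_into_parts` returns 18 parts of 2 stations each, matching the preferred embodiment.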

Step S82 evaluates the accuracy of AQI prediction in future time

After step S7 is performed, the predicted values at a specified future time of the grid points in which all base stations are located are recorded.

The actual measured values of the base stations are z₁, z₂, ..., z_N, and the accuracy of the prediction system for future predictions is:

Example 2

The present invention utilizes a cooperative training algorithm that combines multiple prediction methods to predict the air quality index at every geographical point within a city, not only at the air quality monitoring base stations, improving the accuracy of the forecast while maintaining low computational complexity.

The historical monitoring data acquisition module acquires historical monitoring data of the air quality monitoring base stations and establishes a historical database; the historical monitoring data includes AQI data, meteorological data, and weather type data.

Those skilled in the art will appreciate that embodiments of the present application can be provided as a method, system, or computer program product. Thus, the present application can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment in combination of software and hardware. Moreover, the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.

The present application is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

The computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising instruction means, the instruction means implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Claims

A city small-scale air quality index prediction method, which is characterized by:

S1, the urban area is divided into regions by a grid, and the intersection of the grid corresponds to the location of the air quality index to be predicted;

S2, obtaining historical monitoring data of each air quality monitoring base station, and establishing a historical database;

S3. Based on the monitoring data of multiple time series of each base station in the historical database, establish a time prediction model corresponding to the current time prediction and the prediction of each time in the future time period;

S4, based on the monitoring data of each air quality monitoring base station at the same time in the historical database, using a two-dimensional linear interpolation method to establish a spatial prediction model for air quality prediction at a specified coordinate;

S5, obtaining traffic data and geographic point-of-interest data of each to-be-predicted location and each air quality monitoring base station, and air quality index data of each to-be-predicted location and each air quality monitoring base station at the corresponding moments;

based on the acquired data, establishing a dynamic prediction model that characterizes the relationship of the traffic data and geographic point-of-interest data to the air quality index;

S6, obtaining indoor air quality index shared by the user, user living environment data, and air quality index data of the corresponding location, and establishing an indoor and outdoor prediction model for characterizing the relationship between the indoor air quality index and the outdoor air quality index;

S7, cooperatively training the established time prediction model, spatial prediction model, dynamic prediction model and indoor and outdoor prediction model for any to-be-predicted location at any real-time moment at which the air quality index is to be predicted, so as to fuse the prediction results of all the models and obtain the predicted values of the air quality index of each to-be-predicted location at the corresponding current moment and at each moment in the future time period.

The method of claim 1, further comprising:

S8, real-time assessment of the accuracy of the air quality index prediction results:

S81, using the K-fold cross-validation algorithm to evaluate the accuracy of the current time prediction result:

S811, assuming that the number of base stations is N, dividing all base stations into K parts, numbered 1, 2, ..., k, k+1, ..., K, each part having c = N/K base stations;

S812, removing the k-th part from the K parts, the base stations in the remaining K-1 parts serving as known data;

S813, based on the data of the known K-1 parts, obtaining the AQI prediction value at the current time of each base station in the removed k-th part, and recording these predicted values;

S814, obtaining the measured AQI values y₁, y₂, ..., y_c of each base station in the k-th part, the accuracy of the current time prediction result being described by the following indicator η_k:

S815, traversing k from 1 to K, repeating steps S812 to S814 for each k, and then obtaining the accuracy index η of the prediction system at the current time as:

The closer η is to 1, the higher the accuracy of the current time prediction of the system;

S82, assessing the accuracy of the prediction results for the future period of time:

the predicted values of all base stations at a specified time in the future are obtained, the actual measured values of the base stations at the specified time are z₁, z₂, ..., z_N, and the prediction accuracy ψ of the prediction system for future moments is:

The closer ψ is to 1, the higher the accuracy of the system's prediction of future moments.

The method according to claim 1, wherein the monitoring data monitored by the air quality monitoring base station comprises date and time, base station name, base station latitude and longitude, AQI data, temperature, air pressure, wind, humidity and weather type data;

The step of establishing a temporal prediction model based on the historical database in step S3 includes the steps of:

S31, specifying the length of the historical sequence as l₁ and the prediction period, that is, the length of the future sequence, as l₂, and denoting the data of the current time as x_n, then the historical sequence is
and the future sequence is

extracting all consecutive sequence data of l₁ + 1 + l₂ hours in the historical database to form an initial training data set S₁;

S32, establishing l₂ + 1 multiple linear regression models, corresponding respectively to the prediction of the current time and of each moment in the next 12 hours, expressed as:

Y₁ = β₀ + β₁X₁ + β₂X₂ + ... + β_p X_p

where β_i is the regression coefficient, X_i is the model input data, and Y₁ is the air quality index of the time to be predicted;

For the current time prediction, the model input data is the l₁-hour historical AQI data and the temperature, air pressure, wind, humidity and weather type data of the hour at the current time;

For the prediction of each moment in the next 12 hours, the model input data is the current time AQI data, the l₁-hour historical AQI data, and the current temperature, air pressure, wind, humidity and weather type data.

The method according to claim 3, wherein in step S4, a spatial prediction model for air quality prediction at a specified coordinate is established by using a two-dimensional linear interpolation method, comprising:

S41: acquiring real-time monitoring data of all known air quality index locations in the historical database at the same time, and latitude and longitude data of corresponding locations, forming a training data set S 2 of the spatial prediction model;

S42, the coordinates of the location to be predicted being (x, y), the spatial prediction model for air quality index prediction at the location is expressed as:

Y₂ = griddata(x, y, S₂)

The input of the model is S₂, the output of the model is the air quality index of the location to be predicted, and griddata() denotes the two-dimensional interpolation function.

The method according to claim 4, wherein in step S5, the traffic data comprises length data of the smooth, slow, and congested road segments within a set radius of each to-be-predicted location and each air quality monitoring base station;

the geographic point-of-interest data includes distribution data of geographic object entities within a set radius of each to-be-predicted location and each air quality monitoring base station; the geographic object entity types include schools, banks, restaurants, and gas stations;

Step S5 uses a multiple linear regression method to establish a dynamic prediction model, including the steps:

S51, obtaining the traffic data and geographic point-of-interest data within a given radius of each base station at each time point in the historical database, the traffic data including the proportions of the lengths of smooth, slow, and congested road segments, defined as T₁, T₂, T₃, and the geographic point-of-interest data including the numbers of geographic points of interest of each type distributed within the given radius of the base station, defined as T₄, T₅, ..., T_q; and, together with the air quality index monitoring data of the corresponding base station at the corresponding time, establishing an initial training set S₃;

S52, establishing a dynamic prediction model, expressed as:

Y₃ = α₀ + α₁T₁ + α₂T₂ + α₃T₃ + α₄T₄ + ... + α_q T_q

where α_i is the regression coefficient, the model input is the traffic data and geographic point-of-interest data within a given radius of the to-be-predicted location at the specified time, and the model output Y₃ is the air quality index of the point to be predicted.

The method according to claim 5, wherein the step S6 uses a regression tree algorithm to establish an indoor and outdoor prediction model, comprising the steps of:

S61, acquiring the air quality index data monitored by each base station at a plurality of specified moments in the historical database, together with the indoor air quality index data and indoor air quality index related data shared by users at the corresponding moments, the indoor air quality index related data including the type of building environment, the floor, the distance from the main road, whether the central air conditioner is on, whether the window is open for ventilation, and whether the air purifier is on; and establishing an initial training set S₄ of the indoor and outdoor prediction model based on the acquired data;

S62, establishing an indoor and outdoor prediction model, expressed as:

Y₄ = RT(M, S₄)

The model input is the indoor air quality index M shared by the user at the time to be predicted together with the indoor air quality index related data, and the model output Y₄ is the air quality index of the to-be-predicted location at the time to be predicted.

The method according to claim 6, wherein step S7 performs cooperative training on the established time prediction model, spatial prediction model, dynamic prediction model and indoor and outdoor prediction model, including:

S71, denoting the time prediction model, the spatial prediction model, the dynamic prediction model, and the indoor and outdoor prediction model as predictors F₁, F₂, F₃, and F₄ respectively, denoting the training sets of the predictors as L₁, L₂, L₃, and L₄ respectively, and initializing the training sets as:

L₁ = S₁, L₂ = S₂, L₃ = S₃, L₄ = S₄;

initializing the weight vector of the predictor prediction results as [w₁, w₂, w₃, w₄], the sum of the four weights being equal to 1;

S72, training F₁, F₂, F₃, and F₄ on the training sets L₁, L₂, L₃, and L₄ respectively;

S73, acquiring the model input data corresponding to each predictor for each to-be-predicted location at the time to be predicted, and using the acquired data to calculate, for each to-be-predicted location, the predicted values at the time to be predicted with the four trained predictors, recorded as:

Y₁ = F₁(x, y)

Y₂ = F₂(x, y)

Y₃ = F₃(x, y)

Y₄ = F₄(x, y)

S74. For each location to be predicted, the AQI fusion value at the time to be predicted is:

S75, defining a deviation threshold R_th of the prediction result, and calculating the sum of deviations R_{x,y} of the prediction results of the four predictors:

S76, for each to-be-predicted location, comparing the calculated R_{x,y} with the deviation threshold R_th, and if:

then exiting the loop and using Y₀ as the predicted air quality index of each to-be-predicted location at the time to be predicted; otherwise, going to step S77;

S77, from all the to-be-predicted locations, selecting n to-be-predicted locations in ascending order of the corresponding R_{x,y}, recorded as:

S = {(x₁, y₁), (x₂, y₂), ..., (x_n, y_n)};

S78, updating the training set of each predictor as L₁ = {L₁, S}, L₂ = {L₂, S}, L₃ = {L₃, S}, with L₄ being the current S₄; returning to step S72 and repeating steps S72 to S78 to continue training until the condition of step S76 is satisfied, whereupon the Y₀ obtained at that time is used as the predicted value of the air quality index of each to-be-predicted location at the time to be predicted.

The method according to claim 7, wherein if the corresponding AQI prediction value cannot be obtained by the predictor F₄ in step S73, the AQI fusion value is calculated using the following formula:

A city small-scale air quality index prediction system, which is characterized by:

The area dividing module divides the urban area into regions by a grid, the grid intersection points corresponding to the locations at which the air quality index is to be predicted;

The historical monitoring data acquisition module acquires historical monitoring data of each air quality monitoring base station and establishes a historical database;

a time prediction model establishing module, based on monitoring data of multiple time series of each base station in the historical database, establishing a time prediction model corresponding to the current time prediction and each time prediction in a future time period;

The spatial prediction model building module, based on the monitoring data of each air quality monitoring base station at the same time in the historical database, uses the two-dimensional linear interpolation method to establish a spatial prediction model for air quality prediction at specified coordinates;

The dynamic prediction model establishing module acquires traffic data and geographic point-of-interest data of each to-be-predicted location and each air quality monitoring base station, and air quality index data of each to-be-predicted location and each air quality monitoring base station at the corresponding time, and, based on the acquired data, establishes a dynamic prediction model that characterizes the relationship of the traffic data and geographic point-of-interest data to the air quality index;

The indoor and outdoor prediction model building module acquires the indoor air quality index shared by the user, the user living environment data, and the air quality index data of the corresponding place, and establishes an indoor and outdoor prediction model that characterizes the relationship between the indoor air quality index and the outdoor air quality index;

The collaborative training module cooperatively trains the established time prediction model, spatial prediction model, dynamic prediction model and indoor and outdoor prediction model for any to-be-predicted location at any real-time moment at which the air quality index is to be predicted, so as to fuse the prediction results of all the models and obtain the predicted values of the air quality index of each to-be-predicted location at the corresponding current moment and at each moment in the future time period.