CN108053071A

CN108053071A - Regional air pollutant concentration prediction method, terminal and readable storage medium

Info

Publication number: CN108053071A
Application number: CN201711393032.2A
Authority: CN
Inventors: 戈燕红; 舒少君; 雷涛
Original assignee: Universtar Science and Technology Shenzhen Co Ltd
Current assignee: Universtar Science and Technology Shenzhen Co Ltd
Priority date: 2017-12-21
Filing date: 2017-12-21
Publication date: 2018-05-18

Abstract

Embodiments of the present invention provide a regional air pollutant concentration prediction method, a terminal and a computer-readable storage medium. The method includes: taking the calculated daily average historical pollutant concentration data set and the preprocessed daily historical meteorological data set as the sample data set and training a random forest model on it, where the random forest model includes multiple decision trees and each decision tree is implemented with a multilayer feedforward neural network; determining the forecast meteorological data for a preset number of future days, as forecast on the current day; preprocessing the forecast meteorological data; and, based on the preprocessed forecast meteorological data and the pollutant concentration data monitored on the current day, using the trained random forest model to predict the pollutant concentration data of the region to be predicted for the preset number of future days. Embodiments of the present invention improve the accuracy of regional air pollutant concentration prediction and have strong generalization ability.

Description

Regional air pollutant concentration prediction method, terminal and readable storage medium

Technical field

The present invention relates to the technical field of data processing, and in particular to a regional air pollutant concentration prediction method, a terminal and a computer-readable storage medium.

Background art

In recent years, large-scale air pollution episodes have occurred frequently in most parts of China, and air pollution seriously affects people's daily life and production. Common air pollutants such as PM2.5, PM10 and O3 are extremely harmful to human health. It is therefore particularly important to predict the occurrence and trend of air pollution with a reliable and simple prediction method, so that government departments can learn of future air quality in time, take countermeasures ahead of heavy pollution events, protect people's health, and achieve the greatest social benefit at the lowest economic cost.

At present, air pollution forecasting relies mainly on two kinds of method: statistical forecasting and numerical forecasting. Numerical forecasting models the physical and chemical processes of air pollutants in order to simulate future air pollution at different scales and against different backgrounds. Because numerical forecasting requires a detailed understanding and model of the physical and chemical transformation mechanisms of air pollution, and because building and applying such a model is complex, it needs not only a high-performance computer cluster but also high-resolution meteorological data and emission inventory data as support. As a result, it is difficult to implement in small and medium-sized cities or regions with limited monitoring capability.

Statistical forecasting applies statistical principles and a specific statistical algorithm to find the relationships among historical data on air pollutant concentration, meteorological factors and other related variables, so it does not need to model the physical and chemical processes of the pollutants. However, existing research on and applications of statistical methods rely mainly on multiple linear regression, which fits nonlinear relationships poorly and yields low prediction accuracy.

Summary of the invention

Embodiments of the present invention provide a regional air pollutant concentration prediction method, a terminal and a computer-readable storage medium that can predict regional air pollutant concentration, improve the accuracy of that prediction, and have strong generalization ability.

In a first aspect, an embodiment of the present invention provides a regional air pollutant concentration prediction method, the method including:

calculating a daily average historical pollutant concentration data set from the monitored pollutant concentration data sets of all monitoring points in the region to be predicted for the season that corresponds to the current time;

determining the daily historical meteorological data set of the region to be predicted for the season that corresponds to the current time;

preprocessing the daily historical meteorological data set;

taking the daily average historical pollutant concentration data set and the preprocessed daily historical meteorological data set as the sample data set and training a random forest model on it, where the random forest model includes multiple decision trees and each decision tree is implemented with a multilayer feedforward neural network;

determining the forecast meteorological data for a preset number of future days, as forecast on the current day;

preprocessing the forecast meteorological data;

predicting, with the trained random forest model, the pollutant concentration data of the region to be predicted for the preset number of future days, based on the preprocessed forecast meteorological data and the pollutant concentration data monitored on the current day.

In a second aspect, an embodiment of the present invention provides a terminal that includes units for performing the method of the first aspect.

In a third aspect, an embodiment of the present invention provides another terminal, including a processor, an input device, an output device and a memory that are connected to one another, where the memory stores application program code and the processor is configured to call the program code to perform the method of the first aspect.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium that stores a computer program, the computer program including program instructions that, when executed by a processor, cause the processor to perform the method of the first aspect.

In the embodiments of the present invention, the calculated daily average historical pollutant concentration data set and the preprocessed daily historical meteorological data set are taken as the sample data set and a random forest model is trained on it, where the random forest model includes multiple decision trees and each decision tree is implemented with a multilayer feedforward neural network; the forecast meteorological data for a preset number of future days, as forecast on the current day, are determined and preprocessed; and, based on the preprocessed forecast meteorological data and the pollutant concentration data monitored on the current day, the trained random forest model predicts the pollutant concentration data of the region to be predicted for the preset number of future days. Predicting regional pollutant concentration with the trained random forest model improves the accuracy of regional air pollutant concentration prediction and gives strong generalization ability.

Brief description of the drawings

To describe the technical solutions of the embodiments of the present invention more clearly, the drawings needed for the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flow diagram of a regional air pollutant concentration prediction method provided by an embodiment of the present invention;

Fig. 2 is a sub-flow diagram of a regional air pollutant concentration prediction method provided by an embodiment of the present invention;

Fig. 3 is another sub-flow diagram of a regional air pollutant concentration prediction method provided by an embodiment of the present invention;

Fig. 4 is another sub-flow diagram of a regional air pollutant concentration prediction method provided by an embodiment of the present invention;

Fig. 5 is another sub-flow diagram of a regional air pollutant concentration prediction method provided by another embodiment of the present invention;

Fig. 6 is a schematic block diagram of a terminal provided by an embodiment of the present invention;

Fig. 7 is a schematic block diagram of the average calculation unit provided by an embodiment of the present invention;

Fig. 8 is a schematic block diagram of the first preprocessing unit provided by an embodiment of the present invention;

Fig. 9 is a schematic block diagram of the prediction unit provided by an embodiment of the present invention;

Fig. 10 is a schematic block diagram of the prediction unit provided by another embodiment of the present invention;

Fig. 11 is a schematic block diagram of a terminal provided by another embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative effort fall within the scope of protection of the present invention.

It should be understood that, as used in this specification and the appended claims, the terms "comprising" and "including" indicate the presence of the described features, wholes, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components and/or sets thereof.

In a specific implementation, the terminal described in the embodiments of the present invention includes, but is not limited to, portable devices such as mobile phones, laptop computers or tablet computers with a touch-sensitive surface (for example, a touch-screen display and/or a touchpad). It should also be understood that in some embodiments the device is not a portable communication device but a desktop computer with a touch-sensitive surface (for example, a touch-screen display and/or a touchpad).

Pollutant concentration refers to the amount of pollutant contained in a unit volume. Because of spatial differences, pollutant concentration varies between regions, and within one region it changes with the seasonal climate; pollutant concentration is therefore monitored and predicted per region and per season. Monitoring and predicting by region reduces the effect of differences in emission time and spatial distribution between areas and avoids the influence of an anomaly at a single monitoring point. Monitoring and predicting by season reduces the effect of seasonal differences in pollutant emissions. Seasons may be divided as follows: winter runs from December to February of the following year, spring from March to May, summer from June to August, and autumn from September to November. The division may be adjusted for different regions.
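As an illustrative sketch only, the default season division described above could be expressed as follows. The function names `season_of` and `same_season` are hypothetical and do not appear in the original text; a region with a different division would substitute its own month table.

```python
def season_of(month: int) -> str:
    """Map a calendar month (1-12) to a season using the default division."""
    if month not in range(1, 13):
        raise ValueError(f"month must be 1-12, got {month}")
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    return "autumn"


def same_season(records, current_month):
    """Keep only (month, value) records that fall in the season of current_month."""
    target = season_of(current_month)
    return [record for record in records if season_of(record[0]) == target]
```

For example, `same_season(records, 2)` keeps only the December, January and February records.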

Fig. 1 is a flow diagram of a regional air pollutant concentration prediction method provided by an embodiment of the present invention. The method includes S101-S107.

S101, calculate the daily average historical pollutant concentration data set from the monitored pollutant concentration data sets of all monitoring points in the region to be predicted for the season that corresponds to the current time.

Multiple miniature monitoring points may be deployed in the region to be predicted, each equipped with monitors for several pollutants, for example PM2.5, PM10 and O3 monitors. The miniature monitoring points may record pollutant concentration data at a preset interval, for example every 5 minutes, so pollutant concentration data exist for multiple time points in each day. Note that the pollutant concentration data mentioned in this text refer to concentration data for one and the same pollutant.

Specifically, the monitored pollutant concentration data sets of all monitoring points in the region to be predicted are obtained for the season that corresponds to the current time; these generally include pollutant concentration data monitored over many years. The data obtained at each of the multiple time points in a day are processed to give the pollutant concentration data for that time point, and the pollutant concentration data of the multiple time points are averaged to give the daily average historical pollutant concentration data. Repeating this calculation gives the daily average historical pollutant concentration data set for all dates of the season that corresponds to the current time. Here, all dates of the season that corresponds to the current time means the current date and the dates that fall in the corresponding season. Reducing the regional pollutant concentration data to daily averages lessens the influence of abnormal values at individual time points within a day.
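The daily averaging step can be sketched as below. This is a minimal illustration that assumes the per-time-point regional concentrations have already been computed; the function name `daily_averages` and the `(timestamp, value)` record layout are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean


def daily_averages(readings):
    """Average per-time-point regional concentrations into one value per day.

    readings: iterable of (datetime, concentration) pairs for one pollutant.
    Returns a dict mapping each calendar date to its mean concentration.
    """
    by_day = defaultdict(list)
    for timestamp, value in readings:
        by_day[timestamp.date()].append(value)
    return {day: fmean(values) for day, values in sorted(by_day.items())}
```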

In one embodiment, as shown in Fig. 2, processing the data obtained at one time point to give the pollutant concentration data for that time point includes S201-S204.

S201, handle the abnormal values in the monitored pollutant concentration data of all monitoring points at the time point. Specifically, determine whether the monitored pollutant concentration data of a monitoring point contain missing values or extreme values. A missing value is replaced with the average of the neighboring monitoring points of that monitoring point. An extreme value is replaced with the boundary value of the corresponding pollutant concentration range. If there is no monitoring point in the neighborhood, the data of that monitoring point for the current time point are discarded. An extreme value is a value outside the valid range: for example, if the PM2.5 concentration range is [0, 500] in micrograms per cubic metre, extreme values are those above 500 or below 0, and the boundary values for PM2.5 are 0 and 500. A monitored PM2.5 value below 0 is replaced with the boundary value 0, and one above 500 with the boundary value 500. The neighborhood is the set of all points whose distance from the monitoring point is less than K; K may be 3 km or another suitable value.
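A minimal sketch of the abnormal-value handling in S201 follows. It assumes planar station coordinates in kilometres and assumes that extreme values are clipped before missing values are filled; the original text does not fix that order. The name `clean_readings` and the tuple layout are hypothetical.

```python
import math


def clean_readings(points, lower=0.0, upper=500.0, k_km=3.0):
    """Clip extreme values and fill missing ones from nearby stations.

    points: list of (x_km, y_km, value) where value is None when missing.
    Returns a new list; stations with a missing value and no valid
    neighbour closer than k_km are dropped for this time point.
    """
    # Replace out-of-range values with the nearest boundary value.
    clipped = [
        (x, y, None if v is None else min(max(v, lower), upper))
        for x, y, v in points
    ]
    cleaned = []
    for i, (x, y, v) in enumerate(clipped):
        if v is not None:
            cleaned.append((x, y, v))
            continue
        # Missing value: average the valid stations within distance K.
        neighbours = [
            nv
            for j, (nx, ny, nv) in enumerate(clipped)
            if j != i and nv is not None and math.hypot(nx - x, ny - y) < k_km
        ]
        if neighbours:
            cleaned.append((x, y, sum(neighbours) / len(neighbours)))
    return cleaned
```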

S202, divide the region to be predicted into a grid to obtain multiple grid cells.

The grid cell size is set according to the area of the region to be predicted, for example 1 km × 1 km. The grid may be regular or irregular; for example, if a large part of the region to be predicted is forest, the forest area may be set as a single grid cell.

S203, use an interpolation algorithm to calculate the pollutant concentration data of the grid area corresponding to each grid cell.

It should be understood that not every grid cell contains a monitoring point. For grid cells without a monitoring point, the pollutant concentration data of the corresponding grid area are calculated with an interpolation algorithm, so that every grid area has pollutant concentration data. The interpolation algorithm may be Kriging interpolation or another suitable interpolation algorithm.

S204, calculate the average of the pollutant concentration data of the grid areas corresponding to the multiple grid cells to obtain the pollutant concentration data of the region to be predicted at the current time point.

That is, the average of the pollutant concentration data of all grid areas is taken as the pollutant concentration data of the region to be predicted at the current time point.
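Steps S202-S204 can be sketched as follows. Note the substitution: this example uses inverse-distance weighting, a simpler stand-in for the Kriging interpolation named above, and it interpolates every cell centre of a regular rectangular grid rather than only the cells without a monitoring point. The function names and the rectangular-region assumption are illustrative only.

```python
import math


def idw(x, y, stations, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (sx, sy, value) stations."""
    numerator = denominator = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(sx - x, sy - y)
        if distance == 0.0:
            return value  # the point coincides with a station
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def regional_concentration(stations, width_km, height_km, cell_km=1.0):
    """Average the interpolated values at the centres of a regular grid."""
    nx = math.ceil(width_km / cell_km)
    ny = math.ceil(height_km / cell_km)
    cells = [
        idw((i + 0.5) * cell_km, (j + 0.5) * cell_km, stations)
        for i in range(nx)
        for j in range(ny)
    ]
    return sum(cells) / len(cells)
```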

In the embodiment shown in Fig. 2, the monitored pollutant concentration data of all monitoring points at a time point are obtained and their abnormal values handled, the region to be predicted is divided into grid cells, the pollutant concentration data of the area of each grid cell are calculated with an interpolation algorithm, and these values are averaged to give the pollutant concentration data for the time point. This improves the accuracy of the pollutant concentration data for that time point.

S102, determine the daily historical meteorological data set of the region to be predicted for the season that corresponds to the current time.

The daily historical meteorological data set includes, but is not limited to, the following meteorological factors:

(1) Temperature: the temperature at 02:00, 08:00, 14:00 and 20:00 each day and the temperature change at each of those times, the daily mean temperature, the daily minimum temperature, the daily maximum temperature, the surface temperature difference at 07:00, and the temperature at each of the 700 hPa, 850 hPa and 925 hPa levels;

(2) Air pressure: the air pressure at 02:00, 08:00, 14:00 and 20:00 each day and the pressure change at each of those times, the daily mean air pressure, and the daily air pressure difference;

(3) Humidity: the relative humidity at 02:00, 08:00, 14:00 and 20:00 each day, and the daily mean relative humidity;

(4) Wind: the prevailing wind direction, the daily mean wind speed, and the 24-hour change in wind speed;

(5) Other: the daily mean rainfall, the daily mean cloud cover, and the daily mean light intensity.

Here, the temperature change at 08:00 is the temperature at 08:00 on the previous day minus the temperature at 08:00 on the current day; the daily mean temperature is the average of the temperatures at 02:00, 08:00, 14:00 and 20:00; the pressure change at 08:00 is the air pressure at 08:00 on the previous day minus the air pressure at 08:00 on the current day; and the daily air pressure difference is the highest air pressure of the day minus the lowest.
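The derived quantities defined above (the change relative to the previous day, the daily mean over the four observation hours, and the daily pressure difference) can be sketched as below. The dictionary layout and the feature names such as `temp_change_08` are assumptions made for the example.

```python
from statistics import fmean

OBSERVATION_HOURS = (2, 8, 14, 20)


def derive_temperature_features(prev_day, today):
    """Build temperature features from hour -> temperature dicts for two days."""
    features = {f"temp_{h:02d}": today[h] for h in OBSERVATION_HOURS}
    # Change at each hour: previous day's value minus the current day's value.
    for h in OBSERVATION_HOURS:
        features[f"temp_change_{h:02d}"] = prev_day[h] - today[h]
    # Daily mean: average of the four observation hours.
    features["temp_daily_mean"] = fmean([today[h] for h in OBSERVATION_HOURS])
    return features


def daily_pressure_difference(pressures):
    """Highest air pressure of the day minus the lowest."""
    return max(pressures) - min(pressures)
```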

S103, preprocess the daily historical meteorological data set.

In one embodiment, as shown in Fig. 3, preprocessing the daily historical meteorological data set includes S301-S302.

S301, filter the feature attributes of the daily historical meteorological data set using the correlation coefficient method.

A feature attribute of the daily historical meteorological data set is a specific meteorological factor, such as the temperature at 02:00, the daily mean temperature or the daily maximum temperature. The correlation coefficient between each feature attribute in the daily historical meteorological data set and the pollutant concentration to be predicted is calculated, and the feature attributes are filtered according to it. For example, using the Pearson correlation coefficient with 0.2 as the threshold, feature attributes whose coefficient is below the threshold are filtered out, which gives a preliminary filtering of the feature attributes. Some feature attributes in the daily historical meteorological data set have little or no influence on the pollutant concentration to be predicted; these are filtered out, and the feature attributes with a larger influence are retained.
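The correlation filter in S301 might look like the sketch below. One assumption is made explicit here: the comparison uses the absolute value of the Pearson coefficient, so that strongly negative correlations are kept; the original text only states that attributes below the 0.2 threshold are removed. The function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0.0 or spread_y == 0.0:
        return 0.0  # a constant column carries no correlation information
    return covariance / (spread_x * spread_y)


def filter_features(features, target, threshold=0.2):
    """Keep feature columns whose |r| with the target reaches the threshold.

    features: dict mapping feature name -> list of values.
    target: list of pollutant concentrations aligned with the feature rows.
    """
    return {
        name: column
        for name, column in features.items()
        if abs(pearson(column, target)) >= threshold
    }
```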

S302, normalize the filtered daily historical meteorological data set according to the normalization formula.

Because different units of measurement give the filtered daily historical meteorological data different scales, the meteorological data must be converted to a common scale. The filtered daily historical meteorological data set is therefore normalized, for which the min-max standardization method may be used. The corresponding formula is: Y* = 0.1 + 0.8 × (Y_i - Y_min) / (Y_max - Y_min), where Y_i is the value of the i-th feature attribute, Y_max is the maximum of that feature attribute, Y_min is the minimum of that feature attribute, and Y* is the normalized value of that feature attribute.
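The normalization formula maps each feature into the interval [0.1, 0.9]. A minimal sketch follows; the function name `normalise` and the handling of a constant column (mapped to the midpoint 0.5) are assumptions, since the original text does not cover that case.

```python
def normalise(values):
    """Apply Y* = 0.1 + 0.8 * (Y - Y_min) / (Y_max - Y_min) to one feature column."""
    y_min, y_max = min(values), max(values)
    if y_max == y_min:
        # Assumption: a constant column is mapped to the middle of the range.
        return [0.5 for _ in values]
    return [0.1 + 0.8 * (v - y_min) / (y_max - y_min) for v in values]
```

For the column `[0.0, 5.0, 10.0]` this yields values close to 0.1, 0.5 and 0.9.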

S104, take the daily average historical pollutant concentration data set and the preprocessed daily historical meteorological data set as the sample data set and train a random forest model on it, where the random forest model includes multiple decision trees and each decision tree is implemented with a multilayer feedforward neural network.

The random forest algorithm is an ensemble learning algorithm. By building multiple decision trees and learning from them as an ensemble, it improves the accuracy and generalization of prediction, effectively avoids the overfitting problem of a single algorithm, and improves tolerance to noise and abnormal values in the air pollutant monitoring data. The algorithm selects feature attributes at random, which avoids the problem of unreasonable feature selection, and finally yields the prediction of regional air pollutant concentration. The prediction model of each single tree is implemented with a multilayer feedforward neural network. Here the multilayer feedforward neural network refers to a BP (back-propagation) neural network, and the term BP neural network is used hereafter.

Specifically, the parameters of the random forest algorithm are set, such as the number P of trees in the random forest, the number N of samples drawn with replacement for each decision tree, and the number k of feature attributes extracted from each sample. For each decision tree: use Bootstrap sampling to randomly select N samples with replacement from the normalized sample data set to generate a training set; randomly select k features from the feature attribute set of the sample data, and extract the feature-subspace training set from the training set according to the k selected features; initialize the BP neural network and input the feature-subspace training set to train the single decision tree. These operations are repeated for every decision tree to complete the training of all decision trees.

The number k of feature attributes extracted from each sample is generated at random in the interval [4, m], where m is the dimension of the feature attribute set of the sample data. P may be, for example, 100. Each single decision tree in the random forest model is implemented with the BP neural network algorithm: the number of input-layer nodes of the BP neural network is the generated parameter k, the output layer has 1 node, and the hidden-layer size is determined by a formula that is not reproduced in this text, in which a takes a value in [1, 10]. When initializing the BP neural network, its termination condition, the error allowed in each iteration, the maximum supervision count and so on must be set. The maximum number of iterations may be set as the termination condition of the BP neural network; it may be, for example, 2000.
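The per-tree sampling described above (Bootstrap rows plus a random feature subspace with k drawn from [4, m]) can be sketched as below. Training of the BP neural network itself is left out; the function names, the `(feature_vector, target)` sample layout and the fixed seed are assumptions for the example.

```python
import random


def draw_tree_training_set(samples, n_draws, rng, min_features=4):
    """Draw one tree's training data.

    samples: list of (feature_vector, target) pairs, all vectors of length m.
    Returns (feature_indices, subset) where subset holds n_draws rows sampled
    with replacement and restricted to the selected feature indices.
    """
    m = len(samples[0][0])
    k = rng.randint(min(min_features, m), m)       # k is random in [4, m]
    feature_indices = sorted(rng.sample(range(m), k))
    rows = [rng.choice(samples) for _ in range(n_draws)]  # with replacement
    subset = [([x[j] for j in feature_indices], y) for x, y in rows]
    return feature_indices, subset


def build_forest_inputs(samples, n_trees, n_draws, seed=0):
    """Prepare the (feature_indices, training_subset) pair for each of P trees."""
    rng = random.Random(seed)
    return [draw_tree_training_set(samples, n_draws, rng) for _ in range(n_trees)]
```

Each returned pair would then be used to train one base learner whose input layer has `len(feature_indices)` nodes; the same indices must be applied to the inputs at prediction time.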

Steps S101-S104 above constitute the training process of the random forest model. Calculating the daily average historical pollutant concentration data set and preprocessing the daily historical meteorological data improve the reliability and accuracy of the sample data. Training the random forest model on these more accurate and reliable sample data improves the generalization ability of the model.

Once the training of the random forest model is complete, the trained model can be used to predict air pollutant concentration.

S105, determine the forecast meteorological data for the preset number of future days, as forecast on the current day.

The forecast meteorological data may be obtained from the output of a weather prediction model such as WRF (the Weather Research and Forecasting Model). For the initial and boundary conditions of the WRF model, the global meteorological data of the Global Forecast System (GFS) of the US National Center for Atmospheric Research (NCAR) and the US National Centers for Environmental Prediction (NCEP) may be used; for terrain and land-surface data, the USGS 30s global terrain data and the MODIS land-surface classification data may be used. The forecast meteorological data may also be obtained from other weather prediction models. The preset number of days may be 1 day, 7 days, and so on.

S106, preprocess the forecast meteorological data.

Specifically, the feature attributes of the forecast meteorological data are filtered using the correlation coefficient method, and the filtered forecast meteorological data are normalized according to the normalization formula.

Note that the forecast meteorological data are preprocessed in the same way as the daily historical meteorological data set, so that the forecast meteorological data have the same data attributes. For the specific preprocessing of the forecast meteorological data, refer to the preprocessing of the daily historical meteorological data set; it is not repeated here.

S107, based on the preprocessed forecast meteorological data and the pollutant concentration data monitored on the current day, use the trained random forest model to predict the pollutant concentration data of the region to be predicted for the preset number of future days.

In one embodiment, as shown in Fig. 4, step S107 includes S401-S404.

S401, obtain the predicted pollutant concentration data of the most recent L days calculated with the trained random forest model.

Specifically, for a given day, the trained random forest model takes the meteorological data for that day and the actual pollutant concentration data calculated for the previous day, that is, the daily average historical pollutant concentration data of the previous day, and predicts the pollutant concentration from that day onward, giving predicted pollutant concentration data for the following day and for the several days after it. Note that the meteorological data for the given day are preprocessed data. In this way the predicted pollutant concentration data of the most recent L days are calculated, with multiple predicted values for each day. L may be 7 or another suitable value.

S402, calculate the prediction error of each decision tree with the error calculation formula, based on the predicted pollutant concentration data of the most recent L days and the actually monitored pollutant concentration data. The error calculation formula itself is not reproduced in this text; its symbols are defined as follows.

MSN denotes the prediction error of each tree, and n denotes the time span of each prediction in days; for example, if the air pollutant concentration for the next three days is predicted on the current day, n is 3. y_ti is the pollutant concentration for the i-th future day predicted on day t with the random forest model, and y_t' is the daily average historical pollutant concentration data calculated for day t from the actually monitored pollutant concentration data. Within the most recent L days the daily prediction time span n is constant, so the number of predicted pollutant concentration values per day depends on n. The error formula takes into account the error for every one of the n future days predicted on a given day: not only the error of the first future day, but also the errors of the second to the n-th future days. This makes the calculated prediction error more accurate, and therefore makes subsequent calculations that depend on the prediction error more accurate.

S403, based on the preprocessed forecast meteorological data and the pollutant concentration data monitored on the current day, predict with the trained random forest model to obtain the prediction results of the multiple decision trees.

The daily average historical pollutant concentration data for the current day are calculated from the pollutant concentration data monitored on the current day; these and the preprocessed forecast meteorological data are fed into the trained random forest model. Each decision tree yields one prediction result, so the multiple decision trees yield multiple prediction results.

S404, select the prediction result of the decision tree with the smallest prediction error as the pollutant concentration data of the region to be predicted for the preset number of future days.
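The selection in S404 can be sketched as follows, assuming the per-tree prediction errors from S402 and the per-tree prediction results from S403 are already available as parallel lists. The function name `select_best_tree` is hypothetical.

```python
def select_best_tree(errors, predictions):
    """Return (index, prediction) of the tree with the smallest prediction error.

    errors: list with one prediction error per decision tree.
    predictions: list with one multi-day prediction (a list) per decision tree.
    """
    if not errors or len(errors) != len(predictions):
        raise ValueError("errors and predictions must be non-empty and aligned")
    best = min(range(len(errors)), key=errors.__getitem__)
    return best, predictions[best]
```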

In another embodiment, as shown in Fig. 5, step S107, that is, using the trained random forest model to predict the pollutant concentration data of the region to be predicted for the preset number of future days based on the preprocessed forecast meteorological data and the pollutant concentration data monitored on the current day, includes S501-S505. The embodiment of Fig. 5 differs from that of Fig. 4 in that step S503 is added and step S505 differs from step S404.

S503, normalize the prediction error of each decision tree to obtain the weight of each decision tree. The weight formula itself is not reproduced in this text; in it, MSN denotes the prediction error of each tree, P the number of decision trees, v_i the transition factor, and r_i the weight of each decision tree.

Region to be predicted is calculated according to the weight of every decision tree, the corresponding prediction result of every decision tree in S505 Following preset number of days pollutant concentration data.Specifically, the weight of every decision tree is multiplied by every decision tree future in advance If corresponding prediction result obtains pollutant concentration data daily in the following preset number of days in region to be predicted in number of days.
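The weight formula itself is not reproduced in this text, so the following Python sketch only illustrates one plausible reading of steps S503 and S505. It assumes an inverse-error transition factor, v_i = 1 / MSN_i, normalized over the P trees as r_i = v_i / (v_1 + ... + v_P), and it assumes the weighted per-tree results are summed for each day. Both choices, and the function names, are assumptions made for illustration rather than the patent's stated formula.

```python
def tree_weights(errors):
    """Turn per-tree prediction errors into weights that sum to 1.

    Assumed transition factor: v_i = 1 / MSN_i, so a smaller error gives a
    larger weight. Requires strictly positive errors.
    """
    factors = [1.0 / error for error in errors]
    total = sum(factors)
    return [factor / total for factor in factors]


def combine_forecasts(weights, forecasts):
    """Weighted combination of per-tree forecasts, one value per future day.

    forecasts[k][d] is the value tree k predicts for future day d.
    """
    num_days = len(forecasts[0])
    return [
        sum(weight * forecast[d] for weight, forecast in zip(weights, forecasts))
        for d in range(num_days)
    ]


weights = tree_weights([1.0, 3.0])
combined = combine_forecasts(weights, [[10.0, 20.0], [30.0, 40.0]])
```

Under this assumption a tree with one third of another tree's error receives three times its weight, and the combined forecast always lies between the smallest and largest per-tree values for each day.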

In other embodiments, after the prediction result of each decision tree for the preset number of future days has been calculated, a weighted average of the decision trees' prediction results may also be taken directly to obtain the prediction result for each day within the preset number of future days.

Steps S105-S107 constitute the prediction flow, which predicts the air pollutant concentration data for the preset number of days following the current time. The predicted meteorological data is preprocessed to improve its reliability and its correlation with the air pollutant concentration; at prediction time, the more accurate prediction error then makes the prediction result more accurate and further improves the generalization ability of the random forest model.

It should be noted that the trained random forest model also needs to be retrained periodically so that new data can be fed back into the model, improving both the accuracy of its predictions and its generalization ability.

Fig. 6 is a schematic block diagram of a terminal provided by an embodiment of the present invention. As shown in Fig. 6, the terminal 60 includes a computing unit 601, a first determination unit 602, a first preprocessing unit 603, a training unit 604, a second determination unit 605, a second preprocessing unit 606, and a prediction unit 607.

The computing unit 601 is configured to calculate the daily average historical pollutant concentration dataset from the monitored pollutant concentration dataset of all monitoring points in the region to be predicted for the season corresponding to the current time.

In one embodiment, the computing unit 601 includes a first acquisition unit and an average calculation unit. The first acquisition unit is configured to obtain the monitored pollutant concentration dataset of all monitoring points in the region to be predicted for the season corresponding to the current time, which generally includes pollutant concentration data monitored over many years. The average calculation unit is configured to process the data obtained at multiple time points within each day to obtain the pollutant concentration data corresponding to those time points, average the pollutant concentration data of the multiple time points to obtain the daily average historical pollutant concentration data, and repeat the calculation to obtain the daily average historical pollutant concentration dataset for all dates of the season corresponding to the current time. The dates of the season corresponding to the current time are all dates that fall within the season to which the current time belongs. Processing the regional pollutant concentration data into daily average historical pollutant concentration data reduces the influence of outliers at individual time points within a day.
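The daily averaging step can be sketched in Python as follows. This is a minimal illustration, assuming the regional value for each time point has already been computed; the function name `daily_averages` and the date-keyed dictionary layout are hypothetical.

```python
from statistics import fmean


def daily_averages(timepoint_values):
    """Average the regional values of all time points within each day.

    timepoint_values maps a date string to the list of regional pollutant
    concentrations obtained at the time points of that day.
    """
    return {day: fmean(values) for day, values in sorted(timepoint_values.items())}


averages = daily_averages({
    "2017-12-02": [30.0, 50.0],
    "2017-12-01": [10.0, 20.0, 30.0],
})
```

Averaging over the day's time points is what dilutes a single anomalous reading: one outlier among 24 hourly values shifts the daily mean by only one twenty-fourth of its deviation.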

In one embodiment, as shown in Fig. 7, the average calculation unit 70 includes an outlier processing unit 701, a gridding unit 702, an interpolation unit 703, and a time point calculation unit 704.

The outlier processing unit 701 is configured to process outliers in the monitored pollutant concentration data of all monitoring points at one time point.

Specifically, it is determined whether the monitored pollutant concentration data of a monitoring point contains a missing value or an extreme value. A missing value is replaced with the average value of the neighboring monitoring points of that monitoring point; an extreme value is replaced with the boundary value of the corresponding pollutant concentration data; if there is no monitoring point in the neighborhood, the data of that monitoring point for the current time point is discarded. An extreme value is a value outside the corresponding range. For example, assuming the concentration range of PM2.5 is [0, 500] in micrograms per cubic meter, an extreme value is any value greater than 500 or less than 0, and the boundary values of PM2.5 are 0 and 500. If the monitored PM2.5 value is less than 0, it is replaced with the boundary value 0; if it is greater than 500, it is replaced with the boundary value 500. The neighborhood is the set of all points whose distance from the monitoring point is less than K. K may be 3 km or another suitable value.
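The outlier rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: station positions are planar coordinates in kilometres, missing values are represented by `None`, and extreme values are clamped before missing values are filled (the source does not specify an order). The function name `clean_readings` and the tuple layout are hypothetical.

```python
import math


def clean_readings(stations, low=0.0, high=500.0, radius_km=3.0):
    """Clean one time point of readings.

    stations is a list of (x_km, y_km, value) tuples; value is None when
    the reading is missing. Returns the cleaned list, dropping stations
    whose missing value cannot be filled from any neighbor.
    """
    # Replace extreme values with the nearest boundary of the valid range.
    clamped = []
    for x, y, value in stations:
        if value is not None:
            value = min(max(value, low), high)
        clamped.append((x, y, value))

    cleaned = []
    for i, (x, y, value) in enumerate(clamped):
        if value is None:
            # Neighbors: other stations with a reading, closer than radius_km.
            neighbors = [
                other
                for j, (ox, oy, other) in enumerate(clamped)
                if j != i and other is not None
                and math.hypot(ox - x, oy - y) < radius_km
            ]
            if not neighbors:
                continue  # no neighbor in range: discard this reading
            value = sum(neighbors) / len(neighbors)
        cleaned.append((x, y, value))
    return cleaned


readings = [(0.0, 0.0, 600.0), (1.0, 0.0, None), (2.0, 0.0, -5.0), (10.0, 10.0, None)]
cleaned = clean_readings(readings)
```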

The gridding unit 702 is configured to grid the region to be predicted to obtain multiple grid cells.

The interpolation unit 703 is configured to calculate, using an interpolation algorithm, the pollutant concentration data of the grid areas corresponding to the multiple grid cells. It should be understood that not every grid cell contains a monitoring point; for grid cells without a monitoring point, the pollutant concentration data of the corresponding grid area is calculated using the interpolation algorithm, so that the grid area of every grid cell has pollutant concentration data. The interpolation algorithm may be the Kriging interpolation algorithm or another suitable interpolation algorithm.

The time point calculation unit 704 is configured to calculate the average of the pollutant concentration data of the grid areas corresponding to the multiple grid cells, and to use this average as the pollutant concentration data of the region to be predicted at the current time point.

The average calculation unit 70 obtains the monitored pollutant concentration data of all monitoring points at one time point and processes its outliers, divides the region to be predicted into grid cells, calculates the pollutant concentration data of the area corresponding to each grid cell with the interpolation algorithm, and averages the resulting values to obtain the pollutant concentration data of that time point. This improves the accuracy of the pollutant concentration data of the corresponding time point.
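The gridding, interpolation, and averaging steps can be sketched in Python as follows. Note the substitution: the source names Kriging as the interpolation algorithm, whereas this sketch uses inverse-distance weighting, a simpler technique that falls under "another suitable interpolation algorithm". For brevity it also interpolates at every cell centre rather than only in cells lacking a monitoring point. The function names, the rectangular region, and the `power` parameter are assumptions.

```python
import math


def idw(x, y, stations, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    stations is a list of (x_km, y_km, value) tuples with valid values.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(sx - x, sy - y)
        if distance == 0.0:
            return value  # exactly on a station: use its reading
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def regional_value(stations, x_min, x_max, y_min, y_max, nx, ny):
    """Mean of the interpolated values at the centres of an nx-by-ny grid."""
    dx = (x_max - x_min) / nx
    dy = (y_max - y_min) / ny
    cell_values = [
        idw(x_min + (i + 0.5) * dx, y_min + (j + 0.5) * dy, stations)
        for i in range(nx)
        for j in range(ny)
    ]
    return sum(cell_values) / len(cell_values)


stations = [(0.0, 5.0, 10.0), (10.0, 5.0, 30.0)]
value = regional_value(stations, 0.0, 10.0, 0.0, 10.0, 2, 2)
```

Averaging over grid cells rather than over stations keeps a dense cluster of monitoring points from dominating the regional value.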

The first determination unit 602 is configured to determine the daily historical meteorological dataset of the region to be predicted for the season corresponding to the current time.

The first preprocessing unit 603 is configured to preprocess the daily historical meteorological dataset.

In one embodiment, as shown in Fig. 8, the first preprocessing unit 603 includes a filter unit 801 and a normalization unit 802.

The filter unit 801 is configured to filter the feature attributes of the daily historical meteorological dataset using the correlation coefficient method. A feature attribute of the daily historical meteorological dataset can be understood as a specific meteorological factor, such as the temperature at 2 o'clock, the daily mean temperature, or the daily maximum temperature. The correlation coefficient between each feature attribute in the daily historical meteorological dataset and the pollutant concentration to be predicted is calculated, and the feature attributes are filtered according to this coefficient. For example, using the Pearson correlation coefficient with 0.2 as the threshold, feature attributes whose coefficient is below the threshold are filtered out, which achieves a preliminary filtering of the feature attributes. It can be understood that some feature attributes in the daily historical meteorological dataset have little or no influence on the pollutant concentration to be predicted; these can be filtered out, retaining the feature attributes that have a larger influence on the pollutant concentration.
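The correlation-based filtering can be sketched in Python as follows. This is a minimal illustration using the Pearson coefficient and the 0.2 threshold from the text. Comparing the absolute value of the coefficient against the threshold is an assumption (so that strongly negative correlations are kept); the function names and the column-dictionary layout are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # a constant column carries no linear information
    return covariance / math.sqrt(var_x * var_y)


def filter_features(features, target, threshold=0.2):
    """Keep feature columns whose |correlation| with the target >= threshold."""
    return {
        name: column
        for name, column in features.items()
        if abs(pearson(column, target)) >= threshold
    }


pollutant = [1.0, 2.0, 3.0, 4.0]
kept = filter_features(
    {
        "daily_mean_temp": [2.0, 4.0, 6.0, 8.0],
        "wind_speed": [4.0, 3.0, 2.0, 1.0],
        "unrelated": [1.0, -1.0, -1.0, 1.0],
    },
    pollutant,
)
```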

The normalization unit 802 is configured to normalize the filtered daily historical meteorological dataset according to the normalization formula. Because different units of measurement (dimensions) lead to inconsistent scales, the meteorological data obtained in different dimensions must be converted to the same dimension. The filtered daily historical meteorological dataset can therefore be normalized, for example using the min-max standardization method. The corresponding formula is $Y^* = 0.1 + 0.8\,(Y_i - Y_{\min})/(Y_{\max} - Y_{\min})$, where $Y_i$ is the value of the i-th feature attribute, $Y_{\max}$ is the maximum value of that feature attribute, $Y_{\min}$ is the minimum value of that feature attribute, and $Y^*$ is the normalized value of that feature attribute.
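The normalization formula can be sketched in Python as follows. The function name `normalize` is hypothetical, and mapping a constant column to 0.5 is an assumption added to avoid division by zero; the source does not address that case.

```python
def normalize(values):
    """Scale one feature column into [0.1, 0.9] with the min-max formula.

    Y* = 0.1 + 0.8 * (Y_i - Y_min) / (Y_max - Y_min)
    """
    y_min = min(values)
    y_max = max(values)
    if y_max == y_min:
        # Assumption: a constant column is mapped to the midpoint.
        return [0.5 for _ in values]
    return [0.1 + 0.8 * (value - y_min) / (y_max - y_min) for value in values]


scaled = normalize([0.0, 5.0, 10.0])
```

The minimum of a column maps to 0.1 and the maximum to 0.9, which keeps the inputs away from the saturated ends of a sigmoid-type activation in the neural-network trees.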

The training unit 604 is configured to train a random forest model using the daily average historical pollutant concentration dataset and the preprocessed daily historical meteorological dataset as the sample dataset, wherein the random forest model includes multiple decision trees and each decision tree is implemented using a BP (back-propagation) neural network.

Through the calculation of the daily average historical pollutant concentration dataset and the preprocessing of the daily historical meteorological data, the computing unit 601, first determination unit 602, first preprocessing unit 603, and training unit 604 improve the reliability and accuracy of the sample data. Training the random forest model on this more accurate and reliable sample data improves the generalization ability of the model.

The second determination unit 605 is configured to determine the predicted meteorological data, forecast on the current day, for the preset number of future days. The predicted meteorological data may be obtained from the output of a weather prediction model such as WRF (the Weather Research and Forecasting Model). The preset number of days may be 1 day, 7 days, and so on.

The second preprocessing unit 606 is configured to preprocess the predicted meteorological data. Specifically, the second preprocessing unit 606 is configured to filter the feature attributes of the predicted meteorological data using the correlation coefficient method, and to normalize the filtered predicted meteorological data according to the normalization formula. It should be noted that the predicted meteorological data is preprocessed in the same way as the daily historical meteorological dataset, so that the predicted meteorological data has the same data attributes. For the specific manner of preprocessing the predicted meteorological data, refer to the preprocessing of the daily historical meteorological dataset; details are not repeated here.
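One way to keep the two preprocessing passes consistent is sketched below in Python. It assumes the feature attributes retained for the historical data, and their historical minimum and maximum values, are reused when scaling the predicted meteorological data. The source only states that the preprocessing is consistent, so this reuse, the function names, and the data layout are assumptions.

```python
def fit_preprocessor(history, kept_names):
    """Record (min, max) of each retained feature in the historical data."""
    return {name: (min(history[name]), max(history[name])) for name in kept_names}


def apply_preprocessor(forecast, params):
    """Select the retained features and scale them with the stored ranges."""
    scaled = {}
    for name, (low, high) in params.items():
        span = high - low
        scaled[name] = [
            0.1 + 0.8 * (value - low) / span if span else 0.5
            for value in forecast[name]
        ]
    return scaled


params = fit_preprocessor(
    {"daily_mean_temp": [0.0, 10.0, 20.0], "wind_speed": [1.0, 2.0, 3.0]},
    ["daily_mean_temp"],
)
forecast_scaled = apply_preprocessor(
    {"daily_mean_temp": [10.0, 20.0], "wind_speed": [5.0, 5.0]},
    params,
)
```

With this design a forecast value outside the historical range maps outside [0.1, 0.9], which makes such out-of-range inputs visible instead of silently rescaling them.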

The prediction unit 607 is configured to predict the pollutant concentration data of the region to be predicted for the preset number of future days using the trained random forest model, according to the preprocessed predicted meteorological data and the pollutant concentration data monitored on the current day.

In one embodiment, as shown in Fig. 9, the prediction unit 607 includes a second acquisition unit 901, a first error calculation unit 902, a first prediction result unit 903, and a first result determination unit 904.

The second acquisition unit 901 is configured to obtain the predicted pollutant concentration data of the most recent L days calculated using the trained random forest model. Specifically, based on the meteorological data of a given day and the actual pollutant concentration data calculated for the previous day, i.e. the previous day's daily average historical pollutant concentration data, the trained random forest model predicts the pollutant concentration of that day, and then the predicted pollutant concentration data of the following day and of several days after it. Note that the meteorological data of the given day referred to here is preprocessed data. In this way the predicted pollutant concentration data of the most recent L days is calculated, with multiple predicted values for each day. L may be 7 or another suitable value.

The first error calculation unit 902 is configured to calculate the prediction error of each decision tree using the error calculation formula, according to the predicted pollutant concentration data of the most recent L days and the actually monitored pollutant concentration data, wherein the error calculation formula is:

$$MSN = \sqrt{\frac{1}{L \cdot n} \sum_{t=1}^{L} \sum_{i=1}^{n} \left( y_{ti} - y'_{t} \right)^{2}}$$

The first prediction result unit 903 is configured to perform prediction using the trained random forest model, according to the preprocessed predicted meteorological data and the pollutant concentration data monitored on the current day, to obtain the prediction results of the multiple decision trees. Prediction uses the preprocessed predicted meteorological data and the current day's average pollutant concentration data calculated from the pollutant concentration data monitored on the current day; each decision tree yields one prediction result, so the multiple decision trees yield multiple prediction results.

The first result determination unit 904 is configured to select the prediction result of the decision tree with the smallest prediction error as the pollutant concentration data of the region to be predicted for the preset number of future days.

In another embodiment, as shown in Fig. 10, the prediction unit 607 includes a third acquisition unit 101, a second error calculation unit 102, a weight calculation unit 103, a second prediction result unit 104, and a second result determination unit 105. The embodiment shown in Fig. 10 differs from the embodiment shown in Fig. 9 in that the weight calculation unit 103 is added and the second result determination unit 105 is different. For the third acquisition unit 101, the second error calculation unit 102, and the second prediction result unit 104, refer to the corresponding description of the embodiment shown in Fig. 9.

The weight calculation unit 103 is configured to normalize the prediction error of each decision tree to obtain the weight of each decision tree. Specifically, the weight is calculated according to the following formula:

The second result determination unit 105 is configured to calculate the pollutant concentration data of the region to be predicted for the preset number of future days according to the weight of each decision tree and the prediction result of each decision tree. Specifically, the weight of each decision tree is multiplied by that tree's prediction result for each day within the preset number of future days to obtain the daily pollutant concentration data of the region to be predicted for the preset number of future days.

The second determination unit 605, the second preprocessing unit 606, and the prediction unit 607 together predict the air pollutant concentration data for the preset number of days following the current time. The predicted meteorological data is preprocessed to improve its reliability and its correlation with the air pollutant concentration; at prediction time, the more accurate prediction error then makes the prediction result more accurate and further improves the generalization ability of the random forest model.

Fig. 11 is a schematic block diagram of a terminal provided by another embodiment of the present invention. The terminal 110 includes one or more input devices 111, one or more output devices 112, one or more memories 113, and one or more processors 114, which are connected via a bus 115. The memory 113 is used to store a computer program that includes program instructions, and the processor 114 is configured to call the program instructions stored in the memory 113. The processor 114 is configured to perform:

Calculating the daily average historical pollutant concentration dataset from the monitored pollutant concentration dataset of all monitoring points in the region to be predicted for the season corresponding to the current time; determining the daily historical meteorological dataset of the region to be predicted for the season corresponding to the current time; preprocessing the daily historical meteorological dataset; training a random forest model using the daily average historical pollutant concentration dataset and the preprocessed daily historical meteorological dataset as the sample dataset, wherein the random forest model includes multiple decision trees and each decision tree is implemented using a multilayer feedforward neural network; determining the predicted meteorological data, forecast on the current day, for the preset number of future days; preprocessing the predicted meteorological data; and predicting the pollutant concentration data of the region to be predicted for the preset number of future days using the trained random forest model, according to the preprocessed predicted meteorological data and the pollutant concentration data monitored on the current day.

In one embodiment, when calculating the daily average historical pollutant concentration dataset from the monitored pollutant concentration dataset of all monitoring points in the region to be predicted for the season corresponding to the current time, the processor 114 specifically performs:

Obtaining the monitored pollutant concentration dataset of all monitoring points in the region to be predicted for the season corresponding to the current time; processing the data obtained at multiple time points within each day to obtain the pollutant concentration data corresponding to those time points, averaging the pollutant concentration data of the multiple time points to obtain the daily average historical pollutant concentration data, and repeating the calculation to obtain the daily average historical pollutant concentration dataset for all dates of the season corresponding to the current time.

In one embodiment, when processing the data obtained at one time point to obtain the pollutant concentration data corresponding to that time point, the processor 114 specifically performs:

Processing outliers in the monitored pollutant concentration data of all monitoring points at the time point; gridding the region to be predicted to obtain multiple grid cells; calculating, using an interpolation algorithm, the pollutant concentration data of the grid areas corresponding to the multiple grid cells; and calculating the average of the pollutant concentration data of those grid areas to obtain the pollutant concentration data of the region to be predicted at the current time point.

In one embodiment, when preprocessing the daily historical meteorological dataset, the processor 114 specifically performs:

Filtering the feature attributes of the daily historical meteorological dataset using the correlation coefficient method; and normalizing the filtered daily historical meteorological dataset according to the normalization formula $Y^* = 0.1 + 0.8\,(Y_i - Y_{\min})/(Y_{\max} - Y_{\min})$, where $Y_i$ is the value of the i-th feature attribute, $Y_{\max}$ is the maximum value of that feature attribute, $Y_{\min}$ is the minimum value of that feature attribute, and $Y^*$ is the normalized value of that feature attribute.

In one embodiment, when predicting the pollutant concentration data of the region to be predicted for the preset number of future days using the trained random forest model, according to the preprocessed predicted meteorological data and the pollutant concentration data monitored on the current day, the processor 114 specifically performs:

Obtaining the predicted pollutant concentration data of the most recent L days calculated using the trained random forest model; and calculating the prediction error of each decision tree using the error calculation formula, according to the predicted pollutant concentration data of the most recent L days and the actually monitored pollutant concentration data, wherein the error calculation formula is:

$$MSN = \sqrt{\frac{1}{L \cdot n} \sum_{t=1}^{L} \sum_{i=1}^{n} \left( y_{ti} - y'_{t} \right)^{2}}$$

where n is the time span of each prediction, $y_{ti}$ is the pollutant concentration of the i-th future day predicted on day t using the random forest model, and $y'_t$ is the daily average historical pollutant concentration of day t calculated from the actually monitored pollutant concentration data; performing prediction using the trained random forest model, according to the preprocessed predicted meteorological data and the pollutant concentration data monitored on the current day, to obtain the prediction results of the multiple decision trees; and selecting the prediction result of the decision tree with the smallest prediction error as the pollutant concentration data of the region to be predicted for the preset number of future days.

In another embodiment, the processor 114 specifically performs: obtaining the predicted pollutant concentration data of the most recent L days calculated using the trained random forest model; calculating the prediction error of each decision tree according to the predicted pollutant concentration data of the most recent L days and the actually monitored pollutant concentration data; normalizing the prediction error of each decision tree to obtain the weight of each decision tree; performing prediction using the trained random forest model, according to the preprocessed predicted meteorological data and the pollutant concentration data monitored on the current day, to obtain the prediction results of the multiple decision trees; and calculating the pollutant concentration data of the region to be predicted for the preset number of future days according to the weight and the prediction result of each decision tree.

In one embodiment, the processor 114 is further configured to perform: determining, by means of a weather prediction model, the predicted meteorological data, forecast on the current day, for the preset number of future days.

It should be understood that, in the embodiments of the present invention, the processor 114 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.

The input device 111 may include a keyboard, a mouse, a voice input device, a touch input device, and the like. The output device 112 may include a display, a display screen, a touch screen, and the like.

The memory 113 may include read-only memory and random access memory, and provides instructions and data to the processor 114. A part of the memory 113 may also include non-volatile random access memory. For example, the memory 113 may also store device type information.

In a specific implementation, the input device 111, output device 112, and processor 114 described in the embodiments of the present invention can execute the implementations described in all embodiments of the regional air pollutant concentration prediction method provided by the embodiments of the present invention, and can also execute the implementation of the terminal described in the embodiments of the present invention; details are not repeated here.

Another embodiment of the present invention provides a computer-readable storage medium. The computer-readable storage medium stores a computer program that includes program instructions, and the program instructions, when executed by a processor, implement the following steps:

In one embodiment, when calculating the daily average historical pollutant concentration dataset from the monitored pollutant concentration dataset of all monitoring points in the region to be predicted for the season corresponding to the current time, the processor specifically implements:

In one embodiment, when processing the data obtained at one time point to obtain the pollutant concentration data corresponding to that time point, the processor specifically implements:

In one embodiment, when preprocessing the daily historical meteorological dataset, the processor specifically implements:

In one embodiment, when predicting the pollutant concentration data of the region to be predicted for the preset number of future days using the trained random forest model, according to the preprocessed predicted meteorological data and the pollutant concentration data monitored on the current day, the processor specifically implements:

In one embodiment, when predicting the pollutant concentration data of the region to be predicted for the preset number of future days using the trained random forest model, according to the preprocessed predicted meteorological data and the pollutant concentration data monitored on the current day, the processor specifically performs:

In one embodiment, the processor further performs: determining, by means of a weather prediction model, the predicted meteorological data, forecast on the current day, for the preset number of future days.

The computer-readable storage medium may be an internal storage unit of the terminal described in any of the foregoing embodiments, such as a hard disk or memory of the terminal. The computer-readable storage medium may also be an external storage device of the terminal, such as a plug-in hard disk, a Smart Media Card (SMC), or a Secure Digital (SD) card fitted to the terminal. Further, the computer-readable storage medium may include both an internal storage unit of the terminal and an external storage device. In the several embodiments provided in this application, it should be understood that the disclosed terminal and method may be implemented in other ways. For example, the terminal embodiment described above is merely illustrative; the division into units is only a division by logical function, and other divisions are possible in actual implementation.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the terminal and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed by the present invention, and these modifications or replacements shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A regional air pollutant concentration prediction method, characterized in that the method comprises:

calculating a daily average historical pollutant concentration dataset from the monitored pollutant concentration dataset of all monitoring points in the region to be predicted for the season corresponding to the current time;

preprocessing a daily historical meteorological dataset;

training a random forest model using the daily average historical pollutant concentration dataset and the preprocessed daily historical meteorological dataset as the sample dataset, wherein the random forest model includes multiple decision trees and each decision tree is implemented using a multilayer feedforward neural network;

preprocessing predicted meteorological data;

predicting the pollutant concentration data of the region to be predicted for a preset number of future days using the trained random forest model, according to the preprocessed predicted meteorological data and the pollutant concentration data monitored on the current day.

2. The method according to claim 1, characterized in that calculating the daily average historical pollutant concentration dataset from the monitored pollutant concentration dataset of all monitoring points in the region to be predicted for the season corresponding to the current time comprises:

obtaining the monitored pollutant concentration dataset of all monitoring points in the region to be predicted for the season corresponding to the current time;

processing the data obtained at multiple time points within each day to obtain the pollutant concentration data corresponding to those time points, averaging the pollutant concentration data of the multiple time points to obtain the daily average historical pollutant concentration data, and repeating the calculation to obtain the daily average historical pollutant concentration dataset for all dates of the season corresponding to the current time.

3. The method according to claim 2, characterized in that processing the data obtained at one time point to obtain the pollutant concentration data corresponding to that time point comprises:

processing outliers in the monitored pollutant concentration data of all monitoring points at the time point;

gridding the region to be predicted to obtain multiple grid cells;

calculating, using an interpolation algorithm, the pollutant concentration data of the grid areas corresponding to the multiple grid cells;

calculating the average of the pollutant concentration data of the grid areas corresponding to the multiple grid cells to obtain the pollutant concentration data of the region to be predicted at the current time point.

4. The method according to claim 1, characterized in that preprocessing the daily historical meteorological dataset comprises:

filtering the feature attributes of the daily historical meteorological dataset using the correlation coefficient method;

normalizing the filtered daily historical meteorological dataset according to the normalization formula $Y^* = 0.1 + 0.8\,(Y_i - Y_{\min})/(Y_{\max} - Y_{\min})$, where $Y_i$ is the value of the i-th feature attribute, $Y_{\max}$ is the maximum value of the feature attribute, $Y_{\min}$ is the minimum value of the feature attribute, and $Y^*$ is the normalized value of the feature attribute.

5. The method according to claim 1, characterized in that predicting the pollutant concentration data of the region to be predicted for the preset number of future days using the trained random forest model, according to the pre-processed predicted meteorological data and the pollutant concentration data monitored on the day of the current time, comprises:

Obtaining the predicted pollutant concentration data of the most recent L days calculated using the trained random forest model;

Calculating the prediction error of each decision tree using an error calculation formula, according to the predicted pollutant concentration data of the most recent L days and the actually monitored pollutant concentration data, wherein the error calculation formula is:

$$MSN = \sqrt{\frac{1}{L \cdot n} \sum_{t=1}^{L} \sum_{i=1}^{n} \left( y_{ti} - y_t' \right)^2}$$

Wherein n denotes the time span of each prediction, y_ti is the pollutant concentration of the i-th future day predicted on day t using the random forest model, and y_t' is the daily-average historical pollutant concentration data calculated for day t from the actually monitored pollutant concentration data;

Performing prediction using the trained random forest model according to the pre-processed predicted meteorological data and the pollutant concentration data monitored on the day of the current time, to obtain the prediction results of the multiple decision trees;

Selecting the prediction result of the decision tree with a small prediction error as the pollutant concentration data of the region to be predicted for the preset number of future days.
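The error calculation and tree selection in this claim can be sketched as below. The sketch transcribes the MSN formula literally, comparing every y_ti of day t with that day's y_t', and it assumes that "small prediction error" means the smallest error among the trees. The nested-list layout and the function names are hypothetical.

```python
import math


def msn(predicted, observed):
    """Error of one decision tree over the most recent L days.

    predicted[t][i]: i-th future-day prediction made on day t (L rows, n columns).
    observed[t]: daily-average monitored concentration for day t.
    """
    L = len(predicted)
    n = len(predicted[0])
    total = sum((predicted[t][i] - observed[t]) ** 2
                for t in range(L) for i in range(n))
    return math.sqrt(total / (L * n))


def select_best_tree(recent_predictions, observed, new_forecasts):
    """Return (index, forecast) of the tree with the smallest recent error.

    recent_predictions[k]: the L x n prediction history of tree k.
    new_forecasts[k]: the forecast of tree k for the coming preset days.
    """
    errors = [msn(history, observed) for history in recent_predictions]
    best = min(range(len(errors)), key=errors.__getitem__)
    return best, new_forecasts[best]


# Hypothetical example with two trees, L = 2 recent days and n = 2 days ahead.
observed = [10.0, 20.0]
recent_predictions = [
    [[10.0, 12.0], [20.0, 22.0]],  # tree 0
    [[11.0, 11.0], [21.0, 21.0]],  # tree 1
]
new_forecasts = [[30.0, 31.0], [29.0, 30.0]]
best_index, chosen = select_best_tree(recent_predictions, observed, new_forecasts)
```

In this example tree 1 is selected, because its recent error of 1.0 is below the error of tree 0, which is the square root of 2.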

6. The method according to claim 1, characterized in that predicting the pollutant concentration data of the region to be predicted for the preset number of future days using the trained random forest model, according to the pre-processed predicted meteorological data and the pollutant concentration data monitored on the day of the current time, comprises:

Calculating the prediction error of each decision tree according to the predicted pollutant concentration data of the most recent L days and the actually monitored pollutant concentration data;

Normalizing the prediction error of each decision tree to obtain the weight of each decision tree;

Calculating the pollutant concentration data of the region to be predicted for the preset number of future days according to the weight of each decision tree and the prediction result corresponding to each decision tree.
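The weighting scheme of this claim can be sketched as below. The claim says only that the errors are normalized to obtain weights and does not give the mapping. This sketch therefore assumes inverse-error weights scaled to sum to 1, so that trees with smaller recent error contribute more. That choice, the small `eps` guard, and the function names are assumptions rather than the patented formula.

```python
def tree_weights(errors, eps=1e-12):
    """Turn per-tree errors into weights that sum to 1.

    Assumed mapping: weight proportional to 1 / error. eps avoids division
    by zero when a tree has a recent error of exactly 0.
    """
    inverse = [1.0 / (e + eps) for e in errors]
    total = sum(inverse)
    return [w / total for w in inverse]


def weighted_forecast(errors, forecasts):
    """Combine the per-tree forecasts day by day using the error-based weights.

    forecasts[k][d]: forecast of tree k for future day d.
    """
    weights = tree_weights(errors)
    n_days = len(forecasts[0])
    return [sum(w * f[d] for w, f in zip(weights, forecasts))
            for d in range(n_days)]


# Hypothetical example: two trees, two future days.
errors = [1.0, 3.0]
forecasts = [[10.0, 20.0], [30.0, 40.0]]
weights = tree_weights(errors)
combined = weighted_forecast(errors, forecasts)
```

With errors of 1.0 and 3.0 the weights come out at roughly 0.75 and 0.25, so the combined forecast sits closer to the more accurate tree.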

7. The method according to claim 1, characterized in that the predicted meteorological data for the preset number of future days, forecast on the day of the current time, is determined by a weather forecast model.

8. A terminal, characterized in that the terminal comprises units for performing the method according to any one of claims 1-7.

9. A terminal, characterized by comprising a processor, an input device, an output device and a memory, wherein the processor, the input device, the output device and the memory are connected to one another, the memory is configured to store a computer program, the computer program comprises program instructions, and the processor is configured to invoke the program instructions to perform the method according to any one of claims 1-7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, the computer program comprises program instructions, and the program instructions, when executed by a processor, cause the processor to perform the method according to any one of claims 1-7.