CN113011455B - Air quality prediction SVM model construction method - Google Patents

Info

Publication number
CN113011455B
CN113011455B CN202110140388.5A CN202110140388A CN113011455B CN 113011455 B CN113011455 B CN 113011455B CN 202110140388 A CN202110140388 A CN 202110140388A CN 113011455 B CN113011455 B CN 113011455B
Authority
CN
China
Prior art keywords
data
pollution source
air quality
model
emission
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110140388.5A
Other languages
Chinese (zh)
Other versions
CN113011455A (en)
Inventor
宋国君
刘帅
何伟
张波
宋天一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shuhuitong Information Technology Co ltd
Original Assignee
Beijing Shuhuitong Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shuhuitong Information Technology Co ltd
Priority to CN202110140388.5A
Publication of CN113011455A
Application granted
Publication of CN113011455B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a method for constructing an SVM model for air quality prediction. The method comprises: collecting air quality data, meteorological data, and continuous emission data from pollution sources; processing the collected data with the sumif function; constructing model variables, in which the air quality data are processed into air quality variables with the ln function, the meteorological data are processed into meteorological variables by calculation, and the pollution source emission data are brought in as pollutant emission variables by weighting the pollution sources and calculating a weighted average; and establishing a prediction model with the SVM method, followed by a test run of the model once modeling is complete. By making full use of existing big data and information from the Internet of Things and the Internet, together with new big-data statistical analysis approaches and tools, in the service of urban air quality management decisions, the invention supports innovation in scientific research, the training of senior researchers, doctoral students and grassroots professionals, and provides statistical prediction and diagnosis support for air quality management in heavily polluted areas.

Description

Air quality prediction SVM model construction method
Technical Field
The invention relates to the technical field of air quality prediction, in particular to an air quality prediction SVM model construction method.
Background
Most of the existing literature on air quality prediction builds models with neural network methods. In the choice of explanatory variables, most studies consider only the influence of meteorological factors on monitoring-point concentrations; the influence of pollution source emissions is not studied. Zhou Shuhua (2017) used stepwise regression analysis to establish a statistical prediction model of daily PM2.5 concentration for different seasons in Yibin city and comprehensively analyzed the relation between the PM2.5 concentration and the preceding concentrations of six pollutants. The study also explained the relation between PM2.5 and same-day meteorological elements such as air pressure, air temperature, temperature difference, rainfall, average wind speed and sunshine duration; the relative simulation error was 28.5%. Mo Xianlie (2003) used an artificial neural network with six meteorological factors (wind speed, wind direction, relative humidity, cloud cover, average air temperature and maximum air temperature) as inputs. From two years of daily average O3 data for Dalian city, 365 groups were selected as the training set and 61 groups as the test set to predict the daily average O3 concentration; the average relative error between the measured and predicted concentrations was 21.49%.
Liu Jie (2014) combined a support vector machine with a fuzzy granulation time series. Following the daily periodic variation of PM2.5 in different seasons, features were extracted from the data samples with a triangular membership function and used as input to the support vector machine, and a time series prediction model of PM2.5 mass concentration was established with hourly PM2.5 mass concentration values from monitoring points as sample data. The fitted R² reached 0.94, and the absolute simulation error ranged from 0.2 μg/m³ to 46.85 μg/m³. Sun Baolei (2017) used a BP neural network combined with the mean impact value (MIV) variable screening method, together with monitoring data on the concentrations of six pollutants (SO2, NOx, O3, CO, PM10 and PM2.5) from 5 environmental monitoring points in the Kunming urban area, to establish an air quality prediction model for Kunming city. A total of 694 groups of data from two years were selected as the training set and 350 groups from one year as the test set; the ratio of the standard deviation of the predicted values to that of the measured values was 0.6. However, none of the above studies attempted to incorporate pollution sources into the model, so they offer no guidance for pollution source control and planning.
Disclosure of Invention
Aiming at the technical problems in the related art, the invention provides an air quality prediction SVM model construction method, which can overcome the defects of the prior art method.
In order to achieve the technical purpose, the technical scheme of the invention is realized as follows:
The air quality prediction SVM model construction method comprises collecting air quality data, meteorological data and continuous emission data from pollution sources. Collecting the air quality data includes collecting PM2.5, NOx, SO2, CO and O3 concentration data at each monitoring point; collecting the meteorological data includes collecting air pressure, humidity, wind speed, wind direction and rainfall data from the urban meteorological station; collecting the pollution source data includes collecting particulate matter emission and SO2 emission data.
The air quality data are processed by using the sumif function in Excel to process the PM2.5, NOx, SO2, CO and O3 values at each monitoring point into 24-hour averages. The meteorological data are processed by using the sumif function in Excel to process the air pressure, humidity and wind speed values into 24-hour averages. The pollution source emission data are processed with the pollution source emission data calculation method.
Model variables are then constructed. The air quality data are processed into air quality variables by taking their logarithm with the ln function in Excel. The meteorological variables are formed by first calculating the mean and standard deviation of air pressure, humidity and wind speed with the average function and the std function, and then, in Excel, subtracting the mean from each value and dividing by the standard deviation, which standardizes the air pressure, humidity and wind speed values. The pollution sources are weighted, their weighted average is calculated, and the processed pollution source emission data are brought in as the pollution source variable.
Modeling is performed with the SVM method by calling the libsvm toolkit in matlab. During modeling, a training set and a test set are selected. The svmtrain function is first called to train on the training set, and the established SVM model is stored in model_test. The test set is then tested with model_test by calling the prediction function; the result is stored in accuracy, where the "relative error MSE" used to evaluate the test effect can be found.
A planning model is constructed: the background concentration at zero pollution source emission is brought into the model, part of the samples are temporarily removed from the test set, the weighted average of the pollution sources is calculated, and the weighted-average pollution source emission is placed into the test set.
For the model test run, the model structure parameters, the pollution source weights and a test set sample are provided to the developer; the software automatically runs the model once and is adjusted by comparing the test result with the software output.
The beneficial effects of the invention are as follows: by making full use of existing big data and information from the Internet of Things and the Internet, together with new big-data statistical analysis approaches and tools, in the service of urban air quality management decisions, the invention supports innovation in scientific research, the training of senior researchers, doctoral students and grassroots professionals, and provides statistical prediction and diagnosis support for air quality management in heavily polluted areas.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of an overall technical flow of a statistical predictive diagnosis model study of urban air quality big data according to an embodiment of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which are derived by a person skilled in the art based on the embodiments of the invention, fall within the scope of protection of the invention.
The air quality prediction SVM model construction method comprises collecting air quality data, meteorological data and pollution source data. The air quality data include the PM2.5, NOx, SO2, CO and O3 concentration data of the city monitoring points; the raw data are hourly. The meteorological data include air pressure, humidity, wind speed, wind direction and rainfall data from the urban meteorological station; the raw data are hourly, and the wind direction needs to be available in degrees for later use. The pollution source data are continuous emission data from the city's state-controlled point sources, including particulate matter emission, NOx emission and SO2 emission data; the raw data are hourly.
The collected data are processed in Excel. For the air quality data, the sumif function is used to process the PM2.5, NOx, SO2, CO and O3 values at each monitoring point into 24-hour averages. For the meteorological data, the air pressure, humidity and wind speed values are likewise processed into 24-hour averages with the sumif function. The wind direction values are processed into a daily dominant wind direction, represented by the wind direction at the hour of maximum wind speed; if the wind direction data are in degrees, they are converted into four discrete values: east, south, west and north. The rainfall values are processed into a 24-hour rainfall indicator: a day with rain is marked as rainfall and a day without rain as non-rainfall. For the pollution source emission data, the daily average emission flow is calculated from the hourly emission flow, the daily average emission concentration is calculated from the hourly emission concentration, and the daily average emission amount is obtained by multiplying the daily average emission flow by the daily average emission concentration.
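The daily processing described above can be sketched in a few lines. The source performs these steps in Excel; the Python below is a substitute for illustration only. The record keys (`date`, `pm25`, `wind_speed`, `wind_deg`, `rain_mm`), the function names, and the choice of 90-degree sectors centred on each compass point are assumptions, not part of the patent.

```python
from collections import defaultdict
from statistics import mean


def degrees_to_direction(deg):
    """Map degrees (0 = north, clockwise) to N/E/S/W.

    The 90-degree sectors centred on each direction are an assumption.
    """
    sectors = ["N", "E", "S", "W"]
    return sectors[int(((deg % 360) + 45) // 90) % 4]


def daily_summary(hourly_rows):
    """Aggregate hourly records (dicts with hypothetical keys) into daily values."""
    by_day = defaultdict(list)
    for row in hourly_rows:
        by_day[row["date"]].append(row)

    summary = {}
    for day, rows in by_day.items():
        # The hour with the maximum wind speed defines the dominant direction.
        windiest = max(rows, key=lambda r: r["wind_speed"])
        summary[day] = {
            "pm25_mean": mean(r["pm25"] for r in rows),
            "wind_speed_mean": mean(r["wind_speed"] for r in rows),
            "dominant_wind": degrees_to_direction(windiest["wind_deg"]),
            # 1 = rainfall on that day, 0 = no rainfall.
            "rain": 1 if any(r["rain_mm"] > 0 for r in rows) else 0,
        }
    return summary


def daily_emission(hourly_flow, hourly_conc):
    """Daily average emission = daily mean flow x daily mean concentration."""
    return mean(hourly_flow) * mean(hourly_conc)
```

The same pattern extends to the other pollutants and to air pressure and humidity by adding further keys to the daily record.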
A small amount of concentration, meteorological or pollution source data may be missing at the monitoring points; these gaps are filled with the interpolation filling method, which fills a missing sample with the average of the values before and after it. This may also be implemented programmatically in matlab. When a large amount of data is missing, please contact the company to discuss a specific solution. Some samples are deleted if necessary (samples should be deleted only after the model variables in the next step have been constructed, so that useful lag variables are not lost).
Model variables are then constructed. The air quality data are processed into the air quality variables needed in the model by taking their logarithm, which is done with the ln function in Excel.
The mean and standard deviation of air pressure, humidity and wind speed are calculated with the average function and the std function; then, in Excel, the mean is subtracted from each value and the result is divided by the standard deviation. This standardizes the air pressure, humidity and wind speed values and forms the air pressure, humidity and wind speed variables.
The wind direction variable is represented in the model by dummy variables. Four dummy variables are needed, representing the four dominant wind directions east, south, west and north; four columns are set up in Excel, each column representing one wind direction, with values of 0 or 1. For the rainfall data, one dummy variable is set that distinguishes rainfall from non-rainfall with 0 or 1 (0 for non-rainfall, 1 for rainfall), forming the "whether it rained" dummy variable.
The pollution source data are processed into the pollution source variable. Because a prediction often involves hundreds of pollution sources or discharge outlets, the emission data of the pollution sources must be processed further before being brought into the model. Each pollution source is weighted and the weighted average of the pollution sources is calculated. Different monitoring points use different weightings; the specific weighting method is described in detail later. The pollution source variable is formed by standardizing the weighted average of the pollution sources.
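As an illustration of the filling rule just described, the sketch below replaces an isolated missing value with the average of the values immediately before and after it. The function name `fill_gaps` and the use of `None` to mark missing samples are assumptions; runs of consecutive missing values are deliberately left untouched, since the source handles large gaps separately.

```python
def fill_gaps(values):
    """Replace each isolated None with the mean of its two neighbours.

    Gaps at the ends of the series, or gaps next to another missing
    value, are left as None.
    """
    filled = list(values)
    for i, v in enumerate(values):
        if v is None and 0 < i < len(values) - 1:
            before, after = values[i - 1], values[i + 1]
            if before is not None and after is not None:
                filled[i] = (before + after) / 2
    return filled
```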
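The logarithm and standardization steps can be written compactly. This is a minimal sketch in Python rather than Excel; the function names are hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since the source does not say which variant its std function computes.

```python
import math
from statistics import mean, stdev


def log_transform(concentrations):
    """Natural logarithm of each (strictly positive) concentration value."""
    return [math.log(c) for c in concentrations]


def standardize(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (assumption)
    return [(v - mu) / sigma for v in values]
```

After standardization each meteorological variable has mean 0 and unit standard deviation, which keeps variables measured in different units on a comparable scale for the RBF kernel.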
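The dummy-variable encoding can be illustrated as follows. The single-letter direction codes, the column order and the function names are assumptions made for the example.

```python
# Column order of the four wind-direction dummy variables (assumption).
DIRECTIONS = ("E", "S", "W", "N")


def wind_dummies(direction):
    """Return four 0/1 columns, one per dominant wind direction."""
    return [1 if direction == d else 0 for d in DIRECTIONS]


def rain_dummy(rained):
    """Return 1 for a day with rainfall and 0 for a day without."""
    return 1 if rained else 0
```

For example, a day whose dominant wind direction is south is encoded as `[0, 1, 0, 0]`.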
In addition to the variables above, other auxiliary variables are added for prediction. Trend variable: the year is used as the variable, expressed either as the numbers 1, 2, 3 or as the year itself, and is then brought into the model. Periodic variables: two columns of variables are formed with the sin and cos functions in Excel, where T in the functions is the month. Lag variables: lag variables are constructed for some of the variables; for example, the one-period lag of the monitoring-point concentration is formed by lagging the monitoring-point concentration by one period. Other dummy variables include a working-day variable, a 0-1 dummy for "whether it is a working day" (0 for a non-working day, 1 for a working day), and a heating-period variable, a 0-1 dummy for "whether heating is on" (0 for no heating, 1 for heating). Note that in the model the explained variable and the explanatory variables must correspond reasonably: when the explained variable is PM2.5, the pollution source variable among the explanatory variables is particulate matter emission; when the explained variable is NOx, the pollution source variable is NOx emission; when the explained variable is SO2, the pollution source variable is SO2 emission; when the explained variable is O3, the pollution source variable is O3 emission.
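A short sketch of the trend, periodic and lag variables follows. The source states only that sin and cos are used with T equal to the month; the 12-month period and the argument 2π·T/12 are assumptions, as are the function names.

```python
import math


def trend(year, first_year):
    """Trend variable expressed as 1, 2, 3, ... counted from the first year."""
    return year - first_year + 1


def periodic(month, period=12):
    """Sine and cosine columns for month T; the 12-month period is an assumption."""
    angle = 2 * math.pi * month / period
    return math.sin(angle), math.cos(angle)


def lag(series, k=1):
    """Shift a series by k periods (1 <= k <= len(series)).

    The first k entries have no earlier value and are returned as None.
    """
    return [None] * k + list(series[:len(series) - k])
```

The rows whose lagged value is `None` are the ones to drop after variable construction, which matches the source's advice to delete samples only once the lag variables have been built.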
After model variables are built, a prediction model is built, firstly, a model is built by adopting an SVM method, and in matlab, a libsvm tool package is called, so that the whole modeling process can be completed.
In the modeling, firstly, a training set is required to be selected, wherein the training set can be a data set in 2016-2017, and meanwhile, a test set is selected and can be a data set in 2018.
For the training set, a train_X variable and a train_Y variable must be prepared. train_X holds the explanatory variables, including the meteorological variables, the pollution source variable and the other variables; train_Y holds the explained variable, namely the concentration variable of the monitoring point. Note that train_Y can only be one column of data, whereas train_X can be multiple columns (each column representing one explanatory variable), and train_X and train_Y must have the same number of rows. Training calls the svmtrain function with the statement: model_test = svmtrain(train_Y, train_X, '-s 4 -t 2 -c 1 -g 0.5'). The g parameter value in the statement can be adjusted to the actual situation, for example by taking the reciprocal of the number of explanatory variables. After training, the established SVM model is stored in model_test. Next, the test set is tested with the established model_test by calling the prediction function with the statement: [prediction_y, accuracy, precision_values] = svmpredict(testy, testx, model_test). After the test, the test effect is stored in accuracy; double-click accuracy to open it, and the third entry is the "relative error MSE" used to evaluate the test effect.
The parameters to be saved from the model are the support vectors, the support vector coefficients, the model b value and the model coefficient. The support vectors are stored in the SVs array in matlab. The support vector coefficients are stored in the sv_coef matrix in matlab, an n×1 matrix where n is the number of support vectors. The model b value is the negative of the rho value output by matlab. The model coefficient is the g value passed to the svmtrain function during training; if g was left at its default during modeling, it equals the reciprocal of the number of explanatory variables, and this must be communicated to the software programmer.
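To show why these four items are sufficient for the software side, the sketch below reproduces a prediction from the saved parameters for an RBF kernel (the -t 2 option). It follows the standard libsvm decision function, the sum over support vectors of coefficient times kernel value, minus rho. It is written in Python with hypothetical names and does not call libsvm itself.

```python
import math


def rbf_kernel(u, v, g):
    """RBF kernel exp(-g * ||u - v||^2)."""
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(u, v)))


def svr_predict(x, support_vectors, sv_coef, rho, g):
    """Predict one sample from saved SVs, sv_coef, rho and g.

    Equivalent to sum_i sv_coef[i] * K(SV_i, x) + b with b = -rho.
    """
    total = sum(c * rbf_kernel(sv, x, g)
                for sv, c in zip(support_vectors, sv_coef))
    return total - rho
```

Comparing the output of such a re-implementation with the values returned by svmpredict on a test set sample is one way to carry out the check between the test result and the software output.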
Next, regional atmospheric pollution is identified. For PM2.5 and ozone, the regional transport of atmospheric pollution needs to be considered. The current approach is to take out separately the samples in which regional atmospheric pollution is present, so it is necessary to identify when regional atmospheric pollution is present. For PM2.5, a sample is recorded as regional atmospheric pollution when the following conditions are satisfied:
1. the total particulate matter emission of the pollution sources is not higher than at other times;
2. the PM2.5 concentration is much higher than at other times;
3. the PM2.5/PM10 ratio is much higher than at other times;
4. the PM2.5 at the city monitoring points is significantly correlated with the PM2.5 of the surrounding cities;
5. the PM10 at the city monitoring points is not significantly correlated with the PM10 of the surrounding cities.
Samples with regional pollution are then marked (this may be done with a dummy variable).
After modeling, the planning model is constructed. In this model, the relation between pollution source emission and monitoring-point concentration needs to be identified, showing that the monitoring-point concentration increases as pollution source emission increases. To bring the background concentration at zero pollution source emission into the model, urban background concentration monitoring points are located, all of their data are taken as samples and placed into the original training set; a small number of samples with zero pollution source emission need to be added to the overall sample for the model to learn from. Part of the samples are temporarily removed from the test set, including samples with regional pollution: the total emission of the pollution sources is not high, yet the PM2.5 concentration at the monitoring point is very high, the PM2.5/PM10 ratio is found to rise, the correlation between the PM2.5 concentration at the monitoring point and that at the monitoring points of the surrounding cities increases markedly, and the correlation between the PM10 concentration at the monitoring point and that at the monitoring points of the surrounding cities is not high. Error samples: when the total emission of the pollution sources is high and the meteorological conditions are unfavorable for dispersion, a sample is regarded as an error sample if the concentration at the monitoring point on that day is at an extremely low level, and the sample is not removed from the test set.
The weighted average of the pollution sources is calculated in the following steps. First, the position information of the pollution sources is extracted (the coordinate point of each pollution source). Second, the position information of the monitoring points is extracted (the coordinate point of each monitoring point). Third, the distance between each pollution source and the monitoring point is calculated (using the Euclidean distance). Fourth, the azimuth between each pollution source and the monitoring point is calculated (expressed in degrees). Fifth, the effective distance between each pollution source and the monitoring point on a given day is calculated by combining that day's wind direction in degrees with the azimuth between the pollution source and the monitoring point (the effective distance is calculated with a Gaussian diffusion model).
The weighted-average pollution source emission is then placed into the test set. A correlation test is best performed first: a correlation analysis is run between the weighted-average pollution source emission and the concentration at the corresponding monitoring point. If a significant positive correlation can be shown, the weighting is effective and the variable is brought into the model; if it cannot, the weighting method needs slight adjustment. How exactly to adjust it is to be discussed separately.
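The geometric part of these steps can be sketched as below. The source does not give the Gaussian diffusion formula for the effective distance, so the last two functions substitute a simple, clearly hypothetical weighting (downwind alignment divided by distance); they are a stand-in, not the patent's method. The coordinate convention (x east, y north), the meteorological convention that wind direction is the direction the wind blows from, and all function names are assumptions.

```python
import math


def distance(source, monitor):
    """Euclidean distance between a source and a monitoring point (x, y)."""
    return math.hypot(monitor[0] - source[0], monitor[1] - source[1])


def azimuth_deg(source, monitor):
    """Bearing from source to monitor in degrees clockwise from north."""
    dx, dy = monitor[0] - source[0], monitor[1] - source[1]
    return math.degrees(math.atan2(dx, dy)) % 360


def wind_alignment(wind_from_deg, bearing_deg):
    """Hypothetical factor: 1 when the wind carries emissions straight
    toward the monitor, 0 when the monitor is crosswind or upwind."""
    blowing_toward = (wind_from_deg + 180) % 360
    return max(0.0, math.cos(math.radians(blowing_toward - bearing_deg)))


def weighted_emission(sources, monitor, wind_from_deg):
    """Weighted average emission; sources is a list of (x, y, emission).

    Weight = alignment / distance (hypothetical, not the Gaussian model).
    """
    weights = []
    for x, y, _ in sources:
        d = max(distance((x, y), monitor), 1e-9)  # avoid division by zero
        bearing = azimuth_deg((x, y), monitor)
        weights.append(wind_alignment(wind_from_deg, bearing) / d)
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * e for w, (_, _, e) in zip(weights, sources)) / total
```

With a south wind, a source directly south of the monitoring point receives full weight and a source directly north receives none, which is the qualitative behaviour the effective distance is meant to capture.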
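The correlation check can be illustrated with a Pearson correlation coefficient and its t statistic. The source does not name a specific correlation measure or significance level, so the choice of Pearson's r and the function names are assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom.

    Requires |r| < 1 and n > 2.
    """
    return r * math.sqrt((n - 2) / (1 - r * r))
```

A positive r whose t statistic exceeds the critical value for the chosen significance level would support keeping the weighting; otherwise the weights would be revisited.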
Finally, the model is run, and the modeler provides data to the software developer, including model structure parameters (support vector, support vector coefficients, rho values, g values), pollution source weights, and a test set sample. The software needs to automatically run the model once, and according to the test result, possible errors in the software are adjusted by comparing the test result with the output result of the software.
As shown in FIG. 1, the data quality assessment predictions include weather predictions, fixed source emissions predictions, moving source emissions predictions.
The fixed source emission prediction model is based on continuous monitoring data of fixed source emission and other information, a statistical analysis tool is applied to analyze the change rule of the fixed source emission rate of a specific industry, a heavy point pollution source monitoring data abnormal value diagnosis model is researched, and a fixed source emission monitoring data analysis method and a technical specification are provided. And developing a fixed source emission control scheme compiling technical specification study, wherein the study comprises an existing emission data statistical characteristic analysis method, a fixed source emission reduction cost effectiveness analysis method, an emission data comparison analysis technical method and an emission control scheme compiling technical specification. And developing a fixed source emission control scheme design of an exemplary city based on an air quality management target and a fixed source emission prediction model research, and providing information support for air quality prediction and diagnosis.
The mobile source emission prediction comprises a road motor vehicle lane emission, a road motor vehicle emission estimation tool and a prediction diagnosis, and a road motor vehicle dynamic emission accounting tool and a prediction evaluation technology are developed based on traffic big data. Wherein the emissions accounting includes emissions accounting at a bicycle level and a road network level. The single vehicle layer emission accounting is mainly based on the driving condition information acquired by the big data of the internet of vehicles in real time, and the single vehicle emission and the space-time distribution thereof are accounted for, so that information support is provided for road network layer emission accounting. The research and development of the road network hierarchical emission accounting tool are that firstly, multi-source big data in the road traffic field are researched and collected, the quality of the data with different sources is evaluated, and a road network emission accounting model is constructed; secondly, developing a road section flow distribution algorithm research based on the section traffic flow detection data, and calibrating and checking a flow expansion result; finally, a road network level motor vehicle emission simulation technical study is developed, and a road network motor vehicle dynamic emission list programming technical specification based on big data is provided. The method is characterized by researching and analyzing the space-time distribution characteristics and rules of traffic flow, combining with future scene knowledge mining such as traffic demand prediction models, motor vehicle emission control schemes, internet big data analysis and the like, developing motor vehicle emission prediction models and technologies, evaluating emission reduction effects of the emission control schemes, and providing information support for air quality prediction diagnosis.
The air quality management internet big data analysis model is used for carrying out deep research on air quality management internet big data acquisition, integration, analysis and mining key technologies and policy evaluation methods. Firstly, in order to more accurately estimate the air quality of an area which cannot be covered by an air quality monitoring station and the evaluation information of the air quality, the management of a fixed source, a mobile source and the like of the public, social perception data is utilized to acquire internet big data field knowledge covering related microblogs, forums, network media and the like, and the knowledge is integrated with internet of things monitoring data of the air quality, the fixed source, the mobile source and the like to form an internet of things-internet big data integration linkage standard. Secondly, aiming at structured, semi-structured and unstructured space-time big data of air quality management, a modeling theory and a method of multi-source heterogeneous, multi-granularity and multi-dimensional data facing real-time mining and analysis are researched, a topic mining model (LDA) facing air quality management, an abnormal emergency detection method and the like are developed, the monitoring and evaluation effects of air quality management are improved, big data analysis technology facing air quality management is developed, the big data analysis technology comprises a distributed and streaming computing model, and the big data analysis efficiency of air quality management is improved. 
Finally, in order to more accurately evaluate and predict the effectiveness of air quality management measures, an air quality management policy evaluation method based on social perception data is provided, and comprehensive, three-dimensional and internet monitoring technical support is provided for air quality feeling, management effects on fixed sources, mobile sources and the like, future social activity events, emission scenario analysis, policy evaluation and the like.
The urban air quality prediction and diagnosis model is based on environmental protection Internet of Things big data (weather, pollution sources and public opinion) and uses a support vector machine, an artificial neural network and a multivariate spatio-temporal model to identify the factors that influence air quality. For the support vector machine model, the samples are divided into training samples and test samples, the fit of the model is tested repeatedly, suitable kernel function types and parameters are selected, the support vectors and their coefficients required to build the prediction model are output, and the corresponding model structure is given. The artificial neural network adopts an SVM algorithm to form a network structure suitable for air quality prediction, and the choice of the number of input nodes, the number of hidden layers, the number of hidden layer neurons, the number of output nodes, the transfer functions and the like is studied. The spatio-temporal model takes spatial and temporal effects into account together, and the selection of the spatial weight matrix, collinearity, endogeneity and related problems are studied. On this basis, the conditions under which each model applies are evaluated, and the predictions obtained by the different methods are combined as a weighted average using a nonlinear combination prediction method. For the allocation of the weights, the sum of the absolute values of the models' prediction errors is taken as the criterion, yielding the optimal combined prediction result.
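The weight-allocation criterion in the preceding paragraph can be illustrated with a minimal Python sketch. It assumes only two component models and a plain weighted average whose weights sum to one, and it picks the weight by a simple grid search that minimizes the sum of absolute prediction errors. The function names, the grid search and the sample numbers are illustrative assumptions; the description itself refers to a nonlinear combination method and does not specify a search procedure.

```python
def sum_abs_error(weights, forecasts, observed):
    """Sum of absolute errors of the weighted-average forecast."""
    total = 0.0
    for t, y in enumerate(observed):
        combined = sum(w * f[t] for w, f in zip(weights, forecasts))
        total += abs(combined - y)
    return total


def best_two_model_weight(f1, f2, observed, steps=100):
    """Grid-search w in [0, 1] for the combination w*f1 + (1 - w)*f2.

    Returns the first weight that attains the smallest sum of absolute
    errors, together with that error. The grid resolution is an assumption.
    """
    best_w, best_err = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        err = sum_abs_error((w, 1.0 - w), (f1, f2), observed)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err


# Made-up numbers: one model runs 2 units high, the other 2 units low.
observed = [10.0, 20.0, 30.0, 40.0]
model_a = [12.0, 22.0, 32.0, 42.0]
model_b = [8.0, 18.0, 28.0, 38.0]
weight, error = best_two_model_weight(model_a, model_b, observed)
print(weight, error)
```

With these made-up series the two biases cancel at equal weights, so the search settles on a weight of 0.5. With more than two models, the same criterion could be minimized by linear programming instead of a grid.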
In summary, by means of the above technical scheme, the invention makes full use of Internet of Things and internet big data and information, innovates the thinking, methods and tools of big data statistical analysis, and serves decision-making in urban air quality management. It achieves high-level innovation in scientific research and trains high-level researchers, doctoral students and grassroots professionals, thereby providing statistical prediction and diagnosis technical support for air quality management in heavily polluted areas.
The foregoing describes only preferred embodiments of the invention and is not intended to limit it; all modifications, equivalents, alternatives and improvements that fall within the spirit and scope of the invention are intended to be covered.

Claims (6)

1. An air quality prediction SVM model construction method, characterized by comprising the following steps:
S1, collecting air quality data from each monitoring point at the city's national control points, city meteorological data, and continuous emission data of pollution sources;
S2, processing the collected air quality data and meteorological data using the sumif function, and processing the pollution source emission data using a pollution source emission data calculation method;
S3, constructing model variables: processing the air quality data into the air quality variables needed in the model by means of the ln function; standardizing the processed meteorological variables by subtracting the mean of each meteorological variable and dividing by its standard deviation, to obtain standardized meteorological variables; and, for the pollution source variable, assigning a weight to each pollution source, calculating the weighted average of the pollution sources, and standardizing the result to form the pollution source variable;
S4, building a prediction model using the SVM method, calling the libsvm toolkit in matlab to perform the modeling;
S5, during modeling, selecting a training set and a test set; first calling the svmtrain function to train on the training set and storing the resulting SVM model in model_test; then using the SVM model model_test to test the test set by calling the prediction function, storing the test results in accuracy, and reading from accuracy the MSE (mean squared error) used to evaluate the test results;
S6, constructing a planning model: incorporating into the model the background concentration value under zero emission from the pollution sources, temporarily taking part of the samples out of the test set, calculating the weighted average of the pollutant emissions of the pollution sources, and putting the weighted-average pollutant emissions back into the test set;
S7, model test run: providing the model structure parameters, the pollutant emission weights of the pollution sources and the test set samples to a developer; the software automatically runs the preliminary model and is adjusted by comparing the test results with the software output.
2. The air quality prediction SVM model construction method according to claim 1, wherein collecting air quality data in step S1 comprises collecting concentration data of PM2.5, NOx, SO2, CO and O3 at each monitoring point; collecting meteorological data comprises collecting air pressure, humidity, wind speed, wind direction and rainfall data from the city meteorological station; and collecting pollution source data comprises collecting particulate matter emission data and SO2 emission data.
3. The air quality prediction SVM model construction method according to claim 1, wherein in step S2 the PM2.5, NOx, SO2, CO and O3 air quality data of each monitoring point are processed into 24-hour averages using the sumif function in Excel; and the meteorological data are processed using the sumif function in Excel by converting the air pressure, humidity and wind data values into 24-hour averages.
4. The air quality prediction SVM model construction method according to claim 1, wherein processing the pollution source emission data in step S2 specifically comprises converting the pollution source emission data into daily emission data: calculating the daily average emission flow of the pollution source from its hourly emission flow, calculating the daily average emission concentration of the pollution source from its hourly emission concentration, and multiplying the daily average emission flow by the daily average emission concentration to obtain the daily average emission amount of the pollution source.
5. The air quality prediction SVM model construction method according to claim 1, wherein in step S3 the air quality variables are obtained by taking the logarithm of the air quality data with the ln function in Excel; and the meteorological variables are obtained by first calculating the mean and standard deviation of the air pressure, humidity and wind speed with the average function and the std function, then, in Excel, subtracting the mean of the air pressure, humidity and wind speed from each sample's values and dividing by the corresponding standard deviation, thereby standardizing the air pressure, humidity and wind speed data values to form the air pressure, humidity and wind speed variables.
6. The air quality prediction SVM model construction method according to claim 1, wherein in step S5, in the training set, the train_X variable represents the explanatory variables, including the meteorological variables and the pollution source variable; and the train_y variable represents the explained variable, i.e., the concentration variable of the monitoring point.
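
The data preparation recited in claims 3 to 5 and the error measure in step S5 can be sketched in Python as follows. This is an illustration under stated assumptions, not the Excel and matlab workflow the claims describe: the function names are hypothetical, the sample standard deviation is assumed for the std step, the source weights are placeholders, and no SVM is trained here (the claims call svmtrain from libsvm for that).

```python
import math
from statistics import mean, stdev


def daily_mean(hourly_values):
    """Claim 3: 24-hour average of one day's hourly readings."""
    return mean(hourly_values)


def daily_emission(hourly_flow, hourly_conc):
    """Claim 4: daily mean flow multiplied by daily mean concentration."""
    return mean(hourly_flow) * mean(hourly_conc)


def log_transform(concentrations):
    """Claim 5: natural logarithm of each daily air quality value."""
    return [math.log(c) for c in concentrations]


def standardize(values):
    """Step S3: subtract the mean, then divide by the standard deviation.

    Assumes the sample standard deviation (n - 1 denominator).
    """
    mu = mean(values)
    sigma = stdev(values)
    return [(v - mu) / sigma for v in values]


def pollution_source_variable(daily_by_source, weights):
    """Step S3: weighted average over sources per day, then standardized.

    daily_by_source[i][d] is the daily emission of source i on day d;
    weights[i] is a hypothetical weight for source i.
    """
    total = sum(weights)
    n_days = len(daily_by_source[0])
    averaged = [
        sum(w * src[d] for w, src in zip(weights, daily_by_source)) / total
        for d in range(n_days)
    ]
    return standardize(averaged)


def mean_squared_error(predicted, observed):
    """Step S5: MSE between predictions and observed test values."""
    return mean([(p - o) ** 2 for p, o in zip(predicted, observed)])
```

The outputs of standardize and pollution_source_variable would form the columns of train_X, and the output of log_transform would form train_y, before the model is trained and the MSE is read from the test results.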

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110140388.5A CN113011455B (en) 2021-02-02 2021-02-02 Air quality prediction SVM model construction method

Publications (2)

Publication Number Publication Date
CN113011455A CN113011455A (en) 2021-06-22
CN113011455B true CN113011455B (en) 2024-01-05

Family

ID=76385029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110140388.5A Active CN113011455B (en) 2021-02-02 2021-02-02 Air quality prediction SVM model construction method

Country Status (1)

Country Link
CN (1) CN113011455B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114414731A (en) * 2021-12-08 2022-04-29 南通大学 Air quality acquisition equipment and acquisition method based on mobile terminal positioning service
CN117473398B (en) * 2023-12-26 2024-03-19 四川国蓝中天环境科技集团有限公司 Urban dust pollution source classification method based on slag transport vehicle activity

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20140024541A (en) * 2012-08-20 2014-03-03 주식회사 나인에코 System for providing atmospheric modeling system service
CN106651036A (en) * 2016-12-26 2017-05-10 东莞理工学院 Air quality forecasting system
CN108537336A (en) * 2018-03-26 2018-09-14 上海电力学院 A kind of Air Quality Forecast method based on deep neural network
CN109685281A (en) * 2018-12-29 2019-04-26 中科三清科技有限公司 Pollution source prediction technique, device and electronic equipment
CN110333556A (en) * 2019-06-03 2019-10-15 深圳中兴网信科技有限公司 Air Quality Forecast method, apparatus, computer equipment and readable storage medium storing program for executing
CN110346518A (en) * 2019-07-25 2019-10-18 中南大学 A kind of traffic emission pollution visualization method for early warning and its system
CN110363347A (en) * 2019-07-12 2019-10-22 江苏天长环保科技有限公司 The method of neural network prediction air quality based on decision tree index
CN110472782A (en) * 2019-08-01 2019-11-19 软通动力信息技术有限公司 A kind of data determination method, device, equipment and storage medium
CN110926532A (en) * 2019-11-29 2020-03-27 四川省生态环境科学研究院 Digital monitoring system of city raise dust based on big data
CN111143768A (en) * 2019-11-08 2020-05-12 昆明理工大学 Air quality prediction algorithm based on ARIMA-SVM combined model
CN111832814A (en) * 2020-07-01 2020-10-27 北京工商大学 Air pollutant concentration prediction method based on graph attention machine mechanism
CN111882205A (en) * 2020-07-24 2020-11-03 中科三清科技有限公司 Air quality standard-reaching analysis method and device, electronic equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10830922B2 (en) * 2015-10-28 2020-11-10 International Business Machines Corporation Air quality forecast by adapting pollutant emission inventory
US20180239057A1 (en) * 2017-02-22 2018-08-23 International Business Machines Corporation Forecasting air quality
CN110298560B (en) * 2019-06-13 2022-12-06 南方科技大学 Method and device for evaluating atmospheric pollution emission control effect and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
New progress in deep-learning-based air quality forecasting methods; 朱晏民; 徐爱兰; 孙强; 中国环境监测 (Environmental Monitoring in China), No. 03; full text *

Also Published As

Publication number Publication date
CN113011455A (en) 2021-06-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant