CN112711912A - Air quality monitoring and alarming method, system, device and medium based on cloud computing and machine learning algorithm - Google Patents
- Publication number
- CN112711912A CN202011627381.8A
- Authority
- CN
- China
- Prior art keywords
- air quality
- value
- neural network
- sample
- svm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/0004—Gaseous mixtures, e.g. polluted air
- G01N33/0009—General constructional details of gas analysers, e.g. portable test equipment
- G01N33/0062—General constructional details of gas analysers, e.g. portable test equipment concerning the measuring method or the display, e.g. intermittent measurement or digital display
- G01N33/0063—General constructional details of gas analysers, e.g. portable test equipment concerning the measuring method or the display, e.g. intermittent measurement or digital display using a threshold to release an alarm or displaying means
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/0004—Gaseous mixtures, e.g. polluted air
- G01N33/0009—General constructional details of gas analysers, e.g. portable test equipment
- G01N33/0062—General constructional details of gas analysers, e.g. portable test equipment concerning the measuring method or the display, e.g. intermittent measurement or digital display
- G01N33/0068—General constructional details of gas analysers, e.g. portable test equipment concerning the measuring method or the display, e.g. intermittent measurement or digital display using a computer specifically programmed
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2111/00—Details relating to CAD techniques
- G06F2111/06—Multi-objective optimisation, e.g. Pareto optimisation using simulated annealing [SA], ant colony algorithms or genetic algorithms [GA]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2113/00—Details relating to the application field
- G06F2113/08—Fluids
Abstract
The invention provides an air quality monitoring and alarming method, system, device and medium based on cloud computing and machine learning. Using the computing power of cloud computing and the MapReduce framework, it combines three machine learning algorithms horizontally, so that they run in parallel: a neural network (NN), a decision tree (DT) and a support vector machine (SVM). The three algorithms are weighted according to their historical prediction errors, and a final air quality prediction is then calculated. This exploits the respective strengths of the machine learning algorithms, predicts the air quality accurately, and helps both people planning to go out and the departments responsible for controlling atmospheric pollution.
Description
Technical Field
The invention relates to air quality monitoring and alarming in a cloud environment, in particular to artificial intelligence-based air quality monitoring and alarming in cloud computing.
Background
With the development of industry, air pollution has grown steadily worse and seriously affects people's life and work. For example, air pollution can cause respiratory conditions such as asthma and coughing, and can aggravate existing illnesses to the point of threatening life. Monitoring the air quality and raising alarms lets people take preventive measures in advance and helps environmental management departments with decision-making and management.
Machine learning is developing rapidly, and intelligent algorithms are being applied across many industries. For early warning of atmospheric pollution, artificial intelligence algorithms such as the support vector machine (SVM), random forest (RF), decision tree (DT), neural network (NN), particle swarm optimization (PSO) and artificial fish swarm algorithm (AFSA) have been applied. However, these algorithms are used either alone or chained one after another (a longitudinal combination) to analyze pollutant concentrations, which does not take full advantage of running them in parallel.
Meanwhile, as the air monitoring system in China continues to expand, the volume of air quality data is growing rapidly, and the mass of monitoring data for the air quality factors makes accurate, real-time analysis very challenging. Cloud computing technology provides both mass data storage and the computing capacity needed for real-time data analysis and mining. MapReduce is a distributed programming model in which the data to be processed are divided into multiple blocks, a large number of computers in a network process the blocks simultaneously, and the results are then collected to reach a conclusion.
The invention provides an air quality monitoring and alarming method, system, device and medium based on cloud computing and machine learning. Using the computing power of cloud computing and the MapReduce framework, it combines three machine learning algorithms horizontally, so that they run in parallel: a neural network (NN), a decision tree (DT) and a support vector machine (SVM). The three algorithms are weighted according to their historical prediction errors, and a final air quality prediction is then calculated. This exploits the respective strengths of the machine learning algorithms, predicts the air quality accurately, and helps both people planning to go out and the departments responsible for controlling atmospheric pollution.
Disclosure of Invention
The invention provides an air quality monitoring and alarming method based on cloud computing and machine learning, which comprises the following steps. Step 1: the sampled values of the air quality factors obtained by each monitoring sensor are transmitted to the cloud platform. Step 2: the cloud platform calculates a predicted value of the air quality factor. Specifically, the neural network (NN), decision tree (DT) and support vector machine (SVM) algorithms each calculate a predicted air quality value, the three algorithms are assigned the weights $W_{NN}$, $W_{DT}$, $W_{SVM}$ based on their historical prediction performance, and a final predicted air quality value is calculated. Step 3: the air quality grade is determined from the predicted air quality value obtained in step 2, and when the grade is light pollution, moderate pollution or severe pollution an alarm is raised to indicate that the air quality factors exceed the standard.
The specific calculation process in step 2 is as follows:
step 2.1, calculating an air quality prediction value by adopting a neural network NN algorithm:
The neural network is a multi-layer network, typically consisting of an input layer, one or more hidden layers and an output layer. Neurons within the same layer are not connected to each other; connections exist only between neuron nodes of adjacent layers. A Sigmoid function is generally used as the connection function. It maps input values of any range into the interval (0, 1) and is therefore also called a compression function: $f(x) = \frac{1}{1 + e^{-x}}$
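As a minimal sketch of the compression behaviour described above, the following Python snippet evaluates the standard logistic Sigmoid, $f(x) = 1/(1 + e^{-x})$. The function name `sigmoid` is illustrative and does not come from the patent.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function: mathematically maps any real x into (0, 1)."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)
```

Large negative inputs approach 0 and large positive inputs approach 1, which is the compression into (0, 1) that the text refers to.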
First, the network is initialized: the number of hidden layers and the number of neurons in each layer are determined, the connection weights between the neurons of each layer are initialized, and the input and target output are determined. The invention determines the number of hidden-layer nodes empirically from $n_{in}$ and $n_{out}$, where $n_{in}$ is the number of input-layer nodes and $n_{out}$ is the number of output-layer nodes. The invention uses six air quality factors: PM2.5, PM10, SO2, CO, NO2 and O3. The input layer of the model therefore has 6 nodes, i.e. $n_{in} = 6$. The number of hidden layers is 1, so the air quality is predicted with a three-layer neural network.
Training data are then input: a training data set is determined and each group of data in the set is fed into the neural network, whose output is calculated from the network structure and the current weights. The error between the network output and the target output is then calculated; if the error has not reached an acceptable threshold, the error information is propagated backward to correct the connection weights in the network.
Let the training sample set be $TDS = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$, where $x_1, x_2, \dots, x_N$ are air quality factor samples and $y_1, y_2, \dots, y_N$ are the air quality values of the corresponding samples.
For example, $x_1$ is the set of air quality factor values for Kaifeng City in January 2013, as shown in the following table:
Month | PM2.5 | PM10 | SO2 | CO | NO2 | O3 |
Jan-13 | 193 | 206 | 38 | 5.594 | 21 | 0 |
$y_1$ is the air quality value, also called the AQI value, corresponding to $x_1$:
Month | AQI |
Jan-13 | 238 |
$x_2$ is the set of air quality factor values for Kaifeng City in February 2013, as shown in the following table:
Month | PM2.5 | PM10 | SO2 | CO | NO2 | O3 |
Feb-13 | 145 | 145 | 26 | 4.557 | 15 | 0 |
$y_2$ is the air quality value, also called the AQI value, corresponding to $x_2$:
Month | AQI |
Feb-13 | 188 |
Let $E_i$ denote the mean square error of the neural network on $(x_i, y_i)$, and let the error threshold be $\mu$. The weight update $\Delta\tau$ between the hidden layer and the output layer is $\Delta\tau = -\theta \frac{\partial E_i}{\partial \tau}$, where $\theta$ is the learning rate of the neural network, $0 < \theta < 1$, $\tau$ is the weight between the hidden layer and the output layer, and $\frac{\partial E_i}{\partial \tau}$ is the first derivative of the mean square error $E_i$ with respect to the weight $\tau$.
And judging whether all the data in the data set participate in the training process of the neural network. If so, outputting the weight among the nodes of each layer of the neural network, and finishing the training process; otherwise, the training process continues.
Through the learning process, a neural network with an error within a given threshold can be obtained. After determining the connection weights between the neuron nodes in the neural network, the network can be used for calculation according to new input and outputting results.
The current monitored values of the air quality factors are substituted into the neural network model to obtain the predicted air quality value $P_{NN}$.
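The training procedure of step 2.1 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the patented implementation: the error $E_i$ is taken as half the squared difference, a Sigmoid is applied at the output as well as in the hidden layer, bias terms are added, inputs and targets are assumed to be pre-scaled into (0, 1), training repeats over the data set until the mean error falls below $\mu$, and `n_hidden=4` is an arbitrary choice. All function and parameter names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(samples, n_hidden=4, theta=0.5, mu=1e-4, max_epochs=5000, seed=0):
    """Train a three-layer network (inputs -> n_hidden -> 1) by back-propagation.

    samples: list of (x, y); x is a list of inputs, y a target, all scaled to (0, 1).
    theta: learning rate, 0 < theta < 1.  mu: threshold on the mean of E_i.
    Returns a function mapping an input list to the network output.
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # v[h]: input->hidden weights of hidden node h (last entry is a bias).
    v = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    # tau: hidden->output weights (last entry is a bias).
    tau = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(x):
        hid = [sigmoid(sum(w * xi for w, xi in zip(row, x + [1.0]))) for row in v]
        out = sigmoid(sum(t * h for t, h in zip(tau, hid + [1.0])))
        return hid, out

    for _ in range(max_epochs):
        total = 0.0
        for x, y in samples:
            hid, out = forward(x)
            total += 0.5 * (out - y) ** 2            # E_i (assumed form)
            delta_out = (out - y) * out * (1 - out)  # dE_i / d(output net input)
            # Hidden deltas use the output weights before they are updated.
            delta_hid = [delta_out * tau[h] * hid[h] * (1 - hid[h])
                         for h in range(n_hidden)]
            for h, hv in enumerate(hid + [1.0]):
                tau[h] -= theta * delta_out * hv     # delta_tau = -theta * dE_i/dtau
            for h in range(n_hidden):
                for i, xi in enumerate(x + [1.0]):
                    v[h][i] -= theta * delta_hid[h] * xi
        if total / len(samples) < mu:                # error within the threshold
            break
    return lambda x: forward(x)[1]
```

In use, the six factor values and the AQI target would first be scaled into (0, 1), and the network output scaled back to an AQI value; the scaling constants are left to the reader because the source does not state them.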
Step 2.2, calculating an air quality predicted value by a Decision Tree (DT) algorithm:
A decision tree (DT) can reveal significant information hidden in the data and expresses it intuitively, so that users can easily understand it. The method is widely used in data mining and prediction.
The training data set (TDS) contains $N$ samples and is written $TDS = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$, where $x_1, x_2, \dots, x_N$ are air quality factor samples (for example, $x_1$ is given in the table above) and $y_1, y_2, \dots, y_N$ are the corresponding air quality values.
When dividing the samples, the objective function of the decision tree minimizes the sum of squared errors, namely: $\min_{j,s}\left[\min_{c_1}\sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2}\sum_{x_i \in R_2(j,s)} (y_i - c_2)^2\right]$, where $j$ denotes the $j$-th variable of each sample (the total number of sample variables is $M$; in the invention $M = 6$), $s$ denotes a dividing point of the $j$-th variable, $R_1(j,s)$ denotes the left region of the division, $R_2(j,s)$ denotes the right region, and $c_1$ and $c_2$ denote the optimal output values of regions $R_1(j,s)$ and $R_2(j,s)$. Each feature $j$ of the samples is traversed, every possible split point $s$ of each feature is tried, the split with the smallest sum of squared errors is selected, and the optimal output values $c_1$ and $c_2$ are determined. After training, an air quality prediction regression tree is obtained.
The current monitored values of the air quality factors are substituted into the air quality prediction regression tree to obtain the predicted air quality value $P_{DT}$.
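The split search of step 2.2 can be illustrated with a short sketch that traverses every feature $j$ and every candidate split point $s$, and keeps the pair with the smallest sum of squared errors. Two choices here are assumptions rather than statements from the source: candidate split points are taken as midpoints between consecutive distinct feature values, and the optimal output of each region is its mean (the value that minimizes the squared error within the region). The function name `best_split` is hypothetical.

```python
def best_split(xs, ys):
    """Return (sse, j, s, c1, c2) for the best single binary split.

    xs: list of feature vectors; ys: list of target values.
    Region R1 holds samples with x[j] <= s, region R2 the rest.
    Returns None when no feature has two distinct values.
    """
    best = None
    for j in range(len(xs[0])):                 # traverse every feature j
        values = sorted({x[j] for x in xs})
        # Candidate split points: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            s = (lo + hi) / 2.0
            left = [y for x, y in zip(xs, ys) if x[j] <= s]
            right = [y for x, y in zip(xs, ys) if x[j] > s]
            c1 = sum(left) / len(left)          # optimal output of R1 is its mean
            c2 = sum(right) / len(right)        # optimal output of R2 is its mean
            sse = (sum((y - c1) ** 2 for y in left)
                   + sum((y - c2) ** 2 for y in right))
            if best is None or sse < best[0]:
                best = (sse, j, s, c1, c2)
    return best
```

A full regression tree applies this search recursively to each resulting region until a stopping rule is met; the source does not specify that rule, so it is omitted here.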
Step 2.3, calculating an air quality prediction value by adopting a Support Vector Machine (SVM):
When handling non-linear problems, the support vector machine (SVM) maps the input data into a higher-dimensional space through a specific function. With the increase in dimensionality, the problem of finding an optimal separating line for the samples in the low-dimensional space becomes the problem of finding an optimal separating hyperplane in the high-dimensional space. The SVM can obtain a globally optimal solution and maintains its accuracy on problems with few samples.
Let the training sample set be $TDS = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$, where $x_1, x_2, \dots, x_N$ are air quality factor samples and $y_1, y_2, \dots, y_N$ are the air quality values of the corresponding samples.
Suppose there exists a hyperplane $\omega^T \phi(x) + b = 0$ that classifies the samples correctly, where $X = \{x_1, x_2, \dots, x_N\}$ are the $N$ samples, $\phi(\cdot)$ maps $X$ into a high-dimensional feature space, $\omega$ is the normal vector that determines the direction of the hyperplane, and $b$ is the displacement term, which serves as an optimization variable. So that the samples can be classified accurately by the hyperplane, slack variables $\varepsilon$ are introduced into the objective function.
$\varepsilon_i \geq 0$, $i = 1, \dots, N$, are the slack variables: each sample $x_i$ in the training sample set has a slack variable $\varepsilon_i$ that characterizes the degree to which the sample fails to satisfy the constraint.
A kernel function can replace the inner product operation in the high-dimensional space, so the inner product can be obtained without knowing the specific form of $\phi(\cdot)$. The kernel can be chosen from the linear, polynomial, radial basis, Gaussian and sigmoid kernel functions. The invention selects the Gaussian kernel, i.e. $K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$, where $\sigma > 0$ is the Gaussian kernel bandwidth and $\alpha_i$ is the Lagrange factor. The value of $b$ is calculated from the sample values and all obtained $b$ values are averaged; after training is finished, the support vector classification prediction function is obtained.
The current monitored values of the air quality factors are substituted into the air quality support vector classification prediction function to obtain the predicted air quality value $P_{SVM}$.
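To make the kernel in step 2.3 concrete, the sketch below evaluates the Gaussian kernel in its standard form, $K(x, z) = \exp(-\|x - z\|^2 / (2\sigma^2))$, which is assumed here to be the form intended by the text. It only shows how kernel values replace explicit inner products in the high-dimensional space; it does not train an SVM. The names `gaussian_kernel` and `gram_matrix` are illustrative.

```python
import math


def gaussian_kernel(x, z, sigma):
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma^2)), with bandwidth sigma > 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def gram_matrix(samples, sigma):
    """Kernel value for every pair of training samples."""
    return [[gaussian_kernel(a, b, sigma) for b in samples] for a in samples]
```

The resulting matrix is symmetric with ones on the diagonal, and a smaller `sigma` makes the kernel value fall off faster as two samples move apart.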
Step 2.4, weighting the predicted values obtained by at least two algorithms using historical prediction errors, specifically:
Historical data are used as the input of the three algorithms to obtain their predicted values. For each algorithm, the difference between each predicted value and the corresponding historical air quality value is squared and the squares are averaged, giving the mean squared errors $E_{NN}$, $E_{DT}$, $E_{SVM}$. The weights $W_{NN}$, $W_{DT}$, $W_{SVM}$ of the three algorithms are then calculated from these errors.
The final predicted air quality value is calculated as: $P = W_{NN} \cdot P_{NN} + W_{DT} \cdot P_{DT} + W_{SVM} \cdot P_{SVM}$.
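The weighting in step 2.4 can be sketched as follows. The mean squared error and the final weighted sum follow the text directly. The mapping from errors to weights is not recoverable from this copy of the source, so `inverse_error_weights` is an assumption: it gives each algorithm a weight proportional to the reciprocal of its error and normalizes the weights to sum to 1. All function names are hypothetical.

```python
def mean_squared_error(predicted, actual):
    """Average of the squared differences between predictions and actual values."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def inverse_error_weights(errors):
    """ASSUMPTION: weight each algorithm by 1/E, normalized to sum to 1.

    errors: dict such as {"NN": E_NN, "DT": E_DT, "SVM": E_SVM}, all > 0.
    """
    inverse = {name: 1.0 / e for name, e in errors.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def combine(weights, predictions):
    """P = W_NN * P_NN + W_DT * P_DT + W_SVM * P_SVM."""
    return sum(weights[name] * predictions[name] for name in weights)
```

Whatever weighting rule is used, an algorithm with a smaller historical error should contribute more to the final value. The embodiment later in this document reports the weights 0.424971, 0.136312 and 0.438717, which can be passed to `combine` directly.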
The step 3 specifically comprises the following steps:
According to the Technical Regulation on Ambient Air Quality Index (on trial) (HJ 633-2012), issued nationally in 2012, air quality is evaluated with the Air Quality Index (AQI). The standard for grading air quality by AQI value is given below.
AQI index | Air quality grade | Air quality status |
0~50 | Grade 1 | Excellent |
51~100 | Grade 2 | Good |
101~150 | Grade 3 | Light pollution |
151~200 | Grade 4 | Moderate pollution |
201~250 | Grade 5 | Severe pollution |
251~300 | Grade 6 | Severe pollution |
The air quality grade is determined from the predicted air quality value obtained in step 2. When the grade is light pollution, moderate pollution or severe pollution, an alarm is raised to indicate that the air quality factors exceed the standard. The content and form in which the alarm is displayed can be chosen as required.
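Step 3 amounts to a table lookup followed by a threshold test, which the following sketch illustrates. The upper bounds are copied from the table above. Three points are assumptions of this sketch: a fractional predicted value is assigned to the first grade whose upper bound it does not exceed, values above 300 are treated as the sixth grade, and the alarm is raised for the third grade and above (the rows of the table labelled as pollution). The function names are hypothetical.

```python
# Upper AQI bound of grades 1..6, taken from the grading table.
UPPER_BOUNDS = [50, 100, 150, 200, 250, 300]


def aqi_grade(aqi: float) -> int:
    """Return the air quality grade (1..6) for a predicted AQI value."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    for grade, upper in enumerate(UPPER_BOUNDS, start=1):
        if aqi <= upper:
            return grade
    return 6  # ASSUMPTION: values above 300 are treated as the worst grade


def should_alarm(aqi: float) -> bool:
    """Alarm when the grade is one of the polluted grades (3 and above)."""
    return aqi_grade(aqi) >= 3
```

For the two sample months shown earlier, an AQI of 238 falls in the fifth grade and an AQI of 188 in the fourth, so both would raise an alarm.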
Cloud computing technology provides both mass data storage and the computing capacity needed for real-time data analysis and mining. MapReduce is a distributed programming model in which the data to be processed are divided into multiple blocks, a large number of computers in a network process the blocks simultaneously, and the results are then collected to reach a conclusion. MapReduce processes data in parallel in two steps, Map and Reduce: Map maps one set of Key/Value pairs to another set of Key/Value pairs, and Reduce is a reduction step that merges the Values sharing the same Key and finally outputs a series of Key/Value pairs as the result.
The data to be processed can be divided into multiple fragments, each corresponding to one data block in the file system. A single Map task reads one fragment, and multiple Map tasks run on the cluster in parallel. Map tasks preferentially read local data to reduce network transmission overhead as far as possible.
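The Map and Reduce steps described above can be imitated in a single Python process to show the Key/Value flow. This is a simulation only and does not use the Hadoop API. The record layout, station names and readings are invented for illustration, and taking the mean as the merge operation is an arbitrary choice.

```python
from collections import defaultdict


def map_fragment(fragment):
    """Map: turn each (station, pollutant, value) record into a (pollutant, value) pair."""
    return [(pollutant, value) for _station, pollutant, value in fragment]


def shuffle(mapped_fragments):
    """Group the values that share a key, as the framework does between Map and Reduce."""
    groups = defaultdict(list)
    for pairs in mapped_fragments:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_groups(groups):
    """Reduce: merge the values of each key (here, by taking their mean)."""
    return {key: sum(values) / len(values) for key, values in groups.items()}


# Two fragments, each of which would be read by its own Map task on a cluster.
fragments = [
    [("s1", "PM2.5", 190.0), ("s1", "PM10", 200.0)],
    [("s2", "PM2.5", 196.0), ("s2", "PM10", 212.0)],
]
result = reduce_groups(shuffle(map_fragment(f) for f in fragments))
```

On a real cluster each call to `map_fragment` would run as a separate Map task close to its data block, and the grouping step would be carried out by the framework.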
The structure and principle of MapReduce are described here using Hadoop MapReduce as an example. The underlying file system of Hadoop is HDFS; Hadoop MapReduce reads data from HDFS and writes the results of a job back to HDFS.
MapReduce adopts a Master-Slave architecture comprising one master node (Master) and multiple slave nodes (Slave). The JobTracker runs on the master node and initializes, distributes, coordinates and monitors jobs. A TaskTracker runs on each slave node, communicates with the JobTracker, and executes Map tasks and Reduce tasks. Communication and task allocation between the JobTracker and the TaskTrackers are carried out through a heartbeat mechanism: each TaskTracker periodically sends a query to the JobTracker, and if a job needs to be executed, the JobTracker assigns it a task, which may be a Map task or a Reduce task. After a task is assigned, the TaskTracker stores the task code and configuration information locally and starts a JVM to execute the task. While running, the task reports its status to the TaskTracker, which sends the summarized information to the JobTracker. The job is marked as successful once the JobTracker confirms that the last task has finished running.
When MapReduce is applied to the three artificial intelligence algorithms adopted by the invention, the three algorithms, which execute in parallel, can each be placed on a slave node, with the master node responsible for summarizing their results. Alternatively, each algorithm can be segmented according to its actual computational load and distributed across slave nodes; for example, the neural network, decision tree and support vector machine algorithms are each segmented and deployed according to their own steps, and the master node summarizes the partial results after receiving them from the slave nodes.
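The first deployment option, one algorithm per slave node with the master summarizing the results, can be sketched with a thread pool standing in for the slave nodes. This is an analogy rather than a Hadoop job: `run_in_parallel` and `master_summarise` are hypothetical names, and each model is represented by an ordinary Python callable.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(models, sample):
    """Run every model on the same sample concurrently; collect results by name.

    models: dict mapping an algorithm name to a callable that returns a prediction.
    Each worker thread plays the role of one slave node.
    """
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {name: pool.submit(model, sample) for name, model in models.items()}
        return {name: future.result() for name, future in futures.items()}


def master_summarise(weights, predictions):
    """The master's step: weighted sum of the predictions returned by the workers."""
    return sum(weights[name] * predictions[name] for name in predictions)
```

The trained NN, DT and SVM predictors would be passed in as the callables, and the dictionary returned by `run_in_parallel` would then be weighted by the master.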
In another aspect of the invention, an air quality monitoring and warning system based on cloud computing and machine learning is provided, which can implement the air quality monitoring and warning method based on cloud computing and machine learning.
In another aspect of the present invention, an air quality monitoring and warning device based on cloud computing and machine learning is provided, where the device includes a processor and a memory, and is capable of implementing the air quality monitoring and warning method based on cloud computing and machine learning.
In another aspect of the present invention, a storage medium is provided, on which a computer program is stored, wherein the computer program is capable of implementing the foregoing air quality monitoring and warning method based on cloud computing and machine learning.
Drawings
FIG. 1 is a diagram of a relationship between a sensor and a cloud computing platform.
Fig. 2 is a work data flow diagram of a cloud computing platform.
FIG. 3 is a plot of the air quality values versus actual AQI values obtained by the NN, DT, SVM algorithms.
Fig. 4 is a comparison graph of an air quality predicted value and an actual AQI value obtained according to weights of three algorithms after the three algorithms of NN, DT and SVM are combined.
Detailed Description
The present invention will be further described with reference to the following examples.
As shown in fig. 1, the sampled values of the air quality factors obtained by the monitoring sensors are transmitted to a cloud computing platform. The cloud computing platform comprises at least a base support layer, an algorithm layer and an application visualization layer. The base support layer uses the mature Hadoop platform; for example, the MapReduce framework distributes parallel tasks to the cloud servers. The algorithm layer uses the air quality prediction method based on machine learning algorithms described above; specifically, three machine learning algorithms, the neural network (NN), decision tree (DT) and support vector machine (SVM), are combined horizontally. The application visualization layer displays the predicted air quality value and the determined air quality grade; for example, when the grade is light pollution, moderate pollution or severe pollution, it raises an alarm indicating that the air quality factors exceed the standard. The content and form in which the alarm is displayed can be chosen as required.
As shown in fig. 2, the cloud computing platform applies the three algorithms, neural network (NN), decision tree (DT) and support vector machine (SVM), to the current sampled values, calculates the prediction errors of the three algorithms on historical data, derives from these errors the weights used when the three algorithms are combined horizontally, obtains the predicted air quality value of the NN-DT-SVM combination, compares it with the air quality standard, and decides whether to output an alarm.
Data set: the data set consists of air quality data for Kaifeng City from January 2013 to November 2020. As described in Table 1 below, the first six items are the air quality factors PM2.5, PM10, SO2, CO, NO2 and O3; AQI is the air quality score, and the last item is the corresponding quality grade. The nationally issued Technical Regulation on Ambient Air Quality Index (on trial) replaces the original Air Pollution Index (API) with the Air Quality Index (AQI) and divides the AQI into six grades: the first grade is excellent, the second grade is good, the third grade is lightly polluted, the fourth grade is moderately polluted, the fifth grade is severely polluted and the sixth grade is severely polluted.
The program is written in C++ and runs on Windows 10. The data set is divided proportionally into a training set and a test set, and each of the three models is used for calculation: the 35 air quality samples from January 2018 to November 2020 form the test set and the rest form the training set. The results are shown in the following table, and fig. 3 compares the air quality values obtained by the NN, DT and SVM algorithms with the actual AQI values:
The AQI column gives the actual AQI value of each sample; NN, DT and SVM give the AQI values obtained with the neural network, decision tree and support vector machine, respectively; E_NN, E_DT and E_SVM give the square of the difference between the actual AQI value and the AQI value obtained with the neural network, decision tree and support vector machine algorithm, respectively.
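The division of the data set into training and test sets described above can be checked with a short sketch. It assumes one sample per month, as in the monthly tables shown earlier, and reads the test period as January 2018 to November 2020, which yields the 35 samples mentioned in the text. The helper `month_range` is hypothetical.

```python
def month_range(start, end):
    """Yield (year, month) tuples from start to end, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield (year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1


# Data set period: January 2013 to November 2020, one sample per month (assumed).
months = list(month_range((2013, 1), (2020, 11)))
test_months = [m for m in months if m >= (2018, 1)]   # held out for testing
train_months = [m for m in months if m < (2018, 1)]   # used for training
```

Under these assumptions the 95 monthly samples split into 60 for training and 35 for testing, and the test period follows the training period in time.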
According to the method provided by the invention, the mean squared errors of the three algorithms NN, DT and SVM are calculated: $E_{NN}$, $E_{DT}$, $E_{SVM}$. The weights $W_{NN}$, $W_{DT}$, $W_{SVM}$ of the three algorithms are then calculated from these errors.
The air quality prediction value is calculated as: $P = 0.424971 \cdot P_{NN} + 0.136312 \cdot P_{DT} + 0.438717 \cdot P_{SVM}$.
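As an illustration of the weighted combination above, the following minimal Python sketch applies the three weights stated in this embodiment to per-model predictions. The function name `combine_predictions` and the sample inputs are hypothetical, and Python is used here only for brevity (the embodiment itself states C++).

```python
# Weights as stated in the embodiment for NN, DT and SVM.
WEIGHTS = {"NN": 0.424971, "DT": 0.136312, "SVM": 0.438717}


def combine_predictions(p_nn, p_dt, p_svm, weights=WEIGHTS):
    """Return P = W_NN * P_NN + W_DT * P_DT + W_SVM * P_SVM."""
    return (weights["NN"] * p_nn
            + weights["DT"] * p_dt
            + weights["SVM"] * p_svm)


# Hypothetical per-model AQI predictions for a single sample.
p = combine_predictions(80.0, 95.0, 85.0)
```

Because the three stated weights sum to 1, the combined value always lies between the smallest and the largest of the three individual predictions.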
Figure 4 compares the actual AQI values with the air quality predicted values obtained by combining the NN, DT and SVM algorithms according to their weights.
While one embodiment of the present invention has been described in detail, the description is only a preferred embodiment of the present invention and should not be taken as limiting the scope of the invention. All equivalent changes and modifications made within the scope of the present invention shall fall within the scope of the present invention.
Claims (10)
1. An air quality monitoring and alarming method based on cloud computing and machine learning is characterized by comprising the following steps:
step 1: transmitting the sampling values of the air quality factors obtained by each monitoring sensor into the cloud platform;
step 2: the cloud platform calculates a predicted value of the air quality factor; specifically, the neural network NN, the decision tree DT and the support vector machine SVM algorithm each calculate a predicted value of the air quality, weights $W_{NN}$, $W_{DT}$, $W_{SVM}$ are assigned to the three algorithms based on historical prediction performance, and the final air quality predicted value is calculated;
step 3: determining the air quality grade according to the air quality predicted value obtained in step 2, and giving an alarm to indicate that the air quality factor exceeds the standard when the air quality grade is light pollution, moderate pollution or severe pollution.
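The grading and alarm step can be sketched as follows. This is only an illustration: the AQI breakpoints (50, 100, 150, 200, 300) are taken from the national AQI technical regulation rather than from this text, the function names are hypothetical, and treating every grade from three upward as an alarm condition is an assumption about how the claim is meant to be read.

```python
# (upper AQI bound, grade number, grade name) for the first five grades.
# Breakpoints follow the national AQI regulation, not this document.
GRADES = [
    (50, 1, "excellent"),
    (100, 2, "good"),
    (150, 3, "lightly polluted"),
    (200, 4, "moderately polluted"),
    (300, 5, "heavily polluted"),
]


def aqi_grade(aqi):
    """Map a predicted AQI value to (grade number, grade name)."""
    for upper, grade, name in GRADES:
        if aqi <= upper:
            return grade, name
    return 6, "severely polluted"


def should_alarm(aqi, alarm_from_grade=3):
    """ASSUMPTION: alarm for grade three (lightly polluted) and above."""
    grade, _ = aqi_grade(aqi)
    return grade >= alarm_from_grade
```

A predicted value of 160, for example, falls in grade four and would raise the alarm, while a value of 90 falls in grade two and would not.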
2. The method of claim 1, wherein the neural network NN is used to calculate the air quality prediction by:
firstly, initializing the network: determining the number of hidden layers and the number of neurons in each layer, initializing the connection weights between the neurons of each layer of the neural network, and determining the input and the target output; the number of nodes in the hidden layer is determined from $n_{in}$, the number of nodes of the input layer, and $n_{out}$, the number of nodes of the output layer; then inputting training data: determining a training data set and inputting each group of data in the data set into the neural network; calculating the output of the network according to the neural network and the weights; calculating the error between the output obtained by the neural network and the target output, and if the error does not reach an acceptable threshold, back-propagating the error information to correct the connection weights in the network;
the training sample set is $TDS = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$, where $x_1, x_2, \dots, x_N$ are air quality factor samples and $y_1, y_2, \dots, y_N$ are the air quality values of the corresponding air quality factor samples; the mean square error of the neural network on $(x_i, y_i)$ is denoted $E_i$, and the error threshold is set to $\mu$; the weight update $\Delta\tau$ between the hidden layer and the output layer is $\Delta\tau = -\theta \, \frac{\partial E_i}{\partial \tau}$, where $\theta$ is the learning rate of the neural network, $0 < \theta < 1$, $\tau$ is the weight between the hidden layer and the output layer, and $\frac{\partial E_i}{\partial \tau}$ is the first derivative of the mean square error $E_i$ with respect to the weight $\tau$;
judging whether all data in the data set participate in the training process of the neural network; if so, outputting the weight among the nodes of each layer of the neural network, and finishing the training process; otherwise, continuing to execute the training process;
obtaining a neural network with an error within a given threshold range through the learning process; after determining the connection weight among each neuron node in the neural network, calculating according to new input by using the network and outputting a result;
substituting the current air quality factor monitoring value into the neural network model to obtain the air quality predicted value $P_{NN}$.
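The training procedure of this claim can be illustrated with a small back-propagation sketch in plain Python. Several details are assumptions rather than the patent's own choices: a single hidden layer with sigmoid units, a linear output unit, the per-sample error $E_i = \frac{1}{2}(\hat{y}_i - y_i)^2$, and all names (`TinyNet`, `train_step`, `train`). The hidden-to-output update follows the gradient-descent form $\Delta\tau = -\theta \, \partial E_i / \partial \tau$.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    """One hidden layer (sigmoid units) and one linear output unit."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Input-to-hidden weights; the last entry of each row is a bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        # Hidden-to-output weights tau; the last entry is a bias.
        self.tau = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = list(x) + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb)))
             for row in self.w_hidden]
        hb = h + [1.0]
        out = sum(t * v for t, v in zip(self.tau, hb))
        return xb, hb, out

    def predict(self, x):
        return self.forward(x)[2]

    def train_step(self, x, y, theta=0.1):
        """One back-propagation step; returns E_i measured before the update."""
        xb, hb, out = self.forward(x)
        err = out - y
        e_i = 0.5 * err * err  # ASSUMPTION: E_i = (out - y)^2 / 2
        # Hidden-layer update, computed with tau before tau is changed.
        for j, row in enumerate(self.w_hidden):
            delta_j = err * self.tau[j] * hb[j] * (1.0 - hb[j])
            for k in range(len(row)):
                row[k] -= theta * delta_j * xb[k]
        # Delta tau = -theta * dE_i/dtau, where dE_i/dtau_j = err * h_j.
        for j in range(len(self.tau)):
            self.tau[j] -= theta * err * hb[j]
        return e_i


def train(net, samples, theta=0.1, mu=1e-4, max_epochs=5000):
    """Pass over all samples repeatedly until the mean error drops below mu."""
    mean_e = float("inf")
    for _ in range(max_epochs):
        mean_e = sum(net.train_step(x, y, theta) for x, y in samples) / len(samples)
        if mean_e < mu:
            break
    return mean_e
```

In use, each `x` would be a normalized vector of the six air quality factors and each `y` the corresponding air quality value; `net.predict(x)` then plays the role of $P_{NN}$.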
3. The method according to claim 1, wherein the process of calculating the air quality prediction value using the decision tree DT is:
the training data set is TDS (Training Data Set), the number of samples is N, and it is expressed as $TDS = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$, where $x_1, x_2, \dots, x_N$ are samples of the air quality factors (for example, $x_1$ is given in the table) and $y_1, y_2, \dots, y_N$ are the sample values of the corresponding air quality factors.
The objective function of the decision tree when dividing the samples is the minimum sum of squared errors, namely:
$$\min_{j,s}\left[\min_{c_1}\sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2}\sum_{x_i \in R_2(j,s)} (y_i - c_2)^2\right]$$
where $j$ denotes the $j$-th variable in each sample, the total number of sample variables is $M$, $s$ denotes the dividing point of the $j$-th variable, $R_1(j,s)$ denotes the left region of the division, $R_2(j,s)$ denotes the right region of the division, and $c_1$ and $c_2$ are the optimal output values of regions $R_1(j,s)$ and $R_2(j,s)$; each feature $j$ of the sample is traversed, the possible split points $s$ of each feature are tried, the split with the least sum of squared errors is selected, and the optimal output values $c_1$ and $c_2$ are determined; the air quality prediction regression tree is obtained after training is finished;
substituting the current air quality factor monitoring value into the air quality prediction regression tree to obtain the air quality predicted value $P_{DT}$.
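The split search described in this claim can be sketched in Python as follows. This shows one split only, not a full tree; the function name `best_split` and the use of midpoints between consecutive distinct feature values as candidate split points are assumptions. For a fixed region, the output value that minimizes the squared error is the mean of the targets in that region, which is what `c1` and `c2` hold.

```python
def best_split(X, y):
    """Return (j, s, c1, c2, sse) minimising the summed squared error.

    X is a list of feature rows, y the list of target values.
    """
    best = None
    n_features = len(X[0])
    for j in range(n_features):
        values = sorted(set(row[j] for row in X))
        # ASSUMPTION: candidate split points are midpoints between
        # consecutive distinct values of feature j.
        for lo, hi in zip(values, values[1:]):
            s = (lo + hi) / 2.0
            left = [yi for row, yi in zip(X, y) if row[j] <= s]
            right = [yi for row, yi in zip(X, y) if row[j] > s]
            c1 = sum(left) / len(left)    # optimal output for R1(j, s)
            c2 = sum(right) / len(right)  # optimal output for R2(j, s)
            sse = (sum((yi - c1) ** 2 for yi in left)
                   + sum((yi - c2) ** 2 for yi in right))
            if best is None or sse < best[4]:
                best = (j, s, c1, c2, sse)
    return best
```

A regression tree would apply this search recursively to the left and right regions until a stopping rule (for example a minimum number of samples per leaf) is met.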
4. The method of claim 1, wherein the process of calculating the air quality prediction value using a Support Vector Machine (SVM) is:
the training sample set is $TDS = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$, where $x_1, x_2, \dots, x_N$ are air quality factor samples and $y_1, y_2, \dots, y_N$ are the air quality values corresponding to the air quality factors;
there exists a hyperplane $\omega^{T}\varphi(X) + b = 0$ such that the samples can be correctly classified, where $X = \{x_1, x_2, \dots, x_N\}$ denotes the $N$ samples, $\varphi(X)$ is the mapping of $X$ to a high-dimensional feature space, $\omega$ is the normal vector, which determines the direction of the hyperplane, and $b$ is the displacement term, used as an optimization variable; so that the samples can be accurately classified by the hyperplane, a slack variable $\varepsilon$ is introduced into the objective function;
$\varepsilon_i \ge 0$, $i = 1, \dots, N$, are the slack variables: each sample $x_i$ in the training sample set corresponds to one slack variable $\varepsilon_i$, which characterizes the degree to which the sample fails to satisfy the constraint;
a Gaussian kernel function replaces the inner product operation in the high-dimensional space, where $\sigma$ is the Gaussian kernel bandwidth and $\alpha_i$ is a Lagrange multiplier; $b$ is calculated from the sample values and all obtained values of $b$ are averaged, and the support vector classification prediction function is finally obtained after training is finished;
substituting the current air quality factor monitoring value into the air quality support vector classification prediction function to obtain the air quality predicted value $P_{SVM}$.
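The kernel step can be illustrated with the textbook form of a kernel SVM, sketched below in Python. The formulas used here are assumptions, since the patent's own expressions are not reproduced in this text: the kernel $K(x, z) = \exp(-\lVert x - z\rVert^2 / (2\sigma^2))$, the decision function $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$, and the bias computed per support vector as $y_s - \sum_i \alpha_i y_i K(x_i, x_s)$ and then averaged. All function names are hypothetical, and the multipliers `alphas` are taken as given rather than solved for.

```python
import math


def gaussian_kernel(x, z, sigma):
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma^2)); sigma is the bandwidth."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svm_decision(x, support_vectors, labels, alphas, b, sigma):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(a * y * gaussian_kernel(sv, x, sigma)
               for sv, y, a in zip(support_vectors, labels, alphas)) + b


def mean_bias(support_vectors, labels, alphas, sigma):
    """Average the bias values obtained from each support vector."""
    biases = [
        y_s - svm_decision(x_s, support_vectors, labels, alphas, 0.0, sigma)
        for x_s, y_s in zip(support_vectors, labels)
    ]
    return sum(biases) / len(biases)
```

In practice the multipliers would come from solving the dual optimization problem with an SVM solver; this sketch only shows how the kernel replaces the inner product once they are known.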
5. The method of claim 1, further characterized in that the neural network NN, decision tree DT and support vector machine SVM algorithms are executed in parallel.
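Parallel execution of the three predictors could look like the following Python sketch, which uses a thread pool from the standard library. The function name `predict_in_parallel` and the convention that each model is a callable taking the factor vector are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def predict_in_parallel(models, factors):
    """Run every model on the same factor vector concurrently.

    `models` maps a name such as "NN", "DT" or "SVM" to a callable
    that takes the factor vector and returns a prediction.
    """
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {name: pool.submit(fn, factors)
                   for name, fn in models.items()}
        return {name: fut.result() for name, fut in futures.items()}
```

Threads are a reasonable choice when the models call into native code or remote services; for CPU-bound pure Python models, `ProcessPoolExecutor` from the same module may be the better fit.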
6. The method according to claim 1, wherein the three algorithms are weighted based on historical prediction, specifically:
taking historical data as the input of the three algorithms to obtain their predicted values, taking the difference between each predicted value and the corresponding historical air quality value, squaring it, and averaging to obtain the mean squared errors $E_{NN}$, $E_{DT}$, $E_{SVM}$; then calculating the weights $W_{NN}$, $W_{DT}$, $W_{SVM}$ of the three algorithms from these errors;
The final air quality prediction value is calculated as: $P = W_{NN} \cdot P_{NN} + W_{DT} \cdot P_{DT} + W_{SVM} \cdot P_{SVM}$.
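The error calculation of this claim, together with one possible weighting scheme, is sketched below in Python. The weight formula itself is not reproduced in this text, so the inverse-error normalization used here is purely an assumption chosen because it gives larger weights to models with smaller historical error and yields weights that sum to 1; it should not be read as the patent's formula. The function names are hypothetical.

```python
def mean_squared_error(predicted, actual):
    """Average of the squared differences between predictions and actual values."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def inverse_error_weights(errors):
    """ASSUMPTION: weight each model by 1/E, normalised so the weights sum to 1.

    `errors` maps a model name to its mean squared error (must be > 0).
    """
    inverse = {name: 1.0 / e for name, e in errors.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


# Hypothetical mean squared errors for the three models.
weights = inverse_error_weights({"NN": 1.0, "DT": 4.0, "SVM": 1.0})
```

With these hypothetical errors the decision tree, having four times the error of the other two models, receives one quarter of their weight. A model with zero historical error would need special handling, since its inverse is undefined.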
7. The method of claim 1, the air quality factors being: PM2.5, PM10, SO2, CO, NO2 and O3.
8. An air quality monitoring and warning system based on cloud computing and machine learning, which is capable of implementing the method of any one of the preceding claims 1-7.
9. An air quality monitoring and warning device based on cloud computing and machine learning, the device comprising a processor, a memory, which is capable of implementing the method of any of the preceding claims 1-7.
10. A storage medium having stored thereon a computer program enabling the method of any of the preceding claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011627381.8A CN112711912B (en) | 2020-12-30 | 2020-12-30 | Air quality monitoring and alarming method, system, device and medium based on cloud computing and machine learning algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112711912A true CN112711912A (en) | 2021-04-27 |
CN112711912B CN112711912B (en) | 2024-03-19 |
Family
ID=75547665
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011627381.8A Active CN112711912B (en) | 2020-12-30 | 2020-12-30 | Air quality monitoring and alarming method, system, device and medium based on cloud computing and machine learning algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112711912B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114022333A (en) * | 2022-01-10 | 2022-02-08 | 北京英视睿达科技股份有限公司 | Method and system for estimating atmospheric pollutant emission based on economic big data |
CN117129036A (en) * | 2023-08-28 | 2023-11-28 | 瀚能科技有限公司 | Cloud environment monitoring method and device |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104751242A (en) * | 2015-03-27 | 2015-07-01 | 北京奇虎科技有限公司 | Method and device for predicting air quality index |
CN106651036A (en) * | 2016-12-26 | 2017-05-10 | 东莞理工学院 | Air quality forecasting system |
CN106708016A (en) * | 2016-12-22 | 2017-05-24 | 中国石油天然气股份有限公司 | fault monitoring method and device |
CN107368894A (en) * | 2017-07-28 | 2017-11-21 | 国网河南省电力公司电力科学研究院 | The prevention and control of air pollution electricity consumption data analysis platform shared based on big data |
US20180284737A1 (en) * | 2016-05-09 | 2018-10-04 | StrongForce IoT Portfolio 2016, LLC | Methods and systems for detection in an industrial internet of things data collection environment with large data sets |
CN109063892A (en) * | 2018-06-25 | 2018-12-21 | 华北电力大学 | Industry watt-hour meter prediction technique based on BP-LSSVM combination optimization model |
CN109360022A (en) * | 2018-10-15 | 2019-02-19 | 广东工业大学 | A kind of market Sales Volume of Commodity prediction technique, device and equipment based on data mining |
CN109784708A (en) * | 2019-01-07 | 2019-05-21 | 江河瑞通(北京)技术有限公司 | The cloud service system that the coupling of water industry multi-model calculates |
CN110222762A (en) * | 2019-06-04 | 2019-09-10 | 恒安嘉新(北京)科技股份公司 | Object prediction method, apparatus, equipment and medium |
CN111343279A (en) * | 2020-03-04 | 2020-06-26 | 兰州理工大学 | Air pollution detection and alarm system based on big data and cloud computing |
Also Published As
Publication number | Publication date |
---|---|
CN112711912B (en) | 2024-03-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Angelov et al. | Automatic generation of fuzzy rule-based models from data by genetic algorithms | |
Sahoo et al. | Predicting flux decline in crossflow membranes using artificial neural networks and genetic algorithms | |
Chen et al. | Estimating simulation workload in cloud manufacturing using a classifying artificial neural network ensemble approach | |
CN108564136B (en) | A kind of airspace operation Situation Assessment classification method based on fuzzy reasoning | |
Valencia et al. | A Kendall correlation coefficient between functional data | |
CN110571792A (en) | Analysis and evaluation method and system for operation state of power grid regulation and control system | |
Ismail et al. | Quality monitoring in multistage manufacturing systems by using machine learning techniques | |
CN112711912A (en) | Air quality monitoring and alarming method, system, device and medium based on cloud computing and machine learning algorithm | |
CN113657814B (en) | Aviation network risk prediction method and risk grade evaluation method | |
Jyoti et al. | Data clustering approach to industrial process monitoring, fault detection and isolation | |
Jui et al. | Flat price prediction using linear and random forest regression based on machine learning techniques | |
Zhang et al. | Research on the combined prediction model of residential building energy consumption based on random forest and BP neural network | |
Marjuni et al. | Unsupervised software defect prediction using signed Laplacian-based spectral classifier | |
CN114118508A (en) | OD market aviation passenger flow prediction method based on space-time convolution network | |
CN117575564A (en) | Extensible infrastructure network component maintenance and transformation decision evaluation method and system | |
Shi et al. | A dynamic novel approach for bid/no-bid decision-making | |
CN112733903B (en) | SVM-RF-DT combination-based air quality monitoring and alarming method, system, device and medium | |
Wang et al. | Dynamic traffic prediction based on traffic flow mining | |
Vedavathi et al. | Unsupervised learning algorithm for time series using bivariate AR (1) model | |
CN114115150A (en) | Data-based heat pump system online modeling method and device | |
Pang et al. | Wt model & applications in loan platform customer default prediction based on decision tree algorithms | |
Liu et al. | RETRACTED ARTICLE: Company financial path analysis using fuzzy c-means and its application in financial failure prediction | |
Yanto et al. | Hybrid Method Air Quality Classification Analysis Model. | |
Zaabar et al. | A two-phase part family formation model to optimize resource planning: a case study in the electronics industry | |
Wang | Financial distress prediction for listed enterprises using fuzzy C-means |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||