CN112711912A - Air quality monitoring and alarming method, system, device and medium based on cloud computing and machine learning algorithm - Google Patents


Info

Publication number
CN112711912A
Authority
CN
China
Prior art keywords
air quality
value
neural network
sample
svm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011627381.8A
Other languages
Chinese (zh)
Other versions
CN112711912B (en)
Inventor
黄海
吴霖瑞
谢昊岩
吴岁纯
罗莉芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xuchang University
Original Assignee
Xuchang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xuchang University
Priority to CN202011627381.8A
Publication of CN112711912A
Application granted
Publication of CN112711912B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/0004 Gaseous mixtures, e.g. polluted air
    • G01N33/0009 General constructional details of gas analysers, e.g. portable test equipment
    • G01N33/0062 General constructional details of gas analysers, e.g. portable test equipment concerning the measuring method or the display, e.g. intermittent measurement or digital display
    • G01N33/0063 General constructional details of gas analysers, e.g. portable test equipment concerning the measuring method or the display, e.g. intermittent measurement or digital display using a threshold to release an alarm or displaying means
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/0004 Gaseous mixtures, e.g. polluted air
    • G01N33/0009 General constructional details of gas analysers, e.g. portable test equipment
    • G01N33/0062 General constructional details of gas analysers, e.g. portable test equipment concerning the measuring method or the display, e.g. intermittent measurement or digital display
    • G01N33/0068 General constructional details of gas analysers, e.g. portable test equipment concerning the measuring method or the display, e.g. intermittent measurement or digital display using a computer specifically programmed
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00 Details relating to CAD techniques
    • G06F2111/06 Multi-objective optimisation, e.g. Pareto optimisation using simulated annealing [SA], ant colony algorithms or genetic algorithms [GA]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00 Details relating to the application field
    • G06F2113/08 Fluids

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Chemical & Material Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Food Science & Technology (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Combustion & Propulsion (AREA)
  • Biomedical Technology (AREA)
  • Medicinal Chemistry (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Geometry (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an air quality monitoring and alarming method, system, device and medium based on cloud computing and machine learning. Using the computing power of cloud computing and the MapReduce framework, the invention combines three machine learning algorithms horizontally (in parallel): a neural network (NN), a decision tree (DT) and a support vector machine (SVM). Each algorithm is weighted according to its historical prediction error, and the weighted predictions are combined into the final air quality prediction result. This exploits the respective strengths of the machine learning algorithms, predicts air quality accurately, and helps people plan their travel and helps the relevant departments control atmospheric pollution.

Description

Air quality monitoring and alarming method, system, device and medium based on cloud computing and machine learning algorithm
Technical Field
The invention relates to air quality monitoring and alarming in a cloud environment, in particular to artificial intelligence-based air quality monitoring and alarming in cloud computing.
Background
With the development of industry, air pollution has become increasingly severe and seriously affects people's life and work. For example, air pollution can cause respiratory conditions such as asthma and coughing, and can worsen the condition of people with existing illnesses, even threatening their lives. Monitoring air quality and raising alarms lets people take precautions in advance and helps environmental management departments with decision-making and management.
Machine learning is developing rapidly, and intelligent algorithms are being applied across industries. For atmospheric pollution early warning, many artificial intelligence algorithms have been used, including the support vector machine (SVM), random forest (RF), decision tree (DT), neural network (NN), particle swarm optimization (PSO) and artificial fish swarm algorithm (AFSA). However, these algorithms are used either alone or chained one after another (longitudinally) to analyze pollutant concentrations, which does not take full advantage of running them in parallel.
Meanwhile, as China's air monitoring network keeps growing, the volume of air quality data is expanding rapidly, and the mass of monitoring data on air quality factors makes accurate, real-time analysis challenging. Cloud computing provides mass data storage and strong computing capacity for real-time data analysis and mining. MapReduce is a distributed programming model in which the data to be processed is divided into multiple blocks, many computers in a network process the blocks simultaneously, and the results are then collected to reach a conclusion.
The invention provides an air quality monitoring and alarming method, system, device and medium based on cloud computing and machine learning. Using the computing power of cloud computing and the MapReduce framework, the invention combines three machine learning algorithms horizontally (in parallel): a neural network (NN), a decision tree (DT) and a support vector machine (SVM). Each algorithm is weighted according to its historical prediction error, and the weighted predictions are combined into the final air quality prediction result. This exploits the respective strengths of the machine learning algorithms, predicts air quality accurately, and helps people plan their travel and helps the relevant departments control atmospheric pollution.
Disclosure of Invention
The invention provides an air quality monitoring and alarming method based on cloud computing and machine learning, which comprises the following steps. Step 1: the sampled values of the air quality factors obtained by each monitoring sensor are transmitted to the cloud platform. Step 2: the cloud platform calculates a predicted value of the air quality factor; specifically, the neural network (NN), decision tree (DT) and support vector machine (SVM) algorithms each calculate a predicted air quality value, the three algorithms are assigned weights $W_{NN}$, $W_{DT}$, $W_{SVM}$ based on their historical prediction performance, and the final air quality predicted value is calculated. Step 3: the air quality grade is determined from the air quality predicted value obtained in step 2, and an alarm is raised to indicate that the air quality factor exceeds the standard when the grade is light pollution, moderate pollution or severe pollution.
The specific calculation process in step 2 is as follows:
step 2.1, calculating an air quality prediction value by adopting a neural network NN algorithm:
The neural network is a multi-layer network that typically consists of an input layer, one or more hidden layers, and an output layer. Neurons within the same layer are not connected to each other; connections exist only between neurons of adjacent layers. A Sigmoid function is generally used as the activation function. It maps input values of any range into the interval (0, 1) and is therefore also called a squashing function:
$$f(x) = \frac{1}{1 + e^{-x}}$$
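As a small illustration of the squashing behaviour described above, the Sigmoid function and its derivative can be sketched in Python. The function names are illustrative only and are not taken from the patent; the branch on the sign of x is a common numerical precaution, not something the text prescribes.

```python
import math

def sigmoid(x: float) -> float:
    """Squash any real input towards the interval (0, 1)."""
    # Branch on the sign so that exp() is never called with a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_derivative(x: float) -> float:
    """Derivative s(x) * (1 - s(x)), which back-propagation relies on."""
    s = sigmoid(x)
    return s * (1.0 - s)
```

The derivative is largest at x = 0 and shrinks for large |x|, which is one reason inputs are usually scaled before training.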
First, the network is initialized: the number of hidden layers and the number of neurons in each layer are determined, the connection weights between the neurons of each layer are initialized, and the input and target output are determined. The invention determines the number of hidden-layer nodes empirically:
Figure BDA0002875232820000022
where $n_{in}$ is the number of nodes of the input layer and $n_{out}$ is the number of nodes of the output layer. The invention uses six air quality factors: PM2.5, PM10, SO2, CO, NO2 and O3. The input layer of the model therefore has 6 nodes, i.e., $n_{in} = 6$. The number of hidden layers is 1, so a three-layer neural network is used to predict the air quality.
Next, the training data is input: a training data set is determined, and each group of data in the set is fed into the neural network. The network output is computed from the network structure and the current weights. The error between the network output and the target output is then calculated; if the error has not reached an acceptable threshold, the error information is propagated backwards to correct the connection weights in the network.
Let the training sample set be $TDS = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, where $x_1, x_2, \ldots, x_N$ are air quality factor samples and $y_1, y_2, \ldots, y_N$ are the air quality values of the corresponding samples.
For example, $x_1$ holds the air quality factor values of Kaifeng city for January 2013, as shown in the following table:
Month    PM2.5  PM10  SO2  CO     NO2  O3
Jan-13   193    206   38   5.594  21   0
$y_1$ is the air quality value, also called the AQI value, corresponding to $x_1$:
Month    AQI
Jan-13   238
$x_2$ holds the air quality factor values of Kaifeng city for February 2013, as shown in the following table:
Month    PM2.5  PM10  SO2  CO     NO2  O3  AQI
Feb-13   145    145   26   4.557  15   0   188
$y_2$ is the air quality value, also called the AQI value, corresponding to $x_2$:
Month    AQI
Feb-13   188
The mean square error $E_i$ of the neural network on $(x_i, y_i)$ is
Figure BDA0002875232820000031
The error threshold is set to $\mu$. The weight update $\Delta\tau$ between the hidden layer and the output layer is:
$$\Delta\tau = -\theta \frac{\partial E_i}{\partial \tau}$$
where $\theta$ is the learning rate of the neural network, $0 < \theta < 1$, $\tau$ is the weight between the hidden layer and the output layer, and $\partial E_i / \partial \tau$ is the first derivative of the mean square error $E_i$ with respect to the weight $\tau$.
It is then checked whether all the data in the data set have taken part in the training of the neural network. If so, the weights between the nodes of each layer are output and the training process ends; otherwise, training continues.
This learning process yields a neural network whose error is within the given threshold. Once the connection weights between the neuron nodes are determined, the network can compute and output results for new inputs.
Substituting the current monitored values of the air quality factors into the neural network model gives the air quality predicted value $P_{NN}$.
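The training loop of step 2.1 can be sketched as follows. This is a minimal illustration in pure Python, not the patented implementation (which the description says was written in C++). Several details are assumptions: the error is taken as E = 0.5 * (output - target)^2, the output node also uses a Sigmoid, the hidden layer has 4 nodes (the empirical formula is not recoverable from the text), and the scaling constants `SCALE` and `AQI_SCALE` are hypothetical. The two samples are the January and February 2013 rows quoted above.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class ThreeLayerNet:
    """Input layer -> one Sigmoid hidden layer -> one Sigmoid output node."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # One row per hidden node: n_in weights followed by a bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        # tau: weights between the hidden layer and the output node, plus a bias.
        self.tau = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1])
                  for row in self.w_hidden]
        out = sigmoid(sum(t * h for t, h in zip(self.tau, hidden)) + self.tau[-1])
        return hidden, out

    def train_step(self, x, y, theta):
        """One back-propagation update with learning rate theta."""
        hidden, out = self.forward(x)
        delta_out = (out - y) * out * (1.0 - out)
        # Hidden-layer deltas are computed before tau is modified.
        delta_hidden = [delta_out * self.tau[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden):
            self.tau[j] -= theta * delta_out * h       # delta_tau = -theta * dE/dtau
        self.tau[-1] -= theta * delta_out
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(x):
                row[i] -= theta * delta_hidden[j] * v
            row[-1] -= theta * delta_hidden[j]

def mean_error(net, samples):
    return sum(0.5 * (net.forward(x)[1] - y) ** 2 for x, y in samples) / len(samples)

def train(net, samples, theta=0.5, mu=1e-6, max_epochs=5000):
    """Pass over the training set until the mean error falls below the threshold mu."""
    for _ in range(max_epochs):
        for x, y in samples:
            net.train_step(x, y, theta)
        if mean_error(net, samples) < mu:
            break
    return net

SCALE = [500.0, 600.0, 100.0, 10.0, 100.0, 300.0]  # hypothetical per-factor maxima
AQI_SCALE = 500.0                                  # hypothetical AQI maximum
raw = [([193, 206, 38, 5.594, 21, 0], 238), ([145, 145, 26, 4.557, 15, 0], 188)]
samples = [([v / s for v, s in zip(x, SCALE)], y / AQI_SCALE) for x, y in raw]

net = ThreeLayerNet(n_in=6, n_hidden=4)
error_before = mean_error(net, samples)
train(net, samples)
error_after = mean_error(net, samples)
p_nn = net.forward(samples[0][0])[1] * AQI_SCALE   # prediction rescaled to AQI units
```

With only two samples this is a toy fit; a real model would be trained on the full monthly data set and evaluated on held-out samples.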
Step 2.2, calculating an air quality predicted value by a Decision Tree (DT) algorithm:
A decision tree (DT) can uncover significant information hidden in data and presents it in an intuitive form that users can easily understand; it is widely used in data mining and prediction.
The training data set (TDS) contains N samples and is written $TDS = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, where $x_1, x_2, \ldots, x_N$ are air quality factor samples (for example, $x_1$ is as given in the table in step 2.1) and $y_1, y_2, \ldots, y_N$ are the corresponding air quality values.
When dividing the samples, the objective function of the decision tree is the minimum sum of squared errors, namely:
$$\min_{j,s}\left[\min_{c_1}\sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2}\sum_{x_i \in R_2(j,s)} (y_i - c_2)^2\right]$$
where j denotes the j-th variable of each sample, the total number of sample variables is M (M = 6 in the invention), s denotes a split point of the j-th variable, $R_1(j, s)$ denotes the left region of the split, $R_2(j, s)$ denotes the right region of the split, and $c_1$ and $c_2$ denote the optimal output values of regions $R_1(j, s)$ and $R_2(j, s)$. Each feature j of the samples is traversed, the possible split points s of each feature are tried, the split with the smallest sum of squared errors is selected, and the optimal output values $c_1$ and $c_2$ are determined. After training, an air quality prediction regression tree is obtained.
Substituting the current monitored values of the air quality factors into the air quality prediction regression tree gives the air quality predicted value $P_{DT}$.
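The split search of step 2.2 can be sketched as a single function. This illustrates one level of a regression tree only; the function names, the use of midpoints as candidate split points, and the toy data are assumptions made for the example. The optimal output value of a region is taken to be the mean of its targets, which is what minimises the squared error.

```python
def sse(values):
    """Sum of squared errors around the mean, the optimal output value c of a region."""
    if not values:
        return 0.0
    c = sum(values) / len(values)
    return sum((v - c) ** 2 for v in values)

def best_split(xs, ys):
    """Return (j, s, error) minimising SSE(R1) + SSE(R2) over features j and split points s."""
    best = None
    for j in range(len(xs[0])):                      # traverse every feature j
        levels = sorted({x[j] for x in xs})
        # Candidate split points: midpoints between consecutive distinct values.
        for lo, hi in zip(levels, levels[1:]):
            s = (lo + hi) / 2.0
            left = [y for x, y in zip(xs, ys) if x[j] <= s]    # region R1(j, s)
            right = [y for x, y in zip(xs, ys) if x[j] > s]    # region R2(j, s)
            error = sse(left) + sse(right)
            if best is None or error < best[2]:
                best = (j, s, error)
    return best  # None when no feature has two distinct values

# Toy data (hypothetical): feature 0 separates low and high targets, feature 1 does not.
xs = [[1.0, 10.0], [2.0, 20.0], [3.0, 10.0], [4.0, 20.0]]
ys = [100.0, 100.0, 200.0, 200.0]
j, s, error = best_split(xs, ys)
```

A full regression tree would apply `best_split` recursively to each region until a stopping rule (depth or minimum sample count) is met.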
Step 2.3, calculating an air quality prediction value by adopting a Support Vector Machine (SVM):
When handling non-linear problems, the support vector machine maps the input data into a higher-dimensional space through a specific function. With the added dimensions, the problem of finding an optimal separating line for the samples in the low-dimensional space becomes the problem of finding an optimal separating hyperplane in the high-dimensional space. The SVM obtains a globally optimal solution and maintains calculation accuracy even when the number of samples is small.
Let the training sample set be $TDS = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, where $x_1, x_2, \ldots, x_N$ are air quality factor samples and $y_1, y_2, \ldots, y_N$ are the air quality values of the corresponding samples.
There exists a hyperplane
Figure BDA0002875232820000041
such that the samples can be correctly classified, where $X = \{x_1, x_2, \ldots, x_N\}$ is the set of N samples and
Figure BDA0002875232820000042
is the mapping of X to a high-dimensional feature space
Figure BDA0002875232820000043
Here $\omega$ is the normal vector, which determines the direction of the hyperplane, and b is the displacement term, which is used as an optimization variable. So that the samples can be classified accurately by the hyperplane, a slack variable $\varepsilon$ is introduced, and the objective function is
Figure BDA0002875232820000044
where $\varepsilon_i \ge 0$, $i = 1, \ldots, N$, are the slack variables. Each sample $x_i$ in the training sample set has a corresponding slack variable $\varepsilon_i$, which characterizes the extent to which the sample fails to satisfy the constraint
Figure BDA0002875232820000047
The kernel function replaces the inner product operation in the high-dimensional space, so the inner product can be obtained without knowing the specific form of
Figure BDA0002875232820000048
The kernel function can be a linear, polynomial, radial basis, Gaussian or sigmoid kernel function; the invention selects the Gaussian kernel function, i.e.
Figure BDA0002875232820000045
where $\sigma > 0$ is the Gaussian kernel bandwidth and $\alpha_i$ is the Lagrange multiplier. The value of b is calculated from the sample values, and all the b values obtained are averaged. After training, the support vector classification prediction function is finally obtained:
Figure BDA0002875232820000046
Substituting the current monitored values of the air quality factors into the air quality support vector classification prediction function gives the air quality predicted value $P_{SVM}$.
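To make the role of the kernel concrete, the sketch below evaluates a Gaussian kernel and a kernel expansion over support vectors. The exact formulas in the patent are images that did not survive extraction, so the forms used here are assumptions: the common parametrisation exp(-||x - z||^2 / (2 * sigma^2)) and a prediction of the form sum_i coef_i * K(x_i, x) + b. The coefficients would come from a trained model; the function names are illustrative.

```python
import math

def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma^2)), with bandwidth sigma > 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

def decision_value(x, support_vectors, coefficients, b, sigma=1.0):
    """Kernel expansion over the support vectors plus the displacement term b."""
    return sum(c * gaussian_kernel(sv, x, sigma)
               for sv, c in zip(support_vectors, coefficients)) + b
```

Because the kernel depends only on the distance between two samples, the high-dimensional mapping never has to be computed explicitly.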
Step 2.4, weighting the predicted values obtained by at least two algorithms using historical prediction errors, specifically:
Historical data is used as the input of the three algorithms to obtain their predicted values. For each algorithm, the difference between each predicted value and the corresponding historical air quality value is squared, and the squares are averaged to give the mean squared errors $E_{NN}$, $E_{DT}$, $E_{SVM}$. The weights $W_{NN}$, $W_{DT}$, $W_{SVM}$ of the three algorithms are then calculated:
Figure BDA0002875232820000051
The values
Figure BDA0002875232820000052
are normalized to obtain $W_{NN}$, $W_{DT}$, $W_{SVM}$, i.e.
Figure BDA0002875232820000053
Figure BDA0002875232820000054
The final air quality predicted value is calculated as: $P = W_{NN} \cdot P_{NN} + W_{DT} \cdot P_{DT} + W_{SVM} \cdot P_{SVM}$
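The weighting of step 2.4 can be sketched as follows. The patent's weight formulas are images that are not recoverable here, so this example assumes one common rule: each weight is proportional to the inverse of the algorithm's mean squared error and the weights are normalised to sum to 1. The historical values and predictions are hypothetical.

```python
def mean_squared_error(predictions, actual):
    return sum((p - a) ** 2 for p, a in zip(predictions, actual)) / len(actual)

def inverse_error_weights(errors):
    """Normalise 1/E so that the weights sum to 1 (assumed weighting rule)."""
    raw = {name: 1.0 / e for name, e in errors.items()}  # requires every E > 0
    total = sum(raw.values())
    return {name: r / total for name, r in raw.items()}

def combine(weights, predictions):
    """P = W_NN * P_NN + W_DT * P_DT + W_SVM * P_SVM."""
    return sum(weights[name] * predictions[name] for name in weights)

# Hypothetical historical AQI values and per-model predictions.
actual = [100.0, 150.0, 200.0]
history = {
    "NN": [102.0, 148.0, 203.0],
    "DT": [110.0, 140.0, 190.0],
    "SVM": [101.0, 151.0, 198.0],
}
errors = {name: mean_squared_error(p, actual) for name, p in history.items()}
weights = inverse_error_weights(errors)
final = combine(weights, {"NN": 180.0, "DT": 170.0, "SVM": 185.0})
```

Under this rule the algorithm with the smallest historical error contributes most to the final prediction, and the result always lies between the smallest and largest individual prediction.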
Step 3 specifically comprises the following:
According to the Technical Regulation on Ambient Air Quality Index (on trial) (HJ 633-2012), issued nationally in 2012, air quality is evaluated using the AQI, and the standard for grading air quality by AQI value is given below.
AQI index   Air quality grade   Air quality status
0~50        Grade 1             Excellent
51~100      Grade 2             Good
101~150     Grade 3             Light pollution
151~200     Grade 4             Moderate pollution
201~250     Grade 5             Severe pollution
251~300     Grade 6             Severe pollution
The air quality grade is determined from the air quality predicted value obtained in step 2, and an alarm is raised to indicate that the air quality factor exceeds the standard when the grade is light pollution, moderate pollution or severe pollution. The content and form of the alarm can be chosen as required.
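The grading and alarm rule of step 3 can be sketched as a lookup over the upper bound of each grade. The bounds come from the table above; treating values above 300 as the highest grade, handling fractional predictions by upper bound, and the wording of the alarm message are assumptions made for the example.

```python
# Upper AQI bound of each grade, as listed in the grading table.
GRADE_UPPER_BOUNDS = [(50, 1), (100, 2), (150, 3), (200, 4), (250, 5), (300, 6)]
ALARM_FROM_GRADE = 3  # light pollution and worse trigger the alarm

def aqi_grade(aqi):
    """Return the air quality grade (1 to 6) for a predicted AQI value."""
    if aqi < 0:
        raise ValueError("AQI must be non-negative")
    for upper, grade in GRADE_UPPER_BOUNDS:
        if aqi <= upper:
            return grade
    return 6  # assumption: values above 300 are treated as the highest grade

def check_alarm(aqi):
    """Return an alarm message when the grade is 3 or worse, otherwise None."""
    grade = aqi_grade(aqi)
    if grade >= ALARM_FROM_GRADE:
        return f"ALARM: predicted AQI {aqi:.0f} (grade {grade}) exceeds the standard"
    return None
```

For instance, the January 2013 AQI of 238 quoted earlier falls in grade 5 and would raise an alarm, while a value of 80 would not.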
Cloud computing provides mass data storage and strong computing capacity for real-time data analysis and mining. MapReduce is a distributed programming model in which the data to be processed is divided into multiple blocks, many computers in a network process the blocks simultaneously, and the results are then collected to reach a conclusion. MapReduce processes data in parallel in two steps, Map and Reduce: Map maps one set of Key/Value pairs to another set of Key/Value pairs, and Reduce is a reduction step that merges the Values sharing the same Key and finally outputs a series of Key/Value pairs as the result.
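The Key/Value flow described above can be illustrated with a single-process sketch. This is not the Hadoop API; it only mimics the Map, grouping and Reduce stages in plain Python, and the record layout (station, factor, value) is hypothetical.

```python
from collections import defaultdict

def map_reading(record):
    """Map: turn one raw record into a (key, value) pair keyed by factor."""
    station, factor, value = record
    return factor, value

def shuffle(pairs):
    """Group the values that share a key, as the framework does between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_mean(key, values):
    """Reduce: merge all values of one key into a single result."""
    return key, sum(values) / len(values)

# Hypothetical sensor records: (station, factor, sampled value).
records = [("s1", "PM2.5", 190.0), ("s2", "PM2.5", 196.0),
           ("s1", "PM10", 200.0), ("s2", "PM10", 212.0)]
pairs = [map_reading(r) for r in records]
result = dict(reduce_mean(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster the Map calls would run on different nodes and the grouping would involve network transfer, but the data contract between the stages is the same.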
The data to be processed can be divided into multiple splits, where one split corresponds to one data block in the file system. A single Map task reads one split, and multiple Map tasks run on the cluster in parallel. Map tasks preferentially read local data to minimize network transmission overhead.
The structure and principle of MapReduce are described below using Hadoop MapReduce as an example. The underlying file system of Hadoop is HDFS; Hadoop MapReduce reads data from HDFS and writes its results back to HDFS.
MapReduce adopts a Master-Slave architecture with one master node (Master) and multiple slave nodes (Slave). The JobTracker runs on the master node and initializes, distributes, coordinates and monitors jobs. A TaskTracker runs on each slave node, communicates with the JobTracker, and executes Map tasks and Reduce tasks. Communication and task allocation between the JobTracker and the TaskTrackers are handled through a heartbeat mechanism: each TaskTracker periodically sends a query to the JobTracker, and if a job needs to be executed, the JobTracker assigns it a task, which may be a Map task or a Reduce task. After a task is assigned, the TaskTracker stores the task code and configuration information locally and starts a JVM to execute the task. While running, the task reports its status to the TaskTracker, which sends the summarized information to the JobTracker. The JobTracker marks the job as successful once it confirms that the last task has finished running.
When MapReduce is applied to the three artificial intelligence algorithms used by the invention, the three algorithms, which execute in parallel, can each be deployed on a slave node, with the master node responsible for aggregating their results. Alternatively, depending on the actual computational load, each algorithm can itself be split up and distributed across slave nodes; for example, the neural network, decision tree and support vector machine algorithms are each split according to their own steps and deployed accordingly, and the master node aggregates the partial results after receiving them from the slave nodes.
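The first deployment option, one algorithm per slave node with the master aggregating, can be sketched with a thread pool standing in for the cluster. Everything here is illustrative: the three predictor functions are hypothetical stubs rather than trained models, the weights are made-up values, and threads only imitate the parallelism that Hadoop would provide across machines.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the three trained models.
def predict_nn(sample):
    return sample["PM2.5"] + 40.0

def predict_dt(sample):
    return 250.0

def predict_svm(sample):
    return 240.0

def worker(task):
    """Slave side: run one model and emit a (model name, prediction) pair."""
    name, model, sample = task
    return name, model(sample)

def master(sample, models, weights):
    """Master side: dispatch the models in parallel and merge their outputs."""
    tasks = [(name, model, sample) for name, model in models.items()]
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        results = dict(pool.map(worker, tasks))
    return sum(weights[name] * results[name] for name in results)

models = {"NN": predict_nn, "DT": predict_dt, "SVM": predict_svm}
weights = {"NN": 0.5, "DT": 0.25, "SVM": 0.25}   # hypothetical weights
prediction = master({"PM2.5": 193.0}, models, weights)
```

The second option in the text, splitting each algorithm into its own steps, would replace each stub with a further set of sub-tasks whose partial results the master also merges.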
In another aspect of the invention, an air quality monitoring and warning system based on cloud computing and machine learning is provided, which can implement the air quality monitoring and warning method based on cloud computing and machine learning.
In another aspect of the present invention, an air quality monitoring and warning device based on cloud computing and machine learning is provided, where the device includes a processor and a memory, and is capable of implementing the air quality monitoring and warning method based on cloud computing and machine learning.
In another aspect of the present invention, a storage medium is provided, on which a computer program is stored, wherein the computer program is capable of implementing the foregoing air quality monitoring and warning method based on cloud computing and machine learning.
Drawings
FIG. 1 is a diagram of the relationship between the sensors and the cloud computing platform.
FIG. 2 is a data flow diagram of the cloud computing platform.
FIG. 3 is a comparison graph of the air quality values obtained by the NN, DT and SVM algorithms and the actual AQI values.
FIG. 4 is a comparison graph of the air quality predicted value, obtained by combining the NN, DT and SVM algorithms according to their weights, and the actual AQI value.
Detailed Description
The present invention will be further described with reference to the following examples.
As shown in FIG. 1, the sampled values of the air quality factors obtained by the monitoring sensors are transmitted to a cloud computing platform. The platform comprises at least a base support layer, an algorithm layer and an application visualization layer. The base support layer uses the mature Hadoop platform; for example, the MapReduce framework distributes parallel tasks to the cloud servers. The algorithm layer uses the air quality prediction method based on machine learning algorithms, specifically the horizontal (parallel) combination of a neural network (NN), a decision tree (DT) and a support vector machine (SVM). The visualization layer displays the air quality predicted value and the determined air quality grade; for example, it raises an alarm indicating that the air quality factor exceeds the standard when the grade is light pollution, moderate pollution or severe pollution. The content and form of the alarm can be chosen as required.
As shown in FIG. 2, the cloud computing platform runs the NN, DT and SVM algorithms on the current sampled values, computes the prediction error of each algorithm on historical data, derives from these errors the weight of each algorithm in the horizontal combination, obtains the air quality predicted value of the NN-DT-SVM combination, compares the predicted value with the air quality standard, and decides whether to output an alarm.
Data set: the data set consists of air quality data for Kaifeng city from January 2013 to November 2020, as listed in Table 1 below. The first six items are the air quality factors PM2.5, PM10, SO2, CO, NO2 and O3; AQI is the air quality index; and the last item is the corresponding quality grade. The national Technical Regulation on Ambient Air Quality Index (on trial) replaces the former Air Pollution Index (API) with the Air Quality Index (AQI) and divides the AQI into six grades: grade 1 is excellent, grade 2 is good, grade 3 is light pollution, grade 4 is moderate pollution, grade 5 is severe pollution and grade 6 is severe pollution.
Figure BDA0002875232820000071
Figure BDA0002875232820000081
Figure BDA0002875232820000091
Figure BDA0002875232820000101
The program is written in C++ and runs on Windows 10. The data set is divided proportionally into a training set and a test set, and the three models are evaluated separately: the 35 air quality samples from January 2018 to November 2020 form the test set, and the remaining samples form the training set. The results are shown in the following table, and FIG. 3 compares the air quality values obtained by the NN, DT and SVM algorithms with the actual AQI values:
Figure BDA0002875232820000102
Figure BDA0002875232820000111
In the table, the AQI column is the actual AQI value of the sample; NN is the AQI value obtained by the neural network algorithm; DT is the AQI value obtained by the decision tree; SVM is the AQI value obtained by the support vector machine; E_NN is the square of the difference between the AQI value obtained by the neural network algorithm and the actual AQI value; E_DT is the square of the difference between the AQI value obtained by the decision tree algorithm and the actual AQI value; and E_SVM is the square of the difference between the AQI value obtained by the support vector machine algorithm and the actual AQI value.
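The chronological split used in this embodiment can be checked with a short sketch. It assumes one sample per calendar month with no gaps, which is consistent with the 35 test samples stated for January 2018 to November 2020; the function name is illustrative.

```python
def month_range(start, end):
    """List (year, month) pairs from start to end, both inclusive."""
    months = []
    year, month = start
    while (year, month) <= end:
        months.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months

all_months = month_range((2013, 1), (2020, 11))
# Later months are held out for testing, so the models never see future data in training.
test_months = [m for m in all_months if m >= (2018, 1)]
train_months = [m for m in all_months if m < (2018, 1)]
```

Under the one-sample-per-month assumption this gives 60 training samples (January 2013 to December 2017) and 35 test samples.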
According to the method provided by the invention, the mean squared errors of the NN, DT and SVM algorithms, $E_{NN}$, $E_{DT}$, $E_{SVM}$, are calculated. The weights $W_{NN}$, $W_{DT}$, $W_{SVM}$ of the three algorithms are then calculated:
Figure BDA0002875232820000112
The values
Figure BDA0002875232820000113
are normalized to obtain $W_{NN}$, $W_{DT}$, $W_{SVM}$, i.e.
Figure BDA0002875232820000114
Figure BDA0002875232820000115
Figure BDA0002875232820000116
The air quality predicted value is calculated as: $P = 0.424971 \cdot P_{NN} + 0.136312 \cdot P_{DT} + 0.438717 \cdot P_{SVM}$
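Applying the stated formula is a single weighted sum, sketched below. The three weights are the values given in the text; the model outputs passed in are hypothetical and serve only to show the calculation.

```python
# Weights as stated in the embodiment.
WEIGHTS = {"NN": 0.424971, "DT": 0.136312, "SVM": 0.438717}

def combined_prediction(p_nn, p_dt, p_svm):
    """P = W_NN * P_NN + W_DT * P_DT + W_SVM * P_SVM."""
    return WEIGHTS["NN"] * p_nn + WEIGHTS["DT"] * p_dt + WEIGHTS["SVM"] * p_svm

weight_sum = sum(WEIGHTS.values())
p = combined_prediction(120.0, 135.0, 118.0)   # hypothetical model outputs
```

The three weights sum to 1, so the combined value always lies between the lowest and highest of the three individual predictions.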
Figure 4 shows a comparison graph of the air quality predicted value and the actual AQI value obtained according to the weight of three algorithms after the NN algorithm, the DT algorithm and the SVM algorithm are combined.
While one embodiment of the present invention has been described in detail, the description is only a preferred embodiment of the present invention and should not be taken as limiting the scope of the invention. All equivalent changes and modifications made within the scope of the present invention shall fall within the scope of the present invention.

Claims (10)

1. An air quality monitoring and alarming method based on cloud computing and machine learning is characterized by comprising the following steps:
step 1: transmitting the sampling values of the air quality factors obtained by each monitoring sensor into the cloud platform;
step 2: the cloud platform calculates a predicted value of the air quality factor; specifically, the neural network NN, the decision tree DT and the support vector machine SVM algorithms each calculate a predicted value of the air quality, weights $W_{NN}$, $W_{DT}$, $W_{SVM}$ are assigned to the three algorithms based on their historical prediction performance, and a final air quality predicted value is calculated;
step 3: determining the air quality grade according to the air quality predicted value obtained in step 2, and giving an alarm to prompt that the air quality factor exceeds the standard when the air quality grade is light pollution, moderate pollution or severe pollution.
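By way of non-limiting illustration, step 3 can be sketched as a lookup from the predicted value to a grade, followed by an alarm decision. The claims do not state the grade boundaries; the breakpoints and grade names below are assumptions modeled on the six-level scheme of the Chinese AQI technical regulation HJ 633-2012, and the names `air_quality_grade` and `should_alarm` are hypothetical.

```python
# Assumed (upper bound, grade) pairs; these boundaries are not given in the claims.
AQI_GRADES = [
    (50, "excellent"),
    (100, "good"),
    (150, "light pollution"),
    (200, "moderate pollution"),
    (300, "heavy pollution"),
]


def air_quality_grade(aqi):
    """Map a predicted AQI value to a grade using the assumed breakpoints."""
    for upper_bound, grade in AQI_GRADES:
        if aqi <= upper_bound:
            return grade
    return "severe pollution"


def should_alarm(aqi):
    """Alarm for every grade worse than 'good' (an assumption of this sketch)."""
    return air_quality_grade(aqi) not in ("excellent", "good")


for value in (98.3, 120.0, 350.0):
    print(value, air_quality_grade(value), should_alarm(value))
```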
2. The method of claim 1, wherein the neural network NN is used to calculate the air quality prediction by:
firstly, initializing the network: determining the number of hidden layers and the number of neurons in each layer, initializing the connection weights between the neurons of each layer of the neural network, determining the input and the target output, and determining the number of nodes in the hidden layer
Figure FDA0002875232810000011
where $n_{in}$ is the number of input-layer nodes and $n_{out}$ is the number of output-layer nodes; then inputting training data: determining a training data set and inputting each group of data in the data set into the neural network; calculating the output of the network from the neural network and its weights; calculating the error between the output obtained by the neural network and the target output, and, if the error does not reach an acceptable threshold, propagating the error information backward to correct the connection weights in the network;
the training sample set is: $TDS = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$, where $x_1, x_2, \dots, x_N$ are air quality factor samples and $y_1, y_2, \dots, y_N$ are the air quality values of the corresponding air quality factor samples; the mean square error $E_i$ of the neural network on $(x_i, y_i)$ is expressed as
Figure FDA0002875232810000012
the error threshold is set to $\mu$; the weight update $\Delta\tau$ between the hidden layer and the output layer is:
$$\Delta\tau = -\theta \frac{\partial E_i}{\partial \tau}$$
wherein $\theta$ is the learning rate of the neural network, $0 < \theta < 1$, $\tau$ is the weight between the hidden layer and the output layer, and $\frac{\partial E_i}{\partial \tau}$ denotes the first derivative of the mean square error $E_i$ with respect to the weight $\tau$;
judging whether all data in the data set participate in the training process of the neural network; if so, outputting the weight among the nodes of each layer of the neural network, and finishing the training process; otherwise, continuing to execute the training process;
obtaining a neural network with an error within a given threshold range through the learning process; after determining the connection weight among each neuron node in the neural network, calculating according to new input by using the network and outputting a result;
substituting the current air quality factor monitoring value into the neural network model to obtain the air quality predicted value $P_{NN}$.
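By way of non-limiting illustration, the training procedure of claim 2 can be sketched as a small backpropagation loop in plain Python. Several details are assumptions of this sketch and are not taken from the claim: a single hidden layer with sigmoid units, a linear output, the per-sample error 0.5·(output − target)², the hidden-layer size, and the names `TinyNetwork`, `train` and `mean_error`. The sample data are made up.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyNetwork:
    """One hidden layer (sigmoid) and one linear output; an assumed architecture."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                         for _ in range(n_hidden)]
        self.b_hidden = [0.0] * n_hidden
        # tau: weights between the hidden layer and the output layer.
        self.tau = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b_out = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
                  for row, b in zip(self.w_hidden, self.b_hidden)]
        output = sum(t * h for t, h in zip(self.tau, hidden)) + self.b_out
        return hidden, output

    def train_step(self, x, y, theta):
        hidden, output = self.forward(x)
        err = output - y  # dE/d(output) for E = 0.5 * (output - y)^2
        # Hidden-layer gradients are computed before tau is changed.
        grad_hidden = [err * t * h * (1.0 - h) for t, h in zip(self.tau, hidden)]
        for k, h in enumerate(hidden):
            self.tau[k] -= theta * err * h  # delta_tau = -theta * dE/dtau
        self.b_out -= theta * err
        for k, row in enumerate(self.w_hidden):
            for j, v in enumerate(x):
                row[j] -= theta * grad_hidden[k] * v
            self.b_hidden[k] -= theta * grad_hidden[k]


def mean_error(net, samples):
    return sum(0.5 * (net.forward(x)[1] - y) ** 2 for x, y in samples) / len(samples)


def train(net, samples, theta=0.3, mu=1e-4, max_epochs=2000):
    """Repeat passes over the data until the mean error drops below mu."""
    for _ in range(max_epochs):
        for x, y in samples:
            net.train_step(x, y, theta)
        if mean_error(net, samples) < mu:
            break
    return mean_error(net, samples)


# Made-up, pre-scaled samples: (air quality factors, air quality value).
samples = [([0.1, 0.2], 0.11), ([0.4, 0.1], 0.23),
           ([0.8, 0.9], 0.67), ([0.5, 0.5], 0.40)]
net = TinyNetwork(n_in=2, n_hidden=3)
error_before = mean_error(net, samples)
error_after = train(net, samples)
print(error_before, error_after)
```

Inputs and targets are scaled to roughly [0, 1] here so that a learning rate between 0 and 1 stays stable; real monitoring values would need a comparable scaling step.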
3. The method according to claim 1, wherein the process of calculating the air quality prediction value using the decision tree DT is:
the training data set is TDS (Training Data Set), the number of samples is N, and the expression is $TDS = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$, where $x_1, x_2, \dots, x_N$ are samples of the air quality factors (for example, $x_1$ as given in the table) and $y_1, y_2, \dots, y_N$ are the sample values corresponding to the air quality factor samples.
The objective function of the decision tree when dividing the samples is to minimize the sum of squared errors, namely:
$$\min_{j,s}\left[\min_{c_1}\sum_{x_i \in R_1(j,s)}(y_i - c_1)^2 + \min_{c_2}\sum_{x_i \in R_2(j,s)}(y_i - c_2)^2\right]$$
where j denotes the j-th variable in each sample, the total number of sample variables is M, s denotes the dividing point of the j-th variable, $R_1(j, s)$ denotes the left region of the division, $R_2(j, s)$ denotes the right region of the division, and $c_1$ and $c_2$ denote the optimal output values of regions $R_1(j, s)$ and $R_2(j, s)$; traversing each feature j of the sample, trying the possible segmentation points s of each feature, selecting the division with the least sum of squared errors, and determining the optimal output values $c_1$ and $c_2$; an air quality prediction regression tree is obtained after training is finished;
substituting the current air quality factor monitoring value into the air quality prediction regression tree to obtain the air quality predicted value $P_{DT}$.
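By way of non-limiting illustration, the split search of claim 3 can be sketched as follows: every feature j and every candidate dividing point s is tried, and the pair with the smallest total squared error is kept. The sketch assumes that the optimal output values c1 and c2 are the means of the target values in each region (the usual least-squares choice); the function names and the sample data are hypothetical.

```python
def squared_error(values):
    """Sum of squared deviations from the mean; the mean is the best constant output."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(samples, targets):
    """Return (error, j, s, c1, c2) for the split minimizing the sum of squared errors."""
    best = None
    n_features = len(samples[0])
    for j in range(n_features):
        for s in sorted({row[j] for row in samples}):
            left = [y for row, y in zip(samples, targets) if row[j] <= s]
            right = [y for row, y in zip(samples, targets) if row[j] > s]
            if not left or not right:
                continue  # a split must leave samples on both sides
            error = squared_error(left) + squared_error(right)
            if best is None or error < best[0]:
                c1 = sum(left) / len(left)
                c2 = sum(right) / len(right)
                best = (error, j, s, c1, c2)
    return best


# Made-up data: two factors per sample, AQI as the target.
samples = [[1.0, 10.0], [2.0, 10.0], [8.0, 10.0], [9.0, 10.0]]
targets = [50.0, 52.0, 150.0, 152.0]
error, j, s, c1, c2 = best_split(samples, targets)
print(error, j, s, c1, c2)
```

A full regression tree would apply `best_split` recursively to each region until a stopping rule is met; only the single-split step is shown here.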
4. The method of claim 1, wherein the process of calculating the air quality prediction value using a Support Vector Machine (SVM) is:
the training sample set is: $TDS = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$, where $x_1, x_2, \dots, x_N$ are air quality factor samples and $y_1, y_2, \dots, y_N$ are the air quality values corresponding to the air quality factor samples;
there exists a hyperplane
Figure FDA0002875232810000022
such that the samples can be correctly classified, where $X = \{x_1, x_2, \dots, x_N\}$ is the set of N samples and
Figure FDA0002875232810000023
maps X to a high-dimensional feature space
Figure FDA0002875232810000024
$\omega$ is the normal vector, which determines the direction of the hyperplane, and b is the displacement term, which serves as an optimization variable; so that the samples can be accurately classified by the hyperplane, slack variables $\varepsilon$ are introduced, and the objective function is
Figure FDA0002875232810000025
where $\varepsilon_i \ge 0$, $i = 1, \dots, N$, are the slack variables; each sample $x_i$ in the training sample set corresponds to a slack variable $\varepsilon_i$, which characterizes the degree to which the sample fails to satisfy the constraint
Figure FDA0002875232810000026
a Gaussian kernel function replaces the inner product operation in the high-dimensional space,
Figure FDA0002875232810000027
where $\sigma$ is the Gaussian kernel bandwidth and $\alpha_i$ is a Lagrange multiplier; b is calculated from the sample values and all obtained values of b are averaged, and after training is finished the support vector classification prediction function is finally obtained
Figure FDA0002875232810000031
substituting the current air quality factor monitoring value into the air quality support vector classification prediction function to obtain the air quality predicted value $P_{SVM}$.
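By way of non-limiting illustration, the kernel evaluation of claim 4 can be sketched as follows. The formula images of the claim are not reproduced in this text, so the sketch assumes the common Gaussian kernel exp(−‖a − b‖² / (2σ²)) and the usual kernel decision value Σ αᵢ·yᵢ·K(xᵢ, x) + b; the exact forms in the patent may differ. The function names, support vectors, labels and multipliers are hypothetical.

```python
import math


def gaussian_kernel(a, b, sigma):
    """Assumed Gaussian kernel with bandwidth sigma."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def decision_value(x, support_vectors, labels, alphas, b, sigma):
    """Assumed kernel decision value: sum of alpha_i * y_i * K(x_i, x), plus b."""
    return sum(alpha * y * gaussian_kernel(sv, x, sigma)
               for sv, y, alpha in zip(support_vectors, labels, alphas)) + b


# Made-up trained parameters for a two-sample toy problem.
support_vectors = [[0.0, 0.0], [2.0, 0.0]]
labels = [1, -1]
alphas = [1.0, 1.0]

near_first = decision_value([0.0, 0.0], support_vectors, labels, alphas, 0.0, 1.0)
near_second = decision_value([2.0, 0.0], support_vectors, labels, alphas, 0.0, 1.0)
print(near_first, near_second)
```

The sketch covers only prediction from already trained multipliers; solving for the multipliers and averaging b over the samples is left to a dedicated solver.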
5. The method of claim 1, wherein the neural network NN algorithm, the decision tree DT algorithm and the support vector machine SVM algorithm are executed in parallel.
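By way of non-limiting illustration, the parallel execution of claim 5 can be sketched with the Python standard library. The three predictor functions are hypothetical stand-ins that return fixed values; in practice each would wrap a trained model, and the claim does not prescribe threads over processes or separate cloud services.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(predictors, sample):
    """Submit every predictor for the same sample and collect results by name."""
    with ThreadPoolExecutor(max_workers=len(predictors)) as pool:
        futures = {name: pool.submit(fn, sample) for name, fn in predictors.items()}
        return {name: future.result() for name, future in futures.items()}


# Hypothetical stand-ins for the three trained models.
predictors = {
    "NN": lambda sample: 100.0,
    "DT": lambda sample: 120.0,
    "SVM": lambda sample: 90.0,
}
results = run_in_parallel(predictors, sample=[35.0, 60.0, 8.0, 0.7, 30.0, 90.0])
print(results)
```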
6. The method according to claim 1, wherein the three algorithms are weighted based on historical prediction, specifically:
taking historical data as the input of the three algorithms to obtain their predicted values, taking the difference between each predicted value and the corresponding historical air quality value, squaring it, and averaging to obtain the mean squared errors $E_{NN}$, $E_{DT}$, $E_{SVM}$; then calculating the weights $W_{NN}$, $W_{DT}$, $W_{SVM}$ of the three algorithms:
Figure FDA0002875232810000032
normalizing
Figure FDA0002875232810000033
yields $W_{NN}$, $W_{DT}$, $W_{SVM}$, i.e.
Figure FDA0002875232810000034
Figure FDA0002875232810000035
the final air quality prediction value calculation formula is as follows: $P = W_{NN} \cdot P_{NN} + W_{DT} \cdot P_{DT} + W_{SVM} \cdot P_{SVM}$.
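By way of non-limiting illustration, the weighting of claim 6 can be sketched as follows. The weight formulas are contained in formula images that are not reproduced in this text, so the sketch assumes one plausible scheme: each weight is proportional to the reciprocal of the algorithm's mean squared error, normalized so that the weights sum to 1. The patented formula may differ. The function names and the historical data are hypothetical.

```python
def mean_squared_error(predicted, actual):
    """Average of the squared differences between predictions and historical values."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def inverse_error_weights(errors):
    """Assumed scheme: weight proportional to 1/E, normalized to sum to 1.

    Requires every error to be strictly positive.
    """
    inverse = {name: 1.0 / err for name, err in errors.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


# Made-up historical AQI values and per-algorithm predictions.
actual = [100.0, 80.0, 120.0]
history = {
    "NN": [101.0, 79.0, 121.0],
    "DT": [102.0, 78.0, 122.0],
    "SVM": [101.0, 81.0, 119.0],
}
errors = {name: mean_squared_error(pred, actual) for name, pred in history.items()}
weights = inverse_error_weights(errors)
print(errors, weights)
```

Under this assumption an algorithm with a larger historical error receives a smaller weight, which matches the embodiment's pattern of one algorithm weighted well below the other two.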
7. The method of claim 1, wherein the air quality factors are: PM2.5, PM10, SO2, CO, NO2 and O3.
8. An air quality monitoring and warning system based on cloud computing and machine learning, which is capable of implementing the method of any one of the preceding claims 1-7.
9. An air quality monitoring and warning device based on cloud computing and machine learning, the device comprising a processor, a memory, which is capable of implementing the method of any of the preceding claims 1-7.
10. A storage medium having stored thereon a computer program enabling the method of any of the preceding claims 1-7.
CN202011627381.8A 2020-12-30 2020-12-30 Air quality monitoring and alarming method, system, device and medium based on cloud computing and machine learning algorithm Active CN112711912B (en)

Publications (2)

Publication Number Publication Date
CN112711912A true CN112711912A (en) 2021-04-27
CN112711912B CN112711912B (en) 2024-03-19

Family

ID=75547665

Country Status (1)

Country Link
CN (1) CN112711912B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114022333A (en) * 2022-01-10 2022-02-08 北京英视睿达科技股份有限公司 Method and system for estimating atmospheric pollutant emission based on economic big data
CN117129036A (en) * 2023-08-28 2023-11-28 瀚能科技有限公司 Cloud environment monitoring method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant