CN112711912A - Air quality monitoring and alarming method, system, device and medium based on cloud computing and machine learning algorithm

Info

Publication number: CN112711912A
Application number: CN202011627381.8A
Authority: CN
Inventors: 黄海; 吴霖瑞; 谢昊岩; 吴岁纯; 罗莉芳
Original assignee: Xuchang University
Current assignee: Xuchang University
Priority date: 2020-12-30
Filing date: 2020-12-30
Publication date: 2021-04-27
Anticipated expiration: 2040-12-30
Also published as: CN112711912B

Abstract

The invention provides an air quality monitoring and alarming method, system, device and medium based on cloud computing and machine learning. Using the computing power of cloud computing and the MapReduce framework, it combines three machine learning algorithms horizontally, that is, in parallel: a neural network (NN), a decision tree (DT) and a support vector machine (SVM). Each algorithm is assigned a weight according to its historical prediction error, and the final air quality prediction is then calculated. This exploits the respective strengths of the three algorithms, predicts air quality accurately, and helps people plan their travel and helps the relevant departments control atmospheric pollution.

Description

Air quality monitoring and alarming method, system, device and medium based on cloud computing and machine learning algorithm

Technical Field

The invention relates to air quality monitoring and alarming in a cloud environment, in particular to artificial intelligence-based air quality monitoring and alarming in cloud computing.

Background

With the development of industry, air pollution is becoming increasingly serious and has a severe impact on people's life and work. For example, air pollution can cause respiratory diseases such as asthma and cough, and it can worsen the condition of people with underlying diseases, even to the point of threatening their lives. Monitoring air quality and raising alarms allows people to take precautions in advance and assists environmental management departments in decision-making and management.

Machine learning is currently developing rapidly, and intelligent algorithms are being applied in many industries. For atmospheric pollution early warning, a number of artificial intelligence algorithms have been used, such as the Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), Neural Network (NN), Particle Swarm Optimization (PSO) and Artificial Fish Swarm Algorithm (AFSA). However, these algorithms are either used alone or combined longitudinally (in series) to analyze pollutant concentrations, so the advantage of running them in parallel is not fully exploited.

Meanwhile, as the air monitoring system in China continues to grow, air quality data are expanding rapidly, and the massive volume of monitoring data for air quality factors makes accurate, real-time analysis very challenging. Cloud computing provides mass data storage as well as strong computing capacity to support real-time data analysis and mining. MapReduce is a distributed programming model in which the data to be processed are divided into multiple blocks, a large number of computers in the network compute on them simultaneously, and the results are then aggregated to reach a conclusion.

Disclosure of Invention

The invention provides an air quality monitoring and alarming method based on cloud computing and machine learning, which specifically comprises the following steps:

Step 1: transmitting the sampled values of the air quality factors obtained by each monitoring sensor to the cloud platform;

Step 2: calculating, on the cloud platform, a predicted value of the air quality; specifically, the neural network NN, the decision tree DT and the support vector machine SVM each calculate a predicted air quality value, the three algorithms are assigned the weights W_NN, W_DT and W_SVM based on their historical prediction performance, and the final air quality predicted value is calculated;

Step 3: determining the air quality grade from the air quality predicted value obtained in step 2, and giving an alarm to indicate that the air quality factor exceeds the standard when the air quality grade is light pollution, moderate pollution or severe pollution.

The specific calculation process in step 2 is as follows:

Step 2.1, calculating an air quality predicted value using the neural network (NN) algorithm:

The neural network is a multi-layer network, typically consisting of an input layer, one or more hidden layers, and an output layer. Neurons within the same layer are not connected to each other; connections exist only between neurons of adjacent layers. A Sigmoid function is generally used as the connection (activation) function. It maps input values of arbitrary range into the interval (0, 1) and is therefore also called a squashing function:

$$f(x) = \frac{1}{1 + e^{-x}}$$
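As a minimal sketch, the standard Sigmoid function f(x) = 1 / (1 + e^(-x)) can be written in Python as follows. The function name and the sign-based branching (added only for numerical stability) are choices made for this illustration and are not taken from the source.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function: maps any real input towards (0, 1)."""
    # Branch on the sign so that math.exp never receives a large positive
    # argument, which would raise OverflowError.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


if __name__ == "__main__":
    for value in (-5.0, 0.0, 5.0):
        print(value, sigmoid(value))
```

The function is symmetric about the point (0, 0.5), so sigmoid(x) + sigmoid(-x) equals 1 up to rounding.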

First, the network is initialized: the number of hidden layers and the number of neurons in each layer are determined, the connection weights between the neurons of each layer are initialized, and the input and target output are determined. The invention determines the number of hidden-layer nodes empirically from n_in and n_out, where n_in is the number of input-layer nodes and n_out is the number of output-layer nodes. The invention uses six air quality factors: PM2.5, PM10, SO₂, CO, NO₂ and O₃, so the input layer of the model has 6 nodes, i.e., n_in = 6. The number of hidden layers is 1, i.e., a three-layer neural network is used to predict the air quality.

Then the training data are input: a training data set is determined, and each group of data in the set is fed into the neural network. The output of the network is calculated from the network structure and the weights. The error between the network output and the target output is then calculated; if the error has not reached an acceptable threshold, the error information is propagated backward to correct the connection weights in the network.

Let the training sample set be TDS = {(x₁, y₁), (x₂, y₂), …, (x_N, y_N)}, where x₁, x₂, …, x_N are air quality factor samples and y₁, y₂, …, y_N are the air quality values of the corresponding samples.

For example, x₁ is the set of air quality factor values for the city of Kaifeng in January 2013, as shown in the following table:

Month	PM2.5	PM10	SO2	CO	NO2	O3
Jan-13	193	206	38	5.594	21	0

y₁ is the air quality value corresponding to x₁, also called the AQI value:

Month	AQI
Jan-13	238

x₂ is the set of air quality factor values for the city of Kaifeng in February 2013, as shown in the following table:

Month	PM2.5	PM10	SO2	CO	NO2	O3
Feb-13	145	145	26	4.557	15	0

y₂ is the air quality value corresponding to x₂, also called the AQI value:

Month	AQI
Feb-13	188
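The two example samples above can be held in a small data structure, sketched here in Python. The class name `Sample`, the field names and the "YYYY-MM" month format are hypothetical choices for illustration; only the numeric values come from the tables.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Order of the six air quality factors inside each feature vector x_i.
FACTORS = ("PM2.5", "PM10", "SO2", "CO", "NO2", "O3")


@dataclass(frozen=True)
class Sample:
    month: str                    # e.g. "2013-01"
    factors: Tuple[float, ...]    # x_i, in the order given by FACTORS
    aqi: float                    # y_i, the corresponding AQI value


# The two monthly samples listed in the tables (January and February 2013).
TDS: List[Sample] = [
    Sample("2013-01", (193, 206, 38, 5.594, 21, 0), 238),
    Sample("2013-02", (145, 145, 26, 4.557, 15, 0), 188),
]

# Split into the feature vectors X and the targets y used by the models.
X = [sample.factors for sample in TDS]
y = [sample.aqi for sample in TDS]

if __name__ == "__main__":
    for features, target in zip(X, y):
        print(dict(zip(FACTORS, features)), "->", target)
```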

The mean square error of the neural network on sample (x_i, y_i) is denoted E_i, and the error threshold is set to μ. The weight update Δτ between the hidden layer and the output layer is:

$$\Delta\tau = -\theta\,\frac{\partial E_i}{\partial \tau}$$

where θ is the learning rate of the neural network, 0 < θ < 1, τ is the weight between the hidden layer and the output layer, and ∂E_i/∂τ is the first derivative of the mean square error E_i with respect to the weight τ.

It is then judged whether all the data in the data set have taken part in the training of the neural network. If so, the weights between the nodes of each layer of the neural network are output and the training process ends; otherwise, the training process continues.

Through this learning process, a neural network whose error lies within the given threshold is obtained. Once the connection weights between the neuron nodes have been determined, the network can compute and output results for new inputs.

The current monitored values of the air quality factors are substituted into the neural network model to obtain the air quality predicted value P_NN.
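The training procedure of step 2.1 can be sketched as a small three-layer network trained by back-propagation. This is an illustration under stated assumptions, not the patented implementation: the hidden-layer size (4), the linear output node, the form E_i = 0.5 * (output - y_i)^2, the per-sample updates, the stopping rule and the tiny synthetic data set are all assumptions, and inputs are assumed to be scaled to roughly [0, 1].

```python
import math
import random


def sigmoid(z):
    """Logistic function; inputs stay small here, so no overflow guard is added."""
    return 1.0 / (1.0 + math.exp(-z))


def init_model(n_in, n_hidden, seed=0):
    """Random small weights: V (input->hidden), c (hidden bias), tau (hidden->output), b."""
    rng = random.Random(seed)
    V = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    c = [0.0] * n_hidden
    tau = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return [V, c, tau, 0.0]


def forward(model, x):
    """Return (hidden activations, network output) for one input vector."""
    V, c, tau, b = model
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + ck)
              for row, ck in zip(V, c)]
    output = sum(t * h for t, h in zip(tau, hidden)) + b   # linear output (assumption)
    return hidden, output


def train(samples, n_hidden=4, theta=0.1, mu=1e-4, max_epochs=5000, seed=0):
    """Back-propagation with learning rate theta and error threshold mu."""
    model = init_model(len(samples[0][0]), n_hidden, seed)
    V, c, tau, _ = model
    for _ in range(max_epochs):
        total_error = 0.0
        for x, y in samples:
            hidden, output = forward(model, x)
            err = output - y
            total_error += 0.5 * err * err               # E_i (assumed form)
            for k, h in enumerate(hidden):
                delta_h = err * tau[k] * h * (1.0 - h)   # uses tau[k] before its update
                tau[k] -= theta * err * h                # delta_tau = -theta * dE_i/dtau
                for j, xj in enumerate(x):
                    V[k][j] -= theta * delta_h * xj
                c[k] -= theta * delta_h
            model[3] -= theta * err                      # output bias
        if total_error / len(samples) < mu:              # error within threshold
            break
    return model


def mse(model, samples):
    return sum((forward(model, x)[1] - y) ** 2 for x, y in samples) / len(samples)


# Synthetic toy data (NOT from the source): two scaled inputs, target = their mean.
samples = [((0.0, 0.0), 0.0), ((0.0, 1.0), 0.5), ((1.0, 0.0), 0.5), ((1.0, 1.0), 1.0)]
untrained = init_model(2, 4)
trained = train(samples)

if __name__ == "__main__":
    print("MSE before training:", mse(untrained, samples))
    print("MSE after training: ", mse(trained, samples))
```

For the air quality task the input vector would have six entries (n_in = 6), and the AQI target would normally be rescaled before training and mapped back afterwards.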

Step 2.2, calculating an air quality predicted value by a Decision Tree (DT) algorithm:

A decision tree (DT) can reveal significant information hidden in data and expresses it in an intuitive form that users can easily understand, so it is widely used in data mining and prediction.

The training data set (TDS) contains N samples and is expressed as TDS = {(x₁, y₁), (x₂, y₂), …, (x_N, y_N)}, where x₁, x₂, …, x_N are air quality factor samples (for example, x₁ is given in the table above) and y₁, y₂, …, y_N are the corresponding air quality values.

When dividing the samples, the objective of the decision tree is to minimize the sum of squared errors, namely:

$$\min_{j,\,s}\left[\min_{c_1}\sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2}\sum_{x_i \in R_2(j,s)} (y_i - c_2)^2\right]$$

where j denotes the j-th variable of each sample, and the total number of sample variables is M (in the invention, M = 6); s denotes a split point of the j-th variable; R₁(j, s) denotes the left region of the split and R₂(j, s) the right region; and c₁ and c₂ are the optimal output values of regions R₁(j, s) and R₂(j, s). Each feature j of the samples is traversed and the possible split points s of each feature are tried; the split with the smallest sum of squared errors is selected, and the optimal output values c₁ and c₂ are determined. When training is finished, an air quality prediction regression tree is obtained.

The current monitored values of the air quality factors are substituted into the air quality prediction regression tree to obtain the air quality predicted value P_DT.
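The split search of step 2.2 can be sketched for a single tree node as follows. This is an illustration, not the source's implementation: the use of midpoints between consecutive distinct feature values as candidate split points is an assumption, and the toy data are synthetic. The optimal output value of a region under squared error is the mean of its targets.

```python
def best_split(X, y):
    """Search every feature j and candidate split point s for the least squared error.

    Returns (j, s, c1, c2, sse), or None if no feature has two distinct values.
    """
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate split points: midpoints between consecutive distinct values
        # (an assumption made for this sketch).
        for low, high in zip(values, values[1:]):
            s = (low + high) / 2.0
            left = [yi for row, yi in zip(X, y) if row[j] <= s]    # region R1(j, s)
            right = [yi for row, yi in zip(X, y) if row[j] > s]    # region R2(j, s)
            c1 = sum(left) / len(left)       # mean minimises the squared error in R1
            c2 = sum(right) / len(right)     # mean minimises the squared error in R2
            sse = (sum((yi - c1) ** 2 for yi in left)
                   + sum((yi - c2) ** 2 for yi in right))
            if best is None or sse < best[4]:
                best = (j, s, c1, c2, sse)
    return best


# Synthetic toy data (NOT from the source): feature 0 separates the targets.
X_toy = [[1, 10], [2, 10], [3, 10], [4, 10]]
y_toy = [1, 1, 5, 5]

if __name__ == "__main__":
    print(best_split(X_toy, y_toy))
```

A full regression tree would apply this search recursively to the left and right regions until a stopping condition (for example, a minimum number of samples per leaf) is met.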

Step 2.3, calculating an air quality prediction value by adopting a Support Vector Machine (SVM):

When dealing with a non-linear problem, the support vector machine maps the input data into a higher-dimensional space through a specific function. With the increase in dimensionality, the problem of finding an optimal separating line for the samples in the low-dimensional space becomes the problem of finding an optimal separating hyperplane in the high-dimensional space. The SVM obtains a globally optimal solution and maintains calculation accuracy even when the number of samples is small.

Suppose there exists a hyperplane

$$\omega^{T}\varphi(x) + b = 0$$

that classifies the samples correctly, where X = {x₁, x₂, …, x_N} is the set of N samples and φ(X) maps X into a high-dimensional feature space. ω is the normal vector, which determines the direction of the hyperplane, and b is the displacement term, which is treated as an optimization variable. So that the samples can be classified accurately by the hyperplane, slack variables ε are introduced into the objective function, with ε_i ≥ 0, i = 1, …, N. Each sample x_i in the training sample set corresponds to one slack variable ε_i, which characterizes the degree to which the sample fails to satisfy the constraint.

A kernel function can replace the inner-product operation in the high-dimensional space: the inner product can be obtained from the kernel function without knowing the specific form of the mapping into that space. The kernel function may be a linear kernel, a polynomial kernel, a radial basis kernel, a Gaussian kernel or a sigmoid kernel; the invention selects the Gaussian kernel, i.e.

$$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}}\right)$$

where σ > 0 is the bandwidth of the Gaussian kernel, and α_i is the Lagrange multiplier. The displacement term b is calculated from the sample values and all the b values obtained are averaged; when training is finished, the support vector classification prediction function is obtained.

The current monitored values of the air quality factors are substituted into the air quality support vector classification prediction function to obtain the air quality predicted value P_SVM.
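The Gaussian kernel used in step 2.3 can be sketched as below, in its common form K(x, z) = exp(-||x - z||^2 / (2σ^2)). The function names are hypothetical, and the bandwidth value 50.0 in the usage example is an arbitrary illustrative choice, not a value from the source. The two vectors are the January and February 2013 samples listed earlier.

```python
import math


def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian kernel K(x, z) = exp(-||x - z||^2 / (2 * sigma^2)), with sigma > 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def gram_matrix(samples, sigma=1.0):
    """Kernel value for every pair of samples; this replaces explicit inner products."""
    return [[gaussian_kernel(a, b, sigma) for b in samples] for a in samples]


# Factor vectors of the January and February 2013 samples.
x1 = (193, 206, 38, 5.594, 21, 0)
x2 = (145, 145, 26, 4.557, 15, 0)
K = gram_matrix([x1, x2], sigma=50.0)   # sigma = 50.0 is an arbitrary example value

if __name__ == "__main__":
    for row in K:
        print(row)
```

The kernel of a sample with itself is always 1, and the matrix is symmetric; a larger σ makes distant samples look more similar.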

Step 2.4, weighting the predicted values obtained by at least two algorithms by using historical prediction errors, specifically:

Historical data are used as the input of the three algorithms to obtain their predicted values. The difference between each predicted value and the corresponding historical air quality value is calculated and squared, and the squares are averaged to obtain the mean squared errors E_NN, E_DT and E_SVM. The weights of the three algorithms are then calculated from these errors and normalized to obtain W_NN, W_DT and W_SVM.

The final air quality predicted value is calculated as:

$$P = W_{NN}\cdot P_{NN} + W_{DT}\cdot P_{DT} + W_{SVM}\cdot P_{SVM}$$
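One possible way to turn historical errors into weights is sketched below. The exact weighting formula is not recoverable from this text, so the inverse-error rule used here (each weight proportional to 1 / E, then normalized to sum to 1) is an ASSUMPTION chosen only because it gives smaller errors larger weights. All function names and the numeric errors and predictions in the example are hypothetical.

```python
def mean_squared_error(predicted, actual):
    """Average of the squared differences between predictions and true values."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def inverse_error_weights(errors):
    """ASSUMED rule: weight proportional to 1 / E_k, normalised so the weights sum to 1.

    errors maps an algorithm name to its mean squared error (must be > 0).
    """
    inverse = {name: 1.0 / error for name, error in errors.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def combine(weights, predictions):
    """P = W_NN * P_NN + W_DT * P_DT + W_SVM * P_SVM."""
    return sum(weights[name] * predictions[name] for name in weights)


# Hypothetical error values, for illustration only.
errors = {"NN": 100.0, "DT": 400.0, "SVM": 100.0}
weights = inverse_error_weights(errors)

if __name__ == "__main__":
    print(weights)
    print(combine(weights, {"NN": 200.0, "DT": 180.0, "SVM": 210.0}))
```

Because the weights sum to 1, the combined value always lies between the smallest and largest of the three individual predictions.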

The step 3 specifically comprises the following steps:

According to the Technical Regulation on Ambient Air Quality Index (on trial) (HJ633-2012), issued by the state in 2012, air quality is evaluated using the AQI, and the regulation gives the standard for grading air quality by AQI value.

AQI index	Air quality level	Air quality status
0–50	Level 1	Excellent
51–100	Level 2	Good
101–150	Level 3	Light pollution
151–200	Level 4	Moderate pollution
201–250	Level 5	Heavy pollution
251–300	Level 6	Severe pollution

The air quality grade is determined from the air quality predicted value obtained in step 2, and an alarm is given to indicate that the air quality factor exceeds the standard when the air quality grade is light pollution, moderate pollution or severe pollution. The content and form of the alarm can be chosen as required.
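Step 3 can be sketched as a lookup against the grading table. The thresholds follow the table above; the English status labels, the treatment of values above 300 as level 6, and the rule of alarming at every polluted level (level 3 and above) are assumptions made for this illustration.

```python
# (upper bound of the AQI range, level, status label) taken from the grading table.
AQI_LEVELS = [
    (50, 1, "Excellent"),
    (100, 2, "Good"),
    (150, 3, "Light pollution"),
    (200, 4, "Moderate pollution"),
    (250, 5, "Heavy pollution"),
    (300, 6, "Severe pollution"),
]


def classify(aqi):
    """Return (level, status) for a predicted AQI value."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    for upper, level, status in AQI_LEVELS:
        # A fractional prediction such as 50.4 falls into the next range up.
        if aqi <= upper:
            return level, status
    # ASSUMPTION: values above the last table row are treated as level 6.
    return 6, "Severe pollution"


def should_alarm(aqi):
    """ASSUMPTION: alarm for every polluted level, i.e. level 3 and above."""
    level, _ = classify(aqi)
    return level >= 3


if __name__ == "__main__":
    for value in (42, 100, 101, 188, 238):
        print(value, classify(value), "ALARM" if should_alarm(value) else "ok")
```

The values 238 and 188 in the usage loop are the AQI values of the January and February 2013 samples.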

Cloud computing provides mass data storage as well as strong computing capacity to support real-time data analysis and mining. MapReduce is a distributed programming model: the data to be processed are divided into multiple blocks, a large number of computers in the network compute on them simultaneously, and the results are then aggregated to reach a conclusion. MapReduce processes data in parallel in two steps, Map and Reduce. Map maps one set of key/value pairs to another set of key/value pairs. Reduce is a reduction step that merges the values sharing the same key and finally outputs a series of key/value pairs as the result.
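The Map and Reduce steps can be illustrated with a single-process Python sketch. This is not the Hadoop API: the function names are hypothetical and everything runs in one process, purely to show how key/value pairs are mapped, grouped by key and reduced. The readings are the PM2.5 and PM10 values of the January and February 2013 samples listed earlier.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run mapper on every record, group the emitted values by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):      # Map: record -> (key, value) pairs
            groups[key].append(value)          # group values that share a key
    return {key: reducer(values) for key, values in groups.items()}  # Reduce


def factor_mapper(record):
    """Emit (factor name, reading) so that readings are grouped per factor."""
    month, factor, reading = record
    yield factor, reading


def mean_reducer(values):
    """Merge all readings of one factor into their average."""
    return sum(values) / len(values)


records = [
    ("2013-01", "PM2.5", 193), ("2013-01", "PM10", 206),
    ("2013-02", "PM2.5", 145), ("2013-02", "PM10", 145),
]
averages = map_reduce(records, factor_mapper, mean_reducer)

if __name__ == "__main__":
    print(averages)
```

In a real cluster the map calls run on different machines and the grouping step moves data across the network; the logic per key stays the same.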

The data to be processed can be divided into multiple splits, each corresponding to one data block in the file system. A single Map task reads one split, and multiple Map tasks run on the cluster in parallel. Map tasks preferentially read local data so as to reduce network transmission overhead as far as possible.

The structure and principle of MapReduce are described below, taking Hadoop MapReduce as an example. The underlying file system of Hadoop is HDFS; Hadoop MapReduce reads data from HDFS and writes the results of the computation back to HDFS.

MapReduce adopts a master-slave architecture comprising one master node and several slave nodes. The JobTracker runs on the master node and initializes, allocates, coordinates and monitors jobs. A TaskTracker runs on each slave node, communicates with the JobTracker, and executes Map tasks and Reduce tasks. Communication and task allocation between the JobTracker and the TaskTrackers are carried out through a heartbeat mechanism: each TaskTracker periodically sends an inquiry to the JobTracker, and if a job needs to be executed, the JobTracker assigns it a task, which may be a Map task or a Reduce task. After a task is assigned, the TaskTracker stores the task code and configuration information locally and starts a JVM to execute the task. While running, the task reports its status to the TaskTracker, and the TaskTracker sends the summarized information to the JobTracker. The job is marked as successful after the JobTracker confirms that the last task has finished running.
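The heartbeat-driven task allocation can be mimicked with a toy, single-process simulation. The class and method names below only mirror the roles described in the text; they are hypothetical and are not the Hadoop API. The sketch also ignores details such as running Reduce tasks only after all Map tasks, JVM start-up and failure handling.

```python
from collections import deque


class JobTracker:
    """Master-side bookkeeping: hands out tasks and collects reported results."""

    def __init__(self, tasks):
        self.pending = deque(tasks)          # items are (task_id, callable)
        self.total = len(self.pending)
        self.results = {}

    def heartbeat(self, tracker_name, finished=None):
        """Handle one inquiry: record a finished task, then assign the next one."""
        if finished is not None:
            task_id, value = finished
            self.results[task_id] = (tracker_name, value)
        return self.pending.popleft() if self.pending else None

    def job_succeeded(self):
        """The job is successful once every task has been reported as finished."""
        return len(self.results) == self.total


class TaskTracker:
    """Slave-side worker: asks for work on each heartbeat and runs it locally."""

    def __init__(self, name, job_tracker):
        self.name = name
        self.job_tracker = job_tracker
        self.last_finished = None

    def beat(self):
        """One heartbeat: report the previous result, receive and run a new task."""
        task = self.job_tracker.heartbeat(self.name, self.last_finished)
        self.last_finished = None
        if task is not None:
            task_id, work = task
            self.last_finished = (task_id, work())


tasks = [("map-0", lambda: "m0"), ("map-1", lambda: "m1"), ("reduce-0", lambda: "r0")]
job_tracker = JobTracker(tasks)
trackers = [TaskTracker("slave-1", job_tracker), TaskTracker("slave-2", job_tracker)]
while not job_tracker.job_succeeded():
    for tracker in trackers:
        tracker.beat()

if __name__ == "__main__":
    print(job_tracker.results)
```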

When MapReduce is applied to the three artificial intelligence algorithms adopted by the invention, the three algorithms, which are executed in parallel, can each be placed on a slave node, and the master node is responsible for aggregating their results. Alternatively, each algorithm can itself be partitioned according to the actual amount of computation and distributed across slave nodes; for example, the neural network, decision tree and support vector machine algorithms can each be split and deployed according to the steps of the respective algorithm, and the master node aggregates the partial results after receiving them from the slave nodes.
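The first deployment option (one algorithm per slave node, with the master collecting the three results) can be imitated on a single machine with a thread pool. In this sketch threads stand in for slave nodes, and the three predictor functions are hypothetical stand-ins for trained NN, DT and SVM models; none of this is the source's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(predictors, x):
    """Evaluate every predictor on x concurrently and return {name: prediction}."""
    with ThreadPoolExecutor(max_workers=len(predictors)) as pool:
        # Each submit() plays the role of dispatching one algorithm to a slave node.
        futures = {name: pool.submit(fn, x) for name, fn in predictors.items()}
        # The "master" waits for all results and gathers them in one place.
        return {name: future.result() for name, future in futures.items()}


# Stand-in predictors (hypothetical): real ones would be the trained models.
predictors = {
    "NN": lambda x: x[0] + 1.0,
    "DT": lambda x: x[0] + 2.0,
    "SVM": lambda x: x[0] + 3.0,
}

if __name__ == "__main__":
    print(run_in_parallel(predictors, [100.0]))
```

The returned dictionary has the shape needed for the weighted combination P = W_NN·P_NN + W_DT·P_DT + W_SVM·P_SVM.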

In another aspect of the invention, an air quality monitoring and warning system based on cloud computing and machine learning is provided, which can implement the air quality monitoring and warning method based on cloud computing and machine learning.

In another aspect of the present invention, an air quality monitoring and warning device based on cloud computing and machine learning is provided, where the device includes a processor and a memory, and is capable of implementing the air quality monitoring and warning method based on cloud computing and machine learning.

In another aspect of the present invention, a storage medium is provided, on which a computer program is stored, wherein the computer program is capable of implementing the foregoing air quality monitoring and warning method based on cloud computing and machine learning.

Drawings

FIG. 1 is a diagram of a relationship between a sensor and a cloud computing platform.

Fig. 2 is a work data flow diagram of a cloud computing platform.

FIG. 3 is a plot of the air quality values versus actual AQI values obtained by the NN, DT, SVM algorithms.

Fig. 4 is a comparison graph of an air quality predicted value and an actual AQI value obtained according to weights of three algorithms after the three algorithms of NN, DT and SVM are combined.

Detailed Description

The present invention will be further described with reference to the following examples.

As shown in Fig. 1, the sampled values of the air quality factors obtained by the monitoring sensors are transmitted to a cloud computing platform. The cloud computing platform comprises at least a base support layer, an algorithm layer and an application visualization layer. The base support layer uses the mature Hadoop platform; for example, the MapReduce framework is used to distribute parallel tasks to the cloud servers. The algorithm layer uses the air quality prediction method based on machine learning algorithms described above; specifically, the three machine learning algorithms (neural network NN, decision tree DT and support vector machine SVM) are combined horizontally. The application visualization layer displays the air quality predicted value obtained and the air quality grade determined; for example, when the air quality grade is light pollution, moderate pollution or severe pollution, an alarm is given to indicate that the air quality factor exceeds the standard. The content and form of the alarm can be chosen as required.

As shown in Fig. 2, the cloud computing platform applies the three algorithms (neural network NN, decision tree DT and support vector machine SVM) to the current sampled values, calculates the prediction error of each algorithm on historical data, and derives from these errors the weights used when the three algorithms are combined horizontally. The weighted NN-DT-SVM combination yields the air quality predicted value, which is compared with the air quality standard to decide whether to output an alarm.

Data set: the data set consists of air quality data for the city of Kaifeng from January 2013 to November 2020. As listed in Table 1 below, the first six items are the air quality factors PM2.5, PM10, SO₂, CO, NO₂ and O₃; AQI is the air quality score; and the last item is the corresponding quality grade. The Technical Regulation on Ambient Air Quality Index (on trial) issued by the state replaces the original Air Pollution Index (API) with the Air Quality Index (AQI) and divides the AQI into six grades: grade 1 is excellent, grade 2 is good, grade 3 is light pollution, grade 4 is moderate pollution, grade 5 is heavy pollution and grade 6 is severe pollution.

The programs are written in C++ and run on Windows 10. The data set is divided into a training set and a test set: the 35 air quality samples from January 2018 to November 2020 are used as the test set, and the rest are used as the training set. Each of the three models is used for calculation, and the results are shown in the following table. Fig. 3 compares the air quality values obtained by the NN, DT and SVM algorithms with the actual AQI values:
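The chronological split described above can be checked with a short Python sketch. It assumes exactly one sample per calendar month with no gaps, which is consistent with the stated count of 35 test samples; the helper name `month_range` is hypothetical.

```python
def month_range(start, end):
    """Yield (year, month) tuples from start to end, both inclusive."""
    year, month = start
    while (year, month) <= end:
        yield (year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1


# One sample per month from January 2013 to November 2020 (assumption: no gaps).
months = list(month_range((2013, 1), (2020, 11)))

# Samples from January 2018 onward form the test set; earlier ones the training set.
test_start = (2018, 1)
train_months = [m for m in months if m < test_start]
test_months = [m for m in months if m >= test_start]

if __name__ == "__main__":
    print(len(months), len(train_months), len(test_months))  # 95 60 35
```

Under this assumption the training set holds the 60 months from January 2013 to December 2017.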

In the table, the AQI column is the actual AQI value of the sample; NN, DT and SVM are the AQI values obtained by the neural network, decision tree and support vector machine algorithms, respectively; and E_NN, E_DT and E_SVM are the squares of the differences between the actual AQI value and the AQI values obtained by the neural network, decision tree and support vector machine algorithms, respectively.

Following the method provided by the invention, the mean squared errors of the three algorithms NN, DT and SVM are calculated: E_NN, E_DT and E_SVM. The weights of the three algorithms are then calculated from these errors and normalized to obtain W_NN, W_DT and W_SVM.

The air quality predicted value is calculated as:

$$P = 0.424971\cdot P_{NN} + 0.136312\cdot P_{DT} + 0.438717\cdot P_{SVM}$$
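The combination with the reported weights 0.424971, 0.136312 and 0.438717 can be written directly in Python, as in the sketch below. The three input predictions in the usage example are hypothetical values chosen for illustration, not results from the source.

```python
# Weights reported in the embodiment for the NN, DT and SVM predictions.
WEIGHTS = {"NN": 0.424971, "DT": 0.136312, "SVM": 0.438717}


def combined_prediction(p_nn, p_dt, p_svm):
    """P = 0.424971 * P_NN + 0.136312 * P_DT + 0.438717 * P_SVM."""
    return WEIGHTS["NN"] * p_nn + WEIGHTS["DT"] * p_dt + WEIGHTS["SVM"] * p_svm


if __name__ == "__main__":
    # Hypothetical individual predictions, for illustration only.
    print(combined_prediction(200.0, 180.0, 210.0))
    print(sum(WEIGHTS.values()))
```

The three weights sum to 1, so the combined value is a weighted average of the individual predictions; the decision tree contributes least because it has the smallest weight.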

Figure 4 shows a comparison graph of the air quality predicted value and the actual AQI value obtained according to the weight of three algorithms after the NN algorithm, the DT algorithm and the SVM algorithm are combined.

While one embodiment of the present invention has been described in detail, the description is only a preferred embodiment of the present invention and should not be taken as limiting the scope of the invention. All equivalent changes and modifications made within the scope of the present invention shall fall within the scope of the present invention.

Claims

1. An air quality monitoring and alarming method based on cloud computing and machine learning is characterized by comprising the following steps:

step 1: transmitting the sampling values of the air quality factors obtained by each monitoring sensor into the cloud platform;

step 2: the cloud platform calculates a predicted value of the air quality; specifically, the neural network NN, the decision tree DT and the support vector machine SVM algorithms each calculate a predicted air quality value, and the three algorithms are assigned the weights W_NN, W_DT and W_SVM based on their historical prediction performance, from which a final air quality predicted value is calculated;

step 3: determining the air quality grade from the air quality predicted value obtained in step 2, and giving an alarm to indicate that the air quality factor exceeds the standard when the air quality grade is light pollution, moderate pollution or severe pollution.

2. The method of claim 1, wherein the neural network NN is used to calculate the air quality prediction by:

first, the network is initialized: the number of hidden layers and the number of neurons in each layer are determined, the connection weights between the neurons of each layer of the neural network are initialized, the input and target output are determined, and the number of hidden-layer nodes is determined from n_in and n_out, where n_in is the number of input-layer nodes and n_out is the number of output-layer nodes; then the training data are input: a training data set is determined and each group of data in the set is fed into the neural network; the output of the network is calculated from the neural network and the weights; the error between the network output and the target output is calculated, and if the error has not reached an acceptable threshold, the error information is propagated backward to correct the connection weights in the network;

the training sample set is TDS = {(x₁, y₁), (x₂, y₂), …, (x_N, y_N)}, where x₁, x₂, …, x_N are air quality factor samples and y₁, y₂, …, y_N are the air quality values of the corresponding samples; the mean square error of the neural network on (x_i, y_i) is denoted E_i;

the error threshold is set to μ; the weight update Δτ between the hidden layer and the output layer is:

$$\Delta\tau = -\theta\,\frac{\partial E_i}{\partial \tau}$$

where θ is the learning rate of the neural network, τ is the weight between the hidden layer and the output layer, and ∂E_i/∂τ denotes the first derivative of the mean square error E_i with respect to the weight τ;

judging whether all data in the data set participate in the training process of the neural network; if so, outputting the weight among the nodes of each layer of the neural network, and finishing the training process; otherwise, continuing to execute the training process;

obtaining a neural network with an error within a given threshold range through the learning process; after determining the connection weight among each neuron node in the neural network, calculating according to new input by using the network and outputting a result;

3. The method according to claim 1, wherein the process of calculating the air quality prediction value using the decision tree DT is:

the training data set (TDS) contains N samples and is expressed as TDS = {(x₁, y₁), (x₂, y₂), …, (x_N, y_N)}, where x₁, x₂, …, x_N are air quality factor samples and y₁, y₂, …, y_N are the corresponding air quality values; the samples are divided by minimizing the sum of squared errors:

$$\min_{j,\,s}\left[\min_{c_1}\sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2}\sum_{x_i \in R_2(j,s)} (y_i - c_2)^2\right]$$

where j denotes the j-th variable of each sample, the total number of sample variables is M, s denotes a split point of the j-th variable, R₁(j, s) denotes the left region of the split, R₂(j, s) denotes the right region of the split, and c₁ and c₂ are the optimal output values of regions R₁(j, s) and R₂(j, s); each feature j of the samples is traversed and the possible split points s of each feature are tried, the split with the smallest sum of squared errors is selected, and the optimal output values c₁ and c₂ are determined; an air quality prediction regression tree is obtained when training is finished;

4. The method of claim 1, wherein the process of calculating the air quality prediction value using a Support Vector Machine (SVM) is:

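the training sample set is TDS = {(x₁, y₁), (x₂, y₂), …, (x_N, y_N)}, where x₁, x₂, …, x_N are air quality factor samples and y₁, y₂, …, y_N are the air quality values corresponding to the air quality factors;

there exists a hyperplane

$$\omega^{T}\varphi(x) + b = 0$$

that classifies the samples correctly, where φ(X) maps X into a high-dimensional feature space, and each sample x_i corresponds to a slack variable ε_i that characterizes the degree to which the sample fails to satisfy the constraint;

a Gaussian kernel function replaces the inner-product operation in the high-dimensional space:

$$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}}\right)$$

where σ is the bandwidth of the Gaussian kernel and α_i is the Lagrange multiplier; the displacement term b is calculated from the sample values, all the b values obtained are averaged, and the support vector classification prediction function is obtained when training is finished.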

5. The method of claim 1, further characterized in that the neural network NN, decision tree DT and support vector machine SVM algorithms are executed in parallel.

6. The method according to claim 1, wherein the three algorithms are weighted based on historical prediction, specifically:

taking historical data as the input of the three algorithms to obtain their predicted values, calculating the difference between each predicted value and the corresponding historical air quality value, squaring it, and averaging to obtain the mean squared errors E_NN, E_DT and E_SVM; then calculating the weights of the three algorithms from these errors and normalizing them to obtain W_NN, W_DT and W_SVM.

7. The method of claim 1, the air quality factors being: PM2.5, PM10, SO₂, CO, NO₂ and O₃.

8. An air quality monitoring and warning system based on cloud computing and machine learning, which is capable of implementing the method of any one of the preceding claims 1-7.

9. An air quality monitoring and warning device based on cloud computing and machine learning, the device comprising a processor, a memory, which is capable of implementing the method of any of the preceding claims 1-7.

10. A storage medium having stored thereon a computer program enabling the method of any of the preceding claims 1-7.