CN113379148A - Pollutant concentration inversion method based on fusion of multiple machine learning algorithms - Google Patents

Pollutant concentration inversion method based on fusion of multiple machine learning algorithms

Info

Publication number
CN113379148A
CN113379148A (application CN202110704245.2A)
Authority
CN
China
Prior art keywords
function
model
data
inversion result
inversion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110704245.2A
Other languages
Chinese (zh)
Inventor
胡俊涛
陈一源
方勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intelligent Manufacturing Institute of Hefei University Technology
Original Assignee
Intelligent Manufacturing Institute of Hefei University Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intelligent Manufacturing Institute of Hefei University Technology filed Critical Intelligent Manufacturing Institute of Hefei University Technology
Priority to CN202110704245.2A priority Critical patent/CN113379148A/en
Publication of CN113379148A publication Critical patent/CN113379148A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06Q — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 — Administration; Management
    • G06Q10/04 — Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 — Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 — Complex mathematical operations
    • G06F17/15 — Correlation function computation including computation of convolution operations
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06Q — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 — Administration; Management
    • G06Q10/06 — Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 — Operations research, analysis or management
    • G06Q10/0639 — Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 — Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06Q — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 — Administration; Management
    • G06Q10/06 — Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067 — Enterprise or organisation modelling
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06Q — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 — Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 — Services
    • G06Q50/26 — Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Educational Administration (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Operations Research (AREA)
  • Mathematical Physics (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Software Systems (AREA)
  • Computational Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Primary Health Care (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a pollutant concentration inversion method based on the fusion of multiple machine learning algorithms. The method fuses three machine learning algorithms (CNN, SVM and XGBoost) and retains the advantages of each: CNN can extract representative features; the SVM algorithm offers nonlinear mapping and small-sample learning; and the XGBoost algorithm adds a regularization term, which avoids overfitting. Together these improve the efficiency of the algorithm and the precision of pollutant concentration inversion. The CNN part serves as the upper layer of the model structure: the main features of the data are extracted and screened out by the convolution and pooling layers, flattened by the fully connected layer, and then input to the lower layer of the model structure. The SVM part and the XGBoost part serve as the lower layer of the model structure; after the inversion results of these two algorithms are obtained, a fuzzy logic algorithm performs weight distribution to produce the final result.

Description

Pollutant concentration inversion method based on fusion of multiple machine learning algorithms
Technical Field
The invention relates to the field of an environmental data inversion method based on a machine learning algorithm, in particular to a pollutant concentration inversion method based on fusion of various machine learning algorithms.
Background
Among gaseous pollutants, emitted sulfur dioxide can irritate the human respiratory tract and induce various respiratory diseases, while also harming vegetation; emitted nitrogen oxides can combine with other pollutants to produce photochemical smog pollution. The national index for evaluating ambient air quality is mainly based on the concentrations of six pollutants: ozone (O3), nitrogen dioxide (NO2), sulfur dioxide (SO2), carbon monoxide (CO), fine particulate matter (PM2.5), and respirable particulate matter (PM10).
In recent years, air pollution has become more serious and has grown into a global problem. Air quality monitoring is an important means of dealing with it. Nationally, air pollution is monitored in real time by a number of air monitoring stations; their data accuracy is high, but they are costly, planned centrally by government departments, and sparsely deployed. Therefore, a large sensor network is usually constructed from lower-cost miniature monitoring sensor devices to achieve dense regional monitoring. However, due to the influence of temperature and humidity, cross-interference, sensor aging and the like, micro-sensor readings may deviate from the standard concentration. To ensure the data quality of the sensors in the network, concentration inversion needs to be performed on the micro-sensor data.
At present, commonly used inversion algorithms include XGBoost, SVM and RNN, which in practical use suffer from overfitting, dependence on large training samples, feature redundancy, and similar drawbacks. The present method combines the three algorithms CNN, XGBoost and SVM, gaining the advantages of nonlinear mapping and small-sample learning while avoiding overfitting; this improves the concentration inversion accuracy as well as the computational efficiency of the model.
Disclosure of Invention
The invention aims to provide a pollutant concentration inversion method based on the fusion of multiple machine learning algorithms, in order to solve the problems of the prior art: overfitting occurs easily, large training samples are required, computational efficiency is low, and accuracy cannot meet requirements.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
a pollutant concentration inversion method based on fusion of multiple machine learning algorithms comprises the following steps:
step 1, acquiring air pollutant data measured by an air micro-station, constructing a data set according to the air pollutant data, and preprocessing the data set;
the measured data in the air micro-station comprises concentration values, temperature, humidity, wind speed and wind direction and air pressure values of various air pollutants, and the data set is constructed by using the data measured in the air micro-station.
Step 2, constructing a convolutional neural network, and adjusting the convolutional neural network until the parameters of the convolutional neural network are optimal parameters;
step 3, inputting the data in the data set preprocessed in the step 1 into the convolutional neural network adjusted in the step 2, and extracting abstract features of the data by the convolutional neural network;
step 4, constructing an XGBoost model, inputting the abstract features obtained in step 3 into the XGBoost model, and training it; during training, the node loss of the XGBoost model is calculated to select the leaf node with the largest loss gain, the optimal parameters of the XGBoost model are obtained, and with these optimal parameters the XGBoost model outputs a concentration inversion result;
step 5, constructing an SVM model, inputting the abstract features obtained in step 3 into the SVM model, and training it; during training, the optimal penalty coefficient C and slack variable of the SVM model are obtained by a grid search method, the optimal parameters of the SVM model are obtained, and with these optimal parameters the SVM model outputs a concentration inversion result;
and step 6, performing weight distribution on the concentration inversion results output by the XGBoost model in step 4 and the SVM model in step 5 through a fuzzy logic algorithm, to obtain the final inversion result of the pollutant concentration.
Further, in step 1, the data in the data set is preprocessed by using a linear interpolation method, so as to complete missing values of the data in the data set.
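The linear-interpolation completion of step 1 can be sketched in plain Python. The handling of gaps at the series edges (copying the nearest known value) is an assumption, since the patent only specifies linear interpolation for missing values:

```python
def fill_missing_linear(series):
    """Complete missing values (None) in a 1-D series by linear interpolation,
    as in step 1 of the method. Edge gaps take the nearest known value."""
    vals = list(series)
    known = [i for i, v in enumerate(vals) if v is not None]
    if not known:
        raise ValueError("series has no known values")
    for i in range(len(vals)):
        if vals[i] is not None:
            continue
        prev_i = max((k for k in known if k < i), default=None)
        next_i = min((k for k in known if k > i), default=None)
        if prev_i is None:          # leading gap: copy nearest known value
            vals[i] = vals[next_i]
        elif next_i is None:        # trailing gap: copy nearest known value
            vals[i] = vals[prev_i]
        else:                       # interior gap: linear interpolation
            frac = (i - prev_i) / (next_i - prev_i)
            vals[i] = vals[prev_i] + frac * (vals[next_i] - vals[prev_i])
    return vals

# Example: NO2 concentration readings with two missing samples
readings = [30.0, None, 34.0, 36.0, None, 40.0]
print(fill_missing_linear(readings))  # [30.0, 32.0, 34.0, 36.0, 38.0, 40.0]
```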
Further, the convolution layer in the convolutional neural network constructed in the step 2 adopts a local connection mode, and the same convolution kernel is used for performing convolution operation on the target.
Further, in the fully connected layer of the convolutional neural network constructed in step 2, each neuron is connected with the neuron of the previous layer one by one.
Further, in step 3, the data in the data set preprocessed in step 1 is input into the convolutional neural network adjusted in step 2 after a continuous feature map is constructed by sliding a window over time.
further, in step 4, the tree model adopted in the XGBoost model is a CART regression tree model, and the formula of the XGBoost model is as follows:
Figure BDA0003131548720000021
wherein: n is the number of trees; f. oft() Is a function in the function space F;
Figure BDA0003131548720000031
for the inversion result, xiI is an input ith abstract feature, and i is a natural number which is greater than or equal to 1; f is the set of all possible CARTs;
iteration of the XGBoost model adopts additive training to further minimize the objective function, and the iteration process is as follows:

\hat{y}_i^{(0)} = 0
\hat{y}_i^{(1)} = f_1(x_i) = \hat{y}_i^{(0)} + f_1(x_i)
\cdots
\hat{y}_i^{(t)} = \sum_{k=1}^{t} f_k(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i)

wherein \hat{y}_i^{(0)} is the inversion result at time t = 0, \hat{y}_i^{(1)} is the inversion result at time t = 1, f_t(x_i) is the function value for the i-th input, \hat{y}_i^{(t)} is defined as the inversion result at time t, \hat{y}_i^{(t-1)} is defined as the inversion result at time t - 1, and x_i is the i-th input abstract feature.
The XGBoost model objective function is as follows:

Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{t=1}^{n} \Omega(f_t), \qquad \Omega(f_t) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} \omega_j^2

wherein: l(\cdot) is the loss function, and l(y_i, \hat{y}_i) represents the difference between the inversion result and the true value; \Omega(f_t) is the regularization term; T is the number of leaf nodes; \omega_j is the score of leaf node j; \gamma controls the number of leaf nodes; and \lambda ensures that the leaf node scores are not too large.
To find the f_t that minimizes the objective function, the objective function is approximated as:

Obj^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)

wherein g_i and h_i are the first and second derivatives of the loss function l(y_i, \hat{y}_i^{(t-1)}) with respect to \hat{y}_i^{(t-1)}, \hat{y}_i^{(t-1)} is defined as the inversion result at time t - 1, \Omega(f_t) is the regularization term, f_t(x_i) is the function value for the i-th input, y_i is the true value at the current time, and x_i is the i-th input abstract feature.
Further, in step 4, the loss function values of the individual data in the approximated objective function are grouped by leaf and summed, as follows:

Obj^{(t)} = \sum_{j=1}^{T} \left[ \left( \sum_{i \in I_j} g_i \right) \omega_j + \frac{1}{2} \left( \sum_{i \in I_j} h_i + \lambda \right) \omega_j^2 \right] + \gamma T

wherein: Obj^{(t)} is the objective function, g_i is the first derivative of the loss function, h_i is the second derivative of the loss function, I_j is the set of samples assigned to leaf node j, y_i is the true value at the current time, \lambda ensures the leaf node scores are not too large, T is the number of leaf nodes, \omega_j is the score of leaf node j, and x_i is the i-th input abstract feature.
Rewriting the above formula as a univariate quadratic function of the leaf node score and solving, the optimal score \omega_j^* and objective function value are as follows:

G_j = \sum_{i \in I_j} g_i, \qquad H_j = \sum_{i \in I_j} h_i

\omega_j^* = -\frac{G_j}{H_j + \lambda}, \qquad Obj^* = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T

wherein g_i is the first derivative of the loss function, h_i is the second derivative of the loss function, \lambda ensures the leaf node scores are not too large, and T is the number of leaf nodes.
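The closed-form optimal leaf score and objective value are easy to check numerically. A minimal sketch, with toy per-leaf sums (G_j, H_j) that are not taken from the patent:

```python
def leaf_score(g_sum, h_sum, lam):
    """Optimal leaf weight w* = -G / (H + lambda) from the XGBoost derivation."""
    return -g_sum / (h_sum + lam)

def objective_value(leaves, lam, gamma):
    """Obj* = -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T,
    where `leaves` is a list of (G_j, H_j) pairs, one per leaf."""
    T = len(leaves)
    return -0.5 * sum(G * G / (H + lam) for G, H in leaves) + gamma * T

# For squared-error loss, g_i = yhat_i - y_i and h_i = 1.  Two toy leaves:
lam, gamma = 1.0, 0.1
leaves = [(-4.0, 2.0), (3.0, 3.0)]        # (G_j, H_j) per leaf
print([leaf_score(G, H, lam) for G, H in leaves])
print(objective_value(leaves, lam, gamma))
```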
Further, the estimation function of the support vector machine in the SVM model in step 5 is:

f(x) = \omega \cdot \varphi(x) + b

where \omega is the normal vector, b is a constant, and \varphi(\cdot) is the mapping function.
The objective function is:

\min_{\omega, b, \xi, \xi^*} \ \frac{1}{2} \|\omega\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*)
\text{s.t.} \quad y_i - \omega \cdot \varphi(x_i) - b \le \varepsilon + \xi_i, \quad \omega \cdot \varphi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \quad \xi_i, \xi_i^* \ge 0

wherein: \omega is the normal vector, b is a constant, \varphi(\cdot) is the mapping function, \varepsilon is the width of the insensitive loss, y_i is the true value, C is the penalty coefficient, \xi_i and \xi_i^* are slack variables, f(x_i) is the estimated function value, and x_i is the i-th input abstract feature;
introducing slack variables and the Lagrange function, the objective function is converted into:

\max_{\alpha, \alpha^*} \ -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) K(x_i, x_j) - \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{n} y_i (\alpha_i - \alpha_i^*)
\text{s.t.} \quad \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0, \quad 0 \le \alpha_i, \alpha_i^* \le C

wherein: \alpha_i, \alpha_j, \alpha_i^* and \alpha_j^* are the Lagrange coefficients, K(x_i, x_j) is the kernel function, C is the penalty coefficient, \varepsilon is the width of the insensitive loss, y_i is the true value, and max denotes maximization of the objective function;
solving for \alpha_i yields the regression function:

f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) K(x_i, x) + b

wherein: \alpha_i and \alpha_i^* are the Lagrange coefficients, K(x_i, x) is the kernel function, b is a constant, and x_i is the i-th input abstract feature.
Further, in step 6, weight distribution is performed on the concentration inversion results output by the XGBoost model and the SVM model through a fuzzy logic algorithm, and the final inversion result expression is:

Y_j = \omega_{1j} Y_{jXGB} + \omega_{2j} Y_{jSVM}

wherein \omega_{1j} is the weight of the XGBoost model, Y_{jXGB} is the inversion result of the XGBoost model, \omega_{2j} is the weight of the SVM model, Y_{jSVM} is the inversion result of the SVM model, and Y_j is the final inversion result.
Let K_{1j} = |Y_{jXGB} - Y_{j-1}| and K_{2j} = |Y_{jSVM} - Y_{j-1}|, wherein Y_{jXGB} is the inversion result of the XGBoost model at the current time, Y_{jSVM} is the inversion result of the SVM model at the current time, and Y_{j-1} is the pollutant concentration inversion result at the previous time.
\omega_{1j} and \omega_{2j} are determined by the following functional relations:

\omega_{1j} = \frac{K_{2j}}{K_{1j} + K_{2j}}, \qquad \omega_{2j} = 1 - \omega_{1j}

wherein K_{1j} = |Y_{jXGB} - Y_{j-1}| and K_{2j} = |Y_{jSVM} - Y_{j-1}|, so that the model whose current result deviates less from the previous inversion result receives the larger weight.
And obtaining a final pollutant concentration inversion result after weight distribution.
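A minimal sketch of the weight distribution of step 6. The patent defines the deviations K1 and K2 from the previous inversion result and sets w2 = 1 - w1; the inverse-deviation form of w1 used here (and the epsilon guard against a zero denominator) are assumptions:

```python
def fuse(y_xgb, y_svm, y_prev, eps=1e-12):
    """Fuzzy-logic weight distribution: the model whose current inversion
    deviates less from the previous result y_prev gets the larger weight.
    The inverse-deviation form of w1 is an assumption; the patent only
    defines K1 = |Y_XGB - Y_prev|, K2 = |Y_SVM - Y_prev| and w2 = 1 - w1."""
    k1 = abs(y_xgb - y_prev)
    k2 = abs(y_svm - y_prev)
    w1 = k2 / (k1 + k2 + eps)   # weight of the XGBoost result
    w2 = 1.0 - w1               # weight of the SVM result
    return w1 * y_xgb + w2 * y_svm

# XGBoost says 42, SVM says 50, and the previous fused result was 44
print(fuse(42.0, 50.0, 44.0))
```

Because the XGBoost result here is closer to the previous value, it receives weight 0.75 and dominates the fused output.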
In the invention, the convolutional neural network extracts and screens out the main characteristics of input data, then an SVM model and an XGboost model are used for performing inversion on the concentration value of the measured pollutant, finally, weight distribution is performed through a fuzzy logic algorithm, and the results of the two models are fused. The method retains the advantages of the three algorithms, and further improves the inversion accuracy of the pollutant concentration on the basis of improving the efficiency of the algorithms.
Compared with the prior art, the invention has the advantages that:
the method disclosed by the invention integrates three machine learning algorithms of CNN, SVM and XGboost, the advantages of each algorithm are reserved, CNN can extract representative characteristics, the SVM algorithm has the advantages of nonlinear mapping and small sample learning, and the XGboost algorithm adds a regularization item, so that overfitting can be avoided, and the efficiency of the algorithm and the inversion accuracy of the concentration of pollutants are improved. And the CNN part is used as an upper layer of the model structure, main characteristics of data are extracted and screened out through the convolution layer and the pooling layer, and the data are input into a lower layer of the model structure after being flattened through the full-connection layer. The SVM part and the XGboost part are used as the lower layer of the model structure, after the inversion results of the two-part algorithm are obtained, the fuzzy logic algorithm is adopted to carry out weight distribution, and the final result is obtained. The CNN can extract representative features, the SVM algorithm has the advantages of nonlinear mapping and small sample learning, and the XGboost algorithm adds a regularization term to avoid overfitting and improve algorithm efficiency.
Drawings
FIG. 1 is a block diagram of the process of the present invention.
FIG. 2 is a flowchart of feature extraction for convolutional neural networks according to the present invention.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
As shown in fig. 1, the method for inverting the pollutant concentration based on the fusion of multiple machine learning algorithms of the present invention includes the following steps:
Step 1: acquiring air pollutant data measured by the air micro-station, constructing a data set from the air pollutant data, and preprocessing the data set.
Of the information collected by the air micro-station, a data set of 7650 records is used, and inversion of the concentration of the gaseous pollutant NO2 is taken as the example. In step 1 the data set is preprocessed: missing values in the data are completed by linear interpolation.
Step 2: constructing a convolutional neural network (CNN) and adjusting it until its parameters are the optimal parameters.
As shown in fig. 2, the convolutional neural network of the present invention is mainly composed of an input layer, convolution layers, activation function layers, pooling layers, and a fully connected layer. The convolution and pooling layers are data processing layers whose function is to filter the input data and extract useful information. The activation layer gives the output features a nonlinear mapping. The pooling layer screens the features, extracting the most representative ones and reducing their dimensionality. The fully connected layer collects the learned features and outputs the mapped features.
During the convolution operation of a convolution layer, a local connection mode is adopted, i.e., the same convolution kernel is used to convolve the target; this reduces the risk of model overfitting and the memory required at run time. The convolution operation is:

y^l = g\left( \omega_m^l * x_m^l + b^l \right)

wherein y^l is the output after the convolution operation of layer l, g(\cdot) is the activation function, x_m^l is the input of the m-th local convolution region of layer l, \omega_m^l is the weight of the m-th part of layer l, * denotes the convolution operation, and b^l is the bias term of layer l. The convolution layer performs local convolution by sliding the convolution kernel over the input data, and the features obtained by the convolution operation are processed by the activation function to obtain the final features. The convolution kernel is a weight matrix, also called a filter; each parameter in the matrix is obtained by training the CNN.
The pooling layer of the convolutional neural network has no parameters that need training; only the pooling type, the kernel size of the pooling operation, and the stride are specified. The pooling operation is:

p_m^l = h\left( x_{m,p}^l \right)

wherein p_m^l is the pooling result of the m-th array at layer l, x_{m,p}^l is the p-th value in the m-th array region of layer l, and h(\cdot) is the pooling function.
Each neuron in the fully connected layer of the convolutional neural network is connected one by one with the neurons of the previous layer, and the computation of the fully connected layer is:

d^l = g\left( \omega^l x^l + b^l \right)

wherein d^l is the output of the l-th fully connected layer, g(\cdot) is the activation function, x^l is the input of the l-th layer, \omega^l is the weight coefficient of the l-th layer, and b^l is the bias parameter. The fully connected layer collects the learned features and maps them into two-dimensional features for output.
In step 2, after the convolutional neural network is constructed, the network parameters are initialized, and the optimal parameters are finally determined through repeated experimental adjustment. The number of convolution kernels is set to 10 with size 1 x 1; the pooling size is set to 1; to prevent overfitting, Dropout with rate 0.1 is introduced in the fully connected layer; the learning rate is set to 0.001, the batch size to 64, and the activation function is ReLU.
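The convolution, activation and pooling operations described above can be illustrated with a small NumPy sketch. This is a 1-D toy example with made-up numbers, not the network configuration of the embodiment:

```python
import numpy as np

def conv1d(x, kernel, bias):
    """Local convolution: slide one shared kernel over the input (weight sharing)."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) + bias
                     for i in range(len(x) - k + 1)])

def relu(x):
    """Activation layer: nonlinear mapping of the convolved features."""
    return np.maximum(x, 0.0)

def max_pool(x, size):
    """Non-overlapping max pooling: keep the most representative value per window."""
    return np.array([x[i:i + size].max() for i in range(0, len(x) - size + 1, size)])

x = np.array([1.0, -2.0, 3.0, 0.5, 2.0, -1.0])
feat = max_pool(relu(conv1d(x, np.array([0.5, -0.5]), 0.0)), 2)
print(feat)  # the screened features passed on to the next layer
```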
And step 3: and (3) inputting the data in the data set preprocessed in the step (1) into the convolutional neural network adjusted in the step (2), and extracting abstract features of the data by the convolutional neural network.
In step 3, the collected data are built into continuous feature maps via a time-sliding window and used as the input of the convolutional neural network, and the CNN extracts the abstract features in the data. Because the intervals are continuous, when an interval changes the search space can be pruned using previous calculation results, which reduces repeated computation and time complexity. The feature extraction flow of the convolutional neural network is shown in fig. 2.
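Constructing the continuous feature maps with a time-sliding window might look as follows; the window width and feature count are illustrative assumptions, not values fixed by the patent:

```python
import numpy as np

def sliding_windows(data, width):
    """Build the continuous feature maps of step 3: one window of `width`
    consecutive time steps per CNN input sample (stride 1, overlapping)."""
    data = np.asarray(data)
    n = len(data) - width + 1
    return np.stack([data[i:i + width] for i in range(n)])

# 6 time steps, 3 features each (e.g. concentration, temperature, humidity)
series = np.arange(18, dtype=float).reshape(6, 3)
maps = sliding_windows(series, width=4)
print(maps.shape)  # (3, 4, 3): 3 overlapping windows of 4 steps x 3 features
```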
Step 4: constructing an XGBoost model, inputting the abstract features obtained in step 3 into the XGBoost model, and training it; during training, the node loss of the XGBoost model is calculated to select the leaf node with the largest loss gain, the optimal parameters of the XGBoost model are obtained, and with these optimal parameters the XGBoost model outputs a concentration inversion result.
In step 4, the tree model adopted in the XGBoost model is the CART regression tree model, and the formula of the XGBoost model is as follows:

\hat{y}_i = \sum_{t=1}^{n} f_t(x_i), \quad f_t \in F

wherein: n is the number of trees; f_t is a function in the function space F; \hat{y}_i is the inversion result; x_i is the i-th input abstract feature, with i a natural number greater than or equal to 1; and F is the set of all possible CART trees;
iteration of the XGBoost model adopts additive training to further minimize the objective function, and the iteration process is as follows:

\hat{y}_i^{(0)} = 0
\hat{y}_i^{(1)} = f_1(x_i) = \hat{y}_i^{(0)} + f_1(x_i)
\cdots
\hat{y}_i^{(t)} = \sum_{k=1}^{t} f_k(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i)

wherein \hat{y}_i^{(0)} is the inversion result at time t = 0, \hat{y}_i^{(1)} is the inversion result at time t = 1, f_t(x_i) is the function value for the i-th input, \hat{y}_i^{(t)} is defined as the inversion result at time t, \hat{y}_i^{(t-1)} is defined as the inversion result at time t - 1, and x_i is the i-th input abstract feature.
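The additive training scheme, y_hat(t) = y_hat(t-1) + f_t(x), can be demonstrated in a stripped-down form where each f_t is a single-leaf constant fitted to the mean residual. This is a deliberate simplification of the CART trees actually used by the method:

```python
def additive_training(y, rounds, lr=1.0):
    """Additive training: each round adds a new function f_t to the previous
    prediction, y_hat^(t) = y_hat^(t-1) + f_t(x).  Here every f_t is just a
    constant equal to the (shrunk) mean residual, i.e. a one-leaf 'tree',
    which is enough to show the fit improving round by round."""
    y_hat = [0.0] * len(y)                      # y_hat^(0) = 0
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, y_hat)]
        f_t = lr * sum(residuals) / len(residuals)
        y_hat = [pi + f_t for pi in y_hat]      # y_hat^(t) = y_hat^(t-1) + f_t
    return y_hat

y = [3.0, 5.0, 7.0]
print(additive_training(y, rounds=3, lr=0.5))  # predictions approach mean(y) = 5
```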
The XGBoost model objective function is as follows:

Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{t=1}^{n} \Omega(f_t), \qquad \Omega(f_t) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} \omega_j^2

wherein: l(\cdot) is the loss function, and l(y_i, \hat{y}_i) represents the difference between the inversion result and the true value; \Omega(f_t) is the regularization term; T is the number of leaf nodes; \omega_j is the score of leaf node j; \gamma controls the number of leaf nodes; and \lambda ensures that the leaf node scores are not too large. To find the f_t that minimizes the objective function, the objective function is approximated as:
Obj^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)

wherein: g_i and h_i are the first and second derivatives of the loss function l(y_i, \hat{y}_i^{(t-1)}) with respect to \hat{y}_i^{(t-1)}, \hat{y}_i^{(t-1)} is defined as the inversion result at time t - 1, \Omega(f_t) is the regularization term, f_t(x_i) is the function value for the i-th input, y_i is the true value at the current time, and x_i is the i-th input abstract feature.
The loss function values of the individual data in the approximated objective function are grouped by leaf and summed, as follows:

Obj^{(t)} = \sum_{j=1}^{T} \left[ \left( \sum_{i \in I_j} g_i \right) \omega_j + \frac{1}{2} \left( \sum_{i \in I_j} h_i + \lambda \right) \omega_j^2 \right] + \gamma T

wherein: Obj^{(t)} is the objective function, g_i is the first derivative of the loss function, h_i is the second derivative of the loss function, I_j is the set of samples assigned to leaf node j, y_i is the true value at the current time, \lambda ensures the leaf node scores are not too large, T is the number of leaf nodes, and \omega_j is the score of leaf node j;
rewriting the above formula as a quadratic function of a single element about leaf node fraction, and solving the obtained optimal
Figure BDA0003131548720000091
And objective function values are shown below:
Figure BDA0003131548720000092
wherein:
Figure BDA0003131548720000093
Figure BDA0003131548720000094
giis the first derivative of the loss function, hiFor the second derivative of the loss function, λ ensures that the fraction of the leaf nodes is not too large, and T is the number of leaf nodes.
In step 4 of the invention, three kinds of parameters need to be determined for XGBoost model prediction: general parameters, booster parameters, and task parameters. The general parameters determine the type of booster used in the boosting process, usually a tree or linear model; the booster parameters depend on the chosen booster; and the task parameters specify the learning task and the corresponding learning objective. The XGBoost model parameters are first initialized, with the initialization values shown in Table 1:

TABLE 1  XGBoost model parameter initialization

Parameter name        Initialization value
Number of iterations  500
Leaf minimum weight   0.8
Sampling ratio        0.8
Learning rate         0.05
Varying the maximum tree height, the error on the test data is compared; the results are shown in Table 2:

TABLE 2  MAPE at different maximum tree heights

Maximum tree height   MAPE
1                     0.912
3                     0.957
5                     0.803
7                     0.132
As can be seen from Table 2, the error on the test data is minimal for a maximum tree height of 5. After the maximum tree height is determined, ranges for the other parameters are given, and their optimal combination is obtained by a search traversal method. The learning rate range is set to 0.01-0.1; the number of iterations to 100-1000; and the random sampling ratio to 0.1-0.9. Through search traversal, the optimal parameter settings of the XGBoost model tree used in the present invention are finally determined, as shown in Table 3:
TABLE 3  Optimal parameter settings of the XGBoost model tree

Parameter name        Parameter setting
Number of iterations  300
Leaf minimum weight   0.7
Sampling ratio        0.3
Learning rate         0.01
Selected booster      gbtree
Task function         gamma
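For illustration, the Table 3 settings can be collected into a parameter dictionary in the naming style of the widely used xgboost Python package. The mapping from the patent's translated parameter names to these keys, and the max_depth value, are assumptions:

```python
# Table 3 expressed as a parameter dict using common xgboost-library key names;
# the correspondence of "leaf minimum weight" -> min_child_weight and
# "sampling ratio" -> subsample is an assumed mapping, not stated in the patent.
params = {
    "booster": "gbtree",        # selected booster
    "n_estimators": 300,        # number of iterations
    "min_child_weight": 0.7,    # leaf minimum weight
    "subsample": 0.3,           # random sampling ratio
    "learning_rate": 0.01,
    "max_depth": 5,             # best maximum tree height per the embodiment text
}
# A model would then be built roughly as (sketch only, library not invoked here):
#   model = xgboost.XGBRegressor(**params)
#   model.fit(train_features, train_labels)
print(params["n_estimators"], params["learning_rate"])
```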
Step 5: constructing an SVM model, inputting the abstract features obtained in step 3 into the SVM model, and training it; during training, the optimal penalty coefficient and slack variable of the SVM model are obtained by a grid search method, the optimal parameters of the SVM model are obtained, and with these optimal parameters the SVM model outputs a concentration inversion result.
The estimation function of the support vector machine in the SVM model in step 5 is as follows:

f(x) = ω·φ(x) + b

wherein ω is the normal vector, b is a constant, and φ(·) is the mapping function.
The objective function is:

min  (1/2)·‖ω‖² + C·Σᵢ₌₁ⁿ Lε(yᵢ − f(xᵢ)),  with Lε(z) = max(0, |z| − ε)

wherein: ω is the normal vector; b is a constant; φ(·) is the mapping function; Lε is the ε-insensitive loss function; yᵢ is the true value; C is the penalty coefficient; f(xᵢ) is the estimated function value; xᵢ is the ith input abstract feature;
introducing relaxation variables and the Lagrange function, the objective function is converted into its dual form:

max(α, α*)  −(1/2)·Σᵢ Σⱼ (αᵢ − αᵢ*)(αⱼ − αⱼ*)·K(xᵢ, xⱼ) − ε·Σᵢ (αᵢ + αᵢ*) + Σᵢ yᵢ·(αᵢ − αᵢ*)
s.t.  Σᵢ (αᵢ − αᵢ*) = 0,  0 ≤ αᵢ, αᵢ* ≤ C

wherein αᵢ, αⱼ, αᵢ* and αⱼ* are Lagrange coefficients, K(xᵢ, xⱼ) is the kernel function, C is the penalty coefficient, ε is the insensitive loss parameter, yᵢ is the true value, and max denotes maximization of the objective function;
solving for αᵢ yields the regression function formula:

f(x) = Σᵢ (αᵢ − αᵢ*)·K(xᵢ, x) + b

wherein αᵢ and αᵢ* are Lagrange coefficients, K(xᵢ, x) is the kernel function, b is a constant, and xᵢ is the ith input abstract feature.
In step 5 of the method, to better express the relations among the sensor data features, a radial basis function with strong nonlinear mapping capability is adopted as the kernel function of the SVM model. Two hyper-parameters are optimized during training: the relaxation variable and the penalty coefficient. Introducing the relaxation variable increases the fault tolerance of the model, while the penalty coefficient represents how strongly the model weighs the loss on outlier samples. By the grid search method, the optimal values of the relaxation variable and the penalty coefficient C were determined to be 0.0136 and 300, respectively.
Step 6: the concentration inversion results output by the XGBoost model in step 4 and the SVM model in step 5 are weighted through a fuzzy logic algorithm to obtain the final pollutant concentration inversion result:

Yⱼ = ω₁ⱼ·Y_XGB(j) + ω₂ⱼ·Y_SVM(j)

wherein ω₁ⱼ is the weight of the XGBoost model, Y_XGB(j) is the inversion result of the XGBoost model, ω₂ⱼ is the weight of the SVM model, Y_SVM(j) is the inversion result of the SVM model, and Yⱼ is the final inversion result.
Let K₁ⱼ = |Y_XGB(j) − Yⱼ₋₁| and K₂ⱼ = |Y_SVM(j) − Yⱼ₋₁|, wherein Y_XGB(j) is the inversion result of the XGBoost model at the current moment, Y_SVM(j) is the inversion result of the SVM model at the current moment, and Yⱼ₋₁ is the pollutant concentration inversion result at the previous moment.
ω₁ⱼ and ω₂ⱼ are determined by the following functional formulas:

ω₁ⱼ = K₂ⱼ/(K₁ⱼ + K₂ⱼ)
ω₂ⱼ = 1 − ω₁ⱼ

wherein K₁ⱼ = |Y_XGB(j) − Yⱼ₋₁|, K₂ⱼ = |Y_SVM(j) − Yⱼ₋₁|, and xⱼ is the input abstract feature.
The final pollutant concentration inversion result is obtained after weight assignment.
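The weighting scheme can be sketched as follows. Since the membership function itself appears only as a figure in the source, the ratio ω₁ⱼ = K₂ⱼ/(K₁ⱼ + K₂ⱼ) used here is an assumed form that is merely consistent with ω₂ⱼ = 1 − ω₁ⱼ: the model whose output stays closer to the previous inversion result receives the larger weight.

```python
def fuse(y_xgb, y_svm, y_prev):
    # K1 and K2 measure each model's deviation from the previous
    # inversion result Y_{j-1}.
    k1 = abs(y_xgb - y_prev)
    k2 = abs(y_svm - y_prev)
    # Assumed membership form (the source shows only a figure here):
    # the closer model gets the larger weight, and w2 = 1 - w1.
    w1 = 0.5 if k1 + k2 == 0 else k2 / (k1 + k2)
    w2 = 1.0 - w1
    return w1 * y_xgb + w2 * y_svm

print(fuse(10.0, 10.0, 9.0))  # both models agree -> 10.0
```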
The evaluation indexes of the inversion method are MAE, RMSE and R²; their calculation formulas are respectively:

MAE = (1/m)·Σᵢ₌₁ᵐ |y⁽ⁱ⁾ − ŷ⁽ⁱ⁾|

RMSE = √[ (1/m)·Σᵢ₌₁ᵐ (y⁽ⁱ⁾ − ŷ⁽ⁱ⁾)² ]

R² = 1 − Σᵢ₌₁ᵐ (y⁽ⁱ⁾ − ŷ⁽ⁱ⁾)² / Σᵢ₌₁ᵐ (y⁽ⁱ⁾ − ȳ)²

wherein y⁽ⁱ⁾ is the true value of the test set, ŷ⁽ⁱ⁾ is the inversion result of the inversion method of the invention, ȳ is the average of the true values, m is determined by the size of the test set, and i is a natural number greater than or equal to 1.
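The three indexes can be computed directly from their definitions; a minimal sketch:

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y, y_hat = [1.0, 2.0, 3.0], [1.0, 2.0, 4.0]
print(mae(y, y_hat), rmse(y, y_hat), r2(y, y_hat))
```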
The concentration inversion results of the algorithm of the invention and of prior-art algorithms are compared in Table 4:

TABLE 4 Concentration inversion results of different algorithms

Model              MAE      RMSE     R²
SVM                1.348    1.285    0.536
XGBoost            1.236    1.197    0.665
CNN+SVM            1.014    1.001    0.617
CNN+XGBoost        0.986    0.954    0.746
CNN+XGBoost+SVM    0.318    0.4495   0.932
As Table 4 shows, the accuracy of the pollutant concentration inversion algorithm provided by the invention is superior to that of the other methods. The method fuses three machine learning algorithms, CNN, SVM and XGBoost, and retains the advantages of each: the CNN extracts representative features, the SVM algorithm offers nonlinear mapping and small-sample learning, and the XGBoost algorithm adds a regularization term that avoids overfitting, improving both the efficiency of the algorithm and the accuracy of the pollutant concentration inversion.
The present invention has been described above with reference to the accompanying drawings. It is to be understood that the invention is not limited to the specific embodiments disclosed; various modifications, changes and equivalents may be made without departing from the spirit and scope of the invention.

Claims (9)

1. The pollutant concentration inversion method based on fusion of various machine learning algorithms is characterized by comprising the following steps of:
step 1, acquiring air pollutant data measured by an air micro-station, constructing a data set according to the air pollutant data, and preprocessing the data set;
step 2, constructing a convolutional neural network, and adjusting the convolutional neural network until the parameters of the convolutional neural network are optimal parameters;
step 3, inputting the data in the data set preprocessed in the step 1 into the convolutional neural network adjusted in the step 2, and extracting abstract features of the data by the convolutional neural network;
step 4, constructing an XGBoost model, inputting the abstract features obtained in step 3 into the XGBoost model, and training the XGBoost model; during training, node losses of the XGBoost model are calculated to select the leaf nodes with the largest loss gain, the optimal parameters of the XGBoost model are obtained through training, and with the optimal parameters the XGBoost model outputs a concentration inversion result;
step 5, constructing an SVM model, inputting the abstract features obtained in step 3 into the SVM model, and training the SVM model; during training, the optimal penalty coefficient C and relaxation variable of the SVM model are obtained by a grid search method, the optimal parameters of the SVM model are obtained through training, and with the optimal parameters the SVM model outputs a concentration inversion result;
step 6, performing weight assignment, through a fuzzy logic algorithm, on the concentration inversion results output by the XGBoost model in step 4 and the SVM model in step 5 to obtain the final pollutant concentration inversion result.
2. The pollutant concentration inversion method based on fusion of multiple machine learning algorithms according to claim 1, characterized in that in step 1, linear interpolation is adopted to preprocess the data in the data set so as to fill up missing values of the data in the data set.
3. The pollutant concentration inversion method based on the fusion of multiple machine learning algorithms according to claim 1, characterized in that convolution layers in the convolution neural network constructed in the step 2 adopt a local connection mode, and the same convolution kernel is used for performing convolution operation on a target.
4. The pollutant concentration inversion method based on fusion of multiple machine learning algorithms according to claim 1, characterized in that in the fully connected layer of the convolutional neural network constructed in step 2, each neuron is connected with the neuron in the previous layer one by one.
5. The pollutant concentration inversion method based on the fusion of multiple machine learning algorithms according to claim 1, characterized in that in step 3, the data in the data set preprocessed in step 1 are input into the convolutional neural network adjusted in step 2 after a continuous feature map is constructed according to a time sliding window.
6. The pollutant concentration inversion method based on the fusion of multiple machine learning algorithms according to claim 1, characterized in that in step 4, the tree model adopted in the XGBoost model is the CART regression tree model, and the formula of the XGBoost model is as follows:

ŷᵢ = Σₜ₌₁ⁿ fₜ(xᵢ),  fₜ ∈ F

wherein: n is the number of trees; fₜ(·) is a function in the function space F; ŷᵢ is the inversion result; xᵢ is the ith input abstract feature; i is a natural number greater than or equal to 1; F is the set of all possible CART trees;
iteration of the XGBoost model adopts an additive training mode to minimize the objective function; the iteration process is as follows:

ŷᵢ⁽⁰⁾ = 0
ŷᵢ⁽¹⁾ = ŷᵢ⁽⁰⁾ + f₁(xᵢ)
…
ŷᵢ⁽ᵗ⁾ = ŷᵢ⁽ᵗ⁻¹⁾ + fₜ(xᵢ)

wherein: ŷᵢ⁽⁰⁾ is the inversion result at time t = 0; ŷᵢ⁽¹⁾ is the inversion result at time t = 1; fₜ(xᵢ) is the function value for the ith input data; ŷᵢ⁽ᵗ⁾ is defined as the inversion result at time t; ŷᵢ⁽ᵗ⁻¹⁾ is defined as the inversion result at time t − 1; i is a natural number greater than or equal to 1; xᵢ is the ith input abstract feature;
the XGBoost model objective function is as follows:

Obj = Σᵢ l(yᵢ, ŷᵢ) + Σₜ Ω(fₜ),  Ω(fₜ) = γ·T + (1/2)·λ·Σⱼ₌₁ᵀ ωⱼ²

wherein: l(·) is the loss function, and l(yᵢ, ŷᵢ) represents the difference between the inversion result and the true value; Ω(fₜ) is the regularization term; T is the number of leaf nodes; ωⱼ is the score of leaf node j; γ controls the number of leaf nodes; λ ensures that the leaf node scores are not too large;
to find the fₜ(·) that minimizes the objective function, the objective function is approximated by a second-order Taylor expansion:

Obj⁽ᵗ⁾ ≈ Σᵢ [ l(yᵢ, ŷᵢ⁽ᵗ⁻¹⁾) + gᵢ·fₜ(xᵢ) + (1/2)·hᵢ·fₜ²(xᵢ) ] + Ω(fₜ)

wherein: gᵢ and hᵢ are the first and second derivatives of the loss function l(yᵢ, ŷᵢ⁽ᵗ⁻¹⁾); ŷᵢ⁽ᵗ⁻¹⁾ is defined as the inversion result at time t − 1; Ω(fₜ) is the regularization term; xᵢ is the ith input abstract feature; fₜ(xᵢ) is the function value for the ith input data; yᵢ is the true value at the current moment.
7. The method for inverting pollutant concentration based on fusion of multiple machine learning algorithms according to claim 6, characterized in that in step 4, the loss function values of all data in the approximation of the objective function are summed and grouped by leaf node:

Obj ≈ Σᵢ [ gᵢ·fₜ(xᵢ) + (1/2)·hᵢ·fₜ²(xᵢ) ] + Ω(fₜ)
    = Σⱼ₌₁ᵀ [ Gⱼ·ωⱼ + (1/2)·(Hⱼ + λ)·ωⱼ² ] + γ·T,  with Gⱼ = Σ_{i∈Iⱼ} gᵢ and Hⱼ = Σ_{i∈Iⱼ} hᵢ

wherein: Obj is the objective function; gᵢ is the first derivative of the loss function and hᵢ its second derivative; Iⱼ is the set of samples assigned to leaf node j; Ω(fₜ) is the regularization term; fₜ(xᵢ) is the function value for the ith input data; xᵢ is the ith input abstract feature; yᵢ is the true value at the current moment; λ ensures that the leaf node scores are not too large; T is the number of leaf nodes; ωⱼ is the score of leaf node j;

rewriting the above formula as a univariate quadratic function of the leaf node score ωⱼ and solving it, the optimal score ωⱼ* and the objective function value are obtained as follows:

ωⱼ* = −Gⱼ/(Hⱼ + λ),  Obj* = −(1/2)·Σⱼ₌₁ᵀ Gⱼ²/(Hⱼ + λ) + γ·T

wherein: gᵢ is the first derivative of the loss function; hᵢ is the second derivative of the loss function; λ ensures that the leaf node scores are not too large; T is the number of leaf nodes.
8. The pollutant concentration inversion method based on fusion of multiple machine learning algorithms according to claim 1, characterized in that the estimation function of the support vector machine in the SVM model in step 5 is as follows:

f(x) = ω·φ(x) + b

wherein: ω is the normal vector, b is a constant, and φ(·) is the mapping function;

the objective function is:

min  (1/2)·‖ω‖² + C·Σᵢ₌₁ⁿ Lε(yᵢ − f(xᵢ)),  with Lε(z) = max(0, |z| − ε)

wherein: ω is the normal vector; b is a constant; φ(·) is the mapping function; Lε is the ε-insensitive loss function; yᵢ is the true value; C is the penalty coefficient; f(xᵢ) is the estimated function value; xᵢ is an abstract feature;

introducing relaxation variables and the Lagrange function, the objective function is converted into:

max(α, α*)  −(1/2)·Σᵢ Σⱼ (αᵢ − αᵢ*)(αⱼ − αⱼ*)·K(xᵢ, xⱼ) − ε·Σᵢ (αᵢ + αᵢ*) + Σᵢ yᵢ·(αᵢ − αᵢ*)
s.t.  Σᵢ (αᵢ − αᵢ*) = 0,  0 ≤ αᵢ, αᵢ* ≤ C

wherein: αᵢ, αⱼ, αᵢ* and αⱼ* are Lagrange coefficients; K(xᵢ, xⱼ) is the kernel function; C is the penalty coefficient; ε is the insensitive loss parameter; yᵢ is the true value; max denotes maximization of the objective function;

solving for αᵢ yields the regression function formula:

f(x) = Σᵢ (αᵢ − αᵢ*)·K(xᵢ, x) + b

wherein: αᵢ and αᵢ* are Lagrange coefficients; K(xᵢ, x) is the kernel function; b is a constant; xᵢ is an abstract feature.
9. The pollutant concentration inversion method based on the fusion of multiple machine learning algorithms according to claim 1, characterized in that in step 6, weight assignment is performed, through a fuzzy logic algorithm, on the concentration inversion results output by the XGBoost model and the SVM model, and the final inversion result expression is:

Yⱼ = ω₁ⱼ·Y_XGB(j) + ω₂ⱼ·Y_SVM(j)

wherein: ω₁ⱼ is the weight of the XGBoost model; Y_XGB(j) is the inversion result of the XGBoost model; ω₂ⱼ is the weight of the SVM model; Y_SVM(j) is the inversion result of the SVM model; Yⱼ is the final inversion result;

the weights of the model inversion results are ω₁ⱼ and ω₂ⱼ;

let K₁ⱼ = |Y_XGB(j) − Yⱼ₋₁| and K₂ⱼ = |Y_SVM(j) − Yⱼ₋₁|, wherein Y_XGB(j) is the inversion result of the XGBoost model at the current moment, Y_SVM(j) is the inversion result of the SVM model at the current moment, and Yⱼ₋₁ is the pollutant concentration inversion result at the previous moment;

ω₁ⱼ and ω₂ⱼ are determined by the following functional formulas:

ω₁ⱼ = K₂ⱼ/(K₁ⱼ + K₂ⱼ)
ω₂ⱼ = 1 − ω₁ⱼ

wherein: K₁ⱼ = |Y_XGB(j) − Yⱼ₋₁|, K₂ⱼ = |Y_SVM(j) − Yⱼ₋₁|, and xⱼ is the input abstract feature;

the final pollutant concentration inversion result is obtained after weight assignment.
CN202110704245.2A 2021-06-24 2021-06-24 Pollutant concentration inversion method based on fusion of multiple machine learning algorithms Pending CN113379148A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110704245.2A CN113379148A (en) 2021-06-24 2021-06-24 Pollutant concentration inversion method based on fusion of multiple machine learning algorithms


Publications (1)

Publication Number Publication Date
CN113379148A true CN113379148A (en) 2021-09-10

Family

ID=77578897

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110704245.2A Pending CN113379148A (en) 2021-06-24 2021-06-24 Pollutant concentration inversion method based on fusion of multiple machine learning algorithms

Country Status (1)

Country Link
CN (1) CN113379148A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115048875A (en) * 2022-08-16 2022-09-13 武汉科技大学 Urban atmospheric environment index early warning method and system based on motor vehicle emission data
CN116307292A (en) * 2023-05-22 2023-06-23 安徽中科蓝壹信息科技有限公司 Air quality prediction optimization method based on machine learning and integrated learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109754002A (en) * 2018-12-24 2019-05-14 上海大学 A kind of steganalysis hybrid integrated method based on deep learning
CN110619049A (en) * 2019-09-25 2019-12-27 北京工业大学 Message anomaly detection method based on deep learning


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Qi Xiaoyan et al.: "Short-term power load forecasting for iron and steel enterprises fusing LSTM and SVM", Journal of Shandong University *
Li Long et al.: "Least squares support vector machine PM2.5 concentration prediction model based on feature vectors", Journal of Computer Applications *


Similar Documents

Publication Publication Date Title
CN111798051B (en) Air quality space-time prediction method based on long-term and short-term memory neural network
CN113919448B (en) Method for analyzing influence factors of carbon dioxide concentration prediction at any time-space position
CN109492822B (en) Air pollutant concentration time-space domain correlation prediction method
CN109492830B (en) Mobile pollution source emission concentration prediction method based on time-space deep learning
CN111815037B (en) Interpretable short-critical extreme rainfall prediction method based on attention mechanism
CN110533631B (en) SAR image change detection method based on pyramid pooling twin network
CN111832814A (en) Air pollutant concentration prediction method based on graph attention machine mechanism
CN113379148A (en) Pollutant concentration inversion method based on fusion of multiple machine learning algorithms
CN112085163A (en) Air quality prediction method based on attention enhancement graph convolutional neural network AGC and gated cyclic unit GRU
CN112287294B (en) Space-time bidirectional soil water content interpolation method based on deep learning
CN111340292A (en) Integrated neural network PM2.5 prediction method based on clustering
CN111046961B (en) Fault classification method based on bidirectional long-time and short-time memory unit and capsule network
CN111832222A (en) Pollutant concentration prediction model training method, prediction method and device
CN111340132B (en) Machine olfaction mode identification method based on DA-SVM
Kadir et al. Wheat yield prediction: Artificial neural network based approach
CN112766283A (en) Two-phase flow pattern identification method based on multi-scale convolution network
CN112578089A (en) Air pollutant concentration prediction method based on improved TCN
CN111932091A (en) Survival analysis risk function prediction method based on gradient survival lifting tree
CN110110785B (en) Express logistics process state detection and classification method
CN113379146A (en) Pollutant concentration inversion method based on multi-feature selection algorithm
CN114461791A (en) Social text sentiment analysis system based on deep quantum neural network
Sari et al. Daily rainfall prediction using one dimensional convolutional neural networks
Pasini et al. Short-range visibility forecast by means of neural-network modelling: a case-study
CN113673325B (en) Multi-feature character emotion recognition method
CN114254828A (en) Power load prediction method based on hybrid convolution feature extractor and GRU

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination