CN117077819A

CN117077819A - Water quality prediction method

Info

Publication number: CN117077819A
Application number: CN202311106497.0A
Authority: CN
Inventors: 陈爱华; 郑金洪; 黄健萌; 占沛远; 范贵源; 张传琦; 何惺
Original assignee: Fuzhou University
Current assignee: Fuzhou University
Priority date: 2023-08-30
Filing date: 2023-08-30
Publication date: 2023-11-17

Abstract

The invention provides a water quality prediction method that uses support vector regression (SVR) to predict dissolved oxygen in water. The SVR parameters C and g are optimized by an artificial immune algorithm (AIA), which reduces the subjective influence of manual parameter selection and improves the generality and performance of the SVR. Correlation is calculated between the model output and the various water quality parameters, and the parameters with higher correlation coefficients are selected as model inputs to improve the accuracy of the algorithm. The predicted values of the invention are closer to the true values and the performance is better. The improved algorithm can be used for early prediction of dissolved oxygen.

Description

Water quality prediction method

Technical Field

The invention relates to the technical field of water quality detection, in particular to a water quality prediction method.

Background

Lake water quality affects the water safety of the surrounding organisms, including human beings, so water quality must be predicted in order to take precautionary measures in advance. Because the water quality system is complex, it is difficult for traditional prediction algorithms to model it as an effective nonlinear system.

At present, existing techniques include: forecasting water quality with a grey neural network and correcting the error residuals with a Markov model, which brings the predicted values closer to the real values; combining a grey neural network with an artificial neural network to predict water quality; and optimizing the time series with a subdivision extrapolation limit method and a multi-reference weighted fuzzy prediction method, where test results show that time series prediction designed with the subdivision extrapolation limit method gives good results.

Disclosure of Invention

The invention provides a water quality prediction method, wherein the predicted value is closer to the true value, and the performance is more excellent. The improved algorithm can be used for early prediction of dissolved oxygen.

The invention adopts the following technical scheme.

A water quality prediction method uses support vector regression (SVR) to predict dissolved oxygen in water. The SVR parameters C and g are optimized by an artificial immune algorithm (AIA), which reduces the subjective influence of manual parameter selection and improves the generality and performance of the SVR. Correlation is calculated between the output of the SVR model and the various water quality parameters, and the parameters with higher correlation coefficients are selected as model inputs to improve the accuracy of the algorithm.

The method comprises the following steps:

Step S1, selecting water quality data having a higher correlation coefficient with dissolved oxygen as the input nodes of the algorithm, the water quality data comprising water temperature, conductivity, total phosphorus and chemical oxygen demand, with dissolved oxygen as the output node of the algorithm; normalizing the historical water quality data to obtain a training set and a test set;

Step S2, constructing an SVR water quality prediction model, and taking the antibodies generated by the artificial immune algorithm as the parameter c and the parameter g of the support vector regression (SVR) model;

Step S3, feeding the training set obtained in step S1 into the model, and comparing the prediction accuracy of the SVR model for dissolved oxygen under different values of the parameters c and g;

Step S4, taking the prediction accuracy produced by the SVR as the affinity function of the artificial immune algorithm, and keeping the parameters with a high propagation probability as memory cells;

Step S5, to prevent the algorithm from falling into a local optimum, randomly mutating the low-affinity antibodies in the memory cells to form a new parent group;

Step S6, re-screening the parent group newly generated by the artificial immune algorithm by applying step S3 until the iteration is finished;

Step S7, taking the parameters c and g obtained at the end of the iteration as the optimal values, which gives the optimal model, and feeding the test set into the model to obtain the predicted value of dissolved oxygen.
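The data preparation in step S1 can be sketched as follows. The text does not state which normalization or which split is used, so min-max scaling to [0, 1] and a chronological split are assumptions made here; the function names and the 0.8 training fraction are hypothetical.

```python
def min_max_normalize(column):
    """Scale a list of numbers to [0, 1].

    Min-max scaling is an assumption; the source only says the data are normalized.
    """
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


def split_chronologically(rows, train_fraction=0.8):
    """Split time-ordered rows into (training set, test set).

    The 0.8 fraction is a hypothetical choice, not taken from the source.
    """
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Illustrative values only, not measurements from the source.
water_temperature = [4.0, 12.0, 20.0, 28.0]
print(min_max_normalize(water_temperature))

train, test = split_chronologically(list(range(10)))
print(len(train), len(test))
```

A chronological split keeps the most recent samples for testing, which suits monthly monitoring data; a random split would be an equally valid assumption if the samples are treated as independent.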

The correlation coefficient in step S1 is the correlation coefficient CC, introduced to select appropriate water quality data as input nodes; CC indicates how closely two variables are related, in particular how closely their trends agree;

the correlation coefficient CC is defined as:

CC = cov(X, Y) / (σ_x σ_y)    formula one;

wherein X and Y are the water quality data and the dissolved oxygen data to be compared, cov(X, Y) is the covariance between the two data series, and σ_x and σ_y are the standard deviations of the two data series; |CC| < 0.4 indicates weak correlation, 0.4 < |CC| < 0.7 indicates medium correlation, and |CC| > 0.7 indicates strong correlation.
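A minimal sketch of the correlation-based input selection is given below. It assumes the Pearson form cov(X, Y) / (σ_x σ_y) with population statistics. The text leaves the boundary cases |CC| = 0.4 and |CC| = 0.7 open, so their assignment here is an assumption, as is the selection threshold of 0.4. The function names and all data values are illustrative only.

```python
import math


def pearson_cc(x, y):
    """Correlation coefficient cov(X, Y) / (sigma_x * sigma_y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sigma_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sigma_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    if sigma_x == 0.0 or sigma_y == 0.0:
        # Assumption: a constant series is treated as uncorrelated.
        return 0.0
    return cov / (sigma_x * sigma_y)


def strength(cc):
    """Classify |CC| using the thresholds given in the text."""
    magnitude = abs(cc)
    if magnitude < 0.4:
        return "weak"
    if magnitude < 0.7:
        return "medium"
    return "strong"


def select_inputs(candidates, target, threshold=0.4):
    """Keep the candidate series whose |CC| with the target reaches the threshold.

    The threshold value is a hypothetical choice.
    """
    return [name for name, series in candidates.items()
            if abs(pearson_cc(series, target)) >= threshold]


# Made-up numbers for illustration, not data from the source.
dissolved_oxygen = [12.1, 11.0, 9.8, 9.1, 8.0, 7.4]
candidates = {
    "water_temperature": [5.0, 10.0, 15.0, 20.0, 25.0, 30.0],
    "water_depth": [2.1, 2.0, 2.2, 2.1, 2.0, 2.2],
}
for name, series in candidates.items():
    cc = pearson_cc(series, dissolved_oxygen)
    print(name, round(cc, 3), strength(cc))
print(select_inputs(candidates, dissolved_oxygen))
```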

The water quality prediction model of the regression-type support vector machine in the step S2 specifically comprises:

assuming that a group of training samples L (x, y) exist, wherein x represents input data of the training samples, namely other water quality data, and y represents output data corresponding to the training samples, namely dissolved oxygen data; in order to determine the corresponding relation between the two, a linear regression function is established in a high-dimensional feature space:

f(x) = wΦ(x) + b    formula two;

where Φ(x) is a nonlinear mapping function. To solve for w and b, the slack variables ξ_i and ξ_i^* are introduced, and the mathematical expression is:

the constraint conditions are as follows:

to solve formula four, the Lagrange function is introduced and the problem is converted to its dual form:

the constraint conditions are as follows:

wherein K(x_i, z_i) is a kernel function.
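The expressions for the optimization problem, its constraints and its dual form are not reproduced in this text. For orientation, the standard ε-insensitive SVR formulation from the literature, which matches the symbols w, b, Φ, ξ_i, ξ_i^* and the penalty factor C used here, is given below. It is offered as an assumption about what the missing formulas contain, not as a transcription. The tolerance ε and the dual multipliers a_i, a_i^* are symbols introduced for this sketch, and the kernel is written K(x_i, x_j) rather than K(x_i, z_i).

```latex
% Primal problem (standard form, assumed)
\[
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*}\quad & \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^*\right) \\
\text{s.t.}\quad & y_i - w\Phi(x_i) - b \le \varepsilon + \xi_i, \\
& w\Phi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \\
& \xi_i \ge 0,\quad \xi_i^* \ge 0,\quad i = 1,\dots,n
\end{aligned}
\]

% Dual problem obtained through the Lagrange function (standard form, assumed)
\[
\begin{aligned}
\max_{a,\,a^*}\quad & -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}(a_i - a_i^*)(a_j - a_j^*)K(x_i, x_j)
 - \varepsilon\sum_{i=1}^{n}(a_i + a_i^*) + \sum_{i=1}^{n} y_i\,(a_i - a_i^*) \\
\text{s.t.}\quad & \sum_{i=1}^{n}(a_i - a_i^*) = 0,\qquad 0 \le a_i,\ a_i^* \le C
\end{aligned}
\]

% Resulting regression function
\[
f(x) = \sum_{i=1}^{n}(a_i - a_i^*)\,K(x_i, x) + b
\]
```

In this form the penalty factor C bounds the dual multipliers, which is why it controls how strongly errors outside the ε-tube are penalized.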

The behaviour of the SVR model under different values of the parameters c and g in step S3 is as follows: c is the penalty factor and determines how strictly the SVR model as a whole treats errors; as c increases, the function's requirement on error becomes stricter, so valid data are more easily excluded; as c decreases, the requirement on error becomes looser, and the screening effect of the function is more likely to fail;

the kernel function K(x_i, z_i) is the Gaussian radial basis function (RBF); the RBF reduces the weight of data points far from the plane, so it handles data containing both high and low frequencies faster than other kernel functions and helps the support vector regression machine find a suitable plane faster than other kernel functions; the parameter g of the RBF affects generalization performance by changing the range of action of the Gaussian function: if g is too large, the range of action is too small and some data are left unclassified; if g is too small, the Gaussian function acts on too much data and the effect of data classification is reduced; in either case a good training result cannot be obtained on the training set, and the prediction on the test set deteriorates.
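The effect of g on the range of action of the Gaussian function can be illustrated with a short sketch. It assumes the common parameterization K(x, z) = exp(-g * ||x - z||^2), as used for example by LIBSVM; the text does not spell out the kernel expression, and the function name and sample points are hypothetical.

```python
import math


def rbf_kernel(x, z, g):
    """Gaussian RBF kernel exp(-g * ||x - z||^2) (assumed parameterization)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-g * squared_distance)


# Two normalized sample points at a fixed distance; values are illustrative.
point_a = [0.2, 0.4]
point_b = [0.6, 0.7]

for g in (0.01, 1.0, 100.0):
    # A large g makes the kernel value collapse towards 0 even for nearby
    # points (narrow range of action); a small g keeps it close to 1 for
    # almost every pair (wide range of action).
    print(g, rbf_kernel(point_a, point_b, g))
```

With a very large g each training sample only influences its immediate neighbourhood, and with a very small g all samples look alike to the model, which corresponds to the two failure modes described above.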

The specific steps of the propagation probability calculation in the step S4 are as follows:

Step A1, analyzing the problem: the ideal predicted value is taken as the antigen and the parameters C and g are taken as the antibody; the difference between the predicted value generated by the SVR and the true value is used as the affinity function;

Step A2, generating an initial antibody group: an initial antibody population is generated randomly;

Step A3, evaluating the antibody group: the artificial immune algorithm evaluates the antibody population by two criteria; the first is the affinity between antibody and antigen, namely the affinity function in step A1, and the second is the concentration of the antibody within the population; the concentration expression is:

wherein N is the total number of antibodies and S_v,s is the similarity between antibody v and antibody s. The similarity expression is:

wherein k_v,s is the number of bits that are identical between antibody v and antibody s, and L is the length of the antibody;

the propagation probability is then calculated from the antibody-antigen affinity and the antibody concentration; the higher the propagation probability, the higher the probability of being selected into the memory bank and the parent group; the propagation probability expression is as follows:

wherein α is a constant and A_v is the affinity function; it follows from the above expression that the higher the affinity, the higher the propagation probability, and the higher the individual concentration, the lower the propagation probability.
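The evaluation in step A3 can be sketched as follows. The concentration, similarity and propagation probability expressions are not reproduced in this text, so the forms below are assumptions chosen to agree with the stated definitions: similarity is taken as k_v,s / L, concentration as the mean similarity of an antibody to the whole population, and the propagation probability as a weighted sum that rises with affinity and falls with concentration. Antibodies are assumed to be equal-length bit strings, affinities are assumed to be positive, and the value α = 0.7 and all names are hypothetical.

```python
def similarity(v, s):
    """Fraction of identical bits between two equal-length bit strings (k_vs / L)."""
    matches = sum(1 for a, b in zip(v, s) if a == b)
    return matches / len(v)


def concentration(population):
    """Mean similarity of each antibody to the population (assumed form)."""
    n = len(population)
    return [sum(similarity(v, s) for s in population) / n for v in population]


def propagation_probability(affinities, concentrations, alpha=0.7):
    """Assumed form: alpha * A_v / sum(A) + (1 - alpha) * (1 / C_v) / sum(1 / C).

    Higher affinity raises the probability; higher concentration lowers it.
    Concentrations are at least 1 / N because each antibody matches itself.
    """
    total_affinity = sum(affinities)
    inverse = [1.0 / c for c in concentrations]
    total_inverse = sum(inverse)
    return [alpha * a / total_affinity + (1.0 - alpha) * inv / total_inverse
            for a, inv in zip(affinities, inverse)]


# Illustrative population: two identical antibodies and one distinct antibody.
population = ["1111", "1111", "0000"]
affinities = [1.0, 1.0, 1.0]
conc = concentration(population)
prob = propagation_probability(affinities, conc)
print(conc)
print(prob)
```

With equal affinities, the distinct antibody receives the highest probability, which is the diversity-preserving behaviour the text attributes to the concentration term.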

In step S5, the generation of the new parent group specifically includes the steps of:

Step B1, generating a memory bank and a new antibody group: the antibodies are arranged by similarity from high to low, and those with the highest similarity are retained as the memory bank; the antibodies are then arranged by propagation probability from high to low, and the first N individuals form a new antibody group;

Step B2, crossover and mutation: based on the antibody group produced in step B1, crossover and mutation are performed on each antibody to obtain a new antibody group;

Step B3, generating a new generation of the parent group: the new antibody group obtained in step B2 is combined with the memory bank obtained in step B1 to form the new generation parent group.
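Steps B1 to B3 can be sketched as below, assuming bit-string antibodies, single-point crossover and bit-flip mutation. The text does not specify the crossover or mutation operators, so these are assumptions. The score used to rank the memory bank is passed in by the caller; all names, the mutation rate and the fixed random seed are hypothetical.

```python
import random


def crossover(a, b, rng):
    """Single-point crossover of two equal-length bit strings (assumed operator)."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(bits, rate, rng):
    """Flip each bit independently with the given probability (assumed operator)."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in bits
    )


def next_parent_group(population, probabilities, memory_scores,
                      memory_size, group_size, mutation_rate=0.05, seed=0):
    """Build the next parent group from the current population."""
    rng = random.Random(seed)
    indices = range(len(population))

    # B1: the memory bank keeps the antibodies with the highest memory score.
    by_score = sorted(indices, key=lambda i: memory_scores[i], reverse=True)
    memory = [population[i] for i in by_score[:memory_size]]

    # B1: the new antibody group takes the first N by propagation probability.
    by_probability = sorted(indices, key=lambda i: probabilities[i], reverse=True)
    selected = [population[i] for i in by_probability[:group_size]]

    # B2: crossover and mutation are applied to each selected antibody.
    children = []
    for parent in selected:
        mate = rng.choice(selected)
        children.append(mutate(crossover(parent, mate, rng), mutation_rate, rng))

    # B3: children and memory bank together form the new parent group.
    return children + memory


# Illustrative inputs only.
population = ["00000000", "11110000", "00001111", "11111111"]
probabilities = [0.1, 0.4, 0.3, 0.2]
memory_scores = [0.2, 0.9, 0.5, 0.4]
new_group = next_parent_group(population, probabilities, memory_scores,
                              memory_size=1, group_size=3)
print(new_group)
```

Copying the memory bank unchanged into the next generation means that crossover and mutation cannot destroy the antibodies it holds.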

The method is used for predicting the water quality change of the lake.

The invention is a water quality prediction method based on support vector regression and an artificial immune algorithm. First, the correlation between the data to be predicted and the other water quality data is calculated, and the water quality data with high correlation coefficients are taken as the input data of the optimization algorithm. Because SVR is strongly affected by the parameters C and g, the variance of the SVR output is used as the fitness of the artificial immune algorithm, whose optimizing capability is used to find the best parameters C and g; the SVR model is then rebuilt with these parameters so that it outputs the optimal predicted value.

In the invention, the parameters C and g of the support vector regression (SVR) model are optimized by the artificial immune algorithm (AIA) and the model is used to predict dissolved oxygen in water, which reduces the subjective influence of manual parameter selection and improves the generality and performance of the support vector machine. To improve the accuracy of the algorithm, correlation is calculated between the model output and the various water quality parameters, and the parameters with high correlation coefficients are selected as model inputs. Finally, the prediction results are compared with those of other algorithm models; the experimental results show that the new model has a smaller variance and a smaller maximum error than the SVR and GRNN models, that its predicted values are closer to the true values, and that its performance is better. The improved algorithm can be used for early prediction of dissolved oxygen.
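How an antibody might carry the parameters C and g, and how a prediction error might be turned into a fitness value, can be sketched as below. The text indicates that antibodies have a length in bits, which suggests a binary encoding, but the encoding itself, the search ranges and the error-to-affinity mapping are all assumptions made for illustration; every name and range is hypothetical.

```python
def decode(bits, lo, hi):
    """Map a bit string linearly onto the interval [lo, hi]."""
    return lo + int(bits, 2) / (2 ** len(bits) - 1) * (hi - lo)


def decode_antibody(antibody, c_range=(0.1, 100.0), g_range=(0.01, 10.0)):
    """Split an antibody in half and decode (C, g).

    The ranges are hypothetical search bounds, not values from the source.
    """
    half = len(antibody) // 2
    c_value = decode(antibody[:half], *c_range)
    g_value = decode(antibody[half:], *g_range)
    return c_value, g_value


def affinity_from_error(error):
    """Turn a non-negative prediction error into an affinity in (0, 1].

    Assumed mapping: a smaller error gives a higher affinity.
    """
    return 1.0 / (1.0 + error)


# Example: a 16-bit antibody, 8 bits for C and 8 bits for g.
c_value, g_value = decode_antibody("1000000000001111")
print(c_value, g_value)
print(affinity_from_error(0.19), affinity_from_error(0.62))
```

In a full optimization loop, each decoded (C, g) pair would be used to train an SVR on the training set, and the resulting error would be passed through a mapping of this kind to obtain the affinity.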

Compared with the prior art, the technical scheme of the invention has the beneficial effects that:

The invention provides a novel water quality prediction method in which the correlation coefficient is used to select the water quality data that serve as input nodes of the algorithm, which avoids poor prediction results caused by choosing the wrong input nodes when different water quality data are predicted. The parameters C and g of the support vector regression model are improved by an artificial immune algorithm. The artificial immune algorithm not only has excellent optimizing capability but also introduces the concept of propagation probability, which preserves the diversity of the antibodies and prevents the algorithm from falling into a local optimum. The method can rapidly predict future changes in lake water quality so that deterioration of the water body can be avoided.

Drawings

The invention is described in further detail below with reference to the attached drawings and detailed description:

FIG. 1 is a schematic flow chart of the present invention;

FIG. 2 is a schematic diagram comparing the predicted results of the present invention with other algorithms.

Detailed Description

As shown in the figures, the method uses support vector regression (SVR) to predict dissolved oxygen in water. The SVR parameters C and g are optimized by an artificial immune algorithm (AIA), which reduces the subjective influence of manual parameter selection and improves the generality and performance of the SVR. Correlation is calculated between the output of the SVR model and the various water quality parameters, and the parameters with higher correlation coefficients are selected as model inputs to improve the accuracy of the algorithm.

The method is used for predicting the water quality change of the lake.

Examples:

An example is used below to verify the rationality of the algorithm:

The example data come from the Taihu Lake No. 0 observation station (120° 22 217' E, 31° 53 983' N) and cover January 2007 to December 2015, 108 groups of data in total, sampled once per month in mid-month. The data for water temperature, conductivity, total nitrogen, total phosphorus, transparency, water depth, pH, chemical oxygen demand and ammonia nitrogen are each used to calculate the correlation with dissolved oxygen, which is the output data; the final results are shown in the following table:

TABLE 1 correlation coefficient of dissolved oxygen with individual water quality data

Generally, |CC| < 0.4 indicates weak correlation, 0.4 < |CC| < 0.7 indicates medium correlation, and |CC| > 0.7 indicates strong correlation. Water temperature, conductivity, total phosphorus and chemical oxygen demand are selected as the input nodes of the algorithm.

As shown in FIG. 2, the predicted values of the improved algorithm are closer to the true values and fluctuate less than those of the original algorithm. Because the size of the improvement cannot be judged by eye, this section compares the variance and the maximum error between the predicted and true values to analyze the relative merits of the algorithms.

TABLE 2 Comparison of variance and maximum error value

Model | Variance | Maximum error (mg/L)
AIA-SVR | 0.19153 | 0.76558
SVR | 0.6248 | 1.3952
GRNN | 0.39799 | 1.19

As shown in the table above, the variance between the predicted and true values is 0.19153 for the AIA-SVR model, 0.6248 for the SVR model and 0.39799 for the GRNN model, and the maximum error is 0.76558 mg/L, 1.3952 mg/L and 1.19 mg/L respectively. The comparison shows that the AIA-SVR model has a smaller variance and a smaller maximum error than the SVR and GRNN models, that its predicted values are closer to the true values, and that its performance is better.
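The two comparison metrics can be computed as in the sketch below. The text does not define "variance of the predicted and true values" precisely; it is assumed here to mean the mean squared deviation between predictions and measurements. The function names are hypothetical and the sample values are made up for illustration, not taken from the experiment.

```python
def mean_squared_deviation(predicted, measured):
    """Mean of squared differences between predicted and measured values
    (assumed meaning of 'variance' in the comparison)."""
    return sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(measured)


def max_abs_error(predicted, measured):
    """Largest absolute difference between predicted and measured values."""
    return max(abs(p - m) for p, m in zip(predicted, measured))


# Illustrative dissolved oxygen values in mg/L.
measured = [8.5, 9.0, 7.0]
predicted = [8.0, 9.5, 7.0]
print(mean_squared_deviation(predicted, measured))
print(max_abs_error(predicted, measured))
```

Reporting both values is useful because the first summarizes the typical error while the second exposes the single worst prediction.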

Claims

1. A water quality prediction method is characterized in that: according to the method, dissolved oxygen in water is predicted by using a regression type SVR algorithm, parameters C and g in the SVR are optimized by using an artificial immune algorithm AIA to reduce subjective influence of human factors and improve universality and performance of the SVR.

2. A water quality prediction method according to claim 1, characterized in that: the method comprises the following steps;

3. A water quality prediction method according to claim 2, characterized in that: the correlation coefficient described in step S1 is an introduced correlation coefficient CC, and is used for selecting appropriate water quality data as an input node, where the correlation coefficient CC is used to display the closeness of the relationship between two variables, especially the trend of these variables;

the correlation coefficient CC is defined as:

4. A water quality prediction method according to claim 2, characterized in that: the water quality prediction model of the regression-type support vector machine in the step S2 specifically comprises:

f(x) = wΦ(x) + b    formula two;

wherein Φ(x) is a nonlinear mapping function. To solve for w and b, the slack variables ξ_i and ξ_i^* are introduced, and the mathematical expression is:

the constraint conditions are as follows:

wherein K(x_i, z_i) is a kernel function.

5. The method for predicting water quality as claimed in claim 4, wherein: the behaviour of the SVR model under different values of the parameters c and g in step S3 is as follows: c is the penalty factor and determines how strictly the SVR model as a whole treats errors; as c increases, the function's requirement on error becomes stricter, so valid data are more easily excluded; as c decreases, the requirement on error becomes looser, and the screening effect of the function is more likely to fail;

the kernel function K(x_i, z_i) is the Gaussian radial basis function (RBF); the RBF reduces the weight of data points far from the plane, so it handles data containing both high and low frequencies faster than other kernel functions and helps the support vector regression machine find a suitable plane faster than other kernel functions; the parameter g of the RBF affects generalization performance by changing the range of action of the Gaussian function: if g is too large, the range of action is too small and some data are left unclassified; if g is too small, the Gaussian function acts on too much data and the effect of data classification is reduced; in either case a good training result cannot be obtained on the training set, and the prediction on the test set deteriorates.

6. The method for predicting water quality as claimed in claim 4, wherein: the specific steps of the propagation probability calculation in the step S4 are as follows:

7. A water quality prediction method according to claim 2, characterized in that: in step S5, the generation of the new parent group specifically includes the steps of:

8. A water quality prediction method according to claim 2, characterized in that: the method is used for predicting the water quality change of the lake.