CN117077819A - Water quality prediction method - Google Patents
- Publication number
- CN117077819A (application CN202311106497.0A)
- Authority
- CN
- China
- Prior art keywords
- water quality
- antibody
- data
- function
- svr
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/18—Water
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/27—Regression, e.g. linear or logistic regression
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
Abstract
The invention provides a water quality prediction method that uses support vector regression (SVR) to predict dissolved oxygen in water. The SVR parameters c and g are optimized by an artificial immune algorithm (AIA), which reduces the subjective influence of human factors and improves the generality and performance of the SVR. The correlation between the output of the SVR model and each water quality parameter is calculated, and the parameters with the higher correlation coefficients are selected as model inputs to improve the accuracy of the algorithm. The predicted values are closer to the true values, and the improved algorithm can be used for early prediction of dissolved oxygen.
Description
Technical Field
The invention relates to the technical field of water quality detection, in particular to a water quality prediction method.
Background
Lake water quality affects the water safety of the surrounding organisms, including human beings, so water quality must be predicted in order to take precautionary measures in advance. Because water quality systems are complex, traditional prediction algorithms have difficulty modelling their nonlinear behaviour effectively.
Existing techniques include: forecasting water quality with a grey neural network and correcting the residual errors with a Markov method, which brings the predictions closer to the true values; combining a grey neural network with an artificial neural network to predict water quality; and optimizing the time series with a subdivision extrapolation limit method and a multi-reference weighted fuzzy prediction method, where the reported results indicate that time-series prediction designed with the subdivision extrapolation limit method gives good results.
Disclosure of Invention
The invention provides a water quality prediction method whose predicted values are closer to the true values and whose performance is better. The improved algorithm can be used for early prediction of dissolved oxygen.
The invention adopts the following technical scheme.
A water quality prediction method uses support vector regression (SVR) to predict dissolved oxygen in water. The SVR parameters c and g are optimized by an artificial immune algorithm (AIA) to reduce the subjective influence of human factors and to improve the generality and performance of the SVR. The correlation between the output of the SVR model and each water quality parameter is calculated, and the water quality parameters with the higher correlation coefficients are selected as model inputs to improve the accuracy of the algorithm.
The method comprises the following steps:
Step S1, selecting the water quality data with the higher correlation coefficients with dissolved oxygen as the input nodes of the algorithm, the water quality data comprising water temperature, conductivity, total phosphorus and chemical oxygen demand, with dissolved oxygen as the output node of the algorithm; normalizing the historical water quality data to obtain a training set and a test set;
Step S2, constructing the SVR water quality prediction model, and taking the antibodies generated by the artificial immune algorithm as the parameters c and g of the SVR model;
Step S3, feeding the training set obtained in step S1 into the model, and comparing the prediction accuracy of the SVR model for dissolved oxygen under different values of c and g;
Step S4, taking the prediction accuracy of the SVR as the affinity function of the artificial immune algorithm, and keeping the parameters with high propagation probability as memory cells;
Step S5, to keep the algorithm from falling into a local optimum, randomly mutating the low-affinity antibodies among the memory cells to form a new parent group;
Step S6, re-screening the parent group newly generated by the artificial immune algorithm by applying step c until the iteration ends;
Step S7, taking the values of c and g obtained when the iteration ends as the optimal values, which give the optimal model, and feeding the test set into that model to obtain the predicted dissolved oxygen values.
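The data preparation in step S1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the text does not say which normalization is used, so min-max scaling to [0, 1] is an assumption, as are the chronological 80/20 split, the helper names and the sample records.

```python
def min_max_normalize(column):
    """Scale a list of numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


def split_chronologically(rows, train_fraction=0.8):
    """Keep the earliest rows for training and the latest rows for testing."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Hypothetical monthly records, in the order:
# (water temperature, conductivity, total phosphorus, COD, dissolved oxygen)
records = [
    (5.2, 410.0, 0.08, 4.1, 11.6),
    (9.8, 425.0, 0.09, 4.4, 10.7),
    (16.4, 440.0, 0.11, 4.9, 9.3),
    (23.1, 455.0, 0.13, 5.3, 8.1),
    (28.7, 470.0, 0.15, 5.8, 7.2),
]

# Normalize each column independently, then rebuild the rows.
columns = [min_max_normalize(list(column)) for column in zip(*records)]
normalized = list(zip(*columns))

train, test = split_chronologically(normalized, train_fraction=0.8)
print(len(train), "training rows,", len(test), "test rows")
```

Normalizing before splitting follows the order given in step S1; scaling with statistics from the training rows only is a common alternative when leakage from the test period is a concern.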
The correlation coefficient in step S1 is the correlation coefficient CC, which is used to select appropriate water quality data as input nodes. CC expresses how closely two variables are related, in particular how closely their trends agree.
The correlation coefficient CC is defined as:
$$CC = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} \quad \text{(formula one)}$$
where X and Y are the water quality data and the dissolved oxygen data being compared, cov(X, Y) is the covariance of the two data series, and $\sigma_X$ and $\sigma_Y$ are their standard deviations. |CC| < 0.4 indicates a weak correlation, 0.4 < |CC| < 0.7 a medium correlation, and |CC| > 0.7 a strong correlation.
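The correlation screening can be illustrated with a short sketch. The function names and the sample series are hypothetical, and because the text leaves the boundary values 0.4 and 0.7 unassigned, placing them in the higher class is an assumption.

```python
import math


def correlation_coefficient(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


def strength(cc):
    """Classify |CC| with the thresholds quoted in the text."""
    magnitude = abs(cc)
    if magnitude < 0.4:
        return "weak"
    if magnitude < 0.7:
        return "medium"
    return "strong"


# Hypothetical series: dissolved oxygen falls as water temperature rises.
water_temperature = [5.0, 10.0, 15.0, 20.0, 25.0]
dissolved_oxygen = [12.0, 11.0, 9.5, 8.5, 7.5]

cc = correlation_coefficient(water_temperature, dissolved_oxygen)
print(round(cc, 3), strength(cc))
```

A strongly negative CC is as useful for input selection as a strongly positive one, which is why the classification uses the absolute value.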
The SVR water quality prediction model in step S2 is as follows:
Assume a set of training samples L(x, y), where x is the input data of a training sample (the other water quality data) and y is the corresponding output data (the dissolved oxygen data). To determine the relationship between the two, a linear regression function is established in a high-dimensional feature space:
$$f(x) = w \cdot \Phi(x) + b \quad \text{(formula two)}$$
where $\Phi(x)$ is a nonlinear mapping function. To solve for w and b, the slack variables $\xi_i$ and $\xi_i^*$ are introduced. The mathematical expression is:
the constraint conditions are as follows:
to solve formula four, the Lagrange function is introduced and the problem is converted to its dual form:
the constraint conditions are as follows:
where $K(x_i, z_i)$ is a kernel function.
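For reference, the standard ε-insensitive SVR formulation matches the elements named in this passage (slack variables, a penalty factor, a Lagrangian dual and a kernel). The sketch below assumes that standard form. The tolerance $\varepsilon$, the sample count $n$, the multipliers $\alpha_i, \alpha_i^*$ and the kernel arguments $K(x_i, x_j)$ are assumptions that do not come from the text.

```latex
% Primal problem (penalty factor c, slack variables \xi_i, \xi_i^*)
\min_{w,\,b,\,\xi,\,\xi^*} \; \frac{1}{2}\lVert w \rVert^2 + c \sum_{i=1}^{n} \left( \xi_i + \xi_i^* \right)

\text{s.t.} \quad
y_i - w \cdot \Phi(x_i) - b \le \varepsilon + \xi_i, \qquad
w \cdot \Phi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \qquad
\xi_i,\ \xi_i^* \ge 0

% Dual problem obtained from the Lagrange function
\max_{\alpha,\,\alpha^*} \;
-\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) K(x_i, x_j)
- \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^*)
+ \sum_{i=1}^{n} y_i (\alpha_i - \alpha_i^*)

\text{s.t.} \quad
\sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0, \qquad
0 \le \alpha_i,\ \alpha_i^* \le c

% Resulting regression function
f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) K(x_i, x) + b
```

In this form the penalty factor c appears both as the weight on the slack variables in the primal problem and as the upper bound on the multipliers in the dual problem.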
The behaviour of the SVR model under different values of c and g in step S3 is as follows. c is the penalty factor, which determines how strictly the SVR model as a whole treats errors. As c increases, the function tolerates less error, so genuine data are easily excluded; as c decreases, the function tolerates more error, so its screening effect is more likely to fail.
The kernel function $K(x_i, z_i)$ is the Gaussian radial basis function (RBF). The RBF kernel reduces the weight of data points far from the plane, so it handles high- and low-frequency data faster than other kernel functions and helps the SVR find a suitable plane faster than other kernel functions. The RBF parameter g affects generalization by controlling the range of action of the Gaussian function. If g is too large, the range of action is too small and some data are left unclassified. If g is too small, the Gaussian function acts on too much data and the classification effect is reduced, so that a good training effect cannot be obtained on the training set and the prediction on the test set deteriorates.
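The effect of g on the range of action of the Gaussian function can be seen numerically. The sketch assumes the common parameterization K(x, z) = exp(-g * ||x - z||^2), as used by LIBSVM-style tools; the text does not state the kernel formula, and the two sample points are hypothetical.

```python
import math


def rbf_kernel(x, z, g):
    """Gaussian RBF kernel, assuming K(x, z) = exp(-g * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-g * squared_distance)


# Two hypothetical normalized samples (e.g. water temperature, conductivity).
x = (0.2, 0.5)
z = (0.6, 0.1)

# As g grows, the similarity between distinct points collapses toward 0,
# so each training sample influences only a very small neighbourhood.
for g in (0.1, 1.0, 10.0, 100.0):
    print(f"g = {g:>5}: K(x, z) = {rbf_kernel(x, z, g):.6f}")
```

Under this parameterization a very small g makes all kernel values approach 1, so the model can no longer distinguish samples, while a very large g makes them approach 0 for any two distinct points.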
The propagation probability in step S4 is calculated as follows:
Step A1, problem analysis: the ideal predicted values are taken as the antigen and the parameters c and g as the antibodies; the difference between the value predicted by the SVR and the true value is used as the affinity function;
Step A2, generating the initial antibody group: an initial antibody population is generated at random;
Step A3, evaluating the antibody group: the artificial immune algorithm evaluates the antibody population by two criteria, first the affinity between antibody and antigen, i.e. the affinity function of step A1, and second the concentration of each antibody relative to the other antibodies. The concentration expression is:
where N is the total number of antibodies and $S_{v,s}$ is the similarity between antibodies v and s. The similarity expression is:
where $k_{v,s}$ is the number of bit positions at which antibody v and antibody s are identical, and L is the length of an antibody;
The propagation probability is then calculated from the antibody-antigen affinity and the antibody concentration; the higher the propagation probability, the more likely an antibody is to be selected for the memory bank and the parent group. The propagation probability expression is:
where α is a constant and $A_v$ is the affinity function. From this expression, the higher the affinity, the higher the propagation probability, and the higher the individual concentration, the lower the propagation probability.
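One set of expressions consistent with these definitions is sketched below. The similarity follows directly from the definitions of $k_{v,s}$ and L. The averaged concentration and the exponential damping term in the propagation probability are assumptions, chosen so that the probability rises with affinity and falls with concentration as stated; the symbols $C_v$ and $P_v$ and the range $0 < \alpha < 1$ are likewise assumed.

```latex
% Similarity: fraction of identical bit positions
S_{v,s} = \frac{k_{v,s}}{L}

% Concentration: mean similarity of antibody v to the population (assumed form)
C_v = \frac{1}{N} \sum_{s=1}^{N} S_{v,s}

% Propagation probability (assumed form, 0 < \alpha < 1)
P_v = \alpha \, \frac{A_v}{\sum_{s=1}^{N} A_s}
    + (1 - \alpha) \, \frac{e^{-C_v}}{\sum_{s=1}^{N} e^{-C_s}}
```

With this choice the probabilities sum to 1 over the population, and α sets the balance between rewarding affinity and preserving diversity.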
In step S5, the new parent group is generated as follows:
Step B1, generating the memory bank and a new antibody group: the antibodies are ranked by similarity from high to low, and those with the highest similarity are kept as the memory bank; the antibodies are then ranked by propagation probability from high to low, and the first N individuals form the new antibody group;
Step B2, crossover and mutation: crossover and mutation are applied to each antibody of the antibody group produced in step B1 to obtain a new antibody group;
Step B3, generating the new generation of the parent group: the new antibody group obtained in step B2 is combined with the memory bank obtained in step B1 to form the new-generation parent group.
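Steps B1 to B3 can be sketched for bit-string antibodies as follows. This is an illustration under stated assumptions: single-point crossover, independent bit-flip mutation, the function names, the sizes and the mutation rate are all hypothetical, and the two input lists are assumed to be already ranked by the criteria of step B1.

```python
import random


def crossover(a, b, rng):
    """Single-point crossover of two equal-length bit strings."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(antibody, rate, rng):
    """Flip each bit independently with probability `rate`."""
    flipped = {"0": "1", "1": "0"}
    return "".join(flipped[bit] if rng.random() < rate else bit for bit in antibody)


def next_parent_group(memory_ranked, probability_ranked,
                      memory_size, group_size, rate, rng):
    # Step B1: keep the head of each ranking.
    memory_bank = memory_ranked[:memory_size]
    selected = probability_ranked[:group_size]

    # Step B2: pair up the selected antibodies, cross them over, then mutate.
    offspring = []
    for i in range(0, len(selected) - 1, 2):
        child_a, child_b = crossover(selected[i], selected[i + 1], rng)
        offspring.append(mutate(child_a, rate, rng))
        offspring.append(mutate(child_b, rate, rng))
    if len(selected) % 2 == 1:
        # An unpaired antibody is mutated without crossover.
        offspring.append(mutate(selected[-1], rate, rng))

    # Step B3: the memory bank joins the offspring unchanged.
    return memory_bank + offspring


rng = random.Random(42)
antibodies = ["10110010", "01101100", "11100001", "00011110", "10101010"]
parents = next_parent_group(antibodies, antibodies,
                            memory_size=2, group_size=4, rate=0.05, rng=rng)
print(parents)
```

Passing the memory bank through unchanged means the retained antibodies cannot be lost to crossover or mutation in the next generation.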
The method is used for predicting the water quality change of the lake.
The invention is a water quality prediction method based on support vector regression and an artificial immune algorithm. First, the correlation between the quantity to be predicted and each of the other water quality data series is calculated, and the series with high correlation coefficients are used as the input data of the optimized algorithm. Because SVR is strongly affected by the parameters c and g, the variance of the SVR output is used as the fitness of the artificial immune algorithm, whose optimizing capability is used to find the best c and g; the SVR model is then rebuilt with these values so that it outputs the optimal predicted values.
The invention predicts dissolved oxygen in water with an SVR whose parameters c and g are optimized by the artificial immune algorithm (AIA), which reduces the subjective influence of human factors and improves the generality and performance of the support vector machine. To improve accuracy, the correlation between the model output and each water quality parameter is calculated, and the parameters with high correlation coefficients are selected as model inputs. Finally, the predictions are compared with those of other models: the experimental results show that the new model has a smaller variance and a smaller maximum error than the SVR and GRNN models, so its predicted values are closer to the true values. The improved algorithm can be used for early prediction of dissolved oxygen.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
The invention provides a new water quality prediction method. Selecting the input nodes of the algorithm by correlation coefficient avoids the poor predictions that result from choosing the wrong input nodes when different water quality data are predicted. The parameters c and g of the SVR are improved by an artificial immune algorithm. Because the artificial immune algorithm not only has good optimizing capability but also introduces the concept of propagation probability, antibody diversity is maintained and the algorithm is kept from settling into a local optimum. The method can quickly predict future changes in lake water quality, helping to avoid deterioration of the water body.
Drawings
The invention is described in further detail below with reference to the attached drawings and detailed description:
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a schematic diagram comparing the predicted results of the present invention with other algorithms.
Detailed Description
As shown in the figure, the method uses support vector regression (SVR) to predict dissolved oxygen in water. The SVR parameters c and g are optimized by an artificial immune algorithm (AIA) to reduce the subjective influence of human factors and to improve the generality and performance of the SVR. The correlation between the output of the SVR model and each water quality parameter is calculated, and the water quality parameters with the higher correlation coefficients are selected as model inputs to improve the accuracy of the algorithm.
The method comprises the following steps:
Step S1, selecting the water quality data with the higher correlation coefficients with dissolved oxygen as the input nodes of the algorithm, the water quality data comprising water temperature, conductivity, total phosphorus and chemical oxygen demand, with dissolved oxygen as the output node of the algorithm; normalizing the historical water quality data to obtain a training set and a test set;
Step S2, constructing the SVR water quality prediction model, and taking the antibodies generated by the artificial immune algorithm as the parameters c and g of the SVR model;
Step S3, feeding the training set obtained in step S1 into the model, and comparing the prediction accuracy of the SVR model for dissolved oxygen under different values of c and g;
Step S4, taking the prediction accuracy of the SVR as the affinity function of the artificial immune algorithm, and keeping the parameters with high propagation probability as memory cells;
Step S5, to keep the algorithm from falling into a local optimum, randomly mutating the low-affinity antibodies among the memory cells to form a new parent group;
Step S6, re-screening the parent group newly generated by the artificial immune algorithm by applying step c until the iteration ends;
Step S7, taking the values of c and g obtained when the iteration ends as the optimal values, which give the optimal model, and feeding the test set into that model to obtain the predicted dissolved oxygen values.
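The iteration over steps S2 to S7 can be outlined with a simplified loop. This is a sketch under several assumptions, not the patented procedure: `stand_in_error` is a hypothetical placeholder for training an SVR and measuring its prediction error, antibodies are real-valued (c, g) pairs rather than bit strings, selection uses affinity alone without the concentration term, and the search ranges and population sizes are invented for illustration.

```python
import math
import random


def stand_in_error(c, g):
    """Hypothetical placeholder for the SVR prediction error obtained with (c, g).
    A real run would train the SVR on the training set and return its error."""
    return (math.log10(c) - 1.0) ** 2 + (math.log10(g) + 1.0) ** 2


def affinity(antibody):
    """Higher affinity for lower error (assumed mapping)."""
    c, g = antibody
    return 1.0 / (1.0 + stand_in_error(c, g))


def random_antibody(rng):
    """S2: draw c and g log-uniformly from assumed search ranges."""
    return (10 ** rng.uniform(-2, 3), 10 ** rng.uniform(-3, 2))


def mutate(antibody, rng, scale=0.3):
    """S5: random multiplicative perturbation of c and g."""
    c, g = antibody
    return (c * 10 ** rng.gauss(0, scale), g * 10 ** rng.gauss(0, scale))


def optimise(pop_size=20, memory_size=5, generations=40, seed=0):
    rng = random.Random(seed)
    population = [random_antibody(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # S3-S4: evaluate every antibody and keep the best as memory cells.
        ranked = sorted(population, key=affinity, reverse=True)
        memory = ranked[:memory_size]
        # S5: mutate the lower-affinity antibodies.
        mutated = [mutate(antibody, rng) for antibody in ranked[memory_size:]]
        # S6: the next parent group is screened again on the next pass.
        population = memory + mutated
    # S7: the best antibody gives the final c and g.
    return max(population, key=affinity)


best_c, best_g = optimise()
print(f"c = {best_c:.4g}, g = {best_g:.4g}, affinity = {affinity((best_c, best_g)):.4f}")
```

Because the memory cells are carried forward unchanged, the best affinity in the population never decreases from one generation to the next.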
The correlation coefficient in step S1 is the correlation coefficient CC, which is used to select appropriate water quality data as input nodes. CC expresses how closely two variables are related, in particular how closely their trends agree.
The correlation coefficient CC is defined as:
$$CC = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} \quad \text{(formula one)}$$
where X and Y are the water quality data and the dissolved oxygen data being compared, cov(X, Y) is the covariance of the two data series, and $\sigma_X$ and $\sigma_Y$ are their standard deviations. |CC| < 0.4 indicates a weak correlation, 0.4 < |CC| < 0.7 a medium correlation, and |CC| > 0.7 a strong correlation.
The SVR water quality prediction model in step S2 is as follows:
Assume a set of training samples L(x, y), where x is the input data of a training sample (the other water quality data) and y is the corresponding output data (the dissolved oxygen data). To determine the relationship between the two, a linear regression function is established in a high-dimensional feature space:
$$f(x) = w \cdot \Phi(x) + b \quad \text{(formula two)}$$
where $\Phi(x)$ is a nonlinear mapping function. To solve for w and b, the slack variables $\xi_i$ and $\xi_i^*$ are introduced. The mathematical expression is:
the constraint conditions are as follows:
to solve formula four, the Lagrange function is introduced and the problem is converted to its dual form:
the constraint conditions are as follows:
where $K(x_i, z_i)$ is a kernel function.
The behaviour of the SVR model under different values of c and g in step S3 is as follows. c is the penalty factor, which determines how strictly the SVR model as a whole treats errors. As c increases, the function tolerates less error, so genuine data are easily excluded; as c decreases, the function tolerates more error, so its screening effect is more likely to fail.
The kernel function $K(x_i, z_i)$ is the Gaussian radial basis function (RBF). The RBF kernel reduces the weight of data points far from the plane, so it handles high- and low-frequency data faster than other kernel functions and helps the SVR find a suitable plane faster than other kernel functions. The RBF parameter g affects generalization by controlling the range of action of the Gaussian function. If g is too large, the range of action is too small and some data are left unclassified. If g is too small, the Gaussian function acts on too much data and the classification effect is reduced, so that a good training effect cannot be obtained on the training set and the prediction on the test set deteriorates.
The propagation probability in step S4 is calculated as follows:
Step A1, problem analysis: the ideal predicted values are taken as the antigen and the parameters c and g as the antibodies; the difference between the value predicted by the SVR and the true value is used as the affinity function;
Step A2, generating the initial antibody group: an initial antibody population is generated at random;
Step A3, evaluating the antibody group: the artificial immune algorithm evaluates the antibody population by two criteria, first the affinity between antibody and antigen, i.e. the affinity function of step A1, and second the concentration of each antibody relative to the other antibodies. The concentration expression is:
where N is the total number of antibodies and $S_{v,s}$ is the similarity between antibodies v and s. The similarity expression is:
where $k_{v,s}$ is the number of bit positions at which antibody v and antibody s are identical, and L is the length of an antibody;
The propagation probability is then calculated from the antibody-antigen affinity and the antibody concentration; the higher the propagation probability, the more likely an antibody is to be selected for the memory bank and the parent group. The propagation probability expression is:
where α is a constant and $A_v$ is the affinity function. From this expression, the higher the affinity, the higher the propagation probability, and the higher the individual concentration, the lower the propagation probability.
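The evaluation in step A3 can be made concrete for bit-string antibodies. In this sketch the similarity is the fraction of matching bits, as the definitions of the bit count and the antibody length imply. The averaged concentration, the exponential damping of concentration, the default weight `alpha=0.7` and the sample population are assumptions chosen only to reproduce the stated behaviour (probability rises with affinity and falls with concentration).

```python
import math


def similarity(v, s):
    """Fraction of bit positions at which antibodies v and s are identical."""
    same = sum(1 for a, b in zip(v, s) if a == b)
    return same / len(v)


def concentration(v, population):
    """Mean similarity of v to every antibody in the population (assumed form)."""
    return sum(similarity(v, s) for s in population) / len(population)


def propagation_probabilities(population, affinities, alpha=0.7):
    """Weighted sum of normalized affinity and a term that shrinks as
    concentration grows (assumed form; alpha is a hypothetical default)."""
    damping = [math.exp(-concentration(v, population)) for v in population]
    total_affinity = sum(affinities)
    total_damping = sum(damping)
    return [
        alpha * a / total_affinity + (1 - alpha) * d / total_damping
        for a, d in zip(affinities, damping)
    ]


# Hypothetical population: two identical antibodies and one distinct antibody,
# all with the same affinity.
population = ["1111", "1111", "0000"]
affinities = [1.0, 1.0, 1.0]

for antibody, p in zip(population, propagation_probabilities(population, affinities)):
    print(antibody, round(p, 4))
```

With equal affinities, the distinct antibody receives the highest probability, which is how the concentration term preserves diversity in the population.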
In step S5, the new parent group is generated as follows:
Step B1, generating the memory bank and a new antibody group: the antibodies are ranked by similarity from high to low, and those with the highest similarity are kept as the memory bank; the antibodies are then ranked by propagation probability from high to low, and the first N individuals form the new antibody group;
Step B2, crossover and mutation: crossover and mutation are applied to each antibody of the antibody group produced in step B1 to obtain a new antibody group;
Step B3, generating the new generation of the parent group: the new antibody group obtained in step B2 is combined with the memory bank obtained in step B1 to form the new-generation parent group.
The method is used for predicting the water quality change of the lake.
Examples:
The soundness of the algorithm is verified with the following example:
The example data come from observation station No. 0 on Taihu Lake (120°22.217′E, 31°53.983′N) and cover January 2007 to December 2015, 108 groups of data in total, sampled once a month in mid-month. Water temperature, conductivity, total nitrogen, total phosphorus, transparency, water depth, pH, chemical oxygen demand and ammoniacal nitrogen were each analysed for correlation with dissolved oxygen, the output data. The results are shown in the following table:
TABLE 1 correlation coefficient of dissolved oxygen with individual water quality data
Generally, |CC| < 0.4 indicates a weak correlation, 0.4 < |CC| < 0.7 a medium correlation, and |CC| > 0.7 a strong correlation. Water temperature, conductivity, total phosphorus and chemical oxygen demand are selected as the input nodes of the algorithm.
As shown in FIG. 2, the predicted values of the improved algorithm are closer to the true values and fluctuate less than those of the original algorithm. Because the size of the improvement cannot be judged by eye, this section compares the algorithms by the variance and the maximum error between the predicted and true values to assess their relative merits.
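TABLE 2 Comparison of variance and maximum error
Model | Variance | Maximum error (mg/L)
AIA-SVR | 0.19153 | 0.76558
SVR | 0.6248 | 1.3952
GRNN | 0.39799 | 1.19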
As shown in the table above, the variance between the predicted and true values is 0.19153 for the AIA-SVR model, with a maximum error of 0.76558 mg/L; 0.6248 for the SVR model, with a maximum error of 1.3952 mg/L; and 0.39799 for the GRNN model, with a maximum error of 1.19 mg/L. The AIA-SVR model therefore has a smaller variance and a smaller maximum error than the SVR and GRNN models, so its predicted values are closer to the true values.
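The two comparison metrics can be computed as in the sketch below. The text does not define "variance of the predicted and true values", so it is read here as the mean squared difference between the two series, which is an assumption; the function names and the sample values are hypothetical and are not the patent's data.

```python
def mean_squared_difference(predicted, actual):
    """Mean of the squared differences between predicted and true values
    (assumed reading of the 'variance' reported in the comparison)."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def max_abs_error(predicted, actual):
    """Largest absolute difference between a predicted and a true value."""
    return max(abs(p - a) for p, a in zip(predicted, actual))


# Hypothetical dissolved oxygen values in mg/L.
actual = [9.8, 8.6, 7.9, 8.4, 10.1]
predicted = [9.5, 8.9, 8.2, 8.0, 10.4]

print("mean squared difference:", round(mean_squared_difference(predicted, actual), 4))
print("maximum error (mg/L):", round(max_abs_error(predicted, actual), 4))
```

The first metric summarizes the typical size of the error over the whole test set, while the second exposes the single worst prediction, so a model should be judged on both.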
Claims (8)
1. A water quality prediction method, characterized in that: dissolved oxygen in water is predicted by support vector regression (SVR), and the SVR parameters c and g are optimized by an artificial immune algorithm (AIA) to reduce the subjective influence of human factors and to improve the generality and performance of the SVR.
2. The water quality prediction method according to claim 1, characterized in that the method comprises the following steps:
Step S1, selecting the water quality data with the higher correlation coefficients with dissolved oxygen as the input nodes of the algorithm, the water quality data comprising water temperature, conductivity, total phosphorus and chemical oxygen demand, with dissolved oxygen as the output node of the algorithm; normalizing the historical water quality data to obtain a training set and a test set;
Step S2, constructing the SVR water quality prediction model, and taking the antibodies generated by the artificial immune algorithm as the parameters c and g of the SVR model;
Step S3, feeding the training set obtained in step S1 into the model, and comparing the prediction accuracy of the SVR model for dissolved oxygen under different values of c and g;
Step S4, taking the prediction accuracy of the SVR as the affinity function of the artificial immune algorithm, and keeping the parameters with high propagation probability as memory cells;
Step S5, to keep the algorithm from falling into a local optimum, randomly mutating the low-affinity antibodies among the memory cells to form a new parent group;
Step S6, re-screening the parent group newly generated by the artificial immune algorithm by applying step c until the iteration ends;
Step S7, taking the values of c and g obtained when the iteration ends as the optimal values, which give the optimal model, and feeding the test set into that model to obtain the predicted dissolved oxygen values.
3. A water quality prediction method according to claim 2, characterized in that: the correlation coefficient in step S1 is the correlation coefficient CC, introduced to select suitable water quality data as input nodes; the correlation coefficient CC indicates how closely two variables are related, in particular how closely their trends agree;
the correlation coefficient CC is defined as:
CC = Cov(X, Y) / (σ_X · σ_Y)    formula one;
where X and Y are the water quality data and the dissolved oxygen data being compared, Cov(X, Y) is the covariance between the two data series, and σ_X and σ_Y are the standard deviations of the two data series; |CC| < 0.4 indicates a weak correlation, 0.4 < |CC| < 0.7 a medium-strength correlation, and |CC| > 0.7 a strong correlation.
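As an illustration of the input selection in this claim, the sketch below computes the Pearson correlation coefficient and applies the 0.4 and 0.7 thresholds. The function names and sample values are hypothetical. The claim leaves the boundary values |CC| = 0.4 and |CC| = 0.7 unassigned, so placing them in the lower class here is an assumption.

```python
import math


def pearson_cc(xs, ys):
    """CC = Cov(X, Y) / (sigma_X * sigma_Y), using population statistics.

    Raises ZeroDivisionError if either series is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)


def correlation_strength(cc):
    """Classify |CC| with the thresholds given in the claim."""
    magnitude = abs(cc)
    if magnitude > 0.7:
        return "strong"
    if magnitude > 0.4:
        return "medium"
    return "weak"  # assumption: exact boundary values fall into the lower class


# Hypothetical series: water temperature versus dissolved oxygen
water_temperature = [18.0, 20.0, 22.0, 24.0, 26.0]
dissolved_oxygen = [8.9, 8.1, 7.4, 6.8, 6.1]

cc = pearson_cc(water_temperature, dissolved_oxygen)
print(f"CC = {cc:.3f} ({correlation_strength(cc)})")
```

A strongly negative CC is as useful for input selection as a strongly positive one, which is why the classification uses the absolute value.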
4. A water quality prediction method according to claim 2, characterized in that: the water quality prediction model of the regression-type support vector machine in step S2 is specifically as follows:
assume a set of training samples L(x, y), where x denotes the input data of the training samples, namely the other water quality data, and y denotes the corresponding output data, namely the dissolved oxygen data; to determine the relationship between the two, a linear regression function is established in a high-dimensional feature space:
f(x) = w·Φ(x) + b    formula two;
where Φ(x) is a nonlinear mapping function. To solve for w and b, the slack variables ξ_i and ξ_i* are introduced, and the problem is expressed mathematically as:
the constraint conditions are as follows:
to solve formula four, the Lagrange function is introduced and the problem is converted to its dual form:
the constraint conditions are as follows:
where K(x_i, z_i) is the kernel function.
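For reference, the textbook ε-insensitive SVR formulation that this claim appears to follow can be written as below. This is a sketch of the standard form, not a transcription of the patent's own formulas: the tube width ε, the multipliers α_i and α_i*, the sample count n, and the kernel arguments written as K(x_i, x_j) are assumptions taken from that standard form.

```latex
% Primal problem with slack variables \xi_i, \xi_i^* and penalty factor c
\min_{w,\,b,\,\xi,\,\xi^*}\quad
  \frac{1}{2}\lVert w\rVert^{2} + c\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right)

\text{s.t.}\quad
\begin{cases}
  y_i - w\,\Phi(x_i) - b \le \varepsilon + \xi_i \\
  w\,\Phi(x_i) + b - y_i \le \varepsilon + \xi_i^{*} \\
  \xi_i \ge 0,\quad \xi_i^{*} \ge 0,\qquad i = 1,\dots,n
\end{cases}

% Dual problem obtained through the Lagrange function
\max_{\alpha,\,\alpha^*}\quad
  -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}
     (\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})\,K(x_i, x_j)
  - \varepsilon\sum_{i=1}^{n}(\alpha_i + \alpha_i^{*})
  + \sum_{i=1}^{n} y_i\,(\alpha_i - \alpha_i^{*})

\text{s.t.}\quad
  \sum_{i=1}^{n}(\alpha_i - \alpha_i^{*}) = 0,\qquad
  0 \le \alpha_i,\ \alpha_i^{*} \le c
```

In this form the regression function becomes f(x) = Σ (α_i − α_i*) K(x_i, x) + b, so only the kernel values, not Φ itself, need to be computed.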
5. The water quality prediction method according to claim 4, characterized in that: the behaviour of the SVR model under different values of the parameters c and g in step S3 is specifically as follows: c is the penalty factor, which determines how strictly the whole SVR model function treats errors; as the value of c increases, the function tolerates less error, so that genuine data are easily rejected; as the value of c decreases, the function tolerates more error, so that the screening effect of the function tends to fail;
the kernel function K(x_i, z_i) is the Gaussian radial basis function (RBF); the RBF kernel reduces the weight of data points far from the plane, so that it processes both high-frequency and low-frequency data faster than other kernel functions and helps the regression-type support vector machine find a suitable plane faster than other kernel functions; the parameter g of the RBF affects generalization performance by controlling the range of action of the Gaussian function: if g is too large, the range of action of the Gaussian function is too small, so that some data are left unclassified; if g is too small, the Gaussian function acts on too much data, which weakens the classification of the data, so that no good training result is obtained on the training set and the prediction result on the test set deteriorates.
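The effect of g on the range of action of the Gaussian function can be seen in a small sketch. It assumes the common parametrization K(x, z) = exp(−g·‖x − z‖²), which the claim does not spell out; the function name and sample points are hypothetical.

```python
import math


def rbf_kernel(x, z, g):
    """Gaussian RBF kernel under the assumed form K(x, z) = exp(-g * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-g * squared_distance)


# Two hypothetical normalized samples at distance 0.5 from each other
x = [0.2, 0.5]
z = [0.6, 0.8]

for g in (0.1, 1.0, 10.0, 100.0):
    print(f"g = {g:>5}: K(x, z) = {rbf_kernel(x, z, g):.6f}")
```

With a large g the kernel value between the two samples drops towards zero, so each training point influences only its immediate neighbourhood; with a small g the value stays close to one and distant points are treated as nearly identical.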
6. The water quality prediction method according to claim 4, characterized in that: the propagation probability in step S4 is calculated by the following steps:
Step A1, analyzing the problem: the ideal predicted value is taken as the antigen and the parameters c and g as the antibody; the difference between the value predicted by the SVR and the true value is used as the affinity function;
Step A2, generating the initial antibody population: an initial antibody population is generated at random;
Step A3, evaluating the antibody population: the artificial immune algorithm evaluates the antibody population by two criteria; the first is the affinity between antibody and antigen, namely the affinity function of step A1, and the second is the concentration of an antibody among the other antibodies; the concentration is expressed as:
where N is the total number of antibodies and S_{v,s} is the similarity between antibody v and antibody s. The similarity is expressed as:
S_{v,s} = k_{v,s} / L
where k_{v,s} is the number of bits that antibody v and antibody s have in common, and L is the length of an antibody;
the propagation probability is then calculated from the affinity between antibody and antigen and from the antibody concentration; the higher the propagation probability, the higher the probability of being selected into the memory bank and the parent group; the propagation probability is expressed as:
where α is a constant and A_v is the affinity function; it follows from the above expression that the higher the affinity, the higher the propagation probability, and the higher the individual concentration, the lower the propagation probability.
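The evaluation in step A3 can be sketched as follows, with antibodies encoded as bit strings. Several parts are assumptions rather than the patent's own formulas: the concentration is taken as the mean similarity of an antibody to all N antibodies, and the propagation probability is a weighted sum of normalized affinity and normalized inverse concentration, chosen only because it has the stated behaviour (rising with affinity, falling with concentration). Affinities are assumed to be positive scores where higher is better. All names and values are hypothetical.

```python
def similarity(antibody_v, antibody_s):
    """S_{v,s} = k_{v,s} / L: share of bit positions on which two antibodies agree."""
    matches = sum(1 for a, b in zip(antibody_v, antibody_s) if a == b)
    return matches / len(antibody_v)


def concentration(index, population):
    """Assumed form: mean similarity of one antibody to all N antibodies."""
    target = population[index]
    return sum(similarity(target, other) for other in population) / len(population)


def propagation_probabilities(affinities, concentrations, alpha=0.7):
    """Assumed form: alpha * A_v / sum(A) + (1 - alpha) * (1 / C_v) / sum(1 / C).

    Each concentration is at least 1/N because an antibody matches itself,
    so the inverse is always defined.
    """
    total_affinity = sum(affinities)
    inverse = [1.0 / c for c in concentrations]
    total_inverse = sum(inverse)
    return [
        alpha * a / total_affinity + (1.0 - alpha) * inv / total_inverse
        for a, inv in zip(affinities, inverse)
    ]


# Hypothetical population: each bit string would decode to one (c, g) pair
population = ["110010", "110011", "001101"]
affinities = [0.90, 0.85, 0.40]  # hypothetical accuracy-like scores

concentrations = [concentration(i, population) for i in range(len(population))]
probabilities = propagation_probabilities(affinities, concentrations)
for antibody, c, p in zip(population, concentrations, probabilities):
    print(f"{antibody}: concentration = {c:.3f}, propagation probability = {p:.3f}")
```

The concentration term is what keeps near-duplicate antibodies from crowding the population: two almost identical high-affinity antibodies each receive a lower probability than a single one would.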
7. A water quality prediction method according to claim 2, characterized in that: in step S5, the new parent group is generated by the following steps:
Step B1, generating the memory bank and a new antibody population: the antibodies are sorted by similarity from high to low, and those with the highest similarity are retained as the memory bank; the antibodies are then sorted by propagation probability from high to low, and the first N individuals form the new antibody population;
Step B2, crossover and mutation: crossover and mutation are applied to each antibody of the population produced in step B1 to obtain a new antibody population;
Step B3, generating the new-generation parent group: the new antibody population obtained in step B2 is combined with the memory bank obtained in step B1 to form the new-generation parent group.
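Steps B1 to B3 can be sketched as below for bit-string antibodies. The operators are assumptions, since the claim does not specify them: single-point crossover with a randomly chosen partner and independent bit-flip mutation. The ranking used for the memory bank is passed in as a plain list of scores so that the sketch does not commit to one criterion. All names, rates and values are hypothetical.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Single-point crossover of two equal-length bit strings (assumed operator)."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]


def mutate(antibody, rate, rng):
    """Flip each bit independently with probability `rate` (assumed operator)."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in antibody
    )


def next_parent_group(population, memory_scores, probabilities,
                      memory_size, n, rate=0.05, seed=0):
    rng = random.Random(seed)
    indices = range(len(population))

    # Step B1: memory bank from the top-ranked antibodies, then the first n
    # antibodies by propagation probability as the new antibody population.
    by_memory = sorted(indices, key=lambda i: memory_scores[i], reverse=True)
    memory_bank = [population[i] for i in by_memory[:memory_size]]
    by_probability = sorted(indices, key=lambda i: probabilities[i], reverse=True)
    selected = [population[i] for i in by_probability[:n]]

    # Step B2: crossover with a random partner, followed by mutation.
    offspring = [
        mutate(crossover(antibody, rng.choice(selected), rng), rate, rng)
        for antibody in selected
    ]

    # Step B3: offspring plus the untouched memory bank form the new parents.
    return offspring + memory_bank


population = ["110010", "110011", "001101", "101010"]
memory_scores = [0.90, 0.85, 0.40, 0.60]   # hypothetical ranking scores
probabilities = [0.30, 0.20, 0.35, 0.15]   # hypothetical propagation probabilities

new_parents = next_parent_group(population, memory_scores, probabilities,
                                memory_size=1, n=3)
print(new_parents)
```

Because the memory bank bypasses crossover and mutation, the best antibody found so far always survives into the next generation.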
8. A water quality prediction method according to claim 2, characterized in that: the method is used to predict water quality changes in lakes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311106497.0A CN117077819A (en) | 2023-08-30 | 2023-08-30 | Water quality prediction method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117077819A (en) | 2023-11-17 |
Family
ID=88709581
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311106497.0A Pending CN117077819A (en) | 2023-08-30 | 2023-08-30 | Water quality prediction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117077819A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117633721A (en) * | 2024-01-25 | 2024-03-01 | 水利部交通运输部国家能源局南京水利科学研究院 | Urban river network transparency prediction method driven by mechanism model and data in combined mode |
CN117633721B (en) * | 2024-01-25 | 2024-04-09 | 水利部交通运输部国家能源局南京水利科学研究院 | Urban river network transparency prediction method driven by mechanism model and data in combined mode |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||