CN115470702A - Sewage treatment water quality prediction method and system based on machine learning - Google Patents

Info

Publication number
CN115470702A
Authority
CN
China
Prior art keywords
water quality
prediction model
quality prediction
inlet
sewage treatment
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211112693.4A
Other languages
Chinese (zh)
Inventor
祝新哲
刘炳佑
孙连鹏
吕慧
邓欢忠
李若泓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Application filed by Sun Yat Sen University
Priority to CN202211112693.4A
Publication of CN115470702A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A20/00 Water conservation; Efficient water supply; Efficient water use
    • Y02A20/152 Water filtration

Abstract

The invention discloses a machine learning-based sewage treatment water quality prediction method and system, wherein the method comprises the following steps: acquiring daily historical inlet water data of a sewage treatment plant and constructing an inlet water quality database; dividing inlet water quality data in an inlet water quality database into a training set and a testing set; constructing a water quality prediction model by utilizing a training set based on a five-fold cross validation method; verifying the water quality prediction model by using the test set to obtain an optimal water quality prediction model; and inputting the target to be measured into the optimal water quality prediction model to obtain a prediction result. The system comprises: the device comprises a database construction module, a data division module, a model construction module, a verification module and a prediction module. By using the method and the device, the water quality index of the inlet water can be rapidly and accurately predicted. The method and the system for predicting the water quality of sewage treatment based on machine learning can be widely applied to the technical field of water quality index prediction of inlet water.

Description

Sewage treatment water quality prediction method and system based on machine learning
Technical Field
The invention relates to the technical field of inflow water quality index prediction, in particular to a sewage treatment water quality prediction method and system based on machine learning.
Background
In town sewage treatment, the quality of the inlet water directly determines the treatment capacity required of a sewage treatment plant, and it also closely influences the treatment process and the control of effluent indexes.
At present, the inlet water quality indexes in the sewage treatment process are obtained by monitoring with various hardware devices. For indexes that are difficult to monitor directly online, data acquisition is delayed and not real-time, which hinders the adjustment of operating parameters, such as reflux ratio and aeration quantity, of the instruments and equipment in the treatment units of the plant.
Disclosure of Invention
In order to solve the technical problems, the invention aims to provide a sewage treatment water quality prediction method and system based on machine learning, which can quickly and accurately predict the water quality index of inlet water.
The first technical scheme adopted by the invention is as follows: a sewage treatment water quality prediction method based on machine learning comprises the following steps:
acquiring daily historical inlet water data of a sewage treatment plant and constructing an inlet water quality database;
dividing inlet water quality data in an inlet water quality database into a training set and a testing set;
constructing a water quality prediction model by utilizing a training set based on a five-fold cross validation method;
verifying the water quality prediction model by using the test set to obtain an optimal water quality prediction model;
and inputting the target to be measured into the optimal water quality prediction model to obtain a prediction result.
Further, the step of obtaining daily historical inlet water data of the sewage treatment plant and constructing an inlet water quality database specifically comprises:
acquiring daily historical inlet water data of a sewage treatment plant and extracting inlet water quality indexes;
classifying the extracted daily historical inlet water data according to inlet water quality indexes;
calculating the quartile interval of each type of water inlet quality index by utilizing a quartile algorithm;
rejecting abnormal values which are larger than a preset threshold value in each type of water inlet quality index;
and constructing a water inlet quality database according to the daily historical water inlet data after the abnormal values are removed.
Further, the water quality indexes of the inlet water comprise flow, chemical oxygen demand, five-day biochemical oxygen demand, total nitrogen, total phosphorus, ammonia nitrogen, pH, chroma and suspended solid concentration.
Further, the step of constructing the water quality prediction model by using the training set based on the five-fold cross validation method specifically comprises the following steps:
equally dividing the training set into 5 disjoint parts, selecting one part as a verification set, and taking the other four parts as the training set;
taking the biochemical oxygen demand of five days in the training set as a training dependent variable, taking the other water quality indexes of the inlet water in the training set as training independent variables, and obtaining a water quality prediction model by adopting a deep learning algorithm;
verifying the water quality prediction model by using a verification set to obtain an empirical error of the water quality prediction model;
selecting another part as the verification set and the other four parts as the training set, and repeating the training in a loop to obtain five water quality prediction models and their corresponding empirical errors;
and selecting a water quality prediction model with the minimum empirical error.
Further, the method also comprises:
calculating the coefficient of determination of the water quality prediction model with the minimum empirical error and, if the coefficient of determination is less than or equal to a preset value, readjusting the parameters of the water quality prediction model and repeating the five-fold cross validation to obtain a water quality prediction model that meets expectations.
Further, the step of verifying the water quality prediction model by using the test set to obtain an optimal water quality prediction model specifically comprises:
taking the five-day biochemical oxygen demand in the test set as the actual value, and inputting the remaining inlet water quality indexes into the water quality prediction model to obtain a predicted value of the five-day biochemical oxygen demand;
calculating the generalization error and the coefficient of determination from the actual value and the predicted value of the five-day biochemical oxygen demand;
and evaluating the water quality prediction model according to the generalization error and the coefficient of determination to obtain the optimal water quality prediction model.
Further, the method also comprises:
if the coefficient of determination is smaller than the preset value, reconstructing the water quality prediction model.
The second technical scheme adopted by the invention is as follows: a sewage treatment water quality prediction system based on machine learning comprises:
the database construction module is used for acquiring daily historical inlet water data of the sewage treatment plant and constructing an inlet water quality database;
the data dividing module is used for dividing the water quality data of the inlet water in the inlet water quality database into a training set and a testing set;
the model construction module is used for constructing a water quality prediction model by utilizing a training set based on a five-fold cross validation method;
the verification module is used for verifying the water quality prediction model by using the test set to obtain an optimal water quality prediction model;
and the prediction module is used for inputting the target to be measured into the optimal water quality prediction model to obtain a prediction result.
The method and the system have the beneficial effects that: the method comprises the steps of firstly, acquiring daily historical inlet water data of a sewage treatment plant, extracting inlet water quality indexes of the daily historical inlet water data and eliminating abnormal values, and constructing an inlet water quality database according to the daily historical inlet water data after the abnormal values are eliminated so as to ensure the accuracy of constructing a water quality prediction model; secondly, dividing the water quality data of the inlet water in the inlet water quality database into a training set and a testing set; then, a water quality prediction model is constructed by utilizing a training set based on a five-fold cross validation method, the problem of less training data can be solved, and the constructed water quality prediction model is more accurate and stable by using all 5-fold training data; then, verifying the water quality prediction model by using a test set to obtain an optimal water quality prediction model, and determining the generalization capability of the water quality prediction model; and finally, predicting the target to be measured by using the optimal water quality prediction model to obtain a prediction result, thereby realizing the rapid and accurate prediction of the water quality.
Drawings
FIG. 1 is a flow chart of steps of a sewage treatment water quality prediction method based on machine learning according to the present invention;
FIG. 2 is a block diagram of the structure of a sewage treatment water quality prediction system based on machine learning according to the present invention;
FIG. 3 is a schematic illustration of an incoming water quality indicator in accordance with an embodiment of the present invention;
FIG. 4 is a schematic diagram of a five-fold cross-validation method according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating the test results of the water quality prediction model according to the embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the specific embodiments. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
Referring to fig. 1, the invention provides a sewage treatment water quality prediction method based on machine learning, which comprises the following steps:
S1, acquiring daily historical inlet water data of a sewage treatment plant and constructing an inlet water quality database;
S1.1, acquiring daily historical inlet water data of a sewage treatment plant and extracting inlet water quality indexes;
Specifically, daily historical inlet water data of the sewage treatment plant are obtained, comprising online automatic monitoring data and manual sampling detection data. The inlet water quality indexes comprise flow (Q), chemical oxygen demand (COD), five-day biochemical oxygen demand (BOD₅), total nitrogen (TN), total phosphorus (TP), ammonia nitrogen (NH₃-N), pH, chroma and suspended solids (SS) concentration, totaling 1057 sets of inlet water quality index data.
The flow rate refers to the inflow rate of the sewage treatment plant, namely the amount of sewage entering the sewage treatment plant in unit time, the change trend of the flow rate can affect the treatment efficiency of subsequent units, and in addition, the influence of rainwater inflow and external water infiltration on the quality and the amount of water is considered, and hidden relations may exist between the flow rate and various water quality parameters.
The chemical oxygen demand is the quantity of reducing substances in a water sample that need to be oxidized, measured by a chemical method. Under specified conditions, the amount of oxidant consumed in oxidizing the reducing substances in 1 liter of water sample is converted into the milligrams of oxygen required to completely oxidize each liter of sample, expressed in mg/L. It reflects the degree to which the water is polluted by reducing substances.
The five-day biochemical oxygen demand is the amount of dissolved oxygen consumed by microorganisms in decomposing oxidizable substances, particularly organic substances, in a given volume of water over a given period of time, expressed in mg/L, percentage or ppm. It is a comprehensive index of the organic pollutant content of water: the higher the biochemical oxygen demand, the more organic pollutants the water contains and the more serious the pollution.
Total nitrogen is the total amount of the various forms of inorganic and organic nitrogen in water, including inorganic nitrogen such as NO₃⁻, NO₂⁻ and NH₄⁺ and organic nitrogen such as protein, amino acids and organic amines. It is often used to indicate the degree to which a water body is polluted by nutrients; the higher the value, the more serious the water pollution.
The total phosphorus is the sum of the phosphorus existing in an inorganic state and an organic state in the wastewater, and is one of indexes for measuring the water pollution degree, and the larger the value is, the higher the water pollution degree is.
Ammonia nitrogen is nitrogen present in the form of free ammonia and ammonium ions. It comes mainly from the decomposition of nitrogenous organic matter in domestic sewage and from industrial wastewater such as that from coking and synthetic ammonia production. It is an important contributor to water eutrophication and environmental pollution; the higher the value, the more serious the water pollution.
pH is a routine daily test item for sewage. Measuring the pH of the sewage in the operation and management of a sewage plant serves not only to monitor sewage quality; the pH also affects the living environment of the microorganisms in the activated sludge, and a large change in sewage pH is one of the signs that the sewage has been affected by pollution or other environmental factors.
Chroma is the color of water, and refers to the degree of yellowish or yellow-brown color produced by soluble or colloidal substances in the water. The chroma of water is divided into apparent color and true color. Apparent color is the color of the water without suspended substances removed, and includes color produced by both soluble substances and insoluble suspended substances. True color is the color of the water after suspended substances are removed, and is produced only by soluble colored substances. For clean water or water with very low turbidity, the true color is close to the apparent color; for deeply colored industrial wastewater and domestic sewage containing more suspended matter, the difference between the two is larger.
The suspended solid concentration is the solid matter suspended in water, including inorganic matter, organic matter, silt, clay, microbe, etc. insoluble in water, and the suspended matter content in water is one of the indexes for measuring water pollution.
S1.2, classifying the extracted daily historical inlet water data according to inlet water quality indexes;
S1.3, calculating the quartile interval of each type of inlet water quality index by using a quartile algorithm;
Specifically, the flow data are sorted in ascending order and divided into four equal parts, giving the minimum value, the first quartile Q1, the median (second quartile Q2), the third quartile Q3 and the maximum value of the flow data. The quartile interval (interquartile range) IQR is the difference between the third quartile and the first quartile, that is, IQR = Q3 - Q1.
And in the same way, the quartile interval of other types of water quality indexes can be obtained.
S1.4, eliminating abnormal values which are larger than a preset threshold value in each type of water inlet quality indexes;
Specifically, abnormal values in each type of inlet water quality index are removed. The preset threshold in this embodiment is preferably 1.5 times the interquartile range, that is, an abnormal value is a value smaller than Q1 - 1.5 × IQR or larger than Q3 + 1.5 × IQR. After removal, 984 sets of data remain, and a violin plot of each type of inlet water quality index is drawn, as shown in fig. 3.
S1.5, constructing an inlet water quality database according to daily historical inlet water data after the abnormal values are removed.
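Steps S1.3 to S1.5 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the use of Python's `statistics.quantiles` with the `"inclusive"` interpolation rule, and the choice to drop a whole daily record when any one index is abnormal are all assumptions, since the text does not specify them. The sample values are invented for the demonstration.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return the fences (Q1 - k*IQR, Q3 + k*IQR) for one indicator."""
    # Assumption: "inclusive" quartile interpolation; the patent names no rule.
    q1, _q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def remove_outlier_rows(rows, indicators, k=1.5):
    """Keep only rows whose every indicator lies inside its own fences."""
    bounds = {name: iqr_bounds([r[name] for r in rows], k) for name in indicators}
    return [
        r for r in rows
        if all(bounds[n][0] <= r[n] <= bounds[n][1] for n in indicators)
    ]

# Invented toy records: one abnormal COD value and one abnormal flow value.
rows = [{"Q": q, "COD": c} for q, c in
        [(10, 200), (11, 210), (12, 190), (13, 205), (12, 2000), (90, 195)]]
clean = remove_outlier_rows(rows, ["Q", "COD"])
```

The fences are computed on the full data before any row is dropped, so the result does not depend on the order in which the indicators are checked.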
S2, dividing inlet water quality data in an inlet water quality database into a training set and a testing set;
specifically, the inlet water quality data in the inlet water quality database is divided into two parts according to 4:1, 80% of the inlet water quality data is used as a training set, 20% of the inlet water quality data is used as a test set, the sample data volume of the training set is 787 groups, and the sample data volume of the test set is 197 groups.
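The 4:1 division can be sketched as below. The function name, the fixed seed and the random shuffle before splitting are assumptions made for illustration; the text only states the ratio and the resulting sizes. Integer arithmetic is used so that 984 records yield exactly 787 training and 197 test samples.

```python
import random

def split_train_test(rows, train_parts=4, test_parts=1, seed=0):
    """Shuffle a copy of rows and split it train_parts:test_parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # assumption: random assignment
    n_train = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:n_train], shuffled[n_train:]

# Record indices stand in for the 984 cleaned daily records.
train, test = split_train_test(range(984))
```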
S3, as shown in fig. 4, constructing a water quality prediction model by utilizing a training set based on a five-fold cross validation method;
Specifically, the principle of the K-fold cross-validation method is as follows: the data set is divided into K parts; K-1 parts are used as the training data set to construct the model and determine its optimal hyper-parameter values, and the remaining 1 part is then used as the test data set to verify the model performance under the determined hyper-parameter values.
If the training set is relatively small, the K value is increased so that more data are used for model training in each iteration and the estimate has lower bias; at the same time, the running time increases and the training blocks become highly similar, so the variance of the evaluation result is higher.
If the training set is relatively large, the K value is reduced, which lowers the computational cost of repeatedly fitting and evaluating the model on different data blocks while still giving an accurate evaluation of the model based on average performance.
Therefore, the preferred embodiment of the scheme selects a five-fold cross-validation method.
S3.1, equally dividing the training set into 5 disjoint parts, selecting one part as a verification set, and taking the other four parts as the training set;
specifically, as shown in fig. 4, the training set is approximately divided into 5 disjoint parts at random, the first sample data is 157 groups, the second sample data is 157 groups, the third sample data is 157 groups, the fourth sample data is 158 groups, and the fifth sample data is 158 groups, the first sample data is selected as the verification set, and the remaining four parts are the training set.
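The partition into five nearly equal, disjoint parts can be sketched as follows. The function name, the seed and the rule of giving the extra samples to the last parts are assumptions chosen so that 787 samples reproduce the part sizes stated above.

```python
import random

def five_fold_parts(samples, k=5, seed=0):
    """Randomly split samples into k disjoint parts of near-equal size."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    base, extra = divmod(len(shuffled), k)
    # Assumption: the `extra` leftover samples go to the last parts.
    sizes = [base] * (k - extra) + [base + 1] * extra
    parts, start = [], 0
    for size in sizes:
        parts.append(shuffled[start:start + size])
        start += size
    return parts

parts = five_fold_parts(range(787))
```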
S3.2, taking the biochemical oxygen demand of five days in the training set as a training dependent variable, taking the rest water quality indexes in the training set as training independent variables, and obtaining a water quality prediction model by adopting a deep learning algorithm;
Specifically, first, the biochemical oxygen demand is measured at 20 °C, which is generally taken as the standard temperature. Under BOD measurement conditions (sufficient oxygen, no agitation) at 20 °C, microorganisms generally need about 20 days to essentially complete the first-stage oxidative decomposition (about 99% complete); that is, measuring the first-stage biochemical oxygen demand takes 20 days, which is impractical. A standard time is therefore specified, generally 5 days, and the result is called the five-day biochemical oxygen demand, denoted BOD₅. BOD₅ is about 70% of BOD₂₀. The five-day biochemical oxygen demand in the training set is used as the training dependent variable, and the other inlet water quality indexes are used as training independent variables.
Secondly, the quality of inlet water of the sewage treatment plant is data of coupling and correlation among various physical, chemical and biological indexes, the relation among the water quality index data is complex, multidimensional and nonlinear, and the machine learning has the capability of mining the correlation rules among the data.
S3.3, verifying the water quality prediction model by using a verification set to obtain an empirical error of the water quality prediction model;
Specifically, the empirical error refers to the error of the model on the training set; here it is evaluated on the verification set held out from that training set.
The five-day biochemical oxygen demand in the verification set is taken as the actual value, and the remaining inlet water quality indexes are input into the water quality prediction model to obtain the predicted value of the five-day biochemical oxygen demand. The empirical error of the water quality prediction model, namely the root mean square error (RMSE), is calculated from the actual and predicted values as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$
In the above formula, $N$ is the sample data size of the verification set, $\hat{y}_i$ is the predicted value of the five-day biochemical oxygen demand, and $y_i$ is the actual value of the five-day biochemical oxygen demand.
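The root mean square error described above can be computed with a few lines of code. This is a sketch using the standard RMSE definition; the function name and the example BOD₅ values are illustrative only.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

# Invented example: two validation samples with errors of 3 and 4 mg/L.
example_error = rmse([100.0, 120.0], [103.0, 116.0])
```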
S3.4, selecting another part as the verification set and the other four parts as the training set, and repeating the training in a loop to obtain five water quality prediction models and their corresponding empirical errors;
specifically, a second sample data is selected as a verification set, the rest four samples are selected as training sets, and the step S3.2 and the step S3.3 are repeated to obtain a second water quality prediction model and an empirical error thereof; by analogy, five water quality prediction models and corresponding empirical errors can be obtained.
And S3.5, selecting the water quality prediction model with the minimum empirical error.
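The loop of steps S3.1 to S3.5 can be sketched as below. The patent does not disclose the learning algorithm in code form, so a one-variable least-squares line is used purely as a stand-in model; the real method uses the eight remaining inlet indexes as inputs. All names, the interleaved fold assignment and the synthetic data are assumptions for illustration.

```python
import math
import random

def fit_line(xs, ys):
    """Stand-in model (NOT the patent's): least-squares line y = a + b*x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x

def rmse(actual, predicted):
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual))

def cross_validate(samples, k=5):
    """samples: list of (x, y). Returns (model with lowest fold error, fold errors)."""
    folds = [samples[i::k] for i in range(k)]  # k disjoint parts
    results = []
    for i in range(k):
        val = folds[i]  # part i is the verification set
        train = [s for j, f in enumerate(folds) if j != i for s in f]
        model = fit_line([x for x, _ in train], [y for _, y in train])
        err = rmse([y for _, y in val], [model(x) for x, _ in val])
        results.append((err, model))
    best_err, best_model = min(results, key=lambda r: r[0])
    return best_model, [e for e, _ in results]

# Synthetic data: y = 0.5*x + 20 plus noise (invented, not plant data).
rng = random.Random(0)
data = [(x, 0.5 * x + 20 + rng.gauss(0, 2)) for x in range(100, 200)]
best_model, fold_errors = cross_validate(data)
```

Each of the five models is trained on four parts and scored on the fifth, and the model with the smallest verification error is kept, mirroring step S3.5.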
Further, the coefficient of determination of the water quality prediction model with the minimum empirical error is calculated. If the coefficient of determination is less than or equal to a preset value, namely $R^2 \le 0.6$, the parameters of the water quality prediction model are readjusted and the five-fold cross validation is repeated to obtain a water quality prediction model that meets expectations.
The coefficient of determination is calculated as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$$
In the above formula, $N$ is the sample data size of the verification set, $\hat{y}_i$ is the five-day biochemical oxygen demand obtained by inputting the inlet water quality indexes other than the five-day biochemical oxygen demand into the water quality prediction model with the minimum empirical error, $y_i$ is the five-day biochemical oxygen demand in the verification set, and $\bar{y}$ is the average of all five-day biochemical oxygen demand values in the verification set.
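The coefficient of determination and the 0.6 threshold check can be sketched as follows. The code assumes the standard definition of R² (one minus the ratio of residual to total sum of squares), which matches the symbols described above; the function names are illustrative.

```python
def r_squared(actual, predicted):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot

def needs_retuning(r2, threshold=0.6):
    """True when R^2 is at or below the preset value given in the text."""
    return r2 <= threshold
```

A model that always predicts the mean of the actual values scores R² = 0, and a perfect model scores R² = 1, so the threshold of 0.6 sits between these two reference points.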
S4, verifying the water quality prediction model by using the test set to obtain an optimal water quality prediction model;
S4.1, taking the five-day biochemical oxygen demand in the test set as the actual value, and inputting the remaining inlet water quality indexes into the water quality prediction model to obtain a predicted value of the five-day biochemical oxygen demand;
S4.2, calculating the generalization error and the coefficient of determination from the actual value and the predicted value of the five-day biochemical oxygen demand;
The generalization error refers to the error of the model on a new sample set (the test set), and is calculated as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$
In the above formula, $N$ is the sample data size of the test set, $\hat{y}_i$ is the predicted value of the five-day biochemical oxygen demand, and $y_i$ is the actual value of the five-day biochemical oxygen demand.
The coefficient of determination is calculated as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$$
In the above formula, $N$ is the sample data size of the test set, $\hat{y}_i$ is the predicted value of the five-day biochemical oxygen demand, $y_i$ is the actual value of the five-day biochemical oxygen demand, and $\bar{y}$ is the average of all actual values of the five-day biochemical oxygen demand, where
$$\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i$$
From the above formulas, RMSE = 19.29 and $R^2 = 0.6421$ are calculated.
S4.3, evaluating the water quality prediction model according to the generalization error and the coefficient of determination to obtain the optimal water quality prediction model.
Specifically, as shown in fig. 5, the actual values of the five-day biochemical oxygen demand visibly coincide for the most part with the predicted values, and the values RMSE = 19.29 and $R^2 = 0.6421$ obtained in step S4.2 further quantify the generalization capability of this prediction model.
Further, if the coefficient of determination is less than or equal to the preset value, namely $R^2 \le 0.6$, the water quality prediction model is reconstructed.
S5, inputting the target to be measured into the optimal water quality prediction model to obtain a prediction result. This realizes rapid and accurate prediction of hard-to-measure water quality indexes of the sewage treatment plant, overcomes the non-real-time nature of such data, and achieves soft measurement of the inlet water quality. The predicted values obtained from the optimal water quality prediction model can also be used to supplement missing historical data and to reduce the cost of online monitoring sensors and manual monitoring at the plant.
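The use of predictions to supplement missing historical data can be sketched as below. The record layout, the `BOD5_predicted` flag and the `toy_predict` function are hypothetical; in particular, the 0.4 factor in `toy_predict` is an arbitrary placeholder and not a relationship stated in the patent. Any trained model exposing a predict-from-features call could take its place.

```python
def fill_missing_bod5(records, predict):
    """Return copies of records with missing BOD5 replaced by predict(features)."""
    filled = []
    for rec in records:
        rec = dict(rec)  # copy so the historical record is left unchanged
        if rec.get("BOD5") is None:
            features = {k: v for k, v in rec.items() if k != "BOD5"}
            rec["BOD5"] = predict(features)
            rec["BOD5_predicted"] = True   # mark soft-measured values
        else:
            rec["BOD5_predicted"] = False
        filled.append(rec)
    return filled

def toy_predict(features):
    """Hypothetical stand-in for the optimal model; the factor is arbitrary."""
    return 0.4 * features["COD"]

records = [{"COD": 250.0, "BOD5": 110.0}, {"COD": 300.0, "BOD5": None}]
filled = fill_missing_bod5(records, toy_predict)
```

Flagging the filled values keeps measured and soft-measured data distinguishable in the database.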
As shown in fig. 2, a sewage treatment water quality prediction system based on machine learning includes:
the database construction module is used for acquiring daily historical inlet water data of the sewage treatment plant and constructing an inlet water quality database;
the data dividing module is used for dividing the water quality data of the inlet water in the inlet water quality database into a training set and a testing set;
the model construction module is used for constructing a water quality prediction model by utilizing a training set based on a five-fold cross validation method;
the verification module is used for verifying the water quality prediction model by using the test set to obtain an optimal water quality prediction model;
and the prediction module is used for inputting the target to be measured into the optimal water quality prediction model to obtain a prediction result.
The contents in the above method embodiments are all applicable to the present system embodiment, the functions specifically implemented by the present system embodiment are the same as those in the above method embodiment, and the beneficial effects achieved by the present system embodiment are also the same as those achieved by the above method embodiment.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A sewage treatment water quality prediction method based on machine learning is characterized by comprising the following steps:
acquiring daily historical inlet water data of a sewage treatment plant and constructing an inlet water quality database;
dividing inlet water quality data in an inlet water quality database into a training set and a testing set;
constructing a water quality prediction model by utilizing a training set based on a five-fold cross validation method;
verifying the water quality prediction model by using the test set to obtain an optimal water quality prediction model;
and inputting the target to be measured into the optimal water quality prediction model to obtain a prediction result.
2. The method for predicting the water quality of sewage treatment based on machine learning according to claim 1, wherein the step of obtaining daily historical inlet water data of a sewage treatment plant and constructing an inlet water quality database specifically comprises:
acquiring daily historical inlet water data of a sewage treatment plant and extracting inlet water quality indexes;
classifying the extracted daily historical inlet water data according to inlet water quality indexes;
calculating the interquartile range of each type of inlet water quality index by using the quartile method;
removing abnormal values that are larger than a preset threshold value in each type of inlet water quality index;
and constructing an inlet water quality database from the daily historical inlet water data after the abnormal values are removed.
3. The machine learning-based sewage treatment water quality prediction method according to claim 2, wherein the inlet water quality indexes include flow, chemical oxygen demand, five-day biochemical oxygen demand, total nitrogen, total phosphorus, ammonia nitrogen, pH, chromaticity, and suspended solids concentration.
4. The sewage treatment water quality prediction method based on machine learning as claimed in claim 3, wherein the step of constructing the water quality prediction model by using the training set based on the five-fold cross validation method specifically comprises:
equally dividing the training set into 5 disjoint parts, selecting one part as a validation set, and taking the other four parts as the training set;
taking the five-day biochemical oxygen demand in the training set as the training dependent variable, taking the other inlet water quality indexes in the training set as the training independent variables, and obtaining a water quality prediction model by using a deep learning algorithm;
verifying the water quality prediction model by using the validation set to obtain an empirical error of the water quality prediction model;
re-selecting a different part as the validation set, with the other four parts as the training set, and repeating the training to obtain five water quality prediction models and their corresponding empirical errors;
and selecting the water quality prediction model with the minimum empirical error.
5. The machine learning-based sewage treatment water quality prediction method according to claim 4, further comprising:
and calculating the coefficient of determination of the water quality prediction model with the minimum empirical error; if the coefficient of determination is less than or equal to a preset value, readjusting the parameters of the water quality prediction model and repeating the five-fold cross validation until a water quality prediction model that meets expectations is obtained.
6. The sewage treatment water quality prediction method based on machine learning of claim 1, wherein the step of verifying the water quality prediction model by using the test set to obtain the optimal water quality prediction model specifically comprises:
taking the five-day biochemical oxygen demand in the test set as the actual value, and inputting the remaining inlet water quality indexes into the water quality prediction model to obtain a predicted value of the five-day biochemical oxygen demand;
calculating a generalization error and a coefficient of determination from the actual value and the predicted value of the five-day biochemical oxygen demand;
and evaluating the water quality prediction model according to the generalization error and the coefficient of determination to obtain the optimal water quality prediction model.
7. The machine learning-based sewage treatment water quality prediction method according to claim 6, further comprising:
and if the coefficient of determination is smaller than the preset value, reconstructing the water quality prediction model.
8. A sewage treatment water quality prediction system based on machine learning is characterized by comprising:
the database construction module is used for acquiring daily historical inlet water data of the sewage treatment plant and constructing an inlet water quality database;
the data dividing module is used for dividing the inlet water quality data in the inlet water quality database into a training set and a testing set;
the model construction module is used for constructing a water quality prediction model by utilizing a training set based on a five-fold cross validation method;
the verification module is used for verifying the water quality prediction model by using the test set to obtain an optimal water quality prediction model;
and the prediction module is used for inputting the target to be measured into the optimal water quality prediction model to obtain a prediction result.
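As an illustration of the outlier-removal step recited in claim 2, the sketch below computes the quartiles of each inlet water quality index and drops any daily record that exceeds an upper bound. The claims only refer to "a preset threshold value"; the bound Q3 + 1.5 × IQR used here is a common convention and is an assumption, as are the function names, field names, and sample values.

```python
import statistics

def iqr_upper_bound(values, k=1.5):
    """Upper outlier bound Q3 + k * IQR (k = 1.5 is an assumed convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 + k * (q3 - q1)

def remove_outliers(records, indicators, k=1.5):
    """Drop every daily record in which any indicator exceeds its upper bound."""
    bounds = {
        name: iqr_upper_bound([r[name] for r in records], k)
        for name in indicators
    }
    return [
        r for r in records
        if all(r[name] <= bounds[name] for name in indicators)
    ]

# Hypothetical daily records: chemical oxygen demand and total nitrogen in mg/L.
records = [
    {"cod": 200, "tn": 30}, {"cod": 210, "tn": 31},
    {"cod": 220, "tn": 32}, {"cod": 230, "tn": 33},
    {"cod": 240, "tn": 34}, {"cod": 250, "tn": 35},
    {"cod": 260, "tn": 36}, {"cod": 1000, "tn": 37},  # suspicious COD spike
]
cleaned = remove_outliers(records, ["cod", "tn"])
```

Filtering whole records, rather than blanking single values, keeps every remaining row complete for model training; whether the method handles partial records differently is not stated in the claims.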
CN202211112693.4A 2022-09-14 2022-09-14 Sewage treatment water quality prediction method and system based on machine learning Pending CN115470702A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211112693.4A CN115470702A (en) 2022-09-14 2022-09-14 Sewage treatment water quality prediction method and system based on machine learning

Publications (1)

Publication Number Publication Date
CN115470702A true CN115470702A (en) 2022-12-13

Family

ID=84333391

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211112693.4A Pending CN115470702A (en) 2022-09-14 2022-09-14 Sewage treatment water quality prediction method and system based on machine learning

Country Status (1)

Country Link
CN (1) CN115470702A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115952685A (en) * 2023-02-02 2023-04-11 淮阴工学院 Sewage treatment process soft measurement modeling method based on integrated deep learning
CN116090678A (en) * 2023-04-11 2023-05-09 北京埃睿迪硬科技有限公司 Data processing method, device and equipment
CN116433041A (en) * 2023-02-17 2023-07-14 广州珠科院工程勘察设计有限公司 Integrated treatment method and system for small-basin water ecology
CN117059201A (en) * 2023-07-26 2023-11-14 佛山市南舟智能科技有限公司 Method, device, equipment and storage medium for predicting chemical oxygen demand of sewage
CN117174198A (en) * 2023-11-02 2023-12-05 山东鸿远新材料科技股份有限公司 Automatic detection cleaning method and system based on zirconium oxychloride production

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160092794A1 (en) * 2013-06-29 2016-03-31 Emc Corporation General framework for cross-validation of machine learning algorithms using sql on distributed systems
CN110598902A (en) * 2019-08-02 2019-12-20 浙江工业大学 Water quality prediction method based on combination of support vector machine and KNN
CN111291937A (en) * 2020-02-25 2020-06-16 合肥学院 Method for predicting quality of treated sewage based on combination of support vector classification and GRU neural network
CN111639111A (en) * 2020-06-09 2020-09-08 天津大学 Water transfer engineering-oriented multi-source monitoring data deep mining and intelligent analysis method
CN111768813A (en) * 2020-07-07 2020-10-13 扬州大学 Method for predicting organic PDMS membrane-water distribution coefficient based on SW-SVM algorithm quantitative structure-activity relationship model
CN112132333A (en) * 2020-09-16 2020-12-25 安徽泽众安全科技有限公司 Short-term water quality and water quantity prediction method and system based on deep learning
CN114242156A (en) * 2021-12-17 2022-03-25 厦门大学 Real-time prediction method and system for relative abundance of pathogenic vibrios on marine micro-plastic
CN114894725A (en) * 2022-03-21 2022-08-12 重庆邮电大学 Water quality multi-parameter spectral data Stacking fusion model and water quality multi-parameter measuring method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
邓欢忠 等: "精确曝气流量控制系统在污水处理厂的应用", 给水排水, 31 December 2019 (2019-12-31), pages 51 - 54 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115952685A (en) * 2023-02-02 2023-04-11 淮阴工学院 Sewage treatment process soft measurement modeling method based on integrated deep learning
CN115952685B (en) * 2023-02-02 2023-09-29 淮阴工学院 Sewage treatment process soft measurement modeling method based on integrated deep learning
CN116433041A (en) * 2023-02-17 2023-07-14 广州珠科院工程勘察设计有限公司 Integrated treatment method and system for small-basin water ecology
CN116433041B (en) * 2023-02-17 2024-04-05 广州珠科院工程勘察设计有限公司 Integrated treatment method and system for small-basin water ecology
CN116090678A (en) * 2023-04-11 2023-05-09 北京埃睿迪硬科技有限公司 Data processing method, device and equipment
CN116090678B (en) * 2023-04-11 2023-06-02 北京埃睿迪硬科技有限公司 Data processing method, device and equipment
CN117059201A (en) * 2023-07-26 2023-11-14 佛山市南舟智能科技有限公司 Method, device, equipment and storage medium for predicting chemical oxygen demand of sewage
CN117174198A (en) * 2023-11-02 2023-12-05 山东鸿远新材料科技股份有限公司 Automatic detection cleaning method and system based on zirconium oxychloride production
CN117174198B (en) * 2023-11-02 2024-01-26 山东鸿远新材料科技股份有限公司 Automatic detection cleaning method and system based on zirconium oxychloride production

Similar Documents

Publication Publication Date Title
CN115470702A (en) Sewage treatment water quality prediction method and system based on machine learning
CN111291937A (en) Method for predicting quality of treated sewage based on combination of support vector classification and GRU neural network
Vanrolleghem et al. On-line monitoring equipment for wastewater treatment processes: state of the art
CN104376380B (en) A kind of ammonia nitrogen concentration Forecasting Methodology based on recurrence self organizing neural network
CN110186505B (en) Method for predicting standard reaching condition of rural domestic sewage treatment facility effluent based on support vector machine
CN108088974B (en) Soft measurement method for effluent nitrate nitrogen in anaerobic simultaneous denitrification methanogenesis process
CN102854296A (en) Sewage-disposal soft measurement method on basis of integrated neural network
CN103632032A (en) Effluent index online soft measurement prediction method in urban sewage treatment process
CN107402586A (en) Dissolved Oxygen concentration Control method and system based on deep neural network
CN112989704B (en) IRFM-CMNN effluent BOD concentration prediction method based on DE algorithm
CN112417765B (en) Sewage treatment process fault detection method based on improved teacher-student network model
CN111977710A (en) Industrial wastewater treatment system and method based on artificial intelligence
CN113325702B (en) Aeration control method and device
CN203772781U (en) Characteristic variable-based sewage total phosphorus measuring device
US20220316994A1 (en) A method for predicting operation effectiveness of decentralized sewage treatment facility by using support vector machine
CN115078667A (en) Industrial sewage discharge treatment on-line monitoring analysis early warning system based on internet of things technology
Pan et al. A new approach to estimating oxygen off-gas fraction and dynamic alpha factor in aeration systems using hybrid machine learning and mechanistic models
CN107665288A (en) A kind of water quality hard measurement Forecasting Methodology of COD
CN201330211Y (en) Working parameter self-optimizing simulation system for sewage treatment plant
KR101016394B1 (en) Real-time wastewater composition analyzer using a rapid microbial respiration detector, ss and ec combined sensing system and its measuring method
CN116681174A (en) Sewage treatment biochemical tank aeration quantity prediction method, system, equipment and medium
CN115754207A (en) Simulation method and system for biological sewage treatment process
CN115403226B (en) Factory network joint debugging control method, system and device for carbon source in balance system
CN107367476A (en) Assess method and system and its application in water process of the biodegradability of water
Schwarz et al. Dynamic alpha factor prediction with operating data-a machine learning approach to model oxygen transfer dynamics in activated sludge

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination