CN111310299B - Method for constructing predictive model of ozone digestion rate in PPCPs organic pollutant wastewater - Google Patents
- Publication number: CN111310299B (application CN201911341369.8A)
- Authority: CN (China)
- Prior art keywords: model, ppcps, ext, ozone, compound
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
A method for constructing a prediction model of the ozone digestion rate of PPCPs organic pollutants in wastewater comprises the following steps: data collection and selection of training-set and validation-set sample compounds, descriptor calculation, model construction, and characterization and evaluation of the model. The prediction model constructed by the method can accurately predict the ozone digestion rate of PPCPs organic pollutants in wastewater, saves manpower, material resources and time, and is simple, quick and effective. It follows the OECD rules for the use of QSAR models and interprets the mechanism of ozone digestion of PPCPs in terms of molecular structure descriptors. It helps in adjusting process parameters, such as the ozone dose, of the PPCPs removal process in sewage treatment plants, and is important for the risk management and control of PPCPs and for environmental safety.
Description
Technical Field
The invention belongs to the technical field of environmental protection, and particularly relates to a method for predicting ozone digestion rate in PPCPs organic pollutant wastewater.
Background
The contamination of water environments by pharmaceuticals and personal care products (PPCPs) is of increasing concern. PPCPs include prescription and over-the-counter drugs such as antibiotics, anti-inflammatory drugs, sedatives, lipid regulators, beta-blockers and cytostatics, as well as cosmetics and other personal care products and their respective metabolites. PPCPs are widely used in human medicine, health care products, cosmetics, aquaculture, livestock farming and other industries, and their heavy use means that they inevitably and continuously enter the water environment. Although their concentrations in water environments are very low (ng/L to μg/L), they accumulate along food chains and food webs, thereby endangering the ecological environment and human health.
Sewage treatment is the last barrier before PPCPs enter the environment, and the removal efficiency of PPCPs directly determines their exposure concentration in the environment and hence their environmental risk. In recent years, ozone oxidation has been used either as a pretreatment that breaks down macromolecular organic matter before biochemical treatment or as an advanced treatment and upgrading step for biochemical effluent, and it is gradually being applied in the advanced oxidation treatment of sewage. Evaluating the removal efficiency of PPCPs by ozone treatment of sewage is therefore an important part of the environmental risk assessment of PPCPs. It also helps in adjusting process parameters, such as the ozone dose, of the PPCPs removal process in sewage treatment plants, and is important for the risk management and environmental safety of PPCPs.
At present, the removal efficiency of PPCPs by ozone in water can only be determined experimentally. Because PPCPs are numerous and the analytical methods for wastewater are costly to establish and run, time-consuming and labor-intensive, measuring the removal efficiency of every PPCP experimentally is not very feasible.
Mechanism-based prediction is a better alternative. The reaction of a chemical with ozone follows second-order kinetics, and an ozone reaction kinetic equation is established to predict the removal efficiency of the chemical by ozone:

$$\ln\frac{[C]_t}{[C]_0} = -k_{O_3}\int[O_3]\,dt - k_{\cdot OH}\int[\cdot OH]\,dt$$

where
$[C]_t$ is the concentration of the chemical at time t, mg/L;
$[C]_0$ is the concentration of the chemical at time 0, mg/L;
$[C]_t/[C]_0$ is the residual rate of the chemical at time t, dimensionless;
$k_{O_3}$ is the reaction rate of the chemical with ozone, M⁻¹·s⁻¹;
$k_{\cdot OH}$ is the reaction rate of the chemical with hydroxyl radicals, M⁻¹·s⁻¹;
$\int[O_3]\,dt$ is the cumulative exposure concentration of ozone, M·s;
$\int[\cdot OH]\,dt$ is the cumulative exposure concentration of hydroxyl radicals, M·s.
The removal rate calculated from the above formula depends on four main parameters. The cumulative ozone exposure ($\int[O_3]\,dt$) and the cumulative hydroxyl radical exposure ($\int[\cdot OH]\,dt$) are related to the ozone dose in the wastewater treatment process. The reaction rate with ozone ($k_{O_3}$) and the reaction rate with hydroxyl radicals ($k_{\cdot OH}$) are properties of the chemical itself.
It follows from the above formula that once the reaction rates of a PPCP with ozone ($k_{O_3}$) and with hydroxyl radicals ($k_{\cdot OH}$) are known, its removal efficiency can be calculated.
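As a rough illustration of this kinetic relation, the following Python sketch computes the residual fraction and the removal efficiency from the two rate constants and the two cumulative exposures. It assumes the usual second-order form ln([C]t/[C]0) = -k_O3·∫[O3]dt - k_·OH·∫[·OH]dt; the function names and the numerical inputs are hypothetical and are not taken from the patent.

```python
import math


def residual_fraction(k_o3, k_oh, o3_exposure, oh_exposure):
    """Return [C]_t / [C]_0 (dimensionless).

    k_o3, k_oh: second-order rate constants in M^-1 s^-1.
    o3_exposure, oh_exposure: cumulative exposures in M*s.
    Assumes the second-order exposure form stated in the lead-in.
    """
    return math.exp(-(k_o3 * o3_exposure + k_oh * oh_exposure))


def removal_efficiency(k_o3, k_oh, o3_exposure, oh_exposure):
    """Return the removed fraction, 1 - [C]_t / [C]_0."""
    return 1.0 - residual_fraction(k_o3, k_oh, o3_exposure, oh_exposure)


# Made-up inputs, for illustration only.
eff = removal_efficiency(k_o3=1.0e3, k_oh=5.0e9,
                         o3_exposure=1.0e-3, oh_exposure=1.0e-10)
print(f"removal efficiency: {eff:.3f}")
```

Keeping the exposures as inputs separates what depends on the treatment process (the exposures) from what depends on the chemical (the rate constants).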
Some models for predicting the $k_{O_3}$ of organic chemicals have been reported in China and abroad (Sairam Sudhakaran. QSAR models for oxidation of organic micropollutants in water based on ozone and hydroxyl radical rate constants and their chemical classification. Water Research, 2013, 47: 1111-1122), but these are broad-spectrum models rather than models specific to the $k_{O_3}$ of PPCPs. Broad-spectrum models sacrifice prediction accuracy for a particular class of chemicals in order to fit most chemicals, so their accuracy in predicting the $k_{O_3}$ of PPCPs is low.
The present study establishes a $k_{O_3}$ prediction model specific to PPCPs, which greatly improves applicability to PPCPs and the accuracy of the predicted results.
Disclosure of Invention
The invention establishes a method for constructing a model for predicting the ozone reaction rate in PPCPs organic pollutant wastewater, which is used for predicting and evaluating the PPCPs removal efficiency of a sewage treatment plant in a chemical industry park. The method comprises the following steps:
(1) Data collection and selection of training-set and validation-set sample compounds
The second-order ozone reaction rates ($k_{O_3}$, pH 8.5) of 50 PPCPs in wastewater were taken from the literature (Environ. Sci. Technol. 2013, 47, 5872-5881). The sample compounds were divided into a training set of 37 compounds and a validation set of 13 compounds. The training-set samples should be as structurally diverse as possible and cover as wide an activity range as possible, so that the model has a wide application range and strong predictive ability. The validation set, which lies within the descriptor space of the training set, is then used to evaluate the predictive ability of the constructed model.
(2) Computing descriptors
The compound structures are pre-optimized with MM+ molecular mechanics in Hyperchem 7.0 software and then optimized with the semi-empirical AM1 method. Descriptors are calculated from the optimized structures with Dragon 5.4 software. The 1664 calculated descriptors are pre-screened by removing constant terms, near-constant terms and 704 highly correlated molecular descriptors (when the correlation coefficient between two molecular descriptors is greater than 0.98, the one with the smaller correlation coefficient with the target value is removed). Principal component factor analysis (SPSS 19.0) then extracts 19 factors (cumulative contribution rate of 90%), and the descriptors with a contribution greater than 0.7 in each factor are retained, giving 66 important molecular descriptors.
X = AF + e (1)
where A = ($a_{ij}$) is the factor loading matrix; the factor loading $a_{ij}$ is the correlation coefficient between the i-th variable and the j-th factor; F is the common factor of X, e is the specific factor of X, and X is the descriptor matrix.
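The pre-screening rule described above (drop constant and near-constant descriptors, then, for each highly correlated pair, drop the one less correlated with the target) can be sketched in Python with NumPy. This is a greedy approximation written for illustration, not the procedure of Dragon or SPSS; the function name `prescreen` and the variance tolerance `var_tol` are assumptions.

```python
import numpy as np


def prescreen(X, y, var_tol=1e-8, corr_max=0.98):
    """Return sorted column indices of descriptors kept after pre-screening.

    X: (n_compounds, n_descriptors) descriptor matrix.
    y: (n_compounds,) target values (e.g. log k_O3).
    var_tol: hypothetical threshold below which a column counts as constant.
    corr_max: pairwise |r| above which two descriptors are "highly correlated".
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # 1. Drop constant and near-constant columns.
    candidates = [j for j in range(X.shape[1]) if X[:, j].std() > var_tol]

    # 2. Rank the remaining columns by |correlation| with the target.
    r_target = {j: abs(np.corrcoef(X[:, j], y)[0, 1]) for j in candidates}
    ranked = sorted(candidates, key=lambda j: -r_target[j])

    # 3. Greedy pass: keep a column only if it is not highly correlated with
    #    an already kept column, which is more correlated with the target.
    kept = []
    for j in ranked:
        if all(abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) <= corr_max for s in kept):
            kept.append(j)
    return sorted(kept)


# Small synthetic check with made-up data.
rng = np.random.default_rng(0)
a, b = rng.normal(size=30), rng.normal(size=30)
X_demo = np.column_stack([a, np.ones(30), a + 1e-3 * b, b])
y_demo = 2.0 * a + b
kept_demo = prescreen(X_demo, y_demo)
print(kept_demo)
```

Visiting descriptors in order of decreasing correlation with the target guarantees that, within any highly correlated pair, the more informative descriptor is the one that survives.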
(3) Model construction
Variable selection is carried out with the genetic algorithm (GA) in MobyDigs software. The GA parameters are: population size 100, mutation probability 0.5, maximum number of variables allowed in the model 10, and leave-one-out cross-validation (LOO-CV) as the evaluation function; all other parameters take their default values. The optimal number of variables is reached when adding further variables no longer has a significant effect on the result. Based on the selected variables, a prediction model is built by multiple linear regression (MLR), namely the GA-MLR model.
logY=-15.61706-1.99793AMW+90.15638Mp-8.38954AAC-3.00276Eeig01r+181.24166JGI5-0.55738nCbH (2)
where Y denotes the $k_{O_3}$ value, AMW the average molecular weight, Mp the mean atomic polarizability, AAC the mean information index on atomic composition, JGI5 the mean topological charge index of order 5, EEig01r the first eigenvalue of the edge adjacency matrix weighted by resonance integrals, and nCbH the number of unsubstituted benzene C(sp2) atoms.
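The GA-MLR variable selection described in step (3) can be sketched as follows. This is a minimal toy genetic algorithm written with NumPy, not the MobyDigs implementation: the selection, crossover and mutation operators, the number of generations and all function names are assumptions. Only the population size (100), mutation probability (0.5), maximum model size (10) and the leave-one-out fitness come from the text.

```python
import numpy as np


def loo_q2(X, y):
    """Leave-one-out Q2 of a least-squares model with intercept.

    Uses the exact identity e_loo_i = e_i / (1 - h_ii) for linear regression.
    """
    A = np.column_stack([np.ones(len(y)), X])
    H = A @ np.linalg.pinv(A.T @ A) @ A.T
    resid = y - H @ y
    press = np.sum((resid / (1.0 - np.diag(H))) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def ga_select(X, y, pop_size=100, p_mut=0.5, max_vars=10,
              generations=50, seed=0):
    """Return (selected column indices, LOO Q2) found by a simple GA."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]

    def random_mask():
        mask = np.zeros(n_feat, dtype=bool)
        k = rng.integers(1, min(max_vars, n_feat) + 1)
        mask[rng.choice(n_feat, size=k, replace=False)] = True
        return mask

    def fitness(mask):
        return loo_q2(X[:, mask], y)

    pop = [random_mask() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # elitism: keep the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(len(parents), size=2, replace=False)
            # Uniform crossover between two parents.
            child = np.where(rng.random(n_feat) < 0.5, parents[a], parents[b])
            if rng.random() < p_mut:            # flip one randomly chosen bit
                child[rng.integers(n_feat)] ^= True
            if 1 <= child.sum() <= max_vars:    # enforce the model-size limit
                children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return np.flatnonzero(best), fitness(best)


# Synthetic demo: only columns 2 and 5 carry signal (made-up data).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(40, 12))
y_demo = 3.0 * X_demo[:, 2] - 2.0 * X_demo[:, 5] + 0.1 * rng.normal(size=40)
sel, q2 = ga_select(X_demo, y_demo, pop_size=30, generations=15)
print(sel, round(q2, 3))
```

Because Q2 from leave-one-out penalizes overfitting only mildly, a sketch like this may keep a few irrelevant descriptors; stopping once extra variables no longer improve the result, as the text describes, is what keeps the final model small.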
(4) Characterization and evaluation of models
According to the OECD guidelines for QSAR models, the constructed model requires internal validation (goodness of fit and robustness) and external validation (predictive ability). Goodness of fit is the proportion of the total variation of the dependent variable that the model can explain. The adjusted squared correlation coefficient between experimental and fitted values ($R^2_{adj}$) and the root mean square error (RMSE) are used to characterize the goodness of fit of the model:

$$R^2_{adj} = 1 - \frac{n-1}{n-m-1}\cdot\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2}$$

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{n}}$$

where n is the number of compounds, m is the number of predictor variables, $y_i$ and $\hat{y}_i$ are the experimental and predicted values of the activity index of the i-th compound, and $\bar{y}$ is the mean of the experimental values of the compound activity index.
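The two goodness-of-fit statistics can be computed as in the following Python sketch. It assumes the common definitions (adjusted R² with n - m - 1 degrees of freedom, RMSE with n in the denominator); the patent's own formulas may differ in detail, and the function names and sample numbers are illustrative only.

```python
import numpy as np


def r2_adj(y, y_fit, m):
    """Adjusted coefficient of determination for a model with m predictors."""
    y = np.asarray(y, dtype=float)
    y_fit = np.asarray(y_fit, dtype=float)
    n = len(y)
    ss_res = np.sum((y - y_fit) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares
    return 1.0 - (ss_res / ss_tot) * (n - 1) / (n - m - 1)


def rmse(y, y_fit):
    """Root mean square error (assumes division by n)."""
    y = np.asarray(y, dtype=float)
    y_fit = np.asarray(y_fit, dtype=float)
    return float(np.sqrt(np.mean((y - y_fit) ** 2)))


# Made-up values, for illustration only.
y_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]
print(r2_adj(y_obs, y_hat, m=1), rmse(y_obs, y_hat))
```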
The leave-one-out cross-validation coefficient ($Q^2_{LOO}$) and the bootstrap coefficient ($Q^2_{BOOT}$) are used to characterize the robustness of the model:

$$Q^2_{LOO} = 1 - \frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y}_{tr})^2}$$

where $\bar{y}_{tr}$ is the mean of the experimental values of the activity index of the training-set compounds. The bootstrap method uses 1/5 cross-validation and is repeated 5000 times.
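The robustness checks can be sketched in Python as below. The leave-one-out loop refits the regression n times, each time predicting the excluded compound. The second function is only one possible reading of "Bootstrapping with 1/5 cross-validation repeated 5000 times" (a random 20% of compounds held out per repetition); the exact resampling scheme, the function names and the synthetic data are assumptions.

```python
import numpy as np


def _fit_predict(X_tr, y_tr, X_te):
    """Ordinary least squares with intercept; returns predictions for X_te."""
    A = np.column_stack([np.ones(len(X_tr)), X_tr])
    beta, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return np.column_stack([np.ones(len(X_te)), X_te]) @ beta


def q2_loo(X, y):
    """Leave-one-out cross-validated Q2."""
    n = len(y)
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        pred = _fit_predict(X[mask], y[mask], X[i:i + 1])[0]
        press += (y[i] - pred) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def q2_repeated_holdout(X, y, frac=0.2, repeats=5000, seed=0):
    """Q2 pooled over repeated random hold-outs of a fraction of compounds."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_out = max(1, int(round(frac * n)))
    press = ss = 0.0
    for _ in range(repeats):
        out = rng.choice(n, size=n_out, replace=False)
        mask = np.ones(n, dtype=bool)
        mask[out] = False
        pred = _fit_predict(X[mask], y[mask], X[out])
        press += np.sum((y[out] - pred) ** 2)
        ss += np.sum((y[out] - y[mask].mean()) ** 2)
    return 1.0 - press / ss


# Synthetic demo data (made up).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - X_demo[:, 1] + 0.05 * rng.normal(size=30)
print(q2_loo(X_demo, y_demo), q2_repeated_holdout(X_demo, y_demo, repeats=200))
```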
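The external validation coefficients $Q^2_{EXT}$ and $R^2_{EXT}$ and the external root mean square error $RMSE_{EXT}$ are used to characterize the predictive ability of the model:

$$Q^2_{EXT} = 1 - \frac{\sum_{i=1}^{n_{EXT}}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n_{EXT}}(y_i-\bar{y}_{EXT})^2}$$

$$R^2_{EXT} = \frac{\left[\sum_{i=1}^{n_{EXT}}(y_i-\bar{y}_{EXT})(\hat{y}_i-\bar{\hat{y}}_{EXT})\right]^2}{\sum_{i=1}^{n_{EXT}}(y_i-\bar{y}_{EXT})^2\,\sum_{i=1}^{n_{EXT}}(\hat{y}_i-\bar{\hat{y}}_{EXT})^2}$$

$$RMSE_{EXT} = \sqrt{\frac{\sum_{i=1}^{n_{EXT}}(y_i-\hat{y}_i)^2}{n_{EXT}}}$$

where $n_{EXT}$ is the number of compounds in the validation set, and $\bar{y}_{EXT}$ and $\bar{\hat{y}}_{EXT}$ are the means of the experimental and predicted values of the compound activity index in the validation set.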
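The external validation statistics can be computed along these lines. The sketch assumes that $Q^2_{EXT}$ is referenced to the mean of the validation-set experimental values and that $R^2_{EXT}$ is the squared Pearson correlation; other conventions exist (for example, referencing the training-set mean), so treat the definitions, the function name and the sample numbers as assumptions.

```python
import numpy as np


def external_metrics(y_ext, y_pred):
    """Return (Q2_EXT, R2_EXT, RMSE_EXT) for a validation set."""
    y = np.asarray(y_ext, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y - p) ** 2)
    # Assumption: deviations are taken from the validation-set mean.
    q2_ext = 1.0 - ss_res / np.sum((y - y.mean()) ** 2)
    # Squared Pearson correlation between experimental and predicted values.
    r2_ext = float(np.corrcoef(y, p)[0, 1] ** 2)
    rmse_ext = float(np.sqrt(ss_res / len(y)))
    return float(q2_ext), r2_ext, rmse_ext


# Made-up values, for illustration only.
q2_e, r2_e, rmse_e = external_metrics([1.0, 2.0, 3.0, 4.0],
                                      [1.2, 1.8, 3.1, 4.3])
print(q2_e, r2_e, rmse_e)
```

With these definitions $R^2_{EXT}$ can never be smaller than $Q^2_{EXT}$, because the correlation ignores any systematic offset or scaling error in the predictions while $Q^2_{EXT}$ does not.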
The characterization and evaluation parameters obtained for the model are:
$n_{tr}$ = 37, $R^2_{adj}$ = 0.709, $Q^2_{LOO}$ = 0.658, $Q^2_{BOOT}$ = 0.687, $RMSE_{tr}$ = 1.19
$n_{EXT}$ = 13, $R^2_{EXT}$ = 0.628, $Q^2_{EXT}$ = 0.604, $RMSE_{EXT}$ = 0.928
where $n_{tr}$ and $n_{EXT}$ are the numbers of compounds in the training set and the validation set, respectively, and p is the significance level. $R^2_{adj}$ is the coefficient of determination adjusted for degrees of freedom; RMSE is the root mean square error; $Q^2_{LOO}$ is the leave-one-out cross-validation coefficient; $Q^2_{BOOT}$ is the bootstrap validation coefficient; $R^2_{EXT}$ is the squared correlation coefficient between experimental and predicted values; $Q^2_{EXT}$ is the external validation coefficient of determination; and $RMSE_{EXT}$ is the root mean square error of the validation set.
According to Golbraikh et al., a QSAR model is acceptable when $Q^2$ > 0.50 and $R^2$ > 0.60. The results therefore show that the model has good predictive ability.
Furthermore, the $k_{O_3}$ of an unknown compound is predicted by inputting its molecular structure, optimizing the structure, calculating the six descriptors AMW, Mp, AAC, JGI5, EEig01r and nCbH with Dragon software, and substituting them into the prediction model.
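The prediction step for a new compound amounts to evaluating equation (2). The Python sketch below uses the coefficients of equation (2) and, as sample input, the descriptor values that the document lists for metronidazole in Example 4. It assumes that "log" means the base-10 logarithm; the dictionary keys and function name are illustrative.

```python
# Coefficients of equation (2) as given in the document.
INTERCEPT = -15.61706
COEFFS = {
    "AMW": -1.99793,
    "Mp": 90.15638,
    "AAC": -8.38954,
    "EEig01r": -3.00276,
    "JGI5": 181.24166,
    "nCbH": -0.55738,
}


def predict_log_k(descriptors):
    """Evaluate log Y = intercept + sum(coefficient * descriptor value)."""
    return INTERCEPT + sum(COEFFS[name] * descriptors[name] for name in COEFFS)


# Descriptor values listed for metronidazole in Example 4.
example = {"AMW": 7.83, "Mp": 0.59, "AAC": 1.812,
           "EEig01r": 4.455, "JGI5": 0.024, "nCbH": 0}

log_k = predict_log_k(example)
k_o3 = 10.0 ** log_k   # assumes a base-10 logarithm
print(f"log k = {log_k:.2f}, k = {k_o3:.4f}")
```

With the rounded descriptor values shown, the result is about -2.30, which agrees with the -2.29 reported in Example 4 to within rounding.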
Further, the method is applicable to predicting the ozone digestion rates of the compounds metronidazole and ritalinic acid.
Further, the model is interpreted mechanistically. Based on Dragon descriptors, a quantitative prediction model of the ozone digestion rate constants of PPCPs organic chemicals was constructed using principal component analysis and a genetic algorithm-multiple linear regression algorithm. The results show that the average molecular weight (AMW), mean atomic polarizability (Mp), mean information index on atomic composition (AAC), mean topological charge index of order 5 (JGI5), first eigenvalue of the edge adjacency matrix weighted by resonance integrals (EEig01r) and number of unsubstituted benzene C(sp2) atoms (nCbH) have a significant influence on the ozone digestion rate of PPCPs compounds, as shown in Table 1. AMW, AAC, EEig01r and nCbH are negatively correlated with the ozone digestion rate: the larger their values, the lower the ozone digestion rate. Mp is positively correlated with the ozone digestion rate: the greater the relative change of the bonding electron cloud under an external electric field, the smaller the electronegativity, the greater the polarizability of the bond and the greater the ozone digestion rate. JGI5 reflects the mean topological charge index of order 5 of the compound; the larger its value, the greater the ozone digestion rate.
TABLE 1 The six descriptors in the regression model, their physicochemical meanings and their coefficients
Variable | Descriptor meaning | Regression coefficient | Standard deviation of regression coefficient | Standardized regression coefficient |
Constant | | -15.61706 | 7.96968 | |
AMW | Average molecular weight | -1.99793 | 0.34868 | -1.86685 |
Mp | Mean atomic polarizability | 90.15638 | 17.43372 | 1.75023 |
AAC | Mean information index on atomic composition | -8.38954 | 1.81277 | -0.78161 |
EEig01r | First eigenvalue of the edge adjacency matrix weighted by resonance integrals | -3.00276 | 1.08265 | -0.29843 |
JGI5 | Mean topological charge index of order 5 | 181.24166 | 26.98228 | 0.7957 |
nCbH | Number of unsubstituted benzene C(sp2) atoms | -0.55738 | 0.14196 | -0.6387 |
The beneficial effects of the invention are as follows. The prediction model constructed by the method can accurately predict the ozone digestion rate of PPCPs organic pollutants in wastewater, improves the applicability of ozone digestion rate prediction to PPCPs wastewater and the accuracy of the predicted results, saves manpower, material resources and time, and is simple, quick and effective. It follows the OECD rules for the use of QSAR models and interprets the mechanism of ozone digestion of PPCPs in terms of molecular structure descriptors. It helps in adjusting process parameters, such as the ozone dose, of the PPCPs removal process in sewage treatment plants, and is important for the risk management and environmental safety of PPCPs.
Drawings
FIG. 1 is a Williams plot of the application domain of the ozone digestion rate model for PPCPs chemicals.
FIG. 2 is a plot of the model application domain based on Euclidean distance.
FIG. 3 is a plot of the predicted values against the experimental values for the MLR model of the ozone digestion rate of PPCPs chemicals.
Detailed Description
The invention will be better understood from the following examples. However, it will be readily appreciated by those skilled in the art that the description of the embodiments is provided for illustration only and should not limit the invention as described in detail in the claims.
Example 1
The specific steps of constructing the model for predicting the digestion rate of ozone in PPCPs organic pollutant wastewater are as follows:
(1) Data collection and selection of training-set and validation-set sample compounds
The second-order ozone reaction rates ($k_{O_3}$, pH 8.5) of 50 PPCPs in wastewater were taken from the literature (Environ. Sci. Technol. 2013, 47, 5872-5881). The training set contains 37 sample compounds and the validation set contains 13 sample compounds.
(2) Computing descriptors
The compound structures are pre-optimized with MM+ molecular mechanics in Hyperchem 7.0 software and then optimized with the semi-empirical AM1 method. Descriptors are calculated from the optimized structures with Dragon 5.4 software. The 1664 calculated descriptors are pre-screened, and 66 molecular descriptors are obtained by principal component factor analysis (SPSS 19.0).
X = AF + e (1)
where A = ($a_{ij}$) and the factor loading $a_{ij}$ is the correlation coefficient between the i-th variable and the j-th factor.
(3) Model construction
Variable selection is carried out by adopting a Genetic Algorithm (GA) in MobyDigs software, and a prediction model, namely a GA-MLR model, is established by adopting a Multiple Linear Regression (MLR) method based on the screened variable:
logY=-15.61706-1.99793AMW+90.15638Mp-8.38954AAC-3.00276Eeig01r+181.24166JGI5-0.55738nCbH (2)
where Y denotes the $k_{O_3}$ value, AMW the average molecular weight, Mp the mean atomic polarizability, AAC the mean information index on atomic composition, JGI5 the mean topological charge index of order 5, EEig01r the first eigenvalue of the edge adjacency matrix weighted by resonance integrals, and nCbH the number of unsubstituted benzene C(sp2) atoms.
(4) Characterization and evaluation of models
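According to the OECD guidelines for QSAR models, the constructed model requires internal validation (goodness of fit and robustness) and external validation (predictive ability). The adjusted squared correlation coefficient between experimental and fitted values ($R^2_{adj}$) and the root mean square error (RMSE) are used to characterize the goodness of fit of the model:

$$R^2_{adj} = 1 - \frac{n-1}{n-m-1}\cdot\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2}$$

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{n}}$$

where n is the number of compounds, m is the number of predictor variables, $y_i$ and $\hat{y}_i$ are the experimental and predicted values of the activity index of the i-th compound, and $\bar{y}$ is the mean of the experimental values of the compound activity index.
The leave-one-out cross-validation coefficient ($Q^2_{LOO}$) and the bootstrap coefficient ($Q^2_{BOOT}$) are used to characterize the robustness of the model:

$$Q^2_{LOO} = 1 - \frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y}_{tr})^2}$$

where $\bar{y}_{tr}$ is the mean of the experimental values of the activity index of the training-set compounds. The bootstrap method uses 1/5 cross-validation and is repeated 5000 times.
The external validation coefficients $Q^2_{EXT}$ and $R^2_{EXT}$ and the external root mean square error $RMSE_{EXT}$ are used to characterize the predictive ability of the model:

$$Q^2_{EXT} = 1 - \frac{\sum_{i=1}^{n_{EXT}}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n_{EXT}}(y_i-\bar{y}_{EXT})^2}$$

$$R^2_{EXT} = \frac{\left[\sum_{i=1}^{n_{EXT}}(y_i-\bar{y}_{EXT})(\hat{y}_i-\bar{\hat{y}}_{EXT})\right]^2}{\sum_{i=1}^{n_{EXT}}(y_i-\bar{y}_{EXT})^2\,\sum_{i=1}^{n_{EXT}}(\hat{y}_i-\bar{\hat{y}}_{EXT})^2}$$

$$RMSE_{EXT} = \sqrt{\frac{\sum_{i=1}^{n_{EXT}}(y_i-\hat{y}_i)^2}{n_{EXT}}}$$

where $n_{EXT}$ is the number of compounds in the validation set, and $\bar{y}_{EXT}$ and $\bar{\hat{y}}_{EXT}$ are the means of the experimental and predicted values of the compound activity index in the validation set.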
The characterization and evaluation parameters obtained for the model are:
$n_{tr}$ = 37, $R^2_{adj}$ = 0.709, $Q^2_{LOO}$ = 0.658, $Q^2_{BOOT}$ = 0.687, $RMSE_{tr}$ = 1.19
$n_{EXT}$ = 13, $R^2_{EXT}$ = 0.628, $Q^2_{EXT}$ = 0.604, $RMSE_{EXT}$ = 0.928
where $n_{tr}$ and $n_{EXT}$ are the numbers of compounds in the training set and the validation set, respectively, and p is the significance level. $R^2_{adj}$ is the coefficient of determination adjusted for degrees of freedom; RMSE is the root mean square error; $Q^2_{LOO}$ is the leave-one-out cross-validation coefficient; $Q^2_{BOOT}$ is the bootstrap validation coefficient; $R^2_{EXT}$ is the squared correlation coefficient between experimental and predicted values; $Q^2_{EXT}$ is the external validation coefficient of determination; and $RMSE_{EXT}$ is the root mean square error of the validation set.
The results show that the model has good predictive ability and robustness.
Example 2
This example characterizes the application domain of the above prediction model. The application domain is defined using the Euclidean distance method and a leverage-based Williams plot. Euclidean distances were calculated with AMBIT Discovery v0.04 software (http://ambit.sourceforge.net/download_ambitdiscovery.html). The Euclidean distance is calculated by the following formula:

$$d = \sqrt{\sum_{j}(x_j-\mu_j)^2}$$

where $\mu_j$ is the mean of the descriptor $x_j$.
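A distance-based domain check of this kind can be sketched in Python as follows. The sketch takes μ to be the mean descriptor vector of the training set and uses the largest training-set distance as the domain boundary; it applies no scaling or other preprocessing, which AMBIT Discovery may do differently. The function names and the toy data are assumptions.

```python
import numpy as np


def distances_to_centroid(X_train, X_query):
    """Euclidean distance of each query row to the training-set centroid."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    mu = X_train.mean(axis=0)                 # centroid of the training set
    return np.sqrt(((X_query - mu) ** 2).sum(axis=1))


def in_domain(X_train, X_query):
    """True where a query compound is no farther than the farthest training compound."""
    threshold = distances_to_centroid(X_train, X_train).max()
    return distances_to_centroid(X_train, X_query) <= threshold


# Toy descriptor matrices (made up).
X_tr = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
X_new = [[1.0, 1.0], [5.0, 5.0]]
print(in_domain(X_tr, X_new))
```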
The Williams plot defines the model application domain from the standardized residual (δ) and the leverage value ($h_i$, where i denotes the compound). δ is the residual between the experimental and predicted values of a compound divided by the standard deviation of the residuals.
The leverage value ($h_i$) of a training-set compound is determined by:
$h_i = x_i^T (X^T X)^{-1} x_i$ (9)
where $x_i$ is the molecular structure descriptor vector of the i-th compound. The warning value ($h^*$) is defined as:
$h^* = 3(k + 1)/n$ (10)
where k is the number of descriptors and n is the number of training-set compounds.
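Equations (9) and (10) translate directly into NumPy. The sketch below follows equation (9) literally, using the descriptor matrix X as given (whether a column of ones for the intercept is included is not stated in the text, so none is added here); the function names are illustrative.

```python
import numpy as np


def leverages(X_train, X_query=None):
    """Leverage h_i = x_i^T (X^T X)^-1 x_i for each row of X_query.

    If X_query is None, the leverages of the training compounds are returned.
    """
    X = np.asarray(X_train, dtype=float)
    Q = X if X_query is None else np.asarray(X_query, dtype=float)
    xtx_inv = np.linalg.pinv(X.T @ X)
    # Row-wise quadratic form q_i^T (X^T X)^-1 q_i.
    return np.einsum("ij,jk,ik->i", Q, xtx_inv, Q)


def warning_leverage(k, n):
    """Warning value h* = 3(k + 1) / n."""
    return 3.0 * (k + 1) / n


# Values from the document: 6 descriptors, 37 training compounds.
h_star = warning_leverage(k=6, n=37)
print(round(h_star, 3))
```

A useful sanity check is that the training-set leverages sum to the number of columns of X, since they are the diagonal of the projection (hat) matrix.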
The application domain characterization results are shown in FIG. 1 and FIG. 2. In FIG. 1, $h^*$ = 3(k+1)/n = 3(6+1)/37 = 0.568. The ordinate of the Williams plot is the standardized residual between experimental and predicted values, which characterizes the dispersion of the experimental values; a compound is considered an outlier when the absolute value of its standardized residual δ is greater than 3.0. The abscissa is the $h_i$ value of the compounds. An $h_i$ above the warning value ($h^*$ = 0.568) indicates that few of the substance's substructures occur in the training set, which can have a significant impact on model predictions.
As can be seen, the leverage value h of one compound exceeds the warning leverage value $h^*$, so its structure differs from those of the other training-set compounds, but its standardized residual lies within the (-3, +3) range, indicating that the model is applicable to carbamazepine (CAS: 298-46-4). The standardized residuals of oxazepam (CAS: 604-75-1) and levetiracetam (CAS: 102767-28-2) fall outside the (-3, +3) range; both are outliers, indicating that the model is not suitable for predicting these two substances. The log $k_{O_3}$ of the remaining compounds can be predicted well.
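The two Williams-plot criteria (leverage above $h^*$, absolute standardized residual above 3) can be applied programmatically as sketched below. The standardized residual is assumed here to be the raw residual divided by the residual standard deviation with n - k - 1 degrees of freedom, which is a common convention but is not confirmed by the text; function names and numbers are illustrative.

```python
import numpy as np


def standardized_residuals(y, y_pred, k):
    """Residuals divided by their standard deviation (n - k - 1 degrees of freedom)."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y - y_pred
    s = np.sqrt(np.sum(resid ** 2) / (len(y) - k - 1))
    return resid / s


def williams_flags(h, delta, h_star, delta_max=3.0):
    """Flag high-leverage compounds and response outliers."""
    h = np.asarray(h, dtype=float)
    delta = np.asarray(delta, dtype=float)
    return {
        "high_leverage": h > h_star,            # structurally unusual compounds
        "outlier": np.abs(delta) > delta_max,   # poorly predicted compounds
    }


# Made-up leverages and standardized residuals for two compounds.
flags = williams_flags(h=[0.1, 0.7], delta=[-3.5, 1.0], h_star=0.568)
print(flags)
```

A compound flagged only as high leverage (as the text describes for carbamazepine) is structurally unusual but still well predicted, whereas a compound flagged as an outlier is poorly predicted regardless of its leverage.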
The application domain of the model is also characterized by the Euclidean distance method; FIG. 2 is the Euclidean distance plot. The Euclidean distances from the feature vectors of the training-set compounds to the feature vector of the central point range from 0.132 to 1.088, so compounds whose Euclidean distance does not exceed 1.088 are within the scope of the model. The Euclidean distances of all validation-set compounds are less than 1.088, so they lie within the application domain of the model.
Example 3
The ozone digestion rates of the 50 PPCPs organic pollutants in wastewater were predicted with the model constructed in Example 1; the results are shown in Table 2. The model has $R^2_{adj}$ = 0.709, indicating good fitting ability, and $Q^2_{LOO}$ = 0.658 and $Q^2_{BOOT}$ = 0.687, indicating good robustness. For external validation, $R^2_{EXT}$ = 0.628 and $Q^2_{EXT}$ = 0.604; according to Golbraikh et al., a QSAR model is acceptable when $Q^2$ > 0.50 and $R^2$ > 0.60. The results show that the model has good predictive ability and can be applied successfully to compounds outside the training set. FIG. 3 plots the predicted values against the experimental values for the GA-MLR model of the ozone digestion rate of PPCPs chemicals. As FIG. 3 shows, the predicted and experimental values of most substances agree well; oxazepam (CAS: 604-75-1) and levetiracetam (CAS: 102767-28-2) show large deviations between experimental and predicted values and are poorly predicted, while the remaining substances are predicted well.
Example 4
Using the model constructed in Example 1, the ozone digestion rate of metronidazole (SMILES: o= [ n+ ] ([ O- ])/c=c (nccscc1=cc=c (O1) CN (C)/NC) was predicted. First, the six descriptors AMW, Mp, AAC, EEig01r, JGI5 and nCbH are calculated from the molecular structure of the chemical with Dragon software; they are 7.83, 0.59, 1.812, 4.455, 0.024 and 0, respectively. The hat (leverage) value is 0.505 and the Euclidean distance is 0.532, both within the application domain of the model, so the model can be used to predict the ozone digestion rate of metronidazole. Substituting the descriptor values into the model gives:
logY=-15.61706-1.99793*(7.83)+90.15638*(0.59)-8.38954*(1.812)-3.00276*(4.455)+181.24166*(0.024)-0.55738*(0)≈-2.29
The predicted ozone digestion rate (k) of metronidazole is therefore <1, and the experimental value is also <1, so the prediction is close to the experimental result.
Example 5
Using the model constructed in Example 1, the ozone digestion rate of ritalinic acid (SMILES: o=c (O) C (c2ccccn2) c1=cc=c1,) was predicted. First, the six descriptors AMW, Mp, AAC, EEig01r, JGI5 and nCbH are calculated from the molecular structure of the chemical with Dragon software; they are 6.65, 0.8, 1.692, 4.31, 0.025 and 6, respectively. The hat (leverage) value is 0.213 and the Euclidean distance is 0.427, both within the application domain of the model, so the model can be used to predict the ozone digestion rate of ritalinic acid. Substituting the descriptor values into the model gives:
logY=-15.61706-1.99793*(12.06)+90.15638*(0.8)-8.38954*(1.692)-3.00276*(4.31)+181.24166*(0.025)-0.55738*(6)=3.37
The predicted ozone digestion rate (k) of ritalinic acid is therefore $4.47\times10^{3}$, and the experimental value is $2.1\times10^{4}$, so the prediction is within one order of magnitude of the experimental result.
Table 2. Experimental and predicted values of ozone digestion rates of 50 PPCPs organic chemicals in wastewater
Claims (3)
- 1. A method for constructing a prediction model of the ozone digestion rate in PPCPs organic pollutant wastewater, characterized by comprising the following steps: step one, data collection, in which training set and validation set sample compounds are set; step two, descriptor calculation; step three, model construction; step four, characterization and evaluation of the model;
  in step one, the data are the second-order ozone reaction rate (k_O3) data of 50 PPCPs in wastewater at pH 8.5; 37 sample compounds are selected for the training set and 13 sample compounds for the validation set;
  in step two, the compound structure is pre-optimized with MM+ molecular mechanics in Hyperchem 7.0 software and then optimized with the semi-empirical AM1 method; descriptors are calculated from the optimized structure using Dragon 5.4 software; the 1664 calculated descriptors are preliminarily screened by removing constant terms, near-constant terms and 704 highly correlated molecular descriptors; principal component factor analysis in SPSS 19.0 is used to extract 19 factors whose cumulative contribution rate reaches 90%, and the descriptors with a contribution greater than 0.7 in each factor are retained, giving 66 important molecular descriptors:
  X = AF + e (1)
  wherein A = (a_ij) is the factor loading matrix, the factor loading a_ij is the correlation coefficient between the i-th variable and the j-th factor, F is the common factor of X, e is the special factor of X, and X is the descriptor matrix;
  in step three, variables are selected with the genetic algorithm (GA) in MobyDigs software, with the following GA parameters: population size 100, mutation probability 0.5, maximum number of features allowed in the model 10, evaluation function leave-one-out cross-validation (LOO-CV), and all other parameters at their default values; the optimal number of parameters is obtained when adding further variables has little effect on the result; based on the selected variables, a prediction model is established by multiple linear regression, and the 6 selected molecular descriptors and the model are as follows:
  GA-MLR linear equation:
  logY = -15.61706 - 1.99793 AMW + 90.15638 Mp - 8.38954 AAC - 3.00276 Eeig01r + 181.24166 JGI5 - 0.55738 nCbH (2)
  wherein Y represents the k_O3 value, AMW represents the average molecular weight, Mp represents the mean atomic polarizability, AAC represents the mean information index on atomic composition, JGI5 represents the topological charge index, Eeig01r represents an eigenvalue of the edge adjacency matrix weighted by resonance integrals, and nCbH represents the number of unsubstituted benzene carbon atoms;
  in step four, the goodness of fit of the model is characterized by the adjusted squared correlation coefficient between experimental and fitted values, R²adj, and the root mean square error, RMSE, wherein n represents the number of compounds, m is the number of predictor variables, y_i and ŷ_i respectively represent the experimental value and the predicted value of the activity index of the i-th compound, and ȳ is the mean of the experimental values of the compound activity index;
  the stability of the model is characterized by the leave-one-out cross-validation coefficient Q²LOO and the bootstrapping coefficient Q²BOOT, wherein the mean is that of the experimental values of the activity index of the training set compounds, and the bootstrapping method uses 1/5 cross-validation, repeated 5000 times on the training set;
  the predictive ability of the model is characterized by the external validation coefficients Q²EXT, R²EXT and RMSE_EXT, wherein n_EXT represents the number of compounds in the validation set, and the means are those of the experimental values and the predicted values of the activity index of the validation set compounds;
  the characterization and evaluation parameters obtained for the model are: n_tr = 37, R²adj = 0.709, Q²LOO = 0.658, Q²BOOT = 0.687, RMSE_tr = 1.19; n_EXT = 13, R²EXT = 0.628, Q²EXT = 0.604, RMSE_EXT = 0.928;
  wherein n_tr is the number of training set compounds, R²adj is the coefficient of determination adjusted for degrees of freedom, RMSE_tr is the root mean square error of the training set, Q²LOO is the leave-one-out cross-validation coefficient, Q²BOOT is the bootstrapping validation coefficient, R²EXT is the correlation coefficient between experimental and predicted values, Q²EXT is the external validation coefficient of determination, and RMSE_EXT is the root mean square error of the validation set.
- 2. The method for constructing an ozone digestion rate prediction model in PPCPs organic pollutant wastewater according to claim 1, wherein, for an unknown compound, the molecular structure is input and optimized, the 6 descriptors AMW, Mp, AAC, JGI5, Eeig01r and nCbH are calculated with Dragon software, and the prediction model is used to obtain the predicted k_O3 value of the unknown compound.
- 3. The method for constructing the prediction model of the ozone digestion rate in PPCPs organic pollutant wastewater, characterized in that the method is used to predict the ozone digestion rates of the compounds metronidazole and Li Tailin acid.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911341369.8A CN111310299B (en) | 2019-12-24 | 2019-12-24 | Method for constructing predictive model of ozone digestion rate in PPCPs organic pollutant wastewater |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111310299A | 2020-06-19 |
CN111310299B | 2024-03-19 |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113393907B (en) * | 2021-07-20 | 2023-05-02 | 西安交通大学 | PPCPs organic pollutant degradation rate prediction model construction method and device |
CN117236528B (en) * | 2023-11-15 | 2024-01-23 | 成都信息工程大学 | Ozone concentration forecasting method and system based on combined model and factor screening |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102507630A (en) * | 2011-11-30 | 2012-06-20 | 大连理工大学 | Method for forecasting oxidation reaction rate constant of chemical substance and ozone based on molecular structure and environmental temperature |
Non-Patent Citations (1)
Title |
---|
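Fan Deling et al. "Quantitative prediction model for the reaction rate constants of organic chemicals with ozone". Journal of Ecology and Rural Environment (《生态与农村环境学报》). 2019, Vol. 35 (No. 9), pp. 1214-1218. * |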
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||