CN111310299A - Method for constructing model for predicting ozone digestion rate in PPCPs organic pollutant wastewater - Google Patents

Info

Publication number
CN111310299A
Authority
CN
China
Prior art keywords
model
ppcps
ozone
ext
organic pollutant
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911341369.8A
Other languages
Chinese (zh)
Other versions
CN111310299B (en)
Inventor
范德玲
周林军
汪贞
郭敏
王蕾
古文
刘济宁
石利利
刘明庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Institute of Environmental Sciences MEE
Original Assignee
Nanjing Institute of Environmental Sciences MEE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Institute of Environmental Sciences MEE
Priority to CN201911341369.8A
Publication of CN111310299A
Application granted
Publication of CN111310299B
Active
Anticipated expiration

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Treatment Of Water By Oxidation Or Reduction (AREA)

Abstract

A method for constructing a model that predicts the ozone digestion rate of PPCPs (pharmaceuticals and personal care products) organic pollutants in wastewater comprises the following steps: data collection; setting of training-set and validation-set sample compounds; descriptor calculation; model construction; and model characterization and evaluation. The model constructed by the method can accurately predict the ozone digestion rate of PPCPs organic pollutants in wastewater, saves manpower, material resources and time, and is simple, quick and effective. The model strictly follows the rules for QSAR model use specified by the OECD, and the mechanism of ozone digestion of PPCPs is interpreted structurally through the molecular descriptors. The method therefore helps adjust process parameters, such as the ozone dosage, in PPCPs removal processes at sewage treatment plants, and is of importance for the risk control and environmental safety of PPCPs.

Description

Method for constructing model for predicting ozone digestion rate in PPCPs organic pollutant wastewater
Technical Field
The invention belongs to the technical field of environmental protection, and particularly relates to a method for predicting the ozone digestion rate in PPCPs organic pollutant wastewater.
Background
PPCPs (pharmaceuticals and personal care products) include various prescription and non-prescription drugs, such as antibiotics, anti-inflammatory drugs, sedatives, lipid regulators, β-blockers and cytostatics, as well as cosmetics and other personal care products and their respective metabolites. PPCPs are widely used in human medicine, health care, cosmetics, aquaculture and livestock farming, and this heavy use causes them to enter the aquatic environment inevitably and continuously. Although their concentrations in the aquatic environment are low (ng/L to μg/L), they accumulate through food chains and food webs and thereby harm the ecological environment and human health.
Sewage treatment is the last barrier before PPCPs enter the environment, and its removal efficiency directly determines the exposure concentration of PPCPs in the environment and hence the environmental risk. In recent years, ozone oxidation has gradually been applied in advanced sewage treatment, either as a pretreatment that breaks down macromolecular organic matter before biochemical treatment or as an upgrade for the advanced treatment of biochemical effluent. Evaluating how efficiently ozone treatment removes PPCPs from sewage is therefore an important part of PPCPs environmental risk assessment. It also helps adjust process parameters, such as the ozone dosage, in PPCPs removal processes at sewage treatment plants, and is of importance for the risk control and environmental safety of PPCPs.
Currently, the efficiency of removing PPCPs from water by ozone can only be evaluated experimentally. Because there are many types of PPCPs, and establishing and validating analytical methods for them in wastewater is costly, time-consuming and laborious, determining the removal efficiency of every PPCP by testing is hardly feasible.
Prediction based on the reaction mechanism is a better alternative. The reaction of a chemical with ozone follows second-order kinetics, so the removal efficiency of the chemical by ozone can be predicted from the ozone reaction kinetic equation:
$$\frac{[C]_t}{[C]_0} = \exp\left(-k_{O_3}\int [O_3]\,dt - k_{\cdot OH}\int [\cdot OH]\,dt\right)$$
where $[C]_t$ is the concentration of the chemical at time t, mg/L;
$[C]_0$ is the concentration of the chemical at time 0, mg/L;
$[C]_t/[C]_0$ is the residual rate of the chemical at time t, dimensionless;
$k_{O_3}$ is the reaction rate constant of the chemical with ozone, $M^{-1}\cdot s^{-1}$;
$k_{\cdot OH}$ is the reaction rate constant of the chemical with the hydroxyl radical, $M^{-1}\cdot s^{-1}$;
$\int [O_3]\,dt$ is the cumulative ozone exposure, M·s;
$\int [\cdot OH]\,dt$ is the cumulative hydroxyl radical exposure, M·s.
The removal rate calculated from the above formula depends on four main parameters. The cumulative ozone exposure ($\int [O_3]\,dt$) and the cumulative hydroxyl radical exposure ($\int [\cdot OH]\,dt$) are related to the ozone dosage in the wastewater removal process. The ozone reaction rate constant ($k_{O_3}$) and the hydroxyl radical reaction rate constant ($k_{\cdot OH}$) are properties of the chemical itself.
From the above formula, once the ozone reaction rate constant ($k_{O_3}$) and the hydroxyl radical reaction rate constant ($k_{\cdot OH}$) of a PPCP are known, its removal efficiency can be calculated.
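As a rough illustration of how the kinetic relation can be used, the following Python sketch assumes the usual integrated second-order form, [C]t/[C]0 = exp(-(kO3·∫[O3]dt + k·OH·∫[·OH]dt)). The function names and all exposure values are hypothetical and are not taken from the source.

```python
import math


def residual_fraction(k_o3, k_oh, o3_exposure, oh_exposure):
    """Return [C]t/[C]0 under the assumed integrated second-order form.

    k_o3, k_oh: rate constants in M^-1 s^-1.
    o3_exposure, oh_exposure: time-integrated concentrations in M s.
    """
    return math.exp(-(k_o3 * o3_exposure + k_oh * oh_exposure))


def removal_efficiency(k_o3, k_oh, o3_exposure, oh_exposure):
    """Fraction of the chemical removed, i.e. 1 - [C]t/[C]0."""
    return 1.0 - residual_fraction(k_o3, k_oh, o3_exposure, oh_exposure)


if __name__ == "__main__":
    # Hypothetical inputs chosen only to exercise the functions.
    eff = removal_efficiency(k_o3=2.1e4, k_oh=5.0e9,
                             o3_exposure=1.0e-4, oh_exposure=1.0e-10)
    print(f"removal efficiency: {eff:.3f}")
```

Because both exposures scale with the ozone dose, a calculation of this kind shows how a predicted rate constant could inform the choice of ozone dosage.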
Some models for predicting $k_{O_3}$ of organic chemicals exist at home and abroad (Sairam Sudhakaran. QSAR models for oxidation of organic micropollutants in water based on ozone and hydroxyl radical rate constants and their chemical classification. Water Research, 2013, 47: 1111-), but they are broad-spectrum $k_{O_3}$ prediction models. Broad-spectrum models sacrifice prediction accuracy for a particular class of chemicals in order to accommodate most chemicals, so their $k_{O_3}$ predictions for PPCPs have low accuracy.
This study establishes a $k_{O_3}$ prediction model specifically for PPCPs, which greatly enhances applicability to PPCPs and improves the accuracy of the prediction results for PPCPs.
Disclosure of Invention
The invention provides a method for constructing a model that predicts the ozone reaction rate of PPCPs organic pollutants in wastewater; the model is used to predict and evaluate the PPCPs removal efficiency of sewage treatment plants in chemical industrial parks. The method comprises the following steps:
(1) Data collection and setting of training-set and validation-set sample compounds
Second-order ozone reaction rate constants ($k_{O_3}$, pH 8.5) of 50 PPCPs in wastewater were collected from the literature (Environ. Sci. Technol. 2013, 47, 5872-). The sample compounds were divided into a training set of 37 compounds and a validation set of 13 compounds. The training-set compounds are structurally as diverse as possible and cover as wide an activity range as possible, so that the model has a wide application range and strong predictive ability. The validation set, which lies within the descriptor space of the training set, is used to evaluate the predictive ability of the established model.
(2) Computing descriptors
The compound structures were pre-optimized with MM+ molecular mechanics in Hyperchem 7.0 and then optimized with the semi-empirical AM1 method. Based on the optimized structures, descriptors were calculated with Dragon 5.4. The 1664 calculated descriptors were preliminarily screened, i.e., constant descriptors, near-constant descriptors and 704 highly correlated molecular descriptors were removed (for each pair of descriptors with a pairwise correlation coefficient greater than 0.98, the one with the smaller correlation coefficient with the target value was removed). Principal component factor analysis (SPSS 19.0) then extracted 19 factors (cumulative contribution rate of 90%), and the descriptors with a contribution greater than 0.7 on each factor were selected, giving 66 important molecular descriptors.
X=AF+e (1)
where A = ($a_{ij}$) is the factor loading matrix, and the factor loading $a_{ij}$ is the correlation coefficient between the i-th variable and the j-th factor; F is the common factor of X, e is the specific factor of X, and X is the descriptor matrix.
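The preliminary screening described in this step can be sketched in plain Python as follows. This is only an illustration of the stated rules, not the procedure the authors ran in Dragon or SPSS: the function names are hypothetical, and the 80% cut-off used to call a descriptor "near-constant" is an assumption, since the source gives no threshold for it.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if sd_a == 0 or sd_b == 0:
        return 0.0
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (sd_a * sd_b)


def prefilter(descriptors, target, near_const_frac=0.8, r_max=0.98):
    """Return the names of the descriptors that survive the screening.

    descriptors: dict mapping descriptor name -> list of values (one per compound).
    target: list of activity values (e.g. log k_O3), one per compound.
    near_const_frac: assumed threshold; a descriptor is dropped when one value
        accounts for at least this fraction of the compounds.
    r_max: pairwise correlation above which one descriptor of the pair is dropped.
    """
    kept = {}
    for name, col in descriptors.items():
        most_common = max(col.count(v) for v in set(col))
        if most_common / len(col) >= near_const_frac:
            continue  # constant or near-constant descriptor
        kept[name] = col

    names = list(kept)
    dropped = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in dropped or b in dropped:
                continue
            if abs(pearson(kept[a], kept[b])) > r_max:
                # Drop the descriptor less correlated with the target.
                r_a = abs(pearson(kept[a], target))
                r_b = abs(pearson(kept[b], target))
                dropped.add(a if r_a < r_b else b)
    return [n for n in names if n not in dropped]
```

The factor-analysis step that follows the screening is not reproduced here, because it depends on settings of the statistics package that the source does not list.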
(3) Model construction
Variables were selected with the genetic algorithm (GA) in MobyDigs, using the following GA parameters: population size 100, mutation probability 0.5, at most 10 variables allowed in a model, leave-one-out cross-validation (LOO-CV) as the evaluation function, and default values for all other parameters. The optimal number of variables was taken as the point beyond which adding further variables had little effect on the result. Based on the selected variables, a prediction model, the GA-MLR model, was established by multiple linear regression (MLR):
logY=-15.61706-1.99793AMW+90.15638Mp-8.38954AAC-3.00276Eeig01r+181.24166JGI5-0.55738nCbH (2)
where Y represents the $k_{O_3}$ value, AMW is the average molecular weight, Mp is the average atomic polarizability, AAC is the atomic average information index, JGI5 is the topological charge index, Eeig01r is the eigenvalue of the bond energy conjugate integral matrix, and nCbH is the number of unsubstituted benzene C(sp2) atoms.
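Equation (2) can be evaluated directly once the six descriptors are known. The sketch below hard-codes the published coefficients; the names `COEFFICIENTS`, `INTERCEPT` and `predict_log_k_o3` are hypothetical, and the descriptor values in the usage example are the rounded ones listed in Example 4.

```python
# Coefficients and intercept of equation (2).
COEFFICIENTS = {
    "AMW": -1.99793,
    "Mp": 90.15638,
    "AAC": -8.38954,
    "Eeig01r": -3.00276,
    "JGI5": 181.24166,
    "nCbH": -0.55738,
}
INTERCEPT = -15.61706


def predict_log_k_o3(descriptors):
    """Return log Y from equation (2) for a dict of the six descriptor values."""
    return INTERCEPT + sum(
        coef * descriptors[name] for name, coef in COEFFICIENTS.items()
    )


if __name__ == "__main__":
    example_4 = {"AMW": 7.83, "Mp": 0.59, "AAC": 1.812,
                 "Eeig01r": 4.455, "JGI5": 0.024, "nCbH": 0}
    log_k = predict_log_k_o3(example_4)
    print(f"log k_O3 = {log_k:.2f}, k_O3 = {10 ** log_k:.2e}")  # log k_O3 is about -2.30
```

The coefficient on Mp is large, so the result is sensitive to rounding in that descriptor; full-precision Dragon output is preferable when it is available.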
(4) Characterization and evaluation of models
According to the OECD guidance on QSAR models, the constructed model requires both internal validation (goodness-of-fit and robustness assessment) and external validation (predictive ability assessment). Goodness of fit is the proportion of the total variation of the dependent variable that the model can explain. The adjusted squared correlation coefficient between the experimental and fitted values ($R^2_{adj}$) and the root mean square error (RMSE) characterize the goodness of fit of the model:
$$R^2_{adj} = 1 - \frac{n-1}{n-m-1}\cdot\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2} \quad (3)$$
$$RMSE = \sqrt{\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{n}} \quad (4)$$
where n is the number of compounds, m is the number of predictor variables, $y_i$ and $\hat{y}_i$ are the experimental and predicted values of the activity index of the i-th compound, and $\bar{y}$ is the mean of the experimental values of the compound activity index.
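A minimal sketch of the two goodness-of-fit statistics follows, assuming the common definitions in which RMSE divides the residual sum of squares by n and the adjusted R² uses the factor (n-1)/(n-m-1). The function names are hypothetical.

```python
import math


def rmse(y_obs, y_pred):
    """Root mean square error, dividing by the number of compounds n."""
    n = len(y_obs)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / n)


def r2_adjusted(y_obs, y_pred, m):
    """Coefficient of determination adjusted for m predictor variables."""
    n = len(y_obs)
    mean_obs = sum(y_obs) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - (ss_res / ss_tot) * (n - 1) / (n - m - 1)
```

With 37 training compounds and 6 descriptors, the adjustment factor is 36/30 = 1.2, so the adjusted value is noticeably lower than the unadjusted R².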
The leave-one-out cross-validation coefficient ($Q^2_{LOO}$) and the bootstrapping coefficient ($Q^2_{BOOT}$) characterize the robustness of the model:
$$Q^2_{LOO} = 1 - \frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2} \quad (5)$$
where $\hat{y}_i$ is the value predicted for the i-th compound by a model fitted without that compound, and $\bar{y}$ is the mean of the experimental activity index values of the training-set compounds. The bootstrapping procedure left out 1/5 of the compounds in each round and was repeated 5000 times.
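The leave-one-out procedure can be sketched as follows. For brevity the sketch refits a single-descriptor straight line in each round instead of the six-descriptor MLR model of the source; `fit_line` and `q2_loo` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def q2_loo(xs, ys):
    """Leave-one-out Q2: each compound is predicted by a model fitted without it."""
    n = len(ys)
    mean_y = sum(ys) / n  # mean of the training-set experimental values
    press = 0.0           # predictive residual sum of squares
    for i in range(n):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot
```

A Q² that is much lower than the fitted R² suggests that the model depends heavily on individual compounds.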
The external validation coefficient ($Q^2_{EXT}$), $R^2_{EXT}$ and $RMSE_{EXT}$ characterize the predictive ability of the model:
$$Q^2_{EXT} = 1 - \frac{\sum_{i=1}^{n_{EXT}}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n_{EXT}}(y_i-\bar{y}_{EXT})^2} \quad (6)$$
where $n_{EXT}$ is the number of compounds in the validation set and $\bar{y}_{EXT}$ is the mean of the experimental activity index values of the validation-set compounds.
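The external validation coefficient can be computed in the same way as the other statistics. The sketch below assumes that the reference mean is the mean of the experimental values of the validation set; some definitions use the training-set mean instead, so the mean can also be passed in explicitly. The function name is hypothetical.

```python
def q2_ext(y_obs_ext, y_pred_ext, y_mean=None):
    """External Q2 for a validation set.

    y_mean defaults to the mean of the validation-set experimental values
    (an assumption); pass the training-set mean to use the other convention.
    """
    if y_mean is None:
        y_mean = sum(y_obs_ext) / len(y_obs_ext)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs_ext, y_pred_ext))
    ss_tot = sum((o - y_mean) ** 2 for o in y_obs_ext)
    return 1.0 - ss_res / ss_tot
```

A value of 0 means the model predicts the validation compounds no better than the reference mean would.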
The characterization and evaluation parameters obtained for the model are:
$n_{tr}$ = 37, $R^2_{adj}$ = 0.709, $Q^2_{LOO}$ = 0.658, $Q^2_{BOOT}$ = 0.687, $RMSE_{tr}$ = 1.19
$n_{ext}$ = 13, $R^2_{ext}$ = 0.628, $Q^2_{ext}$ = 0.604, $RMSE_{ext}$ = 0.928
where $n_{tr}$ and $n_{ext}$ are the numbers of compounds in the training set and the validation set, respectively; $R^2_{adj}$ is the coefficient of determination adjusted for degrees of freedom; RMSE is the root mean square error; $Q^2_{LOO}$ is the leave-one-out cross-validation coefficient; $Q^2_{BOOT}$ is the bootstrapping validation coefficient; $R^2_{ext}$ is the squared correlation coefficient between the experimental and predicted values of the validation set; $Q^2_{ext}$ is the external validation coefficient; and $RMSE_{ext}$ is the root mean square error of the validation set.
Golbraikh et al. proposed that an acceptable QSAR model should have $Q^2$ > 0.50 and $R^2$ > 0.60. By these criteria, the model has good predictive ability.
Further, for an unknown compound, the molecular structure is input and optimized, the six descriptors AMW, Mp, AAC, JGI5, Eeig01r and nCbH are calculated with Dragon, and the predicted $k_{O_3}$ of the unknown compound is obtained from the prediction model.
Further, the method is applicable to predicting the ozone digestion rates of the compounds metronidazole and ritalinic acid.
Further, the mechanism behind the model is interpreted as follows. Based on the Dragon descriptors, a quantitative prediction model of the ozone digestion rate constants of PPCPs organic chemicals was constructed by principal component analysis and the genetic algorithm-multiple linear regression (GA-MLR) algorithm. The results show that the average molecular weight (AMW), average atomic polarizability (Mp), atomic average information index (AAC), topological charge index (JGI5), eigenvalue of the bond energy conjugate integral matrix (Eeig01r) and number of unsubstituted benzene C atoms (nCbH) significantly influence the ozone digestion rate of PPCPs compounds, as shown in Table 1. AMW, AAC, Eeig01r and nCbH are negatively correlated with the ozone digestion rate: the larger these values are, the lower the ozone digestion rate. Mp is positively correlated with the ozone digestion rate: under an external electric field, the larger the relative deformation of the bonding electron cloud and the smaller the electronegativity, the more strongly the bond is polarized and the greater the ozone digestion rate. JGI5 is the mean topological charge index of order 5 of the compound; the larger its value, the greater the ozone digestion rate.
TABLE 1. Six parameters in the regression model, their physicochemical meanings and their coefficients in the model
Variable | Meaning of descriptor | Regression coefficient | Deviation of regression coefficient | Standardized regression coefficient
Constant | | -15.61706 | 7.96968 |
AMW | Average molecular weight | -1.99793 | 0.34868 | -1.86685
Mp | Average atomic polarizability | 90.15638 | 17.43372 | 1.75023
AAC | Atomic average information index | -8.38954 | 1.81277 | -0.78161
EEig01r | Eigenvalue of bond energy conjugate integral matrix | -3.00276 | 1.08265 | -0.29843
JGI5 | Mean topological charge index of order 5 | 181.24166 | 26.98228 | 0.7957
nCbH | Number of unsubstituted benzene C(sp2) atoms | -0.55738 | 0.14196 | -0.6387
The beneficial effects of the invention are as follows. The model constructed by the method can accurately predict the ozone digestion rate of PPCPs organic pollutants in wastewater, enhances the applicability of ozone digestion rate prediction to PPCPs in wastewater, improves the accuracy of the prediction results, saves manpower, material resources and time, and is simple, quick and effective. The model follows the rules for QSAR model use specified by the OECD and interprets the ozone digestion mechanism through the structure reflected in the molecular descriptors. It therefore helps adjust process parameters, such as the ozone dosage, in PPCPs removal processes at sewage treatment plants, and is of importance for the risk control and environmental safety of PPCPs.
Drawings
FIG. 1 is a Williams plot characterizing the application domain of the ozone digestion rate model for PPCPs chemicals.
FIG. 2 is a plot characterizing the application domain of the model based on Euclidean distance.
FIG. 3 is a plot of predicted versus experimental values for the MLR model of the ozone digestion rates of PPCPs chemicals.
Detailed Description
The invention will be better understood from the following examples. However, those skilled in the art will readily appreciate that the description of the embodiments is only for illustrating the present invention and should not be taken as limiting the invention as detailed in the claims.
Example 1
The specific steps of constructing the model for predicting the ozone digestion rate in the PPCPs organic pollutant wastewater are as follows:
(1) Data collection and setting of training-set and validation-set sample compounds
Second-order ozone reaction rate constants ($k_{O_3}$, pH 8.5) of 50 PPCPs in wastewater were collected from the literature (Environ. Sci. Technol. 2013, 47, 5872-). The training set contained 37 sample compounds in total, and the validation set contained 13.
(2) Computing descriptors
The compound structures were pre-optimized with MM+ molecular mechanics in Hyperchem 7.0 and then optimized with the semi-empirical AM1 method. Based on the optimized structures, descriptors were calculated with Dragon 5.4; the 1664 calculated descriptors were preliminarily screened, and 66 molecular descriptors were obtained by principal component factor analysis (SPSS 19.0).
X=AF+e (1)
where A = ($a_{ij}$), and the factor loading $a_{ij}$ is the correlation coefficient between the i-th variable and the j-th factor.
(3) Model construction
Variables were selected with the genetic algorithm (GA) in MobyDigs, and a prediction model, the GA-MLR model, was established from the selected variables by multiple linear regression (MLR):
logY=-15.61706-1.99793AMW+90.15638Mp-8.38954AAC-3.00276Eeig01r+181.24166JGI5-0.55738nCbH (2)
where Y represents the $k_{O_3}$ value, AMW is the average molecular weight, Mp is the average atomic polarizability, AAC is the atomic average information index, JGI5 is the topological charge index, Eeig01r is the eigenvalue of the bond energy conjugate integral matrix, and nCbH is the number of unsubstituted benzene C(sp2) atoms.
(4) Characterization and evaluation of models
According to the OECD guidance on QSAR models, the constructed model requires both internal validation (goodness-of-fit and robustness assessment) and external validation (predictive ability assessment). The adjusted squared correlation coefficient between the experimental and fitted values ($R^2_{adj}$) and the root mean square error (RMSE) characterize the goodness of fit of the model:
$$R^2_{adj} = 1 - \frac{n-1}{n-m-1}\cdot\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2} \quad (3)$$
$$RMSE = \sqrt{\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{n}} \quad (4)$$
where n is the number of compounds, m is the number of predictor variables, $y_i$ and $\hat{y}_i$ are the experimental and predicted values of the activity index of the i-th compound, and $\bar{y}$ is the mean of the experimental values of the compound activity index.
The leave-one-out cross-validation coefficient ($Q^2_{LOO}$) and the bootstrapping coefficient ($Q^2_{BOOT}$) characterize the robustness of the model:
$$Q^2_{LOO} = 1 - \frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2} \quad (5)$$
where $\hat{y}_i$ is the value predicted for the i-th compound by a model fitted without that compound, and $\bar{y}$ is the mean of the experimental activity index values of the training-set compounds. The bootstrapping procedure left out 1/5 of the compounds in each round and was repeated 5000 times.
The external validation coefficient ($Q^2_{EXT}$), $R^2_{EXT}$ and $RMSE_{EXT}$ characterize the predictive ability of the model:
$$Q^2_{EXT} = 1 - \frac{\sum_{i=1}^{n_{EXT}}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n_{EXT}}(y_i-\bar{y}_{EXT})^2} \quad (6)$$
where $n_{EXT}$ is the number of compounds in the validation set and $\bar{y}_{EXT}$ is the mean of the experimental activity index values of the validation-set compounds.
The characterization and evaluation parameters obtained for the model are:
$n_{tr}$ = 37, $R^2_{adj}$ = 0.709, $Q^2_{LOO}$ = 0.658, $Q^2_{BOOT}$ = 0.687, $RMSE_{tr}$ = 1.19
$n_{ext}$ = 13, $R^2_{ext}$ = 0.628, $Q^2_{ext}$ = 0.604, $RMSE_{ext}$ = 0.928
where $n_{tr}$ and $n_{ext}$ are the numbers of compounds in the training set and the validation set, respectively; $R^2_{adj}$ is the coefficient of determination adjusted for degrees of freedom; RMSE is the root mean square error; $Q^2_{LOO}$ is the leave-one-out cross-validation coefficient; $Q^2_{BOOT}$ is the bootstrapping validation coefficient; $R^2_{ext}$ is the squared correlation coefficient between the experimental and predicted values of the validation set; $Q^2_{ext}$ is the external validation coefficient; and $RMSE_{ext}$ is the root mean square error of the validation set.
The results show that the model has good predictive ability and robustness.
Example 2
This example characterizes the application domain of the prediction model. The application domain is defined by the Euclidean distance method and by a leverage-based Williams plot. The Euclidean distance was calculated with AMBIT Discovery v0.04 (http://ambit.sourceforge.net/download_ambitdiscovery.html) as:
$$d(x,\mu) = \sqrt{\sum_{j}(x_j-\mu_j)^2} \quad (7)$$
where μ is the mean of the descriptors x.
The Williams plot is constructed from the standardized residual (δ) and the leverage value ($h_i$, where i denotes the compound). δ is calculated as:
$$\delta = \frac{y_i-\hat{y}_i}{\sqrt{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2/(n-k-1)}} \quad (8)$$
The leverage ($h_i$) of a training-set compound is given by:
$$h_i = x_i^T (X^T X)^{-1} x_i \quad (9)$$
where $x_i$ is the molecular structure descriptor vector of the i-th compound and X is the descriptor matrix. The warning value ($h^*$) is defined as:
$$h^* = 3(k+1)/n \quad (10)$$
where k is the number of descriptors and n is the number of training-set compounds.
The application domain characterization results are shown in FIG. 1 and FIG. 2. In FIG. 1, $h^*$ = 3(k+1)/n = 3(6+1)/37 = 0.568. The ordinate of the Williams plot is the standardized residual between the experimental and predicted values and reflects how far a prediction deviates from the experimental value; a compound whose standardized residual δ has an absolute value greater than 3.0 is considered an outlier. The abscissa is the $h_i$ value of each compound. An $h_i$ above the warning value ($h^*$ = 0.568) indicates that the substructures of the substance are poorly represented in the training set and that the compound has a significant influence on the model prediction.
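The leverage and warning value used in the Williams plot can be sketched in plain Python as follows. All function names are hypothetical. Whether the descriptor matrix X carries a leading column of ones for the intercept is not stated in the source; the usage data below include one as an assumption.

```python
def warning_leverage(k, n):
    """Warning value h* = 3(k + 1)/n for k descriptors and n training compounds."""
    return 3 * (k + 1) / n


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def leverages(rows):
    """Return h_i = x_i^T (X^T X)^-1 x_i for every row x_i of X."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    result = []
    for x in rows:
        z = solve(xtx, x)  # z = (X^T X)^-1 x
        result.append(sum(a * b for a, b in zip(x, z)))
    return result


def williams_flags(h, delta, h_star):
    """Flag a compound by leverage and by standardized residual (|delta| > 3)."""
    return {"high_leverage": h > h_star, "outlier": abs(delta) > 3.0}


if __name__ == "__main__":
    print(round(warning_leverage(6, 37), 3))  # 0.568, as in FIG. 1
    x_matrix = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
    print(leverages(x_matrix))
```

A compound with high leverage but a small residual still fits the model, which is the distinction the two flags are meant to keep separate.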
As can be seen, the leverage value h of one compound exceeds the warning leverage $h^*$, which indicates that the structure of this compound differs somewhat from those of the other training-set compounds; its standardized residual nevertheless lies within (-3, +3), indicating that the model is applicable to carbamazepine (CAS: 298-46-4). The standardized residuals of oxazepam (CAS: 604-75-1) and levetiracetam (CAS: 102767-28-2) fall outside (-3, +3); both are outliers, indicating that the model is not suitable for predicting these two substances. The log $k_{O_3}$ values of the remaining compounds can be predicted well.
Figure BDA0002332358580000091
The application domain of the model was also characterized by the Euclidean distance method; FIG. 2 shows the Euclidean distance plot. The Euclidean distances from the feature vectors of the training-set compounds to the feature vector of the central point range from 0.132 to 1.088, so the model is applicable to compounds whose Euclidean distance does not exceed 1.088. The Euclidean distances of all validation-set compounds are smaller than 1.088, so they all lie within the application domain of the model.
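The distance-based domain check can be sketched as follows. The function names are hypothetical, and the sketch assumes the descriptors have already been scaled in the same way as in the source software, which the text does not describe. By default the threshold is the largest training-set distance, mirroring the 1.088 limit above.

```python
import math


def centroid(rows):
    """Column-wise mean of the training-set descriptor vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def euclidean_distance(x, mu):
    """Euclidean distance between a descriptor vector and the central point."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, mu)))


def in_domain(x, training_rows, threshold=None):
    """True if x is no farther from the centroid than the threshold.

    threshold defaults to the largest distance found in the training set.
    """
    mu = centroid(training_rows)
    if threshold is None:
        threshold = max(euclidean_distance(r, mu) for r in training_rows)
    return euclidean_distance(x, mu) <= threshold
```

For the two worked examples, the reported distances of 0.532 and 0.427 would both pass a threshold of 1.088.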
Example 3
The model constructed in Example 1 was used to predict the ozone digestion rates of 50 PPCPs organic pollutants in wastewater; the results are shown in Table 2. The $R^2_{adj}$ of the model is 0.709, indicating good fitting ability. $Q^2_{LOO}$ = 0.658 and $Q^2_{BOOT}$ = 0.687, indicating good robustness. $R^2_{ext}$ = 0.628 and $Q^2_{ext}$ = 0.604; according to Golbraikh et al., an acceptable QSAR model should have $Q^2$ > 0.50 and $R^2$ > 0.60. The results show that the model has good predictive ability and can be applied to compounds outside the training set. FIG. 3 plots the predicted against the experimental values of the GA-MLR model of the ozone digestion rates of PPCPs chemicals. As FIG. 3 shows, the predicted and experimental values agree well for most substances; only oxazepam (CAS: 604-75-1) and levetiracetam (CAS: 102767-28-2) show large deviations between experimental and predicted values and are poorly predicted.
Example 4
Using the model constructed in Example 1, the ozone digestion rate of metronidazole (SMILES: O + ] ([ O- ])/C ═ C (NCCSCC1 ═ C (O1) cn (C)/NC) was predicted. First, the six descriptors AMW, Mp, AAC, Eeig01r, JGI5 and nCbH were calculated with Dragon from the molecular structure of the chemical; their values are 7.83, 0.59, 1.812, 4.455, 0.024 and 0, respectively. The hat (leverage) value is 0.505 and the Euclidean distance is 0.532, both within the application domain of the model. Substituting the descriptor values into the established model gives:
logY=-15.61706-1.99793*(7.83)+90.15638*(0.59)-8.38954*(1.812)-3.00276*(4.455)+181.24166*(0.024)-0.55738*(0)≈-2.29
The predicted ozone digestion rate (k) of metronidazole is <1 and the experimental value is <1, so the prediction is close to the experimental determination.
Example 5
Using the model constructed in Example 1, the ozone digestion rate of ritalinic acid (SMILES: O ═ C (O) C (C2CCCCN2) C1 ═ CC ═ C1) was predicted. First, the six descriptors AMW, Mp, AAC, Eeig01r, JGI5 and nCbH were calculated with Dragon from the molecular structure of the chemical; their values are 6.65, 0.8, 1.692, 4.31, 0.025 and 6, respectively. The hat (leverage) value is 0.213 and the Euclidean distance is 0.427, both within the application domain of the model, so the model can be used to predict the ozone digestion rate of ritalinic acid. Substituting the descriptor values into the established model gives:
logY=-15.61706-1.99793*(12.06)+90.15638*(0.8)-8.38954*(1.692)-3.00276*(4.31)+181.24166*(0.025)-0.55738*(6)=3.37
The predicted ozone digestion rate (k) of ritalinic acid is $4.47 \times 10^{3}$ and the experimental value is $2.1 \times 10^{4}$, so the prediction is close to the experimental measurement.
Table 2. Experimental and predicted ozone digestion rates of 50 PPCPs organic chemicals in wastewater
[Table 2 images: Figure BDA0002332358580000101, Figure BDA0002332358580000111]

Claims (7)

  1. The method for constructing the model for predicting the ozone digestion rate in the PPCPs organic pollutant wastewater is characterized by comprising the following steps of:
    step one, data collection, setting training set and verification set sample compounds;
    step two, calculating a descriptor;
    step three, constructing a model;
    and step four, characterizing and evaluating the model.
  2. The method for constructing the model for predicting the ozone digestion rate in PPCPs organic pollutant wastewater according to claim 1, wherein the data in step one are second-order ozone reaction rate constants $k_{O_3}$ of 50 PPCPs in wastewater at pH 8.5, 37 sample compounds are selected for the training set, and 13 sample compounds are selected for the validation set.
  3. The method for constructing the model for predicting the ozone digestion rate in PPCPs organic pollutant wastewater according to claim 1, wherein in step two, the compound structures are pre-optimized with MM+ molecular mechanics in Hyperchem 7.0 and optimized with the semi-empirical AM1 method; based on the optimized structures, descriptors are calculated with Dragon 5.4; the 1664 calculated descriptors are preliminarily screened, i.e., constant descriptors, near-constant descriptors and 704 highly correlated molecular descriptors are removed; 19 factors with a cumulative contribution rate of 90% are extracted by principal component factor analysis in SPSS 19.0; and the descriptors with a contribution greater than 0.7 on each factor are selected, giving 66 important molecular descriptors:
    X=AF+e (1)
    where A = ($a_{ij}$) is the factor loading matrix, and the factor loading $a_{ij}$ is the correlation coefficient between the i-th variable and the j-th factor; F is the common factor of X, e is the specific factor of X, and X is the descriptor matrix.
  4. The method for constructing the model for predicting the ozone digestion rate in PPCPs organic pollutant wastewater according to claim 1, wherein in step three, variables are selected with the genetic algorithm (GA) in MobyDigs, using the following GA parameters: population size 100, mutation probability 0.5, at most 10 variables allowed in a model, leave-one-out cross-validation (LOO-CV) as the evaluation function, and default values for all other parameters; the optimal number of variables is reached when adding further variables has little effect on the result; based on the selected variables, a prediction model is established by multiple linear regression, and the 6 selected molecular descriptors and the model are as follows:
    GA-MLR linear equation:
    logY=-15.61706-1.99793AMW+90.15638Mp-8.38954AAC-3.00276Eeig01r+181.24166JGI5-0.55738nCbH (2)
    where Y represents the $k_{O_3}$ value, AMW is the average molecular weight, Mp is the average atomic polarizability, AAC is the atomic average information index, JGI5 is the topological charge index, Eeig01r is the eigenvalue of the bond energy conjugate integral matrix, and nCbH is the number of unsubstituted benzene C(sp2) atoms.
  5. The method for constructing the model for predicting the ozone digestion rate in PPCPs organic pollutant wastewater according to claim 1, wherein in step four, the adjusted squared correlation coefficient between the experimental and fitted values $R^2_{adj}$ and the root mean square error RMSE characterize the goodness of fit of the model:
    $$R^2_{adj} = 1 - \frac{n-1}{n-m-1}\cdot\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2} \quad (3)$$
    $$RMSE = \sqrt{\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{n}} \quad (4)$$
    where n is the number of compounds, m is the number of predictor variables, $y_i$ and $\hat{y}_i$ are the experimental and predicted values of the activity index of the i-th compound, and $\bar{y}$ is the mean of the experimental values of the compound activity index;
    the leave-one-out cross-validation coefficient $Q^2_{LOO}$ and the bootstrapping coefficient $Q^2_{BOOT}$ characterize the robustness of the model:
    $$Q^2_{LOO} = 1 - \frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2} \quad (5)$$
    where $\bar{y}$ is the mean of the experimental activity index values of the training-set compounds; the bootstrapping procedure leaves out 1/5 of the compounds in each round and is repeated 5000 times;
    the external validation coefficient $Q^2_{EXT}$, $R^2_{EXT}$ and $RMSE_{EXT}$ characterize the predictive ability of the model:
    $$Q^2_{EXT} = 1 - \frac{\sum_{i=1}^{n_{EXT}}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n_{EXT}}(y_i-\bar{y}_{EXT})^2} \quad (6)$$
    where $n_{EXT}$ is the number of compounds in the validation set and $\bar{y}_{EXT}$ is the mean of the experimental activity index values of the validation-set compounds;
    the characterization and evaluation parameters obtained for the model are:
    $n_{tr}$ = 37, $R^2_{adj}$ = 0.709, $Q^2_{LOO}$ = 0.658, $Q^2_{BOOT}$ = 0.687, $RMSE_{tr}$ = 1.19;
    $n_{ext}$ = 13, $R^2_{ext}$ = 0.628, $Q^2_{ext}$ = 0.604, $RMSE_{ext}$ = 0.928
    where $R^2_{adj}$ is the coefficient of determination adjusted for degrees of freedom; RMSE is the root mean square error; $Q^2_{LOO}$ is the leave-one-out cross-validation coefficient; $Q^2_{BOOT}$ is the bootstrapping validation coefficient; $R^2_{ext}$ is the squared correlation coefficient between the experimental and predicted values of the validation set; $Q^2_{ext}$ is the external validation coefficient; and $RMSE_{ext}$ is the root mean square error of the validation set.
  6. The method for constructing the model for predicting the ozone digestion rate in the PPCPs organic pollutant wastewater as claimed in claim 4, wherein the molecular structure of an unknown compound is input and optimized, Dragon software is used to calculate the six descriptors AMW, Mp, AAC, JGI5, Eeig01r and nCbH, and the predicted k_O3 value of the unknown compound is obtained using the prediction model.
  7. The method for constructing the model for predicting the digestion rate of the ozone in the PPCPs organic pollutant wastewater as claimed in claim 6, characterized in that the method is used to predict the ozone digestion rate of the compounds metronidazole and linalooc acid.
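As a rough illustration of claims 4 to 6, the sketch below evaluates equation (2) for one compound and computes goodness-of-fit statistics of the kind named in claim 5. The intercept and coefficients are copied from equation (2). The function names (`predict_log_k`, `rmse`, `adjusted_r2`, `q2`) are hypothetical, and the statistic formulas are the common textbook definitions, assumed here because the claim's own equations survive only as images. Descriptor values would come from Dragon; none are supplied here.

```python
import math

# Intercept and coefficients copied from equation (2) in claim 4.
INTERCEPT = -15.61706
COEFFS = {
    "AMW": -1.99793,
    "Mp": 90.15638,
    "AAC": -8.38954,
    "Eeig01r": -3.00276,
    "JGI5": 181.24166,
    "nCbH": -0.55738,
}


def predict_log_k(descriptors):
    """Evaluate equation (2) and return log Y for one compound.

    `descriptors` maps each of the six descriptor names to its value.
    """
    missing = set(COEFFS) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptors: {sorted(missing)}")
    return INTERCEPT + sum(c * descriptors[name] for name, c in COEFFS.items())


def rmse(y_obs, y_pred):
    """Root mean square error. Dividing by n is an assumption."""
    n = len(y_obs)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / n)


def q2(y_obs, y_pred, y_mean):
    """1 - SSE/SST, with the reference mean chosen by the caller.

    Which mean applies (training or validation set) is left to the caller,
    since the source equations are not available as text.
    """
    sse = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    sst = sum((o - y_mean) ** 2 for o in y_obs)
    return 1.0 - sse / sst


def adjusted_r2(y_obs, y_pred, m):
    """R^2 adjusted for degrees of freedom (assumed textbook form).

    n is the number of compounds and m the number of predictor variables.
    """
    n = len(y_obs)
    r2 = q2(y_obs, y_pred, sum(y_obs) / n)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - m - 1)
```

With leave-one-out predictions in `y_pred` and the training-set mean as `y_mean`, `q2` gives a Q²_LOO-style statistic; the reference mean used for Q²_EXT is not recoverable from this text, so it is left as a parameter.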
CN201911341369.8A 2019-12-24 2019-12-24 Method for constructing predictive model of ozone digestion rate in PPCPs organic pollutant wastewater Active CN111310299B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911341369.8A CN111310299B (en) 2019-12-24 2019-12-24 Method for constructing predictive model of ozone digestion rate in PPCPs organic pollutant wastewater

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911341369.8A CN111310299B (en) 2019-12-24 2019-12-24 Method for constructing predictive model of ozone digestion rate in PPCPs organic pollutant wastewater

Publications (2)

Publication Number Publication Date
CN111310299A (en) 2020-06-19
CN111310299B (en) 2024-03-19

Family

ID=71161494

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911341369.8A Active CN111310299B (en) 2019-12-24 2019-12-24 Method for constructing predictive model of ozone digestion rate in PPCPs organic pollutant wastewater

Country Status (1)

Country Link
CN (1) CN111310299B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113393907A (en) * 2021-07-20 2021-09-14 西安交通大学 Construction method and device of PPCPs organic pollutant degradation rate prediction model
CN115713978A (en) * 2022-11-30 2023-02-24 南京科技职业学院 Method for predicting removal rate of new pollutants in ozone advanced treatment wastewater
CN117236528A (en) * 2023-11-15 2023-12-15 成都信息工程大学 Ozone concentration forecasting method and system based on combined model and factor screening

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102507630A (en) * 2011-11-30 2012-06-20 大连理工大学 Method for forecasting oxidation reaction rate constant of chemical substance and ozone based on molecular structure and environmental temperature

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
范德玲 et al.: "Quantitative prediction models for the reaction rate constants of organic chemicals with ozone" *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113393907A (en) * 2021-07-20 2021-09-14 西安交通大学 Construction method and device of PPCPs organic pollutant degradation rate prediction model
CN115713978A (en) * 2022-11-30 2023-02-24 南京科技职业学院 Method for predicting removal rate of new pollutants in ozone advanced treatment wastewater
CN117236528A (en) * 2023-11-15 2023-12-15 成都信息工程大学 Ozone concentration forecasting method and system based on combined model and factor screening
CN117236528B (en) * 2023-11-15 2024-01-23 成都信息工程大学 Ozone concentration forecasting method and system based on combined model and factor screening

Also Published As

Publication number Publication date
CN111310299B (en) 2024-03-19

Similar Documents

Publication Publication Date Title
CN111310299A (en) Method for constructing model for predicting ozone digestion rate in PPCPs organic pollutant wastewater
Appiani et al. On the use of hydroxyl radical kinetics to assess the number-average molecular weight of dissolved organic matter
Usman et al. Artificial intelligence-based models for the qualitative and quantitative prediction of a phytochemical compound using HPLC method
Ravi et al. Multiobjective optimization of cyclone separators using genetic algorithm
Grieu et al. Prediction of parameters characterizing the state of a pollution removal biologic process
Ashauer Post-ozonation in a municipal wastewater treatment plant improves water quality in the receiving stream
Tian et al. Quantum cascade laser imaging (LDIR) and machine learning for the identification of environmentally exposed microplastics and polymers
Holguin-Gonzalez et al. Integrating hydraulic, physicochemical and ecological models to assess the effectiveness of water quality management strategies for the River Cuenca in Ecuador
Ilyas et al. Prediction of the removal efficiency of emerging organic contaminants based on design and operational parameters of constructed wetlands
Radzevičius et al. A rapid UV/Vis spectrophotometric method for the water quality monitoring at on-farm root vegetable pack houses
Teles et al. Time series forecasting of cyanobacteria blooms in the Crestuma Reservoir (Douro River, Portugal) using artificial neural networks
CN111261238A (en) Construction method of PPCPs organic chemical mesophilic anaerobic digestion removal rate prediction model
Berkemeier et al. Accelerating models for multiphase chemical kinetics through machine learning with polynomial chaos expansion and neural networks
Ntuli et al. Designing of sampling programmes for industrial effluent monitoring
Naser et al. Simulation of low TDS and biological units of Fajr industrial wastewater Treatment plant using artificial neural network and principal component analysis hybrid method
Verlicchi et al. Quantitative and qualitative approaches for CEC prioritization when reusing reclaimed water for irrigation needs–A critical review
Nguyen et al. Estimating ammonium changes in pilot and full-scale constructed wetlands using kinetic model, linear regression, and machine learning
Rajabi et al. QSAR models for predicting aquatic toxicity of esters using genetic algorithm-multiple linear regression methods
US20060017008A1 (en) Dyed microspheres for characterization of photochemical reactor behavior
LU502703B1 (en) Method for predicting ozone digestion rate of PPCPS-Type organic chemical in wastewater treatment
Lim et al. A systematic model calibration methodology based on multiple errors minimization method for the optimal parameter estimation of ASM1
López-Kleine et al. UV-vis in situ spectrometry data mining through linear and non linear analysis methods
CN114864015A (en) Water eutrophication detection method, device, equipment and storage medium
Hamed et al. Performance simulation of H-TDS unit of FAJR industrial wastewater treatment plant using a combination of neural network and principal component analysis
Shi et al. Process modeling based on nonlinear PLS models using a prior knowledge-driven time difference method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant