CN111310299B - Method for constructing predictive model of ozone digestion rate in PPCPs organic pollutant wastewater - Google Patents


Info

Publication number
CN111310299B
Authority
CN
China
Legal status
Active
Application number
CN201911341369.8A
Other languages
Chinese (zh)
Other versions
CN111310299A (en)
Inventor
范德玲
周林军
汪贞
郭敏
王蕾
古文
刘济宁
石利利
刘明庆
Current Assignee
Nanjing Institute of Environmental Sciences MEE
Original Assignee
Nanjing Institute of Environmental Sciences MEE
Priority date
2019-12-24
Filing date
2019-12-24
Publication date
2024-03-19
Application filed by Nanjing Institute of Environmental Sciences MEE
Priority to CN201911341369.8A
Publication of CN111310299A
Application granted
Publication of CN111310299B
Status: Active
Anticipated expiration


Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Treatment Of Water By Oxidation Or Reduction (AREA)

Abstract

A method for constructing a prediction model of the ozone digestion rate in PPCPs organic pollutant wastewater comprises the following steps: data collection and assignment of sample compounds to a training set and a validation set; descriptor calculation; model construction; and characterization and evaluation of the model. The model constructed by the method can accurately predict the ozone digestion rate of PPCPs organic pollutants in wastewater, saves manpower, material resources and time, and is simple, quick and effective. Developed in strict accordance with the OECD principles for QSAR models, it interprets the mechanism of ozone degradation of PPCPs in terms of molecular descriptors. It helps sewage treatment plants adjust process parameters such as the ozone dosage used to remove PPCPs, and is of significance for the risk management and control of PPCPs and for environmental safety.

Description

Method for constructing predictive model of ozone digestion rate in PPCPs organic pollutant wastewater
Technical Field
The invention belongs to the technical field of environmental protection, and particularly relates to a method for predicting the ozone digestion rate in PPCPs organic pollutant wastewater.
Background
The contamination of aquatic environments by pharmaceuticals and personal care products (PPCPs) is of increasing concern. PPCPs include prescription and over-the-counter drugs such as antibiotics, anti-inflammatory drugs, sedatives, lipid regulators, beta-blockers and cytostatics, as well as cosmetics and other personal care products, together with their metabolites. PPCPs are widely used in human medicine, health care products, cosmetics, aquaculture and livestock breeding, and their large-scale use causes them to enter the aquatic environment inevitably and continuously. Although their concentrations in water are very low (ng/L to μg/L), they can accumulate through food chains and food webs, thereby endangering the ecological environment and human health.
Sewage treatment is the last barrier before PPCPs enter the environment, and the efficiency with which PPCPs are removed directly determines their exposure concentrations in the environment and hence their environmental risk. In recent years, ozone oxidation has been used as a pretreatment to break down macromolecular organic matter before biological treatment, or as an advanced treatment and upgrading step for biologically treated effluent, and it is increasingly applied in the advanced oxidation treatment of sewage. Evaluating how efficiently ozone treatment removes PPCPs from sewage is therefore an important part of the environmental risk assessment of PPCPs; it also helps sewage treatment plants adjust process parameters such as the ozone dosage, and is of significance for the risk management of PPCPs and for environmental safety.
At present, the efficiency of ozone in removing PPCPs from water can only be determined experimentally. Because there are many kinds of PPCPs, and establishing and running analytical methods for wastewater is costly, time-consuming and labor-intensive, measuring the removal efficiency of PPCPs experimentally is not very feasible.
Mechanism-based prediction is a better alternative. The reaction of a chemical with ozone follows second-order kinetics, so the removal efficiency of the chemical during ozonation can be predicted from the ozone reaction kinetic equation:
$$\ln\frac{[C]_t}{[C]_0} = -k_{O_3}\int[O_3]\,dt - k_{\cdot OH}\int[\cdot OH]\,dt$$
where [C]_t is the concentration of the chemical at time t, mg/L;
[C]_0 is the concentration of the chemical at time 0, mg/L;
[C]_t/[C]_0 is the residual fraction of the chemical at time t, dimensionless;
k_O3 is the second-order rate constant for the reaction of the chemical with ozone, M⁻¹·s⁻¹;
k_·OH is the second-order rate constant for the reaction of the chemical with hydroxyl radicals, M⁻¹·s⁻¹;
∫[O3]dt is the cumulative ozone exposure, M·s;
∫[·OH]dt is the cumulative hydroxyl radical exposure, M·s.
Four main parameters determine the removal rate calculated from the above equation. The cumulative ozone exposure (∫[O3]dt) and the cumulative hydroxyl radical exposure (∫[·OH]dt) depend on the ozone dose applied in the wastewater treatment process, whereas the rate constants with ozone (k_O3) and with hydroxyl radicals (k_·OH) are intrinsic properties of the chemical.
It follows from the above equation that once k_O3 and k_·OH of a PPCP are known, its removal efficiency can be calculated.
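As a minimal sketch (Python, standard library only), the removal calculation above can be written as follows. The function names and the rate-constant and exposure values in the usage line are hypothetical, chosen only to illustrate the arithmetic; they are not data from this document.

```python
import math


def residual_fraction(k_o3, k_oh, o3_exposure, oh_exposure):
    """Residual fraction [C]_t/[C]_0 from second-order kinetics.

    k_o3, k_oh: rate constants with O3 and .OH, in M^-1 s^-1
    o3_exposure, oh_exposure: cumulative exposures int[O3]dt and int[.OH]dt, in M*s
    """
    return math.exp(-k_o3 * o3_exposure - k_oh * oh_exposure)


def removal_efficiency(k_o3, k_oh, o3_exposure, oh_exposure):
    """Fraction of the chemical removed, 1 - [C]_t/[C]_0."""
    return 1.0 - residual_fraction(k_o3, k_oh, o3_exposure, oh_exposure)


# Hypothetical inputs: k_O3 = 2.1e4 M^-1 s^-1, assumed k_OH = 5e9 M^-1 s^-1,
# assumed exposures of 1e-4 M*s (O3) and 1e-10 M*s (.OH).
print(round(removal_efficiency(2.1e4, 5e9, 1e-4, 1e-10), 3))  # prints 0.926
```

Because both terms sit in one exponent, a compound with a small k_O3 can still be removed if the hydroxyl radical term is large; the model in this document addresses only the k_O3 term.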
Several models for predicting the k_O3 of organic chemicals have been reported in China and abroad (Sairam Sudhakaran. QSAR models for oxidation of organic micropollutants in water based on ozone and hydroxyl radical rate constants and their chemical classification. Water Research, 2013, 47: 1111-1122), but they are broad-spectrum models rather than models specific to the k_O3 of PPCPs. To fit most chemicals, broad-spectrum models sacrifice prediction accuracy for particular classes of chemicals, so their accuracy in predicting the k_O3 of PPCPs is low.
The present study establishes a k_O3 prediction model specifically for PPCPs, which greatly improves its applicability to PPCPs and the accuracy of the predictions.
Disclosure of Invention
The invention provides a method for constructing a model that predicts the ozone reaction rate constant of PPCPs in organic pollutant wastewater, for use in predicting and evaluating the PPCPs removal efficiency of sewage treatment plants in chemical industry parks. The method comprises the following steps:
(1) Data collection and assignment of training-set and validation-set sample compounds
Second-order rate constants for the reaction of ozone with 50 PPCPs (k_O3, pH 8.5) were collected from the literature (Environ. Sci. Technol. 2013, 47, 5872-5881). The compounds were divided into a training set of 37 compounds and a validation set of 13 compounds. The training set was chosen to be as structurally diverse as possible and to cover as wide a range of activity as possible, so that the model has a wide application domain and strong predictive ability. The validation set, whose compounds lie within the descriptor space of the training set, is used to evaluate the predictive ability of the model.
(2) Computing descriptors
Compound structures were pre-optimized with MM+ molecular mechanics in HyperChem 7.0 and then optimized with the semi-empirical AM1 method. Descriptors were calculated from the optimized structures with Dragon 5.4. The 1664 calculated descriptors were pre-screened by removing constant and near-constant descriptors and 704 highly intercorrelated descriptors (for each pair of descriptors with a correlation coefficient greater than 0.98, the one less correlated with the target value was removed). Principal component factor analysis (SPSS 19.0) then extracted 19 factors with a cumulative contribution of 90%, and the descriptors with a loading greater than 0.7 on a factor were retained, giving 66 important molecular descriptors. The factor model is:
X=AF+e (1)
where A = (a_ij) is the factor loading matrix, a_ij being the factor loading, i.e. the correlation coefficient between the i-th variable and the j-th factor; F is the common-factor matrix of X; e is the specific-factor term of X; and X is the descriptor matrix.
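The two pre-screening rules (drop constant or near-constant descriptors; within each pair correlated above 0.98, drop the one less correlated with the target) can be sketched in Python as below. This is an illustration under stated assumptions, not the Dragon/SPSS workflow: the near-constant threshold `var_tol` and the greedy ordering used to resolve correlated pairs are choices made here, and `prescreen` is a hypothetical helper; only the 0.98 cut-off comes from the text.

```python
import statistics


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def prescreen(descriptors, target, var_tol=1e-8, r_cut=0.98):
    """Return the names of descriptors kept after the two pre-screening rules.

    descriptors: dict mapping descriptor name -> list of values (one per compound)
    target: list of activity values (e.g. log k_O3) in the same compound order
    """
    # Rule 1: drop constant / near-constant descriptors (var_tol is an assumed threshold).
    varied = {n: v for n, v in descriptors.items() if statistics.pvariance(v) > var_tol}
    # Rule 2: visit descriptors from most to least correlated with the target and keep
    # one only if |r| <= r_cut against every descriptor already kept, so within each
    # highly intercorrelated pair the one less related to the target is dropped.
    order = sorted(varied, key=lambda n: abs(pearson(varied[n], target)), reverse=True)
    kept = []
    for name in order:
        if all(abs(pearson(varied[name], varied[k])) <= r_cut for k in kept):
            kept.append(name)
    return kept
```

With toy data `{"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.1], "c": [5, 5, 5, 5], "d": [4, 1, 3, 2]}` and target `[1, 2, 3, 4]`, the constant `c` and the near-duplicate `b` are removed, leaving `["a", "d"]`. The factor-analysis step that follows is not reproduced here.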
(3) Model construction
Variables were selected with the genetic algorithm (GA) in MobyDigs, using a population size of 100, a mutation probability of 0.5, a maximum of 10 descriptors per model and leave-one-out cross-validation (LOO-CV) as the fitness function; all other parameters were left at their default values. The optimal number of variables was taken as the point beyond which adding further variables no longer improved the result appreciably. A prediction model was then built from the selected variables by multiple linear regression (MLR), giving the GA-MLR model:
logY=-15.61706-1.99793AMW+90.15638Mp-8.38954AAC-3.00276Eeig01r+181.24166JGI5-0.55738nCbH (2)
where Y is the k_O3 value (M⁻¹·s⁻¹); AMW is the average molecular weight; Mp is the mean atomic polarizability; AAC is the mean information index on atomic composition; JGI5 is the mean topological charge index of order 5; Eeig01r is the first eigenvalue of the edge adjacency matrix weighted by resonance integrals; and nCbH is the number of unsubstituted benzene-ring carbons (sp2).
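The variable-selection loop can be illustrated with a toy genetic algorithm over descriptor subsets. This is not MobyDigs' implementation: the truncation selection, uniform crossover, single-bit mutation and generation count are assumptions, and `ga_select` is a hypothetical helper; only the population size (100), mutation probability (0.5) and 10-descriptor cap come from the text. In the method described here the fitness would be Q²_LOO of an MLR model on the chosen descriptors; the usage line substitutes a hypothetical toy fitness.

```python
import random


def ga_select(n_desc, fitness, pop_size=100, p_mut=0.5, max_feat=10,
              generations=50, seed=0):
    """Toy GA: return (best descriptor indices, fitness) for subsets of range(n_desc)."""
    rng = random.Random(seed)
    cache = {}

    def fit(subset):
        # Memoize fitness, since the real criterion (Q2_LOO) is expensive.
        if subset not in cache:
            cache[subset] = fitness(subset)
        return cache[subset]

    def random_subset():
        k = rng.randint(1, min(max_feat, n_desc))
        return frozenset(rng.sample(range(n_desc), k))

    def crossover(a, b):
        # Uniform crossover over the union of the parents' descriptors.
        pool = sorted(a | b)
        child = {j for j in pool if rng.random() < 0.5}
        return child or {rng.choice(pool)}

    def mutate(subset):
        s = set(subset)
        if rng.random() < p_mut:
            s ^= {rng.randrange(n_desc)}  # flip inclusion of one descriptor
        if not s:
            s.add(rng.randrange(n_desc))
        while len(s) > max_feat:  # enforce the descriptor cap
            s.remove(rng.choice(sorted(s)))
        return frozenset(s)

    pop = [random_subset() for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=fit, reverse=True)[: pop_size // 2]  # keep best half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b)))
        pop = parents + children
    best = max(pop, key=fit)
    return sorted(best), fit(best)


if __name__ == "__main__":
    good = {1, 3, 5}  # hypothetical "informative" descriptors for the toy fitness
    print(ga_select(12, lambda s: len(s & good) - 0.1 * len(s - good)))
```

Keeping the best half each generation means the best subset found so far is never lost, which mirrors the intent of selecting the best-scoring model.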
(4) Characterization and evaluation of models
According to the OECD principles for QSAR models, the constructed model must be validated internally (goodness of fit and robustness) and externally (predictive ability). Goodness of fit is the proportion of the total variation of the dependent variable that the model explains. It is characterized by the adjusted coefficient of determination (R²_adj) between the experimental and fitted values and by the root mean square error (RMSE):
$$R^2_{adj} = 1 - \frac{(n-1)\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{(n-m-1)\sum_{i=1}^{n}(y_i-\bar{y})^2}$$
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}$$
where n is the number of compounds, m is the number of predictor variables, y_i and ŷ_i are the experimental and predicted values of the activity index of the i-th compound, and ȳ is the mean of the experimental values of the activity index.
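The two goodness-of-fit statistics translate directly into code. This Python sketch follows the formulas as written above (RMSE divided by n); the helper names are hypothetical and the numbers in the check are toy values, not data from this document.

```python
import math


def r2(y, y_hat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def r2_adj(y, y_hat, m):
    """R^2 adjusted for m predictor variables (n - m - 1 residual degrees of freedom)."""
    n = len(y)
    return 1.0 - (1.0 - r2(y, y_hat)) * (n - 1) / (n - m - 1)


def rmse(y, y_hat):
    """Root mean square error with n in the denominator."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))
```

For the GA-MLR model, m = 6 and n = 37, so the adjustment factor (n - 1)/(n - m - 1) is 36/30 = 1.2.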
Model robustness is characterized by the leave-one-out cross-validation coefficient (Q²_LOO) and the bootstrap coefficient (Q²_BOOT):
$$Q^2_{LOO} = 1 - \frac{\sum_{i=1}^{n}(y_i-\hat{y}_{i/i})^2}{\sum_{i=1}^{n}(y_i-\bar{y}_{tr})^2}$$
where ŷ_i/i is the value predicted for compound i by the model fitted without it, and ȳ_tr is the mean of the experimental activity index values of the training set. The bootstrap procedure used 1/5 cross-validation and was repeated 5000 times.
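A dependency-free Python sketch of Q²_LOO for an MLR model follows: each compound is left out in turn, the model is refitted on the rest, and the squared prediction errors (PRESS) are compared with the spread of the training values. Ordinary least squares through the normal equations with Gaussian elimination is an assumption made to keep the example self-contained; it is not MobyDigs' code, and the bootstrap variant is not shown.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_fit(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + sum_j bj * x_j."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    ztz = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(ztz, zty)


def ols_predict(beta, x):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def q2_loo(X, y):
    """Q2_LOO = 1 - PRESS / sum((y_i - mean(y))^2) over the training set."""
    y_bar = sum(y) / len(y)
    press = 0.0
    for i in range(len(y)):
        beta = ols_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])  # refit without compound i
        press += (y[i] - ols_predict(beta, X[i])) ** 2
    return 1.0 - press / sum((v - y_bar) ** 2 for v in y)
```

For least squares, each left-out residual is at least as large as the corresponding fitted residual, so Q²_LOO never exceeds R²; this is consistent with the reported Q²_LOO = 0.658 lying below the unadjusted R² implied by R²_adj = 0.709.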
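Predictive ability is characterized by the external validation coefficients Q²_EXT, R²_EXT and RMSE_EXT:
$$Q^2_{EXT} = 1 - \frac{\sum_{i=1}^{n_{EXT}}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n_{EXT}}(y_i-\bar{y}_{tr})^2}$$
$$R^2_{EXT} = \frac{\left[\sum_{i=1}^{n_{EXT}}(y_i-\bar{y})(\hat{y}_i-\bar{\hat{y}})\right]^2}{\sum_{i=1}^{n_{EXT}}(y_i-\bar{y})^2\,\sum_{i=1}^{n_{EXT}}(\hat{y}_i-\bar{\hat{y}})^2}$$
$$RMSE_{EXT} = \sqrt{\frac{1}{n_{EXT}}\sum_{i=1}^{n_{EXT}}(y_i-\hat{y}_i)^2}$$
where n_EXT is the number of compounds in the validation set, and ȳ and the mean of ŷ are the means of the experimental and predicted activity index values of the validation set.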
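The three external statistics can be sketched in Python as follows. Using the training-set mean as the reference in Q²_EXT follows the formula written above and is itself an assumption, since the original expression was not preserved; the function names are hypothetical.

```python
import math


def q2_ext(y_ext, y_pred, y_train_mean):
    """External Q2 with the training-set mean as reference (assumed convention)."""
    ss_res = sum((a - b) ** 2 for a, b in zip(y_ext, y_pred))
    ss_ref = sum((a - y_train_mean) ** 2 for a in y_ext)
    return 1.0 - ss_res / ss_ref


def r2_ext(y_ext, y_pred):
    """Squared Pearson correlation between experimental and predicted values."""
    my = sum(y_ext) / len(y_ext)
    mp = sum(y_pred) / len(y_pred)
    cov = sum((a - my) * (b - mp) for a, b in zip(y_ext, y_pred))
    vy = sum((a - my) ** 2 for a in y_ext)
    vp = sum((b - mp) ** 2 for b in y_pred)
    return cov * cov / (vy * vp)


def rmse_ext(y_ext, y_pred):
    """Root mean square error over the validation set."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_ext, y_pred)) / len(y_ext))
```

R²_EXT measures only how well predictions track the experimental values linearly, while Q²_EXT and RMSE_EXT also penalize systematic offsets, which is why both kinds of statistic are reported.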
The characterization and evaluation parameters of the model are:
n_tr = 37, R²_adj = 0.709, Q²_LOO = 0.658, Q²_BOOT = 0.687, RMSE_tr = 1.19
n_EXT = 13, R²_EXT = 0.628, Q²_EXT = 0.604, RMSE_EXT = 0.928
where n_tr and n_EXT are the numbers of compounds in the training set and the validation set, respectively, and p is the significance level; R²_adj is the coefficient of determination adjusted for degrees of freedom; RMSE is the root mean square error; Q²_LOO is the leave-one-out cross-validation coefficient; Q²_BOOT is the bootstrap validation coefficient; R²_EXT is the squared correlation coefficient between the experimental and predicted values; Q²_EXT is the external validation coefficient of determination; and RMSE_EXT is the root mean square error of the validation set.
Golbraikh et al. consider a QSAR model acceptable if Q² > 0.50 and R² > 0.60. The results show that the model has good predictive ability.
Further, for an unknown compound, the molecular structure is input and optimized, the six descriptors AMW, Mp, AAC, JGI5, Eeig01r and nCbH are calculated with Dragon, and the prediction model then gives the predicted k_O3 value of the compound.
Further, the method is applicable to predicting the ozone digestion rates of metronidazole and ritalinic acid.
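Equation (2) can be evaluated for a new compound with a few lines of Python. The sketch is illustrative only: it assumes `log` is base 10, and `predict_log_k` is a hypothetical helper. Descriptor values must come from Dragon on the optimized structure, as described above; the values used here are those reported for metronidazole in Example 4.

```python
# Coefficients of equation (2); Y is k_O3 in M^-1 s^-1 and log is taken as base 10 (assumed).
INTERCEPT = -15.61706
COEFFICIENTS = {
    "AMW": -1.99793,
    "Mp": 90.15638,
    "AAC": -8.38954,
    "Eeig01r": -3.00276,
    "JGI5": 181.24166,
    "nCbH": -0.55738,
}


def predict_log_k(descriptors):
    """Evaluate the GA-MLR equation for one compound's six Dragon descriptors."""
    missing = set(COEFFICIENTS) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptors: {sorted(missing)}")
    return INTERCEPT + sum(c * descriptors[name] for name, c in COEFFICIENTS.items())


# Descriptor values reported for metronidazole in Example 4.
example4 = {"AMW": 7.83, "Mp": 0.59, "AAC": 1.812, "Eeig01r": 4.455, "JGI5": 0.024, "nCbH": 0}
log_k = predict_log_k(example4)
print(f"log k = {log_k:.3f}, k = {10 ** log_k:.3g} M^-1 s^-1")
```

A prediction is only meaningful for compounds inside the model's application domain, so in practice the leverage and Euclidean-distance checks of Example 2 would be run before this step.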
Further, the model is interpreted mechanistically. Based on Dragon descriptors, a quantitative model for predicting the ozone digestion rate constants of PPCPs organic chemicals was constructed by principal component analysis and genetic algorithm-multiple linear regression. The results show that the average molecular weight (AMW), mean atomic polarizability (Mp), mean information index on atomic composition (AAC), mean topological charge index of order 5 (JGI5), first eigenvalue of the edge adjacency matrix weighted by resonance integrals (Eeig01r) and number of unsubstituted benzene-ring carbons (nCbH) significantly affect the ozone digestion rate of PPCPs compounds, as shown in Table 1. AMW, AAC, Eeig01r and nCbH are negatively correlated with the ozone digestion rate: the larger the average molecular weight, the mean information index on atomic composition, the Eeig01r eigenvalue and the number of unsubstituted benzene-ring carbons, the lower the ozone digestion rate. Mp is positively correlated with the ozone digestion rate: the more easily the bonding electron cloud is distorted by an external electric field (i.e. the lower the electronegativity), the more polarizable the bonds and the stronger the ozone digestion. JGI5 reflects the mean topological charge index of order 5 of the compound; the larger its value, the higher the ozone digestion rate.
TABLE 1 The six descriptors in the regression model, their physicochemical meanings and their coefficients in the model
Variable | Descriptor meaning | Regression coefficient | Standard error of coefficient | Standardized regression coefficient
Constant | | -15.61706 | 7.96968 |
AMW | Average molecular weight | -1.99793 | 0.34868 | -1.86685
Mp | Mean atomic polarizability | 90.15638 | 17.43372 | 1.75023
AAC | Mean information index on atomic composition | -8.38954 | 1.81277 | -0.78161
EEig01r | First eigenvalue of the edge adjacency matrix weighted by resonance integrals | -3.00276 | 1.08265 | -0.29843
JGI5 | Mean topological charge index of order 5 | 181.24166 | 26.98228 | 0.7957
nCbH | Number of unsubstituted benzene-ring carbons (sp2) | -0.55738 | 0.14196 | -0.6387
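Because the standardized coefficients in Table 1 put all descriptors on a common scale (conventionally b_j multiplied by s_xj / s_y; the document does not state the convention), their magnitudes can be compared to rank descriptor influence, which underpins the mechanistic reading in the preceding paragraph. A small Python illustration using the Table 1 values; `rank_by_influence` is a hypothetical helper.

```python
# Standardized regression coefficients transcribed from Table 1.
STANDARDIZED = {
    "AMW": -1.86685,
    "Mp": 1.75023,
    "AAC": -0.78161,
    "EEig01r": -0.29843,
    "JGI5": 0.7957,
    "nCbH": -0.6387,
}


def rank_by_influence(std_coefs):
    """Order descriptors by |standardized coefficient|, largest first, with effect direction."""
    ranked = sorted(std_coefs.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [(name, "increases" if c > 0 else "decreases", abs(c)) for name, c in ranked]


for name, direction, size in rank_by_influence(STANDARDIZED):
    print(f"{name:8s} {direction} log k_O3 (|beta| = {size:.3f})")
```

On this scale AMW and Mp dominate with opposite signs, while EEig01r has the smallest effect.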
The beneficial effects of the invention are as follows. The model constructed by the method can accurately predict the ozone digestion rate of PPCPs organic pollutants in wastewater, improves the applicability and accuracy of such predictions for PPCPs, saves manpower, material resources and time, and is simple, quick and effective. Developed in strict accordance with the OECD principles for QSAR models, it interprets the mechanism of ozone degradation of PPCPs in terms of molecular descriptors. It helps sewage treatment plants adjust process parameters such as the ozone dosage used to remove PPCPs, and is of significance for the risk management of PPCPs and for environmental safety.
Drawings
FIG. 1 is a Williams plot of the application domain of the ozone digestion rate model for PPCPs-type chemicals.
FIG. 2 is a representation of the model application domain based on Euclidean distance.
FIG. 3 is a plot of the predicted versus experimental values of the MLR model for the ozone digestion rate of PPCPs-type chemicals.
Detailed Description
The invention will be better understood from the following examples. However, it will be readily appreciated by those skilled in the art that the description of the embodiments is provided for illustration only and should not limit the invention as described in detail in the claims.
Example 1
The steps for constructing the model for predicting the ozone digestion rate in PPCPs organic pollutant wastewater are as follows:
(1) Data collection and assignment of training-set and validation-set sample compounds
Second-order rate constants for the reaction of ozone with 50 PPCPs (k_O3, pH 8.5) were collected from the literature (Environ. Sci. Technol. 2013, 47, 5872-5881). The training set comprises 37 sample compounds and the validation set 13.
(2) Computing descriptors
Compound structures were pre-optimized with MM+ molecular mechanics in HyperChem 7.0 and then optimized with the semi-empirical AM1 method; descriptors were calculated from the optimized structures with Dragon 5.4. The 1664 calculated descriptors were pre-screened, and 66 molecular descriptors were obtained by principal component factor analysis (SPSS 19.0).
X=AF+e (1)
where A = (a_ij) is the factor loading matrix, a_ij being the correlation coefficient between the i-th variable and the j-th factor.
(3) Model construction
Variables were selected with the genetic algorithm (GA) in MobyDigs, and a prediction model (the GA-MLR model) was built from the selected variables by multiple linear regression (MLR):
logY=-15.61706-1.99793AMW+90.15638Mp-8.38954AAC-3.00276Eeig01r+181.24166JGI5-0.55738nCbH (2)
where Y is the k_O3 value (M⁻¹·s⁻¹); AMW is the average molecular weight; Mp is the mean atomic polarizability; AAC is the mean information index on atomic composition; JGI5 is the mean topological charge index of order 5; Eeig01r is the first eigenvalue of the edge adjacency matrix weighted by resonance integrals; and nCbH is the number of unsubstituted benzene-ring carbons (sp2).
(4) Characterization and evaluation of models
According to the OECD principles for QSAR models, the constructed model must be validated internally (goodness of fit and robustness) and externally (predictive ability). Goodness of fit is characterized by the adjusted coefficient of determination (R²_adj) between the experimental and fitted values and by the root mean square error (RMSE):
$$R^2_{adj} = 1 - \frac{(n-1)\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{(n-m-1)\sum_{i=1}^{n}(y_i-\bar{y})^2}$$
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}$$
where n is the number of compounds, m is the number of predictor variables, y_i and ŷ_i are the experimental and predicted values of the activity index of the i-th compound, and ȳ is the mean of the experimental values of the activity index.
Model robustness is characterized by the leave-one-out cross-validation coefficient (Q²_LOO) and the bootstrap coefficient (Q²_BOOT):
$$Q^2_{LOO} = 1 - \frac{\sum_{i=1}^{n}(y_i-\hat{y}_{i/i})^2}{\sum_{i=1}^{n}(y_i-\bar{y}_{tr})^2}$$
where ŷ_i/i is the value predicted for compound i by the model fitted without it, and ȳ_tr is the mean of the experimental activity index values of the training set. The bootstrap procedure used 1/5 cross-validation and was repeated 5000 times.
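Predictive ability is characterized by the external validation coefficients Q²_EXT, R²_EXT and RMSE_EXT:
$$Q^2_{EXT} = 1 - \frac{\sum_{i=1}^{n_{EXT}}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n_{EXT}}(y_i-\bar{y}_{tr})^2}$$
$$R^2_{EXT} = \frac{\left[\sum_{i=1}^{n_{EXT}}(y_i-\bar{y})(\hat{y}_i-\bar{\hat{y}})\right]^2}{\sum_{i=1}^{n_{EXT}}(y_i-\bar{y})^2\,\sum_{i=1}^{n_{EXT}}(\hat{y}_i-\bar{\hat{y}})^2}$$
$$RMSE_{EXT} = \sqrt{\frac{1}{n_{EXT}}\sum_{i=1}^{n_{EXT}}(y_i-\hat{y}_i)^2}$$
where n_EXT is the number of compounds in the validation set, and ȳ and the mean of ŷ are the means of the experimental and predicted activity index values of the validation set.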
The characterization and evaluation parameters of the model are:
n_tr = 37, R²_adj = 0.709, Q²_LOO = 0.658, Q²_BOOT = 0.687, RMSE_tr = 1.19
n_EXT = 13, R²_EXT = 0.628, Q²_EXT = 0.604, RMSE_EXT = 0.928
where n_tr and n_EXT are the numbers of compounds in the training set and the validation set, respectively, and p is the significance level; R²_adj is the coefficient of determination adjusted for degrees of freedom; RMSE is the root mean square error; Q²_LOO is the leave-one-out cross-validation coefficient; Q²_BOOT is the bootstrap validation coefficient; R²_EXT is the squared correlation coefficient between the experimental and predicted values; Q²_EXT is the external validation coefficient of determination; and RMSE_EXT is the root mean square error of the validation set.
The results show that the model has good predictive ability and robustness.
Example 2
This example characterizes the application domain of the above prediction model, which is defined by the Euclidean distance method and by a leverage-based Williams plot. Euclidean distances were calculated with AMBIT Discovery v0.04 (http://ambit.sourceforge.net/download_ambitdiscovery.html) using the following formula:
$$d_i = \sqrt{\sum_{j=1}^{k}(x_{ij}-\mu_j)^2}$$
where x_ij is the value of descriptor j for compound i and μ_j is the mean of descriptor j.
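A minimal Python sketch of this distance-to-centroid check follows. It assumes the descriptors have already been scaled in whatever way AMBIT applies (the scaling is not specified here and is left to the caller), and takes the largest training-set distance as the threshold, which is how the cut-off for this model is set later in this example. The helper names are hypothetical.

```python
import math


def centroid(rows):
    """Mean descriptor vector (the 'central point') of the training set."""
    k = len(rows[0])
    return [sum(r[j] for r in rows) / len(rows) for j in range(k)]


def euclidean_distance(x, mu):
    """d = sqrt(sum_j (x_j - mu_j)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, mu)))


def in_domain(x, training_rows):
    """Return (inside?, d, d_max): inside if d(x, centroid) <= largest training distance."""
    mu = centroid(training_rows)
    d_max = max(euclidean_distance(r, mu) for r in training_rows)
    d = euclidean_distance(x, mu)
    return d <= d_max, d, d_max
```

This check only asks whether a compound is close to the training data in descriptor space; it does not say whether the compound's residual will be small, which is what the Williams plot adds.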
The Williams plot defines the model application domain in terms of the standardized residual (δ) and the leverage value (h_i, where i denotes the i-th compound). δ is calculated as:
$$\delta_i = \frac{y_i-\hat{y}_i}{\sqrt{\sum_{j=1}^{n}(y_j-\hat{y}_j)^2/(n-k-1)}}$$
The leverage value (h_i) of a training-set compound can be determined from:
h_i = x_i^T (X^T X)^(-1) x_i (9)
where x_i is the descriptor vector of the i-th compound (written as a column vector, so that x_i^T is the corresponding row vector) and X is the descriptor matrix of the training set. The warning value (h*) is defined as:
h* = 3(k + 1)/n (10)
where k is the number of descriptors and n is the number of training-set compounds.
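The leverage calculation and the Williams-plot labelling can be sketched in Python as below. Adding a leading 1 (intercept column) to each descriptor vector is an assumption, consistent with the k + 1 in h*. Standardized residuals are taken as input rather than recomputed, and the cut-off |δ| > 3.0 is the one applied to this model in the analysis that follows. All helper names are hypothetical.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def leverages(X):
    """h_i = x_i^T (X^T X)^-1 x_i for each training row, with an intercept 1 prepended."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    ztz = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    # solve(ztz, z) returns (X^T X)^-1 z without forming the inverse explicitly.
    return [sum(zi * wi for zi, wi in zip(z, solve(ztz, z))) for z in Z]


def warning_leverage(k, n):
    """h* = 3(k + 1)/n."""
    return 3 * (k + 1) / n


def williams_flags(h, delta, h_star, delta_cut=3.0):
    """Label each compound from its leverage h_i and standardized residual delta_i."""
    labels = []
    for hi, di in zip(h, delta):
        if abs(di) > delta_cut:
            labels.append("outlier")
        elif hi > h_star:
            labels.append("high leverage")
        else:
            labels.append("ok")
    return labels
```

For this model, `warning_leverage(6, 37)` gives 21/37, about 0.568.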
The application domain characterization results are shown in FIG. 1 and FIG. 2. In FIG. 1, h* = 3(k+1)/n = 3(6+1)/37 = 0.568. The ordinate of the Williams plot is the standardized residual between the experimental and predicted values, which characterizes the dispersion of the experimental values; a compound whose standardized residual δ has an absolute value greater than 3.0 is regarded as an outlier. The abscissa is the leverage value h_i of the training-set compounds; an h_i above the warning value (h* = 0.568) indicates that the substructures of that compound are poorly represented in the training set, so the compound can strongly influence the model predictions.
As can be seen, the leverage value of one compound, carbamazepine (CAS: 298-46-4), exceeds the warning value h*, so its structure differs from those of the training-set compounds; however, its standardized residual lies within (-3, +3), indicating that the model is still applicable to it. The standardized residuals of oxazepam (CAS: 604-75-1) and levetiracetam (CAS: 102767-28-2) fall outside (-3, +3), so these two compounds are outliers and the model is not suitable for predicting them, whereas log k_O3 of the remaining compounds is predicted well.
The application domain of the model was also characterized by the Euclidean distance method; FIG. 2 is the Euclidean distance plot. The Euclidean distances from the feature vectors of the training-set compounds to the feature vector of the central point range from 0.132 to 1.088, so the model is applicable to compounds whose Euclidean distance does not exceed 1.088. All validation-set compounds have Euclidean distances below 1.088 and therefore lie within the application domain of the model.
Example 3
The ozone digestion rates of the 50 PPCPs organic pollutants in wastewater were predicted with the model constructed in Example 1; the results are shown in Table 2. The model has R²_adj = 0.709, indicating a strong fitting ability, and Q²_LOO = 0.658 and Q²_BOOT = 0.687, indicating good robustness. Its external statistics are R²_EXT = 0.628 and Q²_EXT = 0.604; since Golbraikh et al. consider a QSAR model acceptable if Q² > 0.50 and R² > 0.60, the model has good predictive ability and can be applied successfully to compounds outside the training set. FIG. 3 plots the predicted against the experimental values of the GA-MLR model for the ozone digestion rate of PPCPs-type chemicals: the predicted and experimental values agree well for most substances, whereas the deviations for oxazepam (CAS: 604-75-1) and levetiracetam (CAS: 102767-28-2) are large, so these two are predicted poorly.
Example 4
The ozone digestion rate of metronidazole (SMILES: o= [ n+ ] ([ O- ])/c=c (nccscc1=cc=c (O1) CN (C)/NC) was predicted with the model constructed in Example 1. First, the six descriptors were calculated from the molecular structure with Dragon: AMW = 7.83, Mp = 0.59, AAC = 1.812, Eeig01r = 4.455, JGI5 = 0.024 and nCbH = 0. The leverage (hat) value is 0.505 and the Euclidean distance is 0.532, so the compound lies within the application domain of the model. Substituting the descriptor values into the model gives:
logY = -15.61706 - 1.99793*(7.83) + 90.15638*(0.59) - 8.38954*(1.812) - 3.00276*(4.455) + 181.24166*(0.024) - 0.55738*(0) ≈ -2.3
The predicted ozone digestion rate constant (k) of metronidazole is therefore <1 M⁻¹·s⁻¹, consistent with the experimental value of <1 M⁻¹·s⁻¹.
Example 5
The ozone digestion rate of ritalinic acid (SMILES: o=c (O) C (c2ccccn2) c1=cc=c1,) was predicted with the model constructed in Example 1. First, the six descriptors AMW, Mp, AAC, Eeig01r, JGI5 and nCbH were calculated from the molecular structure with Dragon as 6.65, 0.8, 1.692, 4.31, 0.025 and 6, respectively. The leverage (hat) value is 0.213 and the Euclidean distance is 0.427, both within the application domain, so the model can be used to predict the ozone digestion rate of ritalinic acid. Substituting the descriptor values into the model gives:
logY=-15.61706-1.99793*(12.06)+90.15638*(0.8)-8.38954*(1.692)-3.00276*(4.31)+181.24166*(0.025)-0.55738*(6)=3.37
The predicted ozone digestion rate constant (k) of ritalinic acid is therefore 4.47 × 10³ M⁻¹·s⁻¹, close to the experimental value of 2.1 × 10⁴ M⁻¹·s⁻¹.
TABLE 2 Experimental and predicted ozone digestion rates of the 50 PPCPs-type organic chemicals in wastewater

Claims (3)

  1. A method for constructing a prediction model of the ozone digestion rate in PPCPs organic pollutant wastewater, characterized by comprising the following steps:
    step one, data collection and assignment of training-set and validation-set sample compounds;
    step two, descriptor calculation;
    step three, model construction;
    step four, characterization and evaluation of the model;
    the data in step one are the second-order ozone rate constants k_O3 of 50 PPCPs in wastewater at pH 8.5; 37 sample compounds are selected for the training set and 13 sample compounds for the validation set;
    in step two, compound structures are pre-optimized with MM+ molecular mechanics in HyperChem 7.0 and optimized with the semi-empirical AM1 method; descriptors are calculated from the optimized structures with Dragon 5.4; the 1664 calculated descriptors are pre-screened by removing constant and near-constant descriptors and 704 highly intercorrelated descriptors; principal component factor analysis (SPSS 19.0) extracts 19 factors with a cumulative contribution of 90%, and descriptors with a loading greater than 0.7 on a factor are retained, giving 66 important molecular descriptors:
    X=AF+e (1)
    where A = (a_ij) is the factor loading matrix, a_ij being the correlation coefficient between the i-th variable and the j-th factor; F is the common-factor matrix of X; e is the specific-factor term of X; and X is the descriptor matrix;
    in step three, variables are selected with the genetic algorithm GA in MobyDigs with the following parameters: population size 100, mutation probability 0.5, a maximum of 10 descriptors per model, leave-one-out cross-validation (LOO-CV) as the fitness function, and default values for all other parameters; the optimal number of variables is taken as the point beyond which adding further variables no longer improves the result appreciably; a prediction model is built from the selected variables by multiple linear regression, and the 6 selected molecular descriptors and the model are as follows:
    GA-MLR linear equation:
    logY=-15.61706-1.99793AMW+90.15638Mp-8.38954AAC-3.00276Eeig01r+181.24166JGI5-0.55738nCbH (2)
    where Y is the k_O3 value (M⁻¹·s⁻¹), AMW is the average molecular weight, Mp is the mean atomic polarizability, AAC is the mean information index on atomic composition, JGI5 is the mean topological charge index of order 5, Eeig01r is the first eigenvalue of the edge adjacency matrix weighted by resonance integrals, and nCbH is the number of unsubstituted benzene-ring carbons (sp2);
    in step four, the goodness of fit of the model is characterized by the adjusted coefficient of determination R²_adj between the experimental and fitted values and by the root mean square error RMSE:
    $$R^2_{adj} = 1 - \frac{(n-1)\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{(n-m-1)\sum_{i=1}^{n}(y_i-\bar{y})^2}$$
    $$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}$$
    where n is the number of compounds, m is the number of predictor variables, y_i and ŷ_i are the experimental and predicted values of the activity index of the i-th compound, and ȳ is the mean of the experimental values of the activity index;
    the robustness of the model is characterized by the leave-one-out cross-validation coefficient Q²_LOO and the bootstrap coefficient Q²_BOOT:
    $$Q^2_{LOO} = 1 - \frac{\sum_{i=1}^{n}(y_i-\hat{y}_{i/i})^2}{\sum_{i=1}^{n}(y_i-\bar{y}_{tr})^2}$$
    where ŷ_i/i is the value predicted for compound i by the model fitted without it, and ȳ_tr is the mean of the experimental activity index values of the training set; the bootstrap procedure uses 1/5 cross-validation and is repeated 5000 times;
    the predictive ability of the model is characterized by the external validation coefficients Q²_EXT, R²_EXT and RMSE_EXT:
    $$Q^2_{EXT} = 1 - \frac{\sum_{i=1}^{n_{EXT}}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n_{EXT}}(y_i-\bar{y}_{tr})^2}$$
    $$R^2_{EXT} = \frac{\left[\sum_{i=1}^{n_{EXT}}(y_i-\bar{y})(\hat{y}_i-\bar{\hat{y}})\right]^2}{\sum_{i=1}^{n_{EXT}}(y_i-\bar{y})^2\,\sum_{i=1}^{n_{EXT}}(\hat{y}_i-\bar{\hat{y}})^2}$$
    $$RMSE_{EXT} = \sqrt{\frac{1}{n_{EXT}}\sum_{i=1}^{n_{EXT}}(y_i-\hat{y}_i)^2}$$
    where n_EXT is the number of compounds in the validation set, and ȳ and the mean of ŷ are the means of the experimental and predicted activity index values of the validation set;
    the characterization and evaluation parameters of the model are:
    n_tr = 37, R²_adj = 0.709, Q²_LOO = 0.658, Q²_BOOT = 0.687, RMSE_tr = 1.19;
    n_EXT = 13, R²_EXT = 0.628, Q²_EXT = 0.604, RMSE_EXT = 0.928
    where n_tr is the number of training-set compounds; R²_adj is the coefficient of determination adjusted for degrees of freedom; RMSE_tr is the root mean square error of the training set; Q²_LOO is the leave-one-out cross-validation coefficient; Q²_BOOT is the bootstrap validation coefficient; R²_EXT is the squared correlation coefficient between the experimental and predicted values; Q²_EXT is the external validation coefficient of determination; and RMSE_EXT is the root mean square error of the validation set.
  2. The method for constructing an ozone digestion rate prediction model in PPCPs organic pollutant wastewater according to claim 1, wherein the molecular structure of an unknown compound is input and structurally optimized, the six descriptors AMW, mp, AAC, JGI5, eeig01r and nCbH are calculated with the Dragon software, and the prediction model is used to obtain the predicted k_O3 value of the unknown compound.
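The prediction step of claim 2 can be sketched as below, assuming the linear form of the claim 1 model and assuming the six descriptor values have already been exported from Dragon for the optimized structure (this sketch does not call Dragon). `predict_k_o3` is a hypothetical name, and the intercept and coefficients must be taken from the fitted model; none are supplied here.

```python
from typing import Mapping

# The six descriptors named in claim 2, in a fixed order.
DESCRIPTORS = ("AMW", "mp", "AAC", "JGI5", "eeig01r", "nCbH")


def predict_k_o3(descriptors: Mapping[str, float],
                 intercept: float,
                 coefficients: Mapping[str, float]) -> float:
    """Apply a fitted linear model to one compound's descriptor values.

    `descriptors` holds the compound's exported descriptor values;
    `intercept` and `coefficients` come from the fitted model.
    """
    missing = [name for name in DESCRIPTORS if name not in descriptors]
    if missing:
        # Refuse to predict rather than silently treating a descriptor as zero.
        raise KeyError(f"missing descriptors: {missing}")
    return intercept + sum(coefficients[name] * descriptors[name]
                           for name in DESCRIPTORS)
```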
  3. The method for constructing the prediction model of the ozone digestion rate in PPCPs organic pollutant wastewater, characterized in that the method is used to predict the ozone digestion rates of the compounds metronidazole and ritalinic acid.
CN201911341369.8A 2019-12-24 2019-12-24 Method for constructing predictive model of ozone digestion rate in PPCPs organic pollutant wastewater Active CN111310299B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911341369.8A CN111310299B (en) 2019-12-24 2019-12-24 Method for constructing predictive model of ozone digestion rate in PPCPs organic pollutant wastewater

Publications (2)

Publication Number Publication Date
CN111310299A (en) 2020-06-19
CN111310299B (en) 2024-03-19

Family

ID=71161494

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911341369.8A Active CN111310299B (en) 2019-12-24 2019-12-24 Method for constructing predictive model of ozone digestion rate in PPCPs organic pollutant wastewater

Country Status (1)

Country Link
CN (1) CN111310299B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113393907B (en) * 2021-07-20 2023-05-02 Xi'an Jiaotong University PPCPs organic pollutant degradation rate prediction model construction method and device
CN117236528B (en) * 2023-11-15 2024-01-23 Chengdu University of Information Technology Ozone concentration forecasting method and system based on combined model and factor screening

Citations (1)

Publication number Priority date Publication date Assignee Title
CN102507630A (en) * 2011-11-30 2012-06-20 Dalian University of Technology Method for forecasting oxidation reaction rate constant of chemical substance and ozone based on molecular structure and environmental temperature

Non-Patent Citations (1)

Title
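Fan Deling et al. "Study on quantitative prediction models for the rate constants of reactions between organic chemicals and ozone". Journal of Ecology and Rural Environment, 2019, 35(9): 1214-1218. *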

Also Published As

Publication number Publication date
CN111310299A (en) 2020-06-19

Similar Documents

Publication Publication Date Title
CN111310299B (en) Method for constructing predictive model of ozone digestion rate in PPCPs organic pollutant wastewater
Appiani et al. On the use of hydroxyl radical kinetics to assess the number-average molecular weight of dissolved organic matter
Chakraborty et al. Predicting soil arsenic pools by visible near infrared diffuse reflectance spectroscopy
Chen et al. Ecotoxicological QSAR study of fused/non-fused polycyclic aromatic hydrocarbons (FNFPAHs): Assessment and priority ranking of the acute toxicity to Pimephales promelas by QSAR and consensus modeling methods
Loeffler et al. Determination of non-extractable residues in soils: Towards a standardised approach
Tahmasbian et al. Using laboratory-based hyperspectral imaging method to determine carbon functional group distributions in decomposing forest litterfall
Moore et al. Underestimation of sector-wide methane emissions from United States wastewater treatment
Imoto et al. Comparison of the impacts of the experimental parameters and soil properties on the prediction of the soil sorption of Cd and Pb
Ilyas et al. Prediction of the removal efficiency of emerging organic contaminants based on design and operational parameters of constructed wetlands
Li et al. Genetic algorithm (GA)-Artificial neural network (ANN) modeling for the emission rates of toxic volatile organic compounds (VOCs) emitted from landfill working surface
CN111261238A (en) Construction method of PPCPs organic chemical mesophilic anaerobic digestion removal rate prediction model
Yassin et al. Geochemical and Spatial Distribution of Topsoil HMs Coupled with Modeling of Cr Using Chemometrics Intelligent Techniques: Case Study from Dammam Area, Saudi Arabia
Moufid et al. Pollution parameters evaluation of wastewater collected at different treatment stages from wastewater treatment plant based on E-nose and E-tongue systems combined with chemometric techniques
Onufrak et al. The missing metric: an evaluation of fungal importance in wetland assessments
Zhang et al. Chemical Space Covered by Applicability Domains of Quantitative Structure–Property Relationships and Semiempirical Relationships in Chemical Assessments
Baek et al. Analysis of micropollutants in a marine outfall using network analysis and decision tree
Nguyen et al. Estimating ammonium changes in pilot and full-scale constructed wetlands using kinetic model, linear regression, and machine learning
Langford et al. Evaluation of the efficacy of SIFT-MS for speciation of wastewater treatment plant odors in parallel with human sensory analysis
Sohn et al. Non-specific conducting polymer-based array capable of monitoring odour emissions from a biofiltration system in a piggery building
Rajabi et al. QSAR models for predicting aquatic toxicity of esters using genetic algorithm-multiple linear regression methods
Spigno et al. Development of Hybrid Models for a Vapor‐Phase Fungi Bioreactor
CN114864015A (en) Water eutrophication detection method, device, equipment and storage medium
Kamran Haghighi et al. Modeling on transition of heavy metals from Ni–Cd zinc plant residue using artificial neural network
LU502703B1 (en) Method for predicting ozone digestion rate of PPCPS-Type organic chemical in wastewater treatment
Facchin et al. Simultaneous determination of lead and sulfur by energy‐dispersive x‐ray spectrometry. Comparison between artificial neural networks and other multivariate calibration methods

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant