CN111310299B

CN111310299B - Method for constructing predictive model of ozone digestion rate in PPCPs organic pollutant wastewater

Info

Publication number: CN111310299B
Application number: CN201911341369.8A
Authority: CN
Inventors: 范德玲; 周林军; 汪贞; 郭敏; 王蕾; 古文; 刘济宁; 石利利; 刘明庆
Original assignee: Nanjing Institute of Environmental Sciences MEE
Current assignee: Nanjing Institute of Environmental Sciences MEE
Priority date: 2019-12-24
Filing date: 2019-12-24
Publication date: 2024-03-19
Anticipated expiration: 2039-12-24
Also published as: CN111310299A

Abstract

A method for constructing a model that predicts the ozone digestion rate of PPCPs organic pollutants in wastewater comprises the following steps: data collection and setting of training-set and validation-set sample compounds, descriptor calculation, model construction, and characterization and evaluation of the model. The model constructed by this method can accurately predict the ozone digestion rate of PPCPs in organic pollutant wastewater and saves manpower, material resources and time; it is simple, fast and effective. Following strictly the rules for the use of QSAR models laid down by the OECD, it interprets the mechanism of ozone digestion of PPCPs in terms of molecular structure descriptors, helps in adjusting process parameters such as the ozone dosage in the PPCPs removal process of sewage treatment plants, and is of significance for the risk management and control of PPCPs and for environmental safety.

Description

Method for constructing predictive model of ozone digestion rate in PPCPs organic pollutant wastewater

Technical Field

The invention belongs to the technical field of environmental protection, and particularly relates to a method for predicting ozone digestion rate in PPCPs organic pollutant wastewater.

Background

The contamination of water environments by pharmaceuticals and personal care products (PPCPs) is of increasing concern. PPCPs include various prescription and over-the-counter drugs, such as antibiotics, anti-inflammatory drugs, sedatives, lipid regulators, beta-blockers and cytostatics, as well as cosmetics and other personal care products, together with their respective metabolites. PPCPs are widely used in human medicine, health care products, cosmetics, aquaculture, livestock breeding and other industries, and their heavy use means that they inevitably and continuously enter the water environment. Although their concentrations in water environments are very low (ng/L to μg/L), they accumulate progressively along food chains and food webs, thereby endangering the ecological environment and human health.

The sewage treatment stage is the last barrier before PPCPs enter the environment, and its removal efficiency for PPCPs directly determines their exposure concentration in the environment and hence their environmental risk. In recent years, ozone oxidation has been used either as a pretreatment that breaks down macromolecular organic matter before biochemical treatment or as an advanced treatment and upgrading step for biochemical effluent, and it is gradually being applied in the advanced oxidation treatment of sewage. Evaluating the efficiency with which ozone treatment removes PPCPs from sewage is therefore an important part of the environmental risk assessment of PPCPs; it also helps in adjusting process parameters such as the ozone dosage in the PPCPs removal process of sewage treatment plants, and is of significance for the risk management of PPCPs and for environmental safety.

At present, the efficiency with which ozone removes PPCPs from water can only be determined by testing. Because PPCPs are numerous, and because establishing and running analytical methods for wastewater is costly, time-consuming and labor-intensive, measuring the removal efficiency of every PPCP by testing is not very feasible.

A mechanism-based prediction method is a better alternative. The reaction of a chemical with ozone follows second-order kinetics, so an ozone reaction kinetic equation can be established to predict the efficiency with which the chemical is removed by reaction with ozone:

$$\ln\frac{[C]_t}{[C]_0} = -k_{O_3}\int [O_3]\,dt - k_{\cdot OH}\int [\cdot OH]\,dt$$

In the formula, [C]_t is the concentration of the chemical at time t, mg/L;

[C]_0 is the concentration of the chemical at time 0, mg/L;

[C]_t/[C]_0 is the residual rate of the chemical at time t, dimensionless;

k_O3 is the reaction rate of the chemical with ozone, M^-1·s^-1;

k_·OH is the reaction rate of the chemical with hydroxyl radicals, M^-1·s^-1;

∫[O₃]dt is the cumulative exposure concentration of ozone, M·s;

∫[·OH]dt is the cumulative exposure concentration of hydroxyl radicals, M·s.

The chemical removal rate calculated with the above formula depends on 4 main parameters. The cumulative exposure concentration of ozone (∫[O₃]dt) and the cumulative exposure concentration of hydroxyl radicals (∫[·OH]dt) are related to the amount of ozone added in the wastewater treatment process. The reaction rate with ozone (k_O3) and the reaction rate with hydroxyl radicals (k_·OH) are properties of the chemical itself.

As the above formula shows, once the reaction rate of a PPCP with ozone (k_O3) and its reaction rate with hydroxyl radicals (k_·OH) are known, its removal efficiency can be calculated.
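As a minimal sketch of this calculation, assuming the integrated second-order form ln([C]_t/[C]_0) = -k_O3·∫[O₃]dt - k_·OH·∫[·OH]dt, the residual rate and the removal efficiency can be computed as below. The function names and the numerical inputs are hypothetical and serve only as an illustration.

```python
import math


def residual_fraction(k_o3, k_oh, o3_exposure, oh_exposure):
    """Residual rate [C]_t/[C]_0 of a chemical after ozonation.

    k_o3, k_oh: second-order rate constants, M^-1 s^-1
    o3_exposure, oh_exposure: cumulative exposures (integral of the
    oxidant concentration over time), M s
    """
    return math.exp(-(k_o3 * o3_exposure + k_oh * oh_exposure))


def removal_efficiency(k_o3, k_oh, o3_exposure, oh_exposure):
    """Fraction of the chemical removed, 1 - [C]_t/[C]_0."""
    return 1.0 - residual_fraction(k_o3, k_oh, o3_exposure, oh_exposure)


# Hypothetical inputs, not taken from the patent
print(removal_efficiency(k_o3=1.0e3, k_oh=5.0e9,
                         o3_exposure=1.0e-3, oh_exposure=1.0e-10))
```

The two exposure terms depend on the ozone dose and the water matrix, so in practice they would come from process measurements rather than from the model described here.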

Some models for predicting the k_O3 of organic chemicals have been reported at home and abroad (Sairam Sudhakaran. QSAR models for oxidation of organic micropollutants in water based on ozone and hydroxyl radical rate constants and their chemical classification. Water Research, 2013, 47: 1111-1122), but these are broad-spectrum models rather than models specific to the k_O3 of PPCPs. Broad-spectrum models sacrifice prediction accuracy for a particular class of chemicals in order to fit most chemicals, so their accuracy in predicting the k_O3 of PPCPs is low.

The present study establishes a k_O3 prediction model specific to PPCPs, which greatly enhances applicability to PPCPs and improves the accuracy of the prediction results for PPCPs.

Disclosure of Invention

The invention establishes a method for constructing a model for predicting the ozone reaction rate in PPCPs organic pollutant wastewater, which is used for predicting and evaluating the PPCPs removal efficiency of a sewage treatment plant in a chemical industry park. The method comprises the following steps:

(1) Data collection and setting of training-set and validation-set sample compounds

Second-order ozone reaction rate (k_O3, pH 8.5) data for 50 PPCPs in wastewater were collected from the literature (Environ. Sci. Technol. 2013, 47, 5872-5881). The sample compounds were divided into a training set of 37 compounds and a validation set of 13 compounds. The training-set samples should be as structurally diverse as possible and cover as wide an activity range as possible, so that the model has a wide application range and strong predictive ability. The validation set, which lies within the descriptor space of the training set, is then used to evaluate the predictive ability of the built model.

(2) Computing descriptors

The compound structure is pre-optimized with MM+ molecular mechanics in Hyperchem 7.0 software and then optimized by the semi-empirical AM1 method. Descriptors are calculated with Dragon 5.4 software from the optimized structure. The 1664 calculated descriptors are first screened by removing constant terms, near-constant terms and 704 highly correlated molecular descriptors (of two molecular descriptors whose mutual correlation coefficient is greater than 0.98, the one with the smaller correlation coefficient with the target value is removed). Principal component factor analysis (SPSS 19.0) is then used to extract 19 factors (cumulative contribution rate of 90%), and the descriptors with a contribution rate greater than 0.7 in each factor are selected, giving 66 important molecular descriptors.

X＝AF+e (1)

where A = (a_ij), and the factor loading a_ij is the correlation coefficient between the i-th variable and the j-th factor; A is the factor loading matrix, F is the common factor of X, e is the special factor of X, and X is the descriptor matrix.
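The preliminary screening in step (2) can be sketched as follows. This is only an illustration of the stated rule (drop constant and near-constant descriptors, then, among descriptors correlated above 0.98, keep the one more strongly correlated with the target); the function name `prescreen` and the variance threshold `var_tol` are assumptions, and the actual screening in the patent was done with Dragon and SPSS.

```python
import numpy as np


def prescreen(X, y, var_tol=1e-8, corr_limit=0.98):
    """Return the column indices of the descriptors kept after screening.

    X: (n_compounds, n_descriptors) descriptor matrix
    y: (n_compounds,) target values, e.g. log k_O3
    var_tol: hypothetical variance threshold for "near-constant"
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # 1) Drop constant and near-constant descriptors.
    keep = [j for j in range(X.shape[1]) if X[:, j].var() > var_tol]

    # 2) Absolute correlation of each remaining descriptor with the target.
    r_y = {j: abs(np.corrcoef(X[:, j], y)[0, 1]) for j in keep}

    # 3) Visit descriptors from most to least target-correlated and keep
    #    one only if it is not highly correlated with a descriptor already
    #    kept, so the weaker member of each correlated pair is removed.
    selected = []
    for j in sorted(keep, key=lambda j: -r_y[j]):
        if all(abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) <= corr_limit
               for s in selected):
            selected.append(j)
    return sorted(selected)
```

A greedy pass of this kind is one common way to apply a pairwise correlation cutoff; other orderings can retain a slightly different subset.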

(3) Model construction

Variables are selected with the genetic algorithm (GA) in MobyDigs software, using the following GA parameters: population size 100, mutation probability 0.5, maximum number of variables allowed in the model 10, and leave-one-out cross-validation (LOO-CV) as the evaluation function; all other parameters take their default values. The optimal number of variables is reached when adding further variables no longer has a significant effect on the result. Based on the selected variables, a prediction model is built by multiple linear regression (MLR), giving the GA-MLR model:

logY＝-15.61706-1.99793AMW+90.15638Mp-8.38954AAC-3.00276Eeig01r+181.24166JGI5-0.55738nCbH (2)

where Y is the k_O3 value, AMW is the average molecular weight, Mp is the average atomic polarizability, AAC is the atomic average information index, JGI5 is the topological charge index, Eeig01r is an eigenvalue of the bond energy conjugate integral matrix, and nCbH is the number of unsubstituted benzene-ring C atoms.
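Equation (2) can be evaluated directly once the six descriptors are known. The sketch below hard-codes the published coefficients; the function name and dictionary layout are hypothetical, and the descriptor values in the usage lines are those listed for metronidazole in Example 4.

```python
# Coefficients of equation (2)
INTERCEPT = -15.61706
COEFFS = {
    "AMW": -1.99793,
    "Mp": 90.15638,
    "AAC": -8.38954,
    "Eeig01r": -3.00276,
    "JGI5": 181.24166,
    "nCbH": -0.55738,
}


def predict_log_k_o3(descriptors):
    """Return log k_O3 from a dict holding the six Dragon descriptors."""
    missing = set(COEFFS) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptors: {sorted(missing)}")
    return INTERCEPT + sum(COEFFS[name] * descriptors[name] for name in COEFFS)


# Descriptor values listed for metronidazole in Example 4
metronidazole = {"AMW": 7.83, "Mp": 0.59, "AAC": 1.812,
                 "Eeig01r": 4.455, "JGI5": 0.024, "nCbH": 0}
log_k = predict_log_k_o3(metronidazole)
print(log_k, 10 ** log_k)  # log k_O3 of about -2.3, i.e. k_O3 below 1
```

A prediction is only meaningful for compounds inside the application domain of the model, which is checked separately with the leverage and Euclidean distance criteria.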

(4) Characterization and evaluation of models

According to the OECD guidelines for QSAR models, the constructed model requires internal validation (assessment of goodness of fit and robustness) and external validation (assessment of predictive ability). Goodness of fit is the proportion of the total variation of the dependent variable that can be explained by the model. The adjusted squared correlation coefficient between the experimental and fitted values (R²_adj) and the root mean square error (RMSE) are used to characterize the goodness of fit of the model:

where n is the number of compounds, m is the number of predictor variables, y_i and ŷ_i are the experimental value and the predicted value of the activity index of the i-th compound, respectively, and ȳ is the mean of the experimental values of the compound activity index.
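The formulas for R²_adj and RMSE are not reproduced in this text. The standard definitions that match the symbols n, m, y_i, ŷ_i and ȳ are sketched below; the divisor n in the RMSE is an assumption, since some authors divide by n - m - 1 instead.

```latex
R^2_{adj} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / (n - m - 1)}
                     {\sum_{i=1}^{n} (y_i - \bar{y})^2 / (n - 1)},
\qquad
RMSE = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}}
```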

The leave-one-out cross-validation coefficient (Q²_LOO) and the Bootstrapping coefficient (Q²_BOOT) are used to characterize the robustness of the model:

where the mean value in the formula is the mean of the experimental values of the activity index of the training-set compounds. The Bootstrapping method uses 1/5 cross-validation and is repeated 5000 times.
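A minimal sketch of a leave-one-out Q² for a multiple linear regression model is given below, assuming the common definition Q²_LOO = 1 - PRESS / Σ(y_i - ȳ)², where PRESS sums the squared errors of predictions made with each compound left out in turn. The function name is hypothetical, and the Bootstrapping variant is not shown.

```python
import numpy as np


def q2_loo(X, y):
    """Leave-one-out cross-validated Q^2 for an MLR model with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # add the intercept column

    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i  # leave compound i out
        coef, *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
        press += (y[i] - A[i] @ coef) ** 2  # error on the left-out compound

    return 1.0 - press / np.sum((y - y.mean()) ** 2)
```

Because every prediction is made for a compound the fit has not seen, Q²_LOO is never larger than the ordinary R² of the same model.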

The external validation coefficients Q²_EXT and R²_EXT and the error RMSE_EXT are used to characterize the predictive ability of the model:

where n_EXT is the number of compounds in the validation set, and the mean values in the formulas are the means of the experimental values and of the predicted values of the activity index of the validation-set compounds.
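The external validation statistics can be sketched as follows. The definitions used here are assumptions, because the formulas are not reproduced in this text: Q²_EXT is taken as 1 minus the ratio of the squared prediction errors to the squared deviations from the training-set mean, R²_EXT as the squared correlation between experimental and predicted values, and RMSE_EXT with divisor n_EXT.

```python
import numpy as np


def external_metrics(y_ext, y_pred_ext, y_train_mean):
    """Return (Q2_EXT, R2_EXT, RMSE_EXT) for a validation set.

    y_ext: experimental values of the validation-set compounds
    y_pred_ext: model predictions for the same compounds
    y_train_mean: mean experimental value of the training set
    """
    y_ext = np.asarray(y_ext, dtype=float)
    y_pred_ext = np.asarray(y_pred_ext, dtype=float)
    resid = y_ext - y_pred_ext

    q2_ext = 1.0 - np.sum(resid ** 2) / np.sum((y_ext - y_train_mean) ** 2)
    r2_ext = np.corrcoef(y_ext, y_pred_ext)[0, 1] ** 2
    rmse_ext = np.sqrt(np.mean(resid ** 2))
    return q2_ext, r2_ext, rmse_ext
```

Q²_EXT and R²_EXT can differ noticeably: a constant offset in the predictions lowers Q²_EXT but leaves the squared correlation unchanged.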

The characterization and evaluation parameters obtained for the model are:

n_tr = 37, R²_adj = 0.709, Q²_LOO = 0.658, Q²_BOOT = 0.687, RMSE_tr = 1.19

n_EXT = 13, R²_EXT = 0.628, Q²_EXT = 0.604, RMSE_EXT = 0.928

where n_tr and n_EXT are the numbers of compounds in the training set and the validation set, respectively, and p is the significance level. R²_adj is the coefficient of determination adjusted for the degrees of freedom; RMSE is the root mean square error; Q²_LOO is the leave-one-out cross-validation coefficient; Q²_BOOT is the Bootstrapping validation coefficient; R²_EXT is the correlation coefficient between the experimental and predicted values; Q²_EXT is the external validation coefficient of determination; and RMSE_EXT is the root mean square error of the validation set.

Golbraikh et al. consider that the acceptance criteria for a QSAR model are Q² > 0.50 and R² > 0.60. The results show that the model has good predictive ability.

Furthermore, for an unknown compound, the molecular structure is input and optimized, the 6 descriptors AMW, Mp, AAC, JGI5, Eeig01r and nCbH are calculated with Dragon software, and the prediction model is used to obtain the predicted value of k_O3 of the unknown compound.

Further, the method is applicable to predicting the ozone digestion rates of the compounds metronidazole and ritalinic acid.

Further, the model is interpreted mechanistically. Based on Dragon descriptors, a quantitative prediction model of the ozone digestion rate constants of PPCPs organic chemicals was constructed using principal component analysis and the genetic algorithm-multiple linear regression algorithm. The results show that the average molecular weight (AMW), the average atomic polarizability (Mp), the atomic average information index (AAC), the topological charge index (JGI5), Eeig01r (an eigenvalue of the bond energy conjugate integral matrix) and the number of unsubstituted benzene-ring C atoms (nCbH) have a significant influence on the ozone digestion rate of PPCPs compounds, as shown in Table 1. AMW, AAC, Eeig01r and nCbH are negatively correlated with the ozone digestion rate: the larger the average molecular weight, the atomic average information index, the eigenvalue of the bond energy conjugate integral matrix and the number of unsubstituted benzene-ring C atoms, the lower the ozone digestion rate. Mp is positively correlated with the ozone digestion rate: the more strongly the bonding electron cloud changes under an external electric field, the lower the electronegativity, the higher the polarizability of the bond, and the stronger the ozone digestion effect. JGI5 is the 5th-order mean topological charge index of the compound; the larger its value, the higher the ozone digestion rate.

TABLE 1 The six parameters of the regression model, their physicochemical meanings and their coefficients in the model

Variable	Descriptor meaning	Regression coefficient	Regression coefficient deviation	Standardized regression coefficient
Constant	-	-15.61706	7.96968	-
AMW	Average molecular weight	-1.99793	0.34868	-1.86685
Mp	Average atomic polarizability	90.15638	17.43372	1.75023
AAC	Atomic average information index	-8.38954	1.81277	-0.78161
EEig01r	Bond energy conjugate integral matrix eigenvalue	-3.00276	1.08265	-0.29843
JGI5	5th-order mean topological charge index	181.24166	26.98228	0.7957
nCbH	Unsubstituted benzene-ring C (sp2)	-0.55738	0.14196	-0.6387

The beneficial effects of the invention are as follows: the model constructed by the method can accurately predict the ozone digestion rate of PPCPs in organic pollutant wastewater, enhances the applicability of ozone digestion rate prediction to PPCPs wastewater and improves the accuracy of the prediction results; it saves manpower, material resources and time, and is simple, fast and effective. Following strictly the rules for the use of QSAR models laid down by the OECD, it interprets the mechanism of ozone digestion of PPCPs in terms of molecular structure descriptors, helps in adjusting process parameters such as the ozone dosage in the PPCPs removal process of sewage treatment plants, and is of significance for the risk management of PPCPs and for environmental safety.

Drawings

FIG. 1 is a Williams plot of the application domain of the ozone digestion rate model for PPCPs chemicals.

FIG. 2 is a representation of the model application domain based on Euclidean distance.

FIG. 3 is a plot of the predicted values against the experimental values for the MLR model of the ozone digestion rate of PPCPs chemicals.

Detailed Description

The invention will be better understood from the following examples. However, it will be readily appreciated by those skilled in the art that the description of the embodiments is provided for illustration only and should not limit the invention as described in detail in the claims.

Example 1

The specific steps of constructing the model for predicting the digestion rate of ozone in PPCPs organic pollutant wastewater are as follows:

(1) Data collection and setting of training-set and validation-set sample compounds

Second-order ozone reaction rate (k_O3, pH 8.5) data for 50 PPCPs in wastewater were collected from the literature (Environ. Sci. Technol. 2013, 47, 5872-5881). A total of 37 sample compounds were selected for the training set and 13 for the validation set.

(2) Computing descriptors

The compound structure is pre-optimized with MM+ molecular mechanics in Hyperchem 7.0 software and then optimized by the semi-empirical AM1 method. Descriptors are calculated with Dragon 5.4 software from the optimized structure, the 1664 calculated descriptors are preliminarily screened, and 66 molecular descriptors are obtained by principal component factor analysis (SPSS 19.0).

X＝AF+e (1)

where A = (a_ij), and the factor loading a_ij is the correlation coefficient between the i-th variable and the j-th factor.

(3) Model construction

Variables are selected with the genetic algorithm (GA) in MobyDigs software, and a prediction model, namely the GA-MLR model, is built by multiple linear regression (MLR) based on the selected variables:

logY＝-15.61706-1.99793AMW+90.15638Mp-8.38954AAC-3.00276Eeig01r+181.24166JGI5-0.55738nCbH (2)

where Y is the k_O3 value, AMW is the average molecular weight, Mp is the average atomic polarizability, AAC is the atomic average information index, JGI5 is the topological charge index, Eeig01r is an eigenvalue of the bond energy conjugate integral matrix, and nCbH is the number of unsubstituted benzene-ring C atoms.

(4) Characterization and evaluation of models

According to the OECD guidelines for QSAR models, the constructed model requires internal validation (assessment of goodness of fit and robustness) and external validation (assessment of predictive ability). The adjusted squared correlation coefficient between the experimental and fitted values (R²_adj) and the root mean square error (RMSE) are used to characterize the goodness of fit of the model, and the mean value used in the robustness statistics is the mean of the experimental values of the activity index of the training-set compounds. The Bootstrapping method uses 1/5 cross-validation and is repeated 5000 times.

The characterization and evaluation parameters obtained for the model are:

n_tr = 37, R²_adj = 0.709, Q²_LOO = 0.658, Q²_BOOT = 0.687, RMSE_tr = 1.19

n_EXT = 13, R²_EXT = 0.628, Q²_EXT = 0.604, RMSE_EXT = 0.928

where n_tr and n_EXT are the numbers of compounds in the training set and the validation set, respectively, and p is the significance level. R²_adj is the coefficient of determination adjusted for the degrees of freedom; RMSE is the root mean square error; Q²_LOO is the leave-one-out cross-validation coefficient; Q²_BOOT is the Bootstrapping validation coefficient; R²_EXT is the correlation coefficient between the experimental and predicted values; Q²_EXT is the external validation coefficient of determination; and RMSE_EXT is the root mean square error of the validation set.

The result shows that the model has better prediction capability and robustness.

Example 2

This embodiment characterizes the application domain of the above prediction model. The application domain of the model is defined using the Euclidean distance method and a leverage-based Williams plot. The Euclidean distance was calculated using AMBIT Discovery v0.04 software (http://ambit.sourceforge.net/download_ambitdiscovery.html). The Euclidean distance is calculated by the following formula:

where μ is the mean of the descriptor x.
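The distance formula itself is not reproduced in this text. A common form, assumed here, measures the distance of the descriptor vector of compound i from the training-set centroid, with x_ij the value of descriptor j for compound i, μ_j the training-set mean of descriptor j, and k the number of descriptors:

```latex
d_i = \sqrt{\sum_{j=1}^{k} \left( x_{ij} - \mu_j \right)^2}
```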

The Williams plot defines the model application domain using the standardized residual (δ) and the leverage value (h_i, where i indexes the compounds). δ is calculated using the following formula:
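The formula for δ is not reproduced in this text. A commonly used definition of the standardized residual, assumed here, divides the residual of compound i by the residual standard deviation of the regression, with n the number of training-set compounds and k the number of descriptors:

```latex
\delta_i = \frac{y_i - \hat{y}_i}
                {\sqrt{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / (n - k - 1)}}
```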

The leverage value (h_i) of the training-set compounds can be determined with the following equation:

h_i = x_i^T (X^T X)^-1 x_i (9)

where x_i is the row vector of molecular structure descriptors of the i-th compound. The warning value (h*) is defined as:

h* = 3(k + 1)/n (10)

where k is the number of descriptors and n is the number of compounds in the training set.
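Equations (9) and (10) can be sketched with NumPy as below. The function names are hypothetical; the descriptor matrix is used as given, and whether a column of ones for the intercept is appended to X is an assumption left to the user, since the text does not say.

```python
import numpy as np


def leverages(X_train, X_query=None):
    """Leverage h_i = x_i^T (X^T X)^-1 x_i, equation (9).

    X_train: (n, k) training-set descriptor matrix X
    X_query: rows to evaluate; defaults to the training set itself
    """
    X_train = np.asarray(X_train, dtype=float)
    Xq = X_train if X_query is None else np.asarray(X_query, dtype=float)
    # The pseudo-inverse also copes with a nearly singular X^T X.
    xtx_inv = np.linalg.pinv(X_train.T @ X_train)
    return np.einsum("ij,jk,ik->i", Xq, xtx_inv, Xq)


def warning_leverage(k, n):
    """Warning value h* = 3(k + 1)/n, equation (10)."""
    return 3.0 * (k + 1) / n


# Six descriptors and 37 training-set compounds, as in this model
print(round(warning_leverage(6, 37), 3))  # 0.568
```

A query compound whose leverage exceeds h* lies far from the bulk of the training set in descriptor space, so its prediction is an extrapolation.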

The model application domain characterization results are shown in FIG. 1 and FIG. 2. In FIG. 1, h* = 3(k+1)/n = 3(6+1)/37 = 0.568. The ordinate of the Williams plot characterizes the dispersion of the experimental values through the standardized residuals between experimental and predicted values; a compound is considered an outlier when the absolute value of its standardized residual δ is greater than 3.0. The abscissa is the h_i value of the compound. An h_i above the warning value (h* = 0.568) indicates that the substructures of the substance occur rarely in the training set, which can have a significant impact on the model predictions.

As can be seen, the leverage value h of 1 compound exceeds the warning leverage value h*, showing that its structure differs from those of the other compounds in the training set, but its standardized residual lies within the (-3, +3) range, indicating that the model is applicable to carbamazepine (CAS: 298-46-4). The standardized residuals of oxazepam (CAS: 604-75-1) and levetiracetam (CAS: 102767-28-2) fall outside the (-3, +3) range; these two compounds are outliers, indicating that the model is not suitable for predicting them, while the log k_O3 of the remaining compounds can be predicted well.

The application domain of the model is also characterized by the Euclidean distance method. FIG. 2 is the Euclidean distance plot. The Euclidean distances from the feature vectors of the training-set compounds to the feature vector of the central point range from 0.132 to 1.088, so compounds whose feature-vector Euclidean distance does not exceed 1.088 are suitable for the model. The Euclidean distances of the validation-set compounds are all less than 1.088, so they lie within the application domain of the model.
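The distance-based check can be sketched as follows, assuming the distance is measured from the centroid of the training-set descriptors and that the largest training-set distance serves as the threshold (1.088 for this model). The function name is hypothetical, and any descriptor scaling applied by AMBIT Discovery is not reproduced here.

```python
import numpy as np


def euclidean_domain(X_train, X_query):
    """Return (distances, inside) for the query compounds.

    distances: Euclidean distance of each query row to the training centroid
    inside: True where the distance does not exceed the largest
            training-set distance
    """
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)

    center = X_train.mean(axis=0)  # central point of the training set
    d_train = np.linalg.norm(X_train - center, axis=1)
    d_query = np.linalg.norm(X_query - center, axis=1)

    threshold = d_train.max()
    return d_query, d_query <= threshold
```

Descriptors with very different numerical ranges would normally be scaled before such a distance is computed, otherwise the descriptor with the largest range dominates the result.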

Example 3

The ozone digestion rates of 50 PPCPs organic pollutants in wastewater were predicted using the model constructed in Example 1; the results are shown in Table 2. The model has R²_adj = 0.709, indicating strong fitting ability, and Q²_LOO = 0.658 and Q²_BOOT = 0.687, indicating good robustness. For external validation, R²_EXT = 0.628 and Q²_EXT = 0.604. Golbraikh et al. consider that the acceptance criteria for a QSAR model are Q² > 0.50 and R² > 0.60, so the results show that the model has good predictive ability and can be successfully applied to compounds outside the training set. FIG. 3 plots the predicted values against the experimental values for the GA-MLR model of the ozone digestion rate of PPCPs chemicals. As FIG. 3 shows, the predicted and experimental values agree well for most substances; the deviations between experimental and predicted values are large for oxazepam (CAS: 604-75-1) and levetiracetam (CAS: 102767-28-2), which are poorly predicted, while the remaining substances are predicted well.

Example 4

Using the model constructed in Example 1, the ozone digestion rate of metronidazole (SMILES: o= [ n+ ] ([ O- ])/c=c (nccscc1=cc=c (O1) CN (C)/NC) was predicted. First, the 6 descriptors AMW, Mp, AAC, Eeig01r, JGI5 and nCbH are calculated with Dragon software from the molecular structure of the chemical; their values are 7.83, 0.59, 1.812, 4.455, 0.024 and 0, respectively. The hat (leverage) value is 0.505 and the Euclidean distance is 0.532, both within the application domain of the model, so the model can be used to predict the ozone digestion rate of metronidazole. Substituting the descriptor values into the model gives:

logY = -15.61706-1.99793*(7.83)+90.15638*(0.59)-8.38954*(1.812)-3.00276*(4.455)+181.24166*(0.024)-0.55738*(0) ≈ -2.3

The ozone digestion rate (k) of metronidazole is therefore predicted to be <1; the experimental value is also <1, so the prediction is close to the experimentally measured result.

Example 5

Using the model constructed in Example 1, the ozone digestion rate of ritalinic acid (SMILES: o=c (O) C (c2ccccn2) c1=cc=c1,) was predicted. First, the 6 descriptors AMW, Mp, AAC, Eeig01r, JGI5 and nCbH are calculated with Dragon software from the molecular structure of the chemical; their values are 6.65, 0.8, 1.692, 4.31, 0.025 and 6, respectively. The hat (leverage) value is 0.213 and the Euclidean distance is 0.427, both within the application domain of the model, so the model can be used to predict the ozone digestion rate of ritalinic acid. Substituting the descriptor values into the model gives:

logY＝-15.61706-1.99793*(12.06)+90.15638*(0.8)-8.38954*(1.692)-3.00276*(4.31)+181.24166*(0.025)-0.55738*(6)＝3.37

The ozone digestion rate (k) of ritalinic acid is therefore predicted to be 4.47×10³; the experimental value is 2.1×10⁴, so the prediction is close to the experimentally measured result.

TABLE 2 results of experimental and predicted values of ozone digestion rates in 50 PPCPs-type organic chemical wastewater

Claims

1. A method for constructing a prediction model of the ozone digestion rate in PPCPs organic pollutant wastewater, characterized by comprising the following steps:

step one, data collection and setting of training-set and validation-set sample compounds;

step two, calculating descriptors;

step three, constructing the model;

step four, characterizing and evaluating the model;

the data in step one are second-order ozone reaction rate k_O3 data at pH 8.5 for 50 PPCPs in wastewater, with 37 sample compounds selected for the training set and 13 sample compounds selected for the validation set;

in step two, the compound structure is pre-optimized with MM+ molecular mechanics in Hyperchem 7.0 software and then optimized by the semi-empirical AM1 method; descriptors are calculated with Dragon 5.4 software from the optimized structure; the 1664 calculated descriptors are preliminarily screened by removing constant terms, near-constant terms and 704 highly correlated molecular descriptors; 19 factors with a cumulative contribution rate of 90% are extracted by principal component factor analysis in SPSS 19.0; and the descriptors with a contribution rate greater than 0.7 in each factor are selected, giving 66 important molecular descriptors:

X＝AF+e (1)

wherein A = (a_ij), and the factor loading a_ij is the correlation coefficient between the i-th variable and the j-th factor; A is the factor loading matrix, F is the common factor of X, e is the special factor of X, and X is the descriptor matrix;

in step three, variables are selected with the genetic algorithm GA in MobyDigs software, using the following GA parameters: population size 100, mutation probability 0.5, maximum number of variables allowed in the model 10, and leave-one-out cross-validation (LOO-CV) as the evaluation function, with all other parameters at their default values; the optimal number of variables is reached when adding further variables no longer has a significant effect on the result; based on the selected variables, a prediction model is built by multiple linear regression, and the 6 selected molecular descriptors and the model are as follows:

GA-MLR linear equation:

logY＝-15.61706-1.99793AMW+90.15638Mp-8.38954AAC-3.00276Eeig01r+181.24166JGI5-0.55738nCbH (2)

wherein Y is the k_O3 value, AMW is the average molecular weight, Mp is the average atomic polarizability, AAC is the atomic average information index, JGI5 is the topological charge index, Eeig01r is an eigenvalue of the bond energy conjugate integral matrix, and nCbH is the number of unsubstituted benzene-ring C atoms;

in step four, the adjusted squared correlation coefficient between the experimental and fitted values R²_adj and the root mean square error RMSE are used to characterize the goodness of fit of the model:

wherein n is the number of compounds, m is the number of predictor variables, y_i and ŷ_i are the experimental value and the predicted value of the activity index of the i-th compound, respectively, and ȳ is the mean of the experimental values of the compound activity index;

the leave-one-out cross-validation coefficient Q²_LOO and the Bootstrapping coefficient Q²_BOOT are used to characterize the robustness of the model:

wherein the mean value in the formula is the mean of the experimental values of the activity index of the training-set compounds, and the Bootstrapping method uses 1/5 cross-validation and is repeated 5000 times;

the external validation coefficients Q²_EXT and R²_EXT and the error RMSE_EXT are used to characterize the predictive ability of the model:

wherein n_EXT is the number of compounds in the validation set, and the mean values in the formulas are the means of the experimental values and of the predicted values of the activity index of the validation-set compounds;

the characterization and evaluation parameters obtained for the model are:

n_tr = 37, R²_adj = 0.709, Q²_LOO = 0.658, Q²_BOOT = 0.687, RMSE_tr = 1.19;

n_EXT = 13, R²_EXT = 0.628, Q²_EXT = 0.604, RMSE_EXT = 0.928

wherein n_tr is the number of compounds in the training set; R²_adj is the coefficient of determination adjusted for the degrees of freedom; RMSE_tr is the root mean square error of the training set; Q²_LOO is the leave-one-out cross-validation coefficient; Q²_BOOT is the Bootstrapping validation coefficient; R²_EXT is the correlation coefficient between the experimental and predicted values; Q²_EXT is the external validation coefficient of determination; and RMSE_EXT is the root mean square error of the validation set.
2. The method for constructing a prediction model of the ozone digestion rate in PPCPs organic pollutant wastewater according to claim 1, wherein, for an unknown compound, the molecular structure is input and optimized, the 6 descriptors AMW, Mp, AAC, JGI5, Eeig01r and nCbH are calculated with Dragon software, and the prediction model is used to obtain the predicted value of k_O3 of the unknown compound.
3. The method for constructing a prediction model of the ozone digestion rate in PPCPs organic pollutant wastewater, characterized in that the method is used to predict the ozone digestion rates of the compounds metronidazole and ritalinic acid.