CN102999705A - Method for predicting the n-octanol/air partition coefficient (K_OA) at different temperatures using a quantitative structure-activity relationship and a solvation model - Google Patents
- Publication number
- CN102999705A (application CN201210505935A)
- Authority
- CN
- China
- Prior art keywords
- qsar
- model
- compound
- descriptor
- logk
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a theoretical method for predicting the n-octanol/air partition coefficient (K_OA) of organic chemicals and belongs to the field of test strategies for ecological risk assessment. The method comprises two parts: establishing a quantitative structure-activity relationship (QSAR) model based on the Dragon molecular descriptors of the compounds, and calculating the solvation free energy with an open-source solvation model, which is then converted to K_OA through the thermodynamic relation log K_OA = -ΔG_OA/(2.303RT). On this basis a general strategy for predicting the K_OA of a compound is provided: the Dragon descriptors are used to judge whether the molecule lies within the application domain; if it does, the QSAR model is used preferentially (the QSAR-T model at different temperatures); otherwise the SM8AD solvation model is used. With this method and strategy, the K_OA of different compounds at different temperatures can be predicted rapidly and effectively, saving considerable manpower, materials, and funds, and providing essential basic data for large-scale ecological risk assessment and environmental supervision of chemicals.
Description
Technical field
The invention belongs to the field of test strategies for the ecological risk assessment of chemicals. Specifically, the Dragon descriptors of an organic compound are used to judge whether the compound lies within the application domain of the constructed models for the n-octanol/air partition coefficient (K_OA): a quantitative structure-activity relationship (QSAR) model based on molecular descriptors, and a temperature-dependent QSAR-T model. For compounds inside the domain, K_OA at a single temperature (25 °C) is preferentially calculated with the QSAR model, and K_OA values at different temperatures (10-50 °C) are calculated with the QSAR-T model. For compounds outside the domain, the solvation model SM8AD is used to predict K_OA.
Background technology
The n-octanol/air partition coefficient (K_OA) is defined as the ratio (dimensionless) of the concentrations of an organic compound in the n-octanol phase and in the air phase at partition equilibrium at a given temperature; in practice it is usually expressed in logarithmic form (log K_OA). Because n-octanol is a long-chain fatty alcohol with lipid-like properties, K_OA is commonly used to describe the partitioning of pollutants between air and environmental organic phases, and it is an important environmental behavior parameter for assessing the long-range transport potential and bioconcentration of compounds in the environment. The thermodynamic definition of K_OA is:
$$\log K_{OA} = -\frac{\Delta G_{OA}}{2.303\,RT} \qquad (1)$$
In formula (1), ΔG_OA is the change in Gibbs free energy when the compound is transferred from the air phase to the octanol phase, also called the solvation free energy; R is the ideal gas constant and T is the ambient temperature. The larger the K_OA of a compound, the more readily it partitions into environmental organic phases (including soil organic matter, the organic fraction of airborne particles, and the epidermal cuticle of animals and plants). In addition, K_OA depends strongly on temperature: the lower the temperature, the larger the K_OA.
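As a minimal sketch of formula (1), the following Python function converts a solvation free energy into log K_OA. The function name, the choice of kcal/mol units, and the example value of -10 kcal/mol are illustrative assumptions and are not taken from the invention.

```python
# Gas constant in kcal/(mol*K); the unit choice is an assumption of this sketch.
R_KCAL = 1.987204e-3


def log_koa_from_dg(dg_oa_kcal: float, temperature_k: float = 298.15) -> float:
    """Formula (1): log K_OA = -dG_OA / (2.303 * R * T)."""
    return -dg_oa_kcal / (2.303 * R_KCAL * temperature_k)


if __name__ == "__main__":
    # Hypothetical solvation free energy of -10 kcal/mol at three temperatures.
    for t in (283.15, 298.15, 323.15):
        print(t, round(log_koa_from_dg(-10.0, t), 2))
```

For a fixed negative ΔG_OA the computed log K_OA rises as T falls, which matches the temperature dependence described above. In practice ΔG_OA itself also varies with temperature, which this sketch ignores.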
The K_OA of a compound is obtained mainly by experimental measurement: the generator column method, headspace gas chromatography, solid-phase microextraction, the fugacity meter method, and the gas chromatographic retention time method. Using these methods, earlier researchers obtained K_OA values for some typical environmental pollutants, such as polychlorinated biphenyls (PCBs), polycyclic aromatic hydrocarbons (PAHs), polybrominated diphenyl ethers (PBDEs), and dioxins (PCDD/Fs). Each method has its own advantages and disadvantages, and the compounds each is suited to measure differ. Because chemicals are so numerous, obtaining K_OA experimentally is costly and time-consuming, and some compounds lack reference standards, which makes measurement difficult. According to EPA statistics, only a few hundred compounds have experimentally measured K_OA values so far, far from enough for chemical risk assessment. Developing theoretical prediction methods or models that obtain K_OA quickly and effectively is therefore of great significance and value. Among these, quantitative structure-activity relationship (QSAR) models based on molecular descriptors are a commonly used, simple, and effective approach.
The basic principle of QSAR is that the molecular structure of a compound determines its properties. As a computer modeling method, QSAR can reveal and characterize the quantitative and causal relationships between the molecular structure of organic compounds and their physicochemical properties, environmental behavior, and toxicological effects, and thus allows the ecological risk of pollutants to be evaluated at the molecular level. QSAR models have become an important tool for the environmental ecological risk assessment and human health risk assessment of pollutants. To promote the use of QSAR models in chemical risk management, the Organisation for Economic Co-operation and Development (OECD) issued guidance on the construction and use of QSAR models in 2007. According to this guidance, a good QSAR model should meet five criteria: (1) a clearly defined environmental (activity) endpoint; (2) an unambiguous algorithm; (3) a defined application domain; (4) appropriate measures of goodness of fit, robustness, and predictive ability; (5) a mechanistic interpretation, if possible.
Many researchers have already built QSAR models for predicting the K_OA of particular compounds. Some of these are group contribution models based on molecular fragments. For example, "Li, X. H.; Chen, J. W. et al. The fragment constant method for predicting octanol-air partition coefficients of persistent organic pollutants at different temperatures. Journal of Physical and Chemical Reference Data 2006, 35, (3), 1365-1384." used multiple linear regression (MLR) to build a linear QSAR model for the K_OA of typical persistent organic pollutants, with molecular fragment constants as the independent variables. This model is easy to apply, has high predictive ability, and is robust. Others are polyparameter linear free energy relationship (pp-LFER) models. For example, "Chen, J. W.; Harner, T.; Yang, P.; Quan, X.; Chen, S.; Schramm, K. W.; Kettrup, A. Quantitative predictive models for octanol-air partition coefficients of polybrominated diphenyl ethers at different temperatures. Chemosphere 2003, 51, (7), 577-584." and "Chen, J. W.; Harner, T. et al. Quantitative relationships between molecular structures, environmental temperatures and octanol-air partition coefficients of polychlorinated biphenyls. Computational Biology and Chemistry 2003, 27, (3), 405-421." built QSAR models from molecular structure descriptors by partial least squares (PLS) regression and predicted the K_OA of PBDEs and PCBs, respectively. These models use the cumulative cross-validated coefficient Q²_CUM to characterize robustness and include temperature as a descriptor to account for the temperature dependence of K_OA, with good predictive results.
Judged against the OECD guidance, these QSAR models are all local models, which are usually applicable only to one class of compounds. Some of them also lack a precise characterization of the application domain and an effective external validation set for assessing predictive ability. Moreover, because K_OA depends strongly on temperature, the influence of temperature must also be considered. It is therefore necessary to build a global QSAR model whose application domain covers different classes of compounds and which includes temperature as a variable. In addition, according to the OECD QSAR guidance, once a model has been built its application domain must be characterized and a possible mechanistic interpretation given.
In general, QSAR methods for predicting K_OA require little computation, give reliable results, and can be applied in risk management. However, a QSAR model depends strongly on experimental K_OA values (for modeling and validation), and even a good QSAR model can only predict K_OA for the limited set of compounds inside its application domain. These two points are the main limits on its use. Prediction of K_OA from the solvation free energy ΔG_OA therefore offers several advantages: (1) the calculation of ΔG_OA is in theory not restricted by an application domain and applies to all organic compounds; (2) only the coordinates of the molecule are needed as input to obtain ΔG_OA in the octanol phase fairly quickly and accurately, and from it log K_OA. In recent years quantum chemical calculation has advanced rapidly, combining accuracy with speed. Together with theoreticians' improved understanding and reasonable modeling of the physical and chemical processes by which molecules move between phases, this means that solvation free energies calculated with solvation models can reach good accuracy. The ab initio approach also avoids screening large numbers of molecular descriptors, since it needs only the basic structural coordinates of the molecule, which is an important step toward simpler calculation. Researchers abroad have developed many open-source implicit solvation models (models that represent the solvent through parameters) to study how molecules partition between media, how they dissolve in solvents, and the associated energy changes. Examples are the IEFPCM solvation model available in the well-known quantum chemistry package Gaussian, and the SMx (solvation model) series reported by the Truhlar group. In "Marenich, A. V. et al. Universal Solvation Model Based on the Generalized Born Approximation with Asymmetric Descreening. J. Chem. Theory Comput. 2009, 5, (9), 2447-2464.", the authors published the latest SM8AD solvation model for calculating the solvation free energy of molecules. The model is applicable to neutral molecules or charged ions dissolved in almost any solvent. The parameterization results show that, for 2560 solvation free energies calculated by the model (for various neutral solutes in water, acetonitrile, methanol, and dimethyl sulfoxide), the mean unsigned error is about 0.6 kcal/mol, so the model is suitable for calculating the solvation free energy of molecules in n-octanol. In general, as long as the principle of the model is sound and its parameterization and approximations are reasonable, the solvation free energy of a target compound can be obtained accurately and its K_OA predicted from formula (1). Therefore, alongside traditional QSAR modeling, it is worthwhile to introduce a quantum chemical solvation model to calculate the solvation free energy of compounds in the octanol phase and thereby predict their K_OA.
Summary of the invention
The invention provides a method for predicting the n-octanol/air partition coefficient K_OA of organic compounds at different temperatures by means of a quantitative structure-activity relationship and a solvation model. Existing QSAR methods for the theoretical prediction of K_OA often fail to fully meet the OECD criteria: the application domains of the models are small (local) and their predictive ability is not properly characterized, which limits their use in chemical risk management. How to build a QSAR method that has a wide domain, predicts the K_OA of chemicals accurately, complies with the OECD guidance, and also accounts for the temperature dependence of K_OA is a problem that urgently needs to be solved. Moreover, the QSAR calculation workflow is relatively complex and depends heavily on experimental data, so it is also necessary to develop, beyond QSAR, a fast and effective quantum chemical method for calculating K_OA. To address these technical problems, the invention provides a method that combines QSAR models complying with the OECD guidance with a solvation free energy model to predict the n-octanol/air partition coefficient of organic chemicals. The K_OA values of different classes of compounds at different temperatures can thus be predicted with reasonable accuracy, providing the necessary basic data for the risk assessment and environmental supervision of large numbers of organic chemicals.
The molecular structure of a compound determines its physicochemical properties and environmental behavior. Therefore, by calculating Dragon descriptors that characterize the molecule, building a QSAR model, and characterizing its application domain, the n-octanol/air partition coefficient of compounds inside the domain can be predicted. For different temperatures, the result of the QSAR-T model is used preferentially. For compounds outside the application domain of the QSAR models, the solvation free energy in the n-octanol phase at 25 °C can be calculated with a quantum chemical solvation model, from which K_OA can also be predicted.
The technical solution used in the present invention comprises the steps:
(1) Construction of the QSAR model based on Dragon descriptors
To ensure the accuracy of the data behind the models and method of the invention, measured n-octanol/air partition coefficients were collected from the literature, giving 936 K_OA values for 380 organic compounds at different temperatures. Of these, 264 compounds (654 K_OA values) were randomly assigned to the training set, and the remainder formed the validation set (used for external validation of the QSAR models).
The Dragon software was used to calculate all Dragon descriptors for the 264 training-set compounds (molecules saved in .mol format). The models were built by multiple linear regression (MLR) and partial least squares (PLS) regression.
(a) First, stepwise multiple linear regression in SPSS was used to screen for descriptors with a significant influence on K_OA and to obtain the optimal MLR equation. Following the OECD guidance (OECD, 2007. Guidance Document On The Validation Of (Quantitative) Structure-Activity Relationships [(Q)SAR] Models. Organisation for Economic Co-Operation and Development, Paris, France), the optimal MLR equation is the one with the largest adjusted coefficient of determination (R²_adj), a variance inflation factor (VIF) < 10 for every variable (descriptor), and a significance level of p = 0.001 for the equation. R²_adj is defined as:
$$R^2_{adj} = 1 - \frac{n-1}{n-p-1}\cdot\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2} \qquad (2)$$
where ŷ_i and y_i are the predicted and experimental log K_OA of the i-th compound, ȳ is the mean of the measured log K_OA values of the training-set compounds, n is the total number of training-set compounds (264), and p is the number of descriptors. The optimal MLR model is then optimized further in the next step.
(b) The PLS algorithm in SIMCA was used to remove redundant variables from the MLR model. The PLS settings were: 7 cross-validation rounds, a maximum of 200 iterations, an allowed proportion of missing data of 50%, and a significance limit of 0.05 for the PLS model. At each PLS step, the descriptor with the lowest weight (VIP index) was removed and the model was refitted. Finally, the PLS model with the largest R²_adj and the largest cumulative cross-validated coefficient Q²_CUM was selected as optimal. Q²_CUM characterizes the robustness of a PLS model: in general, a model with Q²_CUM > 0.5 is robust, and one with Q²_CUM > 0.9 is highly robust (Eriksson, L., Jaworska, J. et al. 2003. Methods for reliability and uncertainty assessment and for applicability evaluations of classification- and regression-based QSARs. Environmental Health Perspectives 111, 1361-1375.). Q²_CUM is defined as:
$$Q^2_{CUM} = 1.0 - \prod_{a=1}^{A}\left(\frac{PRESS}{SS}\right)_a \qquad (3)$$
where SS is the residual sum of squares, i.e. the sum of squared differences between predicted and experimental values for a fitted PLS model; PRESS is the predictive residual sum of squares, i.e. the sum of squared differences between predicted and experimental values obtained in a PLS validation round after one variable has been removed and the PLS equation refitted; A is the number of variables in the PLS equation; y_im is the experimental value and ŷ_im is the predicted value. The accuracy of the model also depends on the number of variables A and on the significance level p.
When building the PLS model, temperature can be added as a descriptor (Chen, J. W.; Harner, T. et al. Universal predictive models on octanol-air partition coefficients at different temperatures for persistent organic pollutants. Environ. Toxicol. Chem. 2004, 23, (10), 2309-2317): combining the descriptors already screened with temperature gives the temperature-dependent QSAR model for K_OA (QSAR-T).
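The two selection statistics used in this step can be sketched in Python as follows. This is only an illustration of the adjusted coefficient of determination and of the Q²_CUM product form in formula (3). The function names and the toy numbers are assumptions; the modeling itself was carried out in SPSS and SIMCA.

```python
from math import prod


def r2_adjusted(y_obs, y_pred, n_descriptors):
    """Adjusted R^2 for n observations and n_descriptors predictors."""
    n = len(y_obs)
    mean_obs = sum(y_obs) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - (n - 1) / (n - n_descriptors - 1) * ss_res / ss_tot


def q2_cum(press_over_ss):
    """Formula (3): 1 minus the product of the PRESS/SS ratios, a = 1..A."""
    return 1.0 - prod(press_over_ss)


if __name__ == "__main__":
    # Toy data, not from the patent.
    print(r2_adjusted([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.0], 1))
    print(q2_cum([0.5, 0.2]))
```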
(2) Linear equations of the QSAR model based on Dragon descriptors and of the temperature-dependent QSAR-T model
① QSAR: log K_OA = 0.509 + 0.986 × X1sol - 1.018 × Mor13v + 1.384 × H-050 - 1.528 × R5v - 0.015 × T(O..Cl) + 0.043 × HATS5v - 0.026 × RDF035m - 0.197 × RCI - 0.130 × n_COOR - 0.077 × Mor15u - 0.077 × RDF090m
where X1sol is the solvation connectivity index chi-1; Mor13v and Mor15u are 3D-MoRSE descriptors; H-050 is the atom-centred fragment for H attached to a heteroatom; R5v and HATS5v are GETAWAY descriptors; RDF035m and RDF090m are radial distribution function descriptors; RCI is an aromaticity index descriptor; T(O..Cl) is the sum of topological distances between O and Cl atoms; and n_COOR is the number of ester (COOR) groups in the molecule. X1sol has the largest VIP value, indicating that it is the main descriptor determining the K_OA of a compound.
In the optimal model, log K_OA is expressed as a function of 11 descriptor variables, with n = 264 training-set data points. R²_Y(adj) = 0.966, standard error SE = 0.818, and p < 0.001 show that the model has good goodness of fit. Q²_CUM = 0.949 shows that the model is robust.
② QSAR-T: log K_OA = -3.03 + 3.13 × 10² × X1sol/T - 8.57 × 10 × Mor13v/T + 4.32 × 10² × H-050/T - 1.27 × 10³ × R5v/T - 5.54 × T(O..Cl)/T + 1.25 × 10² × HATS5v/T - 1.33 × 10 × RDF035m/T - 6.11 × 10 × RCI/T - 3.76 × 10 × n_COOR/T + 1.56 × 10² × Mor15u/T - 5.49 × RDF090m/T + 1.04 × 10³/T
where each term has the same meaning as in the QSAR model, with the influence of the temperature T added. The optimal model is based on n = 654 K_OA values at different temperatures. R²_Y(adj) = 0.963, standard error SE = 0.463, and p < 0.001 show that the model has good goodness of fit. Q²_CUM = 0.959 shows that the model is robust.
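The two linear equations can be written as coefficient tables, as in the following Python sketch. The coefficients are copied from the equations above and the p,p'-DDT descriptor values are those listed in Embodiment 5; the dictionary layout, the key `nCOOR`, and the function names are assumptions made for illustration, and T is assumed to be in kelvin.

```python
QSAR_INTERCEPT = 0.509
QSAR_COEF = {
    "X1sol": 0.986, "Mor13v": -1.018, "H-050": 1.384, "R5v": -1.528,
    "T(O..Cl)": -0.015, "HATS5v": 0.043, "RDF035m": -0.026, "RCI": -0.197,
    "nCOOR": -0.130, "Mor15u": -0.077, "RDF090m": -0.077,
}

QSAR_T_INTERCEPT = -3.03
QSAR_T_INV_T = 1.04e3  # the bare 1.04 x 10^3 / T term
QSAR_T_COEF = {
    "X1sol": 3.13e2, "Mor13v": -8.57e1, "H-050": 4.32e2, "R5v": -1.27e3,
    "T(O..Cl)": -5.54, "HATS5v": 1.25e2, "RDF035m": -1.33e1, "RCI": -6.11e1,
    "nCOOR": -3.76e1, "Mor15u": 1.56e2, "RDF090m": -5.49,
}


def qsar_log_koa(desc):
    """Single-temperature (25 degC) QSAR equation."""
    return QSAR_INTERCEPT + sum(c * desc[k] for k, c in QSAR_COEF.items())


def qsar_t_log_koa(desc, temperature_k):
    """QSAR-T equation: every descriptor term is divided by T (assumed kelvin)."""
    terms = sum(c * desc[k] for k, c in QSAR_T_COEF.items())
    return QSAR_T_INTERCEPT + (QSAR_T_INV_T + terms) / temperature_k


# Descriptor values for p,p'-DDT as listed in Embodiment 5.
DDT = {
    "X1sol": 10.20, "Mor13v": -0.31, "H-050": 0, "R5v": 0.151, "T(O..Cl)": 0,
    "HATS5v": 0.381, "RDF035m": 3.993, "RCI": 1.4, "nCOOR": 0,
    "Mor15u": 0.157, "RDF090m": 0,
}

if __name__ == "__main__":
    print(round(qsar_log_koa(DDT), 2))
    print(round(qsar_t_log_koa(DDT, 298.15), 2))
```

Keeping the coefficients in dictionaries keyed by descriptor name means the same descriptor record can be passed to either model.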
(3) Validation of the QSAR and QSAR-T models and characterization of the application domain
The predictive ability of the QSAR (and QSAR-T) model must be checked by external validation. The validation set contains n = 282 data points in total. The external validation result is expressed by the squared external predictive correlation coefficient Q²_EXT and by the root-mean-square error RMSE of the external validation.
The application domain of the QSAR model is characterized by four methods simultaneously: the descriptor range method, the Euclidean distance (0.33-1.48), the city-block distance (0.68-4.29), and the probability density. These determine whether a compound lies outside the domain: if any one method places the compound outside the domain, the compound is treated as out of domain. In addition, the prediction error limit of ±3RMSE is used to judge whether a compound is an outlier, and thus to define the application domain of the model. For the QSAR model, the prediction results for validation-set compounds inside the application domain are: n = 101 (15 compounds are outside the domain), Q²_EXT = 0.953, RMSE = 0.820, showing that the QSAR model has good predictive ability. For the QSAR-T model, n = 282, RMSE = 1.27, and Q²_EXT = 0.870, showing that this model can be used to predict K_OA at different temperatures.
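The exact formulas for Q²_EXT and RMSE are not preserved in this text. The Python sketch below uses one commonly used form, which is an assumption: Q²_EXT compares the squared prediction errors on the validation set with the squared deviations from the training-set mean, and RMSE is the root of the mean squared error. Function names and toy numbers are illustrative.

```python
from math import sqrt


def q2_ext(y_obs, y_pred, training_mean):
    """Assumed form: 1 - sum((y - yhat)^2) / sum((y - mean_training)^2)."""
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - training_mean) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def rmse(y_obs, y_pred):
    """Root-mean-square error over the validation data."""
    n = len(y_obs)
    return sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / n)


if __name__ == "__main__":
    # Toy validation data, not from the patent.
    observed = [2.0, 4.0, 6.0]
    predicted = [2.5, 3.5, 6.5]
    print(q2_ext(observed, predicted, training_mean=4.0))
    print(rmse(observed, predicted))
```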
(4) Predicting K_OA from the solvation free energy calculated with the solvation model
The open-source ab initio solvation model SM8AD is used to calculate the solvation free energy of the molecule in n-octanol at 298.15 K, and log K_OA is then calculated from thermodynamic relation (1). Only the coordinate file of the molecule is needed as input.
The SM8AD model and the QSAR model were compared on the 264 training-set compounds by linear fitting of predicted against experimental values, comparing the correlation coefficient of the fit, the slope k (ideally close to 1), and the root-mean-square error. The SM8AD predictions are fairly accurate, though somewhat less so than those of the QSAR model (SM8AD: R² = 0.860, k = 0.80, RMSE = 1.29; QSAR: R² = 0.966, k = 0.97, RMSE = 0.524). Therefore, if a compound lies outside the application domains of the QSAR and QSAR-T models, the SM8AD model should be considered first for predicting its K_OA.
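The overall selection strategy can be sketched as a small Python dispatcher. The function name and the returned labels are hypothetical, and raising an error for in-domain requests outside 10-50 °C is an assumption of this sketch, since the text does not say how such cases are handled.

```python
def select_model(in_domain: bool, temperature_c: float = 25.0) -> str:
    """Pick the prediction route described in steps (1) to (4)."""
    if not in_domain:
        # Out-of-domain compounds go to the solvation model
        # (calculated at 298.15 K in the text).
        return "SM8AD"
    if temperature_c == 25.0:
        return "QSAR"
    if 10.0 <= temperature_c <= 50.0:
        return "QSAR-T"
    # Assumption: temperatures outside the modeled range are rejected.
    raise ValueError("QSAR-T was built for 10-50 degC")


if __name__ == "__main__":
    print(select_model(True, 25.0))
    print(select_model(True, 10.0))
    print(select_model(False))
```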
The method provided by the invention has the following features:
1. The QSAR models are built in accordance with the OECD guidance on the construction and use of QSAR models and have good goodness of fit, robustness, and predictive ability.
2. The application domain of the models is wide, covering organic compounds of many structural types, and the influence of temperature is included, so the models can predict the K_OA of different compounds at different temperatures and provide basic data for global environmental behavior analysis and ecological risk assessment of organic chemicals.
3. The solvation model offers a new reference for K_OA prediction and can make up for the shortcomings of QSAR prediction by supplying further reliable K_OA predictions; the ab initio algorithm is simple, its application domain is in theory unrestricted, and it can be extended to K_OA prediction for all compounds.
The invention can quickly and effectively predict the n-octanol/air partition coefficients of different organic pollutants at different ambient temperatures. The method is inexpensive, simple, and fast, and saves considerable manpower, materials, and funds. The QSAR models involved were built and validated strictly according to the OECD guidance on the construction and use of QSAR models, and their predictions are accurate and reliable; the wide-domain ab initio solvation model SM8AD serves as a complementary reference for obtaining K_OA accurately. The invention thus provides important basic data for chemical supervision and has important guiding significance for ecological risk assessment.
Description of drawings
Fig. 1 is a schematic of the QSAR model and solvation model methods for predicting the log K_OA of a compound.
Fig. 2 is a plot of measured training-set log K_OA values against the values fitted by the QSAR model.
Fig. 3 is the residual distribution of K_OA predicted by the QSAR model for the training set.
Fig. 4 is the linear fit of QSAR-T predicted values against measured values for the training set.
Fig. 5 is the linear fit of SM8AD solvation model predicted values against measured values for the training set.
Embodiments
Embodiment 1
The method of the invention is used to predict the n-octanol/air partition coefficient K_OA at 25 °C of a biphenyl compound, the polychlorinated biphenyl PCB-66 (2,3',4,4'-tetrachlorobiphenyl). The prediction proceeds as follows:
(1) First, the structure of PCB-66 is drawn with chemical drawing software and saved as a .mol file, and MOPAC is used for a preliminary structure optimization.
(2) The 11 Dragon descriptors of the QSAR model are calculated with Dragon: X1sol = 8.73, Mor13v = -0.33, H-050 = 0, R5v = 0.118, T(O..Cl) = 0, HATS5v = 0.254, RDF035m = 0.443, RCI = 1.48, n_COOR = 0, Mor15u = 0.155, RDF090m = 0.334.
(3) The application domain is checked. ① Descriptor range: the descriptor ranges determined by the training-set compounds are X1sol ~ (1, 13.35), Mor13v ~ (-1.538, 0.938), H-050 ~ (0, 1, 2), R5v ~ (0, 0.319), T(O..Cl) ~ (0, 100), HATS5v ~ (0, 1.115), RDF035m ~ (0, 59.03), RCI ~ (0, 1.54), n_COOR ~ (0 or 1), Mor15u ~ (-1.25, 1.75), RDF090m ~ (0, 53.63); all 11 descriptors fall within these ranges. ② Euclidean distance: the calculated value is 0.41, within (0.33-1.48), so the compound is in the domain. ③ City-block distance: the calculated value is 0.86, within (0.68-4.29). ④ Probability density: the calculated probability density lies within the range of the training-set probability density distribution (> 0.01). All 11 calculated descriptors lie within their ranges, so the compound is within the application domain of the QSAR model and the model can be used for prediction. Substituting the descriptor values into the linear equation of the QSAR model gives:
log K_OA = 0.509 + 0.986 × 8.73 - 1.018 × (-0.33) + 1.384 × 0 - 1.528 × 0.118 - 0.015 × 0 + 0.043 × 0.254 - 0.026 × 0.443 - 0.197 × 1.48 - 0.130 × 0 - 0.077 × 0.155 - 0.077 × 0.334 = 9.02
The predicted log K_OA of 9.02 is very close to the experimentally measured value of 8.97; the prediction error is 0.05 log units, showing that the QSAR model prediction is reliable.
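The descriptor range check in step (3) can be sketched in Python as follows. The ranges and descriptor values are those listed in the embodiments. The negative lower bounds for Mor13v and Mor15u are assumptions inferred from the descriptor values that the text judges to be in range, and the dictionary layout and function name are illustrative.

```python
# Training-set descriptor ranges (lower, upper).
# The negative lower bounds for Mor13v and Mor15u are assumptions of this sketch.
DESCRIPTOR_RANGES = {
    "X1sol": (1, 13.35), "Mor13v": (-1.538, 0.938), "H-050": (0, 2),
    "R5v": (0, 0.319), "T(O..Cl)": (0, 100), "HATS5v": (0, 1.115),
    "RDF035m": (0, 59.03), "RCI": (0, 1.54), "nCOOR": (0, 1),
    "Mor15u": (-1.25, 1.75), "RDF090m": (0, 53.63),
}


def out_of_range(descriptors):
    """Return the names of descriptors that fall outside the training ranges."""
    return [
        name for name, (low, high) in DESCRIPTOR_RANGES.items()
        if not low <= descriptors[name] <= high
    ]


PCB66 = {
    "X1sol": 8.73, "Mor13v": -0.33, "H-050": 0, "R5v": 0.118, "T(O..Cl)": 0,
    "HATS5v": 0.254, "RDF035m": 0.443, "RCI": 1.48, "nCOOR": 0,
    "Mor15u": 0.155, "RDF090m": 0.334,
}
BDE157_6OH = {
    "X1sol": 12.77, "Mor13v": -0.921, "H-050": 1, "R5v": 0.103, "T(O..Cl)": 0,
    "HATS5v": 0.245, "RDF035m": 84.685, "RCI": 1.41, "nCOOR": 0,
    "Mor15u": 1.33, "RDF090m": 14.19,
}

if __name__ == "__main__":
    print(out_of_range(PCB66))       # empty list: inside every range
    print(out_of_range(BDE157_6OH))  # RDF035m exceeds its upper bound
```

Only the first of the four domain methods is covered here; the Euclidean distance, city-block distance, and probability density checks need the training-set data, which this text does not include.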
Embodiment 2
The method of the invention is used to predict the n-octanol/air partition coefficient of a polychlorinated naphthalene, 1,4,6,7-tetrachloronaphthalene, at 283.15 K, 293.15 K, 298.15 K, 303.15 K, 313.15 K, and 323.15 K, and the results are compared with experimental values. The QSAR-T model is used to predict the K_OA of the compound at the different temperatures. The prediction steps are as follows:
First, the structure of 1,4,6,7-tetrachloronaphthalene is drawn with chemical drawing software and saved as a .mol file, and MOPAC2000 is used for a preliminary structure optimization. Dragon is used to calculate X1sol, Mor13v, H-050, R5v, T(O..Cl), HATS5v, RDF035m, RCI, n_COOR, Mor15u, and RDF090m. The application domain check is then applied to these 11 descriptors, confirming that the target compound lies within the application domain of the QSAR-T model. The calculated descriptor values are then divided by the corresponding temperature to give the values of the 11 descriptors at each temperature, as shown in the table below:
Table 1. Temperature-corrected Dragon descriptor values in the QSAR-T model
Substituting the temperature-corrected descriptor values into the linear equation of the QSAR-T model gives predicted log K_OA values at the different temperatures (from low to high) of 8.012, 8.597, 8.201, 7.83, 7.484, and 7.158. The corresponding measured values are 8.13, 8.85, 8.42, 7.87, 7.43, and 7.12. The absolute residuals lie between 0.038 and 0.253, so the predicted and experimental values are very close. A linear fit of predicted against experimental values gives a correlation coefficient R² = 0.996, a good correlation that shows the QSAR-T model is reliable.
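The comparison in this embodiment can be reproduced from the listed values with a short Python sketch. The numbers are the predicted and measured log K_OA values given above; the helper name `pearson_r2` is illustrative, and treating R² as the squared Pearson correlation of the linear fit is an assumption.

```python
predicted = [8.012, 8.597, 8.201, 7.83, 7.484, 7.158]
measured = [8.13, 8.85, 8.42, 7.87, 7.43, 7.12]


def pearson_r2(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Absolute residual at each temperature, lowest temperature first.
residuals = [abs(m - p) for m, p in zip(measured, predicted)]

if __name__ == "__main__":
    print([round(r, 3) for r in residuals])
    print(round(pearson_r2(predicted, measured), 3))
```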
Embodiment 3
Consider an arbitrarily chosen diphenyl ether compound, the hydroxylated polybrominated diphenyl ether 6-OH-BDE-157. Its 11 Dragon descriptors are calculated in the same way: X1sol = 12.77, Mor13v = -0.921, H-050 = 1, R5v = 0.103, T(O..Cl) = 0, HATS5v = 0.245, RDF035m = 84.685, RCI = 1.41, n_COOR = 0, Mor15u = 1.33, RDF090m = 14.19. The application domain is checked. ① Descriptor range: the descriptor ranges determined by the training-set compounds are X1sol ~ (1, 13.35), Mor13v ~ (-1.538, 0.938), H-050 ~ (0, 1, 2), R5v ~ (0, 0.319), T(O..Cl) ~ (0, 100), HATS5v ~ (0, 1.115), RDF035m ~ (0, 59.03), RCI ~ (0, 1.54), n_COOR ~ (0 or 1), Mor15u ~ (-1.25, 1.75), RDF090m ~ (0, 53.63). Of the 11 descriptors, RDF035m falls outside its range, so the compound is out of domain. ② Euclidean distance: the calculated value is 0.829, within (0.33-1.48). ③ City-block distance: the calculated value is 2.09, within (0.68-4.29). ④ Probability density: the calculated value is 4.36, within the range of the training-set probability density distribution (> 0.01). Because one method places the compound outside the domain, the compound is outside the application domain of the QSAR model, and the SM8AD model can be used instead. The molecular structure is saved as an .inp file and the control parameters for the calculation are added; the coordinate file has the following format:
$END
The calculated solvation free energy is 11.8 kcal/mol, which the relation log K_OA = -ΔG_OA/(2.303RT) converts to log K_OA = 10.68 (room temperature, 25 °C). The reported measured log K_OA is 10.95; the difference is 0.27 log units, showing that the SM8AD solvation model gives a fairly accurate result.
Embodiment 4
The method of the invention is used to predict the KOA value of dimethylformamide at room temperature (25 °C). First, its Dragon descriptors are calculated: X1sol=2.27, Mor13v=-0.01, H-050=0, R5v=0, T(O..Cl)=0, HATS5v=0, RDF035m=1.14, RCI=0, nCOOR=0, Mor15u=-0.11, RDF090m=0. The application domain is then assessed: 1. descriptor range: the descriptor ranges determined by the training-set compounds are X1sol ~ (1, 13.35), Mor13v ~ (-1.538, 0.938), H-050 ~ (0, 1, 2), R5v ~ (0, 0.319), T(O..Cl) ~ (0, 100), HATS5v ~ (0, 1.115), RDF035m ~ (0, 59.03), RCI ~ (0, 1.54), nCOOR ~ (0 or 1), Mor15u ~ (-1.25, 1.75), RDF090m ~ (0, 53.63). All 11 descriptors lie within their respective ranges; 2. Euclidean distance: the calculated Euclidean distance is 0.81, which lies within (0.33, 1.48), so the compound is inside the domain; 3. city block distance: the calculated city block distance is 1.37, which lies within (0.68, 4.29); 4. probability density distribution: the calculated probability density lies within the training-set probability density range (> 0.01). All four methods place the compound inside the application domain of the QSAR model. Substituting the descriptors into the QSAR linear equation gives logKOA = 2.73. The residual relative to the experimental value logKOA = 4.38 is large (1.65) and falls outside the training-set ±3 RMSE interval (-1.57, 1.57), so the prediction is an outlier. The SM8AD solvation model is therefore tried; it gives logKOA = 3.54, which is more credible.
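The outlier test applied in this embodiment compares the residual with the ±3 RMSE interval of the training set. A minimal Python sketch follows; the function names are hypothetical, and the threshold of 1.57 is simply the 3 RMSE bound quoted above.

```python
"""Sketch: flag a QSAR prediction as an outlier using the +/-3 RMSE rule."""

# Half-width of the training-set +/-3 RMSE interval quoted in the text.
THREE_RMSE = 1.57


def residual(observed: float, predicted: float) -> float:
    """Residual defined as observed minus predicted log KOA."""
    return observed - predicted


def is_prediction_outlier(observed: float, predicted: float,
                          three_rmse: float = THREE_RMSE) -> bool:
    """Return True when the residual lies outside (-3 RMSE, +3 RMSE)."""
    return abs(residual(observed, predicted)) > three_rmse


if __name__ == "__main__":
    # Dimethylformamide: experimental 4.38, QSAR prediction 2.73.
    print(is_prediction_outlier(observed=4.38, predicted=2.73))
```

When the check returns True for a compound that is otherwise inside the domain, the embodiment falls back to the SM8AD calculation.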
Embodiment 5
The method of the invention is used to predict the KOA value of the organochlorine pesticide DDT (p,p'-DDT) at room temperature (25 °C). First, its Dragon descriptors are calculated: X1sol=10.20, Mor13v=-0.31, H-050=0, R5v=0.151, T(O..Cl)=0, HATS5v=0.381, RDF035m=3.993, RCI=1.4, nCOOR=0, Mor15u=0.157, RDF090m=0. The application domain is then assessed: 1. descriptor range: the descriptor ranges determined by the training-set compounds are X1sol ~ (1, 13.35), Mor13v ~ (-1.538, 0.938), H-050 ~ (0, 1, 2), R5v ~ (0, 0.319), T(O..Cl) ~ (0, 100), HATS5v ~ (0, 1.115), RDF035m ~ (0, 59.03), RCI ~ (0, 1.54), nCOOR ~ (0 or 1), Mor15u ~ (-1.25, 1.75), RDF090m ~ (0, 53.63). All 11 descriptors lie within their respective ranges; 2. Euclidean distance: the calculated Euclidean distance is 0.48, which lies within (0.33, 1.48), so the compound is inside the domain; 3. city block distance: the calculated city block distance is 0.94, which lies within (0.68, 4.29); 4. probability density distribution: the calculated probability density lies within the training-set probability density range (> 0.01). All four methods place the compound inside the application domain of the QSAR model. Substituting the descriptors into the QSAR linear equation gives logKOA = 10.28, while the experimental logKOA of p,p'-DDT is 9.82. This shows that the QSAR model gives reliable predictions and is applicable to KOA prediction for different classes of compounds, including organic pesticides.
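The four application-domain checks used in the embodiments can be expressed compactly in Python. The sketch below is illustrative only. It takes the Euclidean distance, city block distance and probability density as precomputed inputs, because the text does not state how they are scaled or referenced. The negative lower bounds for Mor13v and Mor15u are an assumption inferred from the negative descriptor values that the embodiments report as in range, and the probability density passed in the example is a placeholder, since no numerical value is given for DDT.

```python
"""Sketch: four-way application-domain check for the QSAR model."""
from typing import Dict, List, Tuple

# Training-set descriptor ranges as listed in the embodiments.
DESCRIPTOR_RANGES: Dict[str, Tuple[float, float]] = {
    "X1sol": (1.0, 13.35),
    "Mor13v": (-1.538, 0.938),   # negative lower bound is an assumption
    "H-050": (0.0, 2.0),
    "R5v": (0.0, 0.319),
    "T(O..Cl)": (0.0, 100.0),
    "HATS5v": (0.0, 1.115),
    "RDF035m": (0.0, 59.03),
    "RCI": (0.0, 1.54),
    "nCOOR": (0.0, 1.0),
    "Mor15u": (-1.25, 1.75),     # negative lower bound is an assumption
    "RDF090m": (0.0, 53.63),
}
EUCLIDEAN_RANGE = (0.33, 1.48)
CITY_BLOCK_RANGE = (0.68, 4.29)
MIN_PROBABILITY_DENSITY = 0.01


def descriptors_out_of_range(descriptors: Dict[str, float]) -> List[str]:
    """Names of descriptors that fall outside the training-set range."""
    return [name for name, (low, high) in DESCRIPTOR_RANGES.items()
            if not low <= descriptors[name] <= high]


def in_application_domain(descriptors: Dict[str, float], euclidean: float,
                          city_block: float, probability_density: float) -> bool:
    """A compound is in the domain only if all four checks pass."""
    return (not descriptors_out_of_range(descriptors)
            and EUCLIDEAN_RANGE[0] <= euclidean <= EUCLIDEAN_RANGE[1]
            and CITY_BLOCK_RANGE[0] <= city_block <= CITY_BLOCK_RANGE[1]
            and probability_density > MIN_PROBABILITY_DENSITY)


# Descriptor values of p,p'-DDT from Embodiment 5.
DDT = {"X1sol": 10.20, "Mor13v": -0.31, "H-050": 0.0, "R5v": 0.151,
       "T(O..Cl)": 0.0, "HATS5v": 0.381, "RDF035m": 3.993, "RCI": 1.4,
       "nCOOR": 0.0, "Mor15u": 0.157, "RDF090m": 0.0}

if __name__ == "__main__":
    # 0.05 is a placeholder density; the text only says it exceeds 0.01.
    print(in_application_domain(DDT, euclidean=0.48, city_block=0.94,
                                probability_density=0.05))
```

Requiring every check to pass mirrors Embodiment 3, where a single out-of-range descriptor was enough to place the compound outside the domain.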
Claims (2)
1. A method for predicting the n-octanol/air partition coefficient KOA at different temperatures by means of a quantitative structure-activity relationship and a solvation model, characterized by comprising the following steps:
(1) Data collection and division: measured n-octanol/air partition coefficient KOA values are collected from the literature, giving 936 KOA data points for 380 organic compounds at different temperatures; 264 of the compounds (654 KOA values) are randomly assigned to the training set, and the remainder form the validation set;
(2) QSAR and QSAR-T model construction: multiple linear regression and partial least squares are used to build a quantitative structure-activity relationship (QSAR) model relating logKOA at 25 °C to the Dragon descriptors of the training-set compounds, with the expression:
$$\log K_{OA} = 0.509 + 0.986\,\mathrm{X1sol} - 1.018\,\mathrm{Mor13v} + 1.384\,\mathrm{H\text{-}050} - 1.528\,\mathrm{R5v} - 0.015\,\mathrm{T(O..Cl)} + 0.043\,\mathrm{HATS5v} - 0.026\,\mathrm{RDF035m} - 0.197\,\mathrm{RCI} - 0.130\,n_{COOR} - 0.077\,\mathrm{Mor15u} - 0.077\,\mathrm{RDF090m}$$
where X1sol is the solvation connectivity index chi-1; Mor13v and Mor15u are 3D-MoRSE descriptors; H-050 is the atom-centred fragment for H attached to a heteroatom; R5v and HATS5v are GETAWAY descriptors; RDF035m and RDF090m are radial distribution function descriptors; RCI is the aromaticity index descriptor; T(O..Cl) is the sum of topological distances between O and Cl atoms; nCOOR is the number of ether bonds in the molecule. After a temperature correction is applied to the Dragon descriptors of this QSAR model, a temperature-dependent QSAR-T model is built relating logKOA over the range -10 to 50 °C to the corrected Dragon descriptors, with the expression:
$$\log K_{OA} = -3.03 + 3.13\times10^{2}\,\mathrm{X1sol}/T - 8.57\times10\,\mathrm{Mor13v}/T + 4.32\times10^{2}\,\mathrm{H\text{-}050}/T - 1.27\times10^{3}\,\mathrm{R5v}/T - 5.54\,\mathrm{T(O..Cl)}/T + 1.25\times10^{2}\,\mathrm{HATS5v}/T - 1.33\times10\,\mathrm{RDF035m}/T - 6.11\times10\,\mathrm{RCI}/T - 3.76\times10\,n_{COOR}/T + 1.56\times10^{2}\,\mathrm{Mor15u}/T - 5.49\,\mathrm{RDF090m}/T + 1.04\times10^{3}/T$$
For the QSAR and QSAR-T models: adjusted coefficient of determination $R^2_{adj}$ 0.9, cumulative cross-validation coefficient $Q^2_{CUM}$ 0.9;
(3) Validation and application-domain characterization of the QSAR and QSAR-T models: the performance of the QSAR and QSAR-T models is expressed by the squared external prediction correlation coefficient $Q^2_{EXT}$ and the root-mean-square error RMSE; the two models share the same compound application domain, which is characterized simultaneously by the following four methods: the descriptor range method, the Euclidean distance method, the city block distance method and the probability density distribution method;
(4) Solvation model: the open-source solvation model SM8AD, at the HF ab initio level, is used to calculate the solvation free energy $\Delta G_{OA}$ of the organic compound molecule in n-octanol at 25 °C, and logKOA is then calculated from the thermodynamic relation $\log K_{OA} = -\Delta G_{OA}/(2.303RT)$;
(5) KOA prediction for an unknown compound: the Dragon descriptors of the unknown compound are calculated, and it is determined whether they lie within the application domain of the QSAR and QSAR-T models; if so, the QSAR model is used to predict the KOA value at 25 °C, and the QSAR-T model is used if KOA at other temperatures is required; if the compound lies outside the domain, the solvation model SM8AD is used to calculate KOA.
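The decision logic of step (5) can be sketched as a small dispatcher in Python. This is a hypothetical illustration: the function name, the callables passed in, and the returned (model label, value) tuple are not part of the claim, and the SM8AD calculation is represented by an opaque callable because it is run in an external quantum chemistry program.

```python
"""Sketch: choose between QSAR, QSAR-T and SM8AD for one compound."""
from typing import Callable, Mapping, Tuple

Descriptors = Mapping[str, float]


def predict_log_koa(
    descriptors: Descriptors,
    temperature_c: float,
    in_domain: Callable[[Descriptors], bool],
    qsar: Callable[[Descriptors], float],
    qsar_t: Callable[[Descriptors, float], float],
    sm8ad: Callable[[], float],
) -> Tuple[str, float]:
    """Return (model used, predicted log KOA) following step (5)."""
    if in_domain(descriptors):
        if abs(temperature_c - 25.0) < 1e-9:
            # Inside the domain at 25 C: use the QSAR model.
            return "QSAR", qsar(descriptors)
        # Inside the domain at another temperature: use QSAR-T (kelvin).
        return "QSAR-T", qsar_t(descriptors, temperature_c + 273.15)
    # Outside the domain: fall back to the solvation model.
    return "SM8AD", sm8ad()


if __name__ == "__main__":
    # Stub callables stand in for the real models.
    print(predict_log_koa({}, 25.0, lambda d: False,
                          lambda d: 0.0, lambda d, t: 0.0, lambda: 3.54))
```

Passing the models in as callables keeps the selection rule separate from how each model is evaluated.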
2. The method according to claim 1, characterized in that the compounds include alkanes, alcohols, ethers, ketones, carboxylic acids and their substituted derivatives, benzene, biphenyl, phenol, polycyclic aromatic hydrocarbons and their substituted compounds, and organic pesticides.
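The two regression expressions in step (2) of claim 1 can be evaluated directly once the eleven Dragon descriptors are known. The Python sketch below is an illustration, not the patentees' implementation: the dictionary keys and function names are hypothetical, the coefficients are transcribed from the claim, and T in the QSAR-T expression is assumed to be the absolute temperature in kelvin.

```python
"""Sketch: evaluate the QSAR (25 C) and QSAR-T expressions of claim 1."""
from typing import Dict

QSAR_INTERCEPT = 0.509
QSAR_COEFFS: Dict[str, float] = {
    "X1sol": 0.986, "Mor13v": -1.018, "H-050": 1.384, "R5v": -1.528,
    "T(O..Cl)": -0.015, "HATS5v": 0.043, "RDF035m": -0.026, "RCI": -0.197,
    "nCOOR": -0.130, "Mor15u": -0.077, "RDF090m": -0.077,
}

QSAR_T_INTERCEPT = -3.03
QSAR_T_COEFFS: Dict[str, float] = {
    "X1sol": 3.13e2, "Mor13v": -8.57e1, "H-050": 4.32e2, "R5v": -1.27e3,
    "T(O..Cl)": -5.54, "HATS5v": 1.25e2, "RDF035m": -1.33e1, "RCI": -6.11e1,
    "nCOOR": -3.76e1, "Mor15u": 1.56e2, "RDF090m": -5.49,
}
QSAR_T_INVERSE_T_TERM = 1.04e3  # the descriptor-free 1.04e3 / T term


def qsar_log_koa(d: Dict[str, float]) -> float:
    """log KOA at 25 C from the linear QSAR expression."""
    return QSAR_INTERCEPT + sum(c * d[name] for name, c in QSAR_COEFFS.items())


def qsar_t_log_koa(d: Dict[str, float], temperature_k: float) -> float:
    """log KOA at temperature T (kelvin, assumed) from the QSAR-T expression."""
    if temperature_k <= 0:
        raise ValueError("temperature must be a positive value in kelvin")
    weighted = sum(c * d[name] for name, c in QSAR_T_COEFFS.items())
    return QSAR_T_INTERCEPT + (weighted + QSAR_T_INVERSE_T_TERM) / temperature_k


# Descriptor values taken from Embodiments 4 (DMF) and 5 (p,p'-DDT).
DMF = {"X1sol": 2.27, "Mor13v": -0.01, "H-050": 0.0, "R5v": 0.0,
       "T(O..Cl)": 0.0, "HATS5v": 0.0, "RDF035m": 1.14, "RCI": 0.0,
       "nCOOR": 0.0, "Mor15u": -0.11, "RDF090m": 0.0}
DDT = {"X1sol": 10.20, "Mor13v": -0.31, "H-050": 0.0, "R5v": 0.151,
       "T(O..Cl)": 0.0, "HATS5v": 0.381, "RDF035m": 3.993, "RCI": 1.4,
       "nCOOR": 0.0, "Mor15u": 0.157, "RDF090m": 0.0}

if __name__ == "__main__":
    print(f"QSAR   DMF: {qsar_log_koa(DMF):.2f}  DDT: {qsar_log_koa(DDT):.2f}")
    print(f"QSAR-T DDT at 298.15 K: {qsar_t_log_koa(DDT, 298.15):.2f}")
```

With the descriptor values from Embodiments 4 and 5, the QSAR expression evaluates to within about 0.01 of the logKOA values reported there (2.73 and 10.28).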
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012105059356A CN102999705A (en) | 2012-11-30 | 2012-11-30 | Method for predicting n-octyl alcohol air distribution coefficient (KOA) at different temperatures through quantitative structure-activity relationship and solvent model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102999705A true CN102999705A (en) | 2013-03-27 |
Family
ID=47928264
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103488901A (en) * | 2013-09-25 | 2014-01-01 | 大连理工大学 | Method for adopting quantitative structure-activity relationship model to predicting soil or sediment adsorption coefficients of organic compound |
CN103714220A (en) * | 2014-01-07 | 2014-04-09 | 中国科学院烟台海岸带研究所 | Method for predicting elimination speed of persistent organic pollutants on coastal zones |
CN105548463A (en) * | 2015-11-26 | 2016-05-04 | 昆明理工大学 | Method for predicating adsorption rate of sulfur-containing compounds in atmosphere |
CN105678069A (en) * | 2016-01-06 | 2016-06-15 | 昆明理工大学 | Method for predicting elimination rate coefficient of gas state sulfur compound on low-temperature hydrolysis condition |
CN105868540A (en) * | 2016-03-25 | 2016-08-17 | 哈尔滨理工大学 | A polycyclic aromatic hydrocarbon property/toxicity prediction method using an intelligent support vector machine |
CN107516016A (en) * | 2017-08-30 | 2017-12-26 | 华南理工大学 | A kind of method by building the silicone oil air distribution coefficient of quantitative structure activity relationship model prediction hydrophobic compound |
CN110110934A (en) * | 2019-05-10 | 2019-08-09 | 大连民族大学 | A method of pollutant plant-atmosphere distribution coefficient is predicted based on plant growth difference factor |
CN110534163A (en) * | 2019-08-22 | 2019-12-03 | 大连理工大学 | Using the method for the Octanol/water Partition Coefficients of multi-parameter linear free energy relationship model prediction organic compound |
CN111189869A (en) * | 2018-11-15 | 2020-05-22 | 中国科学院大连化学物理研究所 | Method for measuring key parameters of semi-volatile organic compound release in building decoration material |
CN111613266A (en) * | 2020-05-20 | 2020-09-01 | 中南大学 | Outlier detection method based on quantitative structure-activity relationship |
CN113591394A (en) * | 2021-08-11 | 2021-11-02 | 清华大学 | Method for predicting organic compound n-hexadecane/air distribution coefficient |
CN113705008A (en) * | 2021-08-31 | 2021-11-26 | 扬州大学 | Prediction model, modeling method and prediction method for distribution coefficient of POPs between XAD films and air |
CN116312854A (en) * | 2023-03-06 | 2023-06-23 | 杭州以勒标准技术有限公司 | Method for predicting n-octanol water distribution coefficient of sulfamethoxazole substances |
CN116246717B (en) * | 2021-12-08 | 2024-06-28 | 中国科学院大连化学物理研究所 | Additive screening method for improving solubility of ferrocyanide in water |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101673321A (en) * | 2009-10-17 | 2010-03-17 | 大连理工大学 | Method for fast predicting organic pollutant n-caprylic alcohol/air distribution coefficient based on molecular structure |
CN102507630A (en) * | 2011-11-30 | 2012-06-20 | 大连理工大学 | Method for forecasting oxidation reaction rate constant of chemical substance and ozone based on molecular structure and environmental temperature |
Non-Patent Citations (1)
Title |
---|
李雪花: "有毒有机污染物正辛醇/空气分配系数(KOA)的定量预测方法", 《中国博士学位论文全文数据库 工程科技I辑》, 31 May 2009 (2009-05-31) * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20130327 |