CN102999705A - Method for predicting the n-octanol/air partition coefficient (KOA) at different temperatures using quantitative structure-activity relationship and solvation models - Google Patents



Publication number
CN102999705A
Authority
CN
China
Prior art keywords
qsar
model
compound
descriptor
logk
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012105059356A
Other languages
Chinese (zh)
Inventor
李雪花
傅志强
陈景文
乔显亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN2012105059356A
Publication of CN102999705A
Legal status: Pending


Abstract

The invention discloses a theoretical method for predicting the n-octanol/air partition coefficient (K_OA) of organic chemicals and belongs to the field of testing strategies for ecological risk assessment. The method comprises: establishing a quantitative structure-activity relationship (QSAR) model based on Dragon molecular descriptors of the compounds; and, based on thermodynamic principles, calculating the solvation free energy with an open-source solvation model and converting it to K_OA through the thermodynamic relation log K_OA = -ΔG_OA/(2.303RT). On this basis a general strategy for predicting the K_OA of a compound is provided: the Dragon descriptors are used to judge whether the molecule lies within the application domain; if so, the QSAR model is preferred (the QSAR-T model is used at different temperatures); otherwise, the SM8AD solvation model is used. With this method and strategy, the K_OA of different compounds at different temperatures can be predicted rapidly and effectively, saving considerable manpower, materials, and funds and providing essential basic data for large-scale ecological risk assessment and environmental supervision of chemicals.

Description

Method for predicting the n-octanol/air partition coefficient K_OA at different temperatures using quantitative structure-activity relationship and solvation models
Technical field
The invention belongs to the field of testing strategies for ecological risk assessment of chemicals. Specifically, the Dragon descriptors of an organic compound molecule are used to judge whether the compound lies within the application domain of the constructed n-octanol/air partition coefficient (K_OA) models: the molecular-descriptor-based quantitative structure-activity relationship (QSAR) model and the temperature-dependent QSAR-T model. For compounds within the domain, K_OA at a single temperature (25 °C) is preferentially calculated with the QSAR model, and K_OA values at different temperatures (10-50 °C) are calculated with the QSAR-T model. For compounds outside the domain, the solvation model SM8AD is used to predict K_OA.
Background technology
The n-octanol/air partition coefficient (K_OA) is defined as the ratio (dimensionless) of the concentrations of an organic compound in the n-octanol phase and the air phase at partition equilibrium at a given temperature; in practice it is usually expressed in logarithmic form (log K_OA). Because n-octanol is a long-chain fatty alcohol with lipid-like properties, K_OA is commonly used to describe the partitioning of pollutants between air and environmental organic phases, and it is an important environmental-behavior parameter for assessing the long-range transport potential and bioconcentration of compounds in the environment. The thermodynamic definition of K_OA is:
log K_OA = -ΔG_OA/(2.303RT) (1)
In formula (1), ΔG_OA is the change in Gibbs free energy when the compound is transferred from the air phase to the octanol phase, also called the solvation free energy; R is the ideal gas constant and T is the ambient temperature. Compounds with larger K_OA values partition more readily into environmental organic phases (including soil organic matter, the organic fraction of airborne particles, and the epidermal cuticle of plants and animals). In addition, K_OA is strongly temperature dependent: the lower the temperature, the larger the K_OA value.
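As an illustration of formula (1), the following Python sketch converts a solvation free energy into log K_OA. It assumes that ΔG_OA is given in kcal/mol and T in kelvin; the function name and the example value of -10.0 kcal/mol are hypothetical and are not taken from the patent.

```python
# Sketch of formula (1): log K_OA = -dG_OA / (2.303 * R * T).
# Assumptions (not from the source): dG_OA in kcal/mol, T in kelvin.

R_KCAL = 1.987204e-3  # ideal gas constant in kcal/(mol*K)


def log_koa_from_delta_g(delta_g_kcal: float, temperature_k: float) -> float:
    """Convert an air-to-octanol transfer free energy into log K_OA."""
    if temperature_k <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    return -delta_g_kcal / (2.303 * R_KCAL * temperature_k)


if __name__ == "__main__":
    # Hypothetical free energy, evaluated at three temperatures.
    for t in (283.15, 298.15, 323.15):
        print(t, round(log_koa_from_delta_g(-10.0, t), 2))
```

For a fixed negative ΔG_OA the computed log K_OA rises as the temperature falls, in line with the trend described above; in practice ΔG_OA itself also changes with temperature, which this sketch ignores.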
The K_OA of a compound is mainly obtained by experimental measurement: the generator column method, headspace gas chromatography, solid-phase microextraction, the fugacity meter method, and the gas chromatographic retention time method. Previous researchers have used these methods to obtain K_OA values for some typical environmental pollutants, such as polychlorinated biphenyls (PCBs), polycyclic aromatic hydrocarbons (PAHs), polybrominated diphenyl ethers (PBDEs), and dioxins (PCDD/Fs). Each method has its own advantages and disadvantages and suits compounds with different properties. Because chemicals are so numerous, obtaining K_OA experimentally is costly and time-consuming, and for some compounds no reference standards are available, which makes measurement difficult. According to EPA statistics, only a few hundred compounds have experimentally measured K_OA values so far, far short of what chemical risk assessment requires. It is therefore of great significance and value to develop theoretical prediction methods or models that obtain the K_OA of compounds quickly and effectively. Among these, quantitative structure-activity relationship (QSAR) models based on molecular descriptors are a commonly used, simple, and effective approach.
The basic principle of a QSAR model is that the molecular structure of a compound determines its properties. As a computer modeling method, QSAR can uncover and characterize the quantitative and causal relationships between the molecular structure of organic compounds and their physicochemical properties, environmental behavior, and toxicological effects, so that the ecological risk of pollutants can be evaluated at the molecular level. QSAR models have become an important tool for environmental ecological risk assessment of pollutants and for human health risk assessment. To promote the use of QSAR models in chemical risk management, the Organisation for Economic Co-operation and Development (OECD) issued guidance on the construction and use of QSAR models in 2007. According to this guidance, a good QSAR model should meet five criteria: (1) a clearly defined environmental (activity) endpoint; (2) an unambiguous algorithm; (3) a defined application domain; (4) appropriate evaluation of goodness of fit, robustness, and predictive ability; and (5) a mechanistic interpretation, if possible.
Many researchers have already built QSAR models for predicting the K_OA of certain compounds. Some of these are group-contribution models based on molecular fragments. For example, "Li, X. H.; Chen, J. W. et al. The fragment constant method for predicting octanol-air partition coefficients of persistent organic pollutants at different temperatures. Journal of Physical and Chemical Reference Data 2006, 35, (3), 1365-1384." used multiple linear regression (MLR) to build a linear QSAR model of K_OA for typical persistent organic pollutants, with molecular fragment constants as the independent variables. This model is easy to apply, has high predictive ability, and is robust. Others are poly-parameter linear free energy relationship (pp-LFER) models. For example, "Chen, J. W.; Harner, T.; Yang, P.; Quan, X.; Chen, S.; Schramm, K. W.; Kettrup, A. Quantitative predictive models for octanol-air partition coefficients of polybrominated diphenyl ethers at different temperatures. Chemosphere 2003, 51, (7), 577-584." and "Chen, J. W.; Harner, T. et al. Quantitative relationships between molecular structures, environmental temperatures and octanol-air partition coefficients of polychlorinated biphenyls. Computational Biology and Chemistry 2003, 27, (3), 405-421." built QSAR models from molecular structure descriptors using partial least squares (PLS) regression and predicted the K_OA of PBDEs and PCBs, respectively. These models use the cumulative cross-validated coefficient Q²_CUM to characterize robustness and include temperature as a descriptor to account for the temperature dependence of K_OA, achieving good predictions. Measured against the OECD guidance, however, these QSAR models are all local models that usually apply to only one class of compounds. Some also lack a precise characterization of the application domain and lack an effective external validation set for assessing predictive ability. Moreover, because K_OA depends strongly on temperature, the effect of temperature must be considered. It is therefore necessary to build a global QSAR model whose application domain covers different classes of compounds and to include temperature in the model as a variable. In addition, following the OECD QSAR guidance, the application domain must be characterized and a possible mechanistic interpretation provided after the model is built.
In general, QSAR methods for predicting K_OA require little computation, give reliable results, and are applicable to risk management. However, a QSAR model depends strongly on experimental K_OA values (for modeling and validation), and even a good QSAR model can only predict the K_OA of the limited set of compounds within its application domain. These two points are the main limitations on its application. By contrast, predicting K_OA from the solvation free energy ΔG_OA has several advantages: (1) the calculation of ΔG_OA has, in theory, no application-domain restriction and applies to all organic compounds; (2) only the coordinates of the molecule need to be input to obtain ΔG_OA in the octanol phase fairly quickly and accurately, and from it the log K_OA value. In recent years quantum chemical calculation has advanced rapidly, combining accuracy with speed; together with theorists' improved understanding and reasonable modeling of interphase physicochemical processes, this allows solvation free energies calculated with solvation models to reach good accuracy. Furthermore, this ab initio approach avoids the screening of large numbers of molecular descriptors and requires only the basic structural coordinates of the molecule, an essential step toward simplifying the calculation. Researchers abroad have developed many open-source implicit solvation models (models that represent the solvent by parameters) to study the partitioning of compound molecules between media and the dissolution process in a solvent together with its energy changes. Examples include the IEFPCM solvation model commonly used in the well-known quantum chemistry program Gaussian, and the SMx series of solvation models reported by the Truhlar group. In "Marenich, A. V. et al. Universal Solvation Model Based on the Generalized Born Approximation with Asymmetric Descreening. J. Chem. Theory Comput. 2009, 5, (9), 2447-2464.", the authors published the latest SM8AD solvation model for calculating the solvation free energy of molecules. The model is applicable to any neutral molecule or charged ion dissolved in almost any solvent. The parameterization results show a mean unsigned error of about 0.6 kcal/mol over 2560 calculated free energies (the total number of solvation free energies of different neutral solutes in water, acetonitrile, methanol, and dimethyl sulfoxide), so the model is suitable for calculating the solvation free energy of molecules in n-octanol. In general, as long as the principle of the model is sound and the parameterization and approximations are reasonable, the solvation free energy of a target compound can be obtained accurately and its K_OA predicted from formula (1). It is therefore worthwhile, alongside the traditional QSAR approach, to introduce a quantum chemical solvation model that calculates the solvation free energy of a compound in the octanol phase and thereby predicts its K_OA.
Summary of the invention
The invention provides a method for predicting the n-octanol/air partition coefficient K_OA of organic compounds at different temperatures using quantitative structure-activity relationship and solvation models. Existing QSAR methods for predicting K_OA often fail to fully meet the OECD guidance: the application domains of the models are narrow (local) and their predictive ability is not effectively characterized, which limits their use in chemical risk management. How to build a QSAR method that has a wide domain, predicts the K_OA of chemicals accurately, meets the OECD guidance, and accounts for the temperature dependence of K_OA is a problem that urgently needs solving. In addition, the QSAR workflow is relatively complex and depends heavily on experimental data, so it is also necessary to develop, beyond QSAR, a fast and effective quantum chemical method for calculating K_OA. To address these technical problems, the invention provides a method that combines a QSAR model meeting the OECD guidance with a solvation free energy model to predict the n-octanol/air partition coefficient of organic chemicals. The K_OA values of different classes of compounds at different temperatures can thus be predicted fairly accurately, providing the basic data needed for risk assessment and environmental supervision of the very large number of organic chemicals.
The molecular structure of a compound determines its physicochemical properties and environmental behavior. Therefore, by calculating Dragon descriptors that characterize the molecule, building a QSAR model, and characterizing the model's application domain, the n-octanol/air partition coefficient, an environmental-behavior parameter, can be predicted for compounds within the domain. For different temperatures, the QSAR-T model is preferred. For compounds outside the QSAR application domain, the solvation free energy in the n-octanol phase at 25 °C can be calculated with a quantum chemical solvation model and used to predict K_OA.
The technical solution adopted by the invention comprises the following steps:
(1) Construction of the QSAR model based on Dragon descriptors
To ensure the accuracy of the data underlying the model and method, measured n-octanol/air partition coefficients were collected from the literature, giving 936 K_OA values for 380 organic compounds at different temperatures. Of these, 264 compounds (654 K_OA values) were randomly assigned to the training set, and the remainder formed the validation set (used for external validation of the QSAR model).
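The random division described above can be sketched as follows. The split is made by compound, so that all temperature points of one compound fall on the same side. The record layout `(compound_id, temperature_K, log_koa)`, the function name, and the toy data are assumptions for illustration only.

```python
import random

# Sketch: split K_OA records into training and validation sets by compound.
# Each record is assumed to be (compound_id, temperature_K, log_koa).


def split_by_compound(records, n_train_compounds, seed=0):
    """Randomly pick n_train_compounds compounds for training; the rest validate."""
    compound_ids = sorted({rec[0] for rec in records})
    if n_train_compounds > len(compound_ids):
        raise ValueError("more training compounds requested than available")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train_ids = set(rng.sample(compound_ids, n_train_compounds))
    train = [rec for rec in records if rec[0] in train_ids]
    validation = [rec for rec in records if rec[0] not in train_ids]
    return train, validation


if __name__ == "__main__":
    # Toy data, not from the patent.
    toy = [("A", 283.15, 9.1), ("A", 298.15, 8.4), ("B", 298.15, 5.2),
           ("C", 298.15, 7.7), ("C", 313.15, 7.0)]
    train, validation = split_by_compound(toy, n_train_compounds=2)
    print(len(train), len(validation))
```

With the patent's data set, the corresponding call would select 264 of the 380 compounds, leaving 116 compounds (282 values) for external validation.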
Dragon software was used to calculate all Dragon descriptors of the 264 training-set compounds (molecules saved in .mol format). The model was built using multiple linear regression (MLR) and partial least squares (PLS).
(a) First, stepwise multiple linear regression in SPSS was used to screen for the descriptors that significantly affect the K_OA value and to obtain the optimal MLR equation. Following the OECD guidance (OECD, 2007. Guidance Document on the Validation of (Quantitative) Structure-Activity Relationships [(Q)SAR] Models. Organisation for Economic Co-operation and Development, Paris, France), the optimal MLR equation is the one with the largest adjusted coefficient of determination (R²_adj), with a variance inflation factor (VIF) < 10 for every variable (descriptor) and a significance level of p = 0.001 for the equation. R²_adj is defined as follows:
$$R^2_{adj} = 1 - \frac{n-1}{n-p-1}\cdot\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2} \qquad (2)$$
where ŷ_i and y_i are the predicted and experimental log K_OA values of the i-th compound, ȳ is the mean of the measured log K_OA values of the training-set compounds, n is the total number of training-set compounds (264), and p is the number of descriptors. The optimal MLR model is then optimized further in the next step.
(b) The PLS algorithm in SIMCA was used to remove redundant variables from the MLR model. The PLS settings were: 7 cross-validation rounds, a maximum of 200 iterations, a permitted missing-data fraction of 50%, and a significance limit of 0.05 for the PLS model. In each PLS step, the descriptor with the lowest weight (VIP index) was removed and the model was refitted. Finally, the PLS model with the largest R²_adj and the largest cumulative cross-validated coefficient Q²_CUM was selected as optimal. Q²_CUM characterizes the robustness of a PLS model: in general, a model with Q²_CUM > 0.5 is considered robust and one with Q²_CUM > 0.9 excellent (Eriksson, L., Jaworska, J. et al. 2003. Methods for reliability and uncertainty assessment and for applicability evaluations of classification- and regression-based QSARs. Environmental Health Perspectives 111, 1361-1375.). Q²_CUM is defined as follows:
$$Q^2_{CUM} = 1.0 - \prod_{a=1}^{A}\left(\frac{PRESS}{SS}\right)_a \qquad (3)$$
$$PRESS = \sum_{i}\sum_{m}\left(Y_{im} - \hat{Y}_{im}\right)^2 \qquad (4)$$
where SS is the residual sum of squares, i.e. for a fitted PLS model, the sum of squared differences between predicted and experimental values; PRESS is the predictive residual sum of squares, i.e. in a PLS validation step, the sum of squared differences between the experimental values and the values predicted by the PLS equation refitted after one variable is removed. A is the number of variables in the PLS equation, Y_im is the experimental value, and Ŷ_im is the predicted value. The accuracy of the model also depends on the number of variables A and the significance limit p.
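The two statistics used for model selection can be sketched in Python as below. The adjusted R² follows the usual textbook form with n data points and p descriptors, and Q²_CUM follows formula (3) as a product of PRESS/SS ratios. The function names are hypothetical, and the PRESS and SS values are assumed to be supplied per term by the PLS software.

```python
from math import prod

# Sketch of the model-selection statistics: adjusted R^2 and Q^2_CUM.


def adjusted_r2(y_obs, y_pred, n_descriptors):
    """Adjusted coefficient of determination for n points and p descriptors."""
    n = len(y_obs)
    y_mean = sum(y_obs) / n
    ss_res = sum((y - yh) ** 2 for y, yh in zip(y_obs, y_pred))
    ss_tot = sum((y - y_mean) ** 2 for y in y_obs)
    return 1.0 - (ss_res / ss_tot) * (n - 1) / (n - n_descriptors - 1)


def q2_cum(press_values, ss_values):
    """Formula (3): 1 - product over a of (PRESS/SS)_a."""
    if len(press_values) != len(ss_values):
        raise ValueError("PRESS and SS lists must have the same length")
    return 1.0 - prod(p / s for p, s in zip(press_values, ss_values))


if __name__ == "__main__":
    # Toy numbers, not from the patent.
    print(adjusted_r2([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.0], 1))
    print(q2_cum([2.0, 1.0], [4.0, 2.0]))
```

Because each PRESS/SS ratio is normally below 1, every additional term can only raise Q²_CUM toward 1, which is why the thresholds of 0.5 and 0.9 are applied to the cumulative value.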
Temperature can be added as a descriptor when building the PLS model (Chen, J. W.; Harner, T. et al. Universal predictive models on octanol-air partition coefficients at different temperatures for persistent organic pollutants. Environ. Toxicol. Chem. 2004, 23, (10), 2309-2317): the other screened descriptors are combined with temperature to form the temperature-dependent K_OA QSAR prediction model (QSAR-T).
(2) Linear equations of the Dragon-descriptor-based QSAR model and the temperature-dependent QSAR-T model
① QSAR: log K_OA = 0.509 + 0.986 × X1sol – 1.018 × Mor13v + 1.384 × H-050 – 1.528 × R5v – 0.015 × T(O..Cl) + 0.043 × HATS5v – 0.026 × RDF035m – 0.197 × RCI – 0.130 × n_COOR – 0.077 × Mor15u – 0.077 × RDF090m
where X1sol is the solvation connectivity index chi-1; Mor13v and Mor15u are 3D-MoRSE descriptors; H-050 is the fragment constant for H attached to a heteroatom; R5v and HATS5v are GETAWAY descriptors; RDF035m and RDF090m are radial distribution function descriptors; RCI is an aromaticity index descriptor; T(O..Cl) is the sum of topological distances between O and Cl atoms; and n_COOR is the number of ester (COOR) groups in the molecule. X1sol has the largest VIP value, indicating that it is the main descriptor determining the K_OA value of a compound.
In the optimal model, log K_OA is expressed as a function of 11 descriptor variables, with n = 264 training-set data points. R²_Y(adj) = 0.966, standard error SE = 0.818, and p < 0.001 indicate a good fit, and Q²_CUM = 0.949 indicates good robustness.
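The linear QSAR equation can be evaluated with a short Python sketch such as the one below. The coefficients are copied from the equation as printed; the dictionary keys, the function name, and the use of `nCOOR` as the key for n_COOR are assumptions for illustration. The example uses the PCB-66 descriptor values given in Embodiment 1.

```python
# Sketch: evaluate the 25 degC QSAR equation with the printed (rounded) coefficients.

QSAR_INTERCEPT = 0.509
QSAR_COEFFS = {
    "X1sol": 0.986, "Mor13v": -1.018, "H-050": 1.384, "R5v": -1.528,
    "T(O..Cl)": -0.015, "HATS5v": 0.043, "RDF035m": -0.026, "RCI": -0.197,
    "nCOOR": -0.130, "Mor15u": -0.077, "RDF090m": -0.077,
}


def predict_log_koa_qsar(descriptors):
    """Return log K_OA at 25 degC from a dict of the 11 Dragon descriptors."""
    missing = set(QSAR_COEFFS) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptors: {sorted(missing)}")
    return QSAR_INTERCEPT + sum(
        coeff * descriptors[name] for name, coeff in QSAR_COEFFS.items()
    )


# Descriptor values for PCB-66 as listed in Embodiment 1.
PCB66 = {
    "X1sol": 8.73, "Mor13v": -0.33, "H-050": 0, "R5v": 0.118, "T(O..Cl)": 0,
    "HATS5v": 0.254, "RDF035m": 0.443, "RCI": 1.48, "nCOOR": 0,
    "Mor15u": 0.155, "RDF090m": 0.334,
}

if __name__ == "__main__":
    print(round(predict_log_koa_qsar(PCB66), 2))
```

With the rounded coefficients printed here, the sum for PCB-66 comes to about 8.94, slightly below the 9.02 reported in Embodiment 1; the gap may reflect rounding of the published coefficients.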
② QSAR-T: log K_OA = –3.03 + 3.13 × 10² × X1sol/T – 8.57 × 10 × Mor13v/T + 4.32 × 10² × H-050/T – 1.27 × 10³ × R5v/T – 5.54 × T(O..Cl)/T + 1.25 × 10² × HATS5v/T – 1.33 × 10 × RDF035m/T – 6.11 × 10 × RCI/T – 3.76 × 10 × n_COOR/T + 1.56 × 10² × Mor15u/T – 5.49 × RDF090m/T + 1.04 × 10³/T
The terms have the same meaning as in the QSAR model, with the effect of temperature T added. The optimal model is based on n = 654 K_OA values at different temperatures. R²_Y(adj) = 0.963, standard error SE = 0.463, and p < 0.001 indicate a good fit, and Q²_CUM = 0.959 indicates good robustness.
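The QSAR-T equation can be sketched in the same way. Each descriptor is divided by the temperature before being weighted. The sketch assumes T is in kelvin (as in the temperatures listed in Embodiment 2); the names are hypothetical, and the PCB-66 descriptor values from Embodiment 1 are reused only as sample input.

```python
# Sketch: evaluate the temperature-dependent QSAR-T equation.
# Assumption: T is the absolute temperature in kelvin.

QSART_INTERCEPT = -3.03
QSART_INVERSE_T_CONSTANT = 1.04e3  # the bare 1.04 x 10^3 / T term
QSART_COEFFS = {
    "X1sol": 3.13e2, "Mor13v": -8.57e1, "H-050": 4.32e2, "R5v": -1.27e3,
    "T(O..Cl)": -5.54, "HATS5v": 1.25e2, "RDF035m": -1.33e1, "RCI": -6.11e1,
    "nCOOR": -3.76e1, "Mor15u": 1.56e2, "RDF090m": -5.49,
}


def predict_log_koa_qsart(descriptors, temperature_k):
    """Return log K_OA at temperature_k from a dict of the 11 Dragon descriptors."""
    if temperature_k <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    weighted = sum(c * descriptors[name] for name, c in QSART_COEFFS.items())
    return QSART_INTERCEPT + (weighted + QSART_INVERSE_T_CONSTANT) / temperature_k


# Sample input: PCB-66 descriptor values from Embodiment 1.
PCB66 = {
    "X1sol": 8.73, "Mor13v": -0.33, "H-050": 0, "R5v": 0.118, "T(O..Cl)": 0,
    "HATS5v": 0.254, "RDF035m": 0.443, "RCI": 1.48, "nCOOR": 0,
    "Mor15u": 0.155, "RDF090m": 0.334,
}

if __name__ == "__main__":
    for t in (283.15, 298.15, 323.15):
        print(t, round(predict_log_koa_qsart(PCB66, t), 2))
```

Because every structural term enters through 1/T, the predicted log K_OA for a given compound is linear in 1/T, which reproduces the stated trend of larger K_OA at lower temperature whenever the combined numerator is positive.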
(3) Validation of the QSAR and QSAR-T models and characterization of the application domain
The predictive ability of the QSAR (and QSAR-T) model must be checked by external validation. The validation set contains n = 282 data points. The result of external validation is expressed by the squared external prediction correlation coefficient Q²_EXT and the root-mean-square error RMSE of the external validation, which are defined respectively as:
[Equation images in the original: definitions of Q²_EXT and RMSE]
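Since the two defining equations are not legible here, the following Python sketch uses commonly cited forms as an assumption: RMSE as the root of the mean squared prediction error, and Q²_EXT as one minus the ratio of the squared prediction errors on the validation set to the squared deviations of the validation values from the training-set mean. These forms may differ from the ones in the original figures, and the function names are hypothetical.

```python
import math

# Sketch of external validation statistics under ASSUMED standard definitions.


def rmse(y_obs, y_pred):
    """Root-mean-square error of the predictions."""
    n = len(y_obs)
    return math.sqrt(sum((y - yh) ** 2 for y, yh in zip(y_obs, y_pred)) / n)


def q2_ext(y_obs, y_pred, y_train_mean):
    """Assumed form: 1 - sum (y - yhat)^2 / sum (y - mean of training y)^2."""
    ss_res = sum((y - yh) ** 2 for y, yh in zip(y_obs, y_pred))
    ss_ref = sum((y - y_train_mean) ** 2 for y in y_obs)
    return 1.0 - ss_res / ss_ref


if __name__ == "__main__":
    # Toy numbers, not from the patent.
    print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
    print(q2_ext([1.0, 2.0, 3.0], [1.0, 2.0, 4.0], y_train_mean=2.0))
```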
The application domain of the QSAR model is characterized with four methods simultaneously: the descriptor range method, Euclidean distance (0.33 to 1.48), city-block distance (0.68 to 4.29), and probability density. These determine whether a compound lies outside the domain: if any one method places the compound outside the domain, the compound is treated as out of domain. In addition, the prediction error limit of ±3 RMSE is used to decide whether a compound is an outlier, and thereby to define the application domain of the model. For the QSAR model, the prediction results for validation-set compounds within the application domain are n = 101 (15 compounds lie outside the domain), Q²_EXT = 0.953, and RMSE = 0.820, showing that the QSAR model has good predictive ability. For the QSAR-T model, n = 282, RMSE = 1.27, and Q²_EXT = 0.870, showing that the model can be used to predict K_OA at different temperatures.
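The "any one method excludes" rule can be sketched as follows. The distance limits are those quoted above and the probability density threshold of 0.01 is the one quoted in the embodiments. How the Euclidean distance, city-block distance, and probability density are computed is not specified in this text, so they are taken as precomputed inputs; the function names and the probability density of 0.5 used for PCB-66 are hypothetical.

```python
# Sketch: a compound is in the application domain only if all four checks pass.

EUCLIDEAN_LIMITS = (0.33, 1.48)
CITY_BLOCK_LIMITS = (0.68, 4.29)
MIN_PROBABILITY_DENSITY = 0.01


def within(value, limits):
    low, high = limits
    return low <= value <= high


def in_application_domain(descriptors, descriptor_ranges,
                          euclidean, city_block, probability_density):
    """Return True only if every one of the four domain checks is satisfied."""
    ranges_ok = all(
        within(descriptors[name], limits)
        for name, limits in descriptor_ranges.items()
    )
    return (ranges_ok
            and within(euclidean, EUCLIDEAN_LIMITS)
            and within(city_block, CITY_BLOCK_LIMITS)
            and probability_density > MIN_PROBABILITY_DENSITY)


# Two of the eleven training-set descriptor ranges listed in the embodiments.
RANGES = {"X1sol": (1.0, 13.35), "RDF035m": (0.0, 59.03)}

if __name__ == "__main__":
    # PCB-66 (Embodiment 1); the density value 0.5 is a placeholder.
    print(in_application_domain({"X1sol": 8.73, "RDF035m": 0.443},
                                RANGES, 0.41, 0.86, 0.5))
    # 6-OH-BDE-157 (Embodiment 3): RDF035m exceeds its range.
    print(in_application_domain({"X1sol": 12.77, "RDF035m": 84.685},
                                RANGES, 0.829, 2.09, 4.36))
```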
(4) Predicting K_OA from the solvation free energy calculated with a solvation model
The open-source ab initio solvation model SM8AD is used to calculate the solvation free energy of the compound molecule in n-octanol at 298.15 K, and the log K_OA value is then calculated from the thermodynamic relation (1). The calculation requires only the coordinate file of the molecule as input.
The SM8AD and QSAR models were compared through linear fits (predicted versus experimental values) on the 264 training-set compounds, using the coefficient of determination of the fit, the slope k (ideally close to 1), and the root-mean-square error. The SM8AD predictions are fairly accurate, though somewhat less so than those of the QSAR model (SM8AD: R² = 0.860, k = 0.80, RMSE = 1.29; QSAR: R² = 0.966, k = 0.97, RMSE = 0.524). Therefore, if a compound lies outside the application domains of the QSAR and QSAR-T models, the SM8AD model is the preferred choice for predicting its K_OA.
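The overall selection strategy can be summarized in a small Python sketch. The function name and return labels are hypothetical, and raising an error for in-domain compounds outside the 10-50 °C range is an assumption of this sketch, since the text does not say how that case is handled.

```python
# Sketch of the model-selection strategy: QSAR / QSAR-T inside the domain,
# SM8AD outside it.


def choose_model(in_domain: bool, temperature_c: float = 25.0) -> str:
    """Return the name of the model to use for a K_OA prediction."""
    if not in_domain:
        # Out-of-domain compounds fall back to the solvation model.
        return "SM8AD"
    if temperature_c == 25.0:
        return "QSAR"
    if 10.0 <= temperature_c <= 50.0:
        return "QSAR-T"
    # Assumption: the sketch refuses temperatures not covered by QSAR-T.
    raise ValueError("temperature outside the 10-50 degC range of QSAR-T")


if __name__ == "__main__":
    print(choose_model(True))         # in domain, 25 degC
    print(choose_model(True, 40.0))   # in domain, other temperature
    print(choose_model(False))        # out of domain
```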
The method provided by the invention has the following features:
1. The QSAR model is built in accordance with the OECD guidance on the construction and use of QSAR models and has good goodness of fit, robustness, and predictive ability.
2. The application domain of the model is wide, covering organic compounds of diverse structures, and the effect of temperature is included, so the model can predict the K_OA of different compounds at different temperatures and provide basic data for global environmental behavior analysis and ecological risk assessment of organic chemicals.
3. The solvation model offers a new reference for K_OA prediction: it can compensate for the shortcomings of QSAR prediction and provide more reliable K_OA data; the ab initio algorithm is easy to use and its application domain is in theory unrestricted, so it can be extended to K_OA prediction for all compounds.
The invention can quickly and effectively predict the n-octanol/air partition coefficients of different organic pollutants at different ambient temperatures. The method is inexpensive, simple, and fast, and can save considerable manpower, materials, and funds. The QSAR model of the invention is built and validated strictly according to the OECD guidance on the construction and use of QSAR models, so its predictions are accurate and reliable; the wide-domain ab initio solvation model SM8AD can also be consulted to obtain K_OA accurately. The invention provides important basic data for chemical supervision and is of significant guiding value for ecological risk assessment.
Description of drawings
Fig. 1 is a schematic diagram of the QSAR-model and solvation-model methods for predicting the log K_OA of a compound.
Fig. 2 is a plot of measured log K_OA values against QSAR-predicted values for the training set.
Fig. 3 is the residual distribution of the QSAR-predicted K_OA values for the training set.
Fig. 4 is a linear fit of QSAR-T-predicted values against measured values for the training set.
Fig. 5 is a linear fit of SM8AD-predicted values against measured values for the training set.
Embodiment
Embodiment 1
The method of the invention is used to predict the n-octanol/air partition coefficient K_OA at 25 °C of a biphenyl compound, the polychlorinated biphenyl PCB-66 (2,3',4,4'-tetrachlorobiphenyl). The prediction proceeds as follows:
(1) First, the structure of PCB-66 is drawn with chemical drawing software and saved as a .mol file, and MOPAC is used for a preliminary geometry optimization. (2) The 11 Dragon descriptors of the QSAR model are calculated with Dragon: X1sol = 8.73, Mor13v = -0.33, H-050 = 0, R5v = 0.118, T(O..Cl) = 0, HATS5v = 0.254, RDF035m = 0.443, RCI = 1.48, n_COOR = 0, Mor15u = 0.155, RDF090m = 0.334. (3) The application domain is checked. ① Descriptor range: the descriptor ranges determined by the training-set compounds are X1sol (1, 13.35), Mor13v (-1.538, 0.938), H-050 (0, 1, 2), R5v (0, 0.319), T(O..Cl) (0, 100), HATS5v (0, 1.115), RDF035m (0, 59.03), RCI (0, 1.54), n_COOR (0 or 1), Mor15u (-1.25, 1.75), and RDF090m (0, 53.63); all 11 descriptors fall within these ranges. ② Euclidean distance: the calculated value is 0.41, within (0.33-1.48), so the compound is in the domain. ③ City-block distance: the calculated value is 0.86, within (0.68-4.29). ④ Probability density: the calculated value lies within the training-set probability density range (> 0.01). All checks are satisfied, so the compound lies within the application domain of the QSAR model and the model can be used for prediction. Substituting the descriptor values into the linear QSAR equation gives:
log K_OA = 0.509 + 0.986 × 8.73 – 1.018 × (–0.33) + 1.384 × 0 – 1.528 × 0.118 – 0.015 × 0 + 0.043 × 0.254 – 0.026 × 0.443 – 0.197 × 1.48 – 0.130 × 0 – 0.077 × 0.155 – 0.077 × 0.334 = 9.02
The predicted log K_OA of 9.02 is very close to the experimental value of 8.97; the prediction error is 0.05 log units, showing that the QSAR prediction is reliable.
Embodiment 2
The method of the invention is used to predict the n-octanol/air partition coefficients of a polychlorinated naphthalene, 1,4,6,7-tetrachloronaphthalene, at 283.15 K, 293.15 K, 298.15 K, 303.15 K, 313.15 K, and 323.15 K, and the results are compared with experimental values. The QSAR-T model is used to predict the K_OA of this compound at the different temperatures. The prediction steps are as follows:
First, the structure of 1,4,6,7-tetrachloronaphthalene is drawn with chemical drawing software and saved as a .mol file, and MOPAC2000 is used for a preliminary geometry optimization. Dragon is used to calculate X1sol, Mor13v, H-050, R5v, T(O..Cl), HATS5v, RDF035m, RCI, n_COOR, Mor15u, and RDF090m. The QSAR application-domain check is then applied to these 11 descriptors, confirming that the target compound lies within the application domain of the QSAR-T model. Each descriptor value is then divided by the corresponding temperature to obtain the values of the 11 descriptors at each temperature, as shown in the table below:
Table 1. Temperature-corrected Dragon descriptor values in the QSAR-T model
[Table 1 is an image in the original and is not reproduced here.]
Substituting the temperature-corrected descriptor values into the linear QSAR-T equation gives predicted log K_OA values at the different temperatures (from low to high) of 8.012, 8.597, 8.201, 7.83, 7.484, and 7.158. The corresponding measured values are 8.13, 8.85, 8.42, 7.87, 7.43, and 7.12. The absolute residuals range from 0.038 to 0.253, so the predicted and experimental values are very close. A linear fit of predicted against experimental values gives R² = 0.996, a good correlation, showing that the QSAR-T prediction model is reliable.
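The residual range and the correlation quoted above can be checked with a few lines of Python. The six predicted and measured values are taken from the text as printed; the function names are hypothetical, and R² is computed here as the squared Pearson correlation, which is an assumption about how the fit statistic was obtained.

```python
# Sketch: recompute the residuals and R^2 for the six temperature points.

predicted = [8.012, 8.597, 8.201, 7.83, 7.484, 7.158]
measured = [8.13, 8.85, 8.42, 7.87, 7.43, 7.12]


def abs_residuals(pred, obs):
    """Absolute differences between measured and predicted values."""
    return [abs(o - p) for p, o in zip(pred, obs)]


def r_squared(x, y):
    """Squared Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    residuals = abs_residuals(predicted, measured)
    print(round(min(residuals), 3), round(max(residuals), 3))
    print(round(r_squared(predicted, measured), 3))
```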
Embodiment 3
Take a diphenyl ether compound, the hydroxylated polybrominated diphenyl ether 6-OH-BDE-157. Its 11 Dragon descriptors are calculated in the same way: X1sol = 12.77, Mor13v = -0.921, H-050 = 1, R5v = 0.103, T(O..Cl) = 0, HATS5v = 0.245, RDF035m = 84.685, RCI = 1.41, n_COOR = 0, Mor15u = 1.33, RDF090m = 14.19. The application domain is checked. ① Descriptor range: the descriptor ranges determined by the training-set compounds are X1sol (1, 13.35), Mor13v (-1.538, 0.938), H-050 (0, 1, 2), R5v (0, 0.319), T(O..Cl) (0, 100), HATS5v (0, 1.115), RDF035m (0, 59.03), RCI (0, 1.54), n_COOR (0 or 1), Mor15u (-1.25, 1.75), and RDF090m (0, 53.63). Of the 11 descriptors, RDF035m lies outside its range, so by this method the compound is outside the domain. ② Euclidean distance: the calculated value is 0.829, within (0.33-1.48). ③ City-block distance: the calculated value is 2.09, within (0.68-4.29). ④ Probability density: the calculated value is 4.36, within the training-set probability density range (> 0.01). Because one method places the compound outside the domain, the compound is outside the application domain of the QSAR model, and the SM8AD model can be used instead. The molecular structure is saved as an .inp file and the control parameters of the model are added; the format of the coordinate file is as follows:
[Figures BDA00002499068010 and BDA00002499068011: listing of the SM8AD input (.inp) file, not reproduced here; the listing ends with $END.]
The calculated solvation free energy is 11.8 kcal/mol, which the relation logKOA = −ΔGOA/(2.303RT) converts to logKOA = 10.68 at 25 °C (room temperature). The measured logKOA found in the literature is 10.95; the two differ by 0.27 log units, which shows that the SM8AD solvation model gives a fairly accurate result.
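The first of the four application-domain checks in this embodiment, the descriptor range test, can be sketched as follows. This is an illustrative sketch only. It assumes that the lower bounds of Mor13v and Mor15u are negative (−1.538 and −1.25), that all bounds are inclusive, and that H-050 and nCOOR take only the listed discrete values. The Euclidean, city-block and probability-density checks need the training-set data, which are not given here, so they are not covered. The function name is hypothetical.

```python
# Training-set descriptor ranges quoted in this embodiment.
# Assumption: the lower bounds of Mor13v and Mor15u are negative,
# and all bounds are treated as inclusive.
RANGES = {
    "X1sol": (1.0, 13.35),
    "Mor13v": (-1.538, 0.938),
    "R5v": (0.0, 0.319),
    "T(O..Cl)": (0.0, 100.0),
    "HATS5v": (0.0, 1.115),
    "RDF035m": (0.0, 59.03),
    "RCI": (0.0, 1.54),
    "Mor15u": (-1.25, 1.75),
    "RDF090m": (0.0, 53.63),
}
# Count-type descriptors with a discrete set of allowed values.
ALLOWED = {"H-050": {0, 1, 2}, "nCOOR": {0, 1}}


def outside_descriptor_range(descriptors):
    """Return the names of descriptors outside the training-set range."""
    outside = []
    for name, value in descriptors.items():
        if name in ALLOWED:
            if value not in ALLOWED[name]:
                outside.append(name)
        else:
            low, high = RANGES[name]
            if not (low <= value <= high):
                outside.append(name)
    return outside


# Descriptor values of 6-OH-BDE-157 from this embodiment.
bde157 = {
    "X1sol": 12.77, "Mor13v": -0.921, "H-050": 1, "R5v": 0.103,
    "T(O..Cl)": 0, "HATS5v": 0.245, "RDF035m": 84.685, "RCI": 1.41,
    "nCOOR": 0, "Mor15u": 1.33, "RDF090m": 14.19,
}
print(outside_descriptor_range(bde157))  # → ['RDF035m']
```

A non-empty result means the compound fails the range test, which in the strategy described here is enough to route it to the SM8AD calculation.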
Embodiment 4
The method of the invention is used to predict the KOA value of dimethylformamide at 25 °C (room temperature). First, its Dragon descriptors are calculated: X1sol = 2.27, Mor13v = −0.01, H-050 = 0, R5v = 0, T(O..Cl) = 0, HATS5v = 0, RDF035m = 1.14, RCI = 0, nCOOR = 0, Mor15u = −0.11, RDF090m = 0. The application domain is then assessed with the four methods. (1) Descriptor distance: the descriptor ranges determined by the training-set compounds are X1sol (1, 13.35), Mor13v (−1.538, 0.938), H-050 (0, 1 or 2), R5v (0, 0.319), T(O..Cl) (0, 100), HATS5v (0, 1.115), RDF035m (0, 59.03), RCI (0, 1.54), nCOOR (0 or 1), Mor15u (−1.25, 1.75), RDF090m (0, 53.63); all 11 descriptors lie within their ranges. (2) Euclidean distance: the calculated value is 0.81, within (0.33, 1.48). (3) City-block distance: the calculated value is 1.37, within (0.68, 4.29). (4) Probability density distribution: the calculated probability density lies within the training-set range (> 0.01). By all four methods the compound is inside the application domain of the QSAR model. Substituting into the QSAR linear relation gives logKOA = 2.73. The residual against the experimental value logKOA = 4.38 is large (1.65) and falls outside the ±3RMSE interval of the training set (−1.57, 1.57), so the compound is a prediction outlier. The SM8AD solvation model is therefore tried; it gives logKOA = 3.54, which is more credible.
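The outlier test used in this embodiment reduces to comparing a residual with the ±3RMSE window of the training set. Below is a minimal sketch using only the numbers quoted above; the function name and the sign convention (experimental minus predicted) are assumptions. The same window is applied to the SM8AD result purely for comparison, since the text does not define an outlier criterion for SM8AD.

```python
def is_prediction_outlier(predicted, experimental, bound=1.57):
    """Flag a prediction whose residual lies outside the +/-3 RMSE window.

    bound is the 3*RMSE half-width quoted for the training set (1.57).
    Returns (is_outlier, residual), with residual = experimental - predicted.
    """
    residual = experimental - predicted
    return abs(residual) > bound, residual


# Dimethylformamide: experimental logKOA = 4.38.
for label, pred in [("QSAR", 2.73), ("SM8AD", 3.54)]:
    outlier, residual = is_prediction_outlier(pred, 4.38)
    print(f"{label}: residual = {residual:.2f}, outlier = {outlier}")
```

With these inputs the QSAR residual (1.65) exceeds the bound while the SM8AD residual (0.84) does not, which matches the conclusion drawn in the text.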
Embodiment 5
The method of the invention is used to predict the KOA value of the organochlorine pesticide DDT (p,p'-DDT) at 25 °C (room temperature). First, its Dragon descriptors are calculated: X1sol = 10.20, Mor13v = −0.31, H-050 = 0, R5v = 0.151, T(O..Cl) = 0, HATS5v = 0.381, RDF035m = 3.993, RCI = 1.4, nCOOR = 0, Mor15u = 0.157, RDF090m = 0. The application domain is then assessed with the four methods. (1) Descriptor distance: the descriptor ranges determined by the training-set compounds are X1sol (1, 13.35), Mor13v (−1.538, 0.938), H-050 (0, 1 or 2), R5v (0, 0.319), T(O..Cl) (0, 100), HATS5v (0, 1.115), RDF035m (0, 59.03), RCI (0, 1.54), nCOOR (0 or 1), Mor15u (−1.25, 1.75), RDF090m (0, 53.63); all 11 descriptors lie within their ranges. (2) Euclidean distance: the calculated value is 0.48, within (0.33, 1.48). (3) City-block distance: the calculated value is 0.94, within (0.68, 4.29). (4) Probability density distribution: the calculated probability density lies within the training-set range (> 0.01). By all four methods the compound is inside the application domain of the QSAR model. Substituting into the QSAR linear relation gives logKOA = 10.28, while the experimental value for p,p'-DDT is logKOA = 9.82. This shows that the QSAR model gives reliable predictions and is applicable to KOA prediction for different classes of compounds, including organic pesticides.
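The QSAR prediction in this embodiment can be reproduced by evaluating the linear relation of claim 1, step (2), with the DDT descriptor values listed above. This is a sketch under the assumption that the coefficients below are read correctly from the claim; the dictionary layout and function name are illustrative.

```python
# Coefficients of the 25 °C QSAR model as read from claim 1, step (2).
INTERCEPT = 0.509
COEFFICIENTS = {
    "X1sol": 0.986,
    "Mor13v": -1.018,
    "H-050": 1.384,
    "R5v": -1.528,
    "T(O..Cl)": -0.015,
    "HATS5v": 0.043,
    "RDF035m": -0.026,
    "RCI": -0.197,
    "nCOOR": -0.130,
    "Mor15u": -0.077,
    "RDF090m": -0.077,
}


def qsar_log_koa(descriptors):
    """Evaluate the linear QSAR relation for one compound at 25 °C."""
    return INTERCEPT + sum(
        coeff * descriptors[name] for name, coeff in COEFFICIENTS.items()
    )


# Dragon descriptor values of p,p'-DDT from this embodiment.
ddt = {
    "X1sol": 10.20, "Mor13v": -0.31, "H-050": 0, "R5v": 0.151,
    "T(O..Cl)": 0, "HATS5v": 0.381, "RDF035m": 3.993, "RCI": 1.4,
    "nCOOR": 0, "Mor15u": 0.157, "RDF090m": 0,
}
print(round(qsar_log_koa(ddt), 2))  # → 10.28
```

The dominant contribution comes from the X1sol term (0.986 × 10.20 ≈ 10.06); the remaining descriptors shift the result by only a few tenths of a log unit.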

Claims (2)

1. A method for predicting the n-octanol/air partition coefficient KOA at different temperatures by a quantitative structure-activity relationship and a solvation model, characterized by comprising the following steps:
(1) Data collection and division: measured n-octanol/air partition coefficient (KOA) values are collected from the literature, giving 936 KOA data points for 380 organic compounds at different temperatures; 264 compounds with 654 KOA values are randomly assigned to the training set, and the rest form the validation set;
(2) QSAR and QSAR-T model construction: multiple linear regression and partial least squares are used to build a quantitative structure-activity relationship (QSAR) model between logKOA at 25 °C and the molecular Dragon descriptors of the training-set compounds; the expression is:
$$\log K_{OA} = 0.509 + 0.986\,\mathrm{X1sol} - 1.018\,\mathrm{Mor13v} + 1.384\,\text{H-050} - 1.528\,\mathrm{R5v} - 0.015\,\mathrm{T(O..Cl)} + 0.043\,\mathrm{HATS5v} - 0.026\,\mathrm{RDF035m} - 0.197\,\mathrm{RCI} - 0.130\,n\mathrm{COOR} - 0.077\,\mathrm{Mor15u} - 0.077\,\mathrm{RDF090m}$$
where X1sol is the solvation connectivity index chi-1; Mor13v and Mor15u are 3D-MoRSE descriptors; H-050 is the atom-centred fragment for H attached to a heteroatom; R5v and HATS5v are GETAWAY descriptors; RDF035m and RDF090m are radial distribution function descriptors; RCI is an aromaticity index descriptor; T(O..Cl) is the sum of topological distances between O and Cl atoms; nCOOR is the number of ester groups in the molecule. After a temperature correction is added to the Dragon descriptors of this QSAR model, a temperature-dependent QSAR-T model is built between logKOA at −10 to 50 °C and the corrected Dragon descriptors; the expression is:
$$\log K_{OA} = -3.03 + 3.13\times10^{2}\,\mathrm{X1sol}/T - 8.57\times10\,\mathrm{Mor13v}/T + 4.32\times10^{2}\,\text{H-050}/T - 1.27\times10^{3}\,\mathrm{R5v}/T - 5.54\,\mathrm{T(O..Cl)}/T + 1.25\times10^{2}\,\mathrm{HATS5v}/T - 1.33\times10\,\mathrm{RDF035m}/T - 6.11\times10\,\mathrm{RCI}/T - 3.76\times10\,n\mathrm{COOR}/T + 1.56\times10^{2}\,\mathrm{Mor15u}/T - 5.49\,\mathrm{RDF090m}/T + 1.04\times10^{3}/T$$
For both the QSAR and QSAR-T models, the adjusted coefficient of determination R²adj > 0.9 and the cumulative cross-validation coefficient Q²CUM > 0.9;
(3) Validation and application-domain characterization of the QSAR and QSAR-T models: the performance of the QSAR and QSAR-T models is expressed by the squared external prediction correlation coefficient Q²EXT and the root-mean-square error RMSE; the QSAR and QSAR-T models share the same application domain, which is characterized by four methods applied together: the descriptor distance range method, the Euclidean distance method, the city-block distance method and the probability density distribution method;
(4) Solvation model: the open-source solvation model SM8AD, based on the HF ab initio method, is used to calculate the solvation free energy ΔGOA of the organic compound molecule in n-octanol at 25 °C, and logKOA is calculated from the thermodynamic relation logKOA = −ΔGOA/(2.303RT);
(5) KOA prediction for an unknown compound: the Dragon descriptors of the unknown compound are calculated and checked against the application domain of the QSAR and QSAR-T models; if the compound is inside the domain, the QSAR model is used to predict KOA at 25 °C, and the QSAR-T model is used if KOA at other temperatures is required; if the compound is outside the domain, the solvation model SM8AD is used to calculate KOA.
2. The method according to claim 1, characterized in that the compounds include alkanes, alcohols, ethers, ketones, carboxylic acids and their substituted derivatives, benzene, biphenyl, phenol, polycyclic aromatic hydrocarbons and their substituted compounds, and organic pesticides.
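Steps (2), (4) and (5) of claim 1 can be sketched in a few lines. This is an illustrative outline, not the patented implementation. It assumes that T in the QSAR-T relation is the absolute temperature in kelvin, that R is taken in kcal/(mol·K), and that the application-domain decision has already been made elsewhere. All function names are hypothetical, the ΔG value in the usage lines is a made-up input rather than a value from the examples, and the DDT descriptors are reused from Embodiment 5 only as sample input.

```python
R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

# QSAR-T coefficients as read from claim 1, step (2); each multiplies descriptor/T.
QSAR_T_INTERCEPT = -3.03
QSAR_T_CONSTANT = 1.04e3  # descriptor-free 1/T term
QSAR_T_COEFFICIENTS = {
    "X1sol": 3.13e2, "Mor13v": -8.57e1, "H-050": 4.32e2, "R5v": -1.27e3,
    "T(O..Cl)": -5.54, "HATS5v": 1.25e2, "RDF035m": -1.33e1, "RCI": -6.11e1,
    "nCOOR": -3.76e1, "Mor15u": 1.56e2, "RDF090m": -5.49,
}


def qsar_t_log_koa(descriptors, temp_k):
    """Temperature-dependent QSAR-T relation (assumes T in kelvin)."""
    total = QSAR_T_CONSTANT + sum(
        coeff * descriptors[name] for name, coeff in QSAR_T_COEFFICIENTS.items()
    )
    return QSAR_T_INTERCEPT + total / temp_k


def log_koa_from_solvation_energy(delta_g_kcal, temp_k=298.15):
    """Step (4): logKOA = -dG_OA / (2.303 R T), with dG in kcal/mol."""
    return -delta_g_kcal / (2.303 * R_KCAL * temp_k)


def choose_model(in_domain, temp_c):
    """Step (5): pick the model from the domain check and the temperature."""
    if not in_domain:
        return "SM8AD"
    return "QSAR" if temp_c == 25 else "QSAR-T"


# Sample input: Dragon descriptors of p,p'-DDT from Embodiment 5.
ddt = {
    "X1sol": 10.20, "Mor13v": -0.31, "H-050": 0, "R5v": 0.151,
    "T(O..Cl)": 0, "HATS5v": 0.381, "RDF035m": 3.993, "RCI": 1.4,
    "nCOOR": 0, "Mor15u": 0.157, "RDF090m": 0,
}
print(choose_model(True, 25), choose_model(True, -10), choose_model(False, 25))
print(round(qsar_t_log_koa(ddt, 298.15), 2))
print(round(log_koa_from_solvation_energy(-10.0), 2))  # hypothetical dG input
```

Keeping the domain decision as a plain boolean input leaves the four application-domain methods of step (3) free to be implemented separately, since they depend on training-set data not reproduced in the claims.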
CN2012105059356A 2012-11-30 2012-11-30 Method for predicting n-octyl alcohol air distribution coefficient (KOA) at different temperatures through quantitative structure-activity relationship and solvent model Pending CN102999705A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012105059356A CN102999705A (en) 2012-11-30 2012-11-30 Method for predicting n-octyl alcohol air distribution coefficient (KOA) at different temperatures through quantitative structure-activity relationship and solvent model

Publications (1)

Publication Number Publication Date
CN102999705A true CN102999705A (en) 2013-03-27

Family

ID=47928264

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012105059356A Pending CN102999705A (en) 2012-11-30 2012-11-30 Method for predicting n-octyl alcohol air distribution coefficient (KOA) at different temperatures through quantitative structure-activity relationship and solvent model

Country Status (1)

Country Link
CN (1) CN102999705A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101673321A (en) * 2009-10-17 2010-03-17 大连理工大学 Method for fast predicting organic pollutant n-caprylic alcohol/air distribution coefficient based on molecular structure
CN102507630A (en) * 2011-11-30 2012-06-20 大连理工大学 Method for forecasting oxidation reaction rate constant of chemical substance and ozone based on molecular structure and environmental temperature

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
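李雪花 (Li Xuehua): "Quantitative prediction methods for the n-octanol/air partition coefficient (KOA) of toxic organic pollutants", China Doctoral Dissertations Full-text Database, Engineering Science and Technology I (《中国博士学位论文全文数据库 工程科技I辑》), 31 May 2009 (2009-05-31) *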

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103488901A (en) * 2013-09-25 2014-01-01 大连理工大学 Method for adopting quantitative structure-activity relationship model to predicting soil or sediment adsorption coefficients of organic compound
CN103488901B (en) * 2013-09-25 2016-06-22 大连理工大学 Adopt the soil of Quantitative structure-activity relationship model prediction organic compound or the method for sediment sorption coefficients
CN103714220B (en) * 2014-01-07 2017-01-11 中国科学院烟台海岸带研究所 Method for predicting elimination speed of persistent organic pollutants on coastal zones
CN103714220A (en) * 2014-01-07 2014-04-09 中国科学院烟台海岸带研究所 Method for predicting elimination speed of persistent organic pollutants on coastal zones
CN105548463A (en) * 2015-11-26 2016-05-04 昆明理工大学 Method for predicating adsorption rate of sulfur-containing compounds in atmosphere
CN105548463B (en) * 2015-11-26 2017-11-10 昆明理工大学 A kind of method of the sulfur-containing compound rate of adsorption in prediction air
CN105678069A (en) * 2016-01-06 2016-06-15 昆明理工大学 Method for predicting elimination rate coefficient of gas state sulfur compound on low-temperature hydrolysis condition
CN105868540A (en) * 2016-03-25 2016-08-17 哈尔滨理工大学 A polycyclic aromatic hydrocarbon property/toxicity prediction method using an intelligent support vector machine
CN105868540B (en) * 2016-03-25 2018-04-13 哈尔滨理工大学 Forecasting Methodology using Intelligent Support vector machine to polycyclic aromatic hydrocarbon property/toxicity
CN107516016A (en) * 2017-08-30 2017-12-26 华南理工大学 A kind of method by building the silicone oil air distribution coefficient of quantitative structure activity relationship model prediction hydrophobic compound
CN107516016B (en) * 2017-08-30 2021-01-19 华南理工大学 Method for predicting silicone oil-air distribution coefficient of hydrophobic compound by structure mode
CN111189869B (en) * 2018-11-15 2022-04-08 中国科学院大连化学物理研究所 Method for determining SVOC release key parameters in building decoration material
CN111189869A (en) * 2018-11-15 2020-05-22 中国科学院大连化学物理研究所 Method for measuring key parameters of semi-volatile organic compound release in building decoration material
CN110110934A (en) * 2019-05-10 2019-08-09 大连民族大学 A method of pollutant plant-atmosphere distribution coefficient is predicted based on plant growth difference factor
CN110110934B (en) * 2019-05-10 2021-03-30 大连民族大学 Method for predicting pollutant plant-atmosphere distribution coefficient based on plant growth difference factor
CN110534163A (en) * 2019-08-22 2019-12-03 大连理工大学 Using the method for the Octanol/water Partition Coefficients of multi-parameter linear free energy relationship model prediction organic compound
CN110534163B (en) * 2019-08-22 2022-09-06 大连理工大学 Method for predicting octanol/water distribution coefficient of organic compound by adopting multi-parameter linear free energy relation model
CN111613266A (en) * 2020-05-20 2020-09-01 中南大学 Outlier detection method based on quantitative structure-activity relationship
CN113591394A (en) * 2021-08-11 2021-11-02 清华大学 Method for predicting organic compound n-hexadecane/air distribution coefficient
CN113591394B (en) * 2021-08-11 2024-02-23 清华大学 Method for predicting n-hexadecane/air distribution coefficient of organic compound
CN113705008A (en) * 2021-08-31 2021-11-26 扬州大学 Prediction model, modeling method and prediction method for distribution coefficient of POPs between XAD films and air
CN116246717B (en) * 2021-12-08 2024-06-28 中国科学院大连化学物理研究所 Additive screening method for improving solubility of ferrocyanide in water
CN116312854A (en) * 2023-03-06 2023-06-23 杭州以勒标准技术有限公司 Method for predicting n-octanol water distribution coefficient of sulfamethoxazole substances

Similar Documents

Publication Publication Date Title
CN102999705A (en) Method for predicting n-octyl alcohol air distribution coefficient (KOA) at different temperatures through quantitative structure-activity relationship and solvent model
Lian et al. CN-China: Revised runoff curve number by using rainfall-runoff events data in China
Uddin et al. A novel approach for estimating and predicting uncertainty in water quality index model using machine learning approaches
Man et al. Forecasting COD load in municipal sewage based on ARMA and VAR algorithms
Sabouri et al. Impervious surfaces and sewer pipe effects on stormwater runoff temperature
Tzuc et al. Modeling of hygrothermal behavior for green facade's concrete wall exposed to nordic climate using artificial intelligence and global sensitivity analysis
Song et al. Parameter identification and global sensitivity analysis of Xin'anjiang model using meta-modeling approach
Roshni et al. Development and evaluation of hybrid artificial neural network architectures for modeling spatio-temporal groundwater fluctuations in a complex aquifer system
Zhou et al. Impacts of building configurations on urban stormwater management at a block scale using XGBoost
CN103488901B (en) Adopt the soil of Quantitative structure-activity relationship model prediction organic compound or the method for sediment sorption coefficients
Alamdari et al. Evaluating the impact of climate change on water quality and quantity in an urban watershed using an ensemble approach
Sabouri et al. Event-based stormwater management pond runoff temperature model
Donatelli et al. A generic framework for evaluating hybrid models by reuse and composition–a case study on soil temperature simulation
Gu et al. Achieving the objective of ecological planning for arid inland river basin under uncertainty based on ecological risk assessment
CN104573863B (en) Predict organic compound and the method for hydroxyl radical reaction speed constant in aqueous phase
Wu et al. Runoff modeling in ungauged catchments using machine learning algorithm-based model parameters regionalization methodology
Abdul-Wahab et al. Optimization of multistage flash desalination process by using a two-level factorial design
Kabir et al. Investigating capabilities of machine learning techniques in forecasting stream flow
Li et al. Role of multimodel combination and data assimilation in improving streamflow prediction over multiple time scales
de Souza et al. Regional flood frequency analysis and uncertainties: Maximum streamflow estimates in ungauged basins in the region of Lavras, MG, Brazil
Rautela et al. Modelling of streamflow and water balance in the Kuttiyadi River Basin using SWAT and remote sensing/GIS tools
Nayak et al. A novel framework to determine the usefulness of satellite-based soil moisture data in streamflow prediction using dynamic Budyko model
Hou et al. Parameter sensitivity analysis and optimization of Noah land surface model with field measurements from Huaihe River Basin, China
Li et al. Impacts of climate and reservoirs on the downstream design flood hydrograph: a case study of Yichang Station
Huang et al. Eutrophication prediction using a markov chain model: Application to lakes in the Yangtze River basin, China

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130327