WO2016179864A1 - Method for predicting acute freshwater criteria based on quantitative structure-activity relationships of metals - Google Patents

Method for predicting acute freshwater criteria based on quantitative structure-activity relationships of metals

Publication number
WO2016179864A1
Authority
WO
WIPO (PCT)
Prior art keywords
metal
toxicity
model
acute
prediction
Prior art date
Application number
PCT/CN2015/080631
Other languages
English (en)
French (fr)
Inventor
吴丰昌
穆云松
赵晓丽
王颖
白英臣
廖海清
Original Assignee
中国环境科学研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国环境科学研究院
Publication of WO2016179864A1
Priority to US15/659,608 (US10650914B2)

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/30 Prediction of properties of chemical compounds, compositions or mixtures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B99/00 Subject matter not provided for in other groups of this subclass

Definitions

  • The invention relates to the field of freshwater water quality criteria models, and in particular to a method for predicting acute freshwater criteria based on quantitative structure-activity relationships (QSAR) of metals.
  • Metal pollution is one of the most challenging environmental problems of this century. Excess heavy metals enter the natural environment, destroy biodiversity, and harm both ecosystems and human health. Scientifically sound environmental criteria for metals are the foundation of environmental protection and risk assessment. The United States was the first country to carry out criteria research. China's existing criteria system mainly copies or borrows foreign results and lacks a scientific basis of its own. In the latest criteria document, 15 metals are listed as priority or non-priority pollutants, but only 10 of them have criteria reference values. Water quality criteria values are missing for most metals, mainly because biotoxicity data are insufficient and secondarily because of the influence of environmental factors; only the criteria for metals such as copper and nickel have been studied in depth. At present, standardized biotoxicity testing is the only way to obtain criteria values.
  • The quantitative structure-activity relationship (QSAR) method uses statistical analysis to find the intrinsic relationship between the structure of a target pollutant and its biological activity. As an effective tool for studying toxicological mechanisms, it has been widely used to predict and evaluate many kinds of toxic effects.
  • The QSAR method is not limited by experimental conditions or test instruments.
  • It uses computational chemistry and data mining techniques to study and predict the biological activity of pollutants, so it has a particularly clear advantage when many pollutants and many test species are involved.
  • QSAR is therefore especially attractive for toxicity prediction and risk assessment. It is well known that the ionic form is the most active form of a metal, and that the biological activity of dissolved metals is closely related to the free ion concentration.
  • Newman et al. established a QSAR equation and predicted metal toxicity using toxicity data for the marine luminescent bacterium V. fischeri. They found that the first hydrolysis constant $|\log K_{OH}|$ is strongly related to the toxic effect of metal ions on organisms.
  • Bogaerts et al., in evaluating the relationship between toxic effects on the protozoan T. pyriformis and the physicochemical properties of metal ions, likewise reported that
  • the metal ion softness index $\sigma_p$ is the best modeling parameter for the toxicity prediction equation.
  • The above methods are all single-parameter prediction models for a single species. They lack systematic toxicity prediction and analysis across multiple species in an ecosystem, so their predictive power and range of application are very limited.
  • In view of these shortcomings, the inventors arrived at the present invention after extended research and practice.
  • The present invention provides a method for predicting acute freshwater criteria based on quantitative structure-activity relationships of metals. It predicts the toxicity endpoints of unknown metals from the quantitative relationship between the structural characteristics of heavy metal ions and their acute toxic effects on aquatic organisms, and derives hazardous concentrations that protect different proportions of aquatic organisms through species sensitivity distribution analysis;
  • Step a: collect, screen, process and compile the toxicity data for modeling;
  • Step b: screen aquatic model organisms from five phyla and eight families;
  • Step c: construct a metal ion structural descriptor data set, perform linear correlation analysis with the structural parameters of each metal as independent variables, and rank the correlation coefficients to obtain the top two structural descriptors;
  • Step d: build the toxicity prediction model and test its robustness; establish a multiple regression equation, estimate its parameters, and test it with the P value corresponding to the F statistic;
  • Step e: internal validation of the QSAR model;
  • Step f: calculate the applicability domain of the model; for the validated model, draw a Williams plot with the leverage value h as the abscissa and the standardized residual of each data point as the ordinate;
  • Step g: use the predicted toxicity values and species sensitivity analysis to rapidly screen and predict the toxicity and criteria values of unknown metals.
  • In step c, linear correlation analysis is performed with the toxicity endpoint of a single species as the dependent variable and the structural parameters of each metal as independent variables, and the correlation coefficient r is calculated according to formula (1);
  • A parameter with correlation coefficient r > 0.8 is a significantly correlated parameter.
  • In step c, a set of metal ion structural descriptors is constructed, including the softness index $\sigma_p$, maximum complex stability constant $\log\text{-}\beta_n$, Pauling electronegativity $X_m$, covalent index $X_m^2 r$, atomic ionization potential $AN/\Delta IP$, first hydrolysis constant $|\log K_{OH}|$, electrochemical potential $\Delta E_0$, atomic size $AR/AW$, polarization force parameters $Z/r$, $Z/r^2$ and $Z^2/r$, and polarization-like force parameters $Z/AR$ and $Z/AR^2$.
  • The process of step d is:
  • Step d1: construction of the multiple regression equation and parameter estimation;
  • With the two optimal structural parameters determined in step c as the independent variables X and the metal activity value as the dependent variable Y, a QICAR equation Y = XB + E is built for each model organism by multiple linear regression (formula (2)), where
  • n is the number of observations.
  • Step d2: goodness-of-fit test and significance test of the regression equation, using the F test;
  • The goodness-of-fit indicators of the model are the squared correlation coefficient $R^2$, the degrees-of-freedom-adjusted correlation coefficient $\bar{R}^2$, and the standard deviation RMSE;
  • The indicators of the F test are the F value calculated by multi-factor analysis of variance (Multi-ANOVA) and the associated probability p (Significance F); the P value corresponding to the F statistic is used for the test;
  • Step d3: acceptance criteria: depending on how the toxicity data were obtained, $R^2 \ge 0.81$ for in vitro experiments and $R^2 \ge 0.64$ for in vivo tests; with significance level $\alpha$, the regression equation is significant when $p < \alpha$.
  • The quantities in step d3 are calculated according to the following formulas,
  • where $R^2$ is the square of the correlation coefficient,
  • $\bar{R}^2$ is the degrees-of-freedom-adjusted correlation coefficient,
  • and RMSE is the standard deviation.
  • The process of step e is:
  • Step e1: from the given modeling samples, most of the samples are used to build the model and a small portion is held out and predicted with the model, and the prediction error for the held-out samples is calculated;
  • Step e2: the sum of squared prediction errors is recorded for each equation, until every sample has been predicted once and only once;
  • Step e3: the cross-validated correlation coefficient $Q^2_{cv}$ and the cross-validated root mean square error RMSECV are calculated; the acceptance criteria are $Q^2_{cv} > 0.6$ and $R^2 - Q^2_{cv} \le 0.3$.
  • $x_i$ is the column vector of the structural parameters of the $i$-th metal (two entries for the two-parameter model); $X^T$ is the transpose of the matrix $X$, and $(X^T X)^{-1}$ is the inverse of the matrix $X^T X$.
  • p is the number of variables in the model (p = 2 for the two-parameter model), and n is the number of compounds in the model training set, determined by the number of training-set metals in each QSAR equation after verification in steps a-e;
  • The region $h < h^*$ of the Williams plot is the applicability domain of the model.
  • The process of step g is:
  • Step g1: following the method of steps a-f, obtain in turn the two-parameter QSAR prediction equations for the selected aquatic organisms of the eight families;
  • Step g2: collect and organize the values of all structural descriptors that appear in the eight equations for the metal to be predicted, and substitute them into the equations to calculate the acute toxicity endpoint of that metal for each species;
  • Step g3: for each metal, sort the toxicity data of all species from low to high and construct the species sensitivity distribution plot with the cumulative percentage as the ordinate;
  • Step g4: fit the curve with a nonlinear sigmoidal-logistic equation and, from the fitted equation, calculate the hazardous concentrations $HC_5$, $HC_{10}$ and $HC_{20}$ corresponding to cumulative percentages of 0.05, 0.1 and 0.2.
  • Compared with the prior art, the beneficial effects of the present invention are as follows. 1.
  • The prior art predicts the toxicity endpoint of only a single species; the model predictions are not accurate enough, with prediction errors of about two orders of magnitude.
  • Based on ecological principles, the invention systematically screens aquatic species from five phyla and eight families as the minimum biological prediction set and builds a multi-parameter toxicity prediction model for each, improving model accuracy and predictive ability.
  • 2. The QSAR model is combined with SSD analysis to predict criteria maximum concentrations (CMCs). The prior art obtains toxicity endpoint values by experimental testing and then performs species sensitivity analysis to derive criteria values.
  • This patent predicts the toxicity values of multiple metals with QSAR models. It is fast and simple, and relies on less experimental test data to predict criteria for metals that lack toxicity data.
  • FIG. 1 is a schematic flowchart of the method of the present invention for predicting acute freshwater criteria based on quantitative structure-activity relationships of metals;
  • FIG. 2a is the first Williams plot for evaluating the applicability domain of the model of the present invention;
  • FIG. 2b is the second Williams plot for evaluating the applicability domain of the model of the present invention;
  • FIG. 3 is the species sensitivity distribution curve for the predicted mercury toxicity values of the present invention.
  • The principle of the present invention is to predict the toxicity endpoints of unknown metals from the quantitative relationship between the structural characteristics of heavy metal ions and their acute toxic effects on aquatic organisms, and to derive, through species sensitivity distribution analysis, the hazardous concentrations that protect 5%, 10% and 20% of aquatic organisms. It is a method that builds QSAR metal toxicity prediction models by combining the physicochemical structural parameters of heavy metals with the toxicity mechanisms of different aquatic organisms, and applies them to predict unknown criteria reference values.
  • FIG. 1 is a schematic flowchart of a freshwater acute baseline prediction method based on a quantitative structure-activity relationship of a metal according to the present invention.
  • The specific process is as follows:
  • Step a: collect, screen, process and compile the toxicity data for modeling;
  • Step a1: data collection;
  • Step a2: data screening; the data must satisfy the following conditions:
  • Acute toxicity data for each species must come from the same test source, the same research group and the same test conditions;
  • Each species must have toxicity data for at least 6 metals;
  • Toxicity endpoint types include mortality, growth rate and reproduction rate, expressed as $EC_{50}$ or $LC_{50}$;
  • Toxicity tests must be carried out under a defined range of environmental conditions following standard operating procedures;
  • Step a3: data processing; in the embodiment of the present invention the processing is as follows:
  • The free metal ion concentration is used as the measure of the data; values given as mass concentrations are divided by the molecular weight and converted uniformly to molar concentration (mol/L).
  • Step a4: data compilation:
  • The resulting data set includes the molecular formula of the metal compound, the type of test organism, the type of toxic effect, the endpoint, the test conditions, the exposure time, and the data source.
  • The acute toxicity data for modeling are taken preferentially from the US Environmental Protection Agency's ECOTOX toxicity database (http://cfpub.epa.gov/ecotox/). If the toxicity data are insufficient, they are supplemented with valid data retrieved from the SCI Science Citation Index for the last 30 years. Using the database and literature search engines, keywords such as the metal name, the test species name and acute toxicity are entered to export the toxicity data sets that meet the conditions. Qualified toxicity data are then screened out subject to the conditions of step a2. The free metal ion concentration is used as the measure of the data; if the raw data use the mass of an ionic compound as the toxicity endpoint, they are divided by the molecular weight and converted uniformly to micromolar concentration (μmol/L).
  • Step b: screen aquatic model organisms from five phyla and eight families;
  • The acute model organisms are based on the organisms of three phyla and eight families recommended by the US Environmental Protection Agency for deriving water quality criteria.
  • From these, freshwater model organisms of five phyla and eight families that are sensitive to heavy metals are screened out: three planktonic crustacean arthropods, two chordates, and one each of mollusks, rotifers and duckweed.
  • For each class of model organism, the corresponding toxicity data must strictly follow the data collection and screening requirements, and the acute toxicity data of each species are compiled in turn. If the number of qualifying species exceeds the minimum required, the species tested with the widest variety of metals are selected for modeling.
  • For example, if data collection yields five qualifying species of planktonic crustaceans, they are sorted by the number of metal elements tested and the first three are selected as model organisms. After the model organisms have been screened, the scientific name, phylum and family of each of the eight organisms are determined.
  • Step c: construct the metal ion structural descriptor data set;
  • Step c1: with the toxicity endpoint of a single species as the dependent variable and the structural parameters of each metal as independent variables, perform linear correlation analysis and calculate the Pearson correlation coefficient r according to formula (1);
  • $x_i$ and $y_i$ are the structural parameter and the measured toxicity value of the $i$-th metal, respectively,
  • and $\bar{x}$ and $\bar{y}$ are the mean values of the structural parameter and of the measured toxicity values, respectively.
  • A parameter with correlation coefficient r > 0.8 is a significantly correlated parameter. The Pearson correlation is a simple and objective measure of the degree of association between two factors.
  • Step c2: among the significantly correlated parameters, the structural descriptors ranked in the top two positions are obtained by sorting the correlation coefficients.
  • In this step, the structural parameters significantly related to toxicity are screened by the correlation coefficient r, which avoids introducing spuriously correlated parameters into the model.
  • Step d: build the toxicity prediction model and test its robustness;
  • Step d1: construction of the multiple regression equation and parameter estimation;
  • With the two optimal structural parameters determined in step c as the independent variables X and the metal activity value as the dependent variable Y, a QICAR equation Y = XB + E is built for each model organism by multiple linear regression.
  • Compared with simple linear regression, equation (2) uses multiple linear regression to relate two different structural parameters to the metal toxicity values, expressing the relationship between the predicted quantity and the related factors completely and accurately.
  • Least-squares regression estimates the parameters of the regression model from the standpoint of error fitting. It is a standard multivariate modeling tool and is especially suitable for predictive analysis.
  • Step d2: goodness-of-fit test and significance test of the regression equation (F test);
  • The goodness-of-fit indicators of the model are the squared correlation coefficient ($R^2$), the degrees-of-freedom-adjusted correlation coefficient ($\bar{R}^2$), and the standard deviation (RMSE).
  • The indicators of the F test are the F value calculated by multi-factor analysis of variance (Multi-ANOVA) and the associated probability p (Significance F). The P value corresponding to the F statistic is usually used for the test.
  • Step d3: acceptance criteria: depending on how the toxicity data were obtained, $R^2 \ge 0.81$ for in vitro experiments and $R^2 \ge 0.64$ for in vivo tests.
  • The significance level is $\alpha$; when $p < \alpha$, the regression equation is significant.
  • $y_i$ is the measured toxicity value of the $i$-th metal.
  • Equation (6) is a general method for testing whether the linear relationship between the dependent variable and several independent variables is significant.
  • Step e: internal validation of the QSAR model;
  • The QSAR model of each species should also be validated by the leave-one-out method.
  • The core idea of the method is to draw one data point at random from the training set, build the multiple regression model from the remaining toxicity data and the optimal structural descriptors obtained in step c, and check the established model by comparing the predicted value of the withheld data point with its experimental value.
  • To reduce the variability of the cross-validation result, a sample data set is partitioned several different ways into complementary subsets, and cross-validation is performed several times. In this step, the average of the repeated validations is taken as the validation result.
  • Step e1: from the given modeling samples, most of the samples are used to build the model and a small portion is held out and predicted with the model, and the prediction error for the held-out samples is calculated;
  • Step e2: the sum of squared prediction errors is recorded for each equation, until every sample has been predicted once and only once;
  • Step e3: the cross-validated correlation coefficient $Q^2_{cv}$ and the cross-validated root mean square error RMSECV are calculated with the formulas below; the acceptance criteria are $Q^2_{cv} > 0.6$ and $R^2 - Q^2_{cv} \le 0.3$;
  • Equations (7) and (8) are the indicators of leave-one-out internal validation. They effectively reduce over-fitting of the model to the training set data and show whether a particular metal in the training set has a marked influence on the robustness of the model.
  • Step f: calculate the applicability domain of the model;
  • For the validated model, the applicability domain is calculated with the leverage method and shown graphically in a Williams plot. This ensures that the model is used for prediction only where it is most reliable.
  • $x_i$ is the column vector of the structural parameters of the $i$-th metal (two entries for the two-parameter model); $X^T$ is the transpose of the matrix $X$, and $(X^T X)^{-1}$ is the inverse of the matrix $X^T X$.
  • p is the number of variables in the model (p = 2 for the two-parameter model), and n is the number of compounds in the model training set, determined by the number of training-set metals in each QSAR equation after verification in steps a-e.
  • The Williams plot is drawn with the leverage value h as the abscissa and the standardized residual of each data point as the ordinate.
  • The region $h < h^*$ is the applicability domain of the model.
  • Step g: use the predicted toxicity values and species sensitivity analysis to rapidly screen and predict the toxicity and criteria values of unknown metals.
  • Step g1: following the method of steps a-f, obtain in turn the two-parameter QSAR prediction equations for the selected aquatic species of the five phyla and eight families.
  • Step g2: collect and organize the values of all structural descriptors that appear in the eight equations for the metal to be predicted, and substitute them into the equations to calculate the acute toxicity endpoint of that metal for each species.
  • Step g4: fit the curve with a nonlinear sigmoidal-logistic equation and, from the fitted equation, calculate the hazardous concentrations $HC_5$, $HC_{10}$ and $HC_{20}$ corresponding to cumulative percentages of 0.05, 0.1 and 0.2.
  • The indicators of the goodness of the curve fit include F and P.
  • They are calculated as shown in equations (4)-(6).
  • $a$ is the amplitude of the fitted curve,
  • $x_c$ is the center value,
  • and $k$ is the slope of the curve.
  • Following step a of the present invention, the acute toxicity data for Daphnia magna are compiled, as shown in Table 1.
  • Following step b of the present invention, the selected aquatic organisms of five phyla and eight families are listed in Table 2.
  • The method of the present invention is used to predict the toxicity of mercury to the model organisms of the eight families, and the criteria reference threshold is predicted from the SSD curve.
  • The toxicity prediction equations for the eight model organisms were constructed individually, as shown in Table 3.
  • The structural parameter values were substituted into the equations to obtain the predicted toxicity value for each species.
  • Step f of the present invention is used to calculate the applicability domain of the model, and the Williams plot is drawn.
  • The structural parameters and toxicity endpoints of each metal in the training set are shown in Table 5.
  • With the leverage value of the two optimal structural parameters of each metal as the abscissa and the prediction residual as the ordinate, the Williams plots are drawn (Fig. 2a, b).
  • The space enclosed by the three dotted lines in the figure is the applicability domain of the model.
  • The calculation shows that the six metals in the training set all fall within the prediction range of the model.
  • Following step g of the present invention, the fitted QSAR-SSD curve equation for mercury is obtained:

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Operations Research (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Educational Administration (AREA)
  • Primary Health Care (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Chemical & Material Sciences (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biotechnology (AREA)

Abstract

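A method for predicting acute freshwater criteria based on quantitative structure-activity relationships of metals. The method predicts the toxicity endpoints of unknown metals from the quantitative relationship between the structural characteristics of heavy metal ions and their acute toxic effects on aquatic organisms, and derives hazardous concentrations that protect different proportions of aquatic organisms through species sensitivity distribution analysis. It builds QSAR metal toxicity prediction models by combining the physicochemical structural parameters of heavy metals with the toxicity mechanisms of different aquatic organisms, and applies them to predict unknown criteria reference values. Based on ecological principles, the method systematically screens aquatic species from five phyla and eight families as the minimum biological prediction set and builds a multi-parameter toxicity prediction model for each, improving model accuracy and predictive ability.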

Description

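Method for predicting acute freshwater criteria based on quantitative structure-activity relationships of metals
Technical Field
The present invention relates to the field of freshwater water quality criteria models, and in particular to a method for predicting acute freshwater criteria based on quantitative structure-activity relationships of metals.
Background
Metal pollution is one of the most challenging environmental problems of this century. Excess heavy metals enter the natural environment, destroy biodiversity, and harm both ecosystems and human health. Scientifically sound environmental criteria for metals are the foundation of environmental protection and risk assessment. The United States was the first country to carry out criteria research; China's existing criteria system mainly copies or borrows foreign results and lacks a scientific basis of its own. In the latest criteria document, 15 metals are listed as priority or non-priority pollutants, but only 10 of them have criteria reference values. Water quality criteria values are missing for most metals, mainly because biotoxicity data are insufficient and secondarily because of the influence of environmental factors; only the criteria for metals such as copper and nickel have been studied in depth. At present, standardized biotoxicity testing is the only way to obtain criteria values. However, heavy metals are numerous and their structures and species are complex, so the large number of toxicity tests needed for criteria derivation consumes manpower, materials and money, and metal speciation in complex biological systems is hard to measure accurately. This has hindered the development of research on water quality criteria for heavy metals. Although some researchers have used computational methods to predict various toxicity endpoints, no work applying them to the prediction of toxicity and water quality criteria has been reported. Developing criteria prediction methods that do not depend on experimental measurement suits China's national conditions better and saves substantial manpower, materials and money.
The quantitative structure-activity relationship (QSAR) method uses statistical analysis to find the intrinsic relationship between the structure of a target pollutant and its biological activity. As an effective tool for studying toxicological mechanisms, it has been widely used to predict and evaluate many kinds of toxic effects. The QSAR method is not limited by experimental conditions or test instruments; it uses computational chemistry and data mining techniques to study and predict the biological activity of pollutants, so it has a particularly clear advantage when many pollutants and many test species are involved, and it is especially attractive for toxicity prediction and risk assessment. It is well known that the ionic form is the most active form of a metal, and that the biological activity of dissolved metals is closely related to the free ion concentration. For idealized systems, researchers have attempted quantitative structure-activity studies of metal ions and proposed predicting biological activity from quantitative ion character-activity relationships. Newman et al. established a QSAR equation and predicted metal toxicity using toxicity test data for the marine luminescent bacterium V. fischeri. They found that the first hydrolysis constant $|\log K_{OH}|$ is strongly related to the toxic effect of metal ions on organisms. Bogaerts et al., in evaluating the relationship between toxic effects on the protozoan T. pyriformis and the physicochemical properties of metal ions, likewise reported that the metal ion softness index $\sigma_p$ is the best modeling parameter for the toxicity prediction equation.
The above methods are all single-parameter prediction models for a single species. They lack systematic toxicity prediction and analysis across multiple species in an ecosystem, so their predictive power and range of application are very limited.
In view of these shortcomings, the inventors arrived at the present invention after extended research and practice.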
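Summary of the Invention
The object of the present invention is to provide a method for predicting acute freshwater criteria based on quantitative structure-activity relationships of metals, so as to overcome the technical shortcomings described above.
To achieve this object, the present invention provides a method for predicting acute freshwater criteria based on quantitative structure-activity relationships of metals, which predicts the toxicity endpoints of unknown metals from the quantitative relationship between the structural characteristics of heavy metal ions and their acute toxic effects on aquatic organisms, and derives hazardous concentrations that protect different proportions of aquatic organisms through species sensitivity distribution analysis;
The specific process is:
Step a: collect, screen, process and compile the toxicity data for modeling;
Step b: screen aquatic model organisms from five phyla and eight families;
Step c: construct a metal ion structural descriptor data set, perform linear correlation analysis with the structural parameters of each metal as independent variables, and rank the correlation coefficients to obtain the top two structural descriptors;
Step d: build the toxicity prediction model and test its robustness; establish a multiple regression equation, estimate its parameters, and test it with the P value corresponding to the F statistic;
Step e: internal validation of the QSAR model;
Step f: calculate the applicability domain of the model; for the validated model, draw a Williams plot with the leverage value h as the abscissa and the standardized residual of each data point as the ordinate;
Step g: use the predicted toxicity values and species sensitivity analysis to rapidly screen and predict the toxicity and criteria values of unknown metals.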
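Further, in step c, linear correlation analysis is performed with the toxicity endpoint of a single species as the dependent variable and the structural parameters of each metal as independent variables, and the correlation coefficient r is calculated according to formula (1):
$$ r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 \, \sum_{i=1}^{n}(y_i-\bar{y})^2}} \qquad (1) $$
where $\bar{x}$ and $\bar{y}$ are the mean values of the structural parameter and of the toxicity values, respectively, and $x_i$ and $y_i$ are the structural parameter and the toxicity value of the $i$-th metal;
A parameter with correlation coefficient r > 0.8 is a significantly correlated parameter.
Further, in step c, a set of metal ion structural descriptors is constructed, including the softness index $\sigma_p$, maximum complex stability constant $\log\text{-}\beta_n$, Pauling electronegativity $X_m$, covalent index $X_m^2 r$, atomic ionization potential $AN/\Delta IP$, first hydrolysis constant $|\log K_{OH}|$, electrochemical potential $\Delta E_0$, atomic size $AR/AW$, polarization force parameters $Z/r$, $Z/r^2$ and $Z^2/r$, and polarization-like force parameters $Z/AR$ and $Z/AR^2$.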
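Further, the process of step d is:
Step d1: construction of the multiple regression equation and parameter estimation;
With the two optimal structural parameters determined in step c as the independent variables X and the metal activity value as the dependent variable Y, a QICAR equation Y = XB + E is built for each model organism by multiple linear regression; see formula (2), where:
[Formula (2): the matrix definitions of Y, X, B and E are given as an equation image that is not reproduced in this text.]
n is the number of observations.
The parameters of the equation are estimated by the least-squares method, where $X'$ is the transpose of $X$:
$$ \hat{B} = (X'X)^{-1}X'Y \qquad (3) $$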
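The least-squares estimate in step d1 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the design matrix is assumed to carry an intercept column followed by the two descriptor columns, which the text does not spell out.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; the descriptors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_two_descriptor_model(x1, x2, y):
    """Estimate B = (b0, b1, b2) from the normal equations (X'X) B = X'Y.

    Assumption: X has rows [1, x1_i, x2_i], i.e. an intercept is included.
    """
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, x1, x2):
    """Predicted toxicity value for one metal from its two descriptor values."""
    b0, b1, b2 = coeffs
    return b0 + b1 * x1 + b2 * x2
```

Solving the normal equations directly is adequate for training sets of the size discussed here (at least six metals per species); a library routine based on QR or SVD would be the more numerically robust choice for larger or nearly collinear data.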
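Step d2: goodness-of-fit test and significance test of the regression equation, using the F test;
The goodness-of-fit indicators of the model are the squared correlation coefficient $R^2$, the degrees-of-freedom-adjusted correlation coefficient $\bar{R}^2$, and the standard deviation RMSE;
The indicators of the F test are the F value calculated by multi-factor analysis of variance (Multi-ANOVA) and the associated probability p (Significance F); the P value corresponding to the F statistic is used for the test;
Step d3: acceptance criteria: depending on how the toxicity data were obtained, $R^2 \ge 0.81$ for in vitro experiments and $R^2 \ge 0.64$ for in vivo tests; with significance level $\alpha$, the regression equation is significant when $p < \alpha$.
Further, the quantities in step d3 are calculated according to the following formulas:
[Formulas (4)-(6), defining $R^2$, $\bar{R}^2$, RMSE and F: equation images not reproduced in this text.]
where $R^2$ is the square of the correlation coefficient, $\bar{R}^2$ is the degrees-of-freedom-adjusted correlation coefficient, and RMSE is the standard deviation.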
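As an illustration of the indicators named in steps d2 and d3, the sketch below computes them with their common textbook definitions. These definitions are an assumption, since the patent's own formulas (4)-(6) are not legible here; in particular, the residual degrees of freedom n - p - 1 used for RMSE and F is one conventional choice, and the function name is hypothetical.

```python
import math


def fit_statistics(y_obs, y_pred, p=2):
    """Goodness-of-fit indicators for a model with p descriptors.

    Assumed textbook definitions (not taken from the source formulas):
      R^2      = 1 - SS_res / SS_tot
      R^2_adj  = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
      RMSE     = sqrt(SS_res / (n - p - 1))
      F        = (SS_reg / p) / (SS_res / (n - p - 1)), SS_reg = SS_tot - SS_res
    """
    n = len(y_obs)
    dof = n - p - 1
    if dof <= 0:
        raise ValueError("need more observations than parameters")
    mean = sum(y_obs) / n
    ss_res = sum((o - f) ** 2 for o, f in zip(y_obs, y_pred))
    ss_tot = sum((o - mean) ** 2 for o in y_obs)
    r2 = 1.0 - ss_res / ss_tot
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / dof
    rmse = math.sqrt(ss_res / dof)
    f_value = float("inf") if ss_res == 0 else ((ss_tot - ss_res) / p) / (ss_res / dof)
    return {"r2": r2, "r2_adj": r2_adj, "rmse": rmse, "f": f_value}


def meets_r2_criterion(r2, in_vitro):
    """Step d3 thresholds: R^2 >= 0.81 (in vitro) or R^2 >= 0.64 (in vivo)."""
    return r2 >= (0.81 if in_vitro else 0.64)
```

The P value for the F statistic needs the F distribution with (p, n - p - 1) degrees of freedom, which the Python standard library does not provide; `scipy.stats.f.sf(f_value, p, n - p - 1)` is one way to obtain it.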
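Further, the specific process of step e is:
Step e1: from the given modeling samples, most of the samples are used to build the model and a small portion is held out and predicted with the model, and the prediction error for the held-out samples is calculated;
Step e2: the sum of squared prediction errors is recorded for each equation, until every sample has been predicted once and only once;
Step e3: the cross-validated correlation coefficient $Q^2_{cv}$ and the cross-validated root mean square error RMSECV are calculated; the acceptance criteria are $Q^2_{cv} > 0.6$ and $R^2 - Q^2_{cv} \le 0.3$.
Further, the formulas used in step e3 are:
$$ Q^2_{cv} = 1 - \frac{\sum_{i=1}^{n}\left(y_i^{obs}-y_i^{pred}\right)^2}{\sum_{i=1}^{n}\left(y_i^{obs}-\bar{y}^{obs}\right)^2} \qquad (7) $$
$$ RMSECV = \sqrt{\frac{\sum_{i=1}^{n}\left(y_i^{obs}-y_i^{pred}\right)^2}{n}} \qquad (8) $$
where $y_i^{obs}$ is the measured toxicity value of the $i$-th compound, $y_i^{pred}$ is the predicted toxicity value of the $i$-th compound, $\bar{y}^{obs}$ is the mean toxicity value of the training set, and n is the number of compounds in the training set.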
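Steps e1-e3 amount to leave-one-out cross-validation, which can be sketched as follows. The sketch assumes a two-descriptor linear model with an intercept, refitted by ordinary least squares each time one metal is withheld, and the usual definitions of $Q^2_{cv}$ and RMSECV; the function names are hypothetical.

```python
import math


def ols(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(a[r][c]))
        if abs(a[piv][c]) < 1e-12:
            raise ValueError("singular normal equations")
        a[c], a[piv] = a[piv], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


def leave_one_out(x1, x2, y):
    """Return (Q2_cv, RMSECV): each metal is predicted once by a model
    fitted to all the other metals."""
    n = len(y)
    press = 0.0  # sum of squared leave-one-out prediction errors
    for i in range(n):
        keep = [j for j in range(n) if j != i]
        b = ols([[1.0, x1[j], x2[j]] for j in keep], [y[j] for j in keep])
        pred = b[0] + b[1] * x1[i] + b[2] * x2[i]
        press += (y[i] - pred) ** 2
    mean = sum(y) / n
    ss_tot = sum((v - mean) ** 2 for v in y)
    return 1.0 - press / ss_tot, math.sqrt(press / n)


def passes_internal_validation(r2, q2_cv):
    """Step e3 criteria: Q2_cv > 0.6 and R^2 - Q2_cv <= 0.3."""
    return q2_cv > 0.6 and (r2 - q2_cv) <= 0.3
```

Because every metal is withheld exactly once, the procedure is deterministic for a given training set, which matches the requirement that each sample be predicted once and only once.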
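Further, in step f, the leverage value $h_i$ is calculated as:
$$ h_i = x_i^T (X^T X)^{-1} x_i \qquad (9) $$
where $x_i$ is the column vector of the structural parameters of the $i$-th metal; for the two-parameter model,
[Definitions of $x_i$ and $X$ for the two-parameter model: equation images not reproduced in this text.]
$X^T$ is the transpose of the matrix $X$, and $(X^T X)^{-1}$ is the inverse of the matrix $X^T X$.
Further, in step f, the critical value $h^*$ is calculated as:
$$ h^* = \frac{3(p+1)}{n} \qquad (10) $$
where p is the number of variables in the model (p = 2 for the two-parameter model), and n is the number of compounds in the model training set, determined by the number of training-set metals in each QSAR equation after verification in steps a-e;
The region $h < h^*$ of the Williams plot is the applicability domain of the model.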
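The leverage calculation of step f can be sketched for the two-parameter case as below. Two assumptions are made and should be checked against the original formulas: the matrix X is taken to hold only the two descriptor columns (one row per metal, no intercept column), and the critical value is taken in its commonly used form 3(p + 1)/n. Function names are hypothetical.

```python
def leverages(x1, x2):
    """h_i = x_i^T (X^T X)^-1 x_i for a two-descriptor model.

    Assumption: X has rows (x1_i, x2_i) with no intercept column, so
    X^T X is 2x2 and can be inverted in closed form.
    """
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("X^T X is singular; the descriptors are collinear")
    i11, i12, i22 = s22 / det, -s12 / det, s11 / det  # entries of the inverse
    return [a * (i11 * a + i12 * b) + b * (i12 * a + i22 * b)
            for a, b in zip(x1, x2)]


def critical_leverage(n, p=2):
    """Warning leverage h* = 3(p + 1)/n (assumed conventional form)."""
    return 3.0 * (p + 1) / n


def in_applicability_domain(h, h_star):
    """A metal lies inside the domain when its leverage is below h*."""
    return h < h_star
```

In a Williams plot these leverages form the abscissa; the standardized residuals on the ordinate are computed separately from the fitted model.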
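Further, the specific process of step g is:
Step g1: following the method of steps a-f, obtain in turn the two-parameter QSAR prediction equations for the selected aquatic organisms of the eight families;
Step g2: collect and organize the values of all structural descriptors that appear in the eight equations for the metal to be predicted, and substitute them into the equations to calculate the acute toxicity endpoint of that metal for each species;
Step g3: for each metal, sort the toxicity data of all species from low to high and construct the species sensitivity distribution plot with the cumulative percentage as the ordinate;
Step g4: fit the curve with a nonlinear sigmoidal-logistic equation and, from the fitted equation, calculate the hazardous concentrations $HC_5$, $HC_{10}$ and $HC_{20}$ corresponding to cumulative percentages of 0.05, 0.1 and 0.2.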
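Steps g3 and g4 can be illustrated with a short sketch. It assumes the common three-parameter logistic form y = a / (1 + exp(-k (x - x_c))) for the fitted curve, and a rank / (n + 1) plotting position for the cumulative fraction; neither choice is confirmed by the text here, and the curve parameters would come from a separate nonlinear fit that is not shown. The scale of x (for example a log-transformed concentration) is whatever scale the curve was fitted on.

```python
import math


def cumulative_fractions(toxicity_values):
    """Sort predicted toxicity values from low to high and attach a
    cumulative fraction to each (assumed plotting position: rank/(n+1))."""
    ordered = sorted(toxicity_values)
    n = len(ordered)
    return [(v, rank / (n + 1)) for rank, v in enumerate(ordered, start=1)]


def logistic(x, a, xc, k):
    """Assumed sigmoidal-logistic curve: amplitude a, center xc, slope k."""
    return a / (1.0 + math.exp(-k * (x - xc)))


def hazard_concentration(fraction, a, xc, k):
    """Invert the fitted curve: the x at which the cumulative fraction
    equals `fraction` (0.05, 0.1 and 0.2 give HC5, HC10 and HC20)."""
    if not 0.0 < fraction < a:
        raise ValueError("fraction must lie strictly between 0 and a")
    return xc - math.log(a / fraction - 1.0) / k
```

With hypothetical fitted parameters, `hazard_concentration(0.05, a, xc, k)` returns the HC5 on the fitted scale; if the curve was fitted to log10 concentrations, the result must be back-transformed before it is reported as a concentration.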
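Compared with the prior art, the beneficial effects of the present invention are as follows. 1. The prior art predicts the toxicity endpoint of only a single species; the model predictions are not accurate enough, with prediction errors of about two orders of magnitude. Based on ecological principles, the present invention systematically screens aquatic species from five phyla and eight families as the minimum biological prediction set and builds a multi-parameter toxicity prediction model for each, improving model accuracy and predictive ability.
2. The QSAR model is combined with SSD analysis to predict criteria maximum concentrations (CMCs).
The prior art obtains toxicity endpoint values by experimental testing and then performs species sensitivity analysis to derive criteria values. This patent predicts the toxicity values of multiple metals with QSAR models; it is fast and simple, and relies on less experimental test data to predict criteria for metals that lack toxicity data.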
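Brief Description of the Drawings
FIG. 1 is a schematic flowchart of the method of the present invention for predicting acute freshwater criteria based on quantitative structure-activity relationships of metals;
FIG. 2a is the first Williams plot for evaluating the applicability domain of the model of the present invention;
FIG. 2b is the second Williams plot for evaluating the applicability domain of the model of the present invention;
FIG. 3 is the species sensitivity distribution curve for the predicted mercury toxicity values of the present invention.
Detailed Description
The above and other technical features and advantages of the present invention are described in more detail below with reference to the drawings.
The principle of the present invention is to predict the toxicity endpoints of unknown metals from the quantitative relationship between the structural characteristics of heavy metal ions and their acute toxic effects on aquatic organisms, and to derive, through species sensitivity distribution analysis, the hazardous concentrations that protect 5%, 10% and 20% of aquatic organisms. It is a method that builds QSAR metal toxicity prediction models by combining the physicochemical structural parameters of heavy metals with the toxicity mechanisms of different aquatic organisms, and applies them to predict unknown criteria reference values.
Referring to FIG. 1, which is a schematic flowchart of the method of the present invention for predicting acute freshwater criteria based on quantitative structure-activity relationships of metals, the specific process is: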
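Step a: collect, screen, process and compile the toxicity data for modeling;
Step a1: data collection;
Step a2: data screening; the data must satisfy the following conditions:
1) Acute toxicity data for each species must come from the same test source, the same research group and the same test conditions;
2) Each species must have toxicity data for at least 6 metals;
3) Toxicity endpoint types include mortality, growth rate and reproduction rate, expressed as $EC_{50}$ or $LC_{50}$;
4) Toxicity tests must be carried out under a defined range of environmental conditions following standard operating procedures;
5) The exposure time of the bioassay is 48 to 96 hours.
Step a3: data processing; in the embodiment of the present invention the processing is as follows:
The free metal ion concentration is used as the measure of the data; values given as mass concentrations are divided by the molecular weight and converted uniformly to molar concentration (mol/L).
Step a4: data compilation:
The resulting data set includes the molecular formula of the metal compound, the type of test organism, the type of toxic effect, the endpoint, the test conditions, the exposure time, and the data source.
The detailed procedure for obtaining the toxicity data is as follows:
The acute toxicity data for modeling are taken preferentially from the US Environmental Protection Agency's ECOTOX toxicity database (http://cfpub.epa.gov/ecotox/). If the toxicity data are insufficient, they are supplemented with valid data retrieved from the SCI Science Citation Index (ISI Web of Knowledge) for the last 30 years. Using the database and literature search engines, keywords such as the metal name, the test species name and acute toxicity are entered to export the toxicity data sets that meet the conditions. Qualified toxicity data are then screened out subject to the conditions of step a2. The free metal ion concentration is used as the measure of the data; if the raw data use the mass of an ionic compound as the toxicity endpoint, they are divided by the molecular weight and converted uniformly to micromolar concentration (μmol/L). During compilation, the metal atom or molecular formula, the atomic or molecular weight, the endpoint, the type of test organism, the test conditions, the type of toxic effect, the exposure time, the data source and similar information are recorded and organized into an Excel spreadsheet as the basis for modeling.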
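The unit conversion described in step a3 is simple arithmetic, sketched below. The function name and the `metal_atoms_per_formula` parameter are hypothetical; the latter is an added assumption for salts whose formula unit contains more than one metal ion, which the text does not discuss.

```python
def mass_to_micromolar(mass_conc_mg_per_l, molar_mass_g_per_mol,
                       metal_atoms_per_formula=1):
    """Convert a mass concentration of a metal compound (mg/L) to the
    molar concentration of metal ion (umol/L).

    mg/L divided by g/mol gives mmol/L; multiplying by 1000 gives umol/L.
    """
    if molar_mass_g_per_mol <= 0:
        raise ValueError("molar mass must be positive")
    mmol_per_l = mass_conc_mg_per_l / molar_mass_g_per_mol
    return mmol_per_l * 1000.0 * metal_atoms_per_formula
```

For instance, 2.0 mg/L of a compound with a molar mass of 200 g/mol corresponds to 10 μmol/L of metal ion when each formula unit releases one ion.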
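Step b: screen aquatic model organisms from five phyla and eight families;
The acute model organisms are based on the organisms of three phyla and eight families recommended by the US Environmental Protection Agency for deriving water quality criteria. From these, freshwater model organisms of five phyla and eight families that are sensitive to heavy metals are screened out: three planktonic crustacean arthropods, two chordates, and one each of mollusks, rotifers and duckweed. For each class of model organism, the corresponding toxicity data must strictly follow the data collection and screening requirements, and the acute toxicity data of each species are compiled in turn. If the number of qualifying species exceeds the minimum required, the species tested with the widest variety of metals are selected for modeling. For example, if data collection yields five qualifying species of planktonic crustaceans, they are sorted by the number of metal elements tested and the first three are selected as model organisms. After the model organisms have been screened, the scientific name, phylum and family of each of the eight organisms are determined.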
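The selection rule in the example above (keep the species tested with the most metals) can be sketched as follows. The function name, the input layout and the alphabetical tie-break are assumptions made for illustration; the six-metal minimum is taken from screening condition 2 of step a2.

```python
def select_model_species(metals_by_species, n_select, min_metals=6):
    """Pick the n_select species with the most distinct tested metals.

    metals_by_species maps a species name to the metals it was tested with.
    Species with fewer than min_metals distinct metals are excluded first.
    Ties are broken alphabetically (an arbitrary, assumed rule).
    """
    counts = {name: len(set(metals)) for name, metals in metals_by_species.items()}
    eligible = [name for name, c in counts.items() if c >= min_metals]
    eligible.sort(key=lambda name: (-counts[name], name))
    return eligible[:n_select]
```

Called with five candidate crustacean species and `n_select=3`, this returns the three species with the broadest metal coverage, mirroring the example in the text.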
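Step c: construction of the metal ion structural descriptor data set;
A set of metal ion structural descriptors is constructed, including the softness index σp, the maximum complex stability constant log-βn, the Pauling electronegativity Xm, the covalent index Xm²r, the atomic ionization potential AN/ΔIP, the first hydrolysis constant |logKOH|, the electrochemical potential ΔE0, the atomic size AR/AW, the polarizing power parameters Z/r, Z/r² and Z²/r, and the similar polarizing power parameters Z/AR and Z/AR².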
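Step c1: taking the toxicity endpoint of a single species as the dependent variable and the structural parameters of each metal as independent variables, perform a linear correlation analysis and calculate the Pearson correlation coefficient r according to formula (1):
$$r = \frac{\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i}(x_i-\bar{x})^{2}\,\sum_{i}(y_i-\bar{y})^{2}}} \quad (1)$$
where $x_i$ and $y_i$ are the structural parameter and the measured toxicity value of the i-th metal, and $\bar{x}$ and $\bar{y}$ are the mean values of the structural parameter and of the measured toxicity values, respectively. Parameters with a correlation coefficient r > 0.8 are regarded as significantly correlated. The Pearson correlation gives a simple and objective measure of the degree of association between two factors.
Step c2: among the significantly correlated parameters, rank by correlation coefficient and take the two top-ranked structural descriptors. In this step, the correlation coefficient r is used to screen out the structural parameters significantly correlated with toxicity, which avoids introducing spuriously correlated parameters into the model.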
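Steps c1 and c2 can be sketched as below. The descriptor names and all numbers are made up for illustration. The sketch ranks by |r| so that strong negative correlations are also kept; that is an assumption, since the text states the threshold simply as r > 0.8.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def top_two_descriptors(descriptors, toxicity, threshold=0.8):
    """Names of the two descriptors with the largest |r| above the threshold."""
    scored = [(abs(pearson_r(values, toxicity)), name)
              for name, values in descriptors.items()]
    kept = sorted((item for item in scored if item[0] > threshold), reverse=True)
    return [name for _, name in kept[:2]]


# Hypothetical data for five metals (not taken from the patent).
toxicity = [1.0, 2.0, 3.0, 4.0, 5.0]
descriptors = {
    "sigma_p": [2.0, 4.0, 6.0, 8.0, 10.0],      # perfectly correlated
    "Z_over_r": [5.0, 4.2, 2.9, 2.1, 0.8],      # strongly anti-correlated
    "AR_over_AW": [1.0, -1.0, 1.0, -1.0, 1.0],  # uncorrelated
}
best_two = top_two_descriptors(descriptors, toxicity)
```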
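Step d: construction of the toxicity prediction model and robustness testing;
Step d1: construction of the multiple regression equation and parameter estimation;
Taking the two best structural parameters determined in step c above as the independent variables X and the metal activity value as the dependent variable Y, the QICAR equation Y = XB + E of each model organism is constructed by multiple linear regression analysis; see formula (2) below, in which: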
Figure PCTCN2015080631-appb-000020
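n is the number of observations; B denotes the unknown parameters of the equation, which are estimated by least squares; E denotes the random error term, reflecting the effect on y of random factors other than the linear relationship of x1 and x2 with y. Compared with univariate linear regression, equation (2) uses multiple linear regression to relate two different structural parameters to the metal toxicity value, and so expresses the relationship between the predicted quantity and the relevant factors more completely and accurately.
The parameters of the equation are estimated by least squares, where X′ is the transpose of X:
$$\hat{B} = (X'X)^{-1}X'Y$$
Least squares regression estimates the parameters of the regression model from the standpoint of error fitting; it is a standard multivariate modeling tool and is particularly suited to predictive analysis.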
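The least squares estimate for the two-parameter model can be sketched in plain Python by solving the normal equations (X′X)B = X′Y. The data and function names are hypothetical, and the design matrix is assumed to carry a leading column of ones for the intercept. In practice a numerical library would normally be used instead of a hand-written solver.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_two_parameter(x1, x2, y):
    """Return [b0, b1, b2] for y = b0 + b1*x1 + b2*x2 via the normal equations."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve_linear(xtx, xty)


# Hypothetical training set of six metals generated from y = 1 + 2*x1 - 3*x2.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0, 0.0]
y = [-2.0, 3.0, -1.0, 4.0, 0.0, 11.0]
coefficients = fit_two_parameter(x1, x2, y)
```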
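Step d2: goodness-of-fit test and significance test of the regression equation (F-test);
The goodness-of-fit indicators of the model are the squared correlation coefficient (R²), the degrees-of-freedom-adjusted correlation coefficient ($R^2_{adj}$) and the standard deviation (RMSE). The indicators of the F-test are the F value and the associated probability p (Significance F) calculated by multi-factor analysis of variance (Multi-ANOVA). The test is usually carried out using the P value corresponding to the F statistic.
Step d3: acceptance criteria: depending on how the toxicity data were obtained, R² ≥ 0.81 for in vitro experiments and R² ≥ 0.64 for in vivo tests. With significance level α, the regression equation is significant when p < α.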
Figure PCTCN2015080631-appb-000023
Figure PCTCN2015080631-appb-000024
Figure PCTCN2015080631-appb-000025
Figure PCTCN2015080631-appb-000026
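where $y_i$ is the measured toxicity value of the i-th metal, $\hat{y}_i$ is the predicted toxicity value of the i-th metal, $\bar{y}$ is the mean of the toxicity values, and n is the number of metals in the training set.
The correlation coefficient and standard deviation of equations (4) and (5) measure the goodness of fit of the regression line; equation (6) is the general method for testing whether the linear relationship between the dependent variable and several independent variables is significant.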
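The fit statistics named in steps d2 and d3 can be sketched as follows. The patent's own formulas are only available as images here, so this sketch uses the common textbook definitions, with n − p − 1 residual degrees of freedom; treat those forms, the function name and the sample data as assumptions. The p-value of the F statistic is not computed, since it needs the F distribution from a statistics library.

```python
from math import sqrt


def goodness_of_fit(y_obs, y_pred, p=2):
    """Return R2, adjusted R2, RMSE and F for a model with p predictors."""
    n = len(y_obs)
    if n != len(y_pred) or n <= p + 1:
        raise ValueError("need more than p + 1 paired observations")
    mean_y = sum(y_obs) / n
    sse = sum((o - f) ** 2 for o, f in zip(y_obs, y_pred))  # residual sum of squares
    sst = sum((o - mean_y) ** 2 for o in y_obs)              # total sum of squares
    dof = n - p - 1                                          # residual degrees of freedom
    r2 = 1.0 - sse / sst
    return {
        "R2": r2,
        "R2_adj": 1.0 - (1.0 - r2) * (n - 1) / dof,
        "RMSE": sqrt(sse / dof),
        "F": float("inf") if sse == 0 else ((sst - sse) / p) / (sse / dof),
    }


# Hypothetical measured and predicted log-toxicity values for six metals.
measured = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]
stats = goodness_of_fit(measured, predicted)
# Step d3 criterion for in vivo data, as stated in the text.
meets_in_vivo_criterion = stats["R2"] >= 0.64
```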
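Step e: internal validation of the QSAR model;
The QSAR model of each species should also be validated by the leave-one-out method. The core idea is to withdraw one datum from the training set, build a multiple regression model from the remaining toxicity data and the best structural descriptors obtained in step c, and check the established model by comparing the predicted value of the withdrawn datum with its experimental value. To reduce the variability of the cross-validation result, a sample data set is partitioned several times in different ways to obtain different complementary subsets, and cross-validation is carried out several times. In this step, the mean of the repeated validations is taken as the validation result.
The advantage of this internal validation method is that almost all samples are used to train the model, which stays closest to the full sample, so the resulting assessment is relatively reliable; the procedure involves no random factor and the whole process is repeatable.
The specific steps are as follows:
Step e1: from the given modeling samples, select most of the samples to build the model, leave a small part of the samples to be predicted with the established model, and calculate the prediction error for this small part;
Step e2: record the sum of squared prediction errors for each equation, until every sample has been predicted once and only once;
Step e3: calculate the cross-validated correlation coefficient Q²cv and the cross-validated root mean square error RMSECV with the formulas given below; acceptance criteria: Q²cv > 0.6 and R² − Q²cv ≤ 0.3;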
Figure PCTCN2015080631-appb-000029
Figure PCTCN2015080631-appb-000030
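where $y_i$ is the measured toxicity value of the i-th compound, $\hat{y}_i$ is the predicted toxicity value of the i-th compound, $\bar{y}$ is the mean toxicity of the training set, and n is the number of compounds in the training set.
Equations (7) and (8) are the indicator parameters of leave-one-out internal validation; they help to detect overfitting of the model to the training data and to determine whether any particular metal in the training set affects the robustness of the model.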
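Steps e1 to e3 can be sketched as a leave-one-out loop. Since equations (7) and (8) are only available as images, the sketch assumes the common definitions Q²cv = 1 − PRESS/Σ(y_i − ȳ)² and RMSECV = sqrt(PRESS/n), where PRESS is the sum of squared leave-one-out prediction errors. Function names and data are hypothetical.

```python
from math import sqrt


def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_two_parameter(x1, x2, y):
    """Least squares coefficients [b0, b1, b2] for y = b0 + b1*x1 + b2*x2."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve_linear(xtx, xty)


def leave_one_out(x1, x2, y):
    """Return (Q2_cv, RMSECV); each sample is predicted exactly once."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = [j for j in range(n) if j != i]
        b0, b1, b2 = fit_two_parameter([x1[j] for j in keep],
                                       [x2[j] for j in keep],
                                       [y[j] for j in keep])
        press += (y[i] - (b0 + b1 * x1[i] + b2 * x2[i])) ** 2
    mean_y = sum(y) / n
    sst = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / sst, sqrt(press / n)


def passes_internal_validation(r2, q2_cv):
    """Acceptance criteria stated in step e3."""
    return q2_cv > 0.6 and (r2 - q2_cv) <= 0.3


# Hypothetical noise-free data from y = 1 + 2*x1 - 3*x2, so Q2_cv is close to 1.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0, 0.0]
y = [-2.0, 3.0, -1.0, 4.0, 0.0, 11.0]
q2_cv, rmsecv = leave_one_out(x1, x2, y)
```

Real toxicity data are noisy, so Q²cv would fall below R²; the two criteria in `passes_internal_validation` then bound how far it may fall.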
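Step f: calculation of the applicability domain of the model;
For a verified model, the applicability domain is calculated by the leverage method and shown visually in a Williams plot. This method helps ensure that the model is reliable when used for prediction.
The leverage value $h_i$ is calculated as:
$$h_i = x_i^{T}(X^{T}X)^{-1}x_i \quad (9)$$
where $x_i$ is the column vector of the structural parameters of the i-th metal; for the two-parameter model,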
Figure PCTCN2015080631-appb-000034
Figure PCTCN2015080631-appb-000035
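$X^{T}$ denotes the transpose of the matrix X, and $(X^{T}X)^{-1}$ denotes the inverse of the matrix $X^{T}X$.
The critical value h* is calculated as:
$$h^{*} = \frac{3(p+1)}{n}$$
where p is the number of variables in the model (p = 2 for the two-parameter model) and n is the number of compounds in the model's training set, determined by the number of training-set metals in each QSAR equation after verification in steps a-e.
The Williams plot is drawn with the leverage value h as the abscissa and the standardized residual of each data point as the ordinate. In the plot, the coordinate space with h < h* is the applicability domain of the model.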
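The leverage calculation of formula (9) and the critical value h* can be sketched as follows. The layout of X for the two-parameter model is only available as an image, so the sketch assumes each row is (1, x1, x2), i.e. that an intercept column is included. The data and function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def leverages(x1, x2):
    """h_i = x_i^T (X^T X)^-1 x_i for every training metal."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]  # assumed intercept column
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    result = []
    for row in rows:
        z = solve_linear(xtx, row)  # z = (X^T X)^-1 x_i without forming the inverse
        result.append(sum(ri * zi for ri, zi in zip(row, z)))
    return result


def critical_leverage(p, n):
    """Critical value h* = 3(p + 1)/n."""
    return 3.0 * (p + 1) / n


def in_domain(h_values, p=2):
    """True for each metal whose leverage lies below h*."""
    h_star = critical_leverage(p, len(h_values))
    return [h < h_star for h in h_values]


# Hypothetical descriptor values for six training metals.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0, 0.0]
h = leverages(x1, x2)
flags = in_domain(h)
```

The leverages of a least squares fit sum to the number of fitted coefficients and none can exceed 1, so with n = 6 and p = 2 (h* = 1.5) every training metal necessarily lies below the threshold. The check becomes informative for larger training sets or for new metals outside the training set.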
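Step g: rapid screening and prediction of the toxicity and criteria values of unknown metals, using the predicted toxicity values and species sensitivity analysis.
Step g1: following the method of steps a-f, obtain in turn the two-parameter QSAR prediction equations for the preferred aquatic organisms from five phyla and eight families.
Step g2: collect and collate the values of all structural descriptors that appear in the eight equations for the metal to be predicted, and substitute them into the equations to calculate the acute toxicity endpoint of that metal for each species.
Step g3: sort the toxicity data of each species for a given metal from low (most sensitive species) to high (least sensitive species), then construct the species sensitivity distribution plot with cumulative percentage as the ordinate (P = (R − 0.5)/N, where R is the rank of the species and N is the number of species).
Step g4: fit the curve with the nonlinear Sigmoidal-Logistic fitting equation, and from the fitted equation calculate the hazardous concentrations HC5, HC10 and HC20 corresponding to cumulative percentages of 0.05, 0.1 and 0.2.
The indicators of the goodness of fit of the curve include $R^2_{adj}$, F and P. See equations (4)-(6) for the calculation methods.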
Figure PCTCN2015080631-appb-000038
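where a is the amplitude of the fitted curve, xc is the center value and k is the slope of the curve. Numerous studies have reported that the nonlinear Sigmoidal-Logistic model gives the best fit to species sensitivity curves, so the present invention uses it to derive the predicted metal criteria values.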
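Steps g3 and g4 can be sketched as below. The fitting equation itself is only available as an image; the sketch assumes the usual three-parameter logistic form y = a / (1 + exp(−k(x − xc))), which matches the parameters a, xc and k described in the text. The fitted parameter values and the toxicity values are hypothetical, and the nonlinear fit itself is not shown.

```python
from math import exp, log


def plotting_positions(log_toxicity_values):
    """Pair each value, sorted low to high, with P = (R - 0.5) / N."""
    ordered = sorted(log_toxicity_values)
    n = len(ordered)
    return [(value, (rank - 0.5) / n) for rank, value in enumerate(ordered, start=1)]


def logistic(x, a, xc, k):
    """Assumed Sigmoidal-Logistic curve: y = a / (1 + exp(-k * (x - xc)))."""
    return a / (1.0 + exp(-k * (x - xc)))


def hazard_concentration(fraction, a, xc, k):
    """Invert the assumed curve: the x at which the curve equals `fraction`."""
    if not 0.0 < fraction < a:
        raise ValueError("fraction must lie strictly between 0 and a")
    return xc - log(a / fraction - 1.0) / k


# Hypothetical predicted log-toxicity values for eight species.
predicted = [0.4, -1.2, -0.3, 0.9, -0.8, 0.1, 1.5, -0.5]
points = plotting_positions(predicted)

# Hypothetical fitted parameters (a, xc, k); a real fit would estimate them
# from `points` with a nonlinear least squares routine.
a, xc, k = 1.0, 0.0, 2.0
log_hc5, log_hc10, log_hc20 = (hazard_concentration(f, a, xc, k)
                               for f in (0.05, 0.10, 0.20))
```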
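The present invention is further described below through examples, with reference to the drawings.
Example 1:
Using the method of step a of the present invention, the acute toxicity data of Daphnia magna were compiled, as shown in Table 1.
Table 1. Example of screening, calculation and compilation of acute toxicity data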
Figure PCTCN2015080631-appb-000039
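Example 2:
Using the method of step b of the present invention, the preferred aquatic organisms from five phyla and eight families were selected, as shown in Table 2.
Table 2. Preferred acute and chronic model organisms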
Figure PCTCN2015080631-appb-000040
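Example 3:
The method of the present invention was used to predict the toxicity values of the metal mercury for the model organisms of the eight families, and the criteria reference threshold was predicted in combination with SSD curves.
Following the method of steps a-d, toxicity prediction equations were constructed for the model organisms of each of the eight families, as shown in Table 3. The optimal structural parameters of mercury were σp = 0.065, log-βn = 21.7, Xm²r = 4.08, AN/ΔIP = 9.62, Z/r = 1.96, |logKOH| = 3.4 and ΔE0 = 0.91. These were substituted into the equations in turn to obtain the predicted toxicity value for each species.
Table 3. QSAR toxicity prediction equations for the model organisms of the eight families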
Figure PCTCN2015080631-appb-000041
Figure PCTCN2015080631-appb-000042
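Example 4:
The model was internally validated by the method of step e of the present invention. Taking the acute toxicity prediction equation of Daphnia magna, log-EC50 = (−0.272 ± 18.674)σp + (−0.360 ± 0.136)log-βn + (6.604 ± 4.093), as an example, leave-one-out internal validation was performed; the relevant fitting parameters are given in Table 4. From formulas (7) and (8) of step e, Q²cv = 0.63, RMSECV = 1.139 and R² − Q²cv = 0.239. These satisfy the robustness criteria Q²cv > 0.6 and R² − Q²cv ≤ 0.3, so the model passes internal validation.
Table 4. Parameters of the leave-one-out internal validation of the model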
Figure PCTCN2015080631-appb-000043
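Example 5:
The applicability domain of the model was calculated by the method of step f of the present invention and a Williams plot was drawn. Taking the acute toxicity prediction equation of carp, log LC50 = (33.439 ± 6.256)σp + (0.412 ± 0.137)Z/r + (−3.159 ± 0.559), as an example, the structural parameters and toxicity endpoints of the training-set metals are shown in Table 5. The critical value is h* = 3 × (2 + 1)/6 = 1.5.
Table 5. Calculation of the applicability domain of the acute toxicity prediction equation for carp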
Figure PCTCN2015080631-appb-000044
Figure PCTCN2015080631-appb-000045
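Williams plots (Fig. 2a, b) were drawn with the leverage values of the two optimal structural parameters of each metal as the abscissa and the prediction residuals as the ordinate. The space enclosed by the three dashed lines in the plots is the applicability domain of the model; the results show that all 6 metals of the training set lie within the prediction range of the model.
Example 6:
According to step g of the present invention, the fitted QSAR-SSD curve equation for the metal mercury was obtained: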
Figure PCTCN2015080631-appb-000046
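The parameters evaluating the goodness of fit were Adj. r² = 0.9594, RSS = 0.019, F = 231.176 and P = 1.18 × 10⁻⁵. From the SSD curve (Fig. 3), when y equals 0.05, 0.10 and 0.20, the corresponding values of logHC5, logHC10 and logHC20 are −1.6352, −1.4022 and −1.1658. In the water quality criteria guidelines issued by the US EPA in 1985, the hazardous concentration of mercury derived from laboratory measurements is −1.8560 (on the same logarithmic scale), giving a prediction error of 0.119.
The above detailed description is a specific description of one feasible embodiment of the present invention. The embodiment is not intended to limit the patent scope of the present invention; any equivalent implementation or modification that does not depart from the present invention shall fall within the scope of its technical solution.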
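The reported prediction error of 0.119 is consistent with a relative error between the predicted logHC5 and the EPA-derived value. The text does not state how the error was computed, so the following worked check is an assumption:

```latex
\[
\frac{\left|\log HC_5^{\mathrm{pred}} - \log HC_5^{\mathrm{EPA}}\right|}{\left|\log HC_5^{\mathrm{EPA}}\right|}
= \frac{\left|-1.6352 - (-1.8560)\right|}{1.8560}
= \frac{0.2208}{1.8560}
\approx 0.119
\]
```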

Claims (10)

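  1. A freshwater acute criteria prediction method based on quantitative structure-activity relationships for metals, characterized in that the toxicity endpoints of unknown metals are predicted from the quantitative relationship between the structural characteristics of heavy metal ions and their acute toxic effects on aquatic organisms, and the hazardous concentrations protecting different proportions of aquatic organisms are derived in combination with species sensitivity distribution analysis for different species;
    the specific process being:
    step a, collection, screening, calculation and compilation of toxicity data for modeling;
    step b, screening of aquatic model organisms from five phyla and eight families;
    step c, construction of the metal ion structural descriptor data set, in which a linear correlation analysis is performed with the structural parameters of each metal as independent variables, and the two top-ranked structural descriptors are obtained by ranking the correlation coefficients;
    step d, construction of the toxicity prediction model and robustness testing, in which a multiple regression equation is established, its parameters are estimated, and the P value corresponding to the F statistic is used for testing;
    step e, internal validation of the QSAR model;
    step f, calculation of the applicability domain of the model, in which, for a verified model, a Williams plot is drawn with the leverage value h as the abscissa and the standardized residual of each data point as the ordinate;
    step g, rapid screening and prediction of the toxicity and criteria values of unknown metals, using the predicted toxicity values and species sensitivity analysis.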
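  2. The freshwater acute criteria prediction method based on quantitative structure-activity relationships for metals according to claim 1, characterized in that, in step c above, a linear correlation analysis is performed with the toxicity endpoint of a single species as the dependent variable and the structural parameters of each metal as independent variables, and the correlation coefficient r is calculated according to formula (1):
    $$r = \frac{\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i}(x_i-\bar{x})^{2}\,\sum_{i}(y_i-\bar{y})^{2}}} \quad (1)$$
    where $\bar{x}$ and $\bar{y}$ are the mean values of the structural parameter and of the toxicity values, respectively, and $x_i$ and $y_i$ are the structural parameter and the toxicity value of the i-th metal;
    parameters with a correlation coefficient r > 0.8 are regarded as significantly correlated.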
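  3. The freshwater acute criteria prediction method based on quantitative structure-activity relationships for metals according to claim 2, characterized in that, in step c above, a set of metal ion structural descriptors is constructed, including the softness index σp, the maximum complex stability constant log-βn, the Pauling electronegativity Xm, the covalent index Xm²r, the atomic ionization potential AN/ΔIP, the first hydrolysis constant |logKOH|, the electrochemical potential ΔE0, the atomic size AR/AW, the polarizing power parameters Z/r, Z/r² and Z²/r, and the similar polarizing power parameters Z/AR and Z/AR².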
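  4. The freshwater acute criteria prediction method based on quantitative structure-activity relationships for metals according to claim 2, characterized in that the process of step d above is:
    step d1, construction of the multiple regression equation and parameter estimation;
    taking the two best structural parameters determined in step c above as the independent variables X and the metal activity value as the dependent variable Y, the QICAR equation Y = XB + E of each model organism is constructed by multiple linear regression analysis; see formula (2) below, in which: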
    Figure PCTCN2015080631-appb-100003
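    n is the number of observations;
    the parameters of the equation are estimated by least squares, where X′ is the transpose of X:
    $$\hat{B} = (X'X)^{-1}X'Y$$
    step d2, goodness-of-fit test and significance test of the regression equation, using the F-test;
    the goodness-of-fit indicators of the model are the squared correlation coefficient R², the degrees-of-freedom-adjusted correlation coefficient $R^2_{adj}$ and the standard deviation RMSE;
    the indicators of the F-test are the F value and the associated probability p (Significance F) calculated by multi-factor analysis of variance (Multi-ANOVA); the P value corresponding to the F statistic is used for testing;
    step d3, acceptance criteria: depending on how the toxicity data were obtained, R² ≥ 0.81 for in vitro experiments and R² ≥ 0.64 for in vivo tests; with significance level α, the regression equation is significant when p < α.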
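  5. The freshwater acute criteria prediction method based on quantitative structure-activity relationships for metals according to claim 4, characterized in that step d3 above is calculated according to the following formulas: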
    Figure PCTCN2015080631-appb-100006
    Figure PCTCN2015080631-appb-100007
    Figure PCTCN2015080631-appb-100008
    Figure PCTCN2015080631-appb-100009
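    where R² denotes the square of the correlation coefficient, $R^2_{adj}$ denotes the degrees-of-freedom-adjusted correlation coefficient, and RMSE denotes the standard deviation.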
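  6. The freshwater acute criteria prediction method based on quantitative structure-activity relationships for metals according to claim 1, characterized in that the specific process of step e above is:
    step e1, from the given modeling samples, selecting most of the samples to build the model, leaving a small part of the samples to be predicted with the established model, and calculating the prediction error for this small part;
    step e2, recording the sum of squared prediction errors for each equation, until every sample has been predicted once and only once;
    step e3, calculating the cross-validated correlation coefficient Q²cv and the cross-validated root mean square error RMSECV, with the acceptance criteria Q²cv > 0.6 and R² − Q²cv ≤ 0.3.
  7. The freshwater acute criteria prediction method based on quantitative structure-activity relationships for metals according to claim 6, characterized in that the formulas used in step e3 above are: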
    Figure PCTCN2015080631-appb-100010
    Figure PCTCN2015080631-appb-100011
    where $y_i$ is the measured toxicity value of the i-th compound, $\hat{y}_i$ is the predicted toxicity value of the i-th compound, $\bar{y}$ is the mean toxicity of the training set, and n is the number of compounds in the training set.
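  8. The freshwater acute criteria prediction method based on quantitative structure-activity relationships for metals according to claim 1, characterized in that, in step f above, the leverage value $h_i$ is calculated as:
    $$h_i = x_i^{T}(X^{T}X)^{-1}x_i \quad (9)$$
    where $x_i$ is the column vector of the structural parameters of the i-th metal; for the two-parameter model,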
    Figure PCTCN2015080631-appb-100015
    Figure PCTCN2015080631-appb-100016
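    $X^{T}$ denotes the transpose of the matrix X, and $(X^{T}X)^{-1}$ denotes the inverse of the matrix $X^{T}X$.
  9. The freshwater acute criteria prediction method based on quantitative structure-activity relationships for metals according to claim 8, characterized in that, in step f above, the critical value h* is calculated as:
    $$h^{*} = \frac{3(p+1)}{n}$$
    where p is the number of variables in the model (p = 2 for the two-parameter model) and n is the number of compounds in the model's training set, determined by the number of training-set metals in each QSAR equation after verification in steps a-e;
    in the Williams plot, the coordinate space with h < h* is the applicability domain of the model.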
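  10. The freshwater acute criteria prediction method based on quantitative structure-activity relationships for metals according to claim 1, characterized in that the specific process of step g above is:
    step g1, following the method of steps a-f above, obtaining in turn the two-parameter QSAR prediction equations for the preferred aquatic organisms from five phyla and eight families;
    step g2, collecting and collating the values of all structural descriptors that appear in the eight equations for the metal to be predicted, and substituting them into the equations to calculate the acute toxicity endpoint of that metal for each species;
    step g3, sorting the toxicity data of each species for a given metal from low to high, then constructing the species sensitivity distribution plot with cumulative percentage as the ordinate;
    step g4, fitting the curve with the nonlinear Sigmoidal-Logistic fitting equation, and from the fitted equation calculating the hazardous concentrations HC5, HC10 and HC20 corresponding to cumulative percentages of 0.05, 0.1 and 0.2.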
PCT/CN2015/080631 2015-05-13 2015-06-03 一种基于金属定量构效关系的淡水急性基准预测方法 WO2016179864A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/659,608 US10650914B2 (en) 2015-05-13 2017-07-25 Fresh water acute criteria prediction method based on quantitative structure-activity relationship for metals

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510240546.9A CN104820873B (zh) 2015-05-13 2015-05-13 一种基于金属定量构效关系的淡水急性基准预测方法
CN201510240546.9 2015-05-13

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/659,608 Continuation US10650914B2 (en) 2015-05-13 2017-07-25 Fresh water acute criteria prediction method based on quantitative structure-activity relationship for metals

Publications (1)

Publication Number Publication Date
WO2016179864A1 true WO2016179864A1 (zh) 2016-11-17

Family

ID=53731159

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/080631 WO2016179864A1 (zh) 2015-05-13 2015-06-03 一种基于金属定量构效关系的淡水急性基准预测方法

Country Status (3)

Country Link
US (1) US10650914B2 (zh)
CN (1) CN104820873B (zh)
WO (1) WO2016179864A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110378071A (zh) * 2019-08-05 2019-10-25 西安建筑科技大学 一种基于生态安全性控制的浅池处理单元设计方法
CN113707225A (zh) * 2020-05-21 2021-11-26 中国科学院过程工程研究所 一种基于金属离子形态预测溶液中金属分离能力的方法及其应用
CN113793651A (zh) * 2021-08-20 2021-12-14 大连民族大学 一种提高污染物qsar模型预测毒性效应终点值准确度的方法
CN116402225A (zh) * 2023-04-13 2023-07-07 西南石油大学 一种致密砂岩气藏产气量预测方法

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915563B (zh) * 2015-06-16 2018-06-08 中国环境科学研究院 基于金属定量构效关系的淡水慢性基准预测方法
CN105069315A (zh) * 2015-08-26 2015-11-18 中国环境科学研究院 基于金属形态和有效性的水生生物毒性预测方法
CN105447248B (zh) * 2015-11-24 2019-03-19 中国环境科学研究院 基于金属定量构效关系的海水急性基准预测方法
US10366779B2 (en) 2015-12-30 2019-07-30 International Business Machines Corporation Scheme of new materials
CN105738591B (zh) * 2016-02-24 2017-05-31 中国环境科学研究院 硒的水质基准推导方法及水质安全评价方法
CN106556683A (zh) * 2016-11-23 2017-04-05 中国环境科学研究院 一种水生生物基准值测定方法
CN108805343A (zh) * 2018-05-29 2018-11-13 祝恩元 一种基于多元线性回归的科技服务业发展水平预测方法
CN109447341B (zh) * 2018-10-24 2020-10-09 中国环境科学研究院 一种估算目标区域水生生物基准值的方法
CN110222302B (zh) * 2019-06-06 2023-05-26 哈尔滨锅炉厂有限责任公司 一种利用百叶窗叶片参数预测浓淡分离特性的计算方法
CN110577253B (zh) * 2019-08-21 2021-10-22 河北大学 一种二维材料MXene去除污水中重金属阴离子基团性能的预测方法
KR102466620B1 (ko) 2019-12-19 2022-11-14 한국생산기술연구원 생물종 별 민감도 차이를 적용한 화학물질의 독성 예측 시스템 및 독성 예측 방법
CN111239083A (zh) * 2020-02-26 2020-06-05 东莞市晶博光电有限公司 一种手机玻璃油墨红外线透过率测试设备及相关性算法
CN111554358A (zh) * 2020-04-22 2020-08-18 中国人民大学 一种重金属毒性终点和海洋水质基准阈值的预测方法
CN111524559B (zh) * 2020-04-23 2023-07-07 浙江省农业科学院 一种化学物对生物的最大无作用浓度的分析方法
CN112253102B (zh) * 2020-11-05 2023-09-26 中国石油天然气股份有限公司 油井套管放气压力的确定方法和装置
CN112487640B (zh) * 2020-11-27 2023-02-14 交通运输部天津水运工程科学研究所 一种内河航道整治工程生态影响模拟预测方法
CN112750508A (zh) * 2021-01-15 2021-05-04 首都师范大学 土壤金属毒性预测方法、装置、电子设备及存储介质
CN113268918B (zh) * 2021-05-10 2023-04-07 云南省农业科学院农业环境资源研究所 一种预测浅层地下水中氮浓度的方法
CN117423387B (zh) * 2023-12-18 2024-03-08 中国科学院水生生物研究所 基于数字驱动的水生生物群落时空差异的评估方法及系统

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050085626A1 (en) * 2002-02-15 2005-04-21 Mount Sinai Hospital Polo domain structure
CN103577714A (zh) * 2013-11-17 2014-02-12 桂林理工大学 一种定量预测环境复合污染物联合毒性的方法
CN103646180A (zh) * 2013-12-19 2014-03-19 山东大学 一种通过量子化学方法构建定量构效关系模型来预测有机化合物急性毒性的方法

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101329699B (zh) * 2008-07-31 2011-01-26 四川大学 基于支持向量机的药物分子药代动力学性质和毒性预测方法
US20110098933A1 (en) * 2009-10-26 2011-04-28 Nellcor Puritan Bennett Ireland Systems And Methods For Processing Oximetry Signals Using Least Median Squares Techniques
CN103814273B (zh) * 2011-06-14 2016-10-12 南方创新国际股份有限公司 用于识别在检测器输出数据中脉冲的方法和设备
FR2979705B1 (fr) * 2011-09-05 2014-05-09 Commissariat Energie Atomique Procede et dispositif d'estimation d'un parametre de masse moleculaire dans un echantillon
WO2013040315A1 (en) * 2011-09-16 2013-03-21 Sentient Corporation Method and system for predicting surface contact fatigue life
AU2012388949B2 (en) * 2012-08-31 2016-11-10 Toshiba Mitsubishi-Electric Industrial Systems Corporation Material structure prediction apparatus, product manufacturing method and material structure prediction method
RU2604565C2 (ru) * 2012-09-12 2016-12-10 Бп Эксплорейшн Оперейтинг Компани Лимитед Система и способ для определения количества удерживаемого углеводородного флюида
CN104822844B (zh) * 2012-10-01 2019-05-07 米伦纽姆医药公司 预测对抑制剂的反应的生物标记物和方法以及其用途
US20140134745A1 (en) * 2012-11-15 2014-05-15 Vertichem Corporation Method for evaluation of lignin
US20150220488A1 (en) * 2014-02-05 2015-08-06 The Government Of The Us, As Represented By The Secretary Of The Navy System and method for interferometrically tracking objects using a low-antenna-count antenna array

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHENG, CHEN ET AL.: "Derivation of Marine Water Quality Criteria for Metals Based on a Novel QICAR-SSD Model", ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH, vol. 22, no. 6, 31 March 2015 (2015-03-31), XP035459744 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110378071A (zh) * 2019-08-05 2019-10-25 西安建筑科技大学 一种基于生态安全性控制的浅池处理单元设计方法
CN110378071B (zh) * 2019-08-05 2022-09-30 西安建筑科技大学 一种基于生态安全性控制的浅池处理单元设计方法
CN113707225A (zh) * 2020-05-21 2021-11-26 中国科学院过程工程研究所 一种基于金属离子形态预测溶液中金属分离能力的方法及其应用
CN113707225B (zh) * 2020-05-21 2024-03-15 中国科学院过程工程研究所 一种基于金属离子形态预测溶液中金属分离能力的方法及其应用
CN113793651A (zh) * 2021-08-20 2021-12-14 大连民族大学 一种提高污染物qsar模型预测毒性效应终点值准确度的方法
CN113793651B (zh) * 2021-08-20 2023-11-07 大连民族大学 一种提高污染物qsar模型预测毒性效应终点值准确度的方法
CN116402225A (zh) * 2023-04-13 2023-07-07 西南石油大学 一种致密砂岩气藏产气量预测方法
CN116402225B (zh) * 2023-04-13 2024-05-14 西南石油大学 一种致密砂岩气藏产气量预测方法

Also Published As

Publication number Publication date
CN104820873A (zh) 2015-08-05
US10650914B2 (en) 2020-05-12
US20170323085A1 (en) 2017-11-09
CN104820873B (zh) 2017-12-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15891567

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15891567

Country of ref document: EP

Kind code of ref document: A1