CN110146695B

CN110146695B - Method for screening human transthyretin interferent by adopting k nearest neighbor algorithm

Info

Publication number: CN110146695B
Application number: CN201910378233.8A
Authority: CN
Inventors: 杨先海; 刘会会
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2019-05-08
Filing date: 2019-05-08
Publication date: 2021-12-10
Anticipated expiration: 2039-05-08
Also published as: CN110146695A

Abstract

The invention discloses a method for screening human transthyretin (hTTR) interferents using a k-nearest neighbor (kNN) algorithm. For ionizable organic chemicals, morphology-corrected quantitative descriptors are first calculated. A binary classification model and a quantitative prediction model are then constructed from the morphology-corrected quantitative descriptors, functional group and molecular fragment descriptors, and the kNN algorithm. When a target organic chemical is screened, the binary classification model first classifies it as active or inactive; the quantitative prediction model then predicts the interference effect value of an active chemical; finally, the predicted effect value is used to judge whether the target organic chemical is a potential hTTR interferent. The descriptors have a clear mechanistic meaning and are easy to calculate, the prediction method is easy to program, and the prediction models show good goodness of fit, robustness and predictive ability. The screening method is readily extensible and is suitable for screening potential hTTR interferents within the application domain of the models.

Description

Method for screening human transthyretin interferent by adopting k nearest neighbor algorithm

Technical Field

The invention relates to a method for screening human transthyretin interferent by adopting a k-nearest neighbor algorithm, belonging to the technical field of endocrine interferent screening strategies.

Background

Endocrine disruption caused by environmental endocrine disrupting chemicals (EDCs) seriously threatens the health of humans and wildlife and has become a global environmental problem. For the chemical management authorities of every country, the primary task is to identify and evaluate potential EDCs among commercial chemicals effectively. Years of practice show, however, that screening and evaluating potential EDCs by experiment alone suffers from low throughput (50-100 chemicals per year) and high cost (about 1 million dollars per chemical), so it is impractical to test the more than 140,000 commercial chemicals one by one with the existing test system. Developing prediction models for endocrine disrupting effect endpoints is therefore of great significance for the control of EDCs.

Research has shown that endocrine-related diseases and disorders are often associated with the interference of EDCs with biological macromolecules such as hormone receptors and transport proteins. In the past, activation or inhibition of hormone receptor-mediated signal transduction was considered the primary mechanism of action of EDCs, and much work focused on the interaction between EDCs and hormone receptors. Recent studies have shown, however, that interference with non-receptor-mediated processes such as hormone transport is equally important in the pathogenic process of EDCs. Nevertheless, prediction models for hormone transport protein disruptors remain scarce.

Chinese patent CN106407665B discloses a virtual screening method for human transthyretin (hTTR) interferents. The method first classifies chemicals into 5 classes based on 10 groups, and then predicts the interference effect data of a target organic chemical on hTTR using either an aromatic organic chemical quantitative prediction model or an alkane organic chemical quantitative prediction model. However, that method has the following limitations: (1) it classifies the target organic chemical based only on the 10 groups, so an organic chemical that contains none of these groups cannot be classified and its interference effect cannot be predicted; (2) its descriptors are only Dragon descriptors calculated for the molecular state of the organic chemical. Yang et al. (Yang XH, Xie HB, Chen JW, Li XH. Anionic phenolic compounds bind stronger with transthyretin than their neutral forms: nonnegligible mechanisms in virtual screening of endocrine disrupting chemicals. Chem Res Toxicol, 2013, 26(9): 1340-1347; Yang XH, Lyakurwa F, Xie HB, Chen JW, Li XH, Qiao XL, Cai XY. Different binding mechanisms of neutral and anionic poly-/perfluorinated chemicals to human transthyretin revealed by in silico models) found, among other things, that the aromatic rings in phenolic organic chemicals can form cation-π interactions with residues of hTTR. That is, some ionizable organic chemicals dissociate into the ionic state under experimental or physiological pH conditions, and both the ionic state and the molecular state play non-negligible roles in the interaction of ionizable organic chemicals with hTTR. The method of CN106407665B therefore does not consider the influence of the ionic state of ionizable organic chemicals when constructing the hTTR interferent prediction model.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a method for screening human transthyretin interferent by adopting a k-nearest neighbor algorithm, which has a wide application range and comprehensively considers the interaction between the molecular state and the ionic state of an organic chemical and hTTR.

The technical scheme of the invention is as follows:

the method for screening the human transthyretin interferent by adopting the k-nearest neighbor algorithm comprises the following specific steps:

(1) collecting organic chemical interference effect data

Collecting interference effect data of organic chemicals, wherein the interference effect data are the ability of an organic chemical to compete with ¹²⁵I-T₄ or a fluorescent probe molecule for the hTTR binding site, i.e., the half competition effect concentration IC₅₀;

(2) Computing descriptors

The impact of ionizable group dissociation is characterized using morphology-corrected quantitative descriptors: the molecular-state and ionic-state structures of the organic chemical are optimized with Gaussian 16 software, the quantitative descriptors of the molecular state and the ionic state are directly extracted or calculated from the Gaussian 16 output files, and the morphology-corrected quantitative descriptor X_Correction is calculated according to formula (1):

X_Correction = δ_M·X_M + δ_I·X_I (1)

where X_M and X_I are the descriptor values for the molecular state and the ionic state of the organic chemical, respectively, and δ_M and δ_I are the fractions of the molecular state and the ionic state, respectively; functional group and molecular fragment descriptors are calculated with Dragon 6.0 software to represent the influence of the various groups of the organic chemical on the interference effect;
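
As an illustration of formula (1), the following minimal Python sketch computes a morphology-corrected descriptor. The text does not state how the fractions δ_M and δ_I are obtained; the sketch assumes a monoprotic acid whose ionic fraction follows the Henderson-Hasselbalch relation and assumes δ_M + δ_I = 1. The function names and the numerical values are hypothetical.

```python
def ionic_fraction(pka: float, ph: float) -> float:
    """Fraction of a monoprotic acid in the ionic (deprotonated) state.

    Assumption: Henderson-Hasselbalch relation; not specified in the source.
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def corrected_descriptor(x_m: float, x_i: float, delta_i: float) -> float:
    """Formula (1): X_Correction = delta_M * X_M + delta_I * X_I."""
    delta_m = 1.0 - delta_i  # assumption: the two fractions sum to 1
    return delta_m * x_m + delta_i * x_i


# Hypothetical chemical with pKa equal to the test pH: half of it is ionized.
delta_i = ionic_fraction(pka=8.0, ph=8.0)
x_corr = corrected_descriptor(x_m=10.0, x_i=-50.0, delta_i=delta_i)
print(delta_i, x_corr)
```

With equal molecular and ionic fractions, the corrected descriptor is simply the average of the two state-specific values.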

(3) construction and characterization of binary classification model

Using the collected qualitative data on whether organic chemicals are active or inactive, a binary classification model is constructed with a Euclidean distance-based kNN algorithm, the model is characterized following the Organisation for Economic Co-operation and Development (OECD) guidance on model construction and validation, and the optimal model is determined, wherein the optimal model comprises three descriptors, V_aver-adj (morphology-corrected average molecular electrostatic potential), F-083 (fluorine atoms attached to sp3-hybridized carbon atoms) and H-047 (hydrogen atoms attached to sp3- or sp2-hybridized carbon atoms), the number of neighbors (k) is 3, and the application domain of the binary classification model is a Euclidean distance of less than 0.928;
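
A Euclidean distance-based kNN classifier with k = 3 can be sketched as below. This is an illustration only: the majority-vote rule, the function names and the descriptor vectors are assumptions, and the training data are invented, already-scaled values rather than data from the patent.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train_x, train_y, query, k=3):
    """Label the query by majority vote among its k nearest training chemicals."""
    ranked = sorted(zip(train_x, train_y), key=lambda p: euclidean(p[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical scaled descriptors: (V_aver-adj, F-083, H-047); 1 = active, 0 = inactive
train_x = [(0.10, 0.0, 0.20), (0.15, 0.0, 0.25), (0.20, 0.1, 0.30),
           (0.80, 0.9, 0.70), (0.85, 1.0, 0.75), (0.90, 0.8, 0.80)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_classify(train_x, train_y, (0.82, 0.9, 0.72)))
```

An odd k with two classes avoids tied votes, which may be one reason a value such as 3 is convenient.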

(4) construction and characterization of quantitative prediction model

Selecting quantitative data obtained with the same test method and test conditions, a quantitative model is constructed with a Euclidean distance-based kNN algorithm; during modeling, the logarithm of the relative potency (RP) is used to characterize the ability of an organic chemical to compete with ¹²⁵I-T₄ for the hTTR binding site, RP being defined as:

RP = IC₅₀(T₄) / IC₅₀(organic chemical)

where IC₅₀(T₄) and IC₅₀(organic chemical) are the IC₅₀ values of thyroxine (T₄) and of the organic chemical, respectively; the optimal model is determined, wherein the optimal model comprises four descriptors: nCb- (number of substituted benzene sp2-hybridized carbon atoms), nArOH (number of phenolic hydroxyl groups), nHBonds (number of intramolecular hydrogen bonds), and V_adj (morphology-corrected average dispersion, Π), and the number of neighbors (k) is 3; the application domain of the quantitative prediction model is a Euclidean distance of less than 1.11;
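
For the quantitative model, a kNN prediction of logRP can be sketched as follows. The text does not say how the neighbor values are combined; this sketch assumes an unweighted mean over the k = 3 nearest training chemicals. The function name and all descriptor and logRP values are hypothetical.

```python
import math


def knn_predict_logrp(train_x, train_logrp, query, k=3):
    """Predict logRP as the mean logRP of the k nearest training chemicals.

    Assumption: unweighted averaging; the source does not state the rule.
    """
    dist = [math.dist(x, query) for x in train_x]
    nearest = sorted(range(len(train_x)), key=dist.__getitem__)[:k]
    return sum(train_logrp[i] for i in nearest) / k


# Hypothetical descriptors: (nCb-, nArOH, nHBonds, V_adj)
train_x = [(12, 1, 0, 0.5), (12, 2, 0, 0.6), (10, 1, 1, 0.4),
           (0, 0, 0, 0.1), (2, 0, 0, 0.2)]
train_logrp = [0.9, 0.6, 0.3, -2.0, -1.5]

pred = knn_predict_logrp(train_x, train_logrp, (12, 1, 0, 0.55))
print(round(pred, 3))
```

In practice the descriptors would usually be scaled before distances are computed, so that counts such as nCb- do not dominate; whether and how the patent scales them is not stated.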

(5) screening for human transthyretin interferents

① calculating the descriptors required by the classification model, namely V_aver-adj (morphology-corrected average molecular electrostatic potential), F-083 (fluorine atoms attached to sp3-hybridized carbon atoms) and H-047 (hydrogen atoms attached to sp3- or sp2-hybridized carbon atoms), and assessing whether the target organic chemical is within the application domain of the binary classification model;

if the target organic chemical is within the application domain of the binary classification model, determining with the binary classification model whether the target organic chemical has hTTR interference activity; if the target organic chemical is inactive, no further evaluation is required; if the target organic chemical is active, predicting its interference effect value with the quantitative prediction model; if the target organic chemical is not within the application domain of the binary classification model, it cannot be predicted by the binary classification model;

② for an active target organic chemical, calculating the descriptors required by the quantitative prediction model, namely nCb- (number of substituted benzene sp2-hybridized carbon atoms), nArOH (number of phenolic hydroxyl groups), nHBonds (number of intramolecular hydrogen bonds) and V_adj (morphology-corrected average dispersion, Π), and evaluating whether it is within the application domain of the quantitative prediction model;

if the target organic chemical is in the application domain range of the quantitative prediction model, calculating the logRP value of the target organic chemical to the hTTR according to the selected quantitative prediction model; if the target organic chemicals are not in the application domain range of the quantitative prediction model, the target organic chemicals cannot be predicted by the quantitative prediction model;

③ judging whether the target organic chemical can interfere with the transport of thyroxine by hTTR according to the logRP value predicted by the quantitative prediction model:

if the logRP of the organic chemical is greater than 0, the binding capacity of the target organic chemical and the hTTR is stronger than that of thyroxine;

if the logRP of the organic chemical is 0, the binding capacity of the target organic chemical and the hTTR is similar to that of thyroxine;

if the logRP of the organic chemical is less than 0, the target organic chemical is weaker than thyroxine in binding capacity with the hTTR.
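
The decision flow of steps ① to ③ can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `screen`, its arguments and the returned messages are hypothetical, and the Euclidean distances and model outputs are assumed to have been computed beforehand. Only the two application-domain thresholds (0.928 and 1.11) and the comparison of logRP with 0 come from the text; the input values in the usage lines are those reported in Examples 1 and 2 of this document.

```python
from typing import Optional

AD_CLASSIFICATION = 0.928  # application-domain threshold of the classification model
AD_QUANTITATIVE = 1.11     # application-domain threshold of the quantitative model


def screen(d_cls: float, active: bool,
           d_quant: Optional[float] = None,
           log_rp: Optional[float] = None) -> str:
    """Return a screening verdict from precomputed distances and predictions."""
    if d_cls >= AD_CLASSIFICATION:
        return "outside classification domain: no prediction"
    if not active:
        return "inactive: no further evaluation"
    if d_quant is None or log_rp is None:
        raise ValueError("active chemicals need d_quant and log_rp")
    if d_quant >= AD_QUANTITATIVE:
        return "active, outside quantitative domain: logRP not predicted"
    if log_rp > 0:
        return "active: binds hTTR more strongly than thyroxine"
    if log_rp < 0:
        return "active: binds hTTR more weakly than thyroxine"
    return "active: binding similar to thyroxine"


print(screen(0.191, False))               # Example 1 values
print(screen(0.187, True, 0.265, 0.673))  # Example 2 values
```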

The half competition effect concentration IC₅₀ of the invention is specifically the concentration of organic chemical required to displace 50% of the ¹²⁵I-T₄ or fluorescent probe molecule from the hTTR binding site.

In a specific embodiment of the present invention, in step (1), interference effect data of 355 organic chemicals are collected, wherein the classes of organic chemicals include UV sunscreens, organotins, organochlorine pesticides, substituted phenols, halogenated benzenes, alkyl carboxylic acids, bisphenol A and its derivatives, per/polyfluorinated carboxylic acids and per/polyfluorinated sulfonic acids, hydroxylated polybrominated diphenyl ethers, hydroxylated polychlorinated biphenyls, chlorinated alkenes, phosphate esters, sulfonic acid polychlorinated biphenyls, sulfonamide antibiotics, dioxins, polybrominated diphenyl ethers, polychlorinated biphenyls, anilines, and the like.

In a specific embodiment of the present invention, in step (1), the interference effect data is determined by methods conventional in the art, including a radioligand competition binding method and a fluorescent competition displacement method.

In the embodiment of the present invention, in the step (3), among 355 organic chemicals, 175 and 180 organic chemicals are active and inactive, respectively.

In the specific embodiment of the present invention, in the step (4), quantitative data obtained by using a radioligand competition binding method under a condition of pH 8.0 is selected, and a quantitative model is constructed according to a euclidean distance-based kNN algorithm.

Compared with the prior art, the invention has the following advantages:

(1) in terms of data, interference effect data of more chemicals on hTTR are collected from the latest literature, which expands the application domain of the models, and the influence of the different forms (molecular state and ionic state) of organic chemicals on their interaction with hTTR can be characterized;

(2) to predict both whether an effect exists and how large it is, Euclidean distance is used to represent the similarity of organic chemicals, and the easily programmed k-nearest neighbor (kNN) algorithm is used to construct a binary classification model and a quantitative prediction model; the binary classification model determines whether the target organic chemical has an effect, and the quantitative model then predicts its effect value; the descriptors have a clear mechanistic meaning and are easy to calculate, the prediction method is easy to program, and the prediction models show good goodness of fit, robustness and predictive ability;

(3) the screening method has good expandability, and new classification models and quantitative prediction models can be conveniently added into the screening system.

Drawings

FIG. 1 is a graph showing the relationship between the experimental and predicted logRP values of the quantitative prediction model.

FIG. 2 is a graph of the application domain of the binary classification model characterized by Euclidean distance.

FIG. 3 is a graph of the application domain of the quantitative prediction model characterized by Euclidean distance.

FIG. 4 is a flow chart of human transthyretin interferent screening.

Detailed Description

The present invention will be described in more detail with reference to the following examples and the accompanying drawings.

The method for screening the human transthyretin interferent by adopting the k-nearest neighbor algorithm is shown in a flow chart of fig. 4, and comprises the following specific steps:

(1) Collecting organic chemical interference effect data

Interference effect data of organic chemicals on hTTR reported in the literature from 1990 to 2018 were collected, giving a total of 546 effect data for 382 organic chemicals. The classes of organic chemicals include UV sunscreens, organotins, organochlorine pesticides, substituted phenols, halobenzenes, alkyl carboxylic acids, bisphenol A and its derivatives, per/polyfluorinated carboxylic acids and per/polyfluorinated sulfonic acids, hydroxylated polybrominated diphenyl ethers, hydroxylated polychlorinated biphenyls, chloroolefins, phosphate esters, sulfonic acid polychlorinated biphenyls, sulfonamide antibiotics, dioxins, polybrominated diphenyl ethers, polychlorinated biphenyls, anilines, and the like. Statistics show that 225 of the 382 organic chemicals contain ionizable groups. After data validity analysis and removal of duplicate organic chemicals, data for 355 organic chemicals were used for modeling. The interference effect data were measured by the radioligand competitive binding method and the fluorescence competitive displacement method. The ability of an organic chemical to compete with ¹²⁵I-T₄ or a fluorescent probe molecule for the hTTR binding site is expressed as IC₅₀, the concentration of organic chemical required to displace 50% of the ¹²⁵I-T₄ or fluorescent probe molecule from the hTTR binding site.

(2) Computing descriptors

Morphology-corrected quantitative descriptors are used to characterize the impact of ionizable group dissociation. The morphology-corrected quantitative descriptor X_Correction is calculated as follows:

X_Correction = δ_M·X_M + δ_I·X_I (1)

where X_M and X_I are the descriptor values for the molecular state and the ionic state of the organic chemical, respectively, and δ_M and δ_I are the fractions of the molecular state and the ionic state, respectively. The molecular-state and ionic-state structures of the organic chemical are optimized with Gaussian 16 software, the quantitative descriptors of the molecular state and the ionic state are directly extracted or calculated from the Gaussian 16 output files, and the morphology-corrected quantitative descriptors are calculated according to formula (1). In addition, functional group and molecular fragment descriptors are selected to characterize the influence of the various groups of organic chemicals on the interference effect; these descriptors are calculated with Dragon 6.0 software.

(3) Construction and characterization of binary classification model

Qualitative activity data for 355 organic chemicals, of which 175 are active and 180 are inactive, were used to construct the classification model. A binary classification model was constructed with a Euclidean distance-based kNN algorithm and characterized following the Organisation for Economic Co-operation and Development (OECD) guidance on model construction and validation. The results show that the optimal model contains three descriptors: V_aver-adj (morphology-corrected average molecular electrostatic potential), F-083 (fluorine atoms attached to sp3-hybridized carbon atoms), and H-047 (hydrogen atoms attached to sp3- or sp2-hybridized carbon atoms). The number of neighbors (k) is 3. The model evaluation results show that the prediction sensitivity S_n of the training set and validation set is 0.867 and 0.844, respectively, the prediction specificity S_p is 0.844 and 0.897, respectively, and the prediction accuracy Q is 0.856 and 0.873, respectively. The prediction accuracy for both the training set and the validation set is greater than 0.85, meaning that more than 85% of the organic chemicals are correctly classified as active or inactive, so the constructed model has good predictive ability. The application domain of the model is characterized by Euclidean distance: the application domain of the binary classification model is a Euclidean distance of less than 0.928 (as shown in Fig. 2).
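
The three classification statistics can be computed from observed and predicted labels as in the sketch below. The text does not give their formulas; the standard definitions are assumed here (sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), accuracy = (TP+TN)/N). The function name and the label vectors are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity (Sn), specificity (Sp) and accuracy (Q); 1 = active, 0 = inactive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "Sn": tp / (tp + fn),
        "Sp": tn / (tn + fp),
        "Q": (tp + tn) / len(pairs),
    }


# Hypothetical labels for ten chemicals
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
m = classification_metrics(y_true, y_pred)
print(m)
```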

(4) Construction and characterization of quantitative prediction model

Because the test methods and test conditions of the quantitative data in the data set differ, quantitative data obtained with the same test method and test conditions were selected to construct the quantitative model in order to reduce data error. Analysis showed that the radioligand competition binding method at pH 8.0 provided the largest number of data points, so the 88 quantitative data obtained under this condition were used to construct a quantitative prediction model with a Euclidean distance-based kNN algorithm. The training set and validation set comprise 70 and 18 organic chemicals, respectively. During modeling, the logarithm of the relative potency (RP) is used to characterize the ability of an organic chemical to compete with ¹²⁵I-T₄ for the hTTR binding site. RP is defined as:

RP = IC₅₀(T₄) / IC₅₀(organic chemical)

where IC₅₀(T₄) and IC₅₀(organic chemical) are the IC₅₀ values (nM) of thyroxine (T₄) and of the organic chemical, respectively.
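
A short sketch of the logRP calculation follows. It assumes RP is the ratio IC₅₀(T₄) / IC₅₀(organic chemical), which is the form consistent with the statement elsewhere in this document that logRP > 0 indicates stronger binding than thyroxine. The function name and the IC₅₀ values are hypothetical.

```python
import math


def log_rp(ic50_t4_nm: float, ic50_chem_nm: float) -> float:
    """log10 of the relative potency, assuming RP = IC50(T4) / IC50(chemical)."""
    return math.log10(ic50_t4_nm / ic50_chem_nm)


ic50_t4 = 50.0  # hypothetical IC50 of thyroxine, nM
print(log_rp(ic50_t4, 5.0))    # chemical ten times more potent than T4
print(log_rp(ic50_t4, 500.0))  # chemical ten times less potent than T4
print(log_rp(ic50_t4, 50.0))   # T4 itself
```

Because both IC₅₀ values enter as a ratio, the result does not depend on the concentration unit as long as the same unit is used for both.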

The results show that the optimal model contains four descriptors: nCb- (number of substituted benzene sp2-hybridized carbon atoms), nArOH (number of phenolic hydroxyl groups), nHBonds (number of intramolecular hydrogen bonds), and V_adj (morphology-corrected average dispersion, Π). The number of neighbors (k) is 3. The goodness of fit, robustness and predictive ability of the model were evaluated with the squared correlation coefficient between the experimental and predicted values of the training set (R²_training), the leave-one-out cross-validation coefficient (Q²_training), the external validation coefficient (Q²_validation), the root mean square errors of the training set and external validation set (RMSE_training and RMSE_validation), and the mean absolute errors of the training set and external validation set (MAE_training and MAE_validation). The training set results are: R²_training = 0.910, Q²_training = 0.804, RMSE_training = 0.397, MAE_training = 0.298; the validation set results are: Q²_validation = 0.852, RMSE_validation = 0.544, MAE_validation = 0.414. According to the model acceptance criteria, i.e. R²_training > 0.6, Q²_training > 0.6 and Q²_validation > 0.7, the model has good goodness of fit, robustness and predictive ability (as shown in Fig. 1). The application domain of the model is characterized by Euclidean distance: the application domain of the quantitative prediction model is a Euclidean distance of less than 1.11 (as shown in Fig. 3).
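
The fit statistics named above can be computed as in the following sketch. The patent does not spell out its formulas; here R² is taken as 1 - SS_res/SS_tot, which is one common convention and may differ from a squared Pearson correlation. A Q² value is usually obtained with the same expression applied to leave-one-out or external predictions. The function name and the data are hypothetical.

```python
import math


def regression_stats(y_obs, y_pred):
    """R2 (as 1 - SS_res/SS_tot), RMSE and MAE for observed vs predicted logRP."""
    n = len(y_obs)
    mean_obs = sum(y_obs) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(o - p) for o, p in zip(y_obs, y_pred)) / n,
    }


# Hypothetical observed and predicted logRP values
stats = regression_stats([0.0, 1.0, 2.0, 3.0], [0.5, 1.0, 1.5, 3.0])
print(stats)
print(stats["R2"] > 0.6)  # acceptance threshold for R2 quoted in the text
```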

(5) Human transthyretin interferent screening method

① Calculate the descriptors required by the classification model, namely V_aver-adj (morphology-corrected average molecular electrostatic potential), F-083 (fluorine atoms attached to sp3-hybridized carbon atoms), and H-047 (hydrogen atoms attached to sp3- or sp2-hybridized carbon atoms). Evaluate whether the target organic chemical is within the application domain of the binary classification model.

If the target organic chemical is within the application domain of the binary classification model, determine with the classification model whether it has hTTR interference activity, and decide the next step from the classification result. If the target organic chemical is inactive, no further evaluation is required; if it is active, the magnitude of its interference effect is predicted with the quantitative prediction model.

If the target organic chemical is not within the application domain of the binary classification model, it cannot be predicted by that model.

② For an active target organic chemical, calculate the descriptors required by the quantitative prediction model, namely nCb- (number of substituted benzene sp2-hybridized carbon atoms), nArOH (number of phenolic hydroxyl groups), nHBonds (number of intramolecular hydrogen bonds), and V_adj (morphology-corrected average dispersion, Π). Evaluate whether it is within the application domain of the quantitative prediction model.

If the target organic chemical is in the application domain range of the model, calculating the logRP value of the target organic chemical to the hTTR according to the selected model;

if the target organic chemical is not within the application domain of the model, the model cannot be used for prediction.

③ Judge whether the target organic chemical can interfere with the transport of thyroxine by hTTR according to the predicted logRP value. By definition, the logRP of T₄ is 0. The ability of the target organic chemical to compete with thyroxine for the hTTR binding site can therefore be judged by comparing its logRP with 0.

If the organic chemical logRP is greater than 0, the binding capacity of the target organic chemical and the hTTR is stronger than that of thyroxine, so that the organic chemical has higher priority;

if the organic chemical logRP <0, it indicates that the target organic chemical has weaker binding ability to hTTR than thyroxine, and thus has lower priority.

Example 1

2,3,3',5,5'-Pentachlorobiphenyl has no hTTR interference activity. The steps for predicting its interference activity with the method are as follows:

The descriptors required by the classification model, namely V_aver-adj (morphology-corrected average molecular electrostatic potential), F-083 (fluorine atoms attached to sp3-hybridized carbon atoms), and H-047 (hydrogen atoms attached to sp3- or sp2-hybridized carbon atoms), are calculated with Gaussian 16 and Dragon 6.0. The Euclidean distance is then calculated to be 0.191, which is within the application domain of the binary classification model (Euclidean distance less than 0.928). The binary classification model can therefore be used to determine the interference activity of 2,3,3',5,5'-pentachlorobiphenyl on hTTR. Based on the descriptors of the organic chemicals in the binary classification model training set and the descriptors of 2,3,3',5,5'-pentachlorobiphenyl, the Euclidean distance-based kNN algorithm predicts that 2,3,3',5,5'-pentachlorobiphenyl has no hTTR interference activity, which is consistent with the experimental result. No further evaluation is necessary.

Example 2

4'-HO-3,3',4,5,5'-pentachlorobiphenyl has hTTR interference activity (logRP is 0.933). The steps for predicting its interference activity with the method are as follows:

The descriptors required by the classification model, namely V_aver-adj (morphology-corrected average molecular electrostatic potential), F-083 (fluorine atoms attached to sp3-hybridized carbon atoms), and H-047 (hydrogen atoms attached to sp3- or sp2-hybridized carbon atoms), are calculated with Gaussian 16 and Dragon 6.0. The Euclidean distance is then calculated to be 0.187, which is within the application domain of the binary classification model (Euclidean distance less than 0.928). The binary classification model can therefore be used to determine the interference activity of 4'-HO-3,3',4,5,5'-pentachlorobiphenyl on hTTR. Based on the descriptors of the organic chemicals in the binary classification model training set and the descriptors of 4'-HO-3,3',4,5,5'-pentachlorobiphenyl, the Euclidean distance-based kNN algorithm predicts that 4'-HO-3,3',4,5,5'-pentachlorobiphenyl has hTTR interference activity, which is consistent with the experimental result. Further evaluation is required.

The interference effect value is then predicted with the quantitative prediction model. The descriptors required by the quantitative prediction model, namely nCb- (number of substituted benzene sp2-hybridized carbon atoms), nArOH (number of phenolic hydroxyl groups), nHBonds (number of intramolecular hydrogen bonds), and V_adj (morphology-corrected average dispersion, Π), are calculated with Gaussian 16 and Dragon 6.0. The Euclidean distance is then calculated to be 0.265, which is within the application domain of the quantitative prediction model (Euclidean distance less than 1.11). The quantitative prediction model can therefore be used to predict the interference effect value of 4'-HO-3,3',4,5,5'-pentachlorobiphenyl on hTTR. Based on the descriptors of the organic chemicals in the quantitative prediction model training set and the descriptors of 4'-HO-3,3',4,5,5'-pentachlorobiphenyl, the Euclidean distance-based kNN algorithm predicts a logRP of 0.673 for 4'-HO-3,3',4,5,5'-pentachlorobiphenyl, compared with an experimental logRP of 0.933; the predicted value is consistent with the experimental value. Since logRP > 0, 4'-HO-3,3',4,5,5'-pentachlorobiphenyl binds hTTR more strongly than thyroxine, and close attention should be paid to the possibility that it disrupts the thyroid system by interfering with the transport of thyroxine by hTTR.

Claims

1. The method for screening the human transthyretin hTTR interferent by adopting the k-nearest neighbor algorithm is characterized by comprising the following specific steps of:

(1) collecting organic chemical interference effect data

Collecting interference effect data of organic chemicals, wherein the interference effect data are the ability of an organic chemical to compete with ¹²⁵I-T₄ for the hTTR binding site, i.e., the half competition effect concentration IC₅₀;

(2) Computing descriptors

The impact of ionizable group dissociation is characterized using morphology-corrected quantitative descriptors: the molecular-state and ionic-state structures of the organic chemical are optimized with Gaussian 16 software, the quantitative descriptors of the molecular state and the ionic state are directly extracted or calculated from the Gaussian 16 output files, and the morphology-corrected quantitative descriptor X_Correction is calculated according to formula (1):

X_Correction = δ_M·X_M + δ_I·X_I (1)

，

(2)

where X_M and X_I are the descriptor values for the molecular state and the ionic state of the organic chemical, respectively, and δ_M and δ_I are the fractions of the molecular state and the ionic state, respectively; functional group and molecular fragment descriptors are calculated with Dragon 6.0 software to represent the influence of the various groups of the organic chemical on the interference effect;

(3) construction and characterization of binary classification model

Establishing a binary classification model with a Euclidean distance-based kNN algorithm from the collected qualitative data on whether organic chemicals are active or inactive, characterizing the model following the Organisation for Economic Co-operation and Development (OECD) guidance on model construction and validation, and determining the optimal model, wherein the optimal model comprises three descriptors, namely the morphology-corrected average molecular electrostatic potential V_aver-adj, the fluorine atoms attached to sp3-hybridized carbon atoms F-083, and the hydrogen atoms attached to sp3- or sp2-hybridized carbon atoms H-047, the number of neighbors k is 3, and the application domain of the binary classification model is a Euclidean distance of less than 0.928;

(4) construction and characterization of quantitative prediction model

Quantitative data obtained with the same test method and under the same test conditions are selected, and a quantitative prediction model is constructed using a Euclidean-distance-based kNN algorithm; in modeling, the logarithm of the relative effect potency, log RP, is used to characterize the ability of an organic chemical to compete with ¹²⁵I-T₄ for the hTTR binding site, RP being defined as:

$$RP = \frac{IC_{50}(\mathrm{T_4})}{IC_{50}(\text{organic chemical})} \qquad (3)$$

wherein IC₅₀(T₄) and IC₅₀(organic chemical) represent the IC₅₀ of thyroxine and of the organic chemical, respectively; an optimal model is determined, wherein the optimal model comprises four descriptors: the number of sp2-hybridized substituted benzene carbon atoms (nCb-), the number of phenolic hydroxyl groups (nArOH), the number of intramolecular hydrogen bonds (nHBonds), and the morphology-corrected average dispersion (V_adj); the number of neighbors k is 3; and the application domain of the quantitative prediction model is a Euclidean distance of less than 1.11;
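A short sketch of the quantitative step follows. It takes RP as the ratio of the IC₅₀ of thyroxine to the IC₅₀ of the organic chemical, the direction implied by the rule that log RP > 0 means stronger binding than thyroxine. The prediction rule (unweighted mean of the k nearest neighbors' log RP values), the use of the mean neighbor distance for the application-domain check, and all numbers except k = 3 and 1.11 are assumptions for illustration.

```python
import math


def log_rp(ic50_t4: float, ic50_chemical: float) -> float:
    """log10 of RP = IC50(T4) / IC50(chemical); both in the same units."""
    if ic50_t4 <= 0 or ic50_chemical <= 0:
        raise ValueError("IC50 values must be positive")
    return math.log10(ic50_t4 / ic50_chemical)


def knn_predict_log_rp(query, train_x, train_log_rp, k=3, ad_threshold=1.11):
    """Predict log RP as the mean over the k nearest training chemicals.

    Assumptions: unweighted averaging, and an application domain defined by
    the mean Euclidean distance to the k nearest neighbors. Returns None
    when the query lies outside the application domain.
    """
    ranked = sorted(
        ((math.dist(query, x), y) for x, y in zip(train_x, train_log_rp)),
        key=lambda pair: pair[0],
    )[:k]
    mean_dist = sum(d for d, _ in ranked) / len(ranked)
    if mean_dist >= ad_threshold:
        return None
    return sum(y for _, y in ranked) / len(ranked)


# Hypothetical scaled descriptors (nCb-, nArOH, nHBonds, V_adj) and log RP values.
train_x = [(0.0, 0.0, 0.0, 0.0), (0.1, 0.0, 0.0, 0.0),
           (0.0, 0.1, 0.0, 0.0), (3.0, 3.0, 3.0, 3.0)]
train_log_rp = [-1.0, -0.5, 0.0, 1.5]

inside = knn_predict_log_rp((0.0, 0.0, 0.0, 0.0), train_x, train_log_rp)
outside = knn_predict_log_rp((10.0, 10.0, 10.0, 10.0), train_x, train_log_rp)
```

A chemical whose IC₅₀ is ten times lower than that of thyroxine has RP = 10 and log RP = 1.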

(5) screening for human transthyretin interferents

First, the descriptors required by the binary classification model are calculated, namely the morphology-corrected average molecular electrostatic potential (V_aver-adj), fluorine atoms bonded to an sp3-hybridized carbon atom (F-083), and hydrogen atoms bonded to an sp3- or sp2-hybridized carbon atom (H-047), and whether the target organic chemical is within the application domain of the binary classification model is evaluated;

if the target organic chemical is within the application domain of the binary classification model, whether it has hTTR interference activity is determined using the binary classification model; if the target organic chemical is inactive, no further evaluation is required; if it is active, its interference effect value is predicted using the quantitative prediction model; if the target organic chemical is outside the application domain of the binary classification model, it cannot be predicted by the binary classification model;

secondly, for an active target organic chemical, the descriptors required by the quantitative prediction model are calculated, namely the number of sp2-hybridized substituted benzene carbon atoms (nCb-), the number of phenolic hydroxyl groups (nArOH), the number of intramolecular hydrogen bonds (nHBonds), and the morphology-corrected average dispersion (V_adj), and whether the target organic chemical is within the application domain of the quantitative prediction model is evaluated;

if the target organic chemical is within the application domain of the quantitative prediction model, its log RP value toward hTTR is calculated using the selected quantitative prediction model; if the target organic chemical is outside the application domain of the quantitative prediction model, it cannot be predicted by the quantitative prediction model;

whether the target organic chemical is able to interfere with the hTTR transport of thyroxine is judged from the log RP value predicted by the quantitative prediction model:

if log RP > 0, the target organic chemical binds hTTR more strongly than thyroxine;

if log RP = 0, the target organic chemical binds hTTR about as strongly as thyroxine;

if log RP < 0, the target organic chemical binds hTTR less strongly than thyroxine.
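The tiered decision logic of step (5) can be summarized in a few lines of Python. This is only a sketch of the control flow: the function names, the returned strings and the optional tolerance `tol` around zero are hypothetical, and the inputs are assumed to come from classification and quantitative models such as those described in steps (3) and (4).

```python
from typing import Optional


def interpret_log_rp(log_rp: float, tol: float = 0.0) -> str:
    """Map a predicted log RP to the three outcomes listed in step (5).

    `tol` is a hypothetical tolerance for treating values near zero as
    'similar'; the source compares against exactly zero.
    """
    if log_rp > tol:
        return "binds hTTR more strongly than thyroxine"
    if log_rp < -tol:
        return "binds hTTR less strongly than thyroxine"
    return "binds hTTR about as strongly as thyroxine"


def screen(in_class_domain: bool,
           is_active: Optional[bool],
           in_quant_domain: bool,
           predicted_log_rp: Optional[float]) -> str:
    """Two-stage screening: classify first, quantify only active chemicals."""
    if not in_class_domain:
        return "cannot be predicted by the binary classification model"
    if not is_active:
        return "inactive: no further evaluation required"
    if not in_quant_domain or predicted_log_rp is None:
        return "active, but cannot be predicted by the quantitative model"
    return "active: " + interpret_log_rp(predicted_log_rp)


result = screen(in_class_domain=True, is_active=True,
                in_quant_domain=True, predicted_log_rp=0.4)
```

Keeping the two application-domain checks separate mirrors the text: a chemical can be classifiable as active yet still fall outside the domain of the quantitative model.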

2. The method of claim 1, wherein in step (1), interference effect data are collected for 355 organic chemicals, the classes of organic chemicals including UV sunscreen agents, organotins, organochlorine pesticides, substituted phenols, halobenzenes, alkyl carboxylic acids, bisphenol A and derivatives thereof, per- and polyfluorinated carboxylic acids and sulfonic acids, hydroxylated polybrominated diphenyl ethers, hydroxylated polychlorinated biphenyls, chloroolefins, phosphate esters, sulfonated polychlorinated biphenyls, sulfonamide antibiotics, dioxins, polybrominated diphenyl ethers, polychlorinated biphenyls, and anilines.

3. The method according to claim 1, wherein in the step (1), the interference effect data is measured by a radioligand competitive binding method or a fluorescent competitive displacement method.

4. The method of claim 2, wherein in step (3), the number of active and inactive organic chemicals in the 355 organic chemicals is 175 and 180, respectively.

5. The method according to claim 1, wherein in step (4), quantitative data obtained by the radioligand competitive binding method at pH = 8.0 are selected, and the quantitative prediction model is constructed using a Euclidean-distance-based kNN algorithm.