CN110428875B - Cytochrome P450 metabolic site prediction method of small molecule drug - Google Patents
- Publication number
- CN110428875B (application CN201910631539.XA; earlier publication CN110428875A)
- Authority
- CN
- China
- Prior art keywords
- cytochrome
- small molecule
- enzyme
- metabolic
- machine learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/50—Molecular design, e.g. of drugs
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/70—Machine learning, data mining or chemometrics
Abstract
The invention provides a method for predicting the cytochrome P450 metabolic sites of small molecule drugs. The method uses WhichCyp, a machine learning model based on a support vector machine classifier, to predict which one or more of the cytochrome P450 enzyme subtypes 1A2, 2C9, 2C19, 2D6 and 3A4 a small molecule is a substrate of; uses a machine learning model based on a convolutional neural network to predict and rank the sites at which the cytochrome P450 enzyme of the corresponding subtype metabolizes the small molecule drug; calculates and evaluates the thermodynamic and kinetic interaction between the complete cytochrome P450 enzyme system and the complete molecule; performs a high-precision MMGBSA calculation on each conformation to obtain the binding energy between each small molecule conformation and the cytochrome P450 enzyme; and trains the process on a collected small molecule training set until the prediction accuracy exceeds 80%. The invention improves prediction accuracy.
Description
Technical Field
The invention belongs to the technical field of drug metabolism, and particularly relates to a cytochrome P450 metabolic site prediction method of a small molecule drug.
Background
WhichCyp is a machine learning model based on a support vector machine classifier, developed with PubChem Bioassay 1851 as its dataset. The model predicts which one or more of the cytochrome P450 enzyme subtypes 1A2, 2C9, 2C19, 2D6 and 3A4 a small molecule is a substrate of.
SMARTCyp is based on a small database of energy barriers calculated between 139 small-molecule fragments and the catalytic reaction center of cytochrome P450 enzymes. To predict the metabolic sites of a small molecule, it ranks the sites of the molecule by the energy barrier of the fragment that matches each site. For the cytochrome P450 2D6 and 2C subtypes, the energy-barrier ranking is adjusted mainly according to the distance of the atoms in the small molecule from COO- and NH3+ groups.
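The fragment-lookup idea described above can be sketched in a few lines of Python. This is only an illustration: the fragment names and barrier values are hypothetical placeholders (not SMARTCyp data), and `rank_sites_by_barrier` is a helper invented here. It ranks covered sites by ascending barrier and reports sites whose fragment is missing from the table, which is the coverage limitation the background goes on to discuss.

```python
# Hypothetical fragment-to-barrier table (kJ/mol).
# The names and numbers are placeholders for illustration, not SMARTCyp values.
FRAGMENT_BARRIERS = {
    "benzylic_CH": 60.0,
    "aliphatic_CH2": 80.0,
    "aromatic_CH": 85.0,
    "N_methyl": 55.0,
}


def rank_sites_by_barrier(site_fragments, barriers=FRAGMENT_BARRIERS):
    """Return (covered sites sorted by ascending barrier, uncovered sites).

    site_fragments maps an atom index to the name of the fragment that
    matches the environment of that atom.
    """
    covered = []
    uncovered = []
    for atom_index, fragment in site_fragments.items():
        if fragment in barriers:
            covered.append((atom_index, barriers[fragment]))
        else:
            # No precomputed barrier exists for this fragment.
            uncovered.append(atom_index)
    # A lower barrier means the site is predicted to react more readily.
    covered.sort(key=lambda pair: (pair[1], pair[0]))
    return covered, sorted(uncovered)


sites = {1: "aromatic_CH", 4: "N_methyl", 7: "benzylic_CH", 9: "spiro_CH"}
ranking, missing = rank_sites_by_barrier(sites)
print(ranking)  # sites ordered from lowest to highest barrier
print(missing)  # sites whose fragment is not in the table
```

A site listed in `missing` cannot be ranked without an additional barrier calculation for its fragment.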
WhichCyp can only predict which cytochrome P450 enzyme subtype a small molecule is a substrate of; it cannot predict the specific metabolic sites of the small molecule.
SMARTCyp calculates energy barriers only between 139 small-molecule fragments and the catalytic reaction center of cytochrome P450 enzymes. The chemical space it covers is very limited and may not include the small molecule to be predicted, so the prediction involves substantial approximation. Its metabolic site prediction for the 2D6 and 2C subtypes is also approximate, and it cannot distinguish the other subtypes.
For small molecules containing more than 40 non-hydrogen atoms, SMARTCyp predictions are generally inconsistent with experiment, probably because the metabolic sites of larger molecules depend more strongly on how the molecule binds to the cytochrome P450 enzyme.
Existing prediction methods rely on many approximations. When predicting the sites at which different cytochrome P450 enzyme subtypes metabolize a small molecule, they consider only the structure of the small molecule and ignore the influence that the residues near the catalytic reaction center of each P450 subtype have on the metabolic sites.
Disclosure of Invention
The invention provides a method for predicting the cytochrome P450 metabolic sites of small molecule drugs. It predicts the sites of a small molecule drug that are likely to be metabolized by cytochrome P450 so that these sites can be modified, for example by deuteration, fluorination or other substituents, to prolong the duration of drug action and maintain efficacy.
The adopted technical scheme is as follows:
The method for predicting the cytochrome P450 metabolic sites of a small molecule drug comprises the following steps:
(1) using WhichCyp, a machine learning model based on a support vector machine classifier, to predict which one or more of the cytochrome P450 enzyme subtypes 1A2, 2C9, 2C19, 2D6 and 3A4 the small molecule is a substrate of;
(2) using a machine learning model based on a convolutional neural network to predict and rank the sites at which the cytochrome P450 enzyme of the corresponding subtype metabolizes the small molecule drug;
(3) calculating and evaluating the thermodynamic and kinetic interaction between the complete cytochrome P450 enzyme system and the complete molecule;
downloading from the Protein Data Bank website the crystal structure of the cytochrome P450 enzyme subtype predicted by WhichCyp in step (1), and docking the small molecule drug to the cytochrome P450 enzyme with the molecular docking tool AutoDock to obtain different binding conformations of the small molecule drug and the cytochrome P450 enzyme;
(4) performing a high-precision MMGBSA calculation on each conformation to obtain the binding energy between each small molecule conformation and the cytochrome P450 enzyme;
at the same time, performing quantum chemistry/molecular dynamics calculations on the different binding conformations to obtain the reaction energy barriers between the different sites of the small molecule and the reaction center of the cytochrome P450 enzyme;
the calculation considers the whole small molecule and the whole cytochrome P450 system; the reaction center is treated with a high-precision quantum chemical method and the remainder with a molecular dynamics method to keep the calculation fast;
(5) training the process on a small molecule training set collected from the literature and testing whether its prediction accuracy exceeds 80%; if so, the process can be used to predict the metabolic sites of small molecules; otherwise, the machine learning model that determines the P450 enzyme subtype, the machine learning model that predicts the metabolic sites, and the quantum chemistry/molecular dynamics (QM/MM) calculation are further optimized until the prediction accuracy exceeds 80%.
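The acceptance criterion in step (5) amounts to a refine-and-retest loop. The sketch below shows only that control flow, under stated assumptions: `evaluate` and `refine` are hypothetical callables standing in for "test the process on the training set" and "further optimize the models and the QM/MM setup", and the `max_rounds` cap is an addition made here so that the loop always terminates; the patent does not specify one.

```python
def train_until_accurate(evaluate, refine, threshold=0.80, max_rounds=10):
    """Refine the workflow until its accuracy exceeds the threshold.

    evaluate: hypothetical callable returning the current accuracy (0..1).
    refine:   hypothetical callable that improves the workflow in place.
    Returns the final accuracy and the number of refinement rounds used.
    """
    accuracy = evaluate()
    rounds = 0
    # The text requires accuracy strictly greater than 80%.
    while accuracy <= threshold and rounds < max_rounds:
        refine()
        accuracy = evaluate()
        rounds += 1
    return accuracy, rounds


# Toy stand-ins: every refinement round adds five percentage points.
state = {"accuracy": 0.70}


def evaluate():
    return state["accuracy"]


def refine():
    state["accuracy"] = round(state["accuracy"] + 0.05, 2)


final_accuracy, rounds_used = train_until_accurate(evaluate, refine)
print(final_accuracy, rounds_used)
```

Because the test is strict, a workflow that reaches exactly 80% is refined once more before it is accepted.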
The method for predicting the cytochrome P450 metabolic site of the small molecule drug has the following technical effects:
(1) The invention combines WhichCyp with machine learning to predict the metabolic sites of different cytochrome P450 enzyme subtypes: it first determines which cytochrome P450 enzyme subtype the small molecule is a substrate of, and then ranks the metabolic sites of the small molecule with the machine learning prediction model for that subtype.
(2) When the machine learning model for metabolic site prediction is built, the energy barrier between each small-molecule fragment and the catalytic reaction center of the cytochrome P450 enzyme is calculated with a quantum chemical method and used as one of the model features. The energy barrier database constructed by the invention is much larger than that of SMARTCyp. Fragments not covered by the database can be calculated separately and added to it, which greatly improves prediction accuracy.
(3) The process uses molecular docking, so that for small molecules with more than 40 non-hydrogen atoms it can fully account for the binding mode between the small molecule and the catalytic reaction center of the enzyme; it then recalculates the binding free energy with the high-precision MMGBSA method and ranks the conformations.
(4) A QM/MM method is used to calculate the catalytic reaction between a specific cytochrome P450 enzyme subtype and the small molecule: the inner layer uses high-precision quantum chemistry and the outer layer uses a faster molecular mechanics method. The method can account for the influence that the specific amino acid residues around the catalytic reaction center of each subtype have on the energy barrier of the catalytic reaction, and therefore predicts the metabolic sites of small molecules more accurately.
(5) The method uses the Amazon cloud computing scheduling platform to provision a large amount of computing resources, so that computations run in parallel on multiple nodes, which greatly improves the efficiency of metabolic site prediction.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 shows the application of the present invention to the design of deuterated drugs.
Detailed Description
The specific technical scheme of the invention is described by combining the embodiment.
Following the dataset and method described in the WhichCyp publication, the data of PubChem Bioassay 1851 were obtained and a machine learning model based on a support vector machine classifier was implemented. It is used to determine which cytochrome P450 enzyme subtype a small molecule drug is a substrate of.
From published literature databases, this example collected about 680 small molecules with experimental cytochrome P450 metabolism data and used them as the training set for predicting the cytochrome P450 metabolic sites of small molecule drugs. The numbers of training molecules for the individual cytochrome P450 enzyme subtypes are 1A2 (271), 2A6 (105), 2B6 (151), 2C8 (142), 2C9 (226), 2C19 (218), 2D6 (270), 2E1 (145) and 3A4 (475); some molecules are metabolized by several subtypes. Fragments within a range of three bond lengths around each potentially metabolized site are cut out of the molecules to form a fragment library, and the energy barrier of each fragment's reaction with the cytochrome P450 enzyme reaction center is calculated with a quantum chemical method, giving a database that maps fragments to energy barriers. Using features such as the energy barrier, the atom type and the three-dimensional coordinates of the molecule, a machine learning model based on a convolutional neural network is trained for each cytochrome P450 enzyme subtype to rank the likely metabolic sites of a molecule.
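One way to read the fragment-extraction step is as a breadth-first search on the molecular graph. The sketch below assumes that "three bond lengths" means a topological distance of at most three bonds from the candidate site; that interpretation, the adjacency-list representation and the function name `atoms_within_bonds` are all assumptions made for illustration, not details given in the patent.

```python
from collections import deque


def atoms_within_bonds(adjacency, site, max_bonds=3):
    """Return the sorted atom indices at most max_bonds bonds from site.

    adjacency maps each atom index to the list of atoms bonded to it.
    """
    distance = {site: 0}
    queue = deque([site])
    while queue:
        atom = queue.popleft()
        if distance[atom] == max_bonds:
            # Atoms at the cutoff belong to the fragment but are not expanded.
            continue
        for neighbor in adjacency[atom]:
            if neighbor not in distance:
                distance[neighbor] = distance[atom] + 1
                queue.append(neighbor)
    return sorted(distance)


# Toy heavy-atom graph: chain 0-1-2-3-4-5 with a branch atom 6 on atom 2.
adjacency = {
    0: [1],
    1: [0, 2],
    2: [1, 3, 6],
    3: [2, 4],
    4: [3, 5],
    5: [4],
    6: [2],
}
fragment = atoms_within_bonds(adjacency, site=0)
print(fragment)
```

In practice the extracted atoms would still need capping (for example with hydrogens) before a quantum chemical barrier calculation, which this sketch leaves out.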
As shown in FIG. 1, the invention first uses WhichCyp, a machine learning model based on a support vector machine classifier, to predict which one or more of the cytochrome P450 enzyme subtypes 1A2, 2C9, 2C19, 2D6 and 3A4 the small molecule is a substrate of. A machine learning model based on a convolutional neural network then predicts and ranks the sites at which the cytochrome P450 enzyme of the corresponding subtype metabolizes the small molecule drug. This model relies on quantum chemical calculations between fragments of the drug molecule and the reaction center of the cytochrome P450 enzyme, which greatly simplifies the computational model, so the thermodynamic and kinetic interaction between the complete cytochrome P450 enzyme system and the complete molecule must additionally be calculated and evaluated. The crystal structure of the cytochrome P450 enzyme subtype predicted by WhichCyp in the first step is downloaded from the Protein Data Bank website, and the small molecule drug is docked to the cytochrome P450 enzyme with the molecular docking tool AutoDock to obtain different binding conformations of the small molecule drug and the cytochrome P450 enzyme. A high-precision MMGBSA calculation is performed on each conformation to obtain the binding energy between each small molecule conformation and the cytochrome P450 enzyme. At the same time, quantum chemistry/molecular dynamics calculations are performed on the different binding conformations to obtain the reaction energy barriers between the different sites of the small molecule and the reaction center of the cytochrome P450 enzyme. The calculation considers the whole small molecule and the whole cytochrome P450 system; the reaction center is treated with a high-precision quantum chemical method and the remainder with a molecular dynamics method to keep the calculation fast.
The process is trained on the training set of 680 small molecules collected from the literature, and it is tested whether its prediction accuracy exceeds 80%. If so, the process can be used to predict the metabolic sites of small molecules; otherwise, the machine learning model that determines the P450 enzyme subtype, the machine learning model that predicts the metabolic sites, and the quantum chemistry/molecular dynamics (QM/MM) calculation are further optimized until the prediction accuracy exceeds 80%.
In the above process, quantum chemical calculations are required when building the machine learning model for metabolic site prediction, and both the MMGBSA calculations and the quantum chemistry/molecular dynamics (QM/MM) calculations are computationally expensive and long-running; an ordinary compute node cannot carry such a load. In this embodiment, the flow shown in FIG. 1 is therefore deployed on the Amazon cloud platform, where it runs in parallel on multiple nodes.
Because cytochrome P450 enzyme metabolism is the main route of drug metabolism in the human body, once the cytochrome P450 metabolic sites of a small molecule drug have been predicted, the likely sites can be modified, for example by deuteration, fluorination or other functional groups. Cytochrome P450 enzyme metabolism mainly oxidizes C-H bonds in small molecule drugs to C-OH bonds, so replacing the C-H bonds at the metabolic sites with C-D bonds can reduce the metabolism of the drug. The usual way to develop a deuterated drug is to replace the C-H bonds of the small molecule drug with C-D bonds in every possible combination; synthesizing these deuterated compounds and testing their metabolism one by one costs a great deal of time and money. The process of the invention recommends the likely metabolic sites of a small molecule and thereby reduces the dozens of deuterated compounds that would otherwise have to be synthesized to no more than 3.
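The saving described here can be made concrete by counting candidates. The sketch below assumes that "every possible combination" means every non-empty subset of the distinct C-H sites, which is one reading of the text rather than something the patent states; the site labels and the choice of six sites are hypothetical.

```python
from itertools import combinations


def deuteration_patterns(ch_sites):
    """Yield every non-empty subset of C-H sites that could be deuterated."""
    for size in range(1, len(ch_sites) + 1):
        yield from combinations(ch_sites, size)


# Hypothetical molecule with six distinct C-H sites.
all_sites = ["C1", "C2", "C3", "C4", "C5", "C6"]
exhaustive = list(deuteration_patterns(all_sites))

# With a predicted ranking, only the top-ranked sites are deuterated one at a
# time. Here the first three labels stand in for the top 3 predicted sites.
top_sites = all_sites[:3]
single_site_candidates = [(site,) for site in top_sites]

print(len(exhaustive))              # 2**6 - 1 candidate compounds
print(len(single_site_candidates))  # candidates after site prediction
```

Even for six sites the exhaustive approach gives 63 candidate compounds, which matches the "dozens" mentioned above in order of magnitude.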
As shown in FIG. 2, this example tests the process of the invention on 8 deuterated drug molecules currently in clinical stages. The sites marked with dotted circles are the experimentally determined metabolic sites, which have been deuterated. The sites marked with arrows are the top 3 sites in the ranking predicted by the metabolic site prediction process of the invention. The success rate of the top-2 prediction is 87.5%. The cytochrome P450 enzyme metabolic site prediction process of the invention can therefore substantially reduce the cost of drug development.
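A top-k success rate of this kind can be computed as the fraction of molecules for which at least one experimental site appears among the k highest-ranked predictions; with 8 molecules, 87.5% corresponds to 7 hits. The sketch below uses that definition, which is an assumption since the patent does not spell it out, and made-up toy rankings rather than the molecules of FIG. 2.

```python
def top_k_success_rate(ranked_predictions, experimental_sites, k):
    """Fraction of molecules with an experimental site in the top-k ranking."""
    hits = 0
    for molecule, ranking in ranked_predictions.items():
        if set(ranking[:k]) & experimental_sites[molecule]:
            hits += 1
    return hits / len(ranked_predictions)


# Toy data: eight hypothetical molecules, each with the same predicted ranking
# of atom indices. These are not the molecules or results from FIG. 2.
names = [f"mol_{i}" for i in range(1, 9)]
ranked_predictions = {name: [1, 2, 3, 4] for name in names}
experimental_sites = {name: {2} for name in names}
experimental_sites["mol_8"] = {3}  # found only at rank 3

top2 = top_k_success_rate(ranked_predictions, experimental_sites, k=2)
top3 = top_k_success_rate(ranked_predictions, experimental_sites, k=3)
print(top2, top3)
```

In this toy set seven of the eight molecules are hit within the top 2, and widening to the top 3 recovers the remaining one.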
For the example in FIG. 2, the WhichCyp software deployed on the Amazon cloud platform is first used to determine which P450 enzyme subtype the small molecule compound is a substrate of. For example, Sorafenib is predicted to be a substrate of the P450 2C9 subtype. The metabolic sites for the 2C9 subtype are then predicted with a metabolic site prediction model trained with a graph convolutional neural network, which gives a ranking of how likely each atom is to be metabolized. The crystal structure of the 2C9 enzyme is downloaded from the Protein Data Bank, and 2C9 and Sorafenib are docked with the open-source academic software AutoDock to obtain 5 dominant binding conformations, whose binding energies are calculated with the MMGBSA method. At the same time, QM/MM calculations are performed on the 5 binding conformations with the ONIOM module of the Gaussian software to obtain the energy barriers for the reaction between the C-H bonds of the molecule and the catalytic reaction center. The atoms of the molecule are ranked by combining the MMGBSA binding energies with the reaction energy barriers obtained from the Gaussian calculation, this ranking is compared with the metabolic site ranking predicted by the machine learning model, and the atoms ranked in the top 3 are taken as the possible metabolic sites. The metabolic properties of the compound can then be optimized by deuteration, fluorination or modification with other substituents at these sites.
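The patent does not say how the MMGBSA binding energies and the QM/MM barriers are combined into one ranking. As one possible heuristic, and purely as an assumption, the sketch below penalizes each barrier by how much less favorable its binding conformation is than the best one, keeps the lowest penalized value per atom, and returns the top 3 atoms. The atom labels and all numbers (in kcal/mol) are made up.

```python
def rank_atoms(conformations, top_n=3):
    """Rank atoms by barrier plus a penalty for less favorable binding poses.

    Each conformation is a dict with a "binding_energy" and a "barriers"
    mapping from atom label to reaction barrier, both in the same unit.
    The additive penalty is an illustrative heuristic, not the patent's rule.
    """
    best_binding = min(c["binding_energy"] for c in conformations)
    scores = {}
    for conformation in conformations:
        penalty = conformation["binding_energy"] - best_binding
        for atom, barrier in conformation["barriers"].items():
            score = barrier + penalty
            if atom not in scores or score < scores[atom]:
                scores[atom] = score
    # A lower score means a more likely metabolic site.
    ordered = sorted(scores, key=lambda atom: (scores[atom], atom))
    return ordered[:top_n], scores


# Made-up values for three docked poses of one hypothetical molecule.
conformations = [
    {"binding_energy": -40.0, "barriers": {"C5": 18.0, "C12": 15.0}},
    {"binding_energy": -38.0, "barriers": {"C12": 14.0, "C20": 12.0}},
    {"binding_energy": -30.0, "barriers": {"C3": 10.0}},
]
top_atoms, scores = rank_atoms(conformations)
print(top_atoms)
```

With these numbers the atom with the lowest raw barrier (C3) drops out of the top 3 because it is only reachable in a weakly bound pose, which is the kind of effect the docking and MMGBSA steps are meant to capture.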
Claims (2)
1. A method for predicting the cytochrome P450 metabolic sites of a small molecule drug, characterized by comprising the following steps:
(1) using WhichCyp, a machine learning model based on a support vector machine classifier, to determine which one or more of the cytochrome P450 enzyme subtypes 1A2, 2C9, 2C19, 2D6 and 3A4 the small molecule is a substrate of;
(2) using a machine learning model based on a convolutional neural network to predict the sites at which the cytochrome P450 enzyme of the corresponding subtype metabolizes the small molecule drug, to obtain a ranking of how likely each atom is to be metabolized;
(3) calculating and evaluating the thermodynamic and kinetic interaction between the complete cytochrome P450 enzyme system and the complete molecule;
downloading from the Protein Data Bank website the crystal structure of the cytochrome P450 enzyme subtype predicted by WhichCyp in step (1), and docking the small molecule drug to the cytochrome P450 enzyme with the molecular docking tool AutoDock to obtain different binding conformations of the small molecule drug and the cytochrome P450 enzyme;
(4) performing a high-precision MMGBSA calculation on each conformation to obtain the binding energy between each small molecule conformation and the cytochrome P450 enzyme;
at the same time, performing calculations on the different binding conformations to obtain the reaction energy barriers between the different sites of the small molecule and the reaction center of the cytochrome P450 enzyme;
(5) training the process on a small molecule training set collected from the literature and testing whether its prediction accuracy exceeds 80%; if so, the process can be used to predict the metabolic sites of small molecules; otherwise, the machine learning model that determines the P450 enzyme subtype, the machine learning model that predicts the metabolic sites, and the quantum chemistry/molecular dynamics calculation are further optimized until the prediction accuracy exceeds 80%.
2. The method for predicting the cytochrome P450 metabolic sites of a small molecule drug according to claim 1, characterized in that the calculation on the different binding conformations in step (4) considers the whole small molecule and the whole cytochrome P450 system, the reaction center being treated with a high-precision quantum chemical method and the remainder with a molecular dynamics method to keep the calculation fast.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910631539.XA CN110428875B (en) | 2019-07-12 | 2019-07-12 | Cytochrome P450 metabolic site prediction method of small molecule drug |
PCT/CN2019/104543 WO2021003834A1 (en) | 2019-07-12 | 2019-09-05 | Small-molecule drug cytochrome p450 metabolism locus prediction method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910631539.XA CN110428875B (en) | 2019-07-12 | 2019-07-12 | Cytochrome P450 metabolic site prediction method of small molecule drug |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110428875A | 2019-11-08 |
CN110428875B | 2021-07-02 |
Family
ID=68409305
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910631539.XA (Active; CN110428875B) | 2019-07-12 | 2019-07-12 | Cytochrome P450 metabolic site prediction method of small molecule drug |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110428875B (en) |
WO (1) | WO2021003834A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113270152B (en) * | 2021-04-19 | 2023-10-20 | 北京晶泰科技有限公司 | Method and system for predicting metabolic site of small molecule CYP metabolic enzyme |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101748198A (en) * | 2008-12-12 | 2010-06-23 | 上海人类基因组研究中心 | SNP rs11632814 of CYP1A2 gene and application thereof in relevant drug metabolism activity detection |
CN102650620A (en) * | 2012-03-15 | 2012-08-29 | 天津医科大学 | Preparation method, detection method and application of probe drug composition for determination of metabolic activity of cytochrome P450 |
Non-Patent Citations (3)
Title |
---|
ULECIA K. M. Interactions of 2-phenyl-benzotriazole Xenobiotic Compounds with Human Cytochrome P450-CYP1A1 by Means of Docking, Molecular Dynamics Simulations and MM-GBSA Calculations. Computational Biology and Chemistry. 2018-04-07, vol. 74, pp. 253-262. * |
Léa El Khoury et al. Comparison of ligand affinity ranking using AutoDock-GPU and MM-GBSA scores in the D3R Grand Challenge 4. http://ChemRxiv.org. 2019, pp. 1-11. * |
Michał Rostkowski et al. WhichCyp: prediction of cytochromes P450 inhibition. Structural bioinformatics. 2013-12-31, vol. 29, no. 16, pp. 2051–2052. * |
Also Published As
Publication number | Publication date |
---|---|
CN110428875A (en) | 2019-11-08 |
WO2021003834A1 (en) | 2021-01-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: 3 / F, Shunfeng industrial building, No.2 Hongliu Road, Fubao community, Fubao street, Futian District, Shenzhen City, Guangdong Province; Applicant after: Shenzhen Jingtai Technology Co.,Ltd.; Address before: 518000 4th floor, No.9 Hualian Industrial Zone, Xinshi community, Dalang street, Longhua District, Shenzhen City, Guangdong Province; Applicant before: Shenzhen Jingtai Technology Co.,Ltd. |
GR01 | Patent grant | |